Midjourney and other image generators:
https://finance.yahoo.com/news/adobe-ethical-firefly-ai-trained-123004288.html
Criticism of the practice has come from inside the company: Since the early days of Firefly, there has been internal disagreement on the ethics and optics of ingesting AI-generated imagery into the model, according to multiple employees familiar with its development who asked not to be named because the discussions were private. Some have suggested weaning the system off generated images over time, but one of the people said there are no current plans to do so.
I don't know any artists opposed to AI who consider Adobe Firefly ethically trained. I know one well-known artist who left Adobe over this:
https://accidental-expert.com/p/the-bob-ross-of-adobe
https://www.reddit.com/r/Adobe/comments/1d6q0v0/kyle_webster_leaves_adobe/
EleutherAI was using The Pile for training, and it included copyrighted work. Last June they released Common Pile v0.1, which supposedly doesn't have any copyright issues, but I haven't done any more reading on it.
https://en.wikipedia.org/wiki/The_Pile_(dataset)
In June 2025, EleutherAI, in partnership with Poolside, Hugging Face, the US Library of Congress, and over two dozen researchers at 14 institutions including the University of Toronto, MIT, CMU, the Vector Institute and the Allen Institute for AI, released Common Pile v0.1, a training dataset that contains only works whose licenses permit their use for training AI models.[6][12][13] The intent is to show what is possible when ethically training AI systems while respecting copyrighted works.[13] They found that the process of gathering the data could not be fully automated and was at times painstaking, with humans verifying and annotating every entry, and that the resulting models could achieve impressive results even though they were still not comparable with frontier models.[13]
Note that it isn't as good as the illegally trained frontier models. Of course it isn't - generative AI works as well as it does only because so much intellectual property was stolen.
And btw, although a lot of the code AI companies used for training was posted online under Creative Commons licenses, the lack of attribution and other violations of those licenses' terms would, I think, make the AI companies' use of that IP illegal as well. I saw that point made by an AI critic recently.
Anyway, the vast majority of generative AI models are illegally and unethically trained.
Ideally, every such AI model would be destroyed, and AI companies would be forced to start over using only what's in the public domain, and what they've obtained clear legal permission to use. And that would immediately make it obvious that the industry's worldwide theft of intellectual property was what created whatever value was in genAI.