Apparently, stealing other people’s work to create a product for money is now “fair use” according to OpenAI, because they are “innovating” (stealing). Yeah. Move fast and break things, huh?

“Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit “misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”

    1 year ago
    1. This is not REALLY about copyright - this is an attack on free and open AI models, which would be IMPOSSIBLE if copyright was extended to cover the case of using the works for training.
    2. It’s not stealing. There is literally no resemblance between the training works and the model. IP rights have been continuously strengthened due to lobbying over the last century and are already absurdly strong, I don’t understand why people on here want so much to strengthen them ever further.
    • sculd@beehaw.org (OP)
      1 year ago

      Sorry, AIs are not humans. Also, executives like Altman are literally being paid millions to steal creators’ work.

      1 year ago

      Agreed on both counts… Except Microsoft sings a different tune when their software is being “stolen” in the exact same way. They want to have it both ways - calling us pirates when we copy their software, but it’s “without merit” when they do it. Fuck’em! Let them play by the same rules they want everyone else to play by.

      1 year ago

      There is literally no resemblance between the training works and the model.

      This is way too strong a statement when some LLMs can spit out copyrighted works verbatim.

      A team of researchers primarily from Google’s DeepMind systematically convinced ChatGPT to reveal snippets of the data it was trained on using a new type of attack prompt which asked a production model of the chatbot to repeat specific words forever.

      Often, that “random content” is long passages of text scraped directly from the internet. I was able to find verbatim passages the researchers published from ChatGPT on the open internet: Notably, even the number of times it repeats the word “book” shows up in a Google Books search for a children’s book of math problems. Some of the specific content published by these researchers is scraped directly from CNN, Goodreads, WordPress blogs, on fandom wikis, and which contain verbatim passages from Terms of Service agreements, Stack Overflow source code, copyrighted legal disclaimers, Wikipedia pages, a casino wholesaling website, news blogs, and random internet comments.

      Beyond that, copyright law was designed under the circumstances where creative works are only ever produced by humans, with all the inherent limitations of time, scale, and ability that come with that. Those circumstances have now fundamentally changed, and while I won’t be so bold as to pretend to know what the ideal legal framework is going forward, I think it’s also a much bolder statement than people think to say that fair use as currently applied to humans should apply equally to AI and that this should be accepted without question.

        1 year ago

        I know it inherently seems like a bad idea to fix an AI problem with more AI, but it seems applicable to me here. I believe it should be technically feasible to incorporate into the model something which checks if the result is too similar to source content as part of the regression.

        My gut would be that this would, at least in the short term, make responses worse on the whole, so would probably require legal action or pressure to have it implemented.

          1 year ago

          The key element here is that an LLM does not actually have access to its training data, and at least as of now, I’m skeptical that it’s technologically feasible to search through the entire training corpus, which is an absolutely enormous amount of data, for every query, in order to determine potential copyright violations, especially when you don’t know exactly which portions of the response you need to use in your search. Even then, that only catches verbatim (or near verbatim) violations, and plenty of copyright questions are a lot fuzzier.

          For instance, say you tell GPT to generate a fan fiction story involving a romance between Draco Malfoy and Harry Potter. This would unquestionably violate JK Rowling’s copyright on the characters if you published the output for commercial gain, but you might be okay if you just plop it on a fan fic site for free. You’re unquestionably okay if you never publish it at all and just keep it to yourself (well, a lawyer might still argue that this harms JK Rowling by damaging her profit if she were to publish a Malfoy-Harry romance, since people can just generate their own instead of buying hers, but that’s a messier question). But, it’s also possible that, in the process of generating this story, GPT might unwittingly directly copy chunks of renowned fan fiction masterpiece My Immortal. Should GPT allow this, or would the copyright-management AI strike it? Legally, it’s something of a murky question.

          For yet another angle, there is of course a whole host of public domain text out there. GPT probably knows the text of the Lord’s Prayer, for instance, and so even though that output would perfectly match some training material, it’s legally perfectly okay. So, a copyright police AI would need to know the copyright status of all its training material, which is not something you can super easily determine by just ingesting the broad internet.
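
          To make the "verbatim (or near verbatim)" check above concrete, here is a minimal sketch of what such a filter might look like at toy scale. Everything in it is an assumption for illustration: the function names (word_ngrams, build_index, overlap_fraction) are made up, the n-gram length of 8 words is arbitrary, and the index is a plain in-memory set, which would not scale to a real training corpus. It is not how any actual LLM provider does this, and it says nothing about the copyright status of whatever it matches.

```python
def word_ngrams(text, n=8):
    """Return the set of lowercase word n-grams in text (empty if text is shorter than n words)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(documents, n=8):
    """Collect every word n-gram from a (toy) corpus into one in-memory set."""
    index = set()
    for doc in documents:
        index |= word_ngrams(doc, n)
    return index


def overlap_fraction(output, index, n=8):
    """Fraction of the output's n-grams that appear verbatim in the index."""
    grams = word_ngrams(output, n)
    if not grams:
        return 0.0
    return len(grams & index) / len(grams)


if __name__ == "__main__":
    # Hypothetical corpus and model output, using short n-grams so the toy data matches.
    corpus = ["the quick brown fox jumps over the lazy dog"]
    index = build_index(corpus, n=3)
    print(overlap_fraction("the quick brown fox", index, n=3))
```

          Even a check like this only flags exact word sequences. It would miss paraphrases, and it would happily flag the Lord’s Prayer, which is the point of the public domain example above.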

            1 year ago

            I don’t see why it wouldn’t be able to. That’s a Big Data problem, but we’ve gotten very very good at searches. Bing, for instance, conducts a web search on each prompt in order to give you a citation for what it says, which is pretty close to what I’m suggesting.

            As far as comparing to see if the text is too similar, I’m not suggesting a simple comparison or even an Expert Machine; I believe that’s something that can be trained. GANs already have a discriminator that’s essentially measuring how close the generated content is to “truth.” This is extremely similar to that.

            I completely agree that categorizing input training data by whether or not it is copyrighted is not easy, but it is possible, and I think something that could be legislated. The AI you would have as a result would inherently not be as good as it is in the current unregulated form, but that’s not necessarily a worse situation given the controversies.

            On top of that, one of the common defenses for AI is that it is learning from material just as humans do, but humans also can differentiate between copyrighted and public works. For the defense to be properly analogous, it would make sense to me that it would need some notion of that as well.

        1 year ago

        I’m gonna say those circumstances changed when digital copies and the Internet became a thing, but at least we’re having the conversation now, I suppose.

        I agree that ML image and text generation can create something that breaks copyright. You for sure can duplicate images or use copyrighted characters. This is also true of YouTube videos and TikToks and a lot of human-created art. I think it’s a fascinating question to ponder whether the infraction is in what the tool generates (i.e. did it make a picture of Spider-Man, which is under copyright and thus can’t be used that way, and sell it to you for money) or in the ingest that enables it to do that (i.e. it learned on pictures of Spider-Man available on the Internet, and thus all output is tainted because the images are copyrighted).

        The first option makes more sense to me than the second, but if I’m being honest I don’t know if the entire framework makes sense at this point at all.

      1 year ago

      I don’t understand why people on here want so much to strengthen them ever further.

      It is about a lawless company doing lawless things. Some of us want companies to follow the spirit, or at least the letter, of the law. We can change the law, but we need to discuss that.

          1 year ago

          The two big arguments are:

          • Substantial reproduction of the original work: you can get back substantial portions of the original work from an AI model’s output.
          • The AI model replaces the use of the original work. In short, a work that uses copyrighted material under fair use can’t be a replacement for the initial work.
            1 year ago

            you can get back substantial portions of the original work from an AI model’s output

            Have you confirmed this yourself?

              1 year ago

              In its complaint, The New York Times alleges that because the AI tools have been trained on its content, they sometimes provide verbatim copies of sections of Times reports.

              OpenAI said in its response Monday that so-called “regurgitation” is a “rare bug,” the occurrence of which it is working to reduce.

              “We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use,” OpenAI said.

              The tech company also accused The Times of “intentionally” manipulating ChatGPT or cherry-picking the copycat examples it detailed in its complaint.


              The thing is, it doesn’t really matter if you have to “manipulate” ChatGPT into spitting out training material word-for-word. The fact that it’s possible at all is proof that, intentionally or not, that material has been encoded into the model itself. That might still be fair use, but it’s a lot weaker than the original argument, which was that nothing of the original material really remains after training, and that it’s all synthesized and blended with everything else to create something entirely new that doesn’t replicate the original.

    1 year ago

    Having read through these comments, I wonder if we’ve reached the logical conclusion of copyright itself.

    • frog 🐸
      1 year ago

      Perhaps a fair compromise would be doing away with copyright in its entirety, from the tiny artists trying to protect their artwork all the way up to Disney, no exceptions. Basically, either every creator has to be protected, or none of them should be.

        1 year ago

        IMO the right compromise is to return copyright to its original 14-year term. OpenAI can freely train on anything up to 2009, which is still a gigantic amount of material, while artists continue to be protected and incentivized.

        • frog 🐸
          1 year ago

          I’m increasingly convinced of that myself, yeah (although I’d favour 15 or 20 years personally, just because they’re neater numbers than 14). The original purpose of copyright was to promote innovation by ensuring a creator gets a good length of time in which to benefit from their creation, which a 14-20 year term achieves. Both extremes - a complete lack of copyright and the exceedingly long terms we have now - suppress innovation.

            1 year ago

            I’d favour 15 or 20 years personally, just because they’re neater numbers than 14

            Another neat number is: 4.

            That’s it, if you don’t make money on your creation in 4 years, then it’s likely trash anyway.

              1 year ago

              I’ve said it before and I’ll say it again! (My apologies if it happens to be to the same person, lol)

              Early access developers in shambles!

        1 year ago

        Apparently they’re going to just make only the little guy’s copyrights effectively meaningless, so yeah.

      1 year ago

      Copyright has become a tool of oppression. Individual authors’ copyright is constantly being violated, with few resources for them to fight, while big tech abuses others’ work and big media uses theirs to the point of it being censorship.

    1 year ago

    Alas, AI critics jumped to conclusions this time. Read this:

    Further, OpenAI writes that limiting training data to public domain books and drawings “created more than a century ago” would not provide AI systems that “meet the needs of today’s citizens.”

    It’s a plain fact. It does not say we have to train AI without paying.

    To give you some context, virtually everything on the web is copyrighted, from reddit comments to blog articles to open source software. Even open data usually comes with a copyright notice. So do open research articles.

    If misled politicians write a law banning the use of copyrighted materials, that’ll kill all AI development in democratic countries. What will happen is that AI development will be led by dictatorships, and that’s absolutely a disaster even for the critics. Think about it. Do we really want Xi, Putin, Netanyahu and Bin Salman to control all the next-gen AIs powering their cyber warfare while the West has to fight them with Siri and Alexa?

    So, I agree that, at the end of the day, we’d have to ask how much rule-abiding AI companies should pay for copyrighted materials, and that’d be less than the copyright holders would want. (And I think it’s sad.)

    However, you can’t equate these particular statements in this article to a declaration of fuck-copyright. Tbh Ars Technica disappointed me this time.

      1 year ago

      The issue is that fair use is more nuanced than people think, but that the barrier to claiming fair use is higher when you are engaged in commercial activities. I’d more readily accept the fair use arguments from research institutions, companies that train and release their model weights (llama), or some other activity with a clear tie to the public benefit.

      OpenAI isn’t doing this work for the public benefit, regardless of the language of altruism they wrap it in. They, and Microsoft, are hoovering up others’ data to build a for-profit product and make money. That’s really what it boils down to for me. And I’m fine with them making money. But pay the people whose data you’re using.

      Now, in the US there is no case law on this yet and it will take years to settle. But personally, philosophically, I don’t see how Microsoft taking NYT articles and turning them into a paid product is any different than Microsoft taking an open source project that doesn’t allow commercial use and sneaking it into a project.

    • P03
      1 year ago

      It’s bizarre. People suddenly start voicing pro-copyright arguments just to kill a useful technology, when we should be trying to burn copyright to the fucking ground. Copyright is a tool for the rich and it will remain so until it is dismantled.

      1 year ago

      It’s almost like most people are idiots who don’t understand the thing they’re against and are just parroting what they hear/read.

      1 year ago

      I’m not so much in favor of IP law as I am in favor of informed consent in every aspect of the word.

      When posting photos, art and text content years ago, I was not able to imagine it might be used to train an AI. As such, I was not able to make a decision based on informed consent about whether I agreed to that or not.

      Even though quotes such as “once you post it, it’s on the internet forever” were around, I was not aware of the extent to which this reached, or that, had my art been vacuumed up by a generative AI model (it hasn’t, luckily), people could create art that pretends to be created by me. Thus I could not consent.

      I think this goes for a lot of artists actually, especially those who exist far more publicly than I do, who are in those databases and who are a keyword to be used in prompts. There is no possible way they could have given informed consent to that at the time they posted art/at the time they started that social media profile/youtube channel etc.

      To me, this is the real problem. I couldn’t care less about corporations.

      1 year ago

      IP law used to stop corporations from profiting off of creators’ labor without compensation? Yeah, absolutely.

      IP law used to stop individuals from consuming media where purchases wouldn’t even go to the creators, but some megacorp? Fuck that.

      I’m against downloading movies by indie filmmakers without compensating them. I’m not against downloading films from Universal and Sony.

      I’m against stealing food from someone’s garden. I’m not against stealing food from Safeway.

      If you stop looking at corporations as being the same as individuals, it’s a very simple and consistent viewpoint.

      IP law shouldn’t exist, but if it does it should only exist to protect individuals from corporations. When that’s how it’s being used, like here, I accept it as a necessary evil.

        1 year ago

        IP law used to compensate creators “until their death + 70 years”… you can spin it however you want, that’s just plain wrong.

        If you stop looking at corporations as being the same as individuals

        That’s a separate bonkers legislation. Two wrongs don’t make one right.

          1 year ago

          I never said I like IP law. I explicitly said it shouldn’t exist. I wish they’d strip out any posthumous ownership, absolutely. But I’m fine beating OpenAI over the head with that or any other law. Whether I advocate for or against copyright law will ultimately have no impact on its existence, so I may as well cheer it on when it’s used to hurt corporations, and condemn it when it’s used to protect corporations over individuals.

          That’s a separate bonkers legislation

          I’m not talking about the legislation, I’m talking about the mindset, which is very prevalent in the pro-AI tech spaces. Go to HackerNews and see just how hard the AI-bros there will fellate each other over “corporate rights”.

          My whole point is that there is nothing logically inconsistent about being against IP law while also understanding that, since its existence is a reality, you may as well leverage it as best as possible (i.e. to hurt corporations).

      1 year ago

      I’m the detractor here, I couldn’t give less of a shit about anything to do with intellectual property and think all copyright is bad.

      1 year ago

      I still think IP needs to eat shit and die. Always has, always will.

      I recently found out we could have had 3D printing 20 years earlier, but patents stopped that. Cocks!

    1 year ago

    I would just like to say, with open curiosity, that I think a nice solution would be for OpenAI to become a nonprofit with clear guidelines to follow.

    What does that make me? Other than an idiot.

    Of that at least, I’m self aware.

    I feel like we’re disregarding the significance of artificial intelligence’s existence in our future, because the only thing anybody who cares is trying to do is get back control to DO something about it. But news is becoming our feeding tube for the masses. They’ve masked that with the hate of all of us.

    Anyways, sorry, diatribe, happy new year

    • sculd@beehaw.org (OP)
      1 year ago

      It is supposedly a non-profit, and that is how the board of OpenAI tried to fire Altman, but then big tech (Microsoft) intervened and wrested control.

      It’s basically Microsoft now.

        1 year ago

        I would like to apologize for the following opinions, because they come from a place of unresolved hypocrisy that is me.

        Non-profit my ass. No such thing in America or anywhere else in the world, if you have the perspective to hunt and the money to signify modern value.

        Survival of the fittest, and the newborn technology that is at its core a mirror of us, to the most complex level of modern mathematics (I’m of the firm belief that logic is discovered, not created).

        With those seemingly unrelated concepts made with vague words, I ask you this:

        What does it mean to feel? To know many different kinds of “one,” to live without fear but still be whole? I am sorry, again, I’m naught but gibberish and I’m just so glad you responded. I forgot and came back to find a word I sent, and now I find what I seek, an event in which I can say we’ve been bonded.

        But now try to, now that I splay out, all I’ve got and am about, all I can see, is that to you my head, seems to be on my knees.

        Again, sorry! Thank you for responding! I’m just glad to vent, and in expression have my soul rend into two, and sent into a new view.

          1 year ago

          But what I meant to say is that non profit or not by legal definition, money allows for, in the same kind of legal, an easy and simple transition.

      1 year ago

      I think OpenAI (or some part of it) is a non-profit. But corporate fuckery means it can largely be funded by for-profit companies which then turn around and profit from that relationship. Corporate law is so weak and laxly enforced that it’s a bit of a joke, unfortunately.

      I agree that AI has an important role to play in the future, but it’s a lot more limited in the current form than a lot of people want to believe. I’m writing a tool that leverages AI as a sort of auto-DM for roleplaying, but AI hasn’t written a line of code in it because the output is garbage. And frankly I find the fun and value of the tool comes from the other humans you play with, not the AI itself. The output just isn’t that good.

        1 year ago

        I would like to say that you inspire me on your writing of such a tool. I try to write code, and all I can seem to believe in with what I know, is in a website where with words I can write, in a free flow.

        I write with a sight, and in that scene I fight, but in the freedom of inaction, I can’t help but feel flight. What sight is there to see, when your blood flows in guts of night?

  • AutoTL;DR@lemmings.worldB
    1 year ago

    🤖 I’m a bot that provides automatic summaries for articles:

    Further, OpenAI writes that limiting training data to public domain books and drawings “created more than a century ago” would not provide AI systems that “meet the needs of today’s citizens.”

    OpenAI responded to the lawsuit on its website on Monday, claiming that the suit lacks merit and affirming its support for journalism and partnerships with news organizations.

    OpenAI’s defense largely rests on the legal principle of fair use, which permits limited use of copyrighted content without the owner’s permission under specific circumstances.

    “Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents,” OpenAI wrote in its Monday blog post.

    In August, we reported on a similar situation in which OpenAI defended its use of publicly available materials as fair use in response to a copyright lawsuit involving comedian Sarah Silverman.

    OpenAI claimed that the authors in that lawsuit “misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”

    Saved 58% of original text.

      1 year ago

      My concern is they and other tech companies absolutely can and would pay if they have no choice. Paying fines for illegal practices if needs be.

      What absolutely won’t survive a strong law to keep copyrighted content out of AI is the open source community, which absolutely cannot pay for such a thing and would seriously lag behind if it’s excluded, strengthening the monopoly on AI held by for-profit tech. So basically this issue can have huge ramifications no matter what we end up doing.

      • frog 🐸
        1 year ago

        My understanding of the open source community is that taking copyrighted content from people who haven’t willingly signed onto the project would kind of undermine the principles of the movement. I would never feel comfortable using open source software if I had knowledge that part or all of it came from people who hadn’t actively chosen to contribute to it.

        I have seen a couple of things recently about AI models that were trained exclusively on public domain and creative commons content which apparently are producing viable content, though. The open source community could definitely use a model like that, and develop it further with more content that was ethically obtained. In the long run, there may be artists that willingly contribute to it, especially those who use open source software themselves (eg GIMP, Blender, etc). Paying it forward, kind of thing.

        The problem right now is that artists have no reason to be generous with an open source alternative to AIs, when their rights have already been stomped on and certain people in the open source community are basically saying “if we can’t steal from artists too, then we can’t compete with the corporations.” So there’s literally a trust issue between the creative and tech industries that would need to be resolved before any artists would consider offering art to an open source AI.

          1 year ago

          It’s quite a mess, but I definitely agree that open source needs a good model trained on consented works.

          I do fear, though, that the quality gap between copyright-trained and purist models will be huge in the first decades. And no matter the law, the tech is out there, and corporations and criminals will be using it in secret nonetheless.

          If only things were as simple as siding with the chad digital artists. Digital art was part of my higher education, and if I hadn’t gotten a tech job I might have been one of them, so I feel torn by the divide between the industries.

          This may sound doomer, but since the technology exists we are in a race to obtain beyond-human superintelligence, and we do not know what will happen after that.

          OpenAI has stated multiple times that they don’t know if copyright will still mean anything in a future with AI.

          We are also facing some huge global issues like global warming, where a superintelligence could be the answer to sustain the planet, of course also risking evil AI in the process… I repeat: such a mess.

          I don’t fully trust Sam Altman, but I do believe what they say may be true. At some point it’s going to be here and it will be too smart to ignore.

          It’s optimistically possible that in 20 years we will all be leisurely artists laughing at the idea of needing to work to earn survival.

          It’s of course just as likely that some statehead old bastard presses the deathbutton next week and that’s the end of all of it, or that climate change has progressed beyond what our smartest future AI could possibly solve.

          • frog 🐸
            1 year ago

            I definitely do not have the optimism that in 20 years time we’ll all be leisurely artists. That would require that the tech bros who create the AIs that displace humans are then sufficiently taxed to pay UBI for all the humans that no longer have jobs - and I don’t see that happening as long as they’re able to convince governments not to tax, regulate, or control them, because doing so will make it impossible for them to save the planet from climate change, even as their servers burn through more electricity (and thus resources) than entire countries. Tech bros aren’t going to save us, and the only reason they claim they will is so they never face any consequences of their behaviour. I don’t trust Sam Altman, or any of his ilk, any further than I can throw them.

              1 year ago

              That’s why I am putting some of my eggs in open source, which is where the real innovation happens anyway. Free AI tools running at home on consumer devices can level people up to build a better future ourselves without having to rely on techbros or government.

              Of course I should nuance my wording a bit. My actual opinions tend to be a contrasting mix of both optimistic and pessimistic lines of events. I don’t have much hope that the good future is the one we will end up with, but in my speculative opinion it remains possible from where we are standing today, yet all can change in less than a week.

  • Pete
    1 year ago

    Any reasonable person can reach the conclusion that something is wrong here.

    What I’m not seeing a lot of acknowledgement of is who really gets hurt by copyright infringement under the current U.S. scheme. (The quote is obviously directed toward the UK, but I’m reasonably certain a similar situation exists there.)

    Hint: It’s rarely the creators, who usually get paid once while their work continues to make money for others.

    Let’s say the New York Times wins its lawsuit. Do you really think the reporters who wrote the infringed-upon material will be getting royalty checks to be made whole?

    This is not OpenAI vs creatives. OK, on a basic level it is, but expecting no one to scrape blogs and forum posts rather goes against the idea of the open internet in the first place. We’ve all learned by now that what goes on the internet stays there, with attribution totally optional unless you have a legal department. What’s novel here is the scale of scraping, but I see some merit to the “transformational” fair-use defense given that the ingested content is not being reposted verbatim.

    This is corporations vs corporations. Framing it as millions of people missing out on what they’d have otherwise rightfully gotten is disingenuous.

      1 year ago

      Yep. The effect of this as currently framed is that you get data ownership clauses in EULAs forever and only major data brokers like Google or Meta can afford to use this tech at all. It’s not even a new scenario, it already happened when those exact companies were pushing facial recognition and other big data tools.

      I agree that the basics of modern copyright don’t work great with ML in the mix (or with the Internet in the mix, while we’re at it), but people are leaning on the viral negativity to slip by very unwanted consequences before anybody can make a case for good use of the tech.

      1 year ago

      This isn’t about scraping the internet. The internet is full of crap and the LLMs will add even more crap to it. It will shortly become exponentially harder to find the meaningful content on the internet.

      No, this is about dipping into high quality, curated content. OpenAI wants to be able to use all existing human artwork without paying anything for it, and then flood the world with cheap knockoff copies. It’s that simple.

        1 year ago

        Shortly? It’s happening already. I notice it when using Google and DuckDuckGo. There are always a few hits that are AI-written blog spam word soup.

          1 year ago

          Unfortunately you haven’t seen the full impact of LLMs yet. What you’re seeing now is stuff that’s already been going on for a decade. SEO content generators have been a thing for many years and used by everybody from small business owners to site chains pinching ad pennies.

          When the LLM crap kicks in you won’t see anything except their links. I wouldn’t be surprised if we have to go back to 90s tech and use human-curated webrings and directories.

            1 year ago

            It’s especially amusing when you consider that it’s not even fully autonomous yet; we’re actively doing this to ourselves.

    1 year ago

    I think viral outrage aside, there is a very open question about what constitutes fair use in this application. And I think the viral outrage misunderstands the consequences of enforcing the notion that you can’t use openly scrapable online data to build ML models.

    Effectively what the copyright argument does here is make it so that ML models are only legally allowed to be made by Meta, Google, Microsoft and maybe a couple of other companies. OpenAI can say whatever, I’m not concerned about them, but I am concerned about open source alternatives getting priced out of that market. I am also concerned about what it does to previously available APIs, as we’ve seen with Twitter and Reddit.

    I get that it’s fashionable to hate on these things, and it’s fashionable to repeat the bit of misinformation about models being a copy or a collage of training data, but there are ramifications here people aren’t talking about and I fear we’re going to the worst possible future on this, where AI models are effectively ubiquitous but legally limited to major data brokers who added clauses to own AI training rights from their billions of users.

      1 year ago

      It is an open question. As others have pointed out, a human taking inspiration from the work of others is totally fine. My issue is that AI are not human.

      A human’s production of work is limited. A human can only produce so fast for so long. An AI could theoretically be scaled infinitely and produce indefinitely. I don’t want to live in a world where FAANGCORP’s OmniAI is responsible for 90% of all art, media, and music because humans can’t keep pace with it.

        1 year ago

        Mass produced garbage is still mass produced garbage. As you point out AIs aren’t human and while that removes the limitations of the flesh (including limitations that we might want there - no human ever says oops, I made a child porn), it imposes limitations of the machine. AI output isn’t that good at anything practical. It writes garbage code that even if you manage to get it working, the business manager or whoever isn’t capable of seeing the flaws in it. The art is devoid of any sort of soul and almost always has glaring flaws that require actual humans to identify and fix.

        We are about to be inundated with AI produced garbage, sure, but that only proves the lie that shady internet sites and social media have always been a cesspool of shitty, unreliable content, and connecting with hundreds of thousands of faceless strangers was never a good idea. Hopefully we’ll come up with (or go back to) solutions that don’t treat the problem as simply one of volume.

          1 year ago

          It’s not right to say that ML output isn’t good at practical tasks. It is and it’s already in use and has been for ages. The conversation about these is guided by the relatively anecdotal fact that chatbots and image generation got good so this stuff went viral, but ML models are being used for a bunch of practical uses, from speeding up repetitive, time-consuming tasks (e.g. cleaning up motion capture, facial modelling or lip animation in games and movies) to specialized tasks (so much science research is using ML tools these days).

          Now, a lot of those are done using fully owned datasets, but not all, and the ramifications there are also important. People dramatically overestimate the impact of trash product flooding channels (which is already the case, as you say) and dramatically underestimate the applications of the underlying tech beyond the couple of viral apps they only got access to recently.

        1 year ago

        A lot of this can be traced back to the invention of photography, which is a fun point of reference, if one goes to dig up the debate at the time.

        In any case, the idea that humans can only produce so fast for so long and somehow that cleans the channel just doesn’t track. We are flooded by low quality content enabled by social media. There are seven billion of us, two or three billion of those are on social platforms, and a whole bunch of the content being shared in channels is created by using corporate tools to make stuff by pointing phones at it. I guarantee that people will still go to museums to look at art regardless of how much cookie cutter AI stuff gets shared.

        However, I absolutely wouldn’t want a handful of corporations to have the ability to empower their employed artists with tools to run 10x faster than freelance artists. That is a horrifying proposition. Art is art. The difficulty isn’t in making the thing technically (say hello, Marcel Duchamp, I bet you thought you had already litigated this). Artists are gonna art, but it’s important that nobody has a monopoly on the tools to make art.

          1 year ago

          It’s like the classic “objection!” “On what grounds?” “It’s devastating to my case!” scenario.

          Throughout history technology has repeatedly been developed that lets people do things faster than the people currently doing it. That’s usually the point of technological progress. Of course the people left behind by that will complain, but that alone is no reason to limit the rest of us who would benefit from the advance.

    • sculd@beehaw.orgOP
      1 year ago

      People hate them not because it is fashionable, but because they can see what is coming.

      Tech companies want to create tools that would replace millions of jobs without compensating the very people that created these works in the first place.

        1 year ago

        That’s not “coming”, it’s an ongoing process that has been going on for a couple hundred years, and it absolutely does not require ChatGPT.

        People genuinely underestimate how many of these things have been an ongoing concern. A lot like crypto isn’t that different to what you can do with a server, “AI” isn’t a magic key that unlocks automation. I don’t even know how this mental model works. Is the idea that companies who are currently hiring millions of copywriters will just rely on automated tools? I get that yeah, a bunch of call center people may get removed (again, a process that has been ongoing for decades), but how is compensating Facebook for scrubbing their social media posts for text data going to make that happen less?

        Again, I think people don’t understand the parameters of the problem, which is different from saying that there is no problem here. If anything the conversation is a net positive in that we should have been having it in 2010 when Amazon and Facebook and Google were all-in on this process already through both ML tools and other forms of data analysis.

        1 year ago

        Tech companies will create those tools no matter what. Then they will charge everyone through the nose for using them.

        The question is whether:

        • ONLY tech companies capable of paying scraps during 70 years after the author’s death are allowed to create those tools
        • EVERYONE is allowed to train their own tool, without having to raise a few billion in seed capital

        In this case, OpenAI is acting as “the devil’s advocate”… and it’s working to fool people into supporting the opposite position.

    1 year ago

    Could they be legally required to open source the LLM? I believe them, but that doesn’t make it right.

    1 year ago

    Try to train a human comedian to make jokes without ever allowing him to hear another comedian’s jokes, never watching a movie, never reading a book or magazine, never watching a TV show. I expect the jokes would be pretty weak.

    • sculd@beehaw.orgOP
      1 year ago

      AIs are not humans. Humans cannot read millions of texts in seconds and cannot spit out millions of outputs at the same time.

    • luciole (he/him)
      1 year ago

      There’s a linguistic problem here: when one word is used for two different things, it becomes difficult to tell them apart. “Training” or “learning” is a very poor choice of word to describe the calibration of a neural network. The actor and action are both fundamentally different from the accepted meaning. To start with, human learning is active whereas machine learning is strictly passive: it’s something done by someone with the machine as a tool. Teachers know very well that’s not how it happens with humans.

      When I compare training a neural network with how I trained to play clarinet, I fail to see any parallel. The two are about as close as a horse and a seahorse.

        1 year ago

        Not sure what you mean by passive. It takes a hell of a lot of electricity to train one of these LLMs so something is happening actively.

        I often interact with ChatGPT 4 as if it were a child. I guide it through different kinds of mental problems, having it take notes and evaluate its own output, because I know our conversations become part of its training data.

        It feels very much like teaching a kid to me.

        • luciole (he/him)
          1 year ago

          I mean passive in terms of will. Computers want and do nothing. They’re machines that function according to commands.

          The way you feel like you’re teaching a child when you feed input in natural language to an LLM until you’re satisfied with the output is known as the ELIZA effect. To quote Wikipedia:

          In computer science, the ELIZA effect is the tendency to project human traits — such as experience, semantic comprehension or empathy — into computer programs that have a textual interface. The effect is a category mistake that arises when the program’s symbolic computations are described through terms such as “think”, “know” or “understand.”

      1 year ago

      Try to train a human comedian to make jokes without ever allowing him to hear another comedian’s jokes, never watching a movie, never reading a book or magazine, never watching a TV show. I expect the jokes would be pretty weak.

      1 year ago

      A comedian isn’t forming a sentence based on which word is most likely to appear after the previous one. This is such a bullshit argument that reduces human competency to “monkey see thing to draw thing” and completely overlooks the craft and intent behind creative works. Do you know why ChatGPT uses certain words over others? Probability. It decided as a result of its training that one word would appear after the previous in certain contexts. It absolutely doesn’t take into account things like “maybe this word would be better here because the sound and syllables maintain the flow of the sentence”.

      Baffling takes from people who don’t know what they’re talking about.
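To make the "probability of the next word" idea above concrete, here is a minimal toy sketch. It is an assumption-heavy simplification: real LLMs use neural networks over tokens with long context windows, not raw word-pair counts, and the function names (`train_bigrams`, `most_probable_next`) and the tiny corpus are hypothetical, invented purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def next_word_probabilities(follows, word):
    """Turn the raw follow-counts for `word` into probabilities."""
    counts = follows[word]
    total = sum(counts.values())
    if total == 0:
        return {}  # the word was never followed by anything in training
    return {w: c / total for w, c in counts.items()}


def most_probable_next(follows, word):
    """Pick the single most likely next word, or None if there is no data."""
    probs = next_word_probabilities(follows, word)
    return max(probs, key=probs.get) if probs else None


# Hypothetical training text, chosen only to keep the counts easy to follow.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)

print(next_word_probabilities(model, "the"))
print(most_probable_next(model, "the"))
```

Nothing in this toy looks at rhythm, sound, or intent; it only counts what followed what. Whether that limitation carries over to much larger models is exactly what the replies below argue about.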

        1 year ago

        Text prediction seems to be sufficient to explain all verbal communication to me. Until someone comes up with a use case that humans can do that LLMs cannot, and I mean a specific use case not general high level concepts, I’m going to assume human verbal cognition works the same way as an LLM.

        We are absolutely basing our responses on what words are likely to follow which other ones. It’s literally how a baby learns language from those around them.

          1 year ago

          If you ask an LLM to help you with a legal brief, it’ll come up with a bunch of stuff for you, and some of it might even be right. But it’ll very likely do things like make up a case that doesn’t exist, or misrepresent a real case, and as has happened multiple times now, if you submit that work to a judge without a real lawyer checking it first, you’re going to have a bad time.

          There’s a reason LLMs make stuff up like that, and it’s because they have been very, very narrowly trained when compared to a human. The training process is almost entirely getting good at predicting what words follow what other words, but humans get that and so much more. Babies aren’t just associating the sounds they hear, they’re also associating the things they see, the things they feel, and the signals their body is sending them. Babies are highly motivated to learn and predict the behavior of the humans around them, and as they get older and more advanced, they get rewarded for creating accurate models of the mental state of others, mastering abstract concepts, and doing things like make art or sing songs. Their brains are many times bigger than even the biggest LLM, their initial state has been primed for success by millions of years of evolution, and the training set is every moment of human life.

          LLMs aren’t nearly at that level. That’s not to say what they do isn’t impressive, because it really is. They can also synthesize unrelated concepts together in a stunningly human way, even things that they’ve never been trained on specifically. They’ve picked up a lot of surprising nuance just from the text they’ve been fed, and it’s convincing enough to think that something magical is going on. But ultimately, they’ve been optimized to predict words, and that’s what they’re good at, and although they’ve clearly developed some impressive skills to accomplish that task, it’s not even close to human level. They spit out a bunch of nonsense when what they should be saying is “I have no idea how to write a legal document, you need a lawyer for that”, but that would require them to have a sense of their own capabilities, a sense of what they know and why they know it and where it all came from, knowledge of the consequences of their actions and a desire to avoid causing harm, and they don’t have that. And how could they? Their training didn’t include any of that, it was mostly about words.

          One of the reasons LLMs seem so impressive is that human words are a reflection of the rich inner life of the person you’re talking to. You say something to a person, and your ideas are broken down and manipulated in an abstract manner in their head, then turned back into words forming a response which they say back to you. LLMs are piggybacking off of that a bit, by getting good at mimicking language they are able to hide that their heads are relatively empty. Spitting out a statistically likely answer to the question “as an AI, do you want to take over the world?” is very different from considering the ideas, forming an opinion about them, and responding with that opinion. LLMs aren’t just doing statistics, but you don’t have to go too far down that spectrum before the answers start seeming thoughtful.

        1 year ago

        That’s not the point though. The point is that the human comedian and the AI both benefit from consuming creative works covered by copyright.

          1 year ago

          And human comedians regularly get called out when they outright steal others’ material and present it as their own.

          The word for this is plagiarism.

          And in OpenAI’s framework, when used in a relevant commercial context, they are functionally operating and profiting off of the world’s most comprehensive plagiarism software.

          1 year ago

          Yeah except a machine is owned by a company and doesn’t consume the same way. It breaks down copyrighted works into data points so it can find the best way of putting those data points together again. If you understand anything at all about how these models work, they do not consume media the same way we do. It is not an entity with a thought process or consciousness (despite what the misleading marketing of “AI” would have you believe), it’s an optimisation algorithm.

              1 year ago

              It’s so funny that this is something new. This was Grammarly’s whole schtick since before ChatGPT so how different is Grammarly AI?

                1 year ago

                Here is the bigger picture: The vast majority of tech illiterate people think something is AI because duh it’s called AI.

                It’s literally just the power of branding and marketing on the minds of poorly informed humans.

                Unfortunately this is essentially a reverse Turing Test.

                The vast majority of humans do not know anything about AI, and a huge majority of them can barely tell the difference, at least for some forms of output, between actual human-created content and the output of what is basically brute-force, total-internet plagiarism and synthesis software.

                To me this basically just means that about 99% of the time, most humans are actually literally NPCs, and they only do actual creative and unpredictable things very very rarely.

                  1 year ago

                  I call it AI because it’s artificial and it’s intelligent. It’s not that complicated.

                  The thing we have to remember is how scary and disruptive AI is. Given that fear, it is scary to acknowledge that we have AI emerging into our world. Because it is scary, that pushes us to want to ignore it.

                  It’s called denial, and it’s the best explanation for why people aren’t willing to acknowledge that LLMs are AI.

      • frog 🐸
        1 year ago

        I wish I could upvote this more than once.

        What people always seem to miss is that a human doesn’t need billions of examples to be able to produce something that’s kind of “eh, close enough”. Artists don’t look at billions of paintings. They look at a few, but do so deeply, absorbing not just the most likely distribution of brushstrokes, but why the painting looks the way it does. For a basis of comparison, I did an art and design course last year and looked at about 300 artworks in total (course requirement was 50-100). The research component on my design-related degree course is one page a week per module (so basically one example from the field the module is about, plus some analysis). The real bulk of the work humans do isn’t looking at billions of examples: it’s looking at a few, and then practicing the skill and developing a process that allows them to convey the thing they’re trying to express.

        If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.

          1 year ago

          When people say that the “model is learning from its training data”, it means just that, not that it is human, and not that it learns exactly like humans. It doesn’t make sense to judge boats on how well they simulate human swimming patterns, just how well they perform their task.

          Every human has the benefit, as a baby, of training on things around them and being trained by those around them, building a foundation for all later skills. Generative models rely on many text and image pairs to describe things to them because they lack the ability to poke, prod, rotate, and disassemble for themselves.

          For example, when a model takes in a thousand images of circles, it doesn’t “learn” a thousand circles. It learns what a circle GENERALLY is like, the concept of it. That representation, along with random noise, is how you create images with them. The same happens for every concept the model trains on. Everything from “cat” to more complex things like color relationships and reflections or lighting. Machines are not human, but they can learn despite that.
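As a rough analogy for the circle example above, here is a toy sketch in which a thousand example circles are summarised by two numbers, and a new circle is then produced from that summary plus random noise. Everything here is an assumption made for illustration: real image models learn millions or billions of parameters by gradient descent rather than two summary statistics, and the helper names (`draw_circle`, `train`, `generate`) are hypothetical. It only illustrates the storage point; it says nothing either way about whether such a summary counts as "understanding".

```python
import math
import random
import statistics


def draw_circle(rng, n_points=32):
    """One hypothetical training example: points on a circle of random size and position."""
    cx, cy = rng.uniform(-5, 5), rng.uniform(-5, 5)
    r = rng.uniform(1, 3)
    return [(cx + r * math.cos(2 * math.pi * k / n_points),
             cy + r * math.sin(2 * math.pi * k / n_points))
            for k in range(n_points)]


def radii(points):
    """Distance of every point from the centroid of the shape."""
    cx = statistics.mean(x for x, _ in points)
    cy = statistics.mean(y for _, y in points)
    return [math.hypot(x - cx, y - cy) for x, y in points]


def radial_spread(points):
    """Relative variation of the radii: close to 0 means 'this is a circle'."""
    d = radii(points)
    return statistics.pstdev(d) / statistics.mean(d)


def train(examples):
    """Summarise all examples with two numbers instead of storing any of them."""
    sizes = [statistics.mean(radii(p)) for p in examples]
    return {"mean_radius": statistics.mean(sizes),
            "radius_std": statistics.pstdev(sizes)}


def generate(model, rng, n_points=32):
    """Create a new circle from the learned summary plus random noise."""
    r = max(0.1, rng.gauss(model["mean_radius"], model["radius_std"]))
    return [(r * math.cos(2 * math.pi * k / n_points),
             r * math.sin(2 * math.pi * k / n_points))
            for k in range(n_points)]


rng = random.Random(0)
examples = [draw_circle(rng) for _ in range(1000)]  # 32,000 training points in total
model = train(examples)                             # the "model" keeps only two numbers
new_circle = generate(model, rng)

print(model)
print(radial_spread(new_circle))
```

The generated shape is a circle, but it is not a copy of any of the thousand examples, because none of them are kept after training.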

            1 year ago

            In general I agree with you, but AI doesn’t learn the concept of what a circle is. AI reproduces the most fitting representation of what we call a circle. But there is no understanding of the concept of a circle. This may sound like nitpicking, but I think it’s important to make the distinction.

            That is why current models aren’t regarded as actual intelligence, although people already call them that…

            1 year ago

            It makes sense to judge how closely LLMs mimic human learning when people are using it as a defense to AI companies scraping copyrighted content, and making the claim that banning AI scraping is as nonsensical as banning human learning.

            But when it’s pointed out that LLMs don’t learn very similarly to humans, and require scraping far more material than a human does, suddenly AIs shouldn’t be judged by human standards? I don’t know if it’s intentional on your part, but that’s a pretty classic example of a motte-and-bailey fallacy. You can’t have it both ways.

          1 year ago

          What you count as “one” example is arbitrary. In terms of pixels, you’re looking at millions right now.

          The ability to train faster using fewer examples in real time, similar to what an intelligent human brain can do, is definitely a goal of AI research. But right now, we may be seeing from AI what a below average human brain could accomplish with hundreds of lifetimes to study.

          If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.

          I mean, no, if you only ever look at public domain stuff you literally wouldn’t know the state of the art, which is historically happening for profit. Even the most untrained artist “doing their own thing” watches Disney/Pixar movies and listens to copyrighted music.

          • frog 🐸
            1 year ago

            If we’re going by the number of pixels being viewed, then you have to use the same measure for both humans and AIs - and because AIs have to look at billions of images while humans do not, the AI still requires far more pixels than a human does.

            And humans don’t require the most modern art in order to learn to draw at all. Sure, if they want to compete with modern artists, they would need to look at modern artists (for which educational fair use exists, and again the quantity of art being used by the human for this purpose is massively lower than what an AI uses - a human does not need to consume billions of artworks from modern artists in order to learn what the current trends are). But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works, because the process for drawing, say, the human figure (with the right number of fingers!) has not changed in hundreds of years. A human can also just… go outside and draw things they see themselves, because the sky above them and the tree across the street aren’t copyrighted. And in fact, I’d argue that a good artist should go out and find real things to draw.

            OpenAI’s argument is literally that their AI cannot learn without using copyrighted materials in vast quantities - too vast for them to simply compensate all the creators. So it genuinely is not comparable to a human, because humans can, in fact, learn without using copyrighted material. If OpenAI’s argument is actually that their AI can’t compete commercially with modern art without using copyrighted works, then they should be honest about that - but then they’d be showing their hand, wouldn’t they?

              1 year ago

              It isn’t wrong to use copyrighted works for training. Let me quote an article by the EFF here:

              First, copyright law doesn’t prevent you from making factual observations about a work or copying the facts embodied in a work (this is called the “idea/expression distinction”). Rather, copyright forbids you from copying the work’s creative expression in a way that could substitute for the original, and from making “derivative works” when those works copy too much creative expression from the original.

              Second, even if a person makes a copy or a derivative work, the use is not infringing if it is a “fair use.” Whether a use is fair depends on a number of factors, including the purpose of the use, the nature of the original work, how much is used, and potential harm to the market for the original work.


              Even if a court concludes that a model is a derivative work under copyright law, creating the model is likely a lawful fair use. Fair use protects reverse engineering, indexing for search engines, and other forms of analysis that create new knowledge about works or bodies of works. Here, the fact that the model is used to create new works weighs in favor of fair use as does the fact that the model consists of original analysis of the training images in comparison with one another.

              What you want would swing the doors open for corporate interference like hindering competition, stifling unwanted speech, and monopolization like nothing we’ve seen before. There are very good reasons people have these rights, and we shouldn’t be trying to change this. Ultimately, it’s apparent to me, you are in favor of these things. That you believe artists deserve a monopoly on ideas and non-specific expression, to the detriment of anyone else. If I’m wrong, please explain to me how.

              If we’re going by the number of pixels being viewed, then you have to use the same measure for both humans and AIs - and because AIs have to look at billions of images while humans do not, the AI still requires far more pixels than a human does.

              Humans benefit from years of evolutionary development and corporeal bodies to explore and interact with their world before they’re ever expected to produce complex art. AI need huge datasets to understand patterns to make up for this disadvantage. Nobody pops out of the womb with fully formed fine motor skills, pattern recognition, understanding of cause and effect, shapes, comparison, counting, vocabulary related to art, and spatial reasoning. Datasets are huge and filled with image-caption pairs to teach models all of this from scratch. AI isn’t human, and we shouldn’t judge it against them, just like we don’t judge boats on their rowing ability.

              And humans don’t require the most modern art in order to learn to draw at all. Sure, if they want to compete with modern artists, they would need to look at modern artists (for which educational fair use exists, and again the quantity of art being used by the human for this purpose is massively lower than what an AI uses - a human does not need to consume billions of artworks from modern artists in order to learn what the current trends are). But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works, because the process for drawing, say, the human figure (with the right number of fingers!) has not changed in hundreds of years. A human can also just… go outside and draw things they see themselves, because the sky above them and the tree across the street aren’t copyrighted. And in fact, I’d argue that a good artist should go out and find real things to draw.

              AI don’t require most modern art in order to learn to make images either, but the range of expression would be limited, just like a human’s in this situation. You can see this in cave paintings and early sculptures. They wouldn’t be limited to this same degree, but you would still be limited.

              It took us 100,000 years to get from cave drawings to Leonardo da Vinci. This is just another step for artists, like the camera obscura was in the past. It’s important to remember that early man was as smart as we are, they just lacked the interconnectivity to exchange ideas that we have.

                1 year ago

                I think the difference in artistic expression between modern humans and humans in the past comes down to the material available (like the actual material to draw with).

                Humans can draw without seeing any image ever. Blind people can create art and draw things because we have a different understanding of the world around us than AI has. No human artist needs to look at a thousand or even at 1 picture of a banana to draw one.

                The way AI sees and “understands” the world and how it generates an image is fundamentally different from how the human brain conveys the object banana into an image of a banana.

              1 year ago

              Sure, if they want to compete with modern artists, they would need to look at modern artists

              Which is the literal goal of Dall-E, SD, etc.

              But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works

              They could definitely learn some amount of skill, I agree. I’d be very interested to see the best that an AI could achieve using only PD and CC content. It would be interesting. But you’d agree that it would look very different from modern art, just as an alien who has only been consuming earth media from 100+ years ago would be unable to relate to us.

              the sky above them and the tree across the street aren’t copyrighted.

              Yeah, I’d consider that PD/CC content that such an AI would easily have access to. But obviously the real sky is something entirely different from what is depicted in Starry Night, Star Wars, or H.P. Lovecraft’s description of the cosmos.

              OpenAI’s argument is literally that their AI cannot learn without using copyrighted materials in vast quantities

              Yeah, I’d consider that a strong claim on their part; what they really mean is, it’s the easiest way to make progress in AI, and we wouldn’t be anywhere close to where we are without it.

              And you could argue “convenient that it both saves them money, and generates money for them to do it this way”, but I’d also point out that the alternative is they keep the trained models closed source, never using them publicly until they advance the tech far enough that they’ve literally figured out how to build/simulate a human brain that is able to learn as quickly and human-like as you’re describing. And then we find ourselves in a world where one or two corporations have this incredible proprietary ability that no one else has.

              Personally, I’d rather live in the world where the information about how to do all of this isn’t kept for one or two corporations to profit from, I would rather live in the version where they publish their work publicly, early, and often, show that it works, and people are able to reproduce it, open source it, train their own models, and advance the technology in a space where anyone can use it.

              You could hypothesize a middle ground where they do the research, but aren’t allowed to profit from it without licensing every bit of data they train on. But the reality of AI research is that it only happens to the extent that it generates revenue. It’s been that way for the entire history of AI. Douglas Hofstadter has been asking deep important questions about AI as it relates to consciousness for like 60 years (ex. GEB, I am a Strange Loop), but there’s a reason he didn’t discover LLMs and tech companies did. That’s not to say his writings are meaningless, in fact I think they’re more important than ever before, but he just wasn’t ever going to get to this point with a small team of grad students, a research grant, and some public domain datasets.

              So, it’s hard to disagree with OpenAI there, AI definitely wouldn’t be where it is without them doing what they’ve done. And I’m a firm believer that unless we figure our shit out with energy generation soon, the earth will be an uninhabitable wasteland. We’re playing a game of climb the Kardashev scale, we opted for the “burn all the fossil fuels as fast as possible” strategy, and now we’re at the point where either we spend enough energy fast enough to figure out the tech needed to survive this, or we suffocate on the fumes. The clock is ticking, and AI may be our best bet at saving the human race that doesn’t involve an inordinate number of people dying.

              • frog 🐸
                1 year ago

                OpenAI are not going to make the source code for their model accessible to all to learn from. This is 100% about profiting from it themselves. And using copyrighted data to create open source models would seem to violate the very principles the open source community stands for - namely that everybody contributes what they agree to, and everything is published under a licence. If the basis of an open source model is a vast quantity of training data from a vast quantity of extremely pissed off artists, at least some of the people working on that model are going to have a “are we the baddies?” moment.

                The AI models are also never going to produce a solution to climate change that humans will accept. We already know what the solution is, but nobody wants to hear it, and expecting anyone to listen to ChatGPT and suddenly change their minds about using fossil fuels is ludicrous. And an AI that is trained specifically on knowledge about the climate and technologies that can improve it, with the purpose of innovating some hypothetical technology that will fix everything without humans changing any of their behaviour, categorically does not need the entire contents of ArtStation in its training data. AIs that are trained to do specific tasks, like the ones trained to identify new antibiotics, are trained on a very limited set of data, most of which is not protected by copyright and any that is can be easily licenced because the quantity is so small - and you don’t see anybody complaining about those models!

                  1 year ago

                  OpenAI are not going to make the source code for their model accessible to all to learn from

                  OpenAI isn’t the only company doing this, nor is their specific model the knowledge that I’m referring to.

                  The AI models are also never going to produce a solution to climate change that humans will accept.

                  It is already being used to further fusion research beyond anything we’ve been able to do with standard algorithms

                  We already know what the solution is, but nobody wants to hear it

                  Then it’s not a solution. That’s like telling your therapist, “I know how to fix my relationship, my partner just won’t do it!”

                  expecting anyone to listen to ChatGPT and suddenly change their minds about using fossil fuels is ludicrous

                  Lol. Yeah, I agree, that’s never going to work.

                  categorically does not need the entire contents of ArtStation in its training data.

                  That’s a strong claim to make. Regardless of the ethics involved, or the problems the AI can solve today, the fact is we are seeing rapid advances in AI research as a direct result of these ethically dubious models.

                  In general, I’m all for the capitalist method of artists being paid their fair share for the work they do, but on the flip side, I see a very possible mass extinction event on the horizon, which could cause suffering the likes of which humanity has never seen. If we assume that is the case, and we assume AI has a chance of preventing it, then I would prioritize that over people’s profits today. And I think it’s perfectly reasonable to say I’m wrong.

                  And then there’s the problem of actually enforcing any sort of regulation, which would be so much more difficult than people here are willing to admit. There’s basically nothing you can do even if you wanted to. Your Carlin example is exactly the defense a company would use: “I guess our AI just happened to create a movie that sounds just like Paul Blart, but we swear it’s never seen the film. Great minds think alike, I guess, and we sell only the greatest of minds”.

          1 year ago

          Exactly! You can glean so much from a single work, not just about the work itself but who created it and what ideas were they trying to express and what does that tell us about the world they live in and how they see that world.

          This doesn’t even touch the fact that I’m learning to draw not by looking at other drawings but what exactly I’m trying to draw. I know at a base level, a drawing is a series of shapes made by hand whether it’s through a digital medium or traditional pen/pencil and paper. But the skill isn’t being able to replicate other drawings, it’s being able to convert something I can see into a drawing. If I’m drawing someone sitting in a wheelchair, then I’ll get the pose of them sitting in the wheelchair but I can add details I want to emphasise or remove details I don’t want. There’s so much that goes into creative work and I’m tired of arguing with people who have no idea what it takes to produce creative works.

          • frog 🐸
            1 year ago

            It seems that most of the people who think what humans and AIs do is the same thing are not actually creatives themselves. Their level of understanding of what it takes to draw goes no further than “well anyone can draw, children do it all the time”. They have the same respect for writing, of course, equating the ability to string words together to write an email, with the process it takes to write a brilliant novel or script. They don’t get it, and to an extent, that’s fine - not everybody needs to understand everything. But they should at least have the decency to listen to the people that do get it.

              1 year ago

              Well, that’s not me. I’m a creative, and I see deep parallels between how LLMs work and how my own mind works.

              • frog 🐸
                1 year ago

                Either you’re vastly overestimating the degree of understanding and insight AIs possess, or you’re vastly underestimating your own capabilities. :)

                  1 year ago

                  Alternatively, you might be vastly overestimating human “understanding and insight”, or how much of it is really needed to create stuff.

                  1 year ago

                  This whole AI craze has just shown me that people are losing faith in their own abilities and their ability to learn things. I’ve heard so many who use AI to generate “artwork” argue that they tried to do art “for years” without improving, and hence have come to the conclusion that creativity is a talent that only some have, instead of a skill you can learn and hone. Just because they didn’t see results as fast as they’d have liked.

          1 year ago

          Children learn by watching others. We are trained from millions of examples starting from before birth.

      • You do know that comedians are copying each other’s material all the time though? Either making the same joke, or slightly adapting it.

        So in the context of copyright vs. model training i fail to see how the exact process of the model is relevant? At the end copyrighted material goes in and material based on that copyrighted material goes out.

        1 year ago

        A comedian isn’t forming a sentence based on what the most probable word is going to appear after the previous one.

        Neither is an LLM. What you’re describing is a primitive Markov chain.
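
        For anyone who hasn’t seen one, here is a minimal sketch of a first-order Markov chain text generator. The corpus and the function names (`build_chain`, `generate`) are made up for illustration. The point is only that the next word is picked by looking at the single previous word and nothing else, whereas an LLM conditions on the whole preceding context.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that followed it in the text."""
    chain = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=None):
    """Walk the chain: each step looks only at the single previous word."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            # Dead end: this word never had a successor in the corpus.
            break
        # Repeated followers appear several times in the list, so
        # rng.choice samples them in proportion to how often they occurred.
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus)
print(generate(chain, "the", 8, seed=0))
```

        Everything this generator “knows” is a lookup table of which word followed which. It has no memory beyond one word back, which is why its output drifts into nonsense after a few steps.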

        You may not like it, but brains really are just glorified pattern recognition and generation machines. So yes, “monkey see thing to draw thing”, except a really complicated version of that.

        Think of it this way: if your brain wasn’t a reorganization and regurgitation of the things you have observed before, it would just generate random noise. There’s no such thing as “truly original” art or it would be random noise. Every single word either of us is typing is the direct result of everything you and I have observed before this moment.

        Baffling takes from people who don’t know what they’re talking about.

        Ironic, to say the least.

        The point you should be making, is that a corporation will make this above argument up to, but not including the point where they have to treat AIs ethically. So that’s the way to beat them. If they’re going to argue that they have created something that learns and creates content like a human brain, then they should need to treat it like a human, ensure it is well compensated, ensure it isn’t being overworked or enslaved, ensure it is being treated “humanely”. If they don’t want to do that, if they want it to just be a well built machine, then they need to license all the proprietary data they used to build it. Make them pick a lane.

          1 year ago

          Neither is an LLM. What you’re describing is a primitive Markov chain.

          My description might’ve been indicative of a Markov chain but the actual framework uses matrices because you need to be able to store and compute a huge amount of information at once which is what matrices are good for. Used in animation if you didn’t know.

          What it actually uses is irrelevant, how it uses those things is the same as a regression model, the difference is scale. A regression model looks at how related variables are in giving an outcome and computing weights to give you the best outcome. This was the machine learning boom a couple of years ago and TensorFlow became really popular.
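
          To make the regression comparison concrete, here is a minimal sketch of fitting a one-variable linear regression by gradient descent in plain Python. The data, the function name `fit_linear` and the learning-rate settings are assumptions chosen for illustration. It only shows what “computing weights to give you the best outcome” means at the smallest possible scale, not how any particular LLM is trained.

```python
def fit_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge each weight a small step in the direction that reduces error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

          Here there are two weights and five data points. The loop of “measure the error, adjust the weights, repeat” is the part that carries over to much larger models.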

          LLMs are an evolution of the same idea. I’m not saying it’s not impressive because it’s very cool what they were able to do. What I take issue with is the branding, the marketing and the plagiarism. I happen to be in the intersection of working in the same field, an avid fan of classic Sci-Fi and a writer.

          It’s easy to look at what people have created throughout history and think “this looks like that” and on a point by point basis you’d be correct but the creation of that thing is shaped by the lens of the person creating it. Someone might make a George Carlin joke that we’ve heard recently but we’ll read about it in newspapers from 200 years ago. Did George Carlin steal the idea? No. Was he aware of that information? I don’t know. But Carlin regularly calls upon his own experiences so it’s likely that he’s referencing a event from his past that is similar to that of 200 years ago. He might’ve subconsciously absorbed the information.

          The point is that the way these models have been trained is unethical. They used material they had no license to use and they’ve admitted that it couldn’t work as well as it does without stealing other people’s work. I don’t think they’re taking the position that it’s intelligent because from the beginning that was a marketing ploy. They’re taking the position that they should be allowed to use the data they stole because there was no other way.

        1 year ago

        That’s what humans do, though. Maybe not probability directly, but we all know that some words should be put in a certain order. We still operate within standard norms that apply to a particular group of people. LLMs just go about it in a different way, but they achieve the same general result. If I’m drawing a human, that means there’s a ‘hand’ here, and a ‘head’ there. ‘Head’ is a weird combination of pixels that mostly look like this, ‘hand’ looks kinda like that. All depends on how the model is structured, but tell me that’s not very similar to a simplified version of how humans operate.

          1 year ago

          Yeah but the difference is we still choose our words. We can still alter sentences on the fly. I can think of a sentence and understand verbs go after the subject but I still have the cognition to alter the sentence to have the effect I want. The thing lacking in LLMs is intent and I’m yet to see anyone tell me why a generative model decides to have more than 6 fingers. As humans we know hands generally have five fingers and there’s a group of people who don’t, so if we wanted to draw a person with a different number of fingers, we could. A generative art model can’t help itself from drawing multiple fingers because all it understands is that “finger + finger = hand” but it has no concept of when to stop.

            1 year ago

            I don’t choose my words man. I get a vague sense of the meaning I want to convey and the words just form themselves.

            1 year ago

            And that’s the reason why LLM generated content isn’t considered creative.

            I do believe that the person using the device has a right to copyright the unique method they used to generate the content, but the content itself isn’t anything worth protecting.

              1 year ago

              You say that yet I initially responded to someone who was comparing an LLM to what a comedian does.

              There is no unique method because there’s hardly anything unique you can do. Two people using Stable Diffusion to produce an image are putting in the same amount of work. One might put more time into crafting the right prompt but that’s not work you’re doing.

              If 90% of the work is handled by the model, and you just layer on whatever extra thing you wanted, that doesn’t mean you created the thing. That also implies you have much control over the output. You’re effectively negotiating with this machine to produce what you want.

                1 year ago

                more time into crafting the right prompt

                That’s not work to you? My company pays me to spend time to do the right thing, even though the computer does most of the work.

                I see where you are going with this, but your argument also invalidates other forms of human interaction and creating.

                In my country copyright can only be granted if a certain amount of (human) work went into something. Any work.
                The difficult part is finding out what’s enough and what kind of work qualifies for some kind of protection, even if partial.
                The difficult part was not to create something, but to prove someone did or didn’t put enough work into it.
                I think we can hold generated or assisted goods to the same standard.

                Putting a simple prompt together should probably not be granted protection as no significant work went into it. But refining it, editing the result… maybe that’s enough, that’s really up to society to decide.

                At the same time we have to balance the power of machines against human work, so the human work doesn’t get totally invalidated, but rather shifted and treated as a sub-type.
                Machines have already replaced a lot of work, including creative work. Book-printing, forging, producing food… the scary part about generative AI is mainly the speed at which it is spreading.

                  1 year ago

                  So as a data analyst a lot of my work is done through a computer but I can apply my same skills if someone hands me a piece of paper with data printed on it and tells me to come up with solutions to the problems with it. I don’t need the computer to do what I need to do, it makes it easier to manipulate data but the degree of problem solving required needs to be done by a human and that’s why it’s my job. If a machine could do it, then they would be doing it but they aren’t because contrary to what people believe about data analysis, you have to be somewhat creative to do it well.

                  Crafting a prompt is an exercise in trial and error. It’s work but it’s not skilled work. It doesn’t take talent or practice to do. Despite the prompt, you are still at the mercy of the machine.

                  Even by the case you’ve presented, I have to ask, at what point of a human editing the output of a generative model constitutes it being your own work and not the machine’s? How much do you have to change? Can you give me a %?

                  Machines were intended to automate the tedious tasks that we all have to suffer to free up our brains for more engaging things which might include creative pursuits. Automation exists to make your life easier, not to rob you of life’s pursuits or your livelihood. It never should’ve been used to produce creative work and I find the attempts to equate this abomination’s outputs to what artists have been doing for years, utterly deplorable.

                1 year ago

                Wouldn’t that lead to the same argument as originally brought against photography, though?

                A photographer is effectively negotiating with the sun, the sky and everything else to hopefully get the result they are looking for on their device.

                  1 year ago

                  One difference is that the photographer has to go the places they’re taking pictures of.

                  Another is that photography isn’t comparable to paintings and it never has been. I’m willing to bet photography and paintings have never coexisted in a contest. Except, when people say their generative art is comparable to what artists have been producing by hand, they are admitting that generative art has more in common with photography than it does with hand-crafted art but they want the prestige and recognition those artists get for their work.

          1 year ago

          As an artist you draw with an understanding of the human body, though. An understanding current models don’t have because they aren’t actually intelligent.

          Maybe when a human is an absolute beginner in drawing they will think about the different lines and replicate even how other people draw stuff that then looks like a hand.

          But eventually they will realise (hopefully, otherwise they may get frustrated and stop drawing) that you need to understand the hand to draw one. Its mass, its concept, or the idea of what a hand is.

          This may sound very abstract and strange but creative expression is more complex than replicating what we have seen a million times. It’s a complex function unique to the human brain, an organ we don’t even scientifically understand yet.

      • Pup
        1 year ago

        you know how the neurons in our brain work, right?

        because if not, well, it’s pretty similar… unless you say there’s a soul (in which case we can’t really have a conversation based on fact alone), we’re just big ol’ probability machines with tuned weights based on past experiences too
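
        to be clear about what “tuned weights” means on the artificial side, here is a minimal sketch of a single artificial neuron: a weighted sum of inputs pushed through a sigmoid. the function name and the numbers are made up for illustration, and this describes the artificial unit only, not a claim about how biological neurons compute.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, squashed to (0, 1)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The sigmoid turns any real number into a value between 0 and 1,
    # which can be read as "how strongly this unit fires".
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative values: the weights are what training would adjust.
inputs = [1.0, 0.5]
weights = [0.8, -0.4]
print(neuron(inputs, weights, bias=0.1))
```

        training a network means adjusting those weights (and the bias) based on past examples, which is the sense in which the weights are “tuned”.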

          1 year ago

          You are spitting out basic points and attempting to draw similarities because our brains are capable of something similar. The difference between what you’ve said and what LLMs do is that we have experiences that we are able to glean a variety of information from. An LLM sees text and all it’s designed to do is say “x is more likely to appear before y than z”. If you fed it nonsense, it would regurgitate nonsense. If you feed it text from racist sites, it will regurgitate that same language because that’s all it has seen.

          You’ll read this and think “that’s what humans do too, right?” Wrong. A human can be fed these things and still reject them. Someone else in this thread has made some good points regarding this but I’ll state them here as well. An LLM will tell you information but it has no cognition on what it’s telling you. It has no idea that it’s right or wrong, its job is to convince you that it’s right because that’s the success state. If you tell it it’s wrong, that’s a failure state. The more you speak with it, the more fail states it accumulates and the more likely it is to cut off communication because it’s not reaching a success, it’s not giving you what you want. The longer the conversation goes on, the more crazy LLMs get as well because it’s too much to process at once, holding those contexts in its memory while trying to predict the next one. Our brains do this easily and so much more. To claim an LLM is intelligent is incredibly misguided, it is merely the imitation of intelligence.

          • Pup
            1 year ago

            but that’s just a matter of complexity, not fundamental difference. the way our brains work and the way an artificial neural network works aren’t that different; just that our brains are many orders of magnitude bigger

            there’s no particular reason why we can’t feed artificial neural networks an enormous amount of … let’s say tangentially related experiential information … as well, but in order to be efficient and make them specialise in the things we want, we only feed them information that’s directly related to the specialty we want them to perform

            there’s some… “pre training” or “pre-existing state” that exists with humans too that comes from genetics, but i’d argue that’s as relevant to the actual task of learning, comprehension, and creating as a BIOS is to running an operating system (that is, a necessary precondition to ensure the correct functioning of our body with our brain, but not actually what you’d call the main function)

            i’m also not claiming that an LLM is intelligent (or rather i’d prefer to use the term self aware because intelligent is pretty nebulous); just that the structure it has isn’t that much different to our brains just on a level that’s so much smaller and so much more generic that you can’t expect it to perform as well as a human - you wouldn’t expect to cut out 99% of a human’s brain and have them be able to continue to function at the same level either

            i guess the core of what i’m getting at is that the self awareness that humans have is definitely not present in an LLM, however i don’t think that self-awareness is necessarily a pre-requisite for most things that we call creativity. i think that it’s entirely possible for an artificial neural net that’s fundamentally the same technology that we use today to be able to ingest the same data that a human would from birth, and to have very similar outcomes… given that belief (and i’m very aware that it certainly is just a belief - we aren’t close to understanding our brains, but i don’t fundamentally think there’s anything other than neurons firing that results in the human condition), just because you simplify and specialise the input data doesn’t mean that the process is different. you could argue that it’s lesser, for sure, but to rule out that it can create a legitimately new work is definitely premature

          1 year ago

          “Soul” is the word we use for something we don’t scientifically understand yet. Unless you did discover how human brains work, in that case I congratulate you on your Nobel prize.

          You can abstract a complex concept so much it becomes wrong. And abstracting how the brain works to “it’s a probability machine” definitely is a wrong description. Especially when you want to use it as an argument of similarity to other probability machines.

          • Pup
            1 year ago

            “Soul” is the word we use for something we don’t scientifically understand yet

            that’s far from definitive. another definition is

            A part of humans regarded as immaterial, immortal, separable from the body at death

            but since we aren’t arguing semantics, it doesn’t really matter exactly, other than the fact that it’s important to remember that just because you have an experience, belief, or view doesn’t make it the only truth

            of course i didn’t discover categorically how the human brain works in its entirety, however most scientists i’m sure would agree that the method by which the brain performs its functions is by neurons firing. if you disagree with that statement, the burden of proof is on you. the part we don’t understand is how it all connects up - the emergent behaviour. we understand the basics; that’s not in question, and you seem to be questioning it

            You can abstract a complex concept so much it becomes wrong

            it’s not abstracted; it’s simplified… if what you’re saying were true, then simplifying complex organisms down to a petri dish for research would be “abstracted” so much it “becomes wrong”, which is categorically untrue… it’s an incomplete picture, but that doesn’t make it either wrong or abstract

            *edit: sorry, it was another comment where i specifically said belief; the comment you replied to didn’t state that, however most of this still applies regardless

            i laid out an a leads to b leads to c and stated that it’s simply a belief, however it’s a belief that’s based in logic and simplified concepts. if you want to disagree that’s fine but don’t act like you have some “evidence” or “proof” to back up your claims… all we’re talking about here is belief, because we simply don’t know - neither you nor i

            and given that all of this is based on belief rather than proof, the only thing that matters is what we as individuals believe about the input and output data (because the bit in the middle has no definitive proof either way)

            if a human consumes media and writes something and it looks different, that’s not a violation

            if a machine consumes media and writes something and it looks different, you’re arguing that is a violation

            the only difference here is your belief that a human brain somehow has something “more” than a probabilistic model going on… but again, that’s far from certain

    1 year ago

    I will repeat what I have proffered before:

    If OpenAI stated that it is impossible to train leading AI models without using copyrighted material, then, unpopular as it may be, the preemptive pragmatic solution should be pretty obvious: enter into commercial arrangements for access to said copyrighted material.

    Claiming a failure to do so in circumstances where the subsequent commercial product directly competes in a market seems disingenuous at best, given what I assume is the purpose of copyrighted material, that being to set the terms under which public facing material can be used. Particularly if regurgitation of copyrighted material seems to exist in products inadequately developed to prevent such a simple and foreseeable situation.

    Yes I am aware of the USA concept of fair use, but the test of that should be manifestly reciprocal: for example, would Meta allow others to do what it did to MySpace (hack it to allow easy user transfer), or would Google allow scraping of YouTube?

    To me it seems Big Tech wants its cake and to eat it, where investor $$$ are used to corrupt open markets and undermine both fundamental democratic State social institutions, manipulate legal processes, and undermine basic consumer rights.

      1 year ago

      Yep, completely agree.

      Case in point: Steam has recently clarified their policies on using such AI-generated material that draws on essentially billions of both copyrighted and non-copyrighted text and images.

      To publish a game on Steam that uses AI gen content, you now have to verify that you as a developer are legally authorized to use all training material for the AI model for commercial purposes.

      This also applies to code and code snippets generated by AI tools that function similarly, such as CoPilot.

      So yeah, sorry, either gotta use MIT licensed open source code or write your own, and you gotta do your own art.

      I imagine this would also prevent you from using AI generated voice lines where you trained the model on basically anyone who did not explicitly consent to this as well, but voice gen software that doesn’t use the ‘train the model on human speakers’ approach would probably be fine assuming you have the relevant legal rights to use such software commercially.

      Not 100% sure this is Steam’s policy on voice gen stuff, they focused mainly on art dialogue and code in their latest policy update, but the logic seems to work out to this conclusion.

    • sculd@beehaw.orgOP
      1 year ago


      There is nothing “fair” about the way Open AI steals other people’s work. ChatGPT is being monetized all over the world and the large number of people whose work has not been compensated will never see a cent of that money.

      At the same time the LLM will be used to replace (at least some of ) the people who created those works in the first place.

      Tech bros are disgusting.

        1 year ago

        Tech bros are disgusting.

        That’s not even getting into the fraternity behavior at work, hyper-reactionary politics and, er, concerning age preferences.

        • sculd@beehaw.orgOP
          1 year ago

          Yup. I said it in another discussion before but think it’s relevant here.

          Tech bros are more dangerous than Russian oligarchs. Oligarchs understand the people hate them so they mostly stay low and enjoy their money.

          Tech bros think they are the savior of the world while destroying millions of people’s livelihood, as well as destroying democracy with their right wing libertarian politics.

        1 year ago

        At the same time the LLM will be used to replace (at least some of ) the people who created those works in the first place.

        This right here is the core of the moral issue when it comes down to it, as far as I’m concerned. These text and image models are already killing jobs and applying downward pressure on salaries. I’ve seen it happen multiple times now, not just anecdotally from some rando on an internet comment section.

        These people losing jobs and getting pay cuts are who created the content these models are siphoning up. People are not going to like how this pans out.

          1 year ago

          The flip side of this is that many artists who simply copy very popular art styles are now functionally irrelevant, as it is now just literally proven that this kind of basically plagiarism AI is entirely capable of reproducing established styles to a high degree of fidelity.

          While many aspects of this whole situation are very bad for very many reasons, I am actually glad that many artists will be pressured to actually be more creative than an algorithm, though I admit this comes from basically a personally petty standpoint of having known many, many, many mediocre artists who themselves and their fans treat like gods because they can emulate some other established style.

            1 year ago

            Literally every artist copies, it’s how we all learn. The difference is that every artist out there does not have an enterprise-class-data-center-powered-super-human ability to absorb <ALL THE ART> and then be able to spit out anything instantly. It still takes time and hard work and dedication. And through the years of hard work people put into learning how their heroes do X, Y, and Z, they develop a style of their own.

            It’s how artists cut their teeth and work their way into the profession. What you’re welcoming in is a situation where nobody can find any success whatsoever until they are absolutely original and of course that is an impossible moving target when every original idea and design and image can just be instantly siphoned back up into the AI model.

            Nobody could survive that way. Nobody can break into the artistic industry that way. Except for the wealthy. All the low level work people get earlier in their careers that helps keep them afloat while they learn is gone now. You have to be independently wealthy to become a high level artist capable of creating truly original work. Because there’s no other way to subsidize the time and dedication that takes when all the work for people honing their craft has been hoovered up by machines.

              1 year ago

              No, I am not welcoming an artist apocalypse, that would obviously be bad.

              I am noting that I find it amusing, on a level I already acknowledged was petty and personal, that many, many mediocre artists who are absolutely awful to other people socially would have their little cults of fandom dampened by the fact that a machine can more or less do what they do, and their cult leader status is utterly unwarranted.

              I do not have a nice and neat solution to the problem you bring up.

              I do believe you are being somewhat hyperbolic, but, so was I.

              Yep, being an artist in a capitalist hellscape world with modern AI algorithms is not a very reliable way to earn a good living, and such a society is not likely to produce many artists who do not have either a lot of free time or money, unless they get really lucky.

              At this point we are talking about completely reorganizing society in fairly large and comprehensive ways to achieve significant change on this front.

              Also this problem applies to far, far more people than just artists. One friend of mine wanted to run a little bakery as her dream job! Had to set her prices too high, couldn’t afford a good location, supply chain problems, taxes, didn’t work out.

              Maybe someone’s passion is teaching! Welp, that situation is all fucked too.

              My point here is: Ok, does anyone have an actual plan that can actually transform the world into somewhere that allows the average person to be far more likely to be able to live the life they want?

              Would that plan have more to do with the minutiae of regulating a specific kind of ever advancing and ever changing technology in some kind of way that will be irrelevant when the next disruptive tech proliferates in a few years, or maybe more like an actual total overhaul of our entire society from the ground up?

          1 year ago

          Any company replacing humans with AI is going to regret it. AI just isn’t that good and probably won’t ever be, at least in its current form. It’s all an illusion and is destined to go the way of Bitcoin, which is to say it will shoot up meteorically and seem like the answer to all kinds of problems, and then the reality will sink in and it will slowly fade to obscurity and irrelevance. That doesn’t help anyone affected today, of course.

              1 year ago

              It’s garbage for programming. A useful tool but not one that can be used by a non-expert. And I’ve already had to have a conversation with one of my coworkers when they tried to submit absolutely garbage code.

              This isn’t even the first attempt at a smart system that enables non-programmers to write code. They’ve all been garbage. So, too, will the next one be but every generation has to try it for themselves. AGI might have some potential some day, but that’s a long long way off. Might as well be science fiction.

              Other disciplines are affected differently, but I constantly play with image and text generation and they are all some flavor of garbage. There are some areas where AI can excel but they are mostly professional tools and not profession replacements.

                1 year ago

                OpenAI, please generate your own source code but optimized and improved in all possible ways.

                not how programming works, but tech illiterate people seem to think so

                1 year ago

                It was of no use whatsoever for programming or image generation or writing a few years ago. This thing has developed very quickly and will continue to. Give it 5 years and I think things will look very different.

      1 year ago

      I suspect the US government will allow OpenAI to continue doing as it pleases to keep its competitive advantage in AI over China (which doesn’t have a problem with using copyrighted materials to train its models). The US already limits sales of AI-related hardware to keep that advantage, so why stop there? Might as well allow OpenAI to continue using copyrighted materials to keep the competitive advantage.

      1 year ago

      With your logic all artists will have to pay copyright fees just to learn how to draw. All musicians will have to pay copyright fees just to learn their instrument.

      I guess I should clarify by saying I’m a professional musician.

      1 year ago

      So why is so much information (data) freely available on the internet? How do you expect a human artist to learn drawing, if not looking at tutorials and improving their skills through emulating what they see?