• BotCheese@beehaw.org · 10 months ago

      From what I understand, it's something aimed at AI: either to stop it from harvesting the content, or to poison the data, since having the license text repeated everywhere makes it more likely to show up in the output.

      • beefcat@beehaw.org · 10 months ago

        Sounds an awful lot like that thing boomers used to do on Facebook where they would post a message on their wall rescinding Facebook’s rights to the content they post there. I’m sure it’s equally effective.

          • t3rmit3@beehaw.org · 10 months ago

            That would require a significant number of people to be doing it, to ‘poison’ the input pool, as it were.

      • mozz@mbin.grits.dev · 10 months ago

        I would be extremely extremely surprised if the AI model did anything different with “this comment is protected by CC license so I don’t have the legal right to it” as compared with its normal “this comment is copyright by its owner so I don’t have the legal right to it hahaha sike snork snork snork I absorb” processing mode.

        • Max-P@lemmy.max-p.me · 10 months ago

          No, but if they forget to strip those before training the models, they're gonna start spitting out licenses everywhere, which would be annoying for AI companies.

          It's so easily fixed with a simple regex, though, that it's not that useful. But poisoning the data is theoretically possible.
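          A rough sketch of what that cleanup pass could look like (the license phrases and the helper name here are assumptions for illustration, not anyone's actual pipeline):

          ```python
          import re

          # Hypothetical pre-training cleanup: strip common CC license boilerplate
          # from scraped comments. Real data would need a much broader pattern list.
          CC_PATTERN = re.compile(
              r"(this (comment|post) is (licensed|protected) under\s+)?"
              r"CC[ -]BY(-NC)?(-SA)?(-ND)?( [34]\.0)?",
              re.IGNORECASE,
          )

          def strip_licenses(comment: str) -> str:
              """Remove CC license boilerplate from a single comment."""
              return CC_PATTERN.sub("", comment).strip()

          print(strip_licenses("Great point! This comment is licensed under CC BY-NC-SA 4.0"))
          # -> "Great point!"
          ```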

          • t3rmit3@beehaw.org · 10 months ago

            Only if enough people were doing this to constitute an algorithmically-reducible behavior.

            If you could get everyone who mentions a specific word or subject to put a CC license in their comment, then an ML model trained on those comments would likely output the license name when that subject was mentioned, but models don't just randomly insert strings they've seen without context.
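            A toy sketch of that idea (the keyword, the mini corpus, and the co-occurrence counting are all made up for illustration; real models learn much richer statistics than this):

            ```python
            from collections import Counter

            # Toy illustration, not a real language model: if every comment that
            # mentions a chosen keyword also carries a license string, anything
            # learning from word co-occurrence starts associating the two.
            comments = [
                "i love databases CC-BY-SA-4.0",
                "databases are neat CC-BY-SA-4.0",
                "my cat is asleep",
                "databases again CC-BY-SA-4.0",
                "nice weather today",
            ]

            keyword = "databases"
            cooccurrence = Counter()
            for comment in comments:
                words = comment.split()
                if keyword in words:
                    cooccurrence.update(w for w in words if w != keyword)

            # The license string co-occurs with the keyword far more than any other
            # word, so a statistical model would tend to emit it in that context,
            # but only because the pairing is consistent rather than random.
            print(cooccurrence.most_common(3))
            # e.g. [('CC-BY-SA-4.0', 3), ('i', 1), ('love', 1)]
            ```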