• dan@upvote.au
    4 months ago

    People who want to train AI models on Reddit content can just scrape the site, or use data from sites that archive Reddit content.

    • AnyOldName3@lemmy.world
      4 months ago

      The archive sites used to use the API, which is another reason Reddit wanted to get rid of it. I always found the archives a great moderation tool: users would often edit their posts so they no longer broke the rules, then claim a rogue moderator had banned them for no reason, and there was no way within Reddit to prove them wrong.

        • AnyOldName3@lemmy.world
          4 months ago

          Yeah, the Wayback Machine doesn’t use Reddit’s API. On the other hand, I’m pretty sure it doesn’t automatically archive literally everything that makes it onto Reddit. Doing that would require the API to tell you about every new post, since just sorting /r/all by new and collecting every link misses stuff.