Could Reddit's data be "poisoned" to prevent its use in training AI?

nodsocket@lemmy.world · edit-2 11 months ago

Could Reddit's data be "poisoned" to prevent its use in training AI?

FaceDeer@kbin.social · 11 months ago

Reddit’s surely got a copy of the PushShift archives; it’ll have all the pre-sabotage versions of those comments.

Lvxferre@mander.xyz · 11 months ago

The PS archives are publicly available. If either OpenAI or Google were to use them, they wouldn’t pay Reddit Inc. a single penny; and yet Google is paying it 60 million dollars to do so. This means that there’s content that they cannot retrieve through the PS archives that would still be valuable as LLM data.