Huge proportion of internet is AI-generated slime, researchers find

haxor@derp.foo · 11 months ago

AutoTL;DR@lemmings.world · 11 months ago

This is the best summary I could come up with:

Amazon has also had a notably rough go with AI content; in addition to its serious AI-generated book listings problem, a recent Futurism report revealed that the e-commerce giant is flooded with products featuring titles such as “I cannot fulfill this request it goes against OpenAI use policy.”

Elsewhere, beyond specific platforms, numerous reports and studies have made clear that AI-generated content abounds throughout the web.

But while the English-language web is experiencing a steady — if palpable — AI creep, this new study suggests that the issue is far more pressing for many non-English speakers.

What’s worse, the prevalence of AI-spun gibberish might make effectively training AI models in lower-resource languages nearly impossible in the long run.

To train an advanced LLM, AI scientists need large amounts of high-quality data, which they generally get by scraping the web.

If a given area of the internet is already overrun by nonsensical AI translations, the possibility of training advanced models in rarer languages could be stunted before it even starts.

The original article contains 465 words, the summary contains 169 words. Saved 64%. I’m a bot and I’m open source!

Sloogs@lemmy.dbzer0.com · edit-2 11 months ago

My fear is that Google is going to succeed in using this as an excuse to unilaterally destroy the free web like they’ve already been trying with attestation.

But the modern web really does suck. I’m not sure how to fix it without corporate influence.

noUsernamesLef7@infosec.pub · 11 months ago

Curation is my answer. Return to the old ways of curating your own lists of resources and sharing them with other people. Web rings, blog rolls, link sharing, RSS.

GrappleHat@lemmy.ml · 11 months ago

This

Showroom7561@lemmy.ca · 11 months ago

I’m noticing more of this on sites I’ve never visited. It’s all word salad.

If the internet continues like this, it’ll be next to useless.

henfredemars@infosec.pub · edit-2 11 months ago

This makes me irrationally angry when I’m looking for technical information. The preview looks reasonable. Click on the link, and it’s just word salad of technical terms, structured in an intelligent way, but completely devoid of meaning.

Search engines are screwed, and possibly future AI training as well.

Showroom7561@lemmy.ca · 11 months ago

Yeah, websites designed as Q&As are the worst, too.

The first few questions and answers make some sense, and then it just devolves into off-topic nonsense that has some of the keywords you originally used in your search.

The problem is, if you don’t know enough about a topic, you can’t even assess whether it’s real or crap.

Empathy [he/him]@beehaw.org · edit-2 11 months ago

Ironically, the best way I found to combat this is to use search engines that summarize result pages with AI (e.g., Bing Copilot or Perplexity).

It still sucks even with those options, but it at least reduces the need to go through several pages of results before finding the first relevant one. Still, the LLMs of those engines hallucinate regularly and give very naive answers, so they’re mostly useful for finding relevant sources IMO.

Disclaimer: I pay for Perplexity. I use Perplexity every day, but I haven’t tried Bing Copilot that much. I haven’t used ChatGPT much either; I find it way too unreliable and can’t trust its answers. I’m not an investor in or an employee of any of them.
