Maven Imported 1.12 Million Fediverse Posts

Sean Tilley@lemmy.world · 6 months ago

Larry@lemmy.world · 6 months ago

Am I misunderstanding this, or did they just fuck up the integration so it’s one way with a plan to make it two ways after, and the AI alteration is just sentiment analysis on whatever they took?

FaceDeer@fedia.io · 6 months ago

Looks like it.

In addition to pulling in posts, the import process seems to be running AI sentiment analysis to add tags and relational data after content reaches Maven’s servers. This is a core part of Maven’s product: instead of follows or likes, a model trains itself on its own data in an attempt to surface unique content algorithmically.

But of course, that news doesn’t give the reader those lovely rage endorphins or draw clicks.

This is the Fediverse; having the content we post get spread around to other servers is the whole point of all this. Is this a face-eating leopard situation? Are people genuinely surprised and upset that the stuff we post here ends up being shown in other places?

There is one thing I see here that raises my eyebrows:

Even more shocking is the revelation that somehow, even private DMs from Mastodon were mirrored on their public site and searchable. How this is even possible is beyond me, as DM’s are ostensibly only between two parties, and the message itself was sent from two hackers.town users.

But that sounds to me like a hackers.town problem; it shouldn’t be sending out private DMs to begin with.

Sean Tilley@lemmy.world · 6 months ago

They kind of fucked up everything in approaching this: not talking to the community or collecting feedback, making dumb assumptions about how the integration was supposed to work, leaking private posts, running everything through their AI system, and neglecting to represent the remote content as having come from anywhere else.

The other thing is that Maven’s whole concept is training an AI over and over again on the platform’s posts. This could mean that a lot of Fediverse content ended up in the training data.