Federated services have always had privacy issues, and I expected Lemmy would have the fewest, but it’s visibly worse for privacy than even Reddit.
- Deleted comments remain on the server, hidden from non-admins, and the username remains visible
- Deleted account usernames remain visible too
- Anything remains visible on federated servers!
- When you delete your account, media does not get deleted on any server
Damn, Raddle seems worse than Reddit when it comes to toxic attitudes. I never looked much into it since it’s just another centralized platform like Reddit with different management, but boy oh boy are those comments just awful. Great community you folks got over there 😬
That’s a non-issue. You just cannot expect to be able to delete anything you post on the internet. Even the great reddit with the awesome deletion feature cannot help you. You might be able to delete your comment there, but there are https://www.unddit.com/ https://archive.is/ https://web.archive.org/ and many others, where your comment will still be available.
Eh. Often times I want to delete it particularly on reddit or some other place. Just so that it doesn’t hang on my profile
Well, reddit doesn’t actually allow you to delete things anymore, so tough luck.
When did that happen?
Are you thinking of Reddit “undeleting” posts? The reason for this is that your posts in subs that went private disappear from your profile. So when the subs go public again, the posts are back.
https://github.com/LemmyNet/lemmy/issues/2977
It’s not like they’re doing it on purpose, there’s a lot of things being worked on, and this is one of them.
First - we’re all using alpha/beta software (Lemmy is 0.17.4, Kbin is 0.10.). None of these services are “production quality” software yet, so let’s keep that in our minds - we’re all early adopters.
The points mentioned in the OP are a bad look. Naturally. Users should be able to expect their data to be deleted on request - especially since this request might be a regulatory privacy request (GDPR related). It’s a clear failure of the software and should be improved and iterated upon.
The expectation shouldn’t be “oh well it’s on the Internet, live with it”. While Facebook might keep mining your data after deletion request, our software shouldn’t behave like that, we should strive to be better with this stuff.
And finally, ensuring privacy in a federated system is hard. Mastodon suffers from the same problems. We shouldn’t give up on the idea though.
The more important part for privacy: Mail address is optional, and IP addresses are not stored in the database. A correctly configured instance (at least for EU legislation) also will not log IP addresses in the web server - with that you can have profiles that can’t be tied to an actual human, and you don’t have location and movement data.
The data deletion is pretty much a nice to have - it’s on the level of the Exchange feature to recall Emails: Sure, you can ask nicely, but outside of your own server pretty much nobody will care. Lemmy is federated over multiple jurisdictions, so even with full deletion implemented there’ll almost certainly be instances which will ignore the deletion request - and it will be completely legal for them to do so. More important is education about what you publish, and a basic understanding of the technical and legal realities you’ll have to deal with if you later decide you want that information gone.
I already had that discussion with my 6 year old when she wanted to publish some videos - and she understood the problems quite well.
but outside of your own server pretty much nobody will care. Lemmy is federated over multiple jurisdictions, so even with full deletion implemented there’ll almost certainly be instances which will ignore the deletion request - and it will be completely legal for them to do so
Lemmy also seems to federate your matrix_user_id, which is clearly personal data. It does not matter how the data gets to the federated server, this is still user data within the scope of the GDPR. It does not matter that that server does not have an agreement with the user; the instance that would ignore a GDPR related deletion request would be in direct violation of the GDPR. Maybe it can do that without consequences, though.
I completely understand that making Lemmy fully GDPR compliant will probably be impossible, however I don’t like the approach of “we will not succeed, so we don’t make any attempt”. Instances should actually delete data when that is requested, or instance hosts can get fined. For now, Lemmy has bigger issues to solve, but eventually they should make at least a best-effort attempt to respect user data.
Lemmy also seems to federate your matrix_user_id, that is clear personal data.
Just like specifying an email address when signing up adding a matrix identifier is your personal choice. Lemmy is perfectly usable without either.
It does not matter how the data gets to the federated server, this is still user data within the scope of the GDPR. It does not matter that that server does not have an agreement with the user, the instance that would ignore a GDPR related deletion request would be in direct violation of the GDPR.
Not a lawyer, but I’d say the instance outside of EU, not targeting EU users would not be in violation - though EU instances transmitting data there might.
Instances should actually delete data when that is requested, or instance hosts can get fined.
With that part I agree - but it should be made clear when deleting something that this is a local deletion, which may or may not propagate to other instances, and will almost certainly not remove the data from the internet.
EU instances transmitting data there might.
This is an interesting thought, as data transfer between the US and EU has been an issue with other social networks. Federation between an EU instance and a US instance could be seen as the same thing - data for EU users is being transferred to non-EU servers.
It’s very possible that an EU instance that comes under regulatory scrutiny for whatever reason will have to start requiring Data Processing Agreements (DPAs) from every instance it federates with.
Ultimately that would likely result in a few paid, professionally run instances, which only federate with each other and maybe a few similar instances in other regions with the capacity to provide DPAs.
And next to that, a forest of independent, non-conforming instances flying under the regulatory radar; an entirely separate fediverse from the centralized one where instances disappearing is a regular occurrence.
I had a look into the wording of the GDPR (more specifically the Data Protection Act, as it is implemented in the UK), and it seems to refer to organisations. I think most, if not all, instances are not hosted by organisations. (Just some group or individual hosting it on personal or rented hardware). Laws such as this are designed with centralization in mind, and kind of don’t make sense in the context of decentralisation.
Yea these laws are super difficult in a distributed network and I think that you would not be responsible if you made an attempt to say to the other instances that this data is now deleted. But at the moment, when you delete a message on an instance, it just flips a boolean and says the message is deleted. (mods can purge comments though, so then it is actually deleted).
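To make the difference between "flipping a boolean" and a purge concrete, here is a minimal sketch using an in-memory SQLite table. The table and column names (`comment`, `deleted`) and the helper functions are made up for illustration and are not Lemmy's actual schema or code.

```python
import sqlite3


def make_db():
    # Hypothetical schema, not Lemmy's real one.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE comment ("
        "id INTEGER PRIMARY KEY, author TEXT, body TEXT, "
        "deleted INTEGER NOT NULL DEFAULT 0)"
    )
    db.execute(
        "INSERT INTO comment (id, author, body) VALUES (1, 'alice', 'hello world')"
    )
    return db


def soft_delete(db, comment_id):
    # "Delete" as described above: only a flag changes, the text stays.
    db.execute("UPDATE comment SET deleted = 1 WHERE id = ?", (comment_id,))


def purge(db, comment_id):
    # Purge: the row itself is removed.
    db.execute("DELETE FROM comment WHERE id = ?", (comment_id,))


def visible_body(db, comment_id):
    # What a normal user would be shown.
    row = db.execute(
        "SELECT body, deleted FROM comment WHERE id = ?", (comment_id,)
    ).fetchone()
    if row is None or row[1]:
        return None
    return row[0]


def stored_body(db, comment_id):
    # What anyone with database access can still read.
    row = db.execute(
        "SELECT body FROM comment WHERE id = ?", (comment_id,)
    ).fetchone()
    return None if row is None else row[0]
```

After `soft_delete`, `visible_body` returns nothing while `stored_body` still returns the text; only after `purge` are both empty.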
And you would probably be fine as an individual, but I can see larger Lemmy instances get large enough that these kinds of rules will apply to them. I have seen a few cases where small associations got fined for violating the GDPR, that would be a waste of money that was donated for hosting the instance.
It is early-stage software and such things can be worked out, you’re right. But on the other hand, such basic elements should be based on a thorough concept before a single line is coded, and implementing something like a delete button with “Let’s just make it delete the most visible stuff for now, we can always improve that later when there is time” is a recipe for disaster.
Agree, it’s a little late to change core architecture. But this is the philosophy the devs ran with, and it has the advantage of longevity when an instance goes offline, then it’s still visible to everyone else.
But is it solvable at all in principle? The only enforcement policy available is defederation, but that just means future posts won’t go to that instance, the older posts will still be there. Plus an instance could just lie when confirming delete requests and you’d never know unless the non-deleted posts leaked.
Not really, same as email. Once you send it out and it’s on somebody else’s server, you can request they delete it but that’s about it. They have a copy of your message and can do whatever they want with that.
This is not a principle that needs solving imo, it’s the nature of the Internet. If you post it online then you should know that there’s a chance it’ll be there permanently.
Hmm, it’s an interesting problem. I’m afraid you are right and there’s really nothing left but defederation - on the other hand, then it’s the same as with stuff like the parsers that could show deleted reddit messages, or things like the Wayback Machine, which basically do the same, so the core logic of base lemmy source should be as privacy-respecting as possible.
I remember reading about Signal a few years ago: there is some way you can verify that their server is running the same code as the one published (and audited heavily), so you can be 100% sure that there were no modifications. Wouldn’t something like that be a solution? That would prevent servers from modifying the code that deletes data. I don’t know how it works, and I couldn’t find it when I tried looking for it again, but assuming such a thing is possible, each Lemmy instance could just have a verify widget on their VCS and you could be sure that this instance really does delete your data, since they didn’t modify the deletion code.
But this is just theorycrafting, I don’t really have enough experience to create something like that and I can imagine that it’s not an easy thing. But if anyone knows more details about the way Signal verification works, assuming I didn’t just misunderstand something (since it’s literally a memory I have of a single sentence from one random article when I was researching the best private messaging app), I would love to read more about the way it works!
But yeah, outside of that, I’m afraid that the following set of features is mutually exclusive:
- A user is able to delete their data, and it’s guaranteed that it is deleted from everywhere.
- If a lemmy instance dies, its data is not lost.
- There is not a single centralized authority for anything.
Another option would be to create some kind of reputation system, where self-hosted bots could check for servers that still provide posts and comments that should be deleted, and flag offenders. But that’s overengineering anyway, and as I’ve already said - there’s still no way to stop a scraper or anyone else from simply copying your data when they see it.
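For what it's worth, the checking part of such a reputation bot is simple to sketch. Everything below is hypothetical: `fetch` stands in for whatever public lookup a bot would use to ask an instance for a post, and the instance names and IDs are invented.

```python
def audit(instances, deleted_ids, fetch):
    """Return {instance: [ids still served]} for offending instances only.

    `fetch(instance, object_id)` is a stand-in for a public lookup and is
    expected to return None when the instance no longer serves the object.
    """
    offenders = {}
    for instance in instances:
        still_served = [
            object_id
            for object_id in deleted_ids
            if fetch(instance, object_id) is not None
        ]
        if still_served:
            offenders[instance] = still_served
    return offenders


def make_fake_fetch(served):
    # Test double: `served` maps instance name -> {object_id: body}.
    def fetch(instance, object_id):
        return served.get(instance, {}).get(object_id)
    return fetch
```

A bot like this only sees what an instance chooses to serve publicly, so a server that keeps a private copy would never be flagged.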
The fediverse is the real internet, it’s not a company providing a service. On the real internet, once something gets out there, there can never be a guarantee that it’s taken back. Even on Reddit, once you post something, Reddit might fully delete it but someone out there may have copied it.
I had years worth of posts and comments that I deleted via the interface a while ago. Then as part of the reddit exodus I decided to run a removal tool that used the API, and it turns out 11 years worth of “deleted posts” were all still sitting out there, they were just hidden from me.
I did find it strange when I received a reply to a years old comment that my profile page said was deleted, but I just thought it was a caching issue. Turns out all of that content was still out there with my name attached, I was the only one who couldn’t see it.
What about editing the comments? Do they keep any log of the original message and the subsequent edits or something? Maybe this would be a workaround to effectively delete them.
There’s no way to know without proper testing, and I’m gone for good. I did use redact.dev, which overwrote all of my comments before deleting them, so fingers crossed that the account is nuked.
Multiple people reported Reddit undeleted stuff they had deleted from their accounts recently …
That’s why you rewrite your old comments to actively steer people away from the site. ASCII rocket ships, Lemmy links, etc
That’s what I was thinking, does anyone know if Reddit keeps logs or something?
It is reasonable that people should be able to delete their posts / comments. However I don’t see how this is related to “privacy”. How can something you post on a public forum be private?
Probably in the sense that if it’s not me that posted it, then I don’t have any way of truly removing it (which I think is against the EU’s laws).
What I can think of right off the top of my head is revenge porn and doxxing. Furthermore there’s also the right to be forgotten.
That is generally true, with exceptions like leaking someone else’s private information.
But it implicates the adjacent “right to be forgotten” rather than narrowly defined “privacy”. This could be a real legal issue in the EU.
It is. GDPR in the EU dictates that every user who requests their information has to get it in 30 days, and every user who removes their information has to be able to get it removed (I think the time span for that is even shorter, so more pressure for the server admins)
The problem here is that your data is not only collected by your server and accessible to your server admins, the servers of the communities/magazines or people you interact with also collect any activity you have in relation to any community/magazine or user hosted on their server.
So, while the admin of your server has the obligation of deleting your data if you ask for it, the other servers’ admins don’t necessarily have that obligation.
Also, I’m reading the GDPR and the “right to be forgotten” that many are quoting seems to refer to personal information only.
It almost definitely isn’t and that’s clear looking into GDPR at all.
The right to be forgotten is not all powerful, and the lemmy instance your data originates on has an obligation to delete your data, that is true. However other servers may or may not have any of that obligation for a variety of reasons.
Now if you go to those other servers and make the request to have your information deleted, they may have an obligation to depending on whether that data is seen as currently usable.
The right to be forgotten is far weaker than you think it is, especially on public forums, under GDPR.
I’m also not sure how it’s enforceable in a distributed system.
Blockchains have the property of being append-only, so a blockchain is precisely what makes it impossible to delete transactions. That being said, in a distributed system, once the message leaves trusted servers, it is obviously also impossible to delete it.
Why are you bringing up blockchain?
Lovely, the parent comment mentioned blockchain but was since edited… Trust me I would not have brought it up otherwise.
Nothing about how lemmy or the fediverse platforms work has anything to do with blockchains. Don’t conflate “decentralization” to include blockchain. Torrents are also decentralized and have nothing to do with blockchains.
You can’t delete a mail you sent me, nor put your hand written letter to me in the bin. I can keep both and I can keep your name and addresses in my little black book. So there isn’t even that level of privacy in the real old fashioned communication.
And communication over the Internet was always subject to storage. Your mail may be on the backup tape of a mail server. Your usenet posting is archived.
So the assumption that the fediverse can forget….
There’s long dead people’s very private letters and diaries in museums and public archives. Readily available on the internet now. So that’s not even a failing of the internet, if you write something people find interesting, they’ll find a way to preserve it.
I’m not sure how they think the fediverse will be the one to solve that.
its the principle behind the ‘right to be forgotten’
if you posted something to a public forum and changed your mind, deciding it shouldnt be public after all, you should have that option
While this makes sense for corporations - it doesn’t really make sense on the internet. People will archive, take screenshots, etc. Anything that is public on the internet will likely stay on someone’s computer for years no matter how much we try to delete things.
It is kind of naive to think that the right to be forgotten will be respected by anyone other than the service provider.
In order for me to be offended, I’d first have to care about that opinion. I don’t.
Mastodon should just leave us alone!
If you think anything on the Internet can ever be forgotten… You’re going to have a bad time. Passwords, one of the most protected data types, are compiled from breaches into huge databases so that hackers can use them to try to log into websites. There are literally dozens if not hundreds of those password databases on the public Internet to be downloaded, not to mention private or dark web collections. If passwords are not safe, what makes you think publicly available social media would be any different?
Even if somehow the whole federation agreed to purge all posts every year, things like the Internet Archive and Google’s cache of pages would retain the data.
Mastodon’s privacy issues are just the same as the rest of the fediverse/threadiverse.
With federation there is more openness, transparency and accountability. Take care of your privacy, use alts.
The privacy stinks you say? Did you know that Likes and Dislikes are public too? That was the most shocking to me. Because it is very much not like Reddit or others.
It’s still a fantastic piece of software, with all its flaws, though.
It’s impossible to federate these without making them public in this way.
The up-votes are also mapped to favourites in Mastodon etc, so that was always public anyway.
You could argue that this should not be hidden in the Lemmy UI, but there are also good reasons to not highlight that much who voted on a post.
The up-votes are also mapped to favourites in Mastodon
Explains why this obvious issue is not brought up by Mastodon lol
I thought votes didn’t federate yet anyways… but, yes, it is possible, and i can come up off the top of my head with three or four potential implementations.
Good luck with finding an anonymous system that can not be easily abused.
FHE solves that through and through, as has been documented widely, but that’s overengineering when you could just use plain ZKP.
Zero-knowledge voting is here and has been for a while now.
Hey 👋 I know you. Hehe.
And yes, it should not be hidden. It is very much unexpected, because Reddit doesn’t do it, and it’s not visible to normal users.
@elbowmacaroni if instead of linking to the post you had boosted it, would all the replies here appear in beehaw?
i use kbin because I don’t like lemmy’s devs 🙃
bonus points that it actually deletes things
Do you think kbin is just reaching into other servers and pulling the bytes off the disk? You can’t guarantee anything is deleted in a federated system, other servers can just ignore your delete request. So this makes no difference.
And it breaks easily. I still can see several posts on my private instance that have been deleted. The delete command never made it to my server for any number of reasons. As some posts never make it to my instance either. I guess in the long term some kind of delivery queue and guarantee would be nice.
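A delivery queue along those lines could look roughly like the sketch below: keep each outgoing activity until the target acknowledges it, and retry a bounded number of times. The class names, the `send` callable and the `max_attempts` parameter are assumptions for illustration, not how Lemmy or kbin actually deliver activities.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Delivery:
    target: str        # hypothetical instance hostname
    activity: dict     # payload to deliver, e.g. a delete notice
    attempts: int = 0


class Outbox:
    def __init__(self, send, max_attempts=5):
        # `send(target, activity)` returns True once the target acknowledged.
        self.send = send
        self.max_attempts = max_attempts
        self.pending = deque()
        self.dead = []  # deliveries that were given up on

    def enqueue(self, target, activity):
        self.pending.append(Delivery(target, activity))

    def run_once(self):
        """Try every pending delivery once and requeue the failures."""
        for _ in range(len(self.pending)):
            delivery = self.pending.popleft()
            delivery.attempts += 1
            try:
                acknowledged = self.send(delivery.target, delivery.activity)
            except Exception:
                acknowledged = False  # treat network errors as a failed attempt
            if acknowledged:
                continue
            if delivery.attempts >= self.max_attempts:
                self.dead.append(delivery)
            else:
                self.pending.append(delivery)
```

This gives at-least-once delivery of the request at best. It makes a lost delete command less likely, but it says nothing about what the receiving server does with it.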
Surely this and “a federated system of deletion” is something that can be fixed by the open source devs after they’re done with the ungodly amount of work the influx of users creates?
There is a federated system of deletion, it just doesn’t have guarantees to work. The problem is that it’s a distributed system among untrusted actors. Usually the only way to have a distributed, secure, untrusted system is to use a blockchain. In the absence of a blockchain or equivalent, we keep the “distributed” and “untrusted” but concede the “secure”.
This doesn’t just go for deletions. It also means submissions or edits might not affect other servers. Malicious servers could also change all your posts to say “I’m an ugly moron” and those could get propagated to other servers.
This is being naive. Don’t trust a server you don’t run yourself.
Am I missing something, or isn’t it the case that no matter what Lemmy does, all those same problems would still exist, just from the internet archival sites instead? Sure, the privacy could be better to deter some of it, but none of those issues are fully solvable so long as those archival sites run. I guess the media not being deleted is likely the biggest thing you could affect, since archives would be less likely to store it in the first place.
So just to clarify this point:
Anything remains visible on federated servers!
If I delete a comment on beehaw.org, it doesn’t get deleted when accessed from another Lemmy instance that federates with Beehaw?
When you delete it your instance tells others that it was deleted, but it cannot force them to follow through.
It could defederate any non-compliant instances.
How do you know if they are non-compliant without manual verification?
It could, but actually policing it would be difficult. I don’t think there is any “yeah I’ll do that” response, and even if there is, an instance could say it will delete it and still do nothing.
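The exchange above can be sketched in a few lines. The `Delete` activity shape follows the ActivityStreams vocabulary that ActivityPub uses; everything else here (the two instance classes, the `inbox` method, the 202 return value as a generic "accepted" acknowledgement) is a simplified stand-in, not Lemmy's real implementation.

```python
def build_delete(actor, object_id):
    # A Delete activity in ActivityStreams terms.
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Delete",
        "actor": actor,
        "object": object_id,
    }


class HonestInstance:
    """Hypothetical remote server that honours delete requests."""

    def __init__(self):
        self.store = {}

    def receive_post(self, object_id, body):
        self.store[object_id] = body

    def inbox(self, activity):
        if activity["type"] == "Delete":
            self.store.pop(activity["object"], None)
        return 202  # assumed acknowledgement status


class HoardingInstance(HonestInstance):
    """Hypothetical remote server that acknowledges but keeps everything."""

    def inbox(self, activity):
        return 202  # same answer as the honest server, nothing deleted
```

From the sender's side both servers look identical: each answers the delete notice with the same status, and only one of them has actually removed the post.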
You could defederate with instances running versions that don’t delete federated posts. Removing compatibility with older protocol implementations is not unheard of.
while this is certainly feasible, it is just a compliance checkmark of “doing your best”. It wouldn’t actually prevent someone attempting to persist that data. For example, I just need to maintain an insert-only copy of my deletion-compliant lemmy instance DB, and none of the deletions would be reflected on that.
I could then host that copy publicly on some unrelated lemmy instance, and without systematically de-federating from all other instances, you wouldn’t know which one was retaining the data.
Which is indeed a problem as it makes it impossible for any admin to host in the EU or for EU citizens, in theory. GDPR Art. 17 makes it very clear that complete deletion of all personal data (and yes, a Lemmy comment is personal data) must be facilitated by the original data collection point.
it can’t make it impossible. If facebook sold data to amazon, so now amazon has a copy, and then facebook’s user asks their data to be deleted, facebook can’t just march into amazon’s servers and delete the data themselves. The best they can do is send a formal notice to amazon requesting it be deleted, which sounds like what lemmy does. At this point it’s up to the federated server if they comply with the law…
Actually that is exactly what the GDPR stipulates. In your example Facebook needs a data processing agreement that ensures that all rights of the data owners are secured and the GDPR is followed. Facebook is liable here, not Amazon - the user must explicitly NOT ask Amazon to delete as the user may not even know where the data went to/should not be bothered to write requests to a huge amount of different data processing locations.
But, @hikaru755@feddit.de added another interesting point: The instance may or may not be seen as a single data processing entity that does not voluntarily hand over data to other instances. That could indeed be a reasonable argument, as e.g. data scrapers are not within the sphere of influence of e.g. a service publicly displaying data. But as the whole network is built on interconnected nodes, I wouldn’t count on that reasoning flying in front of a court. It may. Or it may not.
In this case though, would it not be that then if Facebook did have a processing agreement with Amazon with which they communicate information, and this agreement stipulates that (in order to comply with GDPR) data they sell to amazon must be deleted upon request, and Amazon does NOT do so, this would make amazon liable for breach of contract instead of facebook being liable for breach of GDPR?
If so, all fediverse instances would need is a copy-paste agreement when two instances federate that data from one must be deleted on the other upon request.
Partially right - Amazon would be liable, but not towards the data owner but Facebook. The data owner sues Facebook, Facebook then sues Amazon.
A copy&paste agreement is the first (and from my point of view most important) step. Personally I would also integrate an automatic mechanism that deletes data (e.g. the delete request gets automatically federated) and globally defederates instances that do not follow it. Sadly this is still not enough - data handling in the US and other jurisdictions with similarly bad privacy laws is also a problem, see the recent Facebook case and Schrems 2. But tbh I have no idea how to solve that.
Lemmy cannot, by definition, obtain full GDPR compliance. We should make sure that best effort is ensured, especially with the right of deletion and the right to “know” (where data is stored), but also consider lobbying towards a reformed law for the federated use cases.
The originating instance definitely cannot be held responsible for failing to force a separate instance in another country to delete its cached copy of user data imo. I think what is more likely is that EU courts could force European Lemmy instances to only federate with GDPR-compliant instances.
This is incorrect if the data transfer was done voluntarily/planned. This also applies to EU data outside the EU - Meta has been fined 1.2 billion euros for that.
And no, the definitive definition of the data transfer extent is a key point of the GDPR. Each and every data owner has the right to know where their data is stored exactly. So a “EU only” would not be enough - It is basically already mandatory as transfer to other countries is a major problem after Schrems 2.
From what I understand instance 1 has to delete data if requested, but instance 2 has no obligation to unless requested. Just like data remains archived in sites like internet archive or other private archives. Just like it works on reddit or any other site currently.
Yes and no. The Web Archive and other data scrapers are seen differently here as the data collection is done involuntarily. E.g. your website will get crawled by the Web Archive whether you want it or not, and it does so using the same method an intended user does.
This cannot be applied to a federated instance where content is voluntarily transferred via the federation interface. This makes the first data collection operator liable for securing the rights of the data owner and for getting a processing agreement with the data processing operator that it transfers data to.
I don’t think it’s quite that bad/simple. Viewing your main instance as the Controller and other instances as Processors in GDPR terms won’t work, because instances don’t have the necessary control over each other for that, as you say.
However, you could circumvent that issue by making the case that each instance actually acts as an independent Controller. By participating on a federated service, you are explicitly agreeing to the data you provide (your profile, posts, comments, etc.) being made public and shared with other compatible services. That should be enough as the basis for other instances to reasonably assume you want your data to be processed by them, which (I think, not a lawyer) is sufficient justification for processing the data independently, as long as it’s in line with how you generally expect the fediverse to work.
This would mean that each federated instance is its own, independent entity that processes your data, and to make use of your rights under GDPR, you need to do that with each of them individually. They effectively become their own “original data collection point”, in your words, even if that data collection was not explicitly triggered by you.
The only thing missing for that to be legal (again, in my layman’s view) is transparency about who’s processing your data and how, which is necessary under GDPR. Every instance that receives your data via federation would need to let you know about that, and make available to you information on how exactly your data is processed and how you can make use of your rights under GDPR with them. That, in turn, would probably be easiest if the protocol spoken between fediverse servers were extended with automated and standardized ways to propagate GDPR requests from your home instance to any other instance that is processing your data, so that you don’t have to actually deal with every single server yourself to get your rights enacted. Defederation in the meantime might be a problem, but there’s ways around that, too.
Will write a longer post later, mobile killed my post three times by now… Anyway: doesn’t work that way, GDPR stipulates that consent must be defined and limited. “Unlimited” carte blanche consent is not possible. And the initial data processing facility is still liable for the agreement.