Look, we can debate the proper and private way to do Captchas all day, but if we remove the existing implementation we will be plunged into a world of hurt.

I run tucson.social, a tiny instance with barely any users, and I find myself really ticked off at other admins’ abdication of duty when it comes to engaging with the developers.

For all the Fediverse discussion on this, where are the GitHub issue comments? Where is our attempt to convince the devs on this?

No, seriously WHERE ARE THEY?

Oh, you think the mere existence of an “Issue” to bring back Captchas is the best you can do?

NO, it is not the best we can do. We need to be applying some pressure to the developers here, and that requires EVERYONE to do their part.

The devs can’t make Lemmy an awesome place for us if we admins refuse to meaningfully engage with the project and provide feedback on crucial things like this.

So are you an admin? If so, we need more comments here: https://github.com/LemmyNet/lemmy/issues/3200

We need to make it VERY clear that Captcha is required before v0.18’s release. Not after, when we’ll all be scrambling…

EDIT: To be clear I’m talking to all instance admins, not just Beehaw’s.

UPDATE: Our voices were heard! https://github.com/LemmyNet/lemmy/issues/3200#issuecomment-1600505757

The important part was the decision to re-implement the old (if imperfect) solution in time for the upcoming release. mCaptcha and better tech are indeed the better solution, but at least we won’t make ourselves more vulnerable at this critical juncture.

  • HTTP_404_NotFound@lemmyonline.com
    English
    arrow-up
    10
    ·
    1 year ago

    Hunh.

    I just had a surge of user registrations on my instance.

    All passed the captcha. All passed the email validation.

    All had a valid-sounding response.

    I am curious to know if they are actual users, or… if I just became the host of a spam instance. :-/

    There doesn’t appear to be an easy way to tell.

    • marauderprophecy1998@beehaw.org
      English
      arrow-up
      1
      ·
      1 year ago

      I think what you can do is take a small subset of the users that have registered on your instance and observe their behavior. If you notice a lot of them acting in bad faith, then it’s likely that many of the registrations on your instance are bots. How active are the users on your instance in terms of posting and commenting?

        • marauderprophecy1998@beehaw.org
          English
          arrow-up
          1
          ·
          edit-2
          1 year ago

          I mean, for now it seems okay. I took the liberty of checking out your instance and it seems okay to me too, but still keep an eye out for bad actors.

          • HTTP_404_NotFound@lemmyonline.com
            English
            arrow-up
            1
            ·
            1 year ago

            My current assumption, based on the data I dug up, is that it’s legit traffic originating from reddit.

            I just don’t think the users realize their accounts were approved… perhaps. /shrugs.

            Unexpected wave of traffic I suppose.

    • th3raid0r@tucson.socialOP
      English
      arrow-up
      9
      ·
      1 year ago

      Hmmm, I’d check the following:

      1. Do the emails follow a pattern? (randomuser####@commondomain.com)
      2. Did the emails actually validate, or do you just not see bouncebacks? There is a DB field for this that admins can query (I’ll dig it up after I make this high-level post)
      3. Did the surge come from the same IP? Multiple? Did it use something that doesn’t look like a browser?
      4. Did the surge traffic hit /signup or did it hit /api/v3/register exclusively?

      With those answers I should be able to tell if it’s the same or similar attacker getting more sophisticated.

      Some patterns I noticed in the attacks I’ve received:

      1. It’s exactly 9 attempts every 30 minutes from the user agent “python/requests”
      2. The users that did not get an email bounceback were still not authenticated hours later (maybe the attacker lucked out with a real email that didn’t bounce back?). There was no effort to verify from what I could determine.
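
      For anyone who wants to check their own logs for the same pattern, here is a rough sketch. It assumes your reverse proxy writes the common nginx/Apache “combined” log format, and the list of script-like user agent hints is just my guess at a heuristic, not anything Lemmy ships. It counts POSTs to /api/v3/register per IP and user agent:

```python
import re
from collections import Counter

# Assumption: nginx/Apache "combined" log format. Adjust the pattern for your proxy.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

REGISTER_PATH = "/api/v3/register"

# Hypothetical heuristic: substrings that suggest a script rather than a browser.
SCRIPT_AGENT_HINTS = ("python", "curl", "go-http-client")


def summarize_signups(lines):
    """Count POSTs to the register endpoint per (ip, user agent) pair."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        if m["method"] != "POST" or not m["path"].startswith(REGISTER_PATH):
            continue
        hits[(m["ip"], m["agent"])] += 1
    return hits


def looks_scripted(agent):
    """Return True if the user agent contains one of the script-like hints."""
    agent = agent.lower()
    return any(hint in agent for hint in SCRIPT_AGENT_HINTS)
```

      Feed it the lines of your access log and sort the result by count. A single IP with a scripted agent and no matching hits on /signup would look a lot like what I saw.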

      Some vulnerabilities I know that can be exploited and would expect to see next:

      1. ChatGPT is human enough sounding for the registration forms. I’ve got no idea why folks think this is the end-all solution when it could be faked just as easily.
      2. Duplicate Email conflicts can be bypassed by using a “+category” in your email, e.g. (someuser+lemmy@somedomain.com). This would allow someone to associate potentially hundreds of spam accounts with a single email.

      • idealium@beehaw.org
        English
        arrow-up
        3
        ·
        1 year ago

        ChatGPT is human enough sounding for the registration forms. I’ve got no idea why folks think this is the end-all solution when it could be faked just as easily.

        A simple deterrent for this could be to “hide” some information in the rules and request that information in the registration form. Not only are you ensuring that your users have at least skimmed the rules, you’re also raising the bar of difficulty for spammers using LLMs to generate human-sounding applications for your instance. Granted, it’s only a minor deterrent and does nothing if the adversary is highly motivated, but then again the same can be said of a lot of anti-spammer solutions. :)

      • TehPers@beehaw.org
        English
        arrow-up
        4
        ·
        1 year ago

        ChatGPT is human enough sounding for the registration forms. I’ve got no idea why folks think this is the end-all solution when it could be faked just as easily.

        I think it would be interesting if we could find a prompt that doesn’t work well with LLMs. Originally they struggled with math for example, but I wonder if it’d be possible to make a math problem that’s simple enough for most humans to solve but which trips up LLMs into outputting garbage.

        Duplicate Email conflicts can be bypassed by using a “+category” in your email.

        I personally use this to track who sends my email address where, since people usually don’t strip this from the address. It’s definitely abusable, but also has legitimate uses.

        • th3raid0r@tucson.socialOP
          English
          arrow-up
          3
          ·
          1 year ago

          Not so sure on the LLM front, GPT4+Wolfram+Bing plugins seem to be a doozy of a combo. If anything, there should perhaps be a couple of interactive elements on the screen that need to be interacted with in a dynamic order that’s newly generated for each signup. Like perhaps “Select the bubble closest to the bottom of the page before clicking submit” on one signup and “Check the box that’s the furthest to the right before clicking submit” on another?

          Just spitballin it there.
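
          Very roughly, the server side of that could look like the sketch below. Everything in it is made up for illustration (the element ids, the challenge pool, the function names); nothing like this exists in Lemmy today. The point is only that the expected answer is picked per signup and never sent to the client:

```python
import secrets

# Hypothetical challenge pool: (instruction shown to the user,
# id of the page element that must be clicked before submitting).
CHALLENGES = [
    ("Select the bubble closest to the bottom of the page before clicking submit",
     "bubble-bottom"),
    ("Check the box that's the furthest to the right before clicking submit",
     "box-right"),
]


def new_challenge():
    """Pick a random challenge for one signup; keep 'expected' server-side only."""
    instruction, expected = secrets.choice(CHALLENGES)
    return {
        "token": secrets.token_urlsafe(16),  # ties the form submission to this challenge
        "instruction": instruction,
        "expected": expected,
    }


def verify(challenge, submitted_token, clicked_element_id):
    """Accept the signup only if the token matches and the right element was clicked."""
    return (
        secrets.compare_digest(challenge["token"], submitted_token)
        and clicked_element_id == challenge["expected"]
    )
```

          A real version would need a much bigger pool and randomized page layout, otherwise a bot just memorizes the two answers.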

          As for the category on email address - certainly not suggesting they remove supporting it, buuuuutttt if we’re all about making sure 1 user = 1 email address, then perhaps we should make the duplication check a bit more robust to account for these types of emails. After all someuser+lemmy@somedomain.com is the same as someuser@somedomain.com but the validation doesn’t see that. Maybe it should?
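
          Something like this is what I have in mind for the duplicate check. It is only a sketch: `canonical_email` is a hypothetical helper, not Lemmy’s code, and it assumes every provider treats “+category” as an alias and ignores case in the local part, which is common but not guaranteed:

```python
def canonical_email(address):
    """Collapse plus-addressed variants to one key for duplicate detection.

    Assumption: the provider treats everything after '+' in the local part
    as a tag and is case-insensitive. Store the original address for mail
    delivery and compare only on this canonical form.
    """
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    base = local.split("+", 1)[0] or local  # keep odd addresses like '+tag@x' intact
    return f"{base.lower()}@{domain.lower()}"
```

          With that, someuser+lemmy@somedomain.com and someuser@somedomain.com map to the same key, while people can still sign up with the “+category” form.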

          • TehPers@beehaw.org
            English
            arrow-up
            5
            ·
            1 year ago

            I like your idea of interaction-based authentication. Extra care would need to go into making sure it’s accessible, but otherwise I think that would be a stronger challenge for LLMs to solve. (Keep in mind LLMs can still receive the page’s HTML as context, but it seems like it could still present a stronger challenge.)

            perhaps we should make the duplication check a bit more robust to account for these types of emails

            This makes sense to me. I could be wrong, but the assumption of 1 email = 1 user doesn’t seem unreasonable, especially since there’s no cost to making a new email address.

        • Katzastrophe@feddit.de
          English
          arrow-up
          1
          ·
          1 year ago

          When it comes to LLMs we could use questions which they refuse to answer.

          Obviously ‘How to build a pipe bomb’ is out of the question, but something like ‘What’s your favorite weapon of mass destruction?’, or ‘If you’d need to hide a body, how would you do it?’ might be viable

      • HTTP_404_NotFound@lemmyonline.com
        English
        arrow-up
        3
        ·
        1 year ago

        1. Different providers, no pattern. Some gmail, some other.
        2. Not sure
        3. Also- not sure.
        4. Not sure of that either!

        But here is the interesting part: other than a few people I have personally invited, I don’t think anyone else has ever requested to join.

        Then, out of the blue, boom, a ton of requests. And then nothing followed.

        The responses sounded human enough: spez bad, reddit sinking, etc.

        But the traffic itself didn’t follow… what I would expect from social media spreading. /shrugs.

        • Wahots@pawb.social
          English
          arrow-up
          4
          ·
          1 year ago

          Curious if you got a mention somewhere on reddit. It used to happen to our novelty sub whenever a thread blew up and suddenly thousands of eyes were on a single comment with the subreddit link.

        • th3raid0r@tucson.socialOP
          English
          arrow-up
          3
          ·
          1 year ago

          Huh, that is interesting, yeah, that pattern is very anomalous. If you have DB access you can try to run this query to return all un-verified users and see if you can identify if the email activations are being completed:

          SELECT p.id, p.name, l.email FROM person AS p LEFT JOIN local_user AS l ON p.id=l.person_id WHERE p.local=true AND p.banned=false AND l.email_verified='f'

          • HTTP_404_NotFound@lemmyonline.com
            English
            arrow-up
            1
            ·
            1 year ago

            Only 7 accounts are still pending, 2 of which are unrelated to the above flood.

            The email addresses are left out for privacy. However, they are EXTREMELY normal-sounding email addresses.

            Based on the provided emails, usernames, and request messages, I’d say it certainly looks like legit users.

            Just very odd timing.

            • th3raid0r@tucson.socialOP
              English
              arrow-up
              1
              ·
              1 year ago

              7 huh? That’s actually notable. So far I haven’t seen a real human user take longer than a couple of hours to validate. Human registrations on my instance seem to have a 30% attrition. That is, of 10 real human users, I can reasonably expect that 3 won’t complete the flow. It seems like your case might be nearing 40-50%, which isn’t unheard of, but coupled with how quickly these accounts were created, I think you are looking at bots.

              The kicker is, though, if one of them IS a real user, it’s going to be almost impossible to find out.

              This is indeed getting more sophisticated.

              I wish I could see this time period on a cloudflare security dashboard, I’m sure there could be a few more indicators there.

              • HTTP_404_NotFound@lemmyonline.com
                English
                arrow-up
                0
                ·
                1 year ago

                cloudflare security dashboard

                Didn’t really see anything that stood out there either. A handful of users accessing via tor, but that’s about it.

                Ended up turning the security policy back up a bit from low, though. I forgot I had turned it down while troubleshooting some federation issues.