Hi,

I know this is nearly impossible to diagnose from afar, but I came across a post from the lemmy.world admins about the attacks they are facing, in which the database gets overwhelmed and the server stops responding. Something similar seems to have happened to my own servers.

I’m running my own self-hosted Lemmy and Mastodon instances (on 2 separate VPSes), and both became completely unresponsive yesterday. Mastodon and Lemmy both showed the “there is an internal/database error” message, and my other services (Nextcloud and Synapse) didn’t load or respond.

Logging into my VPS console showed me that both servers had been running at 100% CPU load for a couple of hours. I can’t currently SSH into these servers, as I’m away for a couple of days and forgot to bring my private SSH key on my laptop. So, for now I just switched the servers off.

Anyway, the main question is: what should I look at when troubleshooting once I’m back home? I’m a beginner in self-hosting, I run these instances just for myself, and I don’t mind if I have to roll them back a couple of days (I have backups). But I would like to learn from this and get better at running my own services.

For reference: I run everything in docker containers behind Nginx Proxy Manager as my reverse proxy. I have only ports 80, 443 and 22 open to the outside. I have fail2ban set up. The Mastodon and Lemmy instances are not open for registration and just have 2 users each (admin + my account).

  • GameGod@lemmy.ca
    1 year ago

    I don’t see anyone else actually telling you how to figure out if you’re being DoSed, so I’ll start:

    Check your logs. Look at what process is eating your CPU in htop and then look at the logs for that process. If it’s a web application, that means the error and access logs for it. If you see a flood of requests to a single URL, or some other suspicious pattern in the log, then you can try blocking the IPs associated with them temporarily and see if it alleviates the load. Repeat until the load goes down.

    If your application uses a database, check your database logs too. Postgres can log every query that takes longer than a threshold you set with log_min_duration_statement (it is disabled by default, so you may have to turn it on first), which makes it easy to spot a slow query, especially during a time of high load.

    I don’t think DNS amplification attacks over UDP are likely to be a problem as I think most cloud providers filter traffic with forged src addresses (correct me if I’m wrong). You can also try blocking all inbound UDP traffic if you suspect a UDP flood but this will likely break DNS lookups for you temporarily. (your machine should not have any open UDP ports in any case though if you’re just running Lemmy).

    If you want to go next level, you can use “perf” to generate a system-wide profile and flamegraph, which will show you where you’re burning CPU cycles. This can be extremely useful for troubleshooting performance or optimizing applications. (You’ll find that even IP filtering takes CPU power, which is why most DDoS protection happens on dedicated hardware upstream.)
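As a rough illustration of the log check described in the comment above, here is a minimal Python sketch that counts requests per client IP and per URL path. It assumes the access log uses the common/combined format that nginx writes by default; Nginx Proxy Manager may use its own format, in which case the regular expression needs adjusting. The function name `top_talkers` is made up for this example.

```python
import re
from collections import Counter

# Assumed format (common/combined access log):
#   client_ip ident user [timestamp] "METHOD path PROTOCOL" status bytes ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"')


def top_talkers(lines, n=5):
    """Return (busiest client IPs, most requested paths), each as (value, count) pairs."""
    ips = Counter()
    paths = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        ip, _method, path = match.groups()
        ips[ip] += 1
        paths[path] += 1
    return ips.most_common(n), paths.most_common(n)
```

To use it, pass any iterable of log lines, for example an open file handle for the proxy's access log. A single IP or a single path that dwarfs everything else is the kind of pattern worth a closer look, though a high count alone does not prove an attack.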

  • Anafroj@sh.itjust.works
    1 year ago

    The best way to know whether it was an attack is to inspect the logs when you have time. Plenty of things can make a process go wild without any attack being involved. Sometimes even filling up the RAM can make the CPU appear overloaded (and will freeze the system anyway). One simple way to figure out if it’s an attack: reboot. If it’s a bug, everything will go back to normal. If it’s a DDoS, the problem will reappear within a few minutes of the reboot. If it’s a simple DoS (someone exploiting a bug in a piece of software to overload it), it may or may not reappear, depending on whether the exploit was automated and recurring or just a one-shot.

    The fact that both of your machines went down at the same time makes me think it’s an attack. On the other hand, it may just be a surge of activity on the network hitting VPSes that don’t have nearly enough resources to handle it. Or it may even be a noisy neighbor problem (other customers who share the physical hardware your VPSes run on are overloading it).
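To tell memory exhaustion apart from a genuine CPU problem, as the comment above suggests, one option is to look at how much RAM and swap are in use while the problem is happening (a reboot clears that state). The sketch below parses text in the format of Linux's `/proc/meminfo`; it assumes a Linux host whose kernel reports `MemAvailable`, and the function names are made up for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_pressure(text):
    """Return the fraction of RAM in use (0.0 to 1.0), based on MemAvailable."""
    info = parse_meminfo(text)
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


def swap_used_kb(text):
    """Return how much swap is in use, in kB (0 if the host has no swap)."""
    info = parse_meminfo(text)
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)
```

On the server itself this could be called as `memory_pressure(open("/proc/meminfo").read())`. A value close to 1.0 together with heavy swap use would point towards the RAM explanation rather than an attack, but where to draw the line is a judgment call.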

  • RonnyZittledong@lemmy.world
    1 year ago

    Lemmy has the disadvantage of being open source. In the long run this can be good for security, but in the short term it gives your enemies a blueprint of your software, so they know exactly how to attack you.

    The only time I have ever been compromised was when I was running 3rd party code open to the internet. I have been running my own code open to the internet for 20+ years and have been safe with it. I don’t think I am some kind of god coder or anything, but I am mindful of best practices and, most importantly, I am a small fish in a big pond.

    Long story short: running popular 3rd party code open to the internet exposes you to unique threats that you should be prepared for. Subnet/VLAN it, VPN it, lock it down.

  • Decronym@lemmy.decronym.xyz
    1 year ago

    Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

    Fewer Letters   More Letters
    IP              Internet Protocol
    SSH             Secure Shell for remote terminal access
    UDP             User Datagram Protocol, for real-time communications
    VPN             Virtual Private Network
    VPS             Virtual Private Server (opposed to shared hosting)

    5 acronyms in this thread; the most compressed thread commented on today has 6 acronyms.

    [Thread #29 for this sub, first seen 12th Aug 2023, 08:45]

  • ErwinLottemann@feddit.de
    1 year ago

    Don’t copy your private key to your laptop, generate a new one and add its public key to the same user. Also use a different key for every remote host. (or don’t, but that’s kind of like using the same password for all your accounts)

  • lungdart@lemmy.ca
    1 year ago

    Sounds like you were out of resources. That is the goal of a DoS attack, but you’d need connection logs to detect if that was the case.

    DDoS attacks are very tricky to defend against. (Source: I work in DDoS defence.) There are two parts to defence: detection and mitigation.

    Detection is the easy part: just look at the packets. A very common DDoS attack abuses UDP services that answer a small request with a much bigger response. The attacker spoofs the source IP of the request so that it appears to come from the target, and the amplified response lands there. So a large amount of traffic is likely an attack, and out-of-band UDP traffic is likely an attack. A large amount of in-band traffic could be an attack too.

    Mitigation is trickier. You need something that can handle a massive amount of packet inspection and black holing. That’s done on serious hardware. A script kiddie can buy a 20 Gbps/1 Mpps attack with their mom’s credit card very easily.

    Your defence options are a little limited. If your cloud provider has a WAF, use it. You may be able to get rules that block common botnets. Cloudflare is another decent option: they’ll man-in-the-middle your services and run detection and mitigation on all traffic. They also have a decent WAF.

    Best of luck!
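The amplification idea and the "out-of-band UDP" heuristic from the comment above can be put into numbers. This is only a sketch with made-up figures and hypothetical function names; the 10% threshold is an arbitrary assumption, not a value any detection product is known to use.

```python
def amplification_factor(request_bytes, response_bytes):
    """How many bytes reach the victim for each byte the attacker sends."""
    return response_bytes / request_bytes


def victim_traffic_bps(attacker_bps, factor):
    """Traffic arriving at the spoofed (victim) address, in bits per second."""
    return attacker_bps * factor


def udp_share(bytes_by_protocol):
    """Fraction of inbound bytes that arrived over UDP (0.0 if no traffic)."""
    total = sum(bytes_by_protocol.values())
    if total == 0:
        return 0.0
    return bytes_by_protocol.get("udp", 0) / total


def looks_like_udp_flood(bytes_by_protocol, threshold=0.10):
    """Flag a host that only serves TCP but sees a large share of UDP bytes."""
    return udp_share(bytes_by_protocol) > threshold
```

For example, if a 60-byte request triggers a 3000-byte response, `amplification_factor(60, 3000)` gives 50, so an attacker sending 100 Mbit/s of spoofed requests would direct about 5 Gbit/s at the target. For a server that only exposes TCP ports 80, 443 and 22, almost any sustained inbound UDP is out of band, which is why the UDP share is a useful first signal.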

    • Excel@lemmy.megumin.org
      1 year ago

      I’ve heard that enabling Cloudflare DDoS protection on Lemmy breaks federation due to the amount of ActivityPub traffic.

  • Dave@lemmy.nz
    1 year ago

    I run a Lemmy server. If you ban a bot and remove its content (even if the bot is from another instance), and that means removing more than a few comments, the thing will lock up, the server will error, and you’ll pretty much have to restart it. This could also cause other services to become unresponsive, as the CPU will be sitting at 100% for that thread.

    If you think it’s genuinely a DDoS (which is unlikely if you’re a small fry, but possible), then try putting Cloudflare in front of your service (it’s free), which will mitigate many types of DoS attacks.