• elgordino@fedia.io
    6 months ago

    The TikTok spider has been a real offender for me. On one site I host, it burned through 3TB of data over 2 months, requesting the same 500 images over and over. It was ignoring robots.txt too, so I ended up having to block its user agent.
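For anyone wanting to do the same at the application level, here is a minimal sketch of blocking by user agent, written as WSGI middleware. The token "Bytespider" is only an assumed example of what such a crawler might send; check your own access logs for the exact string. `is_blocked` and `block_user_agents` are hypothetical names, not part of any library.

```python
def is_blocked(user_agent, blocked_tokens=("Bytespider",)):
    """Return True if the User-Agent contains any blocked token (case-insensitive).

    "Bytespider" is an assumed example token; replace it with whatever
    string actually shows up in your access logs.
    """
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in blocked_tokens)


def block_user_agents(app, blocked_tokens=("Bytespider",)):
    """Wrap a WSGI app so that blocked user agents get a 403 response."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT"), blocked_tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Any other client is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

Matching on a substring is easy to evade if the crawler changes its user agent, so this is a stopgap rather than a guarantee. The same check can usually be done in the web server itself, which avoids running application code for blocked requests.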

    • dan@upvote.au
      6 months ago

      Are you sure the caching headers your server is sending for those images are correct? If your server is telling the client to not cache the images, it’ll hit the URL again every time.

      If the image at a particular URL will never change (for example, if your build system inserts a hash into the file name), you can use a far-future expires header to tell clients to cache it indefinitely (e.g. expires max in Nginx).
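To make the two points above concrete, here is a rough sketch in Python. It assumes hashed file names look like "logo.3f9a1c2b.png" (the `HASHED_NAME` pattern is a hypothetical convention, so adjust it to your build system), and the function names are made up for illustration. The first function checks whether a set of response headers gives clients an explicit freshness lifetime; the second picks a far-future `Cache-Control` value for hashed files only.

```python
import re

# Assumption: a content hash of 8+ lowercase hex characters sits just
# before the file extension, e.g. "logo.3f9a1c2b.png".
HASHED_NAME = re.compile(r"\.[0-9a-f]{8,}\.[A-Za-z0-9]+$")

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def has_explicit_freshness(headers):
    """Return True if the headers tell clients they may reuse the response."""
    normalized = {k.lower(): v for k, v in headers.items()}
    directives = parse_cache_control(normalized.get("cache-control", ""))
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is not None:
        return max_age.isdigit() and int(max_age) > 0
    # Without max-age, fall back to the presence of an Expires header.
    return "expires" in normalized


def cache_headers_for(path):
    """Choose Cache-Control: far-future for hashed names, revalidate otherwise."""
    if HASHED_NAME.search(path):
        return {"Cache-Control": f"public, max-age={ONE_YEAR}, immutable"}
    return {"Cache-Control": "no-cache"}
```

For example, `has_explicit_freshness({})` is false, which corresponds to a server sending no cache headers at all, while `cache_headers_for("/img/logo.3f9a1c2b.png")` returns a one-year lifetime. A long lifetime is only safe when the URL changes whenever the content does, which is why the sketch restricts it to hashed names.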

      • elgordino@fedia.io
        6 months ago

        Thanks for the suggestion. It turns out there are no cache headers on these images. They indeed never change, so I’ll try that update. Thanks again.