I’m excited to announce the first alpha preview of this project that I’ve been working on for the past 4 months. I’m initially posting about this in a few small communities, and hoping to get some input from early adopters and beta testers.
What is a DHT crawler?
The DHT crawler is Bitmagnet’s killer feature that (currently) makes it unique. Well, almost unique, read on…
So what is it? You might be aware that you can enable DHT in your BitTorrent client, and that this allows you to find peers who are announcing a torrent’s hash to a Distributed Hash Table (DHT), rather than to a centralized tracker. DHT’s lesser-known feature is that it allows you to crawl the info hashes it knows about. This is how Bitmagnet’s DHT crawler works - it crawls the DHT network, requesting metadata about each info hash it discovers. It then further enriches this metadata by attempting to classify it and associate it with known pieces of content, such as movies and TV shows. Finally, it allows you to search everything it has indexed.
This means that Bitmagnet is not reliant on any external trackers or torrent indexers. It’s a self-contained, self-hosted torrent indexer, connected via the DHT to a global network of peers and constantly discovering new content.
The DHT crawler is not quite unique to Bitmagnet; another open-source project, magnetico, was first (as far as I know) to implement a usable DHT crawler, and was a crucial reference point for implementing this feature. However, that project is no longer maintained, and it does not provide the other features, such as content classification and integration with other software in the ecosystem, that greatly improve usability.
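To make the crawling idea more concrete, here is a minimal sketch of the kind of message a crawler could send to a DHT node. It assumes the `sample_infohashes` query from BEP 51 as the discovery mechanism, which is one known way to ask a node for info hashes; whether Bitmagnet uses exactly this query is an assumption, and the helper names are hypothetical. The sketch only builds and decodes messages; sending them over UDP is left out.

```python
def bencode(value):
    """Encode ints, strings/bytes, lists and dicts using bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def sample_infohashes_query(node_id: bytes, target: bytes, txn_id: bytes = b"aa") -> bytes:
    """Build a KRPC query asking a node for a sample of info hashes (BEP 51)."""
    if len(node_id) != 20 or len(target) != 20:
        raise ValueError("node_id and target must both be 20 bytes")
    return bencode({
        "t": txn_id,               # transaction ID, echoed back in the reply
        "y": "q",                  # message type: query
        "q": "sample_infohashes",
        "a": {"id": node_id, "target": target},
    })


def split_samples(samples: bytes) -> list[str]:
    """Split the concatenated 'samples' field of a reply into hex info hashes."""
    return [samples[i:i + 20].hex() for i in range(0, len(samples), 20)]
```

A crawler along these lines would send the query to many nodes, collect the returned info hashes, and then request the metadata for each one from peers in the swarm.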
Currently implemented features of Bitmagnet:
- A DHT crawler
- A generic BitTorrent indexer: Bitmagnet can index torrents from any source, not only the DHT network - currently this is only possible via the /import endpoint; more user-friendly methods are in the pipeline
- A content classifier that can currently identify movie and television content, along with key related attributes such as language, resolution, source (BluRay, webrip etc.) and enriches this with data from The Movie Database
- An import facility for ingesting torrents from any source, for example the RARBG backup
- A torrent search engine
- A GraphQL API: currently this provides a single search query; there is also an embedded GraphQL playground at /graphql
- A web user interface implemented in Angular: currently this is a simple single-page application providing a user interface for search queries via the GraphQL API
- A Torznab-compatible endpoint for integration with the Servarr stack
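As a rough illustration of what the content classifier in the list above has to do, here is a minimal sketch that pulls a title, year, episode, resolution and source out of a torrent name with regular expressions. This is not Bitmagnet’s actual classifier; the patterns, the `classify` function and the output keys are all assumptions made for the example, and real names are far messier than this handles.

```python
import re

# Illustrative patterns only; a real classifier needs many more cases.
RESOLUTION = re.compile(r"\b(2160p|1080p|720p|480p)\b", re.IGNORECASE)
SOURCE = re.compile(r"\b(blu-?ray|web-?rip|web-?dl|hdtv|dvdrip)\b", re.IGNORECASE)
YEAR = re.compile(r"\b(19\d{2}|20\d{2})\b")
EPISODE = re.compile(r"\bS(\d{1,2})E(\d{1,2})\b", re.IGNORECASE)


def classify(name: str) -> dict:
    """Guess content attributes from a torrent name."""
    # Release names usually use dots or underscores instead of spaces.
    text = re.sub(r"[._]+", " ", name).strip()
    episode = EPISODE.search(text)
    year = YEAR.search(text)
    resolution = RESOLUTION.search(text)
    source = SOURCE.search(text)

    # Treat everything before the first episode or year marker as the title.
    marker = episode or year
    title = text[:marker.start()].strip(" -(") if marker else None

    if episode:
        content_type = "tv_show"
    elif year:
        content_type = "movie"
    else:
        content_type = "unknown"

    return {
        "content_type": content_type,
        "title": title,
        "year": int(year.group(1)) if year else None,
        "season": int(episode.group(1)) if episode else None,
        "episode": int(episode.group(2)) if episode else None,
        "resolution": resolution.group(1).lower() if resolution else None,
        "source": source.group(1).lower() if source else None,
    }
```

For example, `classify("Big.Buck.Bunny.2008.1080p.BluRay.x264")` yields a movie titled "Big Buck Bunny" from 2008 at 1080p. The title and year could then be used to look the item up in an external database such as The Movie Database.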
Interested?
If this project interests you then I’d really appreciate your input:
- How did you get along with following the documentation and installation instructions? Were there any pain points?
- There’s a roadmap of high-priority features on the website - what do you see as the highest priority for near-term development?
- If you’re a developer, are you interested in contributing to the project?
Thanks for your attention. If you’re interested in this project and would like to help it gain momentum then please give it a star on GitHub, and expect further updates soon!
Sounds interesting 😀 I’ll keep an eye on it, though I won’t be a primary user, I switched to usenet about a decade ago and only use torrents as a last resort.
Very cool!
Great project!
Naming conventions are missing some important information like bitrate, color depth, and most importantly language and subtitles.
Do you plan to scrape additional info from known torrent sites (searching for torrent hashes for well-named torrents)?
Scraping torrent sites will be avoided as it’ll be prohibitively slow and break the self-sufficiency concept - we’ll infer as much as possible from the torrent meta info alone. You could have a guess at the bitrate from the file sizes. Sonarr/Radarr will already do this for you with quality profiles, I think.
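The bitrate guess is simple arithmetic, sketched below. It assumes you know the runtime from somewhere else (for example a movie database entry), since the torrent meta info only gives file sizes; the function name is hypothetical, and the result is an average over video, audio and any extras in the file.

```python
def estimate_bitrate_kbps(total_bytes: int, runtime_minutes: float) -> float:
    """Estimate the average bitrate in kilobits per second."""
    if runtime_minutes <= 0:
        raise ValueError("runtime_minutes must be positive")
    total_bits = total_bytes * 8
    runtime_seconds = runtime_minutes * 60
    return total_bits / runtime_seconds / 1000


# A 4.5 GB file with a 120 minute runtime averages about 5000 kbps.
print(estimate_bitrate_kbps(4_500_000_000, 120))
```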
Very nice. This gets rid of any questionably legal gray area of using sites like Nyaa, etc for Torrent links. Also provides a bit of robustness against censorship when those sites get taken down. Looks like I’m gonna have to set up proxmox on a machine this weekend, as Windows sucks dick for docker containers and that’s what I’ve got most of my *arr stuff hosted on currently.
It’ll be a good thing anyways, as most of those instances aren’t running through my VPN yet and I should just centralize them on proxmox and run all the torrents, etc through containerized instances for security.
This gets rid of any questionably legal gray area of using sites like Nyaa, etc for Torrent links
Except that now you’re asking the swarm for metadata behind a boatload of info_hashes? Unlikely anyone would care (though you’d be surprised how many DMCAs I get when just having a simple open tracker running, not even an indexer), but I don’t see it as being any less grey than using any existing sites.
In some jurisdictions hosting links to pirated content is considered illegal. In others it is not. You are now not hosting publicly available links. Many of those rulings were based on the publicly available nature and that you were providing OTHER people with the information. You are now simply obtaining the whole of the DHT yourself. You can’t be assumed to be doing anything illegal with it, because it’s everything. You could be doing research on swarms of computers, you could be looking for a linux torrent…the act of collecting ALL of the data yourself, doesn’t violate the laws in the way they were ruled on.
Additionally some sites have been MITMd so that they saw when people were browsing…say…“Barbie Movie”…and then they watched the DHT for a client connecting soon after, and could connect them to users with VPNs because people are browsing these sites not behind a VPN, but torrenting behind a VPN when they torrent.
Browsing something like Nyaa isn’t technically illegal - but people have been targeted over it. When you don’t have to browse Nyaa using a web browser, you bypass that whole shebang.
You are now not hosting publicly available links
That’s also the case with open trackers (without indexers), yet I’ve gotten shut down way too many times. But that made me wonder, does this project share metadata if someone else in the DHT swarm queries for an info_hash you have, or does it simply “leech”? Pretty cool project regardless.
I use magnetico and have no need for the bells and whistles, but that seems really interesting!
Being relatively new to the self hosted experience and still working through how everything in an arr setup interacts, along with what issues can occur and how to troubleshoot/fix them, this sounds incredibly useful. I’ll definitely be looking into integrating this into my own setup and providing feedback when I can!
Does it infinitely crawl, storing all metadata about every torrent it finds forever?
Yep
This sounds awesome, I’ll give it a try! Would this work in i2p?
I’ve never used I2P but I don’t see why not!
That was easy to set up. I’ll let it churn for a few days and we’ll see how helpful it is. Thanks for the app.
Dude this is amazing! Exactly the sort of thing I’ve been hoping would pop up to further “decentralize” the torrent search experience.
So I’m trying to run it on my machine through the docker-compose option, and I’m seeing something weird. It shows as successfully running, but when I go to the port it should be running on, I get “unable to connect” on my browser.
When I check my containers running, it shows the 3 bitmagnet containers, but the port doesn’t show.
Hi, the default port is 3333, which should be exposed if you’re using the example configuration here: https://bitmagnet.io/setup/installation.html - I’m not sure what the app is in your screenshot but the provided config definitely exposes that port and is tested on Docker for Mac.
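If it helps with troubleshooting, a quick way to tell whether anything is actually listening on the port is a small connection check like the sketch below. The `port_is_open` helper is hypothetical; only the port number 3333 comes from the setup described above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `port_is_open("localhost", 3333)` on the Docker host should return `True` once the port is published; if it returns `False` while the containers are running, the port mapping is the first thing to check.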
Just pulled the latest and tried again, and it works now! Thanks
Maybe I’m misunderstanding but wouldn’t it just be easier to use a good private tracker, assuming you can get an invite?
Yes, of course.
It’s only once you install something like this that you realize just how many torrents are porno.
I’ve always been curious about ‘Anal Police Stories 2’ but I’ve never found the time.
Is it safe to run this without a VPN if I am just using it to index?
[Thread #191 for this sub, first seen 5th Oct 2023, 14:25]