I use scrapy. It has a steeper learning curve than other libraries, but it’s totally worth it.
Ok then make a spotify scraper
I scrape with bash, lord help me.
you scrape WITH BASH?
My undergrad project was a scraper - there just wasn’t a name for it yet.
Scrapers have been a thing for as long as the web has existed.
One of the first search engines is even called WebCrawler
Sorry, I’m ignorant in this matter. Why exactly would you want to scrape websites aside from collecting data for ML? What kind of irreplaceable API are you using? Someone please educate me here.
API might cost a lot of money for the amount of requests you want to send. API may not include some fields in the data you want. API is rate limited, scraping might not be. API requires agreement to usage terms, scraping does not (though the recent LinkedIn scraping case might weaken that argument.)
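To make the "API may not include some fields" point concrete, here is a minimal sketch of pulling a field straight out of a page's HTML using only Python's standard library. The `SAMPLE_HTML` markup, the `title` and `score` class names, and the `FieldExtractor` and `extract` helpers are all made up for illustration; a real page will have its own structure, and a real project would more likely reach for a dedicated parsing library.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

# Hypothetical markup standing in for a page whose API omits these fields.
SAMPLE_HTML = """
<ul>
  <li><span class="title">First post</span> <span class="score">12</span></li>
  <li><span class="title">Second <b>post</b></span> <span class="score">7</span></li>
</ul>
"""


class FieldExtractor(HTMLParser):
    """Collect the text of every element that carries a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self._depth = 0      # > 0 while we are inside a matching element
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1             # nested tag inside a match
        elif self.css_class in classes:
            self._depth = 1              # entering a new match
            self.values.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.values[-1] += data


def extract(html, css_class):
    """Return the stripped text of all elements with the given class."""
    parser = FieldExtractor(css_class)
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


if __name__ == "__main__":
    print(extract(SAMPLE_HTML, "title"))
    print(extract(SAMPLE_HTML, "score"))
```

The catch is that this only works while the markup stays the same: rename one class on the site and the extractor silently returns nothing, whereas an API usually keeps its field names stable.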
So uh…as someone who’s currently trying to scrape the web for email addresses to add to my potential client list … where do I start researching this?
Start looking into selenium, probably in Python. It’s one of the easier-to-understand forms of scraping. It’s mainly used for web testing, though you can definitely use it for less… nice purposes.
Let me introduce you to WooB (formerly WEBooB).
Why on earth would they have changed that. WEBooB is a way better name.
But it’s got boob in it.
someone’s never used a good api. like mastodon
That’s why I use geddit
It’s all fun and games until you have to support all this shit and it breaks weekly!
That being said, I do miss the simplicity of maintaining selenium projects for work
Let’s see what WEI (if implemented) will do to scrapers. The future doesn’t look promising.
What’s that?
A google/chrome proposal for browser verification, i.e. killing addons and custom browsers.
Nice name, beat me to it
I really hope Libreddit switches to scraping, the “Error: Too many request” thing is so annoying, I have to click the redirect button in Libredirect like 20 times until I can actually see a post.
Still a better experience than Reddit’s official site tho.