Love all of these RSS resources. Thanks for sharing!
Last week, I spent a couple of hours at a local hack event putting an RSS aggregator[1] together for our community. Just something fun to do.
One thing I realized when I deployed is that Substack returns a 403 if you try to read their RSS feeds from a GitHub Action. The only obvious workaround to me is to pull the content locally every so often, commit it, and then deploy (roughly as sketched below). But I'd much rather have this site updating itself via a GitHub Action on a cron schedule.
Have you run into this situation before?
[1]: https://github.com/astoria-tech/subcurrent-astro/
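For what it's worth, the local-pull step I mean looks roughly like this. It's only a minimal sketch: the feed URLs, output directory, and filenames are placeholders, not the actual subcurrent-astro setup.

```python
# fetch_feeds.py - run locally, then commit the saved XML and deploy.
# Sketch only: feed URLs and output paths below are placeholders.
import pathlib
import urllib.request

FEEDS = {
    # "slug": "feed URL" -- illustrative entries, not the real list
    "example-substack": "https://example.substack.com/feed",
    "example-blog": "https://example.com/rss.xml",
}

OUT_DIR = pathlib.Path("src/content/feeds")


def fetch_all() -> None:
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    for slug, url in FEEDS.items():
        # Browser-like User-Agent; some hosts reject default script UAs.
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            (OUT_DIR / f"{slug}.xml").write_bytes(resp.read())


if __name__ == "__main__":
    fetch_all()
```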
Every small town deserves someone like this. And also someone moderating local Facebook groups to tamp down the scams. For every elderly victim of scams, there’s an honest, hard-working young person who discovers that the corresponding economic opportunity moved elsewhere.
Isn't it time-consuming to build a scraper for every website you want to get updates from? What if the HTML is a mess and full of auto-generated front-end-framework classes, etc.?
Once I’d created the structure to support lots of different kinds of scraping-to-feed conversions, it’s usually fast to add a new target site into the mix. There are definitely exceptions, and the occasional bit of maintenance when someone updates their CSS.
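The shape of it is roughly a per-site config plus one generic scrape step. Here’s a minimal sketch, not my actual code; the site names, URLs, and CSS selectors are all made up:

```python
# Sketch of a per-site scraping-to-feed structure.
# Everything here (site names, URLs, selectors) is illustrative only.
import requests
from bs4 import BeautifulSoup

# Each target site is a small config entry: where to fetch,
# plus the CSS selectors that map onto feed fields.
SITES = {
    "town-news": {
        "url": "https://example.org/news",
        "item": "article.post",
        "title": "h2 a",
        "link": "h2 a",
    },
    "library-events": {
        "url": "https://example.org/events",
        "item": "li.event",
        "title": ".event-title",
        "link": "a.details",
    },
}


def scrape(site: dict) -> list[dict]:
    """Turn one site's HTML into a list of feed-entry dicts."""
    html = requests.get(site["url"], timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for item in soup.select(site["item"]):
        title_el = item.select_one(site["title"])
        link_el = item.select_one(site["link"])
        if not title_el or not link_el:
            continue  # skip items the selectors miss
        entries.append({
            "title": title_el.get_text(strip=True),
            "link": link_el.get("href", ""),
        })
    return entries


if __name__ == "__main__":
    for name, cfg in SITES.items():
        print(name, len(scrape(cfg)), "entries")
```

Adding a new site is then mostly one more config entry; the maintenance hit comes when a redesign changes the selectors.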
For apps, static analysis and reverse engineering can be a good alternative (or complement) to the proxy-in-the-middle technique.