Feed43 Rocks
I’ve just given Feed43 a go. It’s very nifty.
Basically, it’s a pattern-based HTML-to-RSS scraper, similar to my own Sitescooper in that respect, but built entirely as a web app.
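To make "pattern-based HTML-to-RSS scraping" a bit more concrete, here is a minimal Python sketch of the idea. It is loosely modeled on Feed43's item-pattern style, where `{%}` marks text to capture and `{*}` marks text to skip. Treat that syntax as an approximation, not Feed43's exact semantics. The sample HTML, the pattern, the base URL and all function names are hypothetical, and the sketch assumes each item pattern captures a link followed by a title.

```python
import html
import re
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a Feed43-like item pattern into a regular expression.

    {%} captures the text at that position, {*} skips over it, and
    everything else must match the page literally.
    """
    regex_parts = []
    for part in re.split(r"(\{%\}|\{\*\})", pattern):
        if part == "{%}":
            regex_parts.append(r"(.*?)")  # lazy capture
        elif part == "{*}":
            regex_parts.append(r".*?")  # lazy skip, no capture
        else:
            regex_parts.append(re.escape(part))  # literal HTML
    # DOTALL lets captures and skips span line breaks in the HTML.
    return re.compile("".join(regex_parts), re.DOTALL)


def scrape_items(page: str, item_pattern: str) -> list[tuple[str, ...]]:
    """Return one tuple of captured strings per repeated match on the page."""
    regex = pattern_to_regex(item_pattern)
    return [
        tuple(html.unescape(group).strip() for group in match.groups())
        for match in regex.finditer(page)
    ]


def build_rss(title: str, base_url: str, items: list[tuple[str, ...]]) -> str:
    """Wrap (link, title) tuples in a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = f"Scraped from {base_url}"
    for link, item_title in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        # Resolve relative hrefs against the scraped page's URL.
        ET.SubElement(item, "link").text = urljoin(base_url, link)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical page and pattern, standing in for a real listing page.
SAMPLE_HTML = """
<ul class="events">
  <li><a href="/e/1">Gig in Dublin</a></li>
  <li><a href="/e/2">Talk in Cork &amp; Galway</a></li>
</ul>
"""
ITEM_PATTERN = '<li><a href="{%}">{%}</a></li>'

if __name__ == "__main__":
    items = scrape_items(SAMPLE_HTML, ITEM_PATTERN)
    print(build_rss("Example events", "https://example.org/events", items))
```

The lazy `.*?` quantifiers keep each match confined to a single item. On a real page you would usually need a more specific pattern, or would first narrow the HTML down to the relevant section, since a loose pattern can also match navigation links and the like.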
Until now, I’ve been hacking up scrapers one by one, using either
Sitescooper or WWW::Mechanize, run from cron, and
putting the output up on taint.org; for example, http://taint.org/scraped/ has
the public ones: Threadless, Perry Bible Fellowship, and White Ninja comics.
Today, I came across a case where I wanted a new RSS feed, and since I’d been hearing about Feed43, I thought I’d give it a try, to save running yet another cron job on our server. It was reasonably simple, although it still required a fair understanding of how scraping via pattern matching against HTML works. The UI, however, was fantastic: everything is previewed live in a clean AJAX interface, and within 3 minutes I had a new feed.
For the curious, the feed was for TCAL’s Ireland category, and the results are here: Feed43 (Feed For Free) : TCAL - Ireland. (Go ahead and sign up if you like.)
New web pattern, by the way: there’s a trend towards using “secret URLs” instead of username/password authentication for “trivial” auth tasks, like editing a feed scraper’s details. Good idea.
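For illustration, here is a minimal Python sketch of the secret-URL pattern, assuming a server that hands out one edit URL per feed. This is not Feed43's actual scheme. The host, the `/edit/<feed_id>/<token>` path layout, the in-memory store and all function names are hypothetical. The sketch uses the standard library's `secrets.token_urlsafe` for an unguessable token, stores only a SHA-256 digest of it, and checks incoming URLs with `hmac.compare_digest`.

```python
import hashlib
import hmac
import secrets
from urllib.parse import urlsplit

BASE_URL = "https://feeds.example.org/edit"  # hypothetical host and path

# Hypothetical storage: feed id -> SHA-256 digest of its secret token.
# Only the digest is kept, so a leaked table does not leak working URLs.
_token_digests: dict[str, str] = {}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def create_edit_url(feed_id: str) -> str:
    """Issue a fresh secret edit URL for a feed, replacing any older one."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe base64
    _token_digests[feed_id] = _digest(token)
    return f"{BASE_URL}/{feed_id}/{token}"


def is_authorized(url: str) -> bool:
    """Accept the URL only if its token matches the one issued for that feed."""
    parts = urlsplit(url).path.rstrip("/").split("/")
    if len(parts) < 2:
        return False
    feed_id, token = parts[-2], parts[-1]
    expected = _token_digests.get(feed_id)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, _digest(token))
```

The trade-off is that the URL itself is the credential. Anyone who sees it (in browser history, a shared link, a Referer header or a server log) can use it, which is why this only suits low-stakes tasks like editing a scraper definition.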
This post was written by Justin, source: Feed43 Rocks
