
Using Perl to read an RSS feed off a web site and extract data - via LWP and XML modules

Archive - Originally posted on "The Horse's Mouth" - 2012-09-30 09:53:38 - Graham Ellis

Perl is excellent glue-ware, something I was reminded of towards the end of Friday's Perl Programming Course. A delegate asked me how easy it is to grab an XML resource from the web (such as an RSS feed) and extract data from it. Well - you could write the code yourself, or you could use the standard LWP and XML::Parser modules, which these days are shipped with many Perl distributions anyway.

I used LWP (the Library for WWW in Perl) to read in the XML feed data:

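  use strict;
  use warnings;
  use LWP::UserAgent;
  use HTTP::Request;

  my $urlsource = shift @ARGV or die "usage: $0 feed-url\n";  # hypothetical: feed URL from the command line
  my $agent = LWP::UserAgent->new;                  # Create me a browser
  $agent->agent("Well House Consultants");          # Set the browser name
  my $req = HTTP::Request->new(GET => $urlsource);  # Set up the request we'll run
  my $res = $agent->request($req);                  # Run the request
  die "Can't fetch $urlsource: ", $res->status_line, "\n" unless $res->is_success;
  my $page = $res->content;                         # The raw XML of the feed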


I then saved the data to a local file as a cache, since I had decided not to check the feed more than once every quarter of an hour.
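The caching step isn't part of the code shown here. Below is a minimal sketch of one way to do it, assuming the cache's age is judged from the file's modification time. The sub name cached_fetch, the cache file name and the stub fetcher are all hypothetical; in the real program the fetcher would run the LWP request above and return $res->content.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the contents of $cache_file if it is younger
# than $max_age seconds; otherwise call $fetch (a code ref that returns
# fresh content), save the result to $cache_file and return it.
sub cached_fetch {
    my ($cache_file, $max_age, $fetch) = @_;

    # (stat ...)[9] is the file's last-modified time in epoch seconds
    if (-e $cache_file && time() - (stat $cache_file)[9] < $max_age) {
        open my $in, '<:raw', $cache_file or die "Can't read $cache_file: $!\n";
        local $/;                  # slurp mode: read the whole file at once
        my $content = <$in>;
        close $in;
        return $content;
    }

    my $content = $fetch->();      # cache missing or stale: fetch afresh
    open my $out, '>:raw', $cache_file or die "Can't write $cache_file: $!\n";
    print {$out} $content;
    close $out or die "Can't close $cache_file: $!\n";
    return $content;
}

# Demonstration with a stub in place of the real LWP request
my $dir    = tempdir(CLEANUP => 1);
my $file   = "$dir/feed_cache.xml";
my $calls  = 0;
my $stub   = sub { $calls++; return "<rss>demo</rss>" };
my $first  = cached_fetch($file, 15 * 60, $stub);   # no cache yet: calls $stub
my $second = cached_fetch($file, 15 * 60, $stub);   # cache is fresh: reads the file
print "fetches: $calls\n";
```

Using the file's modification time, rather than a timer held in memory, means the fifteen-minute limit still holds across separate runs of the script, for example when it's started from cron or as a CGI program.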

I then interpreted the data stream using the XML::Parser module:

  use XML::Parser;

  # entering is called at each start tag, handle_char for each run of text
  my $parser = XML::Parser->new(Handlers => {Start => \&entering, Char => \&handle_char});
  $parser->parse($page);


And I provided subs called entering and handle_char to handle the tags of interest. Here's some sample output, showing the extracted data:

  text: Re: A good problem to have?
  url: http://www.firstgreatwestern.info/coffeeshop/index.php?topic=11326.msg118375#msg118375
  text: Re: Unique opportunity to travel the Portbury Branch Line - Saturday 29 Sept 2012
  url: http://www.firstgreatwestern.info/coffeeshop/index.php?topic=11167.msg118374#msg118374
  text: Re: A good problem to have?
  url: http://www.firstgreatwestern.info/coffeeshop/index.php?topic=11326.msg118373#msg118373
  text: Re: FGW (bad) experience 27/9/2012 - Maidenhead to Paddington
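The bodies of entering and handle_char aren't reproduced here. The sketch below shows one way such handlers could produce output like the sample above, assuming an RSS 2.0 feed in which each <item> holds a <title> and a <link>. It adds an End handler (leaving, a hypothetical name) so that text is only reported once the whole element has been read, because XML::Parser may pass an element's text to the Char handler in several pieces. The variable names and the small inline feed are also hypothetical.

```perl
use strict;
use warnings;
use XML::Parser;

my $in_item = 0;     # true while inside an <item> element
my $current = '';    # element whose text is being collected ('' = none)
my $text    = '';    # text gathered so far for that element
my @results;         # one "text: ..." or "url: ..." line per title or link

sub entering {                       # Start handler: called at each start tag
    my ($expat, $element, %attrs) = @_;
    $in_item = 1 if $element eq 'item';
    if ($in_item && ($element eq 'title' || $element eq 'link')) {
        $current = $element;
        $text    = '';
    }
}

sub handle_char {                    # Char handler: may be called several times per element
    my ($expat, $string) = @_;
    $text .= $string if $current;
}

sub leaving {                        # End handler (hypothetical addition)
    my ($expat, $element) = @_;
    if ($current && $element eq $current) {
        push @results, ($current eq 'title' ? 'text' : 'url') . ": $text";
        $current = '';
    }
    $in_item = 0 if $element eq 'item';
}

my $parser = XML::Parser->new(Handlers => {
    Start => \&entering,
    Char  => \&handle_char,
    End   => \&leaving,
});

# A small inline feed stands in for the downloaded $page
my $page = <<'XML';
<rss version="2.0"><channel>
  <title>Demo channel</title>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Tom &amp; Jerry</title><link>http://example.com/2</link></item>
</channel></rss>
XML

$parser->parse($page);
print "$_\n" for @results;
```

Run as it stands, it prints each item's title and link as text: and url: lines. The channel's own <title> is skipped because it sits outside any <item>, and the &amp; entity arrives in the Char handler as a plain &.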


Full code is [here]. Other samples of LWP are [here] and [here]. Further XML::Parser examples are [here] (SAX) and [here] (DOM), and if you look at any of those examples you'll find links to further code and articles too.

We'll talk about XML and web processes briefly on any of our Perl courses if they're relevant to any of the delegates attending. If you need to get deeper into these modules, into Object Oriented Perl, and into handling large data flows, take a look at our Perl for Larger Projects course.