Perl - retrieving and caching web resources
Archive - Originally posted on "The Horse's Mouth" - 2011-10-18 23:45:40 - Graham Ellis
Is there some useful data on the web which you would like to use in your Perl script? If there is, go ahead and download it: grab the page, store it as a file and use it ... subject of course to copyright and re-use rules. But how can you then keep your Perl script up to date with current data should the source page change? You certainly won't want to grab a fresh copy of the source manually every couple of hours to update your web site ...
Perl has various modules that will grab a resource from within a program, and the easiest to use is perhaps LWP::Simple. There's an example of this module being used to grab a page, extract data from it, and echo that extracted data - [here].
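As a minimal sketch of that idea (not the linked example itself), the following uses the get function from LWP::Simple and pulls out the page title with a regular expression. The URL, the ten second timeout and the extract_title helper are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get $ua);

# Assumption: give up after 10 seconds rather than the much longer default.
$ua->timeout(10);

# Hypothetical source page: substitute the resource you actually want.
my $url = 'http://www.example.com/';

# Hypothetical helper: return the text of the first <title> element,
# or undef if there isn't one. A regular expression is enough for a
# sketch; a real HTML parser copes better with untidy pages.
sub extract_title {
    my ($html) = @_;
    return unless defined $html;
    return $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is ? $1 : undef;
}

# get() returns the body of the page as a string, or undef on failure.
my $page = get($url);
if (defined $page) {
    my $title = extract_title($page);
    print defined $title ? "Title: $title\n" : "No title found\n";
} else {
    warn "Could not fetch $url\n";
}
```

Because get returns only the content (or undef), you can't tell why a request failed; that's the price of the simple interface.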
If you're running a popular, busy web site, or writing a script which makes repeated use of the same resource, you will not want to grab a copy every time. Quite apart from being antisocial as far as the owner of the original site is concerned, such constant reloading of data that hardly changes will slow your process down and burn up bandwidth for little or no gain. So in this case, you'll want to download the resource and store it in a local file. There's an example of grabbing a page through LWP::Simple and storing it locally - [here].
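Storing the resource locally might look something like this sketch, which uses getstore and is_success as exported by LWP::Simple. The URL, the file name cached_page.html and the read_cache helper are assumptions for illustration, not taken from the linked example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success $ua);

# Assumption: give up after 10 seconds rather than the much longer default.
$ua->timeout(10);

my $url   = 'http://www.example.com/';   # hypothetical source page
my $cache = 'cached_page.html';          # hypothetical local file name

# Hypothetical helper: read the whole of the local copy into a string,
# returning undef if the file can't be opened.
sub read_cache {
    my ($file) = @_;
    open my $fh, '<', $file or return;
    local $/;                 # slurp mode: read the whole file at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# getstore() saves the response body in the file and returns the HTTP status code.
my $status = getstore($url, $cache);
if (is_success($status)) {
    print "Saved $url to $cache\n";
} else {
    warn "Download failed with HTTP status $status\n";
}

# From here on, work from the local copy rather than the network.
my $page = read_cache($cache);
print "Local copy holds ", length($page), " characters\n" if defined $page;
```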
Combining the two techniques, you can check the timestamp on your local file and reload it from the original source if the local copy is stale. There's an example of that [here] using LWP::Simple. This is a principle that we use in many places on our website - not only in Perl but also in PHP.
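One way to sketch the combined approach is below; it is an illustration under stated assumptions rather than the code behind the link. The helper names is_stale and fetch_cached, the URL, the cache file name and the two hour lifetime are all hypothetical choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success $ua);

# Assumption: give up after 10 seconds rather than the much longer default.
$ua->timeout(10);

# Hypothetical helper: true if the file is missing or was last
# modified more than $max_age seconds ago.
sub is_stale {
    my ($file, $max_age) = @_;
    my $mtime = (stat($file))[9];       # element 9 is the modification time
    return 1 unless defined $mtime;     # no local copy yet
    return (time() - $mtime) > $max_age ? 1 : 0;
}

# Hypothetical helper: refresh the local copy only when it is stale,
# then return its content (undef if there is no readable copy at all).
sub fetch_cached {
    my ($url, $file, $max_age) = @_;
    if (is_stale($file, $max_age)) {
        my $status = getstore($url, $file);
        warn "Refresh of $url failed with HTTP status $status\n"
            unless is_success($status);
    }
    open my $fh, '<', $file or return;
    local $/;                           # slurp mode
    my $content = <$fh>;
    close $fh;
    return $content;
}

my $url     = 'http://www.example.com/';   # hypothetical source page
my $cache   = 'cached_page.html';          # hypothetical local file name
my $max_age = 2 * 60 * 60;                 # assumed lifetime: two hours

my $page = fetch_cached($url, $cache, $max_age);
print defined $page
    ? "Working with " . length($page) . " characters of data\n"
    : "No data available\n";
```

If a refresh fails but an older copy is still on disk, this sketch falls back to the older copy; whether out-of-date data is better than none depends on what you're using it for.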
We have a Perl Programming course coming up in about six weeks' time - see [here], and a PHP course the previous week - see [here]. If you find this blog article in the archive, please click on the links anyway, for our site automatically refreshes the course description pages from the course dates resource, so you'll always get current course information - content, dates, prices and availability.