Setting your user_agent in PHP - telling remote servers who you are
Archive - Originally posted on "The Horse's Mouth" - 2010-12-18 21:40:08 - Graham Ellis

If your web page pulls information in from other sites, your web server becomes a client (in effect, a browser) of those other sites. Just as your web server can identify the type of browser it's answering via the $_SERVER['HTTP_USER_AGENT'] variable, the remote server can identify you in the same way.
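For reference, here is a minimal sketch of the receiving side: reading the user agent that a visitor sent. The helper name describeClient() is hypothetical, and the fallback text is an illustrative choice for requests that carry no User-Agent header at all.

```php
<?php
// Hypothetical helper: return the user agent the client sent, or a note if none was sent.
function describeClient(array $server): string
{
    // The User-Agent request header is exposed as $_SERVER['HTTP_USER_AGENT'],
    // but the key is simply missing when the client sends no such header.
    $agent = $server['HTTP_USER_AGENT'] ?? '';
    return $agent === '' ? '(no user agent sent)' : $agent;
}

// Escape before echoing: the header is supplied by the client and can contain anything.
echo htmlspecialchars(describeClient($_SERVER)), "\n";
```

Passing the array in as a parameter, rather than reading $_SERVER inside the function, keeps the helper easy to test from the command line.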
PHP's default user_agent setting is an empty string, so PHP does not identify itself at all (no User-Agent header is sent). That will sometimes cause the server it's contacting to return a limited response, or even (as I have seen in one case) a 403 "Forbidden" error. So it's not a good idea to leave the default in place if you're contacting a wide range of web sites, for example if you're writing a crawler.
PHP's user_agent parameter can be set outside your application with a php_value directive, either in the httpd.conf file (for the whole of your site) or in a .htaccess file (for all applications in and below a specific directory), when PHP runs as an Apache module. You can also set it on a per-application basis within the application itself, for example:
ini_set("user_agent","Well House Robot - see http://www.wellho.net");
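A fuller sketch of the per-application approach follows, assuming allow_url_fopen is enabled so that file_get_contents() can open http:// URLs. The helper names fetchPage() and fetchPageAs() are hypothetical. The second helper shows an alternative: the 'user_agent' option of an http stream context sets the agent for a single request and takes precedence over the ini setting.

```php
<?php
// Set the agent for everything this script fetches over http://.
ini_set('user_agent', 'Well House Robot - see http://www.wellho.net');

// Hypothetical helper: fetch a page using the ini-level user_agent.
function fetchPage(string $url): ?string
{
    $body = @file_get_contents($url);   // sends the user_agent ini value as User-Agent
    return $body === false ? null : $body;
}

// Hypothetical helper: fetch one page with a different agent for that request only.
function fetchPageAs(string $url, string $agent): ?string
{
    $context = stream_context_create(['http' => ['user_agent' => $agent]]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

echo ini_get('user_agent'), "\n"; // Well House Robot - see http://www.wellho.net
```

The context version is useful when one script talks to several sites and needs to identify itself differently to some of them.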
If you shouldn't use the default, what should you use? I would suggest that you strongly and proudly announce your robot / site, and give anyone who finds you in their log files a link where they can find out who you are and why you are on their site. And you should, of course, respect their robots.txt file too [see here].
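Respecting robots.txt means checking each path before you fetch it. The sketch below is deliberately simplified and is not a complete robots.txt parser: it understands only User-agent, Allow and Disallow lines with plain path prefixes (no * or $ wildcards). It uses the group naming your robot if one exists, otherwise the "*" group, and lets the longest matching prefix win, with Allow winning a tie. The function name isAllowed() and the robot token "WellHouseRobot" used below are hypothetical.

```php
<?php
// Simplified robots.txt check: prefix rules only, no wildcard support.
function isAllowed(string $robotsTxt, string $agentToken, string $path): bool
{
    $groups = [];          // each: ['agents' => [names], 'rules' => [[bool $allow, string $prefix], ...]]
    $lastWasAgent = false; // consecutive User-agent lines share one group

    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line));   // drop comments
        if (strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if (!$lastWasAgent) {
                $groups[] = ['agents' => [], 'rules' => []];
            }
            $groups[count($groups) - 1]['agents'][] = strtolower($value);
            $lastWasAgent = true;
        } elseif (($field === 'allow' || $field === 'disallow') && $groups) {
            if ($value !== '') {   // an empty Disallow means "allow everything"
                $groups[count($groups) - 1]['rules'][] = [$field === 'allow', $value];
            }
            $lastWasAgent = false;
        }
    }

    // Merge the rules of every group naming $name; null if no group names it.
    $rulesFor = function (string $name) use ($groups): ?array {
        $rules = null;
        foreach ($groups as $group) {
            if (in_array($name, $group['agents'], true)) {
                $rules = array_merge($rules ?? [], $group['rules']);
            }
        }
        return $rules;
    };
    // Our own group if present, else the "*" group, else no rules at all.
    $rules = $rulesFor(strtolower($agentToken)) ?? $rulesFor('*') ?? [];

    // Longest matching prefix decides; Allow wins a tie; no match means allowed.
    $allowed = true;
    $bestLength = -1;
    foreach ($rules as [$isAllow, $prefix]) {
        $length = strlen($prefix);
        if (strncmp($path, $prefix, $length) === 0
            && ($length > $bestLength || ($length === $bestLength && $isAllow))) {
            $bestLength = $length;
            $allowed = $isAllow;
        }
    }
    return $allowed;
}
```

In use, you would fetch the site's /robots.txt once (with your own user agent set), then call isAllowed() for each path before requesting it.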
It's also possible to set the user_agent to the same string a browser would send; in other words, you can pretend to be Chrome, Firefox or Internet Explorer. If you are writing a script to test a site automatically, with the permission of the site owner, then go ahead and do this. But if you pretend to be a browser when visiting lots of other people's sites, some of them may get rather upset with you: in effect, you'll be seen as forging a browser's signature.
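One way to keep to that rule is to choose the agent per request: a browser-style string only for hosts whose owners have agreed to testing, and your honest robot string everywhere else. This is a minimal sketch under stated assumptions: the host list, the Firefox-style string and the helper names agentFor() and contextFor() are all hypothetical and illustrative.

```php
<?php
// Honest identification for general use.
const ROBOT_AGENT = 'Well House Robot - see http://www.wellho.net';
// Illustrative browser-style string, for permitted test targets only.
const BROWSER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0';
// Hypothetical list of hosts whose owners have agreed to browser-style testing.
const PERMITTED_TEST_HOSTS = ['test.example.com'];

function agentFor(string $url): string
{
    // parse_url() may return null or false; casting gives '' so the robot string is used.
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    return in_array($host, PERMITTED_TEST_HOSTS, true) ? BROWSER_AGENT : ROBOT_AGENT;
}

function contextFor(string $url)
{
    // The 'user_agent' http context option applies to this one request only.
    return stream_context_create(['http' => ['user_agent' => agentFor($url)]]);
}

// Usage: $html = file_get_contents($url, false, contextFor($url));
```

Defaulting to the robot string means that a typo in the host list errs on the side of honesty rather than impersonation.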
There's an example of each type of user agent string in use, and how it appeared in the log files on our server when I ran it, [here].