Thursday, October 18th, 2007

User Agent Sniffer

I've currently got several web projects at various stages of development. One thing they all have in common is that they capture and manipulate user-agent strings.

What are user-agent strings?

User-agent strings are used by client applications such as web browsers, feed readers, bots and other software to identify themselves to the servers they connect to. The strings contain important information such as application type, version, language and operating system. A typical user-agent string might look like this:

Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.7 (like Gecko) (Kubuntu)

The above example identifies the client as a Mozilla-compatible Konqueror web browser running on Kubuntu Linux.
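To make that structure a bit more concrete, here is a rough sketch of how the example string could be pulled apart into its "Name/version" tokens and its bracketed comments. The function name split_user_agent is made up for this illustration and is not part of my sniffer script. It also assumes reasonably well-formed strings, which many real ones are not.

```php
<?php
// Hypothetical helper, for illustration only: split a user-agent string into
// "Name/version" product tokens and the comments found inside parentheses.
function split_user_agent(string $ua): array
{
    $products = [];
    $comments = [];

    // Comments are the parenthesised parts, e.g. "(compatible; Konqueror/3.5; Linux)".
    if (preg_match_all('/\(([^()]*)\)/', $ua, $m)) {
        $comments = array_map('trim', $m[1]);
    }

    // Drop the comments, then read whatever is left as "Name/version" tokens.
    $rest = preg_replace('/\([^()]*\)/', ' ', $ua);
    if (preg_match_all('/([^\s\/]+)\/(\S+)/', $rest, $m, PREG_SET_ORDER)) {
        foreach ($m as $match) {
            $products[$match[1]] = $match[2];
        }
    }

    return ['products' => $products, 'comments' => $comments];
}

$parts = split_user_agent(
    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.7 (like Gecko) (Kubuntu)'
);
// $parts['products'] holds Mozilla => 5.0 and KHTML => 3.5.7;
// $parts['comments'] holds the three bracketed sections.
```

Note that the browser name (Konqueror) lives inside the first comment rather than in a top-level token, which is one reason user-agent parsing tends to need more than a single regular expression.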

Collecting user-agent strings

To run tests on my projects I need some sample data to play around with [a list or database table of user-agent strings]. I figured that the best way to get this sample data would be to collect some user-agent strings from the wild. So, I wrote a quick PHP script to do just that.

I ran the script on this site [crunchbang.org] for ten days, starting on October 7th 2007. During that ten-day period the script captured a total of 2272 unique user-agent strings. The captured list included both standard and non-standard strings.
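For anyone curious about the general idea, capturing unique strings can be sketched in a few lines. This is not my actual script: that one stores its data in MySQL, whereas the sketch below swaps in a plain text file to keep things self-contained. The function name record_user_agent and the file name user-agents.txt are assumptions made for the example.

```php
<?php
// Hypothetical helper: append a user-agent string to a text file unless it is
// already listed. Returns true only when a new string was stored.
function record_user_agent(string $file, ?string $ua): bool
{
    // Strip control characters so each string stays on a single line.
    $ua = trim((string) preg_replace('/[\x00-\x1F\x7F]/', '', (string) $ua));
    if ($ua === '') {
        return false; // some clients send no User-Agent header at all
    }

    // Load the strings seen so far (an empty list if the file does not exist yet).
    $known = is_file($file) ? (file($file, FILE_IGNORE_NEW_LINES) ?: []) : [];
    if (in_array($ua, $known, true)) {
        return false; // already captured
    }

    return file_put_contents($file, $ua . "\n", FILE_APPEND | LOCK_EX) !== false;
}

// In a web request the string arrives in $_SERVER['HTTP_USER_AGENT']:
// record_user_agent(__DIR__ . '/user-agents.txt', $_SERVER['HTTP_USER_AGENT'] ?? null);
```

Re-reading the whole file on every request is fine for a few thousand strings, but a database table with a unique index is the better fit for anything busier.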

A few facts about user-agent strings

After capturing the list I then edited the PHP script so that it would report a few facts. Here is a breakdown of what was returned:

1. At just 6 characters in length the shortest user-agent string captured was:

NG/2.0

2. The longest measured in at 205 characters. It was:

Mozilla/5.0 (compatible; MSIE 7.0; Windows; HTMLAB; .NET CLR 1.1.4322; 
MEGAUPLOAD 1.0; Seekmo; ZangoToolbar4.8.2; Alexa Toolbar; Hotbar 4.2.8.0) (compatible; 
Googlebot/2.1; +http://www.google.com/bot.html)

3. The average computed length of the user-agent strings was 91.2750880282 characters.

4. Most strings contained some non-alphanumeric characters. These were:

/ . ( ; - : ) + _ ! = , @ &  ' [ ] * ~ ? { }

5. The strangest user-agent string was:

Mmm.... Brains....
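Facts 1 to 4 above are simple to compute from a list of strings. Here is a minimal sketch of one way to do it. The function name user_agent_facts is made up for the example, lengths are counted in bytes with strlen, and whitespace is left out of the symbol list. My own script may well differ on all of these points.

```php
<?php
// Hypothetical helper: report the shortest and longest strings, the average
// length, and every non-alphanumeric, non-whitespace character that occurs.
function user_agent_facts(array $strings): array
{
    $strings = array_values($strings);
    if ($strings === []) {
        return ['shortest' => null, 'longest' => null, 'average' => null, 'symbols' => []];
    }

    $shortest = $longest = $strings[0];
    $total = 0;
    $symbols = [];

    foreach ($strings as $ua) {
        $length = strlen($ua); // byte length, which is an assumption of this sketch
        $total += $length;
        if ($length < strlen($shortest)) {
            $shortest = $ua;
        }
        if ($length > strlen($longest)) {
            $longest = $ua;
        }
        // Remember each symbol once, in order of first appearance.
        if (preg_match_all('/[^A-Za-z0-9\s]/', $ua, $m)) {
            foreach ($m[0] as $char) {
                $symbols[$char] = true;
            }
        }
    }

    return [
        'shortest' => $shortest,
        'longest'  => $longest,
        'average'  => $total / count($strings),
        'symbols'  => array_keys($symbols),
    ];
}

$facts = user_agent_facts([
    'NG/2.0',
    'Mmm.... Brains....',
    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.7 (like Gecko) (Kubuntu)',
]);
```

Run against the full sample file, something along these lines should give figures comparable to the ones listed above, though small differences are possible depending on how lengths and characters are counted.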

View the whole report

You can view the whole report here: http://crunchbang.org/misc/user-agent-report-2007-10-17.txt

Get the sample data

I thought it would be good to share the sample data. There's no private or confidential information in the data and I figure it may come in handy for other developers working on similar projects.

You can get the data as an ASCII file [one user-agent string per line] here: http://crunchbang.org/misc/sample-user-agents-ascii.txt

Or, as an SQL statement here: http://crunchbang.org/misc/sample-user-agents-mysql.txt

Get the PHP script

If you fancy having a go at collecting your own samples you can grab my PHP script here: http://crunchbang.org/misc/ua-sniffer.txt

The script requires the use of MySQL. Other than that it's a fairly straightforward affair. Just edit the four settings to define your database name, address, username and password.

I ran the script by calling it with a require_once statement. Note that the script also sets and reads a cookie, so you'll need to call it before outputting any data to the client.

require_once("ua-sniffer.php");
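The reason for the ordering is that cookies travel in the HTTP headers, and PHP can only send headers before the first byte of page output. The sketch below shows the general pattern. It assumes the cookie is there to avoid recording the same visitor on every page view, and the cookie name ua_seen and both function names are invented for the example rather than taken from the script.

```php
<?php
// Hypothetical cookie name, chosen for this sketch only.
const UA_COOKIE = 'ua_seen';

// Decide whether this visitor still needs recording, based on the cookies
// sent with the request (normally $_COOKIE).
function should_record(array $cookies): bool
{
    return !isset($cookies[UA_COOKIE]);
}

// Mark the visitor as recorded for the next 24 hours. This only works before
// any output has been sent, because cookies are part of the HTTP headers.
function mark_recorded(): bool
{
    if (headers_sent()) {
        return false; // too late: the headers have already gone out
    }
    return setcookie(UA_COOKIE, '1', time() + 86400);
}

// Typical use at the very top of a page, before any HTML:
// if (should_record($_COOKIE)) {
//     /* store $_SERVER['HTTP_USER_AGENT'] somewhere */
//     mark_recorded();
// }
```

If the require_once line comes after any HTML or whitespace has been output, PHP will normally warn that headers have already been sent and the cookie will not be set.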

Once the script has collected some user-agent strings it is possible to query it and have it produce a basic report. You can do this by accessing the script through your browser like so:

http://www.example.com/ua-sniffer.php?report=true
