Wednesday, February 20th, 2008

TwitterZoid PHP Script

It's a rather silly name, I know; however, TwitterZoid is the chosen name of my PHP script for parsing Twitter RSS feeds. I've been using Twitter quite steadily for a couple of weeks now and I thought it might be nice to include my latest tweets on my blog, so I wrote TwitterZoid to do just that.

TwitterZoid differs from other PHP-based Twitter RSS parsers, at least the ones I tried before I wrote it, in that it will automatically link both lexicons and URLs found within individual tweets. It also tries to mimic Twitter's timestamping, although this could be improved.
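
To give a rough idea of what URL linking and Twitter-style timestamps involve, here is a minimal sketch. It is not the code from twitterzoid.php: the function names sketch_link_urls and sketch_relative_time are made up for illustration, and the timestamp wording is only an assumption about the style Twitter uses.

```php
<?php
// Hypothetical helpers for illustration; not taken from twitterzoid.php.

// Wrap each http:// or https:// URL in an anchor tag.
function sketch_link_urls(string $tweet): string
{
    // Escape first so the tweet text cannot inject markup.
    $safe = htmlspecialchars($tweet, ENT_QUOTES, 'UTF-8');
    // Simplistic pattern: trailing punctuation ends up inside the link.
    return preg_replace('~https?://[^\s<]+~i', '<a href="$0">$0</a>', $safe);
}

// Describe how long ago $then was, relative to $now (Unix timestamps).
function sketch_relative_time(int $then, int $now): string
{
    $diff = max(0, $now - $then);
    if ($diff < 60) {
        return 'less than a minute ago';
    }
    if ($diff < 3600) {
        $n = intdiv($diff, 60);
        return $n . ($n === 1 ? ' minute ago' : ' minutes ago');
    }
    if ($diff < 86400) {
        $n = intdiv($diff, 3600);
        return 'about ' . $n . ($n === 1 ? ' hour ago' : ' hours ago');
    }
    $n = intdiv($diff, 86400);
    return $n . ($n === 1 ? ' day ago' : ' days ago');
}
```

A feed's publication date could be converted with strtotime() and passed in as $then, with time() as $now.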

TwitterZoid usage

I wanted to make TwitterZoid as simple to use as possible, so I wrote the script as an include file which can be used on any PHP page. Basically, to use TwitterZoid all you need to do is set a couple of variables, include twitterzoid.php and then echo the main $TwitterZoid variable where you would like your list of tweets to appear.

Example set-up:

$twitter_username = "corenominal";
$twitter_feed = "http://twitter.com/statuses/user_timeline/99713.rss";
require_once('twitterzoid.php');

Call on the main TwitterZoid variable to produce the list of tweets:

echo $TwitterZoid;

Don't worry if this reads like gibberish; I've included an example page within the download.
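
For the curious, an include file of this kind could build its list roughly as follows. This is only a sketch under stated assumptions, not the contents of twitterzoid.php: the function name sketch_feed_to_list is hypothetical, the sample feed is made-up data, and the "username: " prefix on each item title is an assumption about the feed format.

```php
<?php
// Hypothetical sketch; twitterzoid.php itself may work differently.

// Turn RSS 2.0 XML into an HTML list, one <li> per <item>.
function sketch_feed_to_list(string $rss_xml, string $username): string
{
    $feed = simplexml_load_string($rss_xml);
    if ($feed === false) {
        return ''; // Could not parse the feed.
    }
    $prefix = $username . ': ';
    $html = "<ul>\n";
    foreach ($feed->channel->item as $item) {
        $text = (string) $item->title;
        // Assumed feed format: each title starts with "username: ".
        if (strpos($text, $prefix) === 0) {
            $text = substr($text, strlen($prefix));
        }
        $html .= '<li>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }
    return $html . "</ul>\n";
}

// Made-up sample feed, standing in for the real Twitter RSS.
$sample = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Twitter / corenominal</title>
    <item><title>corenominal: Writing a PHP script</title></item>
    <item><title>corenominal: Testing &amp; tweaking</title></item>
  </channel>
</rss>
XML;

$list = sketch_feed_to_list($sample, 'corenominal');
echo $list;
```

Building the whole list into one string is what makes the "set variables, include, echo" pattern possible: the page decides where the markup goes.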

TwitterZoid examples

There are currently two demonstrations of TwitterZoid in action:

My official "What am I doing?" Twitter page:
http://crunchbang.org/what-am-i-doing/

A more stylised version of "What am I doing?", included within the download:
http://crunchbang.org/projects/twitterzoid/demo/

Download TwitterZoid

Location: http://crunchbang.org/projects/twitterzoid/twitterzoid-0.2.tar.gz
MD5: 7c437c2ea32f45dde66fc74f690ab361

TwitterZoid license

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

http://www.gnu.org/licenses/


Thursday, October 18th, 2007

User Agent Sniffer

I've currently got several web projects at various stages of development. One thing that all of these projects have in common is that they all capture and manipulate user-agent strings.

What are user-agent strings?

User-agent strings are used by client applications such as web browsers, feed readers, bots and other software to identify themselves to the servers they are connecting to. The strings contain important information such as application type, version, language and operating system. A typical user-agent string might look like this:

Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.7 (like Gecko) (Kubuntu)

The above example identifies the client as a Mozilla-compatible Konqueror web browser running on Kubuntu Linux.
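
As a rough illustration of how such a string breaks down, the sketch below pulls out the "Name/Version" product tokens and the bracketed comments. The function name sketch_parse_user_agent is hypothetical, and the patterns are deliberately simple: they assume well-formed strings and will not cope with every string found in the wild.

```php
<?php
// Hypothetical helper for illustration only.

// Split a user-agent string into product tokens and bracketed comments.
function sketch_parse_user_agent(string $ua): array
{
    // Product tokens look like "Name/Version"; this also picks up
    // tokens that sit inside a bracketed comment.
    preg_match_all('~([A-Za-z][\w.-]*)/([\w.]+)~', $ua, $products, PREG_SET_ORDER);
    // Comments are the parenthesised parts, e.g. "(like Gecko)".
    preg_match_all('~\(([^()]*)\)~', $ua, $comments);

    $parsed = ['products' => [], 'comments' => []];
    foreach ($products as $match) {
        $parsed['products'][$match[1]] = $match[2];
    }
    foreach ($comments[1] as $comment) {
        // Fields inside a comment are usually separated by semicolons.
        $parsed['comments'][] = array_map('trim', explode(';', $comment));
    }
    return $parsed;
}

$ua = 'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.7 (like Gecko) (Kubuntu)';
$info = sketch_parse_user_agent($ua);
print_r($info['products']);
```

For the Konqueror example, the products array holds Mozilla, Konqueror and KHTML with their versions, and the three comments are kept as separate lists of fields.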

Collecting user-agent strings

To run tests on my projects I need some sample data to play around with [a list or database table of user-agent strings]. I figured that the best way to get this sample data would be to collect some user-agent strings from the wild. So, I wrote a quick PHP script to do just that.

I ran the script on this site [crunchbang.org] for ten days, starting on October 7th 2007. During that ten-day period the script captured a total of 2272 unique user-agent strings. The captured list included both standard and non-standard strings.

A few facts about user-agent strings

After capturing the list I then edited the PHP script so that it would report a few facts. Here is a breakdown of what was returned:

1. At just 6 characters in length the shortest user-agent string captured was:

NG/2.0

2. The longest measured 205 characters; it was:

Mozilla/5.0 (compatible; MSIE 7.0; Windows; HTMLAB; .NET CLR 1.1.4322; 
MEGAUPLOAD 1.0; Seekmo; ZangoToolbar4.8.2; Alexa Toolbar; Hotbar 4.2.8.0) (compatible; 
Googlebot/2.1; +http://www.google.com/bot.html)

3. The average computed length of the user-agent strings was 91.2750880282 characters.

4. Most strings contained some non-alphanumeric characters. These were:

/ . ( ; - : ) + _ ! = , @ &  ' [ ] * ~ ? { }

5. The strangest user-agent string was:

Mmm.... Brains....
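
Figures like the ones above can be worked out from a plain array of strings. The following is a minimal sketch, assuming a non-empty array; the function name sketch_ua_stats is hypothetical and the report code in the actual script may well differ.

```php
<?php
// Hypothetical helper; the original report code may differ.

// Compute shortest, longest, average length and the set of
// non-alphanumeric characters for a non-empty list of strings.
function sketch_ua_stats(array $strings): array
{
    // strlen() counts bytes, which matches characters for ASCII data.
    $lengths = array_map('strlen', $strings);
    $symbols = [];
    foreach ($strings as $ua) {
        // Collect every character that is not a letter, digit or whitespace.
        preg_match_all('~[^A-Za-z0-9\s]~', $ua, $found);
        foreach ($found[0] as $char) {
            $symbols[$char] = true;
        }
    }
    return [
        'shortest' => $strings[array_search(min($lengths), $lengths, true)],
        'longest'  => $strings[array_search(max($lengths), $lengths, true)],
        'average'  => array_sum($lengths) / count($lengths),
        'symbols'  => array_keys($symbols),
    ];
}

$stats = sketch_ua_stats(['NG/2.0', 'Mmm.... Brains....']);
print_r($stats);
```

If several strings tie for shortest or longest, this sketch reports the first one it finds.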

View the whole report

You can view the whole report here: http://crunchbang.org/misc/user-agent-report-2007-10-17.txt

Get the sample data

I thought it would be good to share the sample data. There's no private or confidential information in the data and I figure it may come in handy for other developers working on similar projects.

You can get the data as an ASCII file [one user-agent string per line] here: http://crunchbang.org/misc/sample-user-agents-ascii.txt

Or, as an SQL statement here: http://crunchbang.org/misc/sample-user-agents-mysql.txt
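
Reading the ASCII file back into PHP takes only a few lines. This is a small sketch rather than anything from my script; the function name sketch_load_user_agents is hypothetical and the path is wherever you saved the download.

```php
<?php
// Hypothetical helper; pass it the path of the saved ASCII file.

// Load one user-agent string per line, skipping blanks and duplicates.
function sketch_load_user_agents(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return []; // File missing or unreadable.
    }
    // Drop duplicates and reindex from zero.
    return array_values(array_unique($lines));
}
```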

Get the PHP script

If you fancy having a go at collecting your own samples you can grab my PHP script here: http://crunchbang.org/misc/ua-sniffer.txt

The script requires the use of MySQL. Other than that it's a fairly straightforward affair. Just edit the four settings to define your database name, address, username and password.

I ran the script by calling it with a require_once statement. Note that the script also sets and reads a cookie, so you'll need to call it before outputting any data to the client.

require_once("ua-sniffer.php");
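
The reason for the ordering is that cookies travel in the HTTP headers, which PHP sends as soon as the first byte of output goes out. The sketch below illustrates the idea only and is not the code from ua-sniffer.php: the function name sketch_ua_to_record and the cookie name ua_seen are hypothetical, and it assumes the cookie is there to avoid recording the same visitor twice, which may not be what the real script uses it for.

```php
<?php
// Hypothetical sketch; ua-sniffer.php itself may work differently.

// Decide whether this request's user-agent should be recorded.
// $server and $cookies stand in for $_SERVER and $_COOKIE.
function sketch_ua_to_record(array $server, array $cookies): ?string
{
    // Assumption: a cookie marks visitors whose string was already seen.
    if (isset($cookies['ua_seen'])) {
        return null;
    }
    $ua = trim($server['HTTP_USER_AGENT'] ?? '');
    return $ua === '' ? null : $ua;
}

$ua = sketch_ua_to_record($_SERVER, $_COOKIE);
if ($ua !== null && !headers_sent()) {
    // setcookie() sends an HTTP header, so it only works before any output.
    setcookie('ua_seen', '1', time() + 86400);
    // Storing $ua in MySQL would happen here.
}
```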

Once the script has collected some user-agent strings it is possible to query it and have it produce a basic report. You can do this by accessing the script through your browser like so:

http://www.example.com/ua-sniffer.php?report=true
