Saturday, May 10th, 2008

5 Or More Consecutive Consonants


"A consonant please Carol, and another, and another, and another, and another." — actually, this post is not about Carol Vorderman or Countdown, it is about some interesting[?] script output I came across when attempting to write a new spam filter. I will explain…

Just lately my website has been receiving some rather odd junk comments. The comments make no sense and have quite obviously been sent by some automated junk-flinging robot. They make no sense because they seem to be constructed from random characters. They were also becoming a nuisance, as they were easily slipping past my existing keyword filters.

So, the other night I decided to sit down and write a new filter to try and catch these random character junk comments. I started by analysing some previously submitted comments to try and find any common patterns. One such pattern I found was multiple strings containing 5 or more consecutive consonants. Thinking this to be unusual, I ran some tests against a flat file containing 21110 common English words. I thought the results were interesting. Here is what I found:

I should state that the above results are in no way definitive. I know this because I also ran the same test against another file containing 311141 words found in the Merriam-Webster dictionary. Still, by using the results of the initial test I was able to construct a list of safe words to use with my new spam filter.
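
A test like the one described above can be roughly sketched in shell. This is not the filter from this post, only an illustration: it assumes one word per line, counts "y" as a consonant, and the function names and the safe-word file are made up.

```shell
#!/bin/sh
# Assumption: "y" is treated as a consonant; matching is case-insensitive.

# Print every line (word) that contains 5 or more consecutive consonants.
# Reads the files given as arguments, or stdin if there are none.
find_consonant_runs() {
    grep -i -E '[bcdfghjklmnpqrstvwxyz]{5,}' "$@"
}

# Split free text from stdin into words, keep the ones with 5 or more
# consecutive consonants, then drop any word listed in the safe-word
# file given as $1 (hypothetical file, one word per line).
suspicious_words() {
    tr -cs '[:alpha:]' '\n' \
        | grep -i -E '[bcdfghjklmnpqrstvwxyz]{5,}' \
        | grep -v -i -x -F -f "$1"
}
```

Running find_consonant_runs over a word list gives the candidates for a safe-word file; suspicious_words then prints only the words in a comment that are not on that list.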

Finally, yes, I did consider not writing this post; however, I am sure my publishing of these results will not change anything. Besides, Arthur, my 80-year-old neighbour, is the biggest Countdown fan on the planet. He is also quite Internet savvy and definitely thinks Carol Vorderman is hot, so he may find these results quite useful in increasing his daily Countdown score!


Wednesday, May 7th, 2008

A(nother) Regular Expression Test Tool

I came across another regular expression test tool today. This one is an Ajax-enabled regex tool which lets you evaluate regular expressions in several languages, including PHP PCRE and PHP POSIX, with instant results. You can choose which functions to use, such as match, match all, replace, split etc. I much prefer it to the similar regex tool I mentioned a couple of months ago. Everything considered, it's a very handy resource for when you are struggling with those pesky expressions.

URL: http://regex.larsolavtorvik.com/
Blog: Lars Olav Torvik - Programming and computer stuff.


Sunday, May 4th, 2008

Wiki Rewrite

Over the last couple of nights I have completely rewritten my personal wiki. The wiki previously used the PHP WikkaWiki wiki engine; it now uses a bunch of custom PHP scripts. The scripts are similar to those used by my blog software, Whird. I decided to perform the rewrite for numerous reasons, some of which I have listed below:

  1. I was unhappy with how WikkaWiki was formatting the underlying HTML, specifically the way in which it would never use the paragraph tag, opting instead to insert break tags. While this probably sounds like a minor issue, it was really beginning to bug me.

  2. I started this site [crunchbang.org] with the intention of coding all of the software/scripts myself. Therefore, and somewhat obviously, my use of WikkaWiki was always going to provide reason for my conscience to niggle me.

  3. As mentioned before, WikkaWiki is very hackable; however, it was never going to be as hackable as something I had produced myself.

  4. I wanted both my blog and my wiki to use the Markdown markup language. While this was not a problem for my blog [it has always used Markdown] I could not find any suitable plugins/hacks for enabling Markdown within WikkaWiki.

The rewrite is pretty much complete and is now live. I have tried to make sure any URLs used by WikkaWiki are either reused or redirected. Please feel free to drop me a comment if you notice anything funky occurring.


Friday, March 28th, 2008

Perplexed by Web Frameworks

The latest LugRadio episode features a discussion about Django and other web frameworks. I found the feature interesting, but I have to admit that I find the subject of web frameworks somewhat perplexing. I have yet to fully embrace any such framework, although I have played with the Zend Framework. I think there are a number of reasons I have not fully adopted any frameworks:

  1. I am put off by having to learn all the new classes, structures and methods employed by said frameworks. Surely my time would be better spent actually learning more about the core language?

  2. I fear that using a framework would somehow stifle innovation. I am under no illusions of being the most innovative player; however, I find the feeling hard to shake. I guess I am questioning where the innovation comes from if everyone is using the same framework.

  3. It is fun to write your own code, functions, classes and routines etc. I like to experiment with code, and I like to make mistakes before fixing them; it is this that keeps me interested. I would be concerned that using a framework would take away much of that.

Having said all that, I'm not totally opposed to frameworks and I think they have their place; three PHP frameworks of interest to me are:

  1. Zend Framework: http://framework.zend.com/
  2. CakePHP: http://www.cakephp.org/
  3. Symfony: http://www.symfony-project.org/

I am going to look into the above to see what they have to offer; however, I think I will probably continue to hack together my own code for a while to come.


Monday, February 18th, 2008

Regular Expression Test Tool

I've used this web based regular expression test tool a couple of times over the last day or so. While I don't normally struggle with regular expressions, this tool has still come in handy; it has saved me from the "code it and cross your fingers" approach I normally take.

This is a great tool for anyone who works with PHP, especially as the service uses PHP regular expression functions as a base for its operations.

URL: http://www.solmetra.lt/scripts/regex/


Sunday, February 17th, 2008

Wicked Cool Shell Scripts

I've not read the book, but the Wicked Cool Shell Scripts site and its example shell scripts are, erm, wicked cool. The site offers a whole host of scripts, some of which could quite easily be adapted/hacked into useful tools. If you're remotely interested in shell scripting, you should take a look; even people with scripting experience might learn a thing or two.

URL: http://www.intuitive.com/wicked/index.shtml

Download the script library: wicked-cool-shell-scripts.tgz


Sunday, November 18th, 2007

Bash Script: MySQL Backup

I thought that it might be a good idea to start posting a few of my scripts; it'll be handy to have them on my site for future reference. Also, I learn a lot by reading example scripts — I guess others might be able to learn from mine.

I wrote the following Bash script to perform a backup of a remote MySQL database. The script first connects via SSH and performs a MySQL dump, saving the results to file. It then connects via SFTP and downloads the file. Once the file has been downloaded, it restores the database to my local MySQL server.

It is quite a simple Bash script and it should be fairly straightforward to follow.

#!/bin/sh
# Settings
#############################
REMOTEHOST="example.com"
REMOTEBACKUPDIR="backup/sql"
SQLHOST="localhost"
SQLDB="database_name"
SQLUSER="username"
SQLPASS="password"
SQLFILE="database_name.sql"
LOCALBACKUPDIR="backup/sql"
#############################
# Start main
echo "* Connecting via SSH..."
# The here-document is unquoted on purpose: the variables are expanded
# locally before the commands are sent to the remote shell.
ssh "$REMOTEHOST" <<EOF
echo "* Performing SQL dump..."
# Create the remote backup directory (and any parents) if it is missing
mkdir -p "$REMOTEBACKUPDIR"
cd "$REMOTEBACKUPDIR"
mysqldump -h "$SQLHOST" --user="$SQLUSER" --password="$SQLPASS" "$SQLDB" > "$SQLFILE"
echo "* Closing SSH connection..."
exit
EOF
# Create the local backup directory under the home directory if it is missing
cd ~
mkdir -p "$LOCALBACKUPDIR"
cd "$LOCALBACKUPDIR"
echo "* Connecting via SFTP..."
# sftp reads its commands from the here-document
sftp "$REMOTEHOST" <<EOF
cd $REMOTEBACKUPDIR
get $SQLFILE
exit
EOF
echo "* Restoring SQL dump to local server..."
mysql --user="$SQLUSER" --password="$SQLPASS" "$SQLDB" < "$SQLFILE"
echo "* SQL backup complete."
cd ~
exit 0

Notes

  1. For automation purposes, this script assumes that SSH and SFTP have been configured for automatic login. See "Creating Private/Public SSH Keys"
  2. It also assumes there is a mirrored MySQL server and user account running on the local machine.
  3. The script can be automated using Crontab.
  4. Lacks any error handling and/or logging!?
  5. I've worked with some commercial hosting providers who do not grant table locking privileges to their MySQL users — table locking can be bypassed by adding the "--skip-lock-tables" option to the "mysqldump" command. Use with caution.
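
Notes 3 and 5 could look something like the sketch below. The script path, log file and schedule are made up for illustration, and dump_without_locks is a hypothetical helper rather than part of the script above. Only the --skip-lock-tables option comes from the notes.

```shell
#!/bin/sh
# Note 3: example crontab entry (add it with `crontab -e`).
# Assumes the backup script was saved as ~/bin/mysql-backup.sh (hypothetical path).
# Runs every night at 02:30 and appends all output to a log file:
#
#   30 2 * * * $HOME/bin/mysql-backup.sh >> $HOME/mysql-backup.log 2>&1

# Note 5: dump a database without requesting table locks.
# Hypothetical helper; arguments are host, user, password, database.
# The dump is written to stdout, so redirect it to a file as needed.
dump_without_locks() {
    mysqldump -h "$1" --user="$2" --password="$3" --skip-lock-tables "$4"
}
```

Inside the backup script, the helper would be called as dump_without_locks "$SQLHOST" "$SQLUSER" "$SQLPASS" "$SQLDB" > "$SQLFILE".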

Sunday, November 4th, 2007

Ternary Operator in PHP

Over on Planet PHP there's a running debate [see here, here & here] over the use of the ternary operator. I love a good debate so I thought I'd chip in with my 2 pennies.

The ternary operator looks attractive and can reduce the amount of code that you write. I know this but I still don't use it. I think PHP is an awesome language and a major contributing factor to its awesomeness is its simplicity. It's a relatively easy language to learn and I think that the widespread use of the ternary operator would only increase the barrier to entry [not a good thing!]

Also, I find it interesting that the ternary operator is hardly ever referenced in the comments and code examples at PHP.net. Indeed chapter 16 "Control Structures" of the manual hardly mentions it at all.

Ternary operator code example

In case you've no idea what this post is about.

Before: no ternary operator, easy to understand

if ($treat == 'cream') {
    $cat = 'Happy!';
} else {
    $cat = 'Not so happy.';
}

After: ternary operator in use, less code but not so easy to understand

$cat = ($treat == 'cream') ? 'Happy!' : 'Not so happy.';

Tagged with: code, php, programming | Comments [1]


Thursday, October 18th, 2007

User Agent Sniffer

I've currently got several web projects at various stages of development. One thing that all of these projects have in common is that they all capture and manipulate user-agent strings.

What are user-agent strings?

User-agent strings are used by client applications such as web browsers, feed readers, bots and other software to identify themselves to the servers they are connecting to. The strings contain important information such as application type, version, language and operating system. A typical user-agent string might look like this:

Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.7 (like Gecko) (Kubuntu)

The above example identifies the client as a Mozilla-compatible Konqueror web browser running on Kubuntu Linux.

Collecting user-agent strings

To run tests on my projects I need some sample data to play around with [a list or database table of user-agent strings.] I figured that the best way to get this sample data would be to collect some user-agent strings from the wild. So, I wrote a quick PHP script to do just that.

I ran the script on this site [crunchbang.org] for ten days, starting on October 7th 2007. During that ten day period the script captured a total of 2272 unique user-agent strings. The captured list included both standard and non-standard strings.

A few facts about user-agent strings

After capturing the list I then edited the PHP script so that it would report a few facts. Here is a breakdown of what was returned:

1. At just 6 characters in length the shortest user-agent string captured was:

NG/2.0

2. The longest measured in at 205 characters. It was:

Mozilla/5.0 (compatible; MSIE 7.0; Windows; HTMLAB; .NET CLR 1.1.4322; 
MEGAUPLOAD 1.0; Seekmo; ZangoToolbar4.8.2; Alexa Toolbar; Hotbar 4.2.8.0) (compatible; 
Googlebot/2.1; +http://www.google.com/bot.html)

3. The average computed length of the user-agent strings was 91.2750880282 characters.

4. Most strings contained some non-alphanumeric characters. These were:

/ . ( ; - : ) + _ ! = , @ &  ' [ ] * ~ ? { }

5. The strangest user-agent string was:

Mmm.... Brains....
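
The length figures in points 1 to 3 and the character list in point 4 could be reproduced from a file holding one user-agent string per line with something like the following. This is a shell sketch rather than the PHP script used for the report; the function names are made up, and it assumes plain ASCII input.

```shell
#!/bin/sh
# Assumption: input is ASCII, one user-agent string per line.

# Report the count, the shortest and longest strings, and the average length.
# Reads the files given as arguments, or stdin if there are none.
ua_stats() {
    awk '
        { n = length($0); total += n }
        NR == 1 || n < min { min = n; shortest = $0 }
        NR == 1 || n > max { max = n; longest = $0 }
        END {
            if (NR == 0) exit 1
            printf "count: %d\n", NR
            printf "shortest (%d): %s\n", min, shortest
            printf "longest (%d): %s\n", max, longest
            printf "average: %.2f\n", total / NR
        }
    ' "$@"
}

# List each distinct non-alphanumeric, non-whitespace character, one per line.
ua_symbols() {
    cat "$@" | tr -d '[:alnum:][:space:]' | fold -w 1 | LC_ALL=C sort -u
}
```

Both functions take the sample file as an argument, for example ua_stats sample-user-agents-ascii.txt.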

View the whole report

You can view the whole report here: http://crunchbang.org/misc/user-agent-report-2007-10-17.txt

Get the sample data

I thought it would be good to share the sample data. There's no private or confidential information in the data and I figure it may come in handy for other developers working on similar projects.

You can get the data as an ASCII file [one user-agent string per line] here: http://crunchbang.org/misc/sample-user-agents-ascii.txt

Or, as an SQL statement here: http://crunchbang.org/misc/sample-user-agents-mysql.txt

Get the PHP script

If you fancy having a go at collecting your own samples you can grab my PHP script here: http://crunchbang.org/misc/ua-sniffer.txt

The script requires the use of MySQL. Other than that it's a fairly straightforward affair. Just edit the four settings to define your database name, address, username and password.

I ran the script by calling it with a require_once statement. Note that the script also sets and reads a cookie so you'll need to call on it before outputting any data to the client.

require_once("ua-sniffer.php");

Once the script has collected some user-agent strings it is possible to query it and have it produce a basic report. You can do this by accessing the script through your browser like so:

http://www.example.com/ua-sniffer.php?report=true


Sunday, September 30th, 2007

Silly Variable Names

They're not big and they're not clever. So how come they keep slipping into my code?

if(!empty($bum)){
    unload($bum);
}

Note to self: stop with the silly variable names already!

Tagged with: code, fun, programming | Comments [0]


Saturday, September 29th, 2007

Whird Prototype - Time to Branch

CrunchBang.org is currently served up by an early prototype of a new PHP blogging application. I'm creating the application [which I've provisionally named Whird] on my development server at home. I mention this because I've now reached a point where I need to make a decision.

A potential problem has arisen because I've been developing the application in the style of this here site. While this hasn't given me any problems so far, I don't want to cause any additional work for myself in the future. I need to branch the project so that any customisations that I make specific to CrunchBang.org don't slip into the final project.

I guess this is a milestone in this project's history; from now on I'll be writing for a more generic code base. Hopefully it'll stop things from getting too messy round here.


Saturday, September 29th, 2007

Everyone Should Learn Some Programming

From an article on Brane Dump:

This, in my opinion, is why everybody (literally, every single person) should learn to program at some fairly simple level, like learning to read and write. By using plain text and small scripts, I've got a workflow that works for me, and it's cost me less time than I'd spend learning some large pre-written app and putting all my data into it. Being able to manipulate data like this is, I think, a fairly important tool in the modern world, and I don't think it should be left to any sort of priesthood of developers — it should be as universal as most countries try to make literacy and numeracy.

I couldn't agree more with this; however, I think it's a bit of a stretch of the imagination to think it'll ever happen, more so when you read stuff like this.

Tagged with: programming | Comments [0]


Wednesday, September 26th, 2007

PHP: strrpos() vs strpos()

I normally wouldn't post about such trivial matters; however, this particular trivial matter bugged me for several hours this afternoon, so I thought I'd mention it.

It seems that one of the problems of coding on a development server that runs PHP 5 for a production server that runs PHP 4 is the subtle differences in the language. These subtle differences can really throw a spanner in the works [like one did to me this afternoon.]

Basically, my development code used the strrpos() function to find the position of a string within a string. While this worked just fine with PHP 5, it bombed big time with PHP 4 [where strrpos() only uses the first character of the search string.] Since I was after the first occurrence anyway, I should have been using the strpos() function instead; strrpos() finds the last occurrence.

Oh well, live and learn.

Tagged with: php, programming | Comments [0]


Wednesday, September 26th, 2007

7 Reasons He Switched Back to PHP

I didn’t abandon the rewrite IDEA, though. I just asked myself one important question:
“Is there anything Rails can do, that PHP CAN’T do?”
The answer is no.

Derek Sivers writes a good article with 7 good reasons as to why he switched back to PHP after 2 years of coding Ruby on Rails.

I've always liked the idea of learning Rails, [is it just me or has Rails been portrayed as a sexy/trendy language?] but I've never started because PHP has always been good enough for my own projects. I'm glad I didn't waste 2 years learning another language just to find that out!

My favourite reason from the article:

7 - PROGRAMMING LANGUAGES ARE LIKE GIRLFRIENDS: THE NEW ONE IS BETTER BECAUSE YOU ARE BETTER

Tagged with: php, programming | Comments [0]

