Monday, June 16th, 2008

Contractions, Waffle, Mrs Briggs & Data

A stylised image of the Star Trek character Data.

For the last month or so I've I have been attempting to eliminate contractions from my blog posts. Initially I found the process quite difficult and I'd I would often find myself struggling with basic English. One word which troubled me was, "cannot", which for a while at least, existed in my head as two separate words; I can't can not cannot imagine why? Anyhow, I think I'm I am finally beginning to get the hang of it.

I'm I am not entirely sure why I decided to stop using contractions; maybe it's it has got something to do with my need to experiment? Or, maybe I'd I had previously read somewhere that contractions cause issues with non-human translation services. Either way, I'm I am quite enjoying the experience, although I fear that it doesn't does not aid the flow of my written gibberish.

While I'm I am on the subject of my poorly scribed waffle, it's it has got to be said that writing doesn't does not come naturally to me. The reason my writing isn't is not often easy to read isn't is not entirely due to my recent sans-contraction experiment, no, I believe it's it has more to do with Mrs Briggs, who was both my secondary school English teacher and the biggest distraction throughout my secondary education. Actually, that's that is not completely true, the distractions were her long legs, short skirts and fancy knickers [don't do not ask]; which in my humble opinion, isn't is not suitable attire for a secondary school English teacher. Maybe I should've should have said something at the time? Thinking about it now, I'm I am glad I didn't did not say anything because I'm I am sure she'd've she would have flipped out; besides, no normal hormonal teenage boy is going to complain about such things.

Anyway, back to the subject of contractions; if you're you are wondering how all this relates to Data, well, it's it is a known fact that Data's Data has got issues with verbal contractions in ordinary speech, which is amusing when you consider he's he has got a total linear computational speed rated at sixty trillion operations per second, yet he can't can not cannot say, "can't". Silly android.

P.S. I thought it'd it would be fun to write like this, but to be honest, 'tisn't it is not. 'tisn't It is not going to happen again ;-)

Tagged with: fun, language, life, random


Saturday, May 10th, 2008

5 Or More Consecutive Consonants

Carol Vorderman

"A consonant please Carol, and another, and another, and another, and another." — actually, this post is not about Carol Vorderman or Countdown, it is about some interesting[?] script output I came across when attempting to write a new spam filter. I will explain…

Just lately my website has been receiving some rather odd junk comments. The comments make no sense and have quite obviously been sent by some automated junk-flinging robot. They make no sense because they seem to be constructed from random characters. Apart from making no sense, these comments were also becoming a nuisance, as they were easily slipping past my existing keyword filters.

So, the other night I decided to sit down and write a new filter to try to catch these random-character junk comments. I started by analysing some previously submitted comments to look for common patterns. One such pattern was the presence of multiple strings containing 5 or more consecutive consonants. Thinking this to be unusual, I ran some tests against a flat file containing 21110 common English words. I thought the results were interesting. Here is what I found:

I should state that the above results are in no way definitive. I know this because I also ran the same test against another file containing 311141 words found in the Merriam-Webster dictionary. Still, by using the results of the initial test I was able to construct a list of safe words to use with my new spam filter.
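For anyone curious, the consonant check and the safe word list described above could be sketched roughly as follows. This is not the filter I actually wrote, only a minimal illustration in Python. The function names are made up for the example, and it assumes that a "consonant" is any English letter other than a, e, i, o or u, so "y" counts as a consonant.

```python
import re

# Assumption: a consonant is any ASCII letter except a, e, i, o, u,
# which means "y" is treated as a consonant here.
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-z]{5,}", re.IGNORECASE)


def has_consonant_run(word):
    """Return True if the word contains 5 or more consecutive consonants."""
    return CONSONANT_RUN.search(word) is not None


def words_with_runs(words):
    """Return the words from an iterable that contain such a run.

    Running this over a word list is one way to build a safe word list.
    """
    return [w for w in words if has_consonant_run(w)]


def looks_like_junk(comment, safe_words):
    """Flag a comment if any word outside the safe list has a consonant run.

    safe_words is expected to be a set of lowercase words (hypothetical).
    """
    tokens = re.findall(r"[A-Za-z]+", comment)
    return any(
        has_consonant_run(t) and t.lower() not in safe_words
        for t in tokens
    )
```

Feeding a word list through `words_with_runs` gives the genuine English words that would otherwise trip the check. Those words can then be passed to `looks_like_junk` as the safe list, so that a comment mentioning "witchcraft" is not treated as junk.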

Finally, yes, I did consider not writing this post; however, I am sure my publishing of these results will not change anything. Besides, Arthur, my 80-year-old neighbour, is the biggest Countdown fan on the planet. He is also quite Internet savvy and definitely thinks Carol Vorderman is hot, so he may find these results quite useful in increasing his daily Countdown score!

