Saturday, May 17th, 2008
I have created a new page on my website. For lack of a better name, I have called the page "Interesting Stuff Elsewhere". The page features a list of links to "stuff" which I have deemed interesting enough to share via my Google Reader account.
The "stuff" is a hodgepodge collection of articles, blog posts, podcasts and videos; having said that, the "stuff" could be anything and the only real way to find out what the "stuff" is, is to click on the links. One thing is for sure, all the links lead to dead good "stuff" :)
See: http://crunchbang.org/elsewhere/
Saturday, May 10th, 2008

"A consonant please Carol, and another, and another, and another, and another." — actually, this post is not about Carol Vorderman or Countdown, it is about some interesting[?] script output I came across when attempting to write a new spam filter. I will explain…
Just lately my website has been receiving some rather odd junk comments. The comments make no sense and they have quite obviously been sent by some automated junk-flinging robot. They make no sense because they seem to be constructed from random characters. Apart from making no sense, these comments were also becoming a nuisance, as they were easily slipping past my existing keyword filters.
So, the other night I decided to sit down and write a new filter to try and catch these random character junk comments. I started by analysing some previously submitted comments to try and find any common patterns. One such pattern I found was multiple strings containing 5 or more consecutive consonants. Thinking this to be unusual, I ran some tests against a flat file containing 21110 common English words. I thought the results were interesting. Here is what I found:
- 85 unique strings containing 5 or more consecutive consonants.
- 113 words containing 5 or more consecutive consonants.
- 9 words containing 5 or more consecutive consonants and no vowels: crypt, lymph, lynch, myrrh, nymph, pygmy, rhythm, sylph, tryst
- 10 words containing 6 or more consecutive consonants: latchstring, metempsychosis, polysyllabic, polysyllable, porphyry, rhythm, skyscraper, strychnine, synchronize, synchronous
- 1 word containing 6 consecutive consonants and no vowels: rhythm
- 1 word containing 7 consecutive consonants: strychnine
I should state that the above results are in no way definitive. I know this because I also ran the same test against another file containing 311141 words found in the Merriam-Webster dictionary. Still, by using the results of the initial test I was able to construct a list of safe words to use with my new spam filter.
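As a rough illustration of the idea (not the actual filter used on this site), a check along these lines could be sketched in PHP. The function names, the threshold of 5 and the tiny safe word list are all assumptions made up for the example, and the letter y is counted as a consonant to match the results above.

```php
<?php
// Hypothetical helper: returns every run of $min or more consecutive
// consonants found in $text. The letter y is treated as a consonant.
function consonant_runs($text, $min = 5) {
    preg_match_all('/[b-df-hj-np-tv-z]{' . (int) $min . ',}/i', $text, $matches);
    return $matches[0];
}

// Hypothetical check: flags a comment when any word that is not on the
// safe list contains a long run of consonants.
function looks_like_random_junk($comment, array $safe_words, $min = 5) {
    // Split on anything that is not a letter and drop empty pieces.
    $words = preg_split('/[^a-z]+/i', strtolower($comment), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (in_array($word, $safe_words, true)) {
            continue; // known English word, ignore it
        }
        if (count(consonant_runs($word, $min)) > 0) {
            return true;
        }
    }
    return false;
}

// Illustrative safe list only; a real one would be built from the word-list tests.
$safe_words = array('rhythm', 'strychnine', 'skyscraper', 'lymph');

var_dump(looks_like_random_junk('xkqzrtplm hgfdswq', $safe_words));         // bool(true)
var_dump(looks_like_random_junk('The skyscraper has rhythm', $safe_words)); // bool(false)
```

A check like this would presumably sit alongside keyword filters rather than replace them, since short junk strings or junk containing vowels would still slip through.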
Finally, yes, I did consider not writing this post; however, I am sure my publishing of these results will not change anything. Besides, Arthur, my 80-year-old neighbour, is the biggest Countdown fan on the planet. He is also quite Internet savvy and definitely thinks Carol Vorderman is hot, so he may find these results quite useful in increasing his daily Countdown score!
Sunday, May 4th, 2008
Over the last couple of nights I have completely rewritten my personal wiki. The wiki previously used the PHP WikkaWiki wiki engine; it now uses a bunch of custom PHP scripts. The scripts are similar to those used by my blog software, Whird. I decided to perform the rewrite for numerous reasons, some of which I have listed below:
- I was unhappy with how WikkaWiki was formatting the underlying HTML, specifically the way in which it would never use the paragraph tag, opting instead to insert break tags. While this probably sounds like a minor issue, it was really beginning to bug me.
- I started this site [crunchbang.org] with the intention of coding all of the software/scripts myself. Therefore, and somewhat obviously, my use of WikkaWiki was always going to provide reason for my conscience to niggle me.
- As mentioned before, WikkaWiki is very hackable; however, it was never going to be as hackable as something I had produced myself.
- I wanted both my blog and my wiki to use the Markdown markup language. While this was not a problem for my blog [it has always used Markdown] I could not find any suitable plugins/hacks for enabling Markdown within WikkaWiki.
The rewrite is pretty much complete and is now live. I have tried to make sure any URLs used by WikkaWiki are either reused or redirected. Please feel free to drop me a comment if you notice anything funky occurring.
Sunday, April 20th, 2008
Tonight I have mainly been working on Whird. I have been rewriting large chunks of code in an effort to optimise a bunch of functions. As a result of this, I had to change a series of strings in a number of files. As per normal when it comes to fiddly grep, sed and awk commands, I fired up Google and searched for some pointers. Whilst refreshing my memory, I came across a comment by an anonymous reader who suggested using the rpl command.
I had not come across rpl before, so I investigated. Turns out that rpl is a really handy text replacement tool. It makes recursive text replacement commands really simple; as simple as:
rpl [options] old_string new_string target_file(s)
Available options are:
  --version           show program's version number and exit
  -h, --help          show this help message and exit
  -L, --license       show the software license
  -x SUFFIX           specify file suffix to match
  -i, --ignore-case   do a case insensitive match
  -w, --whole-words   whole words (old_string matches on word boundaries only)
  -b, --backup        make a backup before overwriting files
  -q, --quiet         quiet mode
  -v, --verbose       verbose mode
  -s, --dry-run       simulation mode
  -R, --recursive     recurse into subdirectories
  -e, --escape        expand escapes in old_string and new_string
  -p, --prompt        prompt before modifying each file
  -f, --force         ignore errors when trying to preserve permissions
  -d, --keep-times    keep the modification times on modified files
  -t, --use-tmpdir    use $TMPDIR for storing temporary files
  -a, --all           do not ignore files and directories starting with .
rpl is available from the Ubuntu repositories; install it with the following command:
sudo apt-get install rpl
For more information about rpl, see: http://www.laffeycomputer.com/rpl.html
Thursday, April 17th, 2008
Looking at my project page for Whird, I can see I have somewhat neglected the project recently. Actually, this is not entirely true; I have been testing Whird extensively for about 7 months. As things stand at the moment, Whird could be considered either a working prototype or classic vaporware. I prefer to think of it as a working prototype; however, I am not one of the many people who, over the last few months, have contacted me, asking questions about the project.
Why the neglect?
There are several reasons for my recent lack of commitment to Whird:
- I have spent a large amount of time [probably too much] working on other projects; since the beginning of the year, CrunchBang Linux has consumed most of my free time.
- Now that I have a working copy of Whird, I seem to have become far too comfortable simply using it, instead of developing it. Since starting this blog, in September last year, I have written 165 posts [not including this one]. Maybe I should have spent more time developing, instead of writing?! Hang on a minute, I am doing it again now. Doh!
- Related to the last point: now that I have test driven Whird, I pretty much know what is wrong, what needs fixing and what works. Knowing this, I would ideally like to perform a complete rewrite. A daunting thought!
- I have no self-imposed deadlines. This is both a blessing and a curse. It is a blessing, because it means any work I put into Whird remains a fun activity, something I can perform at my leisure. It is a curse, because I may not actually touch the project for months on end.
Will Whird ever be released?
Answer: maybe, probably, I hope so. Having said that, I would not hold your breath. It has occurred to me that Whird is in danger of becoming [if it hasn't done so already] "that" project, the project that provides endless hours of fun, without actually resulting in anything tangible.
Anyhow, that pretty much sums up the current status of Whird. Time to crack on and have a bit more fun :) In the meantime, if you are reading this and you are looking to start a new blog, why not try Steve Kemp's Chronicle? I have not actually tried it myself, but from what I have read, it looks like an interesting piece of software, something I could see myself using.
Tuesday, January 29th, 2008
So, last night I was mainly experimenting with Whird; more specifically, I was trying to build an internal search feature. This is something I've been putting off for long enough, and I'd really like to get it coded up. I didn't finish it last night, but at least I've made a start.
Anyhow, during my experimentation I wrote this little PHP function for filtering words out of a string. Basically the function takes 2 strings as arguments before filtering words out of the first string based on words found in the second. I've posted it below for future reference:
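function word_filter($string1, $string2) {
    // Normalise whitespace so the text splits cleanly into words.
    $string1 = trim($string1);
    $string1 = preg_replace('/\s+/', ' ', $string1);
    $string1 = explode(" ", $string1);
    $wordcount = count($string1);
    $i = 0;
    while ($i < $wordcount) {
        $string = $string1[$i];
        // Blank the word if it occurs anywhere in $string2. This is a
        // case-insensitive substring match, so "I" is caught by "is" and "it".
        if ($string !== "" && stripos($string2, $string) !== false) {
            $string1[$i] = "";
        }
        $i++;
    }
    $string1 = implode(" ", $string1);
    return $string1;
}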
Example usage
This is probably not the best example, but this:
$poem = <<<EOD
<p><em>GIVE me women, wine, and snuff <br />
Untill I cry out "hold, enough!" <br />
You may do so sans objection <br />
Till the day of resurrection: <br />
For, bless my beard, they aye shall be <br />
My beloved Trinity.</em></p>
EOD;
$common_words = 'and be but do for is it may me my not of the they there so was you';
echo word_filter($poem,$common_words);
Would output something like this:
GIVE women, wine, snuff
Untill cry out "hold, enough!"
sans objection
Till day resurrection:
For, bless beard, aye shall
beloved Trinity.
John Keats would be proud, not! Please feel free to optimise, or let me know if a one line equivalent already exists :)
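One possible take on that request, offered as a sketch rather than a definitive answer: the function above matches substrings, so a word such as "I" is dropped because "i" appears inside "is" and "it". The variant below only drops a word when it matches a whole entry in the list. The name `word_filter_exact` is made up for this example, and punctuation attached to a word (such as "For,") still prevents a match.

```php
<?php
// Hypothetical variant of word_filter(): removes a word only when it equals
// a whole entry in $stop_words, compared case-insensitively.
function word_filter_exact($text, $stop_words) {
    // Flip the list into array keys so each lookup is a cheap isset().
    $stop = array_flip(explode(' ', strtolower($stop_words)));
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $kept = array();
    foreach ($words as $word) {
        if (!isset($stop[strtolower($word)])) {
            $kept[] = $word;
        }
    }
    return implode(' ', $kept);
}

$common_words = 'and be but do for is it may me my not of the they there so was you';

echo word_filter_exact('You may do so sans objection', $common_words), "\n";
// prints: sans objection
echo word_filter_exact('Untill I cry out', $common_words), "\n";
// prints: Untill I cry out
```

Unlike the original, this version does not leave gaps where words were removed, because dropped words are never added to the output array.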
Saturday, December 8th, 2007
I'm currently designing a range of templates/themes for use with Whird. As a result of this I've found myself thinking an awful lot about colour schemes and combinations. I've also been researching colour theory and application. To be honest, I'm about all coloured out.
The conclusion I've come to, with regards to choosing colour schemes for my Whird templates, is that I'm going to use neutral/passive colours. My main reason for this doesn't actually stem from the research I've been doing; it comes from my experience of selling houses. What I learned from my time in the housing market was this:
When selling your house, stage [decorate] with neutral colours. Neutral colours appeal to the broader market and will help to sell your house quickly.
I'm going to apply this working colour theory to my web templates; after all, if it's good enough for the multimillion £ housing market, it's good enough for my free web templates :)
Thursday, November 8th, 2007
I've received a few emails of late with questions regarding my blogging app, Whird. Development is currently slow but steady. At the moment I'm rewriting the "new post" and "edit post" features — I've been experiencing a few problems with PHP's 'magic_quotes'.
There is still no release schedule for the code, but I'm hoping to get something out as soon as possible. This prototype site has been performing well and I'm actually quite pleased with how Whird operates. Compared to Wordpress [not that it's anything like Wordpress] it's blindingly quick and feels a lot less clunky.
I'll post more info soon and hopefully I'll have a better idea as to when I'll be able to release some code.
Wednesday, October 31st, 2007
A quote from the article, "A new way of social engineering" by PandaLabs:
Now, look at yourself, you are a human automated captcha reader. If you type the correct interpretation of the image, you are sending the information necessary to break the protection of the targeted site. This attack could be used to create massive mail accounts, for comment posting…
My blog has recently begun attracting its first comment spam and I found this really interesting. Personally, I've never liked captchas. Apart from being really hard to read they also present accessibility issues. I think I'm going to stick to moderating my comments — at least until I've created a better solution!
Having said that, I'd really like to put some comment spam protection in place for Whird before its first release. It seems only the proper [responsible] thing to do. I can guarantee it will not involve the use of a captcha.
Thanks to Adrian von Bidder for pointing out the PandaLabs article.
Monday, October 29th, 2007
It's been a long time coming but my website [CrunchBang.org] is now finally running on PHP 5. My hosting provider performed the upgrade earlier this month and made the switch optional on a domain basis. Any domain on their servers can run either PHP version 4 [default] or upgrade to version 5 by adding a script handler in .htaccess:
Action php5-script /interpreters/php5-script
AddHandler php5-script .php
My development systems all run PHP 5 so the switch was a breeze. Hopefully from now on I'll be able to avoid stuff like this.
Also, I've been working on my Whird project over the weekend. I've now added feeds for individual tags and updated various features to improve usability. I've had to place some URL rewrites for the new feeds and I'm hoping that the various planets [Planet Ubuntu Users, Planet Ubuntu UK] don't get flooded — I apologise if they do :)
Thursday, October 25th, 2007
It was inevitable and now it's happened. It's taken just over a month for my blog to attract its first comment spam. I'm unsure at the moment whether the comment was entered manually or by some kind of bot [I'm guessing a bot.] Either way I'm actually quite pleased that it's finally happened; now I can start developing a defence system!
Tuesday, October 16th, 2007
My prototype blogging application, "Whird", has survived its recent digging intact. I'd like to thank everyone who dug [is that grammatically correct?] my recent post; you've really helped me to get a better understanding of Whird's performance under stressful loads.
The post itself made it to both the front page of Digg, and Del.icio.us. It attracted more than 15,000 unique visitors and 30,000 page views over a two day period. Thankfully the increased traffic caused no noticeable/obvious problems and the site performed well. I didn't have to make any modifications to the code or enable any page caching.
To be honest, I wasn't quite sure what to expect in the way of traffic from a successful Digg — so I contacted a fellow Ubuntu user who also had an article featured on Digg this weekend. Jonathan was kind enough to share his Wordpress stats with me and from what I can tell, the stats are roughly what one should expect.
All-in-all I'm really pleased with how things have gone. Hopefully it'll encourage me to pull my finger out and get the initial release of Whird out the door.
Saturday, September 29th, 2007
CrunchBang.org is currently served up by an early prototype of a new PHP blogging application. I'm creating the application [which I've provisionally named Whird] on my development server at home. I mention this because I've now reached a point where I need to make a decision.
An issue and potential problem has arisen due to the fact that I've been developing the application in the style of this here site. While this hasn't given me any problems so far, I don't want to cause any additional work for myself in the future. I need to branch the project so that any customisations that I make specific to CrunchBang.org don't slip into the final project.
I guess this is a milestone in this project's history; from now on I'll be writing for a more generic code base. Hopefully it'll stop things from getting too messy round here.