<?xml version="1.0"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
   <channel>
      <pubDate>Sat, 11 Oct 2008 07:03:29 GMT</pubDate>
      <lastBuildDate>Sat, 11 Oct 2008 07:03:29 GMT</lastBuildDate>
      <language>en</language>
      <docs>http://www.rssboard.org/rss-specification</docs>
      <title>CrunchBang ~ antispam</title>
      <link>http://crunchbang.org/tags/antispam/</link>
      <description>Code, Design &amp; GNU/Linux</description>

<item>
    <title>5 Or More Consecutive Consonants</title>
    <link>http://crunchbang.org/archives/2008/05/10/5-or-more-consecutive-consonants/</link>
    <pubDate>Sat, 10 May 2008 09:29:11 GMT</pubDate>
    <dc:creator>Philip Newborough</dc:creator>
    <guid>http://crunchbang.org/archives/2008/05/10/5-or-more-consecutive-consonants/</guid>
    <description><![CDATA[
    <p><img src="http://crunchbang.org/uploads/051008084936-carol_vorderman.jpg" alt="Carol Vorderman" style="float:left;border:0px;margin-right:20px;margin-bottom:10px; padding:4px; background: #babdb6;" /></p>

<p>&#34;A consonant please, Carol, and another, and another, and another, and another.&#34; Actually, this post is not about <a href="http://en.wikipedia.org/wiki/Carol_Vorderman" title="Wikipedia - Carol Vorderman">Carol Vorderman</a> or <a href="http://en.wikipedia.org/wiki/Countdown_%28game_show%29" title="Wikipedia - Countdown">Countdown</a>; it is about some interesting [<em>?</em>] script output I came across while attempting to write a new spam filter. I will explain&#8230;</p>

<p>Lately my website has been receiving some rather odd junk comments. The comments make no sense, and they have quite obviously been sent by some automated junk-flinging robot; they seem to be constructed from random characters. Apart from being meaningless, these comments <em>were</em> also becoming a nuisance, as they <em>were</em> easily slipping past my existing keyword filters.</p>

<p>So, the other night I decided to sit down and write a new filter to try to catch these random-character junk comments. I started by analysing some previously submitted comments to look for common patterns. One pattern I found was that the comments contained multiple strings of 5 or more consecutive consonants. Thinking this to be unusual, I ran some tests against a <a href="http://crunchbang.org/misc/common_words.txt" title="flat file containing 21110 common English words">flat file containing 21110 common English words</a>. I thought the results were interesting. Here is what I found:</p>

<ul>
<li><a href="http://crunchbang.org/wiki/strings-containing-5-or-more-consecutive-consonants/ " title="85 unique strings containing 5 or more consecutive consonants.">85 unique strings containing 5 or more consecutive consonants</a>.</li>
<li><a href="http://crunchbang.org/wiki/words-containing-5-or-more-consecutive-consonants/ " title="113 words containing 5 or more consecutive consonants.">113 words containing 5 or more consecutive consonants</a>.</li>
<li>9 words containing 5 or more consecutive consonants and no vowels (counting y as a consonant): <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=crypt" title="dict.org - crypt">crypt</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=lymph" title="dict.org - lymph">lymph</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=lynch" title="dict.org - lynch">lynch</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=myrrh" title="dict.org - myrrh">myrrh</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=nymph" title="dict.org - nymph">nymph</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=pygmy" title="dict.org - pygmy">pygmy</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=rhythm" title="dict.org - rhythm">rhythm</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=sylph" title="dict.org - sylph">sylph</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=tryst" title="dict.org - tryst">tryst</a></li>
<li>10 words containing 6 or more consecutive consonants: <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=latchstring" title="dict.org - latchstring">latchstring</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=metempsychosis" title="dict.org - metempsychosis">metempsychosis</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=polysyllabic" title="dict.org - polysyllabic">polysyllabic</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=polysyllable" title="dict.org - polysyllable">polysyllable</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=porphyry" title="dict.org - porphyry">porphyry</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=rhythm" title="dict.org - rhythm">rhythm</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=skyscraper" title="dict.org - skyscraper">skyscraper</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=strychnine" title="dict.org - strychnine">strychnine</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=synchronize" title="dict.org - synchronize">synchronize</a>, <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=synchronous" title="dict.org - synchronous">synchronous</a></li>
<li>1 word containing 6 consecutive consonants and no vowels: <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=rhythm" title="dict.org - rhythm">rhythm</a></li>
<li>1 word containing 7 consecutive consonants: <a href="http://www.dict.org/bin/Dict?Form=Dict1&amp;Strategy=*&amp;Database=*&amp;Query=strychnine" title="dict.org - strychnine">strychnine</a></li>
</ul>
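<p>The original script was not published, so what follows is only a minimal sketch of this kind of test. It assumes a flat file with one word per line (like the common words file linked above) and treats every letter other than a, e, i, o and u as a consonant, so y counts as a consonant, as the results above imply. The function names <code>consonant_runs</code>, <code>longest_run</code>, <code>analyse</code> and <code>load_words</code> are hypothetical.</p>

```python
import re

# Assumption: a consonant is any ASCII letter except a, e, i, o and u,
# so "y" counts as a consonant (which is how "rhythm" and "crypt" end up
# with no vowels in the results above).
CONSONANTS = "[b-df-hj-np-tv-z]"
RUN_OF_5 = re.compile(CONSONANTS + "{5,}", re.IGNORECASE)
ANY_RUN = re.compile(CONSONANTS + "+", re.IGNORECASE)
VOWEL = re.compile("[aeiou]", re.IGNORECASE)


def consonant_runs(word):
    """Return each maximal run of 5 or more consecutive consonants in word."""
    return RUN_OF_5.findall(word)


def longest_run(word):
    """Return the length of the longest consonant run in word (0 if none)."""
    return max((len(run) for run in ANY_RUN.findall(word)), default=0)


def analyse(words):
    """Summarise a word list in the same categories as the results above."""
    matching = [w for w in words if consonant_runs(w)]
    return {
        "unique_runs": sorted({r.lower() for w in matching for r in consonant_runs(w)}),
        "words": matching,
        "no_vowels": [w for w in matching if not VOWEL.search(w)],
        "six_or_more": [w for w in matching if longest_run(w) >= 6],
        "seven_or_more": [w for w in matching if longest_run(w) >= 7],
    }


def load_words(path):
    """Read a flat file with one word per line (assumed file format)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

<p>Usage might look like <code>summary = analyse(load_words("common_words.txt"))</code> followed by <code>len(summary["words"])</code>. Whether this reproduces the exact counts above depends on the word list and on details of the original script that the post does not give.</p>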

<p>I should state that the above results are in no way definitive. I know this because I also ran the same test against another, much larger file containing 311141 words found in the <a href="http://en.wikipedia.org/wiki/An_American_Dictionary_of_the_English_Language" title="Wikipedia - Merriam-Webster dictionary">Merriam-Webster dictionary</a>, which gave different results. Still, by using the results of the initial test, I was able to construct a list of safe words to use with my new spam filter.</p>
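<p>The post does not include the filter itself. Purely as an illustration, a filter built on this idea might flag a comment when it contains words with 5 or more consecutive consonants that are not on the safe list. The <code>SAFE_WORDS</code> set below is a hypothetical excerpt built from the results above, and the threshold of 2 suspicious words is an arbitrary assumption, not a tested value.</p>

```python
import re

# Same assumption as the word-list test: consonants are all letters
# except a, e, i, o and u, so "y" counts as a consonant.
RUN_OF_5 = re.compile("[b-df-hj-np-tv-z]{5,}", re.IGNORECASE)
WORD = re.compile("[a-z]+", re.IGNORECASE)

# Hypothetical safe list: an excerpt of the real words found by the test above.
SAFE_WORDS = {
    "crypt", "lymph", "lynch", "myrrh", "nymph", "pygmy", "rhythm", "sylph",
    "tryst", "latchstring", "metempsychosis", "polysyllabic", "polysyllable",
    "porphyry", "skyscraper", "strychnine", "synchronize", "synchronous",
}


def suspicious_words(comment, safe_words=SAFE_WORDS):
    """Return words with 5+ consecutive consonants that are not on the safe list."""
    return [
        w for w in WORD.findall(comment)
        if RUN_OF_5.search(w) and w.lower() not in safe_words
    ]


def looks_like_random_junk(comment, threshold=2, safe_words=SAFE_WORDS):
    """Flag a comment once it contains `threshold` or more suspicious words.

    The default threshold of 2 is an arbitrary assumption.
    """
    return len(suspicious_words(comment, safe_words)) >= threshold
```

<p>Requiring more than one suspicious word is one way to tolerate a single real word missing from the safe list (a plural such as "rhythms", for example), at the cost of letting very short junk comments through.</p>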

<p><em>Finally, yes, I did consider not writing this post; however, I am sure that publishing these results will not change anything. Besides, Arthur, my 80-year-old neighbour, is the biggest Countdown fan on the planet. He is also quite Internet-savvy and definitely thinks <a href="http://images.google.com/images?q=carol+vorderman+countdown" title="Carol Vorderman, hot or not?">Carol Vorderman is hot</a>, so he may find these results quite useful for increasing his daily Countdown score!</em></p>

    <p style="font-size:smaller;">Tags: <a href="http://crunchbang.org/tags/antispam/" title="Browse all posts tagged with &#8220;antispam&#8221;">antispam</a>, <a href="http://crunchbang.org/tags/language/" title="Browse all posts tagged with &#8220;language&#8221;">language</a>, <a href="http://crunchbang.org/tags/programming/" title="Browse all posts tagged with &#8220;programming&#8221;">programming</a>, <a href="http://crunchbang.org/tags/projects/" title="Browse all posts tagged with &#8220;projects&#8221;">projects</a>, <a href="http://crunchbang.org/tags/whird/" title="Browse all posts tagged with &#8220;whird&#8221;">whird</a></p>
    ]]></description>
</item>

<item>
    <title>Human Automated Captcha Reader</title>
    <link>http://crunchbang.org/archives/2007/10/31/human-automated-captcha-reader/</link>
    <pubDate>Wed, 31 Oct 2007 15:21:50 GMT</pubDate>
    <dc:creator>Philip Newborough</dc:creator>
    <guid>http://crunchbang.org/archives/2007/10/31/human-automated-captcha-reader/</guid>
    <description><![CDATA[
    <p>A quote from the article &#34;<a href="http://pandalabs.pandasecurity.com/archive/A-new-way-of-social-engineering.aspx" title="A new way of social engineering">A new way of social engineering</a>&#34; by PandaLabs:</p>

<blockquote>
  <p>Now, look at yourself, you are a human automated captcha reader. If you type the correct interpretation of the image, you are sending the information necessary to break the protection of the targeted site. This attack could be used to create massive mail accounts, for comment posting&#8230;</p>
</blockquote>

<p>My blog has recently begun attracting its first comment spam, so I found this article really interesting. Personally, I&#39;ve never liked captchas. Apart from often being really hard to read, they also present accessibility issues. I think I&#39;m going to stick to moderating my comments, at least until I&#39;ve created a better solution!</p>

<p>Having said that, I&#39;d really like to put some comment spam protection in place for <a href="http://crunchbang.org/tags/whird/" title="See all Whird related project posts.">Whird</a> before its first release. It seems only the proper [<em>responsible</em>] thing to do. I can guarantee it will not involve the use of a captcha.</p>

<p>Thanks to <a href="http://fortytwo.ch/blog/archives/2007/10/#e2007-10-31T08_10_51.txt" title="An almost completely debian-unrelated weblog.">Adrian von Bidder</a> for pointing out the PandaLabs article.</p>

    <p style="font-size:smaller;">Tags: <a href="http://crunchbang.org/tags/antispam/" title="Browse all posts tagged with &#8220;antispam&#8221;">antispam</a>, <a href="http://crunchbang.org/tags/projects/" title="Browse all posts tagged with &#8220;projects&#8221;">projects</a>, <a href="http://crunchbang.org/tags/whird/" title="Browse all posts tagged with &#8220;whird&#8221;">whird</a></p>
    ]]></description>
</item>

<item>
    <title>My First Comment Spam</title>
    <link>http://crunchbang.org/archives/2007/10/25/my-first-comment-spam/</link>
    <pubDate>Thu, 25 Oct 2007 09:09:58 GMT</pubDate>
    <dc:creator>Philip Newborough</dc:creator>
    <guid>http://crunchbang.org/archives/2007/10/25/my-first-comment-spam/</guid>
    <description><![CDATA[
    <p>It was inevitable and now it&#39;s happened. It&#39;s taken just over a month for my blog to attract its first comment spam. I&#39;m unsure at the moment whether the comment was entered manually or by some kind of bot [<em>I&#39;m guessing a bot</em>]. Either way, I&#39;m actually quite pleased that it&#39;s finally happened; now I can start developing a defence system!</p>

    <p style="font-size:smaller;">Tags: <a href="http://crunchbang.org/tags/antispam/" title="Browse all posts tagged with &#8220;antispam&#8221;">antispam</a>, <a href="http://crunchbang.org/tags/projects/" title="Browse all posts tagged with &#8220;projects&#8221;">projects</a>, <a href="http://crunchbang.org/tags/whird/" title="Browse all posts tagged with &#8220;whird&#8221;">whird</a></p>
    ]]></description>
</item>

 </channel>
</rss>