The ghosts of Children’s House

I’ve been checking the webserver logfiles here on MT.Net and noticed that a number of Google searches have brought people here looking for information on the Children’s House of Raleigh (CHR). Every time I discover someone else searching for that now-defunct school it makes me sad. Our daughter, like many other kids, got a great education at CHR. I felt a real kinship with the staff and other parents. Then the wheels came off. I’m not really sure what happened, but for whatever reason it just didn’t work out.

It’s tough to see something you poured love and work into come to an inglorious end.

Upping the spambot ante

This morning I was surprised to see that a spammer had apparently breached my WordPress anti-spambot gauntlet. What does this mean in English, you ask? A potential hacker actually succeeded in registering an account on MT.Net, from which he could potentially attack my website.

At first I thought a bot had solved my CAPTCHA challenge, but after looking at the log entries it does not appear that this was an automated attack. Some dumb schmuck actually typed in the code by hand. That’s what most visitors to my website do, but most people don’t do it using email and IP addresses associated with hackers.

I’ve since turned on SABRE’s RBL lookup tests. These automatically check each incoming IP address against a list of suspect addresses. If there’s a match, the rogue visitor gets booted automatically before he even begins.
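For readers curious how an RBL lookup works in general, here is a minimal sketch in Python. It is not SABRE's code. It assumes the common DNS blocklist convention: reverse the octets of the IPv4 address, append the blocklist's zone, and treat any DNS answer as "listed". The zone name `dnsbl.example.org` and both function names are placeholders for illustration only.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a blocklist lookup would query for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blocklist zone answers for this IP (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No record is the usual "not listed" answer; resolver failures land here too.
        return False


if __name__ == "__main__":
    # "dnsbl.example.org" is a hypothetical zone, not a real blocklist.
    print(dnsbl_query_name("65.55.104.132", "dnsbl.example.org"))
```

Treating a resolver failure as "not listed" means the check fails open, which keeps legitimate visitors from being locked out when DNS is having a bad day.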

It’s not perfect security, but it’s one of the many defenses needed to protect a website.

MSN can’t take no for an answer

Earlier this week I banned MSN’s msnbot from spidering my website. I did this with an entry in the robots.txt file:

User-Agent: msnbot
Disallow: /

I checked with MSN’s robots.txt verifier to make sure this would keep msnbot from spidering my site. The only problem is that I had also blocked the MSN IP addresses. Thus msnbot couldn’t fetch robots.txt to learn that it was no longer wanted.
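The same rule can also be checked locally. This is a small sketch using Python's standard-library `urllib.robotparser`, fed the two-line robots.txt shown above. The helper name `allowed` and the user agent `otherbot` are made up for the example, and the result only tells you what a well-behaved parser would conclude, not what msnbot actually does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt entry from the post.
ROBOTS_TXT = """\
User-Agent: msnbot
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "http://www.markturner.net/"
    print("msnbot allowed:", allowed("msnbot", site))
    print("otherbot allowed:", allowed("otherbot", site))  # hypothetical other crawler
```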

So, I unblocked the IPs and allowed msnbot to grab the robots.txt file, which it did repeatedly (this is a small sample):
Continue reading

More MSN search bot shenanigans

Got more funny hits this morning from MSN’s search bot (emphasis mine):

65.55.104.132 – – [26/Oct/2009:10:20:24 -0400] “GET /2009/03/06/sailing-this-weekend/ HTTP/1.1” 200 4398 “-” “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)”
65.55.104.132 – – [26/Oct/2009:10:20:25 -0400] “GET /wp-content/themes/mtdotnet/style.css HTTP/1.1” 200 10345 “http://www.markturner.net/2009/03/06/sailing-this-weekend/” “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)”
65.55.104.132 – – [26/Oct/2009:10:20:25 -0400] “GET /wp-includes/js/comment-reply.js?ver=20090102 HTTP/1.1” 200 786 “http://www.markturner.net/2009/03/06/sailing-this-weekend/” “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)”

The IP address 65.55.104.132 resolves to msnbot-65-55-104-132.search.msn.com.
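The mismatch here is between where the hit comes from and what it claims to be. A rough Python sketch of that check follows. It assumes you already have the reverse-DNS hostname for the IP. The function name is mine, and the suffix `.search.msn.com` and token `msnbot` are simply taken from the hostnames and user agents that appear in these logs.

```python
def is_disguised_crawler(hostname: str, user_agent: str,
                         crawler_suffix: str = ".search.msn.com",
                         crawler_token: str = "msnbot") -> bool:
    """Flag a hit from a crawler hostname whose user agent does not say so."""
    from_crawler_host = hostname.lower().rstrip(".").endswith(crawler_suffix)
    announces_itself = crawler_token in user_agent.lower()
    return from_crawler_host and not announces_itself


if __name__ == "__main__":
    host = "msnbot-65-55-104-132.search.msn.com"
    agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1)"
    print(is_disguised_crawler(host, agent))
```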

I’m about ready to kick MSN off my sites permanently.

Update 10:41: Done. MSN is no longer welcome at my site. I’ve never banned a search engine before but this is inexcusable behavior and Microsoft should know better.

MSN now snooping anonymously

In a very strange occurrence, my website got visited from what appears to be an MSN spider that didn’t identify itself (fake user agent has been highlighted below):

65.55.231.117 – – [22/Oct/2009:10:02:07 -0400] “GET /robots.txt HTTP/1.1” 200 24 “-” “Mozilla/4.0”
65.55.231.117 – – [22/Oct/2009:10:02:07 -0400] “GET /wp-content/uploads/2009/10/oculan-screenshot-300x230.png HTTP/1.1” 200 120896 “-” “Mozilla/4.0”
65.55.210.80 – – [22/Oct/2009:10:02:20 -0400] “GET /page/2/?q=node%2F1699 HTTP/1.1” 200 29922 “-” “msnbot/1.1 (+http://search.msn.com/msnbot.htm)”
65.55.230.228 – – [22/Oct/2009:10:08:13 -0400] “GET /robots.txt HTTP/1.1” 200 24 “-” “Mozilla/4.0”
65.55.230.228 – – [22/Oct/2009:10:08:13 -0400] “GET /2009/10/15/big-names-in-sources-of-suspicious-traffic/ HTTP/1.1” 200 10502 “-” “Mozilla/4.0”

65.55.230.228 resolves to msnbot-65-55-230-228.search.msn.com. 65.55.231.117 is a Microsoft address but doesn’t have an entry in DNS.
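A reverse lookup alone can be faked by whoever controls the PTR record, so a common extra step is to resolve the returned hostname forward again and confirm it maps back to the same IP. Here is a hedged Python sketch of that idea. The function names are mine, the default suffix is just the one seen in these logs, and the resolver functions are passed in so the logic can be tried without touching the network.

```python
import socket
from typing import Callable, List


def _reverse(ip: str) -> str:
    """PTR lookup via the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    """A-record lookup via the system resolver."""
    return socket.gethostbyname_ex(hostname)[2]


def confirm_crawler_ip(ip: str,
                       suffix: str = ".search.msn.com",
                       reverse: Callable[[str], str] = _reverse,
                       forward: Callable[[str], List[str]] = _forward) -> bool:
    """Return True only if the PTR name has the suffix AND resolves back to ip."""
    try:
        hostname = reverse(ip)
    except OSError:
        return False  # no reverse DNS entry at all
    if not hostname.lower().rstrip(".").endswith(suffix):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False


if __name__ == "__main__":
    # Live lookup; the answer depends on whatever DNS says today.
    print(confirm_crawler_ip("65.55.230.228"))
```

An address with no PTR record, like 65.55.231.117 above, simply fails this check, which is why a whois lookup is still needed for it.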

Just to make sure someone wasn’t spoofing the MSN namespace, I checked the whois records for these hosts. Sure enough, they belong to Microsoft:
Continue reading

Another mystery bot example

Here’s another example of bizarre hits. Two hits for this six-year-old page came in roughly half an hour apart:

138.162.8.57 – – [15/Oct/2009:12:12:16 -0400] “GET /2003/07/28/blimps-and-other-things-bizarre/ HTTP/1.1” 200 5094 “-” “Mozilla/4.0 (compatible;)”

[snip]

138.163.106.72 – – [15/Oct/2009:12:44:33 -0400] “GET /2003/07/28/blimps-and-other-things-bizarre/ HTTP/1.1” 200 5094 “-” “Mozilla/4.0 (compatible;)”

The first resolves to gate2-jacksonville.nmci.navy.mil and the second resolves to gate2-bremerton.nmci.navy.mil. It looks like there’s a full-scale botnet attack going on behind the DoD firewalls right now.

More clues in the government botnet mystery

The plot thickens in the government botnet mystery I recently wrote about. This morning I got hits from the Navy Marine Corps Intranet (NMCI), specifically a host identified as gate3-norfolk.nmci.navy.mil.

Again, it started off innocently with a Google search, with the browser properly identified:

138.162.0.41 – – [15/Oct/2009:08:36:27 -0400] “GET /2008/12/19/beware-the-police-protective-fund/ HTTP/1.1” 200 6377 “http://www.google.com/search?hl=en&source=hp&q=police+protective+fund&aq=f&oq=&aqi=g10” “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)”

A few more hits down, I see the random jumping around I’d seen before:

138.162.0.41 – – [15/Oct/2009:08:36:30 -0400] “GET /2008/12/20/a-mange-in-a-wager/ HTTP/1.1” 200 4191 “-” “Mozilla/4.0 (compatible;)”
138.162.0.42 – – [15/Oct/2009:08:36:30 -0400] “GET /2003/07/29/goodbye-bplog-hello-drupal/ HTTP/1.1” 200 14042 “-” “Mozilla/4.0 (compatible;)”
138.162.0.44 – – [15/Oct/2009:08:36:30 -0400] “GET /2003/07/27/action-packed_weekend/ HTTP/1.1” 200 4371 “-” “Mozilla/4.0 (compatible;)”
138.162.0.43 – – [15/Oct/2009:08:36:30 -0400] “GET /2003/07/24/keys_keys_keys/ HTTP/1.1” 200 5531 “-” “Mozilla/4.0 (compatible;)”
138.162.0.45 – – [15/Oct/2009:08:36:31 -0400] “GET /2008/12/18/progress/feed/ HTTP/1.1” 200 1973 “-” “Mozilla/4.0 (compatible;)”

My site is apparently being indexed by computers on a government-run network, but the question is exactly what is indexing it? Is this some sort of proxy technology that government gateways are now using, sampling websites that government users are viewing to ensure that these websites don’t have questionable content? Or, is this a botnet of compromised government computers as I recently suggested? Or (tinfoil hats, please), is this a secret spidering project run by a three-letter agency that uses the gateways of various government departments as cover?

The bottom line is these hits are inconsistent with a human browser. Beyond that I’m not sure what to make of them.
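To make "inconsistent with a human browser" a bit more concrete, here is a rough Python sketch that scans combined-format log lines for the pattern seen above: several hits in the same second, from neighboring addresses, with no referrer and the bare "Mozilla/4.0 (compatible;)" user agent. The regex, the three-hit threshold, and grouping by the first three octets are my own assumptions, not an established detection rule. Note that the blog's formatting turns the log's plain quotes and hyphens into typographic ones, so the sample below uses the plain ASCII a real logfile would contain.

```python
import re
from collections import defaultdict

# Apache "combined" log format, loosely matched.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

GENERIC_AGENT = "Mozilla/4.0 (compatible;)"

SAMPLE = [
    '138.162.0.41 - - [15/Oct/2009:08:36:30 -0400] "GET /2008/12/20/a-mange-in-a-wager/ HTTP/1.1" 200 4191 "-" "Mozilla/4.0 (compatible;)"',
    '138.162.0.42 - - [15/Oct/2009:08:36:30 -0400] "GET /2003/07/29/goodbye-bplog-hello-drupal/ HTTP/1.1" 200 14042 "-" "Mozilla/4.0 (compatible;)"',
    '138.162.0.44 - - [15/Oct/2009:08:36:30 -0400] "GET /2003/07/27/action-packed_weekend/ HTTP/1.1" 200 4371 "-" "Mozilla/4.0 (compatible;)"',
    '138.162.0.43 - - [15/Oct/2009:08:36:30 -0400] "GET /2003/07/24/keys_keys_keys/ HTTP/1.1" 200 5531 "-" "Mozilla/4.0 (compatible;)"',
    '138.162.0.45 - - [15/Oct/2009:08:36:31 -0400] "GET /2008/12/18/progress/feed/ HTTP/1.1" 200 1973 "-" "Mozilla/4.0 (compatible;)"',
]


def suspicious_bursts(lines, min_hits=3):
    """Group referrer-less, generic-agent hits by (first three octets, second).

    Returns only the groups with at least min_hits requests. IPv4 only.
    """
    buckets = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        if m["agent"] != GENERIC_AGENT or m["referer"] != "-":
            continue
        prefix = m["ip"].rsplit(".", 1)[0]
        buckets[(prefix, m["ts"])].append(m["request"])
    return {key: reqs for key, reqs in buckets.items() if len(reqs) >= min_hits}


if __name__ == "__main__":
    for (prefix, ts), reqs in suspicious_bursts(SAMPLE).items():
        print(prefix, ts, len(reqs), "hits")
```

A check like this says nothing about who is behind the traffic. It only separates the machine-like bursts from ordinary page views so they are easier to study.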

U.S. Government networks thoroughly penetrated

I saw this in my webserver logs today, from the U.S. Nuclear Regulatory Commission. Clearly it’s a botnet bot.

148.184.174.62 – – [13/Oct/2009:12:25:44 -0400] “GET /wp-content/themes/mtdotnet/images/kubrickfooter.jpg HTTP/1.1” 200 2443 “http://www.markturner.net/2009/10/01/michael-jordans-net-worth/” “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)”
148.184.174.62 – – [13/Oct/2009:12:25:44 -0400] “GET /2009/10/02/oculan-in-the-news/feed/ HTTP/1.1” 200 797 “-” “Mozilla/4.0 (compatible;)”
148.184.174.62 – – [13/Oct/2009:12:25:44 -0400] “GET /2009/10/02/u2-yesterday-and-today/ HTTP/1.1” 200 6617 “-” “Mozilla/4.0 (compatible;)”
148.184.174.62 – – [13/Oct/2009:12:25:44 -0400] “GET /2009/09/30/juggling-breakthrough/feed/ HTTP/1.1” 200 2083 “-” “Mozilla/4.0 (compatible;)”
148.184.174.62 – – [13/Oct/2009:12:25:44 -0400] “GET /2009/09/30/netflixs-plan-to-take-over-the-world/ HTTP/1.1” 200 6419 “-” “Mozilla/4.0 (compatible;)”
148.184.174.62 – – [13/Oct/2009:12:25:45 -0400] “GET /2009/10/02/u2-yesterday-and-today/feed/ HTTP/1.1” 200 1375 “-” “Mozilla/4.0 (compatible;)”
148.184.174.62 – – [13/Oct/2009:12:25:45 -0400] “GET /2003/07/27/action-packed-weekend/feed/ HTTP/1.1” 200 1260 “-” “Mozilla/4.0 (compatible;)”

Continue reading