Inktomi Slurp Attack


Don McR
Mon 11th Aug '03, 11:55pm
On August 3, 2003, the vbulletin forum on my site was targeted by the "Slurp" bot from Inktomi. For 36 hours, there were as many as 50 connections from 66.196.72.xx. I knew at the time that it was a search engine spider, but assumed that its activity would be benign. That was until I checked my logs. In those 36 hours, the spider generated 10GB of data transfer. This is over 2/3 of our monthly allotment and means that I will likely have to pay for excess bandwidth. As soon as I discovered this excess traffic, I checked the web and found out how to use "robots.txt" to exclude bot access to the vbulletin directory. This worked, and sent the bot on its way.
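For reference, a robots.txt that keeps Slurp out of the forum while leaving the rest of the site open might look like the ROBOTS_TXT string below; the file itself goes in the site's root directory. The `/vbulletin/` path is an assumption (use whatever directory the forum is actually installed in). An empty `Disallow:` line means nothing is disallowed for the remaining robots. The Python wrapper around the standard `urllib.robotparser` module is only there to show how a compliant crawler would read those rules; `can_crawl` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt contents. "/vbulletin/" is an assumed install path;
# substitute the directory the forum really lives in.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /vbulletin/

User-agent: *
Disallow:
"""


def can_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a robots.txt-compliant crawler may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "http://www.audioheritage.org"
    print(can_crawl("Slurp", site + "/vbulletin/index.php"))      # Slurp is kept out of the forum
    print(can_crawl("Slurp", site + "/index.html"))               # the rest of the site stays open
    print(can_crawl("Googlebot", site + "/vbulletin/index.php"))  # other robots are unaffected
```

This only helps with robots that honor the file; it does nothing against clients that ignore it.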

I assumed the bulk of my problems were over, but this was not the case. Just this weekend, access to my site was locked due to exceeding the disk storage allotment from our ISP. An investigation revealed that the Inktomi bot had generated so much activity that the log file of its access was 155MB! I have subsequently deleted this to restore access.

I fully blame Inktomi for this mess and am in the process of seeking restitution. However, I fully realize that my prospects are slim. The reason I am writing on this forum is to ask whether something about the structure of vbulletin compounds this problem. I would also like to know if something can be done in the standard installation so that others do not experience a similar horror story. Obviously, I know that "robots.txt" is a solution. My guess is that few users will even be aware of the need for it until after they experience an attack like I did.

I have had previous forum packages installed on my site and have been visited by "Slurp" before. However, nothing like this has ever happened. For whatever reason, once the bot hits a vbulletin forum, it appears to latch on in an endless loop. I don't know that it ever would have detached on its own.

Regards
Don McRitchie
Webmaster
Lansing Heritage Website
http://www.audioheritage.org

firewire
Tue 12th Aug '03, 3:06am
The server logs should really NOT count for your disk quota, that's a bad business practice for a provider. :(

Your best bet is robots.txt. Most search engines and web spiders respect that file; I know of only a few that don't, mostly user-controlled mirroring tools. There should be a vB option to limit requests from a single IP per time period, but I realize this is difficult to achieve.
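A per-IP limit of the kind described above can be sketched as a sliding window: remember when each address made its recent requests and refuse new ones once it passes a threshold. This is a minimal in-memory sketch in Python, not an existing vBulletin option; the class name and the limits used in the usage note are illustrative assumptions. A PHP forum would have to keep these timestamps somewhere shared between requests, such as its database, which is part of why this is harder to build than it sounds.

```python
import time
from collections import defaultdict, deque


class PerIPRateLimiter:
    """Sliding-window limiter: at most max_requests per window_seconds per client IP."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock                # injectable so the logic can be tested
        self._hits = defaultdict(deque)   # ip -> timestamps of accepted requests

    def allow(self, ip):
        """Return True and record the request if ip is under its limit, else False."""
        now = self.clock()
        hits = self._hits[ip]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False                  # refused requests are not counted
        hits.append(now)
        return True
```

In a request handler this might be used as `limiter = PerIPRateLimiter(30, 60)` followed by `if not limiter.allow(client_ip):` and an HTTP 503 response (both the numbers and the status code are assumptions). Because refused requests are not recorded, a client that keeps hammering is let back in as soon as its older requests age out of the window.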

Don McR
Tue 12th Aug '03, 11:02am
The actual server logs do not count against the disk quota. However, there is a utility that generates detailed reports on site access on a weekly basis. If you don't delete the old reports, they count against the quota. One of the reports lists access to individual directories. The report for the vbulletin directory was 155MB for the one week that included the "Slurp" attack.
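A per-directory report like this can be approximated straight from the raw access log by adding up the bytes sent to each client IP for requests under the forum directory; a crawler like the one above stands out immediately. This sketch assumes Apache's Common or Combined Log Format and an install path of `/vbulletin/`; the function name and the log file name in the usage note are hypothetical.

```python
import re
from collections import Counter

# Matches the start of a Common/Combined Log Format line:
# host ident user [time] "METHOD path PROTOCOL" status bytes
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def bytes_by_ip(lines, prefix="/vbulletin/"):
    """Sum response bytes per client IP for requests whose path starts with prefix."""
    totals = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or not m.group("path").startswith(prefix):
            continue  # unparseable line or outside the forum directory
        size = m.group("size")
        totals[m.group("ip")] += 0 if size == "-" else int(size)  # "-" means no body
    return totals
```

For example, `bytes_by_ip(open("access_log")).most_common(10)` would list the ten heaviest clients of the forum directory.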

I'm still curious why the spider would latch onto vbulletin and not let go. The forum database is 20MB, yet the bot downloaded 10GB just from the vbulletin directory. That is the equivalent of downloading the entire forum 500 times. I assume the dynamic nature of the forum has something to do with it, but this is ludicrous. Of course, Inktomi (Yahoo) will not respond to my emails.
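One plausible explanation for the 500-fold figure, offered as a hypothesis rather than a diagnosis: if the forum stamps a fresh session ID into every link it serves to a visitor that does not keep cookies, each page a crawler fetches points to "new" URLs, and a crawler that only skips exact duplicates never runs out of work. The toy model below compares the same small forum with and without session IDs. Everything in it is hypothetical: the function names, the `/forum/pageN` URLs, the `s` parameter, and the assumption that every request gets a new session.

```python
from collections import deque
from itertools import count


def make_toy_forum(num_pages, session_ids):
    """Return fetch(url) for a toy forum where every page links to every page.

    If session_ids is True, each response stamps its links with a fresh,
    hypothetical "s" session parameter, as if every request began a new session.
    """
    next_sid = count()

    def fetch(url):
        sid = next(next_sid)
        suffix = f"?s={sid}" if session_ids else ""
        return [f"/forum/page{n}{suffix}" for n in range(num_pages)]

    return fetch


def crawl(fetch, start, budget):
    """Breadth-first crawl that skips exact-duplicate URLs; returns pages fetched."""
    seen = {start}
    queue = deque([start])
    fetched = 0
    while queue and fetched < budget:
        url = queue.popleft()
        fetched += 1
        for link in fetch(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched
```

With `make_toy_forum(5, session_ids=False)`, `crawl(..., "/forum/page0", 1000)` stops after 5 fetches; with `session_ids=True` it runs until the 1000-fetch budget is exhausted. Real crawlers may use smarter duplicate detection, so this only illustrates the mechanism.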

Regards
Don McRitchie
Webmaster
Lansing Heritage Website
http://www.audioheritage.org

HostDirectory
Fri 22nd Aug '03, 2:36pm
My forums are new and very small - www.hostcompanies.com/forums. However, that hasn't stopped the Inktomi bot from using over 1GB of my bandwidth in the last 21 days!

Joe
Fri 22nd Aug '03, 3:15pm
Block it. Better yet, add some code to your vB script so that bots are not shown session IDs. Apparently Slurp doesn't handle session IDs very well...
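The idea can be sketched as follows: when the request's User-Agent looks like a crawler, build the forum's links without the session parameter, so every visit produces the same URLs. vBulletin is written in PHP, so this Python version shows the logic only and is not a drop-in patch. The parameter name `s`, the `BOT_MARKERS` list, and the function names are assumptions for illustration, not vBulletin's actual code or settings.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical, lowercase substrings that identify crawlers by User-Agent.
BOT_MARKERS = ("slurp", "googlebot")

# Assumption: the forum carries its session hash in a query parameter named "s".
SESSION_PARAM = "s"


def is_bot(user_agent):
    """Crude crawler check: does the User-Agent contain a known bot marker?"""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def link_for(url, user_agent):
    """Return url unchanged for people, or with the session parameter removed for bots."""
    if not is_bot(user_agent):
        return url
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != SESSION_PARAM]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For a crawler, `/vbulletin/showthread.php?s=0123abcd&threadid=42` becomes `/vbulletin/showthread.php?threadid=42`, so repeat visits see the same link instead of a new one each time.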

HostDirectory
Fri 22nd Aug '03, 5:22pm
Do you know where I could get that code?
If it's just a few lines, could you post it here in the thread? I would appreciate it.
I still have plenty of bandwidth left before I go over, but I don't like a bot using so much of it. Besides, if I got hit as badly as the poor 10GB guy, I might go over my allotment.