vBulletin Spiders Directory (for updated spiders_vbulletin.xml files)



Dream
Sat 26th Jan '08, 3:29am
Heya,

I made a system where people can submit spiders and download updated spiders_vbulletin.xml files for their forum. After you submit a spider I must approve it for it to be included in the list.

Hope this helps everyone.

http://spiderlist.codeforgers.com

================================================

If you have questions about how this works, read the help:

http://spiderlist.codeforgers.com/help.php

================================================

FYI, this is a nice mod made by Paul M. to keep track of your daily spider visits and show them on forum home:

http://www.vbulletin.org/forum/showthread.php?t=167278

================================================

How to Submit a Spider:

Post the following info in this thread:

1. Spider name

2. Spider Ident (aka user-agent) (It's not the IP address! Please read here for more info (http://spiderlist.codeforgers.com/help.php))

3. Spider Website (optional)

4. Spider Contact Email (optional)

5. Spider Country (optional)

6. Web page with information about the spider (usually the About Us page on the company website)

Milado
Sat 26th Jan '08, 6:16am
You may add:
- the type of the crawler.
- how many spiders your file contains.
- when last updated.

Regards

Dream
Sat 26th Jan '08, 9:45am
http://dream.epicfailed.us/ ?

Milado
Sat 26th Jan '08, 12:00pm
http://dream.epicfailed.us/ ?
What do you mean?

Floris
Sat 26th Jan '08, 12:20pm
rofl :)

Means he recognizes his fault and went: EPIC FAIL

Nice job though Dream, I am looking forward to an updated version.

Dream
Sat 26th Jan '08, 3:29pm
Actually I meant no one seems to care about this so far. I thought the first half-hour after I posted this would go:

- dream you rock!
- YAY!!!

But it's fine, I only lost one night's sleep over this, so no biggie.


You may add:
- the type of the crawler.
- how many spiders your file contains.
- when last updated.

Regards

I added both, but not the type of the crawler; I'm not sure what you mean by that.

If you have more suggestions let me know.

Lynne
Sat 26th Jan '08, 3:40pm
I think it's very cool that you did this, Dream. Don't forget that it is the weekend and people may be out and about at the time.

Dream
Sat 26th Jan '08, 5:49pm
Thanks, yes maybe you are right :)

I use the old spiders list (http://www.vbulletin.com/forum/showpost.php?p=565415) on my forums, and I'm adding them to the new list as I see them appear on my forums, just so you know. Please only submit spiders from the old list if you know they still exist.

Milado
Sat 26th Jan '08, 8:29pm
Actually I meant no one seems to care about this so far. I thought the first half-hour after I posted this would go:

- dream you rock!
- YAY!!!

But it's fine, I only lost one night's sleep over this, so no biggie.
Never mind, thanks are not the only reason we do this work.

Dream
Sat 26th Jan '08, 8:44pm
Never mind for the type of the crawler, or never mind as I shouldn't mind people not coming into this thread?

Milado
Sat 26th Jan '08, 8:48pm
the second :)

Dream
Sat 26th Jan '08, 8:49pm
Oh ok :) cheers

Milado
Sat 26th Jan '08, 8:49pm
The types are: blog crawler, RSS crawler or crawler etc.

See the definitions in the XML file in this thread for more information http://www.vbulletin.com/forum/showpost.php?p=565415

DoE
Sat 26th Jan '08, 9:19pm
Great idea, and great job!!! I will definitely be downloading and using them, :) Thanks.
It would be nice if this thread was stickied.

I use spiders as random "foes" appearing on my forum. MSN as an Orc, Yahoo as a Goblin, etc. :cool:

Lynne
Sat 26th Jan '08, 9:28pm
I use spiders as random "foes" appearing on my forum. MSN as an Orc, Yahoo as a Goblin, etc. :cool:

DoE
Sat 26th Jan '08, 10:06pm
Well I do, it keeps with the setting, :)

Spiders appear anyways, so I figured I should put them to good use, :cool:

Lynne
Sun 27th Jan '08, 12:36am
Well I do, it keeps with the setting, :)

Spiders appear anyways, so I figured I should put them to good use, :cool:
I just remembered when I first started separating out the guests and spiders count on my forum and people were wondering what spiders were. I'm trying to imagine what they would think if I changed the spider names to "goblins" and "orcs". Sounds like some fun for October though.

Floris
Sun 27th Jan '08, 1:08am
Dream could offer a spider.xml file for the FAQ for admins to import, and instructions on how to change a phrase to link from Who's Online to the FAQ explanation. Might be cool for some sites. (just brainstorming)

iardon
Sun 27th Jan '08, 2:21am
Very nice! I'll be using this.


Heya,

I made a system where people can submit spiders and download updated spiders_vbulletin.xml files for their forum. After you submit a spider I must approve it for it to be included in the list.

Hope this helps everyone.

http://spiderlist.codeforgers.com

Dream
Sun 27th Jan '08, 2:30am
Cool :) don't forget to submit some spiders if you know of any.


Dream could offer a spider.xml file for the FAQ for admins to import, and instructions on how to change a phrase to link from Who's Online to the FAQ explanation. Might be cool for some sites. (just brainstorming)

I might not be the best person to create a FAQ about spiders, I'm just the guy who coded the system. I have to confess I don't know exactly what the "ident" is, if anyone would be so kind to explain.

Also, about those spider types (rss, search) the old XML has: does anyone know if vBulletin uses that info?

Lynne
Mon 28th Jan '08, 3:58pm
I've been trying to 'watch' my guest ips lately to add some to your list, but I really don't get too many other than yahoo, google, and msnbot (which I already have listed on my site).

Dream
Mon 28th Jan '08, 4:25pm
Yes I can't find a good spider list even in Google.

Jose Amaral Rego
Mon 28th Jan '08, 4:30pm
Do you want all spiders or just the good ones on that list of yours.

Dream
Mon 28th Jan '08, 4:31pm
I'm hoping for all existing ones.

Jose Amaral Rego
Mon 28th Jan '08, 4:52pm
This is pretty much why I do not care for having this renewed: you think you have good bots and don't care that some bots do harm, for example by taking any images you have and then posting them somewhere else.

I was thinking of something more like this, but it would seem vBulletin is moving away from using or adding an area to show the names of banned spiders coming to your board.
http://www.vbulletin.com/forum/showpost.php?p=1496224&postcount=616

Dream
Mon 28th Jan '08, 4:54pm
Of course I'm not aiming to have India website cloning bots Jose...

Dream
Mon 28th Jan '08, 8:22pm
It has CAPTCHA! I'm so proud of myself :P

Floris
Tue 29th Jan '08, 2:54am
Here are all reported spiders so far: http://www.vbulletin.com/forum/showthread.php?t=76662
Feel free to compare it against your list.

Dream
Tue 29th Jan '08, 5:53pm
No one congratulated me for my leet captcha skills :(

Thanks Floris. The only reason I haven't added all of those is that I think most don't exist anymore. I could use some help checking which ones still exist and registering them.

zappsan
Sat 9th Feb '08, 3:24pm
That's a pretty good idea. I've always been looking for an updated spiders list.


I might not be the best person to create a FAQ about spiders, I'm just the guy who coded the system. I have to confess I don't know exactly what the "ident" is, if anyone would be so kind to explain.

I'm not sure about the ident either. What would I have to put in there? It says user-agent, should I just put the info here which is displayed when I choose to display the user agent?

I am willing to help if I'm sure about what I should put into the fields.

Dream
Sat 9th Feb '08, 3:31pm
The ident or user agent is the ID of the spider, which shows in Who's Online when you choose to show the user agent. For example:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

In this case, this is the Google spider with the ident Googlebot.

Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

This is the Yahoo! spider with ident Yahoo! Slurp.
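The matching described here can be sketched in a few lines of Python. This is only an illustration, assuming detection is a case-insensitive substring search for the ident inside the user-agent string; the `SPIDERS` table and the `detect_spider` function are made-up names for this example, not vBulletin code.

```python
# Hypothetical ident -> spider name table (not the real spiders_vbulletin.xml contents).
# Idents are stored in lowercase so the comparison can ignore case.
SPIDERS = {
    "googlebot": "Google",
    "yahoo! slurp": "Yahoo!",
}


def detect_spider(user_agent):
    """Return the name of the first spider whose ident occurs in the user agent, else None."""
    ua = user_agent.lower()
    for ident, name in SPIDERS.items():
        if ident in ua:
            return name
    return None


if __name__ == "__main__":
    sample = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(detect_spider(sample))
```

An ordinary browser user agent contains none of the idents, so the function returns `None` and the visitor would be treated as a guest.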

AWS
Sat 9th Feb '08, 7:03pm
Here is a list I use.

http://www.botsvsbrowsers.com/category/1/index.html

It's the biggest I've seen so far.

Boofo
Sun 10th Feb '08, 4:09am
Here are all reported spiders so far: http://www.vbulletin.com/forum/showthread.php?t=76662
Feel free to compare it against your list.

That one hasn't been updated for a while though. Is there anything newer?

Dream
Sun 10th Feb '08, 1:00pm
I didn't want to add everything because some spiders don't even exist anymore.

I'm asking for your help in submitting the spiders you know exist to the system.

Boofo
Sun 10th Feb '08, 1:09pm
Well, all I know is I get 4 to 5 guests at a time and I know they are spiders as they don't do anything. What spiders they are, I don't know. As for any in the listing that don't exist any more, it won't hurt to leave them in there for now. But someone needs to update the list, even if only a few at a time.

Jose Amaral Rego
Sun 10th Feb '08, 1:24pm
This is a good list, but again you would need to click through or do an internet search to make sure these spiders are still active. I would still like to be able to ban a spider, as some just take your images.
http://www.user-agents.org/

Boofo
Sun 10th Feb '08, 3:07pm
This is a good list, but again you would need to click through or do an internet search to make sure these spiders are still active. I would still like to be able to ban a spider, as some just take your images.
http://www.user-agents.org/

Add this to the top of your robots.txt file:


User-Agent: Googlebot-Image
Disallow: /
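To see what a rule like this does, it can be fed to Python's standard `urllib.robotparser`. This is a minimal sketch; the helper name `allowed` is made up for illustration. Note that the rule only asks Google's image crawler to stay away, and it only has an effect on crawlers that choose to honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the post above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-Agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path="/"):
    """Return True if the parsed rules let this user agent fetch the path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("Googlebot-Image", "Googlebot"):
        print(agent, allowed(agent))
```

With only this rule in place, every other crawler is still allowed everywhere, so it does not help against bots that ignore robots.txt.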

Boofo
Sun 10th Feb '08, 3:09pm
Here is a list I use.

http://www.botsvsbrowsers.com/category/1/index.html

It's the biggest I've seen so far.

First, is that fairly accurate? And second, wanna share the xml file for it? ;)

Jose Amaral Rego
Sun 10th Feb '08, 3:14pm
Add this to the top of your robots.txt file:

I'd rather just checkmark a spider instead of the tedious task of typing it out.

I do know about using that method and .htaccess; I would just rather have it shown in the AdminCP, choose which spider can access which area, and keep any of them off the whole site.



First, is that fairly accurate? And second, wanna share the xml file for it? ;)

Some are dead links, and the same goes for what I have posted...

Dream
Tue 12th Feb '08, 10:56am
Three weeks and not one submission :(

Boofo
Tue 12th Feb '08, 11:23am
Three weeks and not one submission :(

And you probably won't get any until they see an update of some sort released. That seems to get people motivated.

Dream
Tue 12th Feb '08, 11:33am
If people expect me to code and host the system AND add 200 spiders, the project dies here.

Boofo
Tue 12th Feb '08, 11:50am
If people expect me to code and host the system AND add 200 spiders, the project dies here.

If Stadler had had that attitude, it never would have started in the first place. If it dies, it dies. I was just letting you know what it takes to get people motivated.

Dream
Tue 12th Feb '08, 12:43pm
Well, where's Stadler? I'm not motivated to add 200 spiders alone, and I'm just letting you know. If you think my attitude is wrong so be it.

Boofo
Tue 12th Feb '08, 1:00pm
That is why I am using the latest one he did. It may be old and outdated on a few spiders, but it does recognize a lot of them. That is better than nothing in my book. It isn't like the old days when everyone jumped in for a common purpose. No one said you had to add 200 spiders. All I said was give them an update, even if only with 10 new spiders, to whet their appetite. If you aren't willing to do that, then scrapping it is probably your best bet.

zappsan
Tue 12th Feb '08, 7:01pm
I added some now, hope I filled out everything correctly.
I think you should add some on your own as well...

Dream
Wed 13th Feb '08, 1:19pm
I added some now, hope I filled out everything correctly.
I think you should add some on your own as well...
I approved the 4 you sent

I'm adding the ones I find in my forum

Dream
Wed 20th Feb '08, 4:37am
Ok, I was using the old spiders XML file to find spiders on my forum to add to the system, but I realized I would never remove it for fear of missing a spider. So, I added all spiders from the old XML to the system, and we'll remove the ones that don't exist anymore.

So, now we have 400 spiders in the system, and I need a hug after all this work or I'll cry. :p

Jose Amaral Rego
Wed 20th Feb '08, 4:59am
Have you thought of making a .htaccess version (or similar) for Apache, Solaris, BSD, Linux, Windows, Mac? It should not be that hard to make a list of bad bots. :)

Boofo
Wed 20th Feb '08, 5:52am
Ok, I was using the old spiders XML file to find spiders on my forum to add to the system, but I realized I would never remove it for fear of missing a spider. So, I added all spiders from the old XML to the system, and we'll remove the ones that don't exist anymore.

So, now we have 400 spiders in the system, and I need a hug after all this work or I'll cry. :p

You are the man! Excellent job! ;)

Why is the xml file so much smaller than the other one?

Milado
Wed 20th Feb '08, 7:19am
Ok, I was using the old spiders XML file to find spiders on my forum to add to the system, but I realized I would never remove it for fear of missing a spider. So, I added all spiders from the old XML to the system, and we'll remove the ones that don't exist anymore.

So, now we have 400 spiders in the system, and I need a hug after all this work or I'll cry. :p
You deserve a hug. give me a hug.

Regards

Dream
Wed 20th Feb '08, 5:48pm
You are the man! Excellent job! ;)

Why is the xml file so much smaller than the other one?

I noticed that; I think it's because it doesn't have the spider type.

Boofo
Wed 20th Feb '08, 6:08pm
I noticed that; I think it's because it doesn't have the spider type.

Shouldn't the type be in there?

Dream
Wed 20th Feb '08, 6:19pm
Well, I decided it shouldn't, for some reason at the start of the project.

I also have a hunch that most of the info on that was wrong. I think the ones he didn't know were classified as searchspiders. I considered adding the spider type later, but haven't got to it, and thought having the spider website was enough.

Also I'm not sure vbulletin uses the spider type field.

Boofo
Fri 22nd Feb '08, 6:39am
Here's one for you to add to the list:



Accoona
209.212.73.133
accoona-a133.client.pins.net

Dream
Fri 22nd Feb '08, 6:13pm
Here's one for you to add to the list:

Someone else submitted it, and I approved it.

Boofo
Fri 22nd Feb '08, 6:37pm
Well, I did not know that. How do I know if it has already been submitted? Better to submit it than not, right?

How often do you update the spiders xml for download?

Dream
Fri 22nd Feb '08, 6:52pm
Someone submitted yours I think, I thought it was you honestly.

You just look at the spider list and see if the spider is already there. If not you can submit it.

The list is updated whenever there are new spiders to approve.

Boofo
Fri 22nd Feb '08, 9:49pm
Someone submitted yours I think, I thought it was you honestly.

You just look at the spider list and see if the spider is already there. If not you can submit it.

The list is updated whenever there are new spiders to approve.

Is there a way to send out a notice when it is updated by chance?

And I will keep an eye out for spiders and report them to you as I have some pesky ones show up every now and then.

Dream
Fri 22nd Feb '08, 9:53pm
I thought about making a mailing list, I'll do it eventually. Just subscribe to this thread for now.

Boofo
Fri 22nd Feb '08, 10:05pm
I already am subscribed. ;)

Let me know if I can help in any way.

zappsan
Sun 24th Feb '08, 12:39am
I added some new ones which I came across today.
I also resubmitted one, it had a different ident than the first time I've seen it (I've explained it in the notes field).

Dream
Sun 24th Feb '08, 1:06am
Thanks zappsan :)

I think "woriobot" catches both "woriobot heritrix" and "woriobot", so I removed the heritrix one. I'm not 100% sure of that though, so if you see it again please let me know.
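The reasoning here (a shorter ident already catches any longer ident that contains it) can be sketched as below. This assumes idents are matched as case-insensitive substrings, which the thread itself is not fully sure of; `prune_redundant` is a hypothetical helper, not part of the spider list system.

```python
def prune_redundant(idents):
    """Drop every ident that already contains another, shorter ident as a substring."""
    # Shortest first, so that "woriobot" is kept before "woriobot heritrix" is examined.
    candidates = sorted({ident.lower() for ident in idents}, key=len)
    kept = []
    for ident in candidates:
        if not any(shorter in ident for shorter in kept):
            kept.append(ident)
    return kept


if __name__ == "__main__":
    print(prune_redundant(["woriobot heritrix", "woriobot", "googlebot"]))
```

If the match turned out to be an exact comparison instead of a substring search, both entries would be needed and this pruning would be wrong.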

Boofo
Mon 25th Feb '08, 12:00pm
This accoona-a133.client.pins.net is showing up as a Guest. It was added in your last update as Accoona, which I am using. It is not showing as a Spider. The IP address is 209.212.73.133. Maybe taking that extra code, as you call it, out of the old spiders xml might not have been such a good idea? I don't have my forum set up to resolve IP addresses.

Dream
Mon 25th Feb '08, 12:48pm
Did you submit the user agent? The "ident" field is not the IP, it's the user agent. You can get that on the who's online page.

Boofo
Mon 25th Feb '08, 1:14pm
This is what I submitted and you said it had already been submitted. It is in the current xml file.

accoona-a133.client.pins.net

but it shows up as a guest, not a spider.

Dream
Mon 25th Feb '08, 1:18pm
Sorry I didn't notice that was the IP.

There's one Accoona in the xml, but I never said it was duplicated, did I? Anyway, when you see it as a guest in your forums again, paste the User Agent for me, ok? So I can fix it.

edit: oh yes, someone else submitted it for you actually.

Redseal
Mon 25th Feb '08, 11:14pm
I have no idea what this is for. Forgive my ignorance, but what does this do for a forum? Does it just help vbulletin list the correct bot that is sucking content from the forums?

Dream
Mon 25th Feb '08, 11:49pm
yep

Redseal
Tue 26th Feb '08, 12:42am
yep
Cool, I had no idea there were this many bots out there. Awesome, thanks for making it. So do you just import it somehow into vbulletin?

Dream
Tue 26th Feb '08, 12:44am
No, you just upload it to includes/xml/.

Just remember not to overwrite it when you upgrade your vB.

Redseal
Tue 26th Feb '08, 9:50am
sweet thanks!

Boofo
Tue 26th Feb '08, 1:03pm
Sorry I didn't notice that was the IP.

There's one Accoona in the xml, but I never said it was duplicated, did I? Anyway, when you see it as a guest in your forums again, paste the User Agent for me, ok? So I can fix it.

edit: oh yes, someone else submitted it for you actually.

And nobody said you said any such thing. What is that all about?

Here is another one for you, but this time only the IP showed and it was showing as a Guest. It is explained in the quote box.



omgilibot

http://www.omgili.com/Crawler.html

Here is the IP that shows:
194.90.190.48

I got the Ident info from doing a var_dump for something else. The IP is all that showed for this and it was showing as a guest. The IP did NOT resolve to anything other than itself. The ident came from the var_dump.


In these next ones, the first one shows up fine. The next three show up as guests. And your accoona-a133.client.pins.net is still showing up as a guest.



livebot-65-55-209-98.search.live.com - MSNBot Spider
livebot-65-55-165-117.search.live.com - http://search.msn.com/msnbot.htm
livebot-65-55-165-52.search.live.com - http://search.msn.com/msnbot.htm
livebot-65-55-165-42.search.live.com - http://search.msn.com/msnbot.htm


I think I'm going back to Stadler's version as it caught a lot more spiders than this version does. I'll just add them to that as I find them. Good luck!

Dream
Tue 26th Feb '08, 5:01pm
No problem. As I said, I need the User Agent of the spider, not the IP. This is a sample user agent:

Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12

I don't know how the old xml can get more spiders, as this one has all spiders the old one has. But it's your call and you may be right, if you find out why please tell me.

Also, the Omgili spider was added and the file updated.

Boofo
Tue 26th Feb '08, 5:33pm
No problem. As I said, I need the User Agent of the spider, not the IP. This is a sample user agent:

Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12

I don't know how the old xml can get more spiders, as this one has all spiders the old one has. But it's your call and you may be right, if you find out why please tell me.

Also, the Omgili spider was added and the file updated.

Why does the Accoona one keep showing up as a guest now?

I'll try your file again. I owe you that much.

How do I go about getting the user agent stuff? When I resolve it it doesn't come up with that stuff.

Dream
Tue 26th Feb '08, 6:00pm
You don't owe me anything, use whatever you like ;)

To get the User Agent, in the Who's Online page there's an option at the bottom "show user-agent: yes / no". It's that simple.

Boofo
Tue 26th Feb '08, 6:04pm
I forgot about that on the Who's Online page. I never use it. ;)

Dream
Tue 26th Feb '08, 11:51pm
I never used it before doing this project either.

Boofo
Wed 27th Feb '08, 1:52am
Here's another one for you. This one came in as a Guest.



Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080201 Dealio Toolbar 3.1 Firef

Boofo
Wed 27th Feb '08, 11:27am
I just submitted the ident string for the Accoona spider that keeps showing up as a guest.

Boofo
Wed 27th Feb '08, 12:27pm
Another one:



Deepnet Explorer

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Deepnet Explorer 1.5.0; .NET CLR 1.0.3705)

Dream
Wed 27th Feb '08, 8:17pm
Here's another one for you. This one came in as a Guest.

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080201 Dealio Toolbar 3.1 Firef

this one doesn't look like a spider

Boofo
Wed 27th Feb '08, 8:25pm
this one doesn't look like a spider

I wasn't really sure on that one but I thought it better to report it than take the chance it might be a spider.

Boofo
Thu 28th Feb '08, 7:39am
2 more guests and one of them is the Accoona again. Accoona keeps showing up as a guest.



server1932015481.serverpool.info

Mozilla/5.0 (compatible; http://www.whoisde.de/2.1; +http://www.whoisde.de)




accoona-a133.client.pins.net

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1

Boofo
Thu 28th Feb '08, 2:46pm
Another one.



Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; BCD2000; SV1; FunWebProducts)

rob30UK
Fri 29th Feb '08, 8:27am
Another one.

That last one, Boofo, is just some dodgy IE plugin, a bit like mywebsearch or some other nasty. Definitely not a spider ;)

Boofo
Fri 29th Feb '08, 9:02am
That last one, Boofo, is just some dodgy IE plugin, a bit like mywebsearch or some other nasty. Definitely not a spider ;)

I checked out their site and it looked like a spider to me. You may be right, I don't know. Better to be safe than sorry in reporting it. ;)

Is anyone still updating the list?

Dream
Fri 29th Feb '08, 5:25pm
Sorry I was really busy this week.

I updated the list, but deleted the toolbars and plugins as they aren't spiders.

Boofo, the accoona user agents you are submitting only contain common stuff that any user agent can have. For vbulletin to be able to detect a spider the Ident must have just the string that is unique to the spider. Like this:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Deepnet Explorer 1.5.0; .NET CLR 1.0.3705)

The detection ident is

Deepnet Explorer
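One way to sanity-check a candidate ident before submitting it is to see whether it also occurs in ordinary browser user agents. The sketch below assumes substring matching; `BROWSER_UAS` is a tiny made-up sample (the first entry is the Firefox string quoted earlier in this thread, the second is the Deepnet string with the Deepnet token removed), and `is_usable_ident` is a hypothetical helper.

```python
# Sample user agents of ordinary browsers; a real check would need a much larger sample.
BROWSER_UAS = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)",
]


def is_usable_ident(ident):
    """Return True if the ident is non-empty and not found in any sample browser user agent."""
    needle = ident.strip().lower()
    if not needle:
        return False
    return not any(needle in ua.lower() for ua in BROWSER_UAS)


if __name__ == "__main__":
    for candidate in ("Deepnet Explorer", "Mozilla/4.0", "MSIE 6.0"):
        print(candidate, is_usable_ident(candidate))
```

An ident such as "Mozilla/4.0" fails the check because it would also match regular visitors, which is the problem with the Accoona strings described above.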

Gaf_8008
Fri 29th Feb '08, 8:23pm
thanks

Boofo
Fri 29th Feb '08, 9:49pm
Sorry I was really busy this week.

I updated the list, but deleted the toolbars and plugins as they aren't spiders.

Boofo, the accoona user agents you are submitting only contain common stuff that any user agent can have. For vbulletin to be able to detect a spider the Ident must have just the string that is unique to the spider. Like this:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Deepnet Explorer 1.5.0; .NET CLR 1.0.3705)

The detection ident is

Deepnet Explorer

Sorry, but for the Accoona one I gave you the IDENT string straight from Who's Online. Maybe that's how it slips by the robots.txt file.

Dream
Fri 29th Feb '08, 10:43pm
Ok, but that ident has no string that is unique to Accoona. vBulletin won't be able to detect it with that.

Boofo
Fri 29th Feb '08, 11:07pm
There's no way to detect it by IP address?

Dream
Fri 29th Feb '08, 11:57pm
Not that I know of, sorry. Stadler's XML had IP entries, but I'm not sure if vBulletin can detect spiders by IP.

To be honest, I'm curious now and I'm gonna ask it in the support forums. If vBulletin does detect spiders by IP, I'll add spiders IPs to the system.
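As a rough idea of what IP-based recognition could look like, here is a sketch using Python's standard `ipaddress` module. Nothing in this thread confirms that vBulletin does anything like this; the `SPIDER_NETWORKS` table and `spider_for_ip` function are hypothetical, and the two addresses are simply the ones reported earlier in the thread, entered as single-host networks.

```python
import ipaddress

# Hypothetical table built from the IPs reported in this thread; real crawlers
# typically use many addresses, so a real table would hold wider ranges.
SPIDER_NETWORKS = {
    "Accoona": [ipaddress.ip_network("209.212.73.133/32")],
    "Omgili": [ipaddress.ip_network("194.90.190.48/32")],
}


def spider_for_ip(ip):
    """Return the spider name whose networks contain this IP address, else None."""
    address = ipaddress.ip_address(ip)
    for name, networks in SPIDER_NETWORKS.items():
        if any(address in network for network in networks):
            return name
    return None


if __name__ == "__main__":
    print(spider_for_ip("209.212.73.133"))
    print(spider_for_ip("10.0.0.1"))
```

An IP match is weaker evidence than a unique ident: the same address could belong to a person browsing from that company's office.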

Boofo
Mon 3rd Mar '08, 11:27pm
That last spider I submitted IS a spider. I checked out their web site. No sense in submitting them if they are going to get ignored.

Dream
Mon 3rd Mar '08, 11:29pm
:rolleyes:

The ident couldn't be used...

Boofo
Mon 3rd Mar '08, 11:36pm
Before you roll your eyes in that sarcastic tone, maybe you ought to check this out and learn how to use an IDENT properly.

http://www.botsvsbrowsers.com/details/46560/index.html

T312461 is what you need.

Dream
Tue 4th Mar '08, 12:05am
Well then tell me so in the note, or do you expect me to discover that "T312461" is an ident? Why didn't you submit only "T312461"?

If you don't like my work, then use something else. I don't have to put up with demands and anger as I'm not being paid to do this. And you talk about my attitude.

Boofo
Tue 4th Mar '08, 12:25am
You told me to submit the IDENT. That is what I did. All I did to find out it was a legit spider was to do a Google search for T312461. Aren't you even doing that?

I don't like your work because you don't bother to check them out. Or if you are, then you need to find another source. You whined because no one was submitting spider info. I start submitting them and you reject them. What sense does that make? Just because you are doing a spider list, doesn't make you an expert or any better than the rest of us. Come down off of your high horse and join the rest of the world.

Dream
Tue 4th Mar '08, 12:28am
I won't bother replying to that. Re-submit your spider and I'll add it.

Dream
Tue 4th Mar '08, 12:54am
the spider was added

if you submit more, just put in the ident field the string that is unique to the spider, no need for those "Mozilla 4.0" etc, and I'm not sure but I think that can change. I'm gonna do a FAQ now and add this to it.

Boofo
Tue 4th Mar '08, 1:03am
The bad thing is I just caught another spider posing as a Guest. The IDENT string was normal, but after following the link and resolving it, it turned out to be a spider site. I posted it in the thread where you were asking about spider IPs. We have got to figure out a way to catch the ones that bypass the IDENT string check.

With the type of site I have with such low traffic, it is easier to catch them on there than it is on a much busier site. Most of them have no business there as to what the site is about.

Dream
Tue 4th Mar '08, 1:15am
You sure it's a spider? Could be someone on the offices where the IP is used I think. Say an employee from accoona is surfing. Maybe unlikely though, depending on what your site is.

Boofo
Tue 4th Mar '08, 1:28am
Trust me, it's a spider. Check out my site URL and tell me what you think then.

http://www.fathers-rights-forums.com/forums/

Then you and I will form a plan to tackle this dilemma. ;)

Dream
Tue 4th Mar '08, 1:31am
Yeah I agree with you, it's most likely a spider, depending on the occurrence and URL it's visiting.

There's not much I can do but to whine for them to add IP recognition in the suggestions forum though.

rob30UK
Tue 4th Mar '08, 5:00am
That last spider I submitted IS a spider. I checked out their web site. No sense in submitting them if they are going to get ignored.

Boofo, I beg to differ greatly. This is the scumware that is mywebsearch IE toolbars and their many variants. Why else would they identify in many different IE versions?

I would love to see the source for your information...

Here are my sources:-
http://www.webmasterworld.com/forum39/1510.htm
http://www.seroundtable.com/archives/001430.html

There are even lots of pages across the net detailing how to get rid of said 'scumware'. Here is just one of them:-
http://www.liamdelahunty.com/tips/fun_web_products.php

Boofo
Tue 4th Mar '08, 5:11am
The spider I was referring to in the post you quoted was just added by Dream. ;)

Boofo
Tue 4th Mar '08, 5:15am
Yeah I agree with you, it's most likely a spider, depending on the occurrence and URL it's visiting.

There's not much I can do but to whine for them to add IP recognition in the suggestions forum though.

There has to be a way around this even if it means us coming up with a hack to do it. Are you game?

rob30UK
Tue 4th Mar '08, 12:22pm
The spider I was referring to in the post you quoted was just added by Dream. ;)

I really don't care if Dream added it or not.... it's not a spider, period.

If he chooses to add it, or not add it - despite the glaring evidence it is NOT a spider, then that's totally up to him.

A massive part of my job involves checking log files and I stand to be corrected only if you can post me URLs proving this to be a spider.

As far as I am concerned, I've proved it isn't - if you wish to call black white, then please post the proof.

Dream
Tue 4th Mar '08, 2:32pm
There has to be a way around this even if it means us coming up with a hack to do it. Are you game?

Sorry this goes beyond my interest in the problem, but if you get someone to do it I'll add IPs to the system.

Whatever you guys decide is good by me, let me know if I have to remove the spider.

Boofo
Tue 4th Mar '08, 3:16pm
I really don't care if Dream added it or not.... it's not a spider, period.

If he chooses to add it, or not add it - despite the glaring evidence it is NOT a spider, then that's totally up to him.

A massive part of my job involves checking log files and I stand to be corrected only if you can post me URLs proving this to be a spider.

As far as I am concerned, I've proved it isn't - if you wish to call black white, then please post the proof.

You ought to learn to read closer. The last spider I submitted was not the FunWebProducts one that you are referring to. That issue is done. For the spider I posted, I posted the link for the proof.

rob30UK
Tue 4th Mar '08, 4:30pm
You ought to learn to read closer. The last spider I submitted was not the FunWebProducts one that you are referring to. That issue is done. For the spider I posted, I posted the link for the proof.

Ok, I apologise.... I must have mis-interpreted the conversation.

Here are some regular bots on my boards:-

85.225.137.240
Mozilla/4.0 (BejiBot Crawler 1.2a)

88.131.106.7
Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)

82.80.252.110
BoardTracker (http://www.boardtracker.com/spider.html) (Mozilla/4.0 compatible; MSIE 6.0; Linux Cent

61.247.217.36
Yeti/0.01 (nhn/1noon, yetibot@naver.com, check robots.txt daily and follow it)

209.11.177.198
Mozilla/4.0 (compatible; BOTW Spider; +http://botw.org)

142.166.3.122
R6_CommentReader(www.radian6.com/crawler)

Boofo
Tue 4th Mar '08, 5:17pm
Sorry this goes beyond my interest in the problem, but if you get someone to do it I'll add IPs to the system.

No problem, I'll take care of it on my own.


Here are some regular bots on my boards:-


Did you already add these to the spiders list then?

Dream
Tue 4th Mar '08, 8:04pm
Ok, I apologise.... I must have mis-interpreted the conversation.

Here are some regular bots on my boards:-

85.225.137.240
Mozilla/4.0 (BejiBot Crawler 1.2a)

88.131.106.7
Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)

82.80.252.110
BoardTracker (http://www.boardtracker.com/spider.html) (Mozilla/4.0 compatible; MSIE 6.0; Linux Cent

61.247.217.36
Yeti/0.01 (nhn/1noon, yetibot@naver.com, check robots.txt daily and follow it)

209.11.177.198
Mozilla/4.0 (compatible; BOTW Spider; +http://botw.org)

142.166.3.122
R6_CommentReader(www.radian6.com/crawler (http://www.radian6.com/crawler))

thanks

added BejiBot, Bot W, Radian6 Comment Reader, Radian6 FeedFetcher and Yeti

the others were already there

Dream
Tue 4th Mar '08, 8:47pm
Wow, just updated my xml and caught 10 Yeti spiders already. I think they are from www.naver.com, but I can't be sure as the site is in Korean.

Joe Gronlund
Tue 4th Mar '08, 8:50pm
This thread seems to be picking up a couple of IM'ing feeds.. :)

Dream
Tue 4th Mar '08, 8:56pm
Cool. Sorry, IM as in instant messenger feeds?

Boofo
Wed 5th Mar '08, 1:05am
Here you go.


WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru

Dream
Wed 5th Mar '08, 2:25am
updated

Boofo
Fri 7th Mar '08, 2:44pm
Another one.


EnaBot/1.2 (http://www.enaball.com/crawler.html)

Dream
Fri 7th Mar '08, 3:20pm
Ok, added EnaBall spider.

ShadyNight
Fri 7th Mar '08, 6:49pm
Submitted dragonfly

ebingbong#playstarmusic.com (though they say the # is an @?)
http://www.ebingbong.com/help/ourRobot.php

Dream
Sat 8th Mar '08, 12:11am
Thanks, added as eBingBong.

ShawnV
Mon 10th Mar '08, 4:14pm
Very needed, thank you.

_V

ShadyNight
Fri 14th Mar '08, 1:07am
New one not picked up.

*submitting*

c0c.entireweb.com
Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)

Dream
Fri 14th Mar '08, 2:35am
Ok, I added it. Wasn't it on there already, though? It was there with a different ident; wasn't it being picked up?

boro_boy
Fri 14th Mar '08, 6:37am
Heya,

I made a system where people can submit spiders and download updated spiders_vbulletin.xml files for their forum. After you submit a spider I must approve it for it to be included in the list.

Hope this helps everyone.

http://spiderlist.codeforgers.com

thank you great work mate :D

Lynne
Fri 14th Mar '08, 5:20pm
I installed your list of spiders about a week or so ago and now I see all sorts of things that I had never heard about before! "Long, thin, slimy ones; Short, fat, juicy ones, Itsy, bitsy, fuzzy wuzzy spiders." Ok, ok, that is supposed to be regarding worms, but I thought it was appropriate here. :)

Dream
Sat 15th Mar '08, 6:00pm
Great to hear that :)

Boofo
Tue 18th Mar '08, 1:15am
Another one:



Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Google Wireless Transcoder;)

Dream
Tue 18th Mar '08, 2:18am
Thanks, added.

Boofo
Thu 20th Mar '08, 10:17am
This just came in as a guest:


194.90.190.48
omgilibot/0.3 +http://www.omgili.com/Crawler.html

Dream
Thu 20th Mar '08, 7:56pm
Thanks, that bot was in the list, I just updated the Ident.

Boofo
Thu 20th Mar '08, 11:21pm
I thought it was already in there but it came in as a guest so I thought I should report it.

Dream
Fri 21st Mar '08, 3:39am
I changed the Ident from http://www.omgili.com/Crawler.html to omgilibot, tell me if it still doesn't work.

Boofo
Fri 21st Mar '08, 5:09am
Will do.

Here is another one:



64.13.138.6 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=64.13.138.6)
Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)

Dream
Fri 21st Mar '08, 7:20am
Thanks, added.

Boofo
Fri 21st Mar '08, 7:32am
It's hard to believe that the spiders only like my site. ;)

Dream
Fri 21st Mar '08, 7:40am
Actually, the ones you are submitting appear on my site too :)

But yeah, I'm a very lazy guy :p

Dream
Fri 21st Mar '08, 11:42am
Added GurujiBot.

Lynne
Fri 21st Mar '08, 1:13pm
It's hard to believe that the spiders only like my site. ;)
I don't know how you notice them all!

Boofo
Fri 21st Mar '08, 1:48pm
I think it's because I have no real life except for catching spiders.

Boofo
Sat 22nd Mar '08, 9:16am
Here's a strange one:


38.104.58.118 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=38.104.58.118)
panscient.com
The panscient.com was the full User Agent string, believe it or not.

1QuickSI
Sat 22nd Mar '08, 11:28am
Google Spider
Searching Forums
User: Beerman1 (http://www.monstermayhem.org/forums/member.php?u=79)

Yahoo! Slurp Spider
Viewing Who Posted
sweet find (http://www.monstermayhem.org/forums/showthread.php?t=4065)



Why do Google spiders show up with a physical user of the site? The others only show the spider and what thread it is picking up which is what I assume is the way it is supposed to work.

Boofo
Sat 22nd Mar '08, 11:31am
I don't understand what you mean.

1QuickSI
Sat 22nd Mar '08, 12:38pm
See my cut and paste at the top of my post. Google Bot shows as user "Beerman". Was wondering why it lists as a user and not like the Yahoo Bot that is just viewing.

Dream
Sun 23rd Mar '08, 1:52pm
Added spider Panscient, thanks.


Google Spider
Searching Forums
User: Beerman1 (http://www.monstermayhem.org/forums/member.php?u=79)

Yahoo! Slurp Spider
Viewing Who Posted
sweet find (http://www.monstermayhem.org/forums/showthread.php?t=4065)



Why do Google spiders show up with a physical user of the site? The others only show the spider and what thread it is picking up which is what I assume is the way it is supposed to work.


See my cut and paste at the top of my post. Google Bot shows as user "Beerman". Was wondering why it lists as a user and not like the Yahoo Bot that is just viewing.

In my forums it shows as Google Spider. Are you sure the user name isn't on the field "activity"? As in, searching for posts of user XX?

DiverTree
Mon 24th Mar '08, 12:10am
not sure of the email or website, but i submitted this ...


livebot-65-55-165-84.search.live.com
livebot-65-55-165-114.search.live.com

65.55.165.84 (http://www.gangroomforum.com/online.php?do=resolveip&ipaddress=65.55.165.84) & 65.55.165.114 (http://www.gangroomforum.com/online.php?do=resolveip&ipaddress=65.55.165.84)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)

is this what you're looking for? thanks for all the work you do with the spider list. it's greatly appreciated. :)

Dream
Mon 24th Mar '08, 12:21am
Thanks :), I added it.

1QuickSI
Mon 24th Mar '08, 11:50am
Added spider Panscient, thanks.

In my forums it shows as Google Spider. Are you sure the user name isn't on the field "activity"? As in, searching for posts of user XX?

This was a simple cut and paste from the Who's Online list.

Crow
Mon 24th Mar '08, 5:10pm
thank you dream for the latest lists. :)

unitedpunjab
Tue 25th Mar '08, 9:14am
Added GurujiBot.
Thanks :)

A little question about livebot spider.


<spider ident=".NET CLR 1.1.4322">
<name>Livebot</name>
</spider>
Isn't this just a computer with the .NET Framework installed?

Dream
Tue 25th Mar '08, 9:15am
Maybe, I don't know for sure.

You are right:

http://www.webmasterworld.com/forum11/2715.htm


.net clr 1.1.4322 will be present in any IE (90%+ of the market) that is on a machine with .net FrameWork installed. I think this framework became part of Service Pack 2, so it's very widespread -- you will ban innocent people.

I removed Livebot from the list.
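
The false positive discussed above can be reproduced with a small Python sketch. It is not vBulletin code: the `<spiders>` root element, the helper names, and the case-insensitive substring matching are all assumptions made for illustration. Only the `<spider ident="..."><name>` layout and the user-agent strings come from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry list in the <spider ident="..."><name> layout
# quoted above; the <spiders> root element is a placeholder.
SPIDERS_XML = """
<spiders>
  <spider ident=".NET CLR 1.1.4322">
    <name>Livebot</name>
  </spider>
  <spider ident="ScoutJet">
    <name>ScoutJet</name>
  </spider>
</spiders>
"""


def load_spiders(xml_text):
    """Return a list of (ident, name) pairs from the XML text."""
    root = ET.fromstring(xml_text.strip())
    return [(s.get("ident"), s.findtext("name")) for s in root.iter("spider")]


def identify(user_agent, spiders):
    """Return the name of the first spider whose ident occurs in the
    user agent (assumed: case-insensitive substring match), else None."""
    ua = user_agent.lower()
    for ident, name in spiders:
        if ident.lower() in ua:
            return name
    return None


spiders = load_spiders(SPIDERS_XML)

# An ordinary Internet Explorer visitor on a machine with .NET installed.
ordinary_ie = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
# A real crawler reported earlier in the thread.
crawler = "Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)"

print(identify(ordinary_ie, spiders))  # the IE visitor is labelled Livebot
print(identify(crawler, spiders))      # the crawler is labelled ScoutJet
```

Under these assumptions, any ident that also appears in ordinary browsers turns regular guests into "spiders", which is why an ident should be a token that only the bot sends.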

Milado
Wed 26th Mar '08, 12:09am
http://www.cuill.com/twiceler/robot.html

Dream
Wed 26th Mar '08, 7:56pm
Twiceler is already on the list.

Boofo
Mon 31st Mar '08, 3:36pm
Got another one that just hit the site along with 25 yahoo spiders.



69.90.42.67 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=69.90.42.67)
Mozilla/5.0 (compatible; OWPBot/0.3; http://www.openwhitepages.com/)

Dream
Tue 1st Apr '08, 1:44pm
Thanks, added.

ShadyNight
Tue 1st Apr '08, 9:59pm
Not sure if this has been reported ....

crawl2.nat.svl.searchme.com
Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.searchme.com/support/)

Dream
Tue 1st Apr '08, 10:07pm
Nope, I added it, thanks.

rolfw
Wed 2nd Apr '08, 7:14am
Just submitted this one

Charlotte
Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.searchme.com/support/) (2549df004ae664faef17dce174913cea

http://www.searchme.com/support/pages/spider.php


info@searchme.com

SuperDave71
Wed 2nd Apr '08, 4:19pm
Here are two that are HAMMERING MY SITE:


c138.cyan.fastwebserver.de <-- On this one the prefix changes color (purple.fastwebserver.de, etc.)

and


ns.km31707.keymachine.de


I have no idea what they are but I have blocked the IP addresses and they STILL keep hammering away.


HELP!!



-Dave

Dream
Wed 2nd Apr '08, 7:18pm
Just submitted this one

Charlotte
Mozilla/5.0 (compatible; Charlotte/1.0b; http://www.searchme.com/support/) (2549df004ae664faef17dce174913cea

http://www.searchme.com/support/pages/spider.php


info@searchme.com
This was submitted a few posts back.

rolfw
Wed 2nd Apr '08, 8:20pm
I do apologise. I use the spiders.xml to tell my "guests who have visited" plugin (http://www.vbulletin.org/forum/showthread.php?t=131314) which visitors are spiders, and as this one wasn't identified as a spider, I assumed that it wasn't in the latest list I downloaded.

Dream
Wed 2nd Apr '08, 9:53pm
Yes I use that too. Were you using the latest XML?

Nathan1977
Thu 3rd Apr '08, 4:13am
Thank you Dream

I totally forgot about the spider list :D

This is really good

Thanks again ;)

rolfw
Thu 3rd Apr '08, 6:09am
Yes I use that too. Were you using the latest XML?

Downloaded again last night to double check and the searchme spider still shows as a regular guest. :) Also upgraded to the latest release of product to make sure that it wasn't at fault, but made no difference.

This of course has not too much to do with this thread and I would like to say how much I appreciate you keeping the spider list updated. :)

Boofo
Thu 3rd Apr '08, 7:09am
Acoona does the same thing. They have figured out how to squeeze by. ;)

DiverTree
Sat 5th Apr '08, 11:10pm
the livebot spider seems to be including anonymous proxies. i confirmed this by visiting my site through a webpage that offers an anonymous browsing service and sure enough, it identified me as a livebot spider. i'm also afraid that the ns.km31707.keymachine.de spider that was submitted a few posts back is actually a spambot.

DiverTree
Sat 5th Apr '08, 11:11pm
Maybe, I don't know for sure.

You are right:

http://www.webmasterworld.com/forum11/2715.htm



I removed Livebot from the list.

oops ... nevermind about the last post. :D

ShadyNight
Sun 6th Apr '08, 3:57pm
Another one not showing up ...

38.98.19.67
Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9

Dream
Sun 6th Apr '08, 4:50pm
Thanks, added.

globalinsites
Tue 8th Apr '08, 6:31am
I am not familiar at all with spiderlists so please forgive my newbie questions; is this spiderlist suitable for all types of sites?

I suppose what it does is get your site spidered by the spiders that are listed in the .xml file? Wouldn't these spiders normally find your sites by themselves?

DiverTree
Tue 8th Apr '08, 1:45pm
I am not familiar at all with spiderlists so please forgive my newbie questions; is this spiderlist suitable for all types of sites?

I suppose what it does is get your site spidered by the spiders that are listed in the .xml file? Wouldn't these spiders normally find your sites by themselves?

can't help you with the first question, but to answer your second one ...
it doesn't help get your site spidered, it simply identifies the spiders/bots that are coming to your site. :)

Dream
Tue 8th Apr '08, 8:18pm
A second Yeti bot was added.

DiverTree
Tue 8th Apr '08, 8:24pm
A second Yeti bot was added.

you are the spider/bot God :cool:

Dream
Tue 8th Apr '08, 9:09pm
I'm just the spider list janitor :P

buro9
Mon 14th Apr '08, 6:22pm
Just added these two:
Yodao: Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider/; )
Mac OS X RSS: Apple-PubSub/59

Dream
Mon 14th Apr '08, 9:23pm
Thanks, approved both.

buro9
Tue 15th Apr '08, 6:44am
Just added three:
Windows RSS: Windows-RSS-Platform/1.0
Windows RSS: Windows-RSS-Platform/2.0
Mozilla/4.0 (vBSEO; http://www.vbseo.com)

I've taken the windows platform off of the Windows RSS ident string as that would unnecessarily bloat the number of idents needed and I'm not sure anyone would care whether it's XP or Vista that the RSS requests are coming from.
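
The trimming described above could be sketched like this. It is a Python illustration only; the helper name `trim_platform` and the two sample user-agent strings (including their parenthesised platform parts) are made up for the example and were not posted in this thread.

```python
import re


def trim_platform(user_agent):
    """Drop a trailing parenthesised platform section, keeping product/version."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", user_agent)


# Hypothetical full user agents that differ only in the platform part.
samples = [
    "Windows-RSS-Platform/1.0 (MSIE 7.0; Windows NT 5.1)",
    "Windows-RSS-Platform/1.0 (MSIE 7.0; Windows NT 6.0)",
]

# Both collapse to one ident, so the list needs a single entry per version.
idents = sorted({trim_platform(ua) for ua in samples})
print(idents)
```

One shorter ident then covers every platform variant, at the cost of no longer telling XP and Vista requests apart.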

buro9
Tue 15th Apr '08, 6:48am
And added this one too:
NetNewsWire/3.1b4 (Mac OS X; Lite; http://www.newsgator.com/Individuals/NetNewsWire/)

Dream
Tue 15th Apr '08, 7:41pm
Thanks buro9, your submissions were added and are greatly appreciated.

Stubbed
Tue 15th Apr '08, 8:33pm
Great idea Dream, I'll be using this for other projects rather than just vB :)

buro9
Wed 16th Apr '08, 6:35am
Added another:
TinEye/1.1 (http://tineye.com/crawler.html)

buro9
Wed 16th Apr '08, 6:38am
Just to let you know why I like this project so much:
I show adsense adverts to guests only.
I have the hack that shows spiders separately on the home page.
By having the spiders accurately identified, I can get a very quick glance from the home page of the size of the audience viewing adverts at that moment.

I'm thinking of creating a vBulletin hack to add a scheduled task to fetch the spider list weekly to ensure it's never too far out of date. Not daily as it doesn't change that much and the traffic to your server might be excessive if it ended up a popular hack.
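
The "weekly, not daily" throttle could look roughly like the sketch below. A real vBulletin hack would be a PHP scheduled task; this Python version only illustrates the decision logic, performs no download, and the names `REFRESH_INTERVAL` and `needs_refresh` are hypothetical.

```python
from datetime import datetime, timedelta

REFRESH_INTERVAL = timedelta(days=7)  # weekly, as suggested in the post


def needs_refresh(last_fetched, now, interval=REFRESH_INTERVAL):
    """True when no copy was fetched yet or the copy is at least `interval` old."""
    return last_fetched is None or now - last_fetched >= interval


now = datetime(2008, 4, 16, 6, 38)

print(needs_refresh(None, now))                   # never fetched
print(needs_refresh(datetime(2008, 4, 14), now))  # two days old
print(needs_refresh(datetime(2008, 4, 8), now))   # eight days old
```

Storing the time of the last successful fetch and checking it before each request keeps the load on the list server at roughly one download per forum per week, however often the task itself runs.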

Boofo
Wed 16th Apr '08, 6:50am
Just to let you know why I like this project so much:
I show adsense adverts to guests only.
I have the hack that shows spiders separately on the home page.
By having the spiders accurately identified, I can get a very quick glance from the home page of the size of the audience viewing adverts at that moment.

I'm thinking of creating a vBulletin hack to add a scheduled task to fetch the spider list weekly to ensure it's never too far out of date. Not daily as it doesn't change that much and the traffic to your server might be excessive if it ended up a popular hack.

I'd be interested in seeing what you come up with for that hack. If you need any help, let me know. And good to see you again, sir. ;)

Dream
Wed 16th Apr '08, 11:51am
Just to let you know why I like this project so much:
I show adsense adverts to guests only.
I have the hack that shows spiders separately on the home page.
By having the spiders accurately identified, I can get a very quick glance from the home page of the size of the audience viewing adverts at that moment.

I'm thinking of creating a vBulletin hack to add a scheduled task to fetch the spider list weekly to ensure it's never too far out of date. Not daily as it doesn't change that much and the traffic to your server might be excessive if it ended up a popular hack.
I would think Google has a list of spiders of their own, and doesn't show adverts to those spiders on googlesyndication.com, but I'm not sure.

Joey805
Wed 16th Apr '08, 4:44pm
Could someone please provide me with info on how to gather the data needed to submit a spider? I am using the most up to date xml file and when I view guests on my forum, I see a list of ip addresses. Once I click on some of them, they resolve to names with the word "spider" in them.

With that said, I'm assuming they are not listed within the XML file I am using and they are bots.

How do I gather enough info on a spider to submit it?



Thanks guys

Dream
Wed 16th Apr '08, 6:30pm
I think the help in the spider site has info on that.

Dream
Wed 16th Apr '08, 7:43pm
Spider TinEye added.

Dream
Thu 17th Apr '08, 9:54pm
Added spider Najdi.si.

The_Gun_Man
Sat 19th Apr '08, 12:58pm
This is excellent, now I can see the actual names instead of a forum packed full of hungry guests.

Also: I submitted a spider; he's called "Visions".

Dream
Sat 19th Apr '08, 7:13pm
Thanks, I added it to the list.

buro9
Sun 20th Apr '08, 2:48pm
Another: MLBot (www.metadatalabs.com)

Dream
Sun 20th Apr '08, 5:51pm
Thanks, added.

Dream
Mon 21st Apr '08, 6:28pm
Added spider Internet for learning.

rolfw
Mon 21st Apr '08, 7:11pm
Added spider Internet for learning.

Ah good, wasn't sure whether that was enough information. :) New to this spider identification.

Boofo
Mon 28th Apr '08, 5:29am
Here's another one that showed up as a Guest:


194.90.190.48 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=194.90.190.48)
omgilibot/0.3 +http://www.omgili.com/Crawler.html

Dream
Mon 28th Apr '08, 6:37pm
That's weird, omgili is already on the list.

Boofo
Mon 28th Apr '08, 6:44pm
Exactly. I have no idea why it showed up as a Guest, but it did, sir. ;).

Dream
Mon 28th Apr '08, 6:55pm
Ok, I added it exactly how you submitted it.

Boofo
Mon 28th Apr '08, 7:53pm
Was it different than what was already in there? I didn't look.

Dream
Mon 28th Apr '08, 8:10pm
The other one's ident is only 'omgilibot'.

Gabrielt
Mon 12th May '08, 5:12pm
Excellent, that was exactly what I was looking for.

By the way. It is recommended to install this mod:

http://www.vbulletin.org/forum/showthread.php?t=152321

To remove spiders from the "Currently Active Users" list.

Cheers,
Gabriel.

Gabrielt
Mon 12th May '08, 5:53pm
Hi,

I'd like to add that a few spiders/bots are not being removed using your list together with the mod I described above. See the attached picture.

Maybe Yeti isn't being removed because it identifies itself as Yeti/1.0, see screenshot.

Cheers,
Gabriel.

Dream
Mon 12th May '08, 6:18pm
Yeti/1.0 added.

Dream
Mon 19th May '08, 7:17pm
Added

Internet Research Institute UK
Scrubby

Dream
Wed 21st May '08, 11:01pm
Added SiteVibeBot

Can anyone confirm this one gets detected? I'm trying to understand vB's regex for the IDENT string (because if I ask in the How To forum no one will know).
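
One plausible way such a matcher could work is sketched below in Python. This is an assumption, not a description of vBulletin's actual code: it supposes the idents are escaped, joined into one alternation, and searched case-insensitively in the user agent. The function names are hypothetical; the idents and user-agent strings are ones mentioned in this thread.

```python
import re


def build_spider_regex(idents):
    """Join escaped idents into a single case-insensitive alternation."""
    return re.compile("|".join(re.escape(i) for i in idents), re.IGNORECASE)


# Two idents mentioned in this thread.
pattern = build_spider_regex(["omgilibot", "Yeti/0.01"])


def matched_ident(user_agent):
    """Return the part of the user agent that matched an ident, or None."""
    m = pattern.search(user_agent)
    return m.group(0) if m else None


# A short ident matches the longer user agent that contains it.
print(matched_ident("omgilibot/0.3 +http://www.omgili.com/Crawler.html"))
# An ident that includes an old version number misses the new version.
print(matched_ident("Yeti/1.0"))
```

If matching works along these lines, a short version-free ident is the most robust choice, and escaping matters because characters such as `.` and `+` in an ident would otherwise be read as regex operators.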

DiverTree
Wed 21st May '08, 11:12pm
if i see it, ill let you know. :)

Dream
Sat 24th May '08, 8:55pm
1,000 downloads :)

Boofo
Sat 24th May '08, 9:15pm
Congratulations! Although there are many repeat ones, I'm sure. ;)

rolfw
Sun 25th May '08, 7:34am
Removed, problem solved.

Alfa1
Mon 2nd Jun '08, 9:20am
Many thanks Dream!

Dream
Mon 2nd Jun '08, 11:36am
You are welcome Alfa1 :)

Dream
Mon 2nd Jun '08, 7:55pm
Added

Begun Robot Crawler
Mail.Ru

the scorpion
Thu 5th Jun '08, 9:06pm
This is sweet. Thanks for the hard work, man.

Boofo
Thu 19th Jun '08, 2:54pm
Here's another one:

93.103.33.238 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=93.103.33.238)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19

And here is a link talking about what it is:

http://www.webhostingtalk.com/showthread.php?t=660662

Dream
Thu 19th Jun '08, 2:57pm
Ok, added

Boofo
Thu 19th Jun '08, 3:06pm
That was quick. And you're welcome. ;)

Dream
Sun 22nd Jun '08, 9:41pm
added soso

ALcorn
Sun 22nd Jun '08, 10:15pm
Here's another one:

93.103.33.238 (http://www.fathers-rights-forums.com/forums/online.php?do=resolveip&ipaddress=93.103.33.238)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19

And here is a link talking about what it is:

http://www.webhostingtalk.com/showthread.php?t=660662

It's wrong to include this user agent in the spiders list because this is a normal user with the mail.ru Agent application installed (something like Google Talk). Many people from Russian-speaking countries install mail.ru Agent. Please remove it.

Boofo
Sun 22nd Jun '08, 10:52pm
Did you bother to check the link and this? MRSPUTNIK

That is no regular user.

ALcorn
Sun 22nd Jun '08, 11:04pm
Did you bother to check the link and this? MRSPUTNIK

That is no regular user.

Yes, I did. If MRSPUTNIK is included in the spiders list we will get wrong results, because it's not really a bot, but a regular user just browsing the forum with mail.ru Agent (and some other products from mail.ru) installed.

Boofo
Sun 22nd Jun '08, 11:19pm
So, I had 12 other regular people using mail.ru that same day at the same time on my little 65 member site? Not likely.

Dream
Sun 22nd Jun '08, 11:25pm
is mail.ru a spam site or it's "ok"?

ALcorn
Sun 22nd Jun '08, 11:25pm
So, I had 12 other regular people using mail.ru that same day at the same time on my little 65 member site? Not likely.

Boofo, I don't know why there were 12 regular people browsing your little site, but I know for sure that this user-agent string belongs to regular users. If you bother to check user-agents.org for MRA and MRSPUTNIK now, you will see that both were dropped from the listing.

Update: user-agents.org still reports MRA, but as a regular browser

ALcorn
Sun 22nd Jun '08, 11:27pm
is mail.ru a spam site or it's "ok"?

mail.ru is one of the biggest webmail services in Russia, just like hotmail.com or yahoo.com in the US.

Boofo
Mon 23rd Jun '08, 12:07am
Well, according to what I have read, MRSPUTNIK is spammers. It's up to Dream, it is his list.

ALcorn
Mon 23rd Jun '08, 3:55am
Well, according to what I have read, MRSPUTNIK is spammers. It's up to Dream, it is his list.

Well, please quote your sources. Anyway, it's possible that guys running browsers with this user-agent string spam some forums, but that's true for any user-agent string.

Boofo
Mon 23rd Jun '08, 4:01am
I quoted one already and there are many more on the net. Use the search like I did. MRSPUTNIK is spammers, no matter how you try to justify it. Are you one of them?

ALcorn
Mon 23rd Jun '08, 4:33am
I quoted one already and there are many more on the net. Use the search like I did.

You quoted a thread with a link to the mail.ru Agent, nothing more. Also, like I previously explained, MRSPUTNIK is a string inserted into the browser user-agent when the mail.ru Agent software is installed (it's not spyware, it's not malware, it's not a virus, and it's not spamware). Just FYI: mail.ru is a Russian top-10 portal, getting about 15,000,000 unique visitors per month. And here is the home page of mail.ru Agent (http://agent.mail.ru/en/) (in English).


MRSPUTNIK is spammers, no matter how you try to justify it. Are you one of them?

Hmmm, some people are so brilliant, nothing can escape them :rolleyes:

Dream
Mon 23rd Jun '08, 7:22am
ok removed mrsputnik and added google mobile spider

ALcorn
Mon 23rd Jun '08, 7:34am
ok removed mrsputnik and added google mobile spider

Thank you.

Another user agent to be removed (IMHO) is Google Wireless Transcoder. In fact it's not a crawler, but a service (http://google.com/gwt/n). So, even if the connection is coming from Google's IP, there's a user browsing forum thru this service.

Dream
Mon 23rd Jun '08, 8:20am
that's arguable; people may want to see how many people are using that service

MR K
Wed 25th Jun '08, 1:55pm
nice job Dream :)

i've a little question about this: yesterday i upgraded vB and unfortunately i forgot to delete the newer file from the vB package ... so today i can't detect the spiders correctly ... however i just uploaded your latest file, but i saw that it doesn't recognize some spiders like 'snapshots' that your older files recognized before ... so the question is: how much time does it take before the system runs 'correctly'?

Dream
Wed 25th Jun '08, 11:38pm
It should detect as soon as you upload the new file.

Dream
Wed 25th Jun '08, 11:39pm
added Attributor and YandexBlog

Boofo
Thu 26th Jun '08, 3:41am
that's arguable; people may want to see how many people are using that service

Then I think MRSPUTNIK needs to be re-added as I originally reported. As you can see, the link I posted says what it is. His argument pretty well reinforces my original report.

MR K
Thu 26th Jun '08, 6:03am
It should detect as soon as you upload the new file.

Dream, the snapshots spiders aren't detected ... (48 hours are gone) i know this cause i perfectly know its IP ... :)

MrNase
Thu 26th Jun '08, 10:00am
Excellent work, thank you! :)

MR K
Thu 26th Jun '08, 11:42am
nothing, it doesn't work ... however, Dream, i submitted the 'new' agent of the snapbot with its IPs ... check it out ... thx

Dream
Thu 26th Jun '08, 11:46am
Excellent work, thank you! :)

You are welcome :)


Dream, the snapshots spiders aren't detected ... (48 hours are gone) i know this cause i perfectly know its IP ... :)
I added your Snapbot, see if you can detect them now :)

Dream
Thu 26th Jun '08, 11:48am
Then I think MRSPUTNIK needs to be re-added as I originally reported. As you can see, the link I posted says what it is. His argument pretty well re-enforces my original report.
Then we should add that MSN .NET service too by that logic...

MR K
Thu 26th Jun '08, 3:32pm
just updated the spider.xml nothing is changed ... (ok i'll wait a bit) however check the pic attached ... it's a weird '+' before the 'http' ...

Dream
Thu 26th Jun '08, 4:15pm
MR K, I'm not sure what's up, the list should have detected the bot with one of those two idents used... You said they were being detected before? Did you upload the xml to the correct folder?

MR K
Thu 26th Jun '08, 4:18pm
MR K, I'm not sure what's up, the list should have detected the bot with one of those two idents used... You said they were being detected before? Did you upload the xml to the correct folder?

yes, absolutely, it runs because it detects the other bots like google or yahoo ... i put the file into the /includes/xml folder ...

MR K
Thu 26th Jun '08, 4:53pm
i've just found another spider that isn't recognized (from yahoo) ... check the attachment ...

Dream
Thu 26th Jun '08, 7:08pm
No Yahoo! Slurps are being detected? You should post this in the support forums.

MR K
Tue 1st Jul '08, 8:20am
just added another msnbot for the verification ...

pauloo
Tue 1st Jul '08, 7:48pm
Hello,

The last file "Thu 26th Jun 2008" doesn't work.
No spiders are shown. I think the file is not correct.
Thank you.

Boofo
Tue 1st Jul '08, 8:03pm
Hello,

The last file "Thu 26th Jun 2008" doesn't work.
No spiders are shown. I think the file is not correct.
Thank you.

They show fine for me. Maybe you got a bad copy.

Dream
Tue 1st Jul '08, 10:37pm
just added another msnbot for the verification ...
That one is not MSNBOT, it's users who have .NET installed, it was discussed earlier in the thread.