View Full Version : Search help needed


RC Nitro
Wed 12th Jun '02, 5:32pm
I have a question about the search function. When searching for a specific string of characters, I get mixed results. For example, I searched for links to eBay within the posts. A typical string for an eBay page always starts with the following:

http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=

I can enter the above in the search field and get no results.

I can enter http://cgi and get no results.

I can enter cgi.ebay and get no results.

I can start and end each string with a wildcard and get no results.

The only instance where I'm able to get a result is when I search for: *ebay*, but then I end up getting results filled with EVERY mention of ebay, and not just the links to an eBay page. All the above searches would narrow it down to just the links, but those strings appear to trip up the search engine.

Anyone have any idea how to get around this?

BTW - It would be a REALLY nice feature if the admins were NOT bound by the search floodcheck time setting. :rolleyes:

RC Nitro
Thu 13th Jun '02, 8:37am
Bump - anyone? Bueller, anyone?...

filburt1
Thu 13th Jun '02, 9:20am
Try searching for http://cgi.ebay*

RC Nitro
Thu 13th Jun '02, 9:40am
Still doesn't work. You would think it would work because it has the exact initial characters as part of the search parameters with a wildcard at the end, but it doesn't return any results. I would just like to know from the developers or someone well versed in MySQL (assuming the search parameters are determined by MySQL) exactly what counts as acceptable search criteria. It appears the use of special characters is throwing off the process. If there is someone who knows this well enough to suggest an alternative or who knows where I'm making a mistake in syntax, PLEASE let me know... :confused:

filburt1
Thu 13th Jun '02, 10:19am
Just for fun, try [url]http://cgi.ebay* . It might be ignoring it because of the autoparsed URLs. Or *http://cgi.ebay* .

RC Nitro
Thu 13th Jun '02, 10:26am
Nope *sigh* didn't work. I do think it's the special characters that are causing the problem. We have a considerable database of nearly 600,000 posts, and it comes back immediately with a "no results" reply. Whenever I enter valid search criteria, it takes upwards of 10 seconds for a reply because of the massive amount of data it must parse. The frustrating part is, I'm not even getting a "syntax error" message or anything that might help me narrow down the problem. It just says "no results." Thanks for the suggestions, however; at least you responded. Anyone else want to take a crack at this?

Steve Machol
Thu 13th Jun '02, 11:19am
Confirmed in my unhacked 2.2.6 test board. Not sure if it's a bug or just the way MySQL handles special characters, but I'll move this to Bugs so a Developer can look at it.

Mike Sullivan
Thu 13th Jun '02, 12:39pm
Actually, I don't think this is a bug. You're probably getting hit by the "Maximum Word Length" option (it defaults to 20) and therefore the word isn't indexed. You may want to try increasing that and rebuilding the search index.

Also worth mentioning is that the following characters are stripped out of text before it is split into words: ()"':;[]?!#{}_-+\
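
To illustrate why a long URL can drop out of the index, here is a rough Python sketch of the behaviour described above. It is not vBulletin's actual code: the function name `index_words` is made up, the item number in the sample URL is invented, and it assumes the listed characters are simply deleted and that words are split on whitespace. Only the character list and the default limit of 20 come from this post.

```python
# Sketch only. STRIPPED and the default of 20 are taken from the post above;
# everything else (names, deletion rather than replacement, whitespace
# splitting) is an assumption made for illustration.
STRIPPED = "()\"':;[]?!#{}_-+\\"
MAX_WORD_LENGTH = 20


def index_words(text, max_len=MAX_WORD_LENGTH):
    """Return the words that would survive indexing under these assumptions."""
    cleaned = text.translate(str.maketrans("", "", STRIPPED))
    return [word for word in cleaned.split() if len(word) <= max_len]


# The item number 12345 is invented for the example.
post = "Check http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=12345 on ebay"

print(index_words(post))               # default limit of 20
print(index_words(post, max_len=100))  # after raising the limit
```

Under these assumptions the eBay link collapses into one word that is well over 20 characters, so it is discarded at the default setting, while the plain word ebay survives. Raising the limit only helps once the index has been rebuilt with the new value.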

RC Nitro
Thu 13th Jun '02, 2:57pm
I will try resetting the maximum word length and run it again.

RC Nitro
Thu 13th Jun '02, 3:22pm
OK, I just tried it again with the following changes and got very odd results:

Set the maximum word length to 100 characters (to make sure I wasn't bumping up against the limit)

search parameters:
[list=1]
cgi.ebay.com
cgi.ebay.com*
*cgi.ebay.com*
ebay.com
ebay.com*
*ebay.com*
[/list=1]

None of the above returned a positive result for the intended url, BUT I did get a hit on "ebay.com" when searching for "ebay.com*," and I got another for *www.ebay.com* when searching for "*ebay.com*"

It simply won't pick up these characters when contained within a URL, even if I stay away from the special characters you mentioned as being converted to spaces ( ()"':;[]?!#{}_-+\ ). This isn't the only URL either. I have searched for other URLs and have not had any luck. It appears as though it's ignoring the wildcards (*). Yes, I do have the BB set to use wildcards. Any other ideas?

BTW - thanks for the help thus far. I appreciate it.

andrewpfeifer
Thu 13th Jun '02, 3:23pm
Did you reindex after you changed it?

RC Nitro
Thu 13th Jun '02, 3:27pm
How do you re-index? I was not able to find a function to rebuild. Maybe I'm looking in the wrong place...:confused:

andrewpfeifer
Thu 13th Jun '02, 3:29pm
Admin CP -> Rebuild Counters -> Reindex

RC Nitro
Thu 13th Jun '02, 3:35pm
Excellent! Thank you. I'll wait until tonight to re-index. I think it will take a long time to get through all the threads. I'll keep my fingers crossed and let you know how it works out tomorrow.

Thx.

Steve Machol
Thu 13th Jun '02, 4:29pm
Originally posted by Ed Sullivan
Actually, I don't think this is a bug. You're probably getting hit by the "Maximum Word Length" option (it defaults to 20) and therefore the word isn't indexed.

D'OH! :o

RC Nitro
Wed 19th Jun '02, 10:46am
OK - so here's the latest. I completed rebuilding the index (which took about 8 hours to go through about 630,000 posts :rolleyes: ), and set the maximum word length to 100 characters so a lengthy URL wouldn't escape the index. Unfortunately, I am still not able to search for URLs without problems. Here are a few examples you can try right here:


If I search for www.vbulletin.com, I only get results where www.vbulletin.com is entered literally, without parsing the URL. I get only 44 results.

If I search for *www.vbulletin.com*, which should grab any instance of this within a URL (the wildcards should pick up whatever characters precede or follow within the URL), I get no results.

If I search for *vbulletin*, I get 23000 results, more than 44 of which are part of a URL that wasn't picked up by the searches above. However, the search results also contain every mention of the word vbulletin, so the results are somewhat useless unless I want to click every result to see if it's a proper result.

If I search for http*, I think I get all the URLs posted on the website (some 2600+), but not the specific ones I'm trying to find.

If I narrow my search to http://www.vbulletin*, I get two oddball results that appear to be a fluke.


The bottom line is, I would like to search for specific URLs posted on our VB site, but I'm not able to. For some odd reason using typical database/search engine syntax, I'm not able to figure out how to accomplish this. Any other ideas or suggestions?

Mystics
Wed 19th Jun '02, 11:20am
All words of the search index are stored in the table word, in a field named title. But this field has the type char(50), so all words longer than 50 characters get cut off at 50.

So you should also enlarge this field to char(100) or bigger.
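
A quick way to picture the effect of the field width, as a Python sketch rather than anything taken from vBulletin itself. The helper name `stored_title` and the padded sample word are invented; the only fact used is the char(50) limit, and the sketch assumes the database cuts longer values silently.

```python
# Sketch only: mimics a fixed-width column that silently cuts longer values.
FIELD_WIDTH = 50  # char(50), as described above


def stored_title(word, width=FIELD_WIDTH):
    """Return what would end up in the column for a given indexed word."""
    return word[:width]


# Invented 60-character word: a 30-character URL fragment plus padding.
long_word = "http//www.vbulletin.com/forum/" + "x" * 30

print(len(long_word), len(stored_title(long_word)))
print(stored_title(long_word) == long_word)  # exact match on the full word
print(stored_title("ebay") == "ebay")        # short words are unaffected
```

With a wider field and a rebuilt index, the stored value and the full word would be identical again, so an exact-match search could find it.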

You should also note that a slash (/) and a period (.) are stripped out of the message/title before indexing it.

Have you tried this search query?
http* AND *vbulletin* (This may take a while ;))

Mystics

RC Nitro
Wed 19th Jun '02, 2:27pm
I will have the field size changed to match the setting in the VB software, but I don't think that's the problem. Many of the URLs I've been searching for are well below 50 characters, so this shouldn't be an issue. I also tried the search as you suggested, using http* and *vbulletin*. It always timed out because it couldn't process the search.


Still open for ideas...

RC Nitro
Wed 19th Jun '02, 4:40pm
You also should note, that a slash (/) and a period (.) are stripped out of the message/title before indexing it.

Mystics

In light of this, I also tried *wwwvbulletincom* and got no results. If the slashes and periods are stripped from the data when indexed, why wouldn't this show up? Please remember that I'm just using the vbulletin URL as an example - the same trouble exists regardless of the domain or URL. Is there any VB staff here that has expertise in MySQL or whatever else might be the source of this odd problem that could help?

Mystics
Wed 19th Jun '02, 4:42pm
Originally posted by RC Nitro
If the slashes and periods are stripped from the data when indexed, why wouldn't this show up?

Sorry, I meant "converted to spaces" :p

[EDIT]
But I actually tested it now, and the whole URL (without ":") was indexed

I made a new thread with the content:
http://www.vbulletin.com/forum/

and "http//www.vbulletin.com/forum/" got indexed

I was able to find that thread with:
http//www.vbulletin.com/forum/
http://www.vbulletin.com/forum/
*www.vbulletin.com*
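
For anyone who wants to reason about which queries can match, here is a small Python sketch of the test above. It is only a model: vBulletin does its matching in the database, whereas this uses Python's `fnmatch` as a stand-in for the * wildcard. The names `normalise` and `post_matches` are made up, and it assumes the query goes through the same character stripping as the post, that stripped characters are deleted, and that every space-separated term must match some indexed word.

```python
from fnmatch import fnmatchcase

# Character list quoted earlier in the thread.
STRIPPED = "()\"':;[]?!#{}_-+\\"


def normalise(text):
    """Delete the stripped characters (assumption: deleted, not spaced out)."""
    return text.translate(str.maketrans("", "", STRIPPED))


def post_matches(query, indexed_words):
    """True if every whitespace-separated query term matches an indexed word."""
    terms = normalise(query).split()
    return bool(terms) and all(
        any(fnmatchcase(word, term) for word in indexed_words) for term in terms
    )


indexed = ["http//www.vbulletin.com/forum/"]  # the word reported as indexed

for query in (
    "http//www.vbulletin.com/forum/",
    "http://www.vbulletin.com/forum/",
    "*www.vbulletin.com*",
    "*wwwvbulletincom*",
    "*www vbulletin com*",
):
    print(query, "->", post_matches(query, indexed))
```

In this model the three queries listed above match, while *wwwvbulletincom* and a version with spaces in place of the periods do not, because the indexed word still contains its periods and slashes.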

RC Nitro
Wed 19th Jun '02, 4:46pm
Originally posted by Mystics
Sorry, I meant "converted to spaces" :p

Guess what...

Searching for *www vbulletin com* doesn't work either. :( :rolleyes: :D