Is the SWORD mightier than the WORD?
T Campbell



Joined: 15 Dec 2005
Posts: 12

Posted: Sat Feb 18, 2006 9:30 am

One of the things Ryan and I have been discussing is whether or not to change the way the engine handles character strings. I lean one way and he leans another, here. I thought I'd see how you felt about it before I said anything about "what our users want."

At present, if you search for GAV, you get not only all the transcriptions that use "Gav" (the name of the lead character in Nukees) but also all the transcriptions that use "gave" or "gavel." If you want ONLY the word "Gav," then you have to put quotation marks and spaces around the string, like this:

" Gav "

The advantage to this system is that it allows for automatic "stemming" without a whole lot of coding work: searches for CRASH also turn up the related concepts CRASHED and CRASHING. Also, comics sometimes play with words, so you could get the sound effect BASHSMASHCRASH, which almost no one would think to search for on their own.
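A minimal sketch of the substring behaviour described above, assuming a case-insensitive "contains" test. The sample transcriptions and the name `substring_search` are made up for illustration; nothing here is taken from the real engine.

```python
# Hypothetical sample data; the real transcription store is not described here.
TRANSCRIPTS = [
    "Then Gav sighed.",
    "She gave him the gavel.",
    "BASHSMASHCRASH",
    "The car crashed.",
]


def substring_search(query, transcripts):
    """Return every transcript that contains the query as a raw substring."""
    needle = query.lower()
    return [t for t in transcripts if needle in t.lower()]


# "GAV" hits the name and also "gave" / "gavel".
print(substring_search("GAV", TRANSCRIPTS))
# Padding the term with spaces narrows it to the standalone word.
print(substring_search(" Gav ", TRANSCRIPTS))
# "crash" picks up "crashed" and the run-together sound effect.
print(substring_search("crash", TRANSCRIPTS))
```

The same one-line test produces both the accidental "stemming" and the unwanted "gavel" matches, which is the trade-off in question.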

On the downside, most other engines do whole-word matching, and I doubt that non-techies will think of treating the string the way I treated the "Gav" example above. We can put instructions on how to use it into the help pages, but I dunno... AltaVista put how-to-use instructions on its site for the years that it was the most popular pure search engine, and they didn't seem to get used very much.

All in all, I lean toward changing the search interface so that it searches the strings it's given as "whole words only," with the exception of the apostrophe-ess possessive (Gav's). In other words, search for GAV EATS and it should act the way it now acts if you search for " GAV " " EATS ".
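One way the proposed rule could be sketched, assuming regular expressions are available and assuming every search term must appear (the thread does not say whether terms are ANDed). The function names are hypothetical.

```python
import re


def whole_word_pattern(term):
    """Match `term` as a whole word, optionally followed by a possessive 's."""
    return re.compile(
        r"(?<!\w)"          # no letter or digit right before the term
        + re.escape(term)
        + r"(?:'s)?"        # allow the apostrophe-ess possessive
        + r"(?!\w|'\w)",    # no letter after it, and no other contraction
        re.IGNORECASE,
    )


def whole_word_search(query, transcripts):
    """Return transcripts in which every term of the query matches as a whole word."""
    patterns = [whole_word_pattern(term) for term in query.split()]
    return [t for t in transcripts if all(p.search(t) for p in patterns)]


samples = ["Gav eats pie.", "Gav's cat eats.", "He gave up and eats."]
print(whole_word_search("GAV EATS", samples))
```

The lookahead is stricter than a plain word boundary on purpose: a bare `\b` would also accept "Gav'll", since an apostrophe counts as a boundary.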

This matters for most small words like WORD, CAR and SEE, though, admittedly, not for most larger words.

What do you think?

_________________
T Campbell
http://www.tcampbell.net

Last edited by T Campbell on Sat Feb 18, 2006 12:25 pm; edited 2 times in total
Bearclaw
Captain Sensible


Joined: 10 Nov 2004
Posts: 3690
Location: Jewtopia

Posted: Sat Feb 18, 2006 9:36 am

I'm all for making machines more intelligent. I'd say a bit of each like (I think) you said.

BUT I DON'T REALLY KNOW ANYTHING ABOUT WHAT THIS ENTAILS

_________________
Rang tang ding dong I am the Japanese Sandman
T Campbell



Joined: 15 Dec 2005
Posts: 12

Posted: Sat Feb 18, 2006 9:38 am

Dang, you're fast. Less than ten minutes after posting, I made some changes which I hope clear up my opinion a bit...

_________________
T Campbell
http://www.tcampbell.net
Bearclaw
Captain Sensible


Joined: 10 Nov 2004
Posts: 3690
Location: Jewtopia

Posted: Sat Feb 18, 2006 10:01 am

Well, again, I don't know if going for whole words only is the answer. I WOULD like it if, like you said, the word crash also gave me results for crashing and crashes and crashmas.

Perhaps make the default whole words only, but give people the option for more finely tuned searches still?

_________________
Rang tang ding dong I am the Japanese Sandman
The Famous Mr. Klaw
Totally Klawsome


Joined: 04 Feb 2005
Posts: 15555
Location: Klawsylvania

Posted: Sat Feb 18, 2006 3:47 pm

I'd like it if you could make it sort of the opposite of how it is now, i.e., if you want stemming from crash, you'd search for crash*, instead of having to search for " crash " when you don't want stemming.
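A rough sketch of that inverted syntax, assuming a trailing asterisk means "this word plus anything that starts with it" and a bare term means "whole word only". The helper name `term_to_regex` is hypothetical.

```python
import re


def term_to_regex(term):
    """Translate one search term into a regex: 'crash*' is a prefix match, 'crash' is exact."""
    if term.endswith("*"):
        stem = re.escape(term[:-1])
        # The word must start with the stem; any word characters may follow.
        return re.compile(r"\b" + stem + r"\w*", re.IGNORECASE)
    # No asterisk: the term has to stand alone as a word.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


print(bool(term_to_regex("crash*").search("The car crashed.")))  # prefix form
print(bool(term_to_regex("crash").search("The car crashed.")))   # exact form
```

Note that a prefix wildcard only looks at the start of a word, so under this sketch crash* would no longer find BASHSMASHCRASH.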

_________________
A claw is a claw, and nobody has seen a talking claw unless that claw is the famous Mr. Klaw.
The Famous Mr. Klaw
Totally Klawsome


Joined: 04 Feb 2005
Posts: 15555
Location: Klawsylvania

Posted: Sat Feb 18, 2006 3:48 pm

Of course, it'd be cool if you could also make it know when you search for something like "singing" that you might also want results with "sang" or "sung" in them.

_________________
A claw is a claw, and nobody has seen a talking claw unless that claw is the famous Mr. Klaw.
Smajie



Joined: 12 Jul 2005
Posts: 1049
Location: Da Holy Land

Posted: Sat Feb 18, 2006 3:53 pm

I'd think that requires dictionariness, which would probably require more resources than are currently available.

_________________
"Waitaminit -- so you're saying that existentialism is an onomatopoeia?"
monkeyangst



Joined: 04 Nov 2005
Posts: 4
Location: Austin, TX

Posted: Sat Feb 18, 2006 4:32 pm

I think the simplest means of accommodating both schools would be to keep the default search behavior the way it is ("GAV" matches "GAVEL" and "GAVIN MACLEOD") but have a check box below the main search field for "Match whole words only."

That seems more intuitive than quotes and spaces for whole words and asterisks for partial words...

_________________
Monkey Law - A webcomic for the highly evolved
jamused



Joined: 17 Dec 2005
Posts: 120
Location: Haverford, PA

Posted: Sun Feb 19, 2006 12:10 am

I think whatever Google does is what most people expect. Which would mean stemming, but not simple substring.

_________________
WebAmused
Tim Tylor



Joined: 16 Dec 2005
Posts: 2
Location: Cornwall, G Britain

Posted: Sun Feb 19, 2006 9:13 am

monkeyangst wrote:
I think the simplest means of accommodating both schools would be to keep the default search behavior the way it is ("GAV" matches "GAVEL" and "GAVIN MACLEOD") but have a check box below the main search field for "Match whole words only."

That seems more intuitive than quotes and spaces for whole words and asterisks for partial words...


I second this. Ideally, it would be nice to have options like the "exact phrase / all of words / none of words" at the top of the Google advanced search page, but I've no idea whether that's practical or not.
BenB



Joined: 29 Sep 2004
Posts: 3088
Location: The Alternate Universe TABB

Posted: Mon Feb 20, 2006 10:33 am

Smajie wrote:
I'd think that requires dictionariness, which would probably require more resources than are currently available.
Actually, you can get pretty far with a simple word stemmer. At any rate, I come down on Ryan's side on this one, for the simple reason that I often sort-of remember what happened in the comic I am searching for, but not exactly.
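To give a feel for what "a simple word stemmer" might look like, here is a deliberately naive suffix-stripping sketch. It is not the Porter algorithm or any particular library; the suffix list, the minimum stem length of 3, and the function names are all assumptions made for illustration.

```python
import re

# Checked in order, so "ing" and "ed" are tried before the bare "s".
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping at least three letters of stem."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def stem_search(query, transcripts):
    """Return transcripts containing a word with the same stem as every query word."""
    wanted = {naive_stem(w) for w in re.findall(r"[a-z]+", query.lower())}
    hits = []
    for text in transcripts:
        stems = {naive_stem(w) for w in re.findall(r"[a-z]+", text.lower())}
        if wanted <= stems:
            hits.append(text)
    return hits


print(stem_search("crash", ["The car crashed.", "Crashing waves", "BASHSMASHCRASH"]))
print(stem_search("singing", ["He sang.", "They sing."]))
```

A stemmer like this links crash, crashed and crashing without any dictionary, but it cannot connect "singing" to "sang" or "sung"; irregular forms are where a word list would be needed.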

_________________
AUGH! There's no Brillo Pad for the soul, BenB!
- justinpie
The Famous Mr. Klaw
Totally Klawsome


Joined: 04 Feb 2005
Posts: 15555
Location: Klawsylvania

Posted: Mon Feb 20, 2006 1:25 pm

jamused wrote:
I think whatever Google does is what most people expect. Which would mean stemming, but not simple substring.


Google does pretty advanced dictionary-based matching I think.

_________________
A claw is a claw, and nobody has seen a talking claw unless that claw is the famous Mr. Klaw.
jamused



Joined: 17 Dec 2005
Posts: 120
Location: Haverford, PA

Posted: Mon Feb 20, 2006 2:09 pm

The Famous Mr. Klaw wrote:
jamused wrote:
I think whatever Google does is what most people expect. Which would mean stemming, but not simple substring.


Google does pretty advanced dictionary-based matching I think.


According to their basics page, they do stemming, though they didn't use to. A simple experiment seems to bear this out: philander yields 632,000 hits, philander OR philandering 1,430,000, philand gets 1,700, and there's no way to add a wildcard on the end of philand. On the other hand diet yields 183,000,000 as does diet OR dietary. That seems consistent with whole word search plus some simple stemming. Certainly if they do anything more complicated, they don't give a hint of it or how to take advantage of it on their advanced search pages. (I'm talking just about the search, not the "did you mean X instead?" hint, which probably does use some dictionary.)

At any rate, the basic point is that people who are used to Google will expect whole word matching or something that ends up looking very much like it, and not searches that return "diet" when you were looking for "die."

One nifty Google feature that might be useful in OhNo, if it's not too hard to program (it wouldn't be hard in Python or Perl, but I don't know anything about the guts of OhNo), is the * whole-word wildcard, so "Oh No *" would return "Oh No Robot" or "Oh No Batman" or "Oh No No", etc. At least I can readily imagine trying to find a particular comic that contains a piece of dialogue where I'm not sure of a word.
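For what it's worth, a Python sketch of that whole-word wildcard might look like the following. It assumes a lone * stands for exactly one word and that words may be separated by spaces or punctuation; `phrase_to_regex` is a made-up name and has nothing to do with how OhNo or Google actually work.

```python
import re


def phrase_to_regex(phrase):
    """Turn a phrase like 'Oh No *' into a regex where each * matches one whole word."""
    parts = [r"\w+" if token == "*" else re.escape(token) for token in phrase.split()]
    # \W+ between words tolerates spaces and punctuation such as "Oh no, Batman".
    return re.compile(r"\b" + r"\W+".join(parts) + r"\b", re.IGNORECASE)


pattern = phrase_to_regex("Oh No *")
for line in ["Oh No Robot", "Oh no, Batman!", "Oh No No", "Oh no!"]:
    print(line, "->", bool(pattern.search(line)))
```

The last sample is there to show that the wildcard needs an actual word to fill the slot; "Oh no!" on its own does not match.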

_________________
WebAmused
Bo Lindbergh



Joined: 18 Dec 2005
Posts: 2
Location: 59°20'N 18°03'E

Posted: Mon Feb 20, 2006 2:46 pm

T Campbell wrote:
If you want ONLY the word "Gav," then you have to put quotation marks and spaces around the string, like this:

" Gav "

But that won't find "Gav" at the very start of a strip text, since there's no space before the G.
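A small sketch of the problem, assuming the engine does a plain substring test on the space-padded term. Both function names are hypothetical, and padding the text is only one possible workaround.

```python
def spaced_search(term, text):
    """Look for the term with a literal space on each side, as in the '" Gav "' trick."""
    return f" {term.lower()} " in text.lower()


def padded_search(term, text):
    """Same test, but with a space added to both ends of the text first."""
    return f" {term.lower()} " in f" {text.lower()} "


print(spaced_search("Gav", "Gav walks in."))   # term at the very start of the text
print(padded_search("Gav", "Gav walks in."))   # same text, padded before searching
print(padded_search("Gav", "Hello, Gav."))     # term followed by punctuation
```

Padding fixes the start and end of a strip, but a term followed directly by punctuation still slips through, so the space trick is fragile either way.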
The Famous Mr. Klaw
Totally Klawsome


Joined: 04 Feb 2005
Posts: 15555
Location: Klawsylvania

Posted: Mon Feb 20, 2006 4:29 pm

jamused wrote:
The Famous Mr. Klaw wrote:
jamused wrote:
I think whatever Google does is what most people expect. Which would mean stemming, but not simple substring.


Google does pretty advanced dictionary-based matching I think.


At any rate, the basic point is that people who are used to Google will expect whole word matching or something that ends up looking very much like it, and not searches that return "diet" when you were looking for "die."


Hmm, but I just searched for "singing" and the first hit is "singingfish."

_________________
A claw is a claw, and nobody has seen a talking claw unless that claw is the famous Mr. Klaw.
Squidd



Joined: 18 Aug 2005
Posts: 5652
Location: The Truth and Beauty Bomb Shelter (but not yet)

Posted: Mon Feb 20, 2006 5:18 pm

It's Google. It knew what you really wanted.

jamused



Joined: 17 Dec 2005
Posts: 120
Location: Haverford, PA

Posted: Mon Feb 20, 2006 5:29 pm

The Famous Mr. Klaw wrote:
Hmm, but I just searched for "singing" and the first hit is "singingfish."


Ah, that's because Google treats titles and URLs differently. If you use Advanced Search and specify that you want to search only the text of the page (or use the allintext: modifier), singingfish disappears from the results.

_________________
WebAmused