| Author |
Message |
T Campbell
Joined: 15 Dec 2005
Posts: 12
|
Posted:
Sat Feb 18, 2006 9:30 am |
  |
One of the things Ryan and I have been discussing is whether or not to change the way the engine handles character strings. I lean one way and he leans another, here. I thought I'd see how you felt about it before I said anything about "what our users want."
At present, if you search for GAV, you get not only all the transcriptions that use "Gav" (the name of the lead character in Nukees) but also all the transcriptions that use "gave" or "gavel." If you want ONLY the word "Gav," then you have to put quotation marks and spaces around the string, like this:
" Gav "
The advantage to this system is that it allows for automatic "stemming" without a whole lot of coding work: searches for CRASH also turn up the related concepts CRASHED and CRASHING. Also, comics sometimes play with words, so you could get the sound effect BASHSMASHCRASH, which almost no one would think to search for on their own.
On the downside, most other engines do whole-word matching, and I doubt that non-techies will think of treating the string the way I treated the "Gav" example above. We can put instructions on how to use it into the help pages, but I dunno... AltaVista put how-to-use instructions on its site for the years that it was the most popular pure search engine, and they didn't seem to get used very much.
All in all, I lean toward changing the search interface so that it searches the strings it's given as "whole words only," with the exception of the apostrophe-ess possessive (Gav's). In other words, search for GAV EATS and it should act the way it now acts if you search for " GAV " " EATS ".
This matters for most small words like WORD, CAR and SEE, though, admittedly, not for most larger words.
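To make the two behaviors concrete, here is a minimal Python sketch. It is not the engine's actual code; the function names are made up for illustration, and it assumes the current search is a plain case-insensitive substring test.

```python
import re

def substring_match(term: str, text: str) -> bool:
    """Assumed current behavior: case-insensitive substring search."""
    return term.lower() in text.lower()

def whole_word_match(term: str, text: str) -> bool:
    """Proposed behavior: whole words only, with an optional trailing 's."""
    # \b is a word boundary; (?:'s)? allows the possessive form (Gav's).
    pattern = r"\b" + re.escape(term) + r"(?:'s)?\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

print(substring_match("gav", "The judge banged his gavel"))   # substring hit
print(whole_word_match("gav", "The judge banged his gavel"))  # no whole-word hit
print(whole_word_match("gav", "Gav's lunch"))                 # possessive still counts
```

Under whole-word matching, a run-together sound effect like BASHSMASHCRASH no longer turns up for CRASH, which is the trade-off described above.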
What do you think? |
_________________ T Campbell
http://www.tcampbell.net
Last edited by T Campbell on Sat Feb 18, 2006 12:25 pm; edited 2 times in total |
|
  |
 |
Bearclaw
Captain Sensible

Joined: 10 Nov 2004
Posts: 3690
Location: Jewtopia
|
Posted:
Sat Feb 18, 2006 9:36 am |
  |
I'm all for making machines more intelligent. I'd say a bit of each like (I think) you said.
BUT I DON'T REALLY KNOW ANYTHING ABOUT WHAT THIS ENTAILS |
_________________ Rang tang ding dong I am the Japanese Sandman |
|
    |
 |
T Campbell
Joined: 15 Dec 2005
Posts: 12
|
Posted:
Sat Feb 18, 2006 9:38 am |
  |
Dang, you're fast. Less than ten minutes after posting, I made some changes which I hope clear up my opinion a bit... |
_________________ T Campbell
http://www.tcampbell.net |
|
  |
 |
Bearclaw
Captain Sensible

Joined: 10 Nov 2004
Posts: 3690
Location: Jewtopia
|
Posted:
Sat Feb 18, 2006 10:01 am |
  |
Well, again, I don't know if going for whole words only is the answer. I WOULD like it if, like you said, the word crash also gave me results for crashing and crashes and crashmas.
Perhaps make the default whole words only, but give people the option for more finely tuned searches still? |
_________________ Rang tang ding dong I am the Japanese Sandman |
|
    |
 |
The Famous Mr. Klaw
Totally Klawsome

Joined: 04 Feb 2005
Posts: 15555
Location: Klawsylvania
|
Posted:
Sat Feb 18, 2006 3:47 pm |
  |
I'd like it if you could make it sort of the opposite of how it is now, i.e., if you want stemming from crash, you'd search for crash*, instead of having to search for " crash " when you don't want stemming. |
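A rough sketch of that inverted default in Python, assuming a trailing asterisk is the only wildcard and that it means "any word starting with this". The helper names are hypothetical and not part of the engine.

```python
import re

def compile_term(term: str) -> re.Pattern:
    """Whole word by default; a trailing * matches any word with that prefix."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)

def matches(term: str, text: str) -> bool:
    """True if the compiled term is found anywhere in the text."""
    return compile_term(term).search(text) is not None

print(matches("crash", "He crashed the car"))   # whole word only, so no hit
print(matches("crash*", "He crashed the car"))  # prefix wildcard, so a hit
```

Note that crash* is a prefix match only, so it would still miss BASHSMASHCRASH, where the term sits at the end of a longer word.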
_________________ A claw is a claw, and nobody has seen a talking claw unless that claw is the famous Mr. Klaw. |
|
   |
 |
The Famous Mr. Klaw
Totally Klawsome

Joined: 04 Feb 2005
Posts: 15555
Location: Klawsylvania
|
Posted:
Sat Feb 18, 2006 3:48 pm |
  |
Of course, it'd be cool if you could also make it know when you search for something like "singing" that you might also want results with "sang" or "sung" in them. |
_________________ A claw is a claw, and nobody has seen a talking claw unless that claw is the famous Mr. Klaw. |
|
   |
 |
Smajie

Joined: 12 Jul 2005
Posts: 1049
Location: Da Holy Land
|
Posted:
Sat Feb 18, 2006 3:53 pm |
  |
I'd think that requires dictionariness, which would probably require more resources than are currently available. |
_________________ "Waitaminit -- so you're saying that existentialism is an onomatopoeia?" |
|
   |
 |
monkeyangst

Joined: 04 Nov 2005
Posts: 4
Location: Austin, TX
|
Posted:
Sat Feb 18, 2006 4:32 pm |
  |
I think the simplest means of accommodating both schools would be to keep the default search behavior the way it is ("GAV" matches "GAVEL" and "GAVIN MACLEOD") but have a check box below the main search field for "Match whole words only."
That seems more intuitive than quotes and spaces for whole words and asterisks for partial words... |
_________________ Monkey Law - A webcomic for the highly evolved |
|
    |
 |
jamused

Joined: 17 Dec 2005
Posts: 120
Location: Haverford, PA
|
Posted:
Sun Feb 19, 2006 12:10 am |
  |
I think whatever Google does is what most people expect. Which would mean stemming, but not simple substring. |
_________________ WebAmused |
|
   |
 |
Tim Tylor
Joined: 16 Dec 2005
Posts: 2
Location: Cornwall, G Britain
|
Posted:
Sun Feb 19, 2006 9:13 am |
  |
| monkeyangst wrote: |
I think the simplest means of accommodating both schools would be to keep the default search behavior the way it is ("GAV" matches "GAVEL" and "GAVIN MACLEOD") but have a check box below the main search field for "Match whole words only."
That seems more intuitive than quotes and spaces for whole words and asterisks for partial words... |
I second this. Ideally, it would be nice to have options like the "exact phrase / all of words / none of words" at the top of the Google advanced search page, but I've no idea whether that's practical or not. |
|
|
  |
 |
BenB

Joined: 29 Sep 2004
Posts: 3088
Location: The Alternate Universe TABB
|
Posted:
Mon Feb 20, 2006 10:33 am |
  |
| Smajie wrote: |
| I'd think that requires dictionariness, which would probably require more resources than are currently available. |
Actually, you can get pretty far with a simple word stemmer. At any rate, I come down on Ryan's side on this one, for the simple reason that I often sort-of remember what happened in the comic I am searching for, but not exactly. |
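As a sketch of what "a simple word stemmer" could look like, here is a deliberately naive suffix stripper in Python. It is an illustration only, not a real stemming algorithm such as Porter's, and the suffix list and minimum stem length are arbitrary assumptions.

```python
import re

# Assumed suffix list, checked longest-first; purely illustrative.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three letters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_match(term: str, text: str) -> bool:
    """True if any word in the text has the same naive stem as the term."""
    target = naive_stem(term)
    words = re.findall(r"[a-z]+", text.lower())
    return any(naive_stem(w) == target for w in words)

print(stem_match("crash", "They crashed and kept crashing"))  # regular forms match
print(stem_match("gav", "The judge banged his gavel"))        # no false hit on gavel
print(stem_match("singing", "She sang all night"))            # irregular form is missed
```

The last call shows where the dictionary concern comes back in: suffix stripping handles crashed and crashing, but irregular forms like sang and sung need some kind of lookup table.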
_________________ AUGH! There's no Brillo Pad for the soul, BenB!
- justinpie |
|
  |
 |
The Famous Mr. Klaw
Totally Klawsome

Joined: 04 Feb 2005
Posts: 15555
Location: Klawsylvania
|
Posted:
Mon Feb 20, 2006 1:25 pm |
  |
| jamused wrote: |
| I think whatever Google does is what most people expect. Which would mean stemming, but not simple substring. |
Google does pretty advanced dictionary-based matching I think. |
_________________ A claw is a claw, and nobody has seen a talking claw unless that claw is the famous Mr. Klaw. |
|
   |
 |
jamused

Joined: 17 Dec 2005
Posts: 120
Location: Haverford, PA
|
Posted:
Mon Feb 20, 2006 2:09 pm |
  |
| The Famous Mr. Klaw wrote: |
| jamused wrote: |
| I think whatever Google does is what most people expect. Which would mean stemming, but not simple substring. |
Google does pretty advanced dictionary-based matching I think. |
According to their basics page, they do stemming, though they didn't use to. A simple experiment seems to bear this out: philander yields 632,000 hits, philander OR philandering 1,430,000, philand gets 1,700, and there's no way to add a wildcard on the end of philand. On the other hand, diet yields 183,000,000, as does diet OR dietary. That seems consistent with whole-word search plus some simple stemming. Certainly if they do anything more complicated, they don't give a hint of it or how to take advantage of it on their advanced search pages. (I'm talking just about the search, not the "did you mean X instead?" hint, which probably does use some dictionary.)
At any rate, the basic point is that people who are used to Google will expect whole word matching or something that ends up looking very much like it, and not searches that return "diet" when you were looking for "die."
One nifty Google feature that might be useful in OhNo, if it's not too hard to program (it wouldn't be hard in Python or Perl, but I don't know anything about the guts of OhNo), is the * whole-word wildcard, so "Oh No *" would return "Oh No Robot" or "Oh No Batman" or "Oh No No", etc. At least I can readily imagine trying to find a particular comic that contains a piece of dialogue where I'm not sure of a word. |
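For what it's worth, here is roughly how the whole-word wildcard might look in Python. This is a sketch under the assumption that a lone * stands for exactly one word and that punctuation between words should be ignored; the function names are invented for the example and say nothing about how OhNo works inside.

```python
import re

def phrase_to_regex(phrase: str) -> re.Pattern:
    """Turn a phrase into a regex where a lone * stands for one whole word."""
    parts = [r"\w+" if tok == "*" else re.escape(tok) for tok in phrase.split()]
    # \W+ between words lets spaces and punctuation separate them.
    return re.compile(r"\b" + r"\W+".join(parts) + r"\b", re.IGNORECASE)

def find_phrase(phrase: str, text: str) -> list:
    """Return every stretch of text that matches the phrase."""
    return [m.group(0) for m in phrase_to_regex(phrase).finditer(text)]

print(find_phrase("Oh No *", "Oh no, Robot! Oh No Batman."))
```

Because the wildcard needs a word break after "No", a text like "Oh nothing" would not match.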
_________________ WebAmused |
|
   |
 |
Bo Lindbergh

Joined: 18 Dec 2005
Posts: 2
Location: 59°20'N 18°03'E
|
Posted:
Mon Feb 20, 2006 2:46 pm |
  |
| T Campbell wrote: |
If you want ONLY the word "Gav," then you have to put quotation marks and spaces around the string, like this:
" Gav " |
But that won't find "Gav" at the very start of a strip text, since there's no space before the G. |
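A small Python sketch of the edge case, assuming the quoted search is a literal substring test with a space on each side. That is a guess at the behavior, not a description of the engine, and the function names are hypothetical.

```python
def spaced_match(term: str, text: str) -> bool:
    """The quoted " Gav " trick: the term with a literal space on each side."""
    return f" {term.lower()} " in text.lower()

def padded_match(term: str, text: str) -> bool:
    """Same test, but with the text padded so its first and last words qualify."""
    return spaced_match(term, f" {text} ")

print(spaced_match("Gav", "Gav eats lunch"))  # missed: no space before the G
print(padded_match("Gav", "Gav eats lunch"))  # found once the text is padded
print(padded_match("Gav", "Hi, Gav!"))        # still missed: punctuation, not a space
```

Padding fixes the start and end of the text, but punctuation next to the word still defeats the space trick, so a real word-boundary test would be the more robust route.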
|
|
  |
 |
The Famous Mr. Klaw
Totally Klawsome

Joined: 04 Feb 2005
Posts: 15555
Location: Klawsylvania
|
Posted:
Mon Feb 20, 2006 4:29 pm |
  |
| jamused wrote: |
| The Famous Mr. Klaw wrote: |
| jamused wrote: |
| I think whatever Google does is what most people expect. Which would mean stemming, but not simple substring. |
Google does pretty advanced dictionary-based matching I think. |
At any rate, the basic point is that people who are used to Google will expect whole word matching or something that ends up looking very much like it, and not searches that return "diet" when you were looking for "die."
|
Hmm, but I just searched for "singing" and the first hit is "singingfish." |
_________________ A claw is a claw, and nobody has seen a talking claw unless that claw is the famous Mr. Klaw. |
|
   |
 |
Squidd

Joined: 18 Aug 2005
Posts: 5652
Location: The Truth and Beauty Bomb Shelter (but not yet)
|
Posted:
Mon Feb 20, 2006 5:18 pm |
  |
It's Google. It knew what you really wanted. |
|
   |
 |
jamused

Joined: 17 Dec 2005
Posts: 120
Location: Haverford, PA
|
Posted:
Mon Feb 20, 2006 5:29 pm |
  |
| The Famous Mr. Klaw wrote: |
| Hmm, but I just searched for "singing" and the first hit is "singingfish." |
Ah, that's because Google treats titles and URLs differently. If you use Advanced Search and specify that you want to search only the text of the page (or use the allintext: modifier), singingfish disappears from the results. |
_________________ WebAmused |
|
   |
 |
|
|