Skip navigation

Full-Text Search Basics

I've created a full-text index on a table, and I have questions about SQL Server 2000's full-text search component. First, can full-text search look for a suffix (e.g., match the phrase *mation to determine whether a field contains words such as information)? Second, can you use full-text search to search for words in close proximity to each other? If you can, how many words can separate the search words before the search fails? Third, how can you modify the noise-word list?

The answer to your first question is that you can't directly do a suffix search. However, you can work around the limitation in a couple of ways. You can use the LIKE predicate in a clause such as WHERE mytext LIKE N'%suffix '. However, using LIKE requires a scan of all the rows in the table you're searching, which could impede performance. Alternatively, you can duplicate and index the data in reverse (e.g., imagination becomes noitanigami), then search for the string noitani*.

The answer to your second question is yes, you can use a proximity search, which searches for words near one another. The correct syntax is

USE pubs
GO
SELECT title, notes
FROM titles
WHERE CONTAINS (notes, 'user NEAR computers')
GO

You can also use the tilde (~) character and mix and match with postfix operators. (For details about this technique, see "Searching for Words or Phrases Close to Another Word or Phrase (Proximity Term)" in SQL Server Books Online—BOL.) The ranking values that the search returns determine what qualifies as "near." If you use the CONTAINSTABLE clause, you can get the ranking values and do further experimenting.

The answer to your third question is that the word-breaker characters are inherent in the languages used and you can modify the noise-word list. Noise words are words that are automatically excluded from a full-text query search. For example, a typical search excludes the words a, and, and the. The "Full-Text Index and Querying Concepts" topic in BOL explains how to modify the noise-word files. Use caution when changing the noise-word lists because if you allow more words in your searches, you can dramatically increase the size of your indexes.

TAGS: SQL
Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish