MySQL full-text indexing & TLAs

· by Steve · Read in about 4 min · (665 Words)

Here’s a quick tip - if you run a technical site and have decided to use MySQL’s full-text indexing feature to make searches faster, you will want to change something out of the box. I’m upgrading the Ogre forum over xmas to phpBB3, and I’d found that the default ‘native’ search index took over 18 hours to build, simply because the forum is so large (and has in fact been ‘pruned’ back to the last 3-4 years now), plus my test server here is a more modest spec. I tried the MySQL full-text index instead, and it built in under 30 minutes, so that was a major boon, in addition to the searches themselves being faster. I have to admit I didn’t realise MySQL had this facility (it has had it since version 3 in fact); I’d only used similar features on Oracle and SQL Server in years gone by.

However, there was one problem - it didn’t seem to give me the results I was expecting, at least not all the time. The problem was that Ogre, being a technical forum, uses a lot of TLAs (Three Letter Acronyms), which¬† you really want to be able to search for. However, the default configuration for the MySQL full-text search is only to look for words that are between 4 and 84 characters, presumably to avoid small common words like ‘the’, ‘for’ etc (although those will get excluded by the ‘stop word’ list anyway).

mysql> show variables like 'ft%';<br /> +--------------------------+----------------+<br /> | Variable_name | Value |<br /> +--------------------------+----------------+<br /> | ft_boolean_syntax | + -><()~*:""&| |<br /> | ft_max_word_len | 84 |<br /> | ft_min_word_len | 4 |<br /> | ft_query_expansion_limit | 20 |<br /> | ft_stopword_file | (built-in) |<br /> +--------------------------+----------------+<br /> 5 rows in set (0.00 sec)

So, the answer is to change your configuration file (by default, /etc/mysql/my.cnf on Debian / Ubuntu, just /etc/my.cnf on RedHat), like so:

``Here’s a quick tip - if you run a technical site and have decided to use MySQL’s full-text indexing feature to make searches faster, you will want to change something out of the box. I’m upgrading the Ogre forum over xmas to phpBB3, and I’d found that the default ‘native’ search index took over 18 hours to build, simply because the forum is so large (and has in fact been ‘pruned’ back to the last 3-4 years now), plus my test server here is a more modest spec. I tried the MySQL full-text index instead, and it built in under 30 minutes, so that was a major boon, in addition to the searches themselves being faster. I have to admit I didn’t realise MySQL had this facility (it has had it since version 3 in fact); I’d only used similar features on Oracle and SQL Server in years gone by.

However, there was one problem - it didn’t seem to give me the results I was expecting, at least not all the time. The problem was that Ogre, being a technical forum, uses a lot of TLAs (Three Letter Acronyms), which¬† you really want to be able to search for. However, the default configuration for the MySQL full-text search is only to look for words that are between 4 and 84 characters, presumably to avoid small common words like ‘the’, ‘for’ etc (although those will get excluded by the ‘stop word’ list anyway).

mysql> show variables like 'ft%';<br /> +--------------------------+----------------+<br /> | Variable_name | Value |<br /> +--------------------------+----------------+<br /> | ft_boolean_syntax | + -><()~*:""&| |<br /> | ft_max_word_len | 84 |<br /> | ft_min_word_len | 4 |<br /> | ft_query_expansion_limit | 20 |<br /> | ft_stopword_file | (built-in) |<br /> +--------------------------+----------------+<br /> 5 rows in set (0.00 sec)

So, the answer is to change your configuration file (by default, /etc/mysql/my.cnf on Debian / Ubuntu, just /etc/my.cnf on RedHat), like so:

``

Hey presto, once you bounce the MySQL daemon and rebuild your indexes, you can search for words like ‘GPU’, ‘CPU’ and ‘PCF’ again. Phew.