MySQL's latest InnoDB engine can now do extensive, high-performance, full text search. A quick primer delivers all the goodies
Oracle recently provided access to many new MySQL 5.6 features through http://labs.mysql.com for the user community to test and comment on. One notable feature is the InnoDB Full-Text Search (FTS) engine. It lets users build FULLTEXT indexes on InnoDB tables to represent text-based content and speed up searches for words and phrases. Combining full-text search with InnoDB tables opens up text capability to transactional applications, where the textual data is frequently inserted, updated, and deleted. Given the importance of this feature to the MySQL audience, this article explains the design of InnoDB FTS and provides recipes for its use, as well as a short list of the limitations of this release.
The Design of Full Text Search
Like most database-powered search features, InnoDB Full-Text Search is designed as an "inverted index," in which the incoming text strings are tokenized into individual words, and these words are stored in one or more auxiliary tables. For each word, a list of Document IDs and word position pairs is stored. We call such a Document ID/Position pair list an "ilist." Thus, in the index table, two important columns are word and ilist. InnoDB FTS supports Proximity Search, which has been lacking in the MyISAM FTS feature, by storing the position (as a byte offset) for each word.
InnoDB FTS represents each search index as six tables. The words are divided among these tables based on their first characters. Currently, the partition criteria are hard coded, targeted to the Latin character set.
The partitioned design helps to parallelize the operation. In the current release, InnoDB only parallelizes the CREATE INDEX operation, but it can parallelize other operations such as query and full-text search later.
Let's move to an example to illustrate the auxiliary tables with an InnoDB FTS index.
Starting in MySQL 5.6, you can examine InnoDB metadata (the InnoDB system tables) through corresponding tables in the INFORMATION_SCHEMA database. Let's see how these auxiliary tables are created alongside the FTS index. These tables are specific to InnoDB, so you will not find them in MySQL's metadata, and you cannot query them through normal SQL interfaces.
Here is the definition of the quotes table used for our example:
create table quotes
( id int unsigned auto_increment primary key
, author varchar(64)
, quote varchar(400)
, source varchar(64)
-- Create the fulltext index
create fulltext index idx on quotes(quote);
-- Now check tables in the InnoDB SYS_TABLES:
mysql>SELECT table_id, name FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES;