/ sphinx

Install, configure and test Sphinx on Ubuntu 16.04

Sphinx is an open source search engine that allows full-text searches. It is best known for performing searches over large data very efficiently. The data to be indexed can generally come from very different sources: SQL databases, plain text files, HTML files, mailboxes, etc.

Some key features of Sphinx are:

High indexing and searching performance

  • Advanced indexing and querying tools
  • Advanced result set post-processing
  • Proven scalability up to billions of documents, terabytes of data, and thousands of queries per second
  • Easy integration with SQL and XML data sources, and SphinxQL, SphinxAPI, or SphinxSE search interfaces
  • Easy scaling with distributed searches

In this tutorial, we will set up Sphinx with MySQL server using the sample SQL file included in the distribution package. It will give you a basic idea of how to use Sphinx for your project.

1. Installing Sphinx

sudo apt-get install sphinxsearch

Before starting the Sphinx daemon, let's create a test DB (step 2.) and next configure it (step 3.).

2. Creating the Test Database

We'll set up a database using the sample data in the SQL file provided with the package. This will allow us to test that Sphinx search is working later.

Let's import the sample SQL file into the database. First, log in to the MySQL server shell.
mysql -u root -p
Create a dummy database. Here, we're calling it test, but you can name it whatever you want.
mysql> CREATE DATABASE test;
Import the example SQL file.
mysql> SOURCE /etc/sphinxsearch/example.sql;
Then leave the MySQL shell.
mysql> quit

3. Configuring Sphinx

The configuration consists of 3 main blocks that are essential to run:

  • index
  • searchd
  • source
    First, create the sphinx.conf file.
    sudo vi /etc/sphinxsearch/sphinx.conf
    or find file:
    find -name "sphinx.conf"
source src1
{
  type          = mysql

  sql_host      = localhost
  sql_user      = root
  sql_pass      = your_root_mysql_password
  sql_db        = test
  sql_port      = 3306

  sql_query     = \
  SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
  FROM documents

  sql_attr_uint         = group_id
  sql_attr_timestamp    = date_added
}
index test1
{
  source            = src1
  path              = /var/lib/sphinxsearch/data/test1
  docinfo           = extern
}
searchd
{
  listen            = 9306:mysql41
  log               = /var/log/sphinxsearch/searchd.log
  query_log         = /var/log/sphinxsearch/query.log
  read_timeout      = 5
  max_children      = 30
  pid_file          = /var/run/sphinxsearch/searchd.pid
  seamless_rotate   = 1
  preopen_indexes   = 1
  unlink_old        = 1
  binlog_path       = /var/lib/sphinxsearch/data
}

INDEX

The index component contains the source and the path to store the data.

  • source: Name of the source block. In our example, this is src1.
  • path: The path to save the index.

SEARCHD

The searchd component contains the port and other variables to run the Sphinx daemon.

  • `listen: The port which the Sphinx daemon will run, followed by the protocol. In our example, this is 9306:mysql41. Known protocols are :sphinx (SphinxAPI) and :mysql41 (SphinxQL)
  • query_log: The path to save the query log.
  • pid_file: The path to PID file of Sphinx daemon.
  • seamless_rotate: Prevents searchd stalls while rotating indexes with huge amounts of data to precache.
  • preopen_indexes: Whether to forcibly preopen all indexes on startup.
  • unlink_old: Whether to delete old index copies on successful rotation.

SOURCE

The source block contains the type of source, username and password to the MySQL server. The first column of the sql_query should be a unique id. The SQL query will run on every index and dump the data to Sphinx index file. Below are the descriptions of each field and the source block itself.

  • type: Type of data source to index. In our example, this is mysql. Other supported types include pgsql, mssql, xmlpipe2, odbc, and more.
  • sql_host: Hostname for the MySQL host. In our example, this is localhost. This can be a domain or IP address.
  • sql_user: Username for the MySQL login. In our example, this is root.
  • sql_pass: Password for the MySQL user. In our example, this is the root MySQL user's password.
  • sql_db: Name of the database that stores data. In our example, this is test.
  • sql_query: The query thats dumps data from the database to the index.

4. Managing the Index

In this step, we'll add data to the Sphinx index and make sure the index stays up to date using cron.
sudo indexer --all
The follow cronjob will run on every hour and add new data to the index using the configuration file we created earlier. Copy and paste it at the end of the file, then save and close the file.
@hourly /usr/bin/indexer --rotate --config /etc/sphinxsearch/sphinx.conf --all

5. Starting Sphinx

By default, the Sphinx daemon is tuned off. First, we'll enable it by changing the line START=no to START=yes in /etc/default/sphinxsearch.
sudo sed -i 's/START=no/START=yes/g' /etc/default/sphinxsearch
Then, use systemctl to restart the Sphinx daemon.
sudo systemctl restart sphinxsearch.service
To check if the Sphinx daemon is running correctly, run.
sudo systemctl status sphinxsearch.service

6. Testing

Now that everything is set up, let's test the search functionality. Connect to the SphinxQL (on port 9306) using the MySQL interface. Your prompt will change to mysql>.
mysql -h0 -P9306
Let's search a sentence.
SELECT * FROM test1 WHERE MATCH('test document'); SHOW META;

+------+----------+------------+
| id   | group_id | date_added |
+------+----------+------------+
|    1 |        1 | 1465979047 |
|    2 |        1 | 1465979047 |
+------+----------+------------+
2 rows in set (0.00 sec)

+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 2        |
| total_found   | 2        |
| time          | 0.000    |
| keyword[0]    | test     |
| docs[0]       | 3        |
| hits[0]       | 5        |
| keyword[1]    | document |
| docs[1]       | 2        |
| hits[1]       | 2        |
+---------------+----------+
9 rows in set (0.00 sec`

In the result above you can see that Sphinx found 2 matches from our test1 index for our test sentence. The SHOW META; command shows hits per keyword in the sentence as well.
Let's search some keywords.
mysql> CALL KEYWORDS ('test one three', 'test1', 1);
You should get something that looks like the following.

+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | test      | test       | 3    | 5    |
| 2    | one       | one        | 1    | 2    |
| 3    | three     | three      | 0    | 0    |
+------+-----------+------------+------+------+
3 rows in set (0.00 sec)

In the result above you can see that in the test1 index, Sphinx found:

5 matches in 3 documents for the keyword 'test'
2 matches in 1 document for the keyword 'one'
0 matches in 0 documents for the keyword 'three'

In this tutorial, we have shown you how to install Sphinx and make a simple search using SphinxQL and MySQL.

You can also find official native SphinxAPI implementations for PHP, Perl, Python, Ruby and Java (https://github.com/sphinxsearch/sphinx/tree/master/api). If you are using Nodejs, you can also use the SphinxAPI package.

By using Sphinx, you can easily add a custom search to your site. For more information on using Sphinx, visit the project website(http://sphinxsearch.com/).


See also

Install, configure and test Sphinx on Ubuntu 16.04
Share this

Subscribe to NL Slack