Introduction

Xapian is an open source search engine library, which allows developers to add advanced indexing and search facilities to their own applications.

This document aims to be a guide to getting up and running with your first database, explaining basic concepts and providing code examples of the library’s core functionality.

If you just want to follow our code examples, you can skip the chapter on “Core Concepts” and go straight to “A practical example”, but you should probably make sure you have Xapian installed first!

Note

If you’re looking for a way of getting a search system running without having to write any code, you may want to look at Omega, Xapian’s pre-packaged web search application. It’s designed so that as your needs grow, you can extend or even replace it without having to change your database; the structure that Omega sets up will work when you start writing code directly against Xapian.

Installation

There are two pieces of Xapian you need to follow this guide: the library itself, and support for the language you’re going to be using. This guide was originally written with examples in Python, and we’ve made a start on full translations into Java, Perl, PHP and C++. Help with completing these translations and with translating the examples into other languages would be most welcome.

This guide documents Xapian 1.4 (except where a different version is explicitly mentioned) so you’ll find it easier to follow if you use a version from the 1.4 release series. So let’s get that onto your system.

Installation on Debian or Ubuntu

Recent releases of Debian and Ubuntu both have Xapian 1.4 packages. If you’re still using Ubuntu xenial you can install packages from our PPA.

Once you have a suitable repo configured, run one of the following commands, depending on whether you want to work through the examples in Python or C++:

$ sudo apt-get install python3-xapian
$ sudo apt-get install libxapian-dev

Packages of the PHP bindings aren’t available due to a licence compatibility issue, but you can build your own packages.

Installation on other systems

Many operating systems have packages available to make Xapian easy to install; information is available on our download page. This covers most popular Linux distributions, FreeBSD, Mac OS (Python and C++ only) and Windows using Microsoft Visual Studio.

If you’re using a different operating system, you will need to compile from source. This should work on any Unix-like operating system, and on Windows using any one of Cygwin, MSYS+mingw or MSVC. Source code is again available from our download page.
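
Whichever route you take, it can be handy to confirm that the Python bindings are importable and which release they report. The following is a minimal sketch rather than an official check: it assumes you installed the Python 3 bindings, and the helper names xapian_version and is_1_4_series are our own, invented for illustration.

```python
# Sketch: report which Xapian version the Python bindings see, if any.
# The helper names below are hypothetical, not part of Xapian.


def xapian_version():
    """Return the Xapian version string, or None if the bindings are missing."""
    try:
        import xapian
    except ImportError:
        return None
    return xapian.version_string()


def is_1_4_series(version):
    """True if a version string such as '1.4.22' belongs to the 1.4 series."""
    return version.split(".")[:2] == ["1", "4"]


version = xapian_version()
if version is None:
    print("Xapian Python bindings not found")
elif is_1_4_series(version):
    print("Found Xapian", version)
else:
    print("Found Xapian", version, "(this guide documents 1.4)")
```

If the bindings are missing, the script says so instead of failing with a traceback, which makes it usable as a quick check before starting on the examples.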

Datasets and example code

If you want to run the code we use to demonstrate Xapian’s features (and we recommend you do), you’ll need both the code itself and the two datasets we use. You can grab the source for this guide from GitHub, which contains the example code in each language (in the code subdirectory), and also the data files listed below (in the data subdirectory).

The example code is available in Python, Perl, PHP, C++ and Java so far, although there’s only a complete set of examples for Python at present.

The first dataset is the first 100 objects taken from museum catalogue data released by the Science Museum. We downloaded this data from their API site, which has since shut down. The second dataset is one we curated ourselves from information on Wikipedia about the 50 US states. Both are provided as gzipped CSV files. The first dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike license, and the second under Creative Commons Attribution-Share Alike 3.0.
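
Since both datasets are gzipped CSV files, you can inspect them using only the Python standard library. The sketch below is illustrative and is not taken from the guide’s example code: the helper name read_gzipped_csv is our own, the UTF-8 encoding is an assumption, and the sample file it builds uses invented column names so that it runs without the real data.

```python
import csv
import gzip
import os
import tempfile


def read_gzipped_csv(path):
    """Return the rows of a gzipped CSV file as a list of dicts keyed by header.

    Assumes the file is UTF-8 encoded and has a header row.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


# Build a tiny stand-in file so the sketch runs without the real datasets.
# The column names and values here are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv.gz")
    with gzip.open(sample_path, mode="wt", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "name"])
        writer.writerow(["1", "first row"])
        writer.writerow(["2", "second row"])

    rows = read_gzipped_csv(sample_path)

print(len(rows), "rows; first:", rows[0])
```

To look at one of the real datasets, pass the path of the corresponding file in the data subdirectory to read_gzipped_csv instead of the temporary sample.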

If you haven’t grabbed the git repository, you can also view these datasets online on GitHub.

As we describe how to use Xapian, and show how to use it with practical code examples, we provide the commands needed to compile (if necessary) and run the code described. These commands are intended to be run from the top-level directory of the source for this guide.

Contributing

The source for this documentation is kept on GitHub; the best way to contribute is to open issues, add comments and submit pull requests there. If you want to chat to us interactively during the sprint sessions (and in general), there are details of how to connect to our chat channel on our website.

To generate this documentation from a git checkout, you’ll need the Sphinx documentation tool. If you’re using Debian, Ubuntu or another Debian-derived distro, you can get this by installing either the python-sphinx or python3-sphinx package. Once you have Sphinx installed, you can generate HTML output with make html (for a full list of available formats, see the output of make).