Getting Started with Xapian

Contributing

The source for this documentation is kept at http://github.com/jaylett/xapian-docsprint. The best way to contribute is to open issues, add comments and submit pull requests there.

To generate this documentation from a git checkout on Debian, Ubuntu or another Debian-derived distribution, you’ll need to install either the python-sphinx or the python3-sphinx package. Once you’ve done that, you can generate HTML output by running make html.

We monitor IRC during the sprint sessions (and in general), so you can also contact us on the #xapian channel on irc.freenode.net (webchat link: http://webchat.freenode.net/?channels=%23xapian).

Todo list

Todo

Add more entries, and flesh out those already here a little more.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/concepts/indexing/limitations.rst, line 8.)

Todo

Reference the FAQ instead of explaining this here.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/concepts/indexing/limitations.rst, line 17.)

Todo

Add, or link to, some details of how to do this.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/concepts/search/search_limitations.rst, line 51.)

Todo

Point out that, because TermGenerator (or similar code) lowercases words, “real” words in the source data cannot unexpectedly match prefixed terms.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/boolean_filters.rst, line 4.)
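The point in the todo above can be illustrated with a small plain-Python sketch. It does not use Xapian: index_text is a hypothetical stand-in for TermGenerator that only lowercases whitespace-separated words, and the XCOLOR prefix is an illustrative example of the convention of using upper-case prefixes for boolean filter terms.

```python
def index_text(text):
    """Hypothetical stand-in for TermGenerator: lowercase each word.

    The real TermGenerator also handles punctuation, stemming, positions
    and more; only the lowercasing matters for this illustration.
    """
    return {word.lower() for word in text.split()}


def boolean_term(prefix, value):
    """Build a prefixed boolean filter term, e.g. 'XCOLOR' + 'red'.

    Prefixes are upper case by convention ('XCOLOR' is just an example),
    so they can't collide with the lowercased free-text terms.
    """
    return prefix + value


# The source text happens to contain a word that looks like a filter term.
free_text_terms = index_text("A page that mentions XCOLORred in passing")
filter_term = boolean_term("XCOLOR", "red")

print(sorted(free_text_terms))
# The indexed word is 'xcolorred', which is not the filter term 'XCOLORred'.
print(filter_term in free_text_terms)  # False
```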

Todo

Write about how to use collapsing with searches. Discuss collapsing to return only one item for each value, and collapsing to return several. Make clear that collapsing simply excludes some items from the result set: it doesn’t reorder results, or guarantee to return the top N from each category. Also add a code example.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/collapsing.rst, line 7.)
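The semantics this todo asks the docs to make clear can be simulated in plain Python. This is not Xapian’s implementation (in the bindings, collapsing is enabled with Enquire’s set_collapse_key(slot, collapse_max)); it is a sketch that assumes a hypothetical list of already-ranked (docid, key) pairs and shows that collapsing only drops documents, keeps the ranking order, and gives no per-category quota.

```python
from collections import Counter


def collapse(ranked, collapse_max=1, max_items=10):
    """Keep at most `collapse_max` documents per collapse key.

    `ranked` is a list of (docid, key) pairs, best match first.  Documents
    are only ever dropped: survivors keep their original order, and no key
    is guaranteed a place in the first `max_items` results.
    """
    seen = Counter()
    kept = []
    for docid, key in ranked:
        if seen[key] < collapse_max:
            seen[key] += 1
            kept.append((docid, key))
    return kept[:max_items]


# Hypothetical ranked results: (docid, value in the collapse slot).
ranked = [(7, "news"), (3, "news"), (9, "blog"), (1, "news"), (4, "wiki"), (8, "blog")]

print(collapse(ranked, collapse_max=1))
print(collapse(ranked, collapse_max=2, max_items=3))
```

With collapse_max=2 and room for only three results, “wiki” doesn’t appear at all, even though it has a matching document.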

Todo

Mention that a typical database will probably not hit these performance issues.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/facets.rst, line 104.)

Todo

Write about how to index geolocation information, and how to use the geospatial posting sources and key maker to sort results by distance, bias results by distance, and limit results by distance. Discuss storing geographic bounding-box terms to accelerate distance-limited searches.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/geospatial.rst, line 8.)
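As a possible starting point for this howto, the sketch below sorts and limits documents by great-circle distance from a centre point, using the haversine formula in plain Python. It only illustrates the idea: Xapian’s own geospatial classes (such as LatLongCoord and LatLongDistanceKeyMaker) are not used, and the mean Earth radius of 6371 km and the sample coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sort_by_distance(docs, centre, max_km=None):
    """Return (distance, name) pairs, nearest first, optionally within max_km."""
    results = []
    for name, (lat, lon) in docs.items():
        d = haversine_km(centre[0], centre[1], lat, lon)
        if max_km is None or d <= max_km:
            results.append((d, name))
    return sorted(results)


# Hypothetical documents with approximate coordinates.
docs = {
    "london": (51.5074, -0.1278),
    "cambridge": (52.2053, 0.1218),
    "edinburgh": (55.9533, -3.1883),
}
centre = (51.7520, -1.2577)  # roughly Oxford

for distance, name in sort_by_distance(docs, centre):
    print(f"{name}: {distance:.0f} km")
```

Passing max_km (for example 200) drops the more distant documents; this corresponds to limiting results by distance, while the sorted order corresponds to sorting by distance.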

Todo

Check if there is existing documentation which can be used as a skeleton for this.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/geospatial.rst, line 13.)

Todo

Write about how to improve a system’s results iteratively. Discuss analysing log data.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/improve_results.rst, line 4.)

Todo

Check if there is existing documentation which can be used as a skeleton for this.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/improve_results.rst, line 7.)

Todo

List the various methods up front.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/query_authorisation.rst, line 9.)

Todo

Check if there is existing documentation which can be used as a skeleton for this.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/query_authorisation.rst, line 11.)

Todo

Discuss filtering the results returned by a query, and the problems with relying on that alone.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/query_authorisation.rst, line 17.)
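One problem with only filtering results after a query has run can be shown in plain Python: if the application asks for a page of ten results and then discards those the user may not see, the page comes back short, and the match count still includes hidden documents. The ranked docids, the groups and the visible_to check are all hypothetical.

```python
def run_query(ranked_docids, page_size):
    """Stand-in for running a query: return the first page of ranked docids."""
    return ranked_docids[:page_size]


def visible_to(user_groups, doc_groups):
    """Hypothetical access check: visible if the doc shares a group with the user."""
    return bool(user_groups & doc_groups)


# Hypothetical matches, best first: even docids are public, odd are staff-only.
doc_groups = {docid: {"public"} if docid % 2 == 0 else {"staff"}
              for docid in range(1, 21)}
ranked_docids = sorted(doc_groups)
user_groups = {"public"}

# Ask for a page of ten, then discard what the user mustn't see.
page = run_query(ranked_docids, page_size=10)
shown = [docid for docid in page if visible_to(user_groups, doc_groups[docid])]

print(len(ranked_docids))  # count reported by the query, hidden docs included
print(shown)               # only half a page survives the filter
```

To fill the page, the application would have to fetch and filter more results, with no fixed bound on how many rounds that takes.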

Todo

Discuss implementing auth schemes by indexing appropriate data.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/query_authorisation.rst, line 23.)
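Indexing the access information lets the restriction be applied during the match rather than afterwards. The sketch below simulates this in plain Python: each document carries one boolean term per group, using a hypothetical XGROUP prefix, and a search only considers documents with at least one of the user’s group terms. In Xapian this would probably be expressed by combining the user’s query with an OR of the group terms via Query.OP_FILTER, but that mapping is a suggestion, not something this page documents.

```python
def group_term(group):
    """Boolean term for a group, using the hypothetical 'XGROUP' prefix."""
    return "XGROUP" + group


def index(doc_id, words, groups):
    """Simulated indexing: free-text terms plus one boolean term per group."""
    terms = {w.lower() for w in words} | {group_term(g) for g in groups}
    return {"id": doc_id, "terms": terms}


def search(db, word, user_groups, page_size):
    """Match `word`, restricted to docs carrying any of the user's group terms."""
    allowed = {group_term(g) for g in user_groups}
    matches = [
        d["id"] for d in db
        # The access restriction is part of the match, not a post-filter.
        if word in d["terms"] and d["terms"] & allowed
    ]
    return len(matches), matches[:page_size]


# Hypothetical database: even docids are public, odd ones are staff-only.
db = [index(i, ["report"], ["public"] if i % 2 == 0 else ["staff"])
      for i in range(1, 21)]

count, page = search(db, "report", {"public"}, page_size=5)
print(count, page)
```

Because the restriction happens before paging, every page is full and the count reflects only the documents the user may see.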

Todo

Discuss hybrid schemes (implementing auth using indexed terms, and also filtering results).

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/query_authorisation.rst, line 29.)

Todo

Discuss issues relating to updates (in particular, how quickly a document needs to be hidden once it is made private).

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/query_authorisation.rst, line 35.)

Todo

Check valueranges.rst to see whether anything else needs moving across.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/range_queries.rst, line 6.)

Todo

The paragraph above isn’t entirely accurate. The range query that the processor generates is unweighted, so if there’s no other query, and the docid ordering is “don’t care” or ascending, the search can terminate early. If the range matches few documents, the search could still be slow, but might not be. If it matches no documents, it might be fast, because the bounds on the stored values may show that it can’t match anything. This is all quite complicated; it would be nice to explain how it works somewhere, but probably not here.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/range_queries.rst, line 287.)

Todo

Actually, you can’t safely combine the query with an external filter, because the range may be only one part of a larger query, with other parts at a higher level in the query tree. For example, for the query ‘1790..1799 OR york’, the filter couldn’t be applied to the generated query as a whole, because it shouldn’t apply to the “york” part.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/range_queries.rst, line 330.)
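The problem can be seen with plain Python sets standing in for the documents that each part of the query matches (the docids are made up). Treating the date range as an external filter over the rest of the query changes ‘1790..1799 OR york’ into something closer to ‘york, restricted to 1790..1799’.

```python
# Hypothetical docids matched by each part of '1790..1799 OR york'.
in_1790s = {1, 2, 3}  # documents dated within 1790..1799
york = {3, 4, 5}      # documents matching the word 'york'

# Intended meaning: documents from the 1790s OR documents about york.
intended = in_1790s | york

# Range turned into an external filter applied to the whole remaining query.
externally_filtered = york & in_1790s

print(sorted(intended))             # [1, 2, 3, 4, 5]
print(sorted(externally_filtered))  # [3]
```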

Todo

Implementing this example might make it clearer.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/range_queries.rst, line 336.)

Todo

Make the links to the old documentation work correctly.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/replication.rst, line 45.)

Todo

expand this section.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/replication.rst, line 68.)

Todo

XAPIAN_MAX_CHANGESETS is still relevant for brass. We should discuss it here, but only once we are sure which version it was added in.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/replication.rst, line 74.)

Todo

Getting a correction for a single word “by hand”

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/spelling.rst, line 69.)

Todo

Discuss interactions with stemming (i.e. whether the input and/or output values in the synonym table should be stemmed).

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/synonyms.rst, line 26.)

Todo

Document this!

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/synonyms.rst, line 32.)

Todo

Document Query.OP_SYNONYM, and how it relates to synonym expansion.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/synonyms.rst, line 72.)

Todo

Say something more useful about tuning the parameters!

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/howtos/weighting_scheme.rst, line 34.)
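A small numeric sketch may help when writing about tuning. It implements only the term-frequency part of the textbook BM25 formula, not Xapian’s exact weighting: Xapian’s BM25Weight takes the parameters k1, k2, k3, b and min_normlen, and this sketch models just k1 and b. The document lengths are made up.

```python
def bm25_tf(tf, doc_len, avg_len, k1, b):
    """Term-frequency part of the textbook BM25 formula (idf omitted).

    k1: how quickly repeated occurrences of a term stop adding weight.
    b:  how strongly weights are normalised by document length (0 = ignore length).
    """
    norm = (1 - b) + b * (doc_len / avg_len)
    return tf * (k1 + 1) / (tf + k1 * norm)


# Larger k1: ten occurrences count for more relative to a single occurrence.
for k1 in (0.5, 1.0, 2.0):
    print(f"k1={k1}: tf=1 -> {bm25_tf(1, 100, 100, k1, 0.5):.3f}, "
          f"tf=10 -> {bm25_tf(10, 100, 100, k1, 0.5):.3f}")

# Larger b: a document twice the average length is penalised more.
for b in (0.0, 0.5, 1.0):
    print(f"b={b}: {bm25_tf(1, 200, 100, 1.0, b):.3f}")
```

At the average document length a single occurrence scores 1.0 whatever k1 is, while ten occurrences score about 1.43, 1.82 and 2.50 for the three k1 values; raising b from 0 to 1 lowers the long document’s score from 1.0 through 0.8 to about 0.67.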

Todo

Finalise the datasets and code, and link to them from here.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/overview.rst, line 90.)

Todo

Link to this page from every howto, and from everything else that needs the data files and example code.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/overview.rst, line 113.)

Todo

This example should really be pulled directly from the code, but there seems to be a bug in line-number limiting somewhere in the literalinclude directive.

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/practical_example/searching/prefix.rst, line 18.)

Todo

Once brass settles down, update this for brass

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/xapian-core-rst/admin_notes.rst, line 5.)

Todo

ensure this really is up to date for 1.3.0

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/xapian-core-rst/admin_notes.rst, line 28.)

Todo

Provide some more examples!

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/xapian-core-rst/postingsource.rst, line 242.)

Todo

Also explain “why you might want to do this” (e.g. with an example scenario).

(The original entry is located in /var/build/user_builds/getting-started-with-xapian/checkouts/latest/xapian-core-rst/postingsource.rst, line 243.)