Query Parser¶
To make searching databases simpler, Xapian provides a QueryParser class which converts a human readable query string into a Xapian Query object, for example:
apple AND a NEAR word OR "a phrase" NOT (too difficult) +eh
The above example shows how some of the basic modifiers are interpreted by the QueryParser; the operators it supports follow the operators described earlier, for example:
apple AND pear
matches documents where both terms are presentapple OR pear
matches documents where either term (or both) are presentapple NOT pear
matches documents whereapple
is present andpear
is not
Term Generation¶
The QueryParser uses an internal process to convert the query string into terms. This is similar to the process used by the TermGenerator, which can be used at index time to convert a string into terms. It is often easiest to use QueryParser and TermGenerator on the same database.
Operators¶
As well as the basic logical operators, QueryParser supports the additional operators discussed earlier and introduces some new ones, for example:
apple NEAR dessert
president "united states"
"race condition" -horse
+recipe +apple pie cake dessert
The NEAR and phrase support behaves in the same way as described earlier; the new features are the + and - operators, which select documents based on the presence or absence of specified terms, for example:
"race condition" -horse
Matches all documents with the phrase "race condition"
but not horse
; and:
+recipe +apple pie cake desert
Which matches all documents which have the terms recipe
and apple
; then
all documents with these terms are weighted according to the weight of the
additional terms.
One thing to note is that the behaviour of the +/- operators vary depending
on the default operator used and the above examples assume that
OP_OR
is used.
Bracketed Expressions¶
When queries contain both OR and AND operators, AND takes precedence. To change the precedence of parts of the query, brackets can be used. For example, with the query:
apple OR pear AND dessert
The query parser will interpret this query as:
apple OR (pear AND dessert)
So to change the precedence and make the dessert a requirement, you would write the query initially as:
(apple OR pear) AND dessert
Stop words¶
Xapian also supports a stop word list, which allows you to specify words which should be removed from a query before processing. This list can be overridden within user search, so stop words can still be searched for if desired, for example if a stop word list contained ‘the’ and a search was for:
+the +document
Then the search would find relevant documents which contained both ‘the’ and ‘document’. Also, when searching for phrases, stop words do not apply, for example here we will retrieve documents with the exact phrase including ‘the’:
"the green space"
Searching with Prefixes¶
When a database is populated using prefixed terms (for example, title, author) it is possible to tell the QueryParser that these fields can be searched for using a human-readable prefix; for example:
author:"william shakespeare" title:juliet
Ranges¶
The QueryParser also supports range searches on document values, matching documents which have values within a given range. There are several types of range processors available, but the two discussed here are Date and Number, which require that values are serialised as they are indexed.
To use a range, additional programming is required to tell the QueryParser what format a range is specified in and which value is to be searched for matches within that range. This then gives rise to the ability to specify ranges as:
$10..50
5..10kg
01/01/1970..01/03/1970
size:3..7
When date ranges are configured (as a DateRangeProcessor), you can configure which format dates are to be interpreted as (i.e. month-day-year) or otherwise.
Wildcards¶
It is also possible to use wildcards to match any number of trailing characters within a term; for example:
wild*
matcheswild, wildcard, wildcat, wilderness
This feature is disabled by default; to enable it, see ‘Parser Flags’
below. In Xapian 1.2.x and earlier it also required a database to be set on
the QueryParser (so that it could find the list of terms to expand the wildcard
to) - in Xapian 1.4.x and later the wildcard expansion doesn’t happen until
xapian.Enquire.get_mset()
is called.
By default the wildcard will expand to as many terms as there are with
the specified prefix. This can cause performance problems, so you can limit
the number of terms a wildcard will expand to by calling
xapian.QueryParser.set_max_wildcard_expansion()
. If this limit
would be exceeded then an exception will be thrown. The exception may
be thrown by the QueryParser, or later when Enquire handles the query.
Searching for Partially Entered Queries¶
The QueryParser also supports performing a search with a query which has only been partially entered. This is intended for use with “incremental search” systems, which don’t wait for the user to finish typing their search before displaying an initial set of results. For example, in such a system a user would enter a search, and the system would display a new set of results after each letter, or whenever the user pauses for a short period of time (or some other similar strategy).
The problem with this kind of search is that the last word in a partially entered query often has no semantic relation to the completed word. For example, a search for “dynamic cat” would return a quite different set of results to a search for “dynamic categorisation”. This results in the set of results displayed flicking rapidly as each new character is entered. A much smoother result can be obtained if the final word is treated as having an implicit terminating wildcard, so that it matches all words starting with the entered characters - thus, as each letter is entered, the set of results displayed narrows down to the desired subject.
A similar effect could be obtained simply by enabling the wildcard matching option, and appending a “*” character to each query string. However, this would be confused by searches which ended with punctuation or other characters.
This feature is disabled by default - pass
FLAG_PARTIAL
flag in the flags argument of
xapian.QueryParser.parse_query(query_string, flags)
to enable it,
and tell the QueryParser which database to expand wildcards from using
the xapian.QueryParser.set_database(database)
method.
Default Operator¶
When the QueryParser receives a query, it joins together its component
queries using a default operator which defaults to
OP_OR
but can be modified at run time.
Parser Flags¶
The operation of the QueryParser can be altered through the use of flags, combined with the bitwise OR operator; these flags include:
FLAG_BOOLEAN
: enables support for AND, OR, etc and bracketed expressionsFLAG_PHRASE
: enables support for phrase expressionsFLAG_LOVEHATE
: enables support for + and - operatorsFLAG_BOOLEAN_ANY_CASE
: enables support for lower/mixed case boolean operatorsFLAG_WILDCARD
: enables support for wildcards
You can find more information about the available flags in the API documentation.
By default, the QueryParser enables FLAG_BOOLEAN
,
FLAG_PHRASE
and FLAG_LOVEHATE
.