Full-text document indexing - part Ⅰ
Introduction
This article shows a simple way of doing full-text indexing and search of Office Open XML and OpenDocument1 documents using SQLite.
- prepare the database
- insert documents (indexing)
- query using full-text search
Preparation
Create a virtual table using SQLite's full-text search feature:
CREATE VIRTUAL TABLE documents
USING FTS5 (name, document, size)
Indexing
Because both the Office Open XML and OpenDocument formats are just zip archives containg a bunch of XML this is easy.
Add a single document
INSERT INTO documents (name, document, size)
SELECT :name, data, LENGTH(readfile(:name))
FROM zipfile(:name)
WHERE name = 'content.xml'
Note that this uses parameterized queries, so adapt it to your needs.
Search
SELECT rank, name, size
FROM documents
WHERE documents MATCH :terms
ORDER BY rank
This query could result in something like this:
rank | name | size |
---|---|---|
-0.845353371460758 | fiets.docx | 8814 |
-0.438947157337124 | products.ods | 8845 |
Next up?
In Full-text document indexing - part Ⅱ I'll dive into more details.