Full-text document indexing - part Ⅰ
Introduction
This article shows a simple way of doing full-text indexing and search of Office Open XML and OpenDocument1 documents using SQLite.
- prepare the database
- insert documents (indexing)
- query using full-text search
Preparation
Create a virtual table using SQLite's full-text search feature:
CREATE VIRTUAL TABLE documents USING FTS5 (name, document, size)
Indexing
Because both the Office Open XML and OpenDocument formats are just zip archives containg a bunch of XML this is easy.
Add a single document
INSERT INTO documents (name, document, size) SELECT :name, data, LENGTH(readfile(:name)) FROM zipfile(:name) WHERE name = 'content.xml'
Note that this uses parameterized queries, so adapt it to your needs.
Search
SELECT rank, name, size FROM documents WHERE documents MATCH :terms ORDER BY rank
This query could result in something like this:
rank | name | size |
---|---|---|
-0.845353371460758 | fiets.docx | 8814 |
-0.438947157337124 | products.ods | 8845 |
Next up?
In Full-text document indexing - part Ⅱ we'll implement a simple search engine.