joriszwart.nl

I code dreams™

Full-text document indexing - part Ⅰ

Full-text search Office Open XML OpenDocument SQLite

Introduction

This article shows a simple way of doing full-text indexing and search of Office Open XML and OpenDocument1 documents using SQLite.

  1. prepare the database
  2. insert documents (indexing)
  3. query using full-text search

Preparation

Create a virtual table using SQLite's full-text search feature:

CREATE VIRTUAL TABLE documents
USING FTS5 (name, document, size)

Indexing

Because both the Office Open XML and OpenDocument formats are just zip archives containg a bunch of XML this is easy.

Add a single document

INSERT INTO documents (name, document, size)
    SELECT :name, data, LENGTH(readfile(:name))
    FROM zipfile(:name)
    WHERE name = 'content.xml'

Note that this uses parameterized queries, so adapt it to your needs.

SELECT rank, name, size
FROM documents
WHERE documents MATCH :terms
ORDER BY rank

This query could result in something like this:

rank name size
-0.845353371460758 fiets.docx 8814
-0.438947157337124 products.ods 8845

Next up?

In Full-text document indexing - part Ⅱ I'll dive into more details.

Notes