Search & Discovery Improvements

by Harsh Chandwani

Tags: project, gsoc, gsoc2026, SearchDiscovery, search, opensearch

Project Overview

Hi, I’m Harsh Chandwani, participating in Google Summer of Code 2026 with CDLI. I’ve been accepted for the project Search & Discovery Improvements.

CDLI (Cuneiform Digital Library Initiative) hosts over 340,000 cuneiform artifacts, and search is how most people actually use it. Today that search has three gaps. First, indexing runs only once a night, as a full Logstash rebuild of the Elasticsearch index, so an approved edit can take up to 24 hours to appear. Second, only artifacts go through the search engine; publications, collections, proveniences, and periods fall back to plain database LIKE queries. Third, Elasticsearch’s license conflicts with CDLI’s open-source model.

This project fixes all three. It moves to event-driven incremental indexing (Transactional Outbox) so approved edits appear in seconds, migrates the backend to OpenSearch (Apache 2.0) with a PHP document builder replacing the Logstash Ruby transform, and extends full search to publications, collections, proveniences, and periods.

Student: Harsh Chandwani
Mentors: Émilie Pagé-Perron, Vedant Wakalkar
Proposal: Search & Discovery Improvements
GSoC Project: GSoC’26
Project Idea: Ideas List #4.2
Contributions: Merge Requests
Repository: cdli/framework

Search & Discovery: Community Bonding

by Harsh Chandwani


Search & Discovery: Week 1

by Harsh Chandwani


Search & Discovery: Week 2

by Harsh Chandwani


Search & Discovery: Week 3

by Harsh Chandwani