Thanks a lot for your comment! We agree that a dataset as small as 5 GB may sound strange, but it was a conscious decision. Check out our blog post to read more about the benchmark's methodology.
TL;DR: the size wasn't our choice, but it is meaningful. 5 GB corresponds to a single data segment, which is literally what you get in Elasticsearch and similar systems even when you have TBs of data overall. See https://www.elastic.co/docs/deploy-manage/production-guidanc... (a single shard is one Lucene index, which contains multiple data segments).
Great results! Refreshing to see a project that actually went the extra mile and built the core search engine in C++ from scratch, unlike most similar projects that just wrap an existing library.
https://blog.serenedb.com/search-benchmark-game-overview