Estimating Query Timings in Elasticsearch

Authors

  • Sikha Bagui University of West Florida
  • Evorell Fridge University of West Florida

DOI:

https://doi.org/10.14738/tnc.92.9887

Keywords:

Elasticsearch, Elasticsearch Query, Query Cost Model, Document Frequency, Total Term Frequency, Term Vectors.

Abstract

In a shared Elasticsearch environment, it can be useful to know in advance how long a particular query will take to execute. This information can be used to enforce rate limiting or to distribute requests equitably among multiple clusters. Elasticsearch uses multiple Lucene instances on multiple hosts as its underlying search engine, but this abstraction makes it difficult to predict execution time with previously known predictors such as the number of postings. This research investigates how accurately different pre-retrieval statistics available through Elasticsearch can predict the execution time of queries on a typical Elasticsearch cluster. The number of terms in a query and the Total Term Frequency (TTF) reported by Elasticsearch’s API are found to be significant predictors of execution time. Regression models are then built and compared to find the most accurate method for predicting query time.
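The abstract describes regressing query execution time on pre-retrieval statistics. The Python sketch that follows illustrates the idea under stated assumptions: per-term TTF is read from a parsed Elasticsearch `_termvectors` response requested with `term_statistics` enabled (for instance, for an artificial document containing the query text), and the model is plain ordinary least squares on two features, the number of query terms and the summed TTF. The function names (`query_features`, `fit_ols`, `predict`) and the sample data are hypothetical; they are not the authors' code, models, or measurements.

```python
# Sketch only: the feature choice (query term count + summed TTF) follows the
# abstract, but the OLS model and all function names here are illustrative
# assumptions, not the authors' implementation.
from typing import Dict, List, Sequence, Tuple


def query_features(query_terms: Sequence[str], termvectors: Dict, field: str) -> Tuple[float, float]:
    """Return (number of query terms, summed TTF) for one query.

    `termvectors` is assumed to be a parsed _termvectors response requested
    with term_statistics=true, so each term entry carries "ttf" and
    "doc_freq". Query terms must already be in analyzed form; terms absent
    from the response contribute a TTF of 0.
    """
    terms = termvectors["term_vectors"][field]["terms"]
    ttf_total = sum(terms.get(t, {}).get("ttf", 0) for t in query_terms)
    return float(len(query_terms)), float(ttf_total)


def fit_ols(features: List[Tuple[float, float]], times: List[float]) -> List[float]:
    """Fit time = b0 + b1 * num_terms + b2 * ttf by solving the normal equations."""
    X = [[1.0, n, t] for n, t in features]  # prepend an intercept column
    k = 3
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, times)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented system [X^T X | X^T y].
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution yields [b0, b1, b2].
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (a[i][k] - sum(a[i][j] * beta[j] for j in range(i + 1, k))) / a[i][i]
    return beta


def predict(beta: List[float], num_terms: float, ttf: float) -> float:
    """Estimated execution time, in whatever unit the training times used."""
    return beta[0] + beta[1] * num_terms + beta[2] * ttf


if __name__ == "__main__":
    # Hypothetical, noise-free data generated from time = 2 + 1.5*n + 0.001*ttf,
    # used only to show that the fit recovers known coefficients.
    feats = [(1, 1000), (2, 500), (3, 4000), (4, 2000), (2, 3000)]
    times = [2 + 1.5 * n + 0.001 * t for n, t in feats]
    beta = fit_ols(feats, times)
    print(beta)
    print(predict(beta, 3, 1500))
```

In practice, the training times would come from timing real queries on the cluster (for example, the `took` field of search responses), and the paper builds and compares several regression models rather than assuming that OLS is the most accurate.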

Author Biographies

Sikha Bagui, University of West Florida

Professor, Computer Science

Evorell Fridge, University of West Florida

Dr. Evorell Fridge is a web applications engineer and adjunct instructor in the Department of Computer Science at the University of West Florida in Pensacola, Florida. He enjoys introducing students to full-stack software development and researching topics in information retrieval and big data.

Published

2021-04-23

How to Cite

Bagui, S., & Fridge, E. (2021). Estimating Query Timings in Elasticsearch. Transactions on Networks and Communications, 9(2), 15–36. https://doi.org/10.14738/tnc.92.9887