Bridging the Indexing Gap: Empirical Evaluation of Application-Side Optimisation Techniques for Native Vector Search in MySQL 9.0

Authors

  • Amos Ngoah, School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
  • Dawit Abdisa Jebessa, School of Software Engineering, University of Science and Technology of China, Suzhou, China
  • Yubo Yan, School of Computer Science and Technology, University of Science and Technology of China, Hefei, China

DOI:

https://doi.org/10.14738/ejas.1403.6884

Keywords:

MySQL, Vector Search, Database Optimisation, RAG, High-Dimensional Data

Abstract

The integration of vector similarity search into relational database management systems has become important for applications built around Retrieval-Augmented Generation (RAG). MySQL 9.0 introduces a native VECTOR data type, but the Community Edition still lacks server-side vector indexes and native distance functions, forcing similarity search to rely on a client-side full-scan model. This paper evaluates the practical consequences of that design through experiments on synthetic datasets of up to 10⁵ vectors, together with a SIFT1M validation and small-scale sentence-embedding checks. The study identifies where execution time is spent by separating the costs of data transfer, parsing, and computation. Results show that native binary VECTOR storage substantially improves data-handling efficiency over JSON-based storage, achieving up to 5.9x lower end-to-end latency in the tested settings, and that the dominant bottleneck shifts toward data movement as scale and dimensionality grow. Two established deployment strategies are then assessed. Hybrid Search, which combines scalar B-Tree filtering with vector ranking, achieves up to a 54x speedup when selective metadata predicates are available. A two-pass strategy based on Principal Component Analysis (PCA) reduces transfer volume and achieves up to a 15.5x speedup for high-dimensional workloads, though its recall depends on the compression setting and data domain. Additional tests show that Hybrid Search is the most stable of the evaluated strategies under modest concurrency. The paper concludes with a practical decision framework for choosing between native VECTOR storage, scalar pre-filtering, PCA compression, and migration to a system with native approximate nearest-neighbour execution.
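
The client-side full-scan model and the two strategies summarised above can be illustrated with a short Python sketch. It is not the authors' implementation, and every function name in it is hypothetical. It makes three stated assumptions: a binary VECTOR value reaches the client as packed little-endian float32 bytes, ranking uses Euclidean (L2) distance, and the PCA components are fitted offline and passed in (the two-axis basis in the demo is hand-picked for illustration).

```python
import heapq
import json
import math
import struct
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Rows = Dict[int, Vector]


def decode_json_vector(text: str) -> Vector:
    # JSON storage: each row arrives as text and must be parsed on the client.
    return [float(x) for x in json.loads(text)]


def decode_binary_vector(blob: bytes) -> Vector:
    # Binary VECTOR storage, ASSUMED here to be packed little-endian float32.
    if len(blob) % 4:
        raise ValueError("blob length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance; the choice of metric is an assumption of this sketch.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def full_scan_top_k(query: Sequence[float], rows: Rows, k: int) -> List[Tuple[int, float]]:
    # Client-side full scan: score every fetched row, keep the k nearest.
    scored = ((rid, l2(query, vec)) for rid, vec in rows.items())
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


def hybrid_top_k(query: Sequence[float], rows: Rows, metadata: Dict[int, dict],
                 predicate: Callable[[dict], bool], k: int) -> List[Tuple[int, float]]:
    # Hybrid search: the scalar filter runs first (in MySQL, a WHERE clause
    # that a B-Tree index can serve), so only surviving rows are scored.
    survivors = {rid: vec for rid, vec in rows.items() if predicate(metadata[rid])}
    return full_scan_top_k(query, survivors, k)


def project(vec: Sequence[float], components: Sequence[Sequence[float]]) -> Vector:
    # Linear projection onto principal components ASSUMED to be fitted offline.
    return [sum(c * x for c, x in zip(comp, vec)) for comp in components]


def two_pass_top_k(query: Sequence[float], rows: Rows,
                   components: Sequence[Sequence[float]],
                   shortlist: int, k: int) -> List[Tuple[int, float]]:
    # Pass 1: rank the compact projections to choose a candidate shortlist.
    reduced = {rid: project(vec, components) for rid, vec in rows.items()}
    candidates = full_scan_top_k(project(query, components), reduced, shortlist)
    # Pass 2: rerank only the shortlisted rows with their full vectors.
    return full_scan_top_k(query, {rid: rows[rid] for rid, _ in candidates}, k)


if __name__ == "__main__":
    rows = {1: [1.0, 0.0, 0.0, 0.0], 2: [0.9, 0.1, 0.0, 0.0],
            3: [0.0, 1.0, 0.0, 0.0], 4: [0.0, 0.0, 1.0, 0.0]}
    meta = {1: {"lang": "en"}, 2: {"lang": "fr"}, 3: {"lang": "fr"}, 4: {"lang": "en"}}
    query = [1.0, 0.05, 0.0, 0.0]
    basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # hypothetical 2-D basis
    print([rid for rid, _ in full_scan_top_k(query, rows, 2)])  # [1, 2]
    print([rid for rid, _ in hybrid_top_k(query, rows, meta, lambda m: m["lang"] == "fr", 1)])  # [2]
    print([rid for rid, _ in two_pass_top_k(query, rows, basis, shortlist=3, k=2)])  # [1, 2]
```

Everything is held in memory here for brevity. In a deployment, the first pass of two_pass_top_k would fetch only the reduced vectors (for example from a separate column), so fewer bytes cross the wire. Similarly, the predicate in hybrid_top_k stands in for a server-side WHERE clause, which is what shrinks the set of rows that must be transferred and scored. Because Euclidean distance is preserved approximately under a PCA projection, the shortlist is a reasonable proxy, but recall depends on how many components and candidates are kept.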

Published

2026-05-13

How to Cite

Ngoah, A., Jebessa, D. A., & Yan, Y. (2026). Bridging the Indexing Gap: Empirical Evaluation of Application-Side Optimisation Techniques for Native Vector Search in MySQL 9.0. European Journal of Applied Sciences, 14(03), 43–66. https://doi.org/10.14738/ejas.1403.6884