Skip to content

Bug Report: Spikes in resource usage due to inefficient routing of UNION queries #18288

@arthurschreiber

Description

@arthurschreiber

Overview of the Issue

We've been seeing spikes in memory usage on our vtgates, sometimes leading to OOM kills of our vtgate pods due to a very specific query pattern.

Basically, we're running large-ish UNION queries that look like:

SELECT id FROM music WHERE music.id = 1 AND music.user_id IN (1)
UNION
SELECT id FROM music WHERE music.id = 2 AND music.user_id IN (1)
UNION
SELECT id FROM music WHERE music.id = 3 AND music.user_id IN (1)
UNION
...

Note

user_id is the primary vindex (hash), id is a secondary vindex (lookup)

These have up to 100 subqueries UNIONed together.


There seem to be two distinct issues with how these queries are handled by Vitess.

First, the queries are not routed efficiently. The whole query is broken up, and the individual subqueries are all executed separately, even if they share the same target route. At most, this should execute as many queries as there are shards in the target keyspace.

Second, the queries are all executed in parallel, without a limit. With large unions like this, this means that (in our case) a hundred goroutines get started at the same time. When multiple of these queries are executed around the same time, this causes grave spikes in memory usage in our vtgate pods. It also causes the connection pool on vttablet pods to be quickly exhausted, causing downstream effects.

Reproduction Steps

(see above)

Binary Version

19 and later

Operating System and Environment details

N/A

Log Fragments

N/A

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions