-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Description
Overview of the Issue
We've been seeing spikes in memory usage on our vtgates, sometimes leading to OOM kills of our vtgate pods due to a very specific query pattern.
Basically, we're running large-ish UNION queries that look like:
SELECT id FROM music WHERE music.id = 1 AND music.user_id IN (1)
UNION
SELECT id FROM music WHERE music.id = 2 AND music.user_id IN (1)
UNION
SELECT id FROM music WHERE music.id = 3 AND music.user_id IN (1)
UNION
...Note
user_id is the primary vindex (hash), id is a secondary vindex (lookup)
These have up to 100 subqueries UNIONed together.
There seem to be two distinct issues with how these queries are handled by Vitess.
First, the queries are not routed efficiently. The whole query is broken up, and the individual subqueries are all executed separately, even if they share the same target route. At most, this should execute as many queries as there are shards in the target keyspace.
Second, the queries are all executed in parallel, without a limit. With large unions like this, this means that (in our case) a hundred goroutines get started at the same time. When multiple of these queries are executed around the same time, this causes grave spikes in memory usage in our vtgate pods. It also causes the connection pool on vttablet pods to be quickly exhausted, causing downstream effects.
Reproduction Steps
(see above)
Binary Version
19 and laterOperating System and Environment details
N/ALog Fragments
N/A