[PATCH] pg_restore_extended_stats() can store an MCV list that cannot be read back

Started by Ewan Young, 8 days ago, 3 messages, hackers
#1 Ewan Young
kdbase.hack@gmail.com

Hi,

pg_restore_extended_stats() does not bound the number of items in an
imported MCV list, but the read path rejects any list with more than
STATS_MCVLIST_MAX_ITEMS (= 10000) items. So an oversized list imports
successfully, gets written to pg_statistic_ext_data, and then makes
every read of that statistics object fail.

Reproduction (current master):

CREATE TABLE t (a int, b int);
INSERT INTO t SELECT g % 100, g % 50 FROM generate_series(1, 1000) g;
CREATE STATISTICS t_s (mcv) ON a, b FROM t;
ANALYZE t;

SELECT pg_restore_extended_stats(
    'schemaname', 'public', 'relname', 't',
    'statistics_schemaname', 'public', 'statistics_name', 't_s',
    'inherited', false,
    'most_common_vals', (SELECT array_agg(ARRAY[g::text, (g*7)::text])
                         FROM generate_series(1, 10001) g),
    'most_common_freqs', (SELECT array_agg((0.5/10001)::float8)
                          FROM generate_series(1, 10001) g),
    'most_common_base_freqs', (SELECT array_agg((0.5/10001)::float8)
                               FROM generate_series(1, 10001) g));
-- returns t

EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 7;
-- ERROR: invalid length (10001) item array in MCVList (XX000)

The statistics object is then unusable until cleared. With more than
65535 items, an assertion-enabled build crashes instead (the Assert in
mcv_get_match_bitmap()).

The cause is a write/read asymmetry: import_mcv()
(extended_stats_funcs.c) hands the input item count to
statext_mcv_import() unbounded, while statext_mcv_deserialize()
(mcv.c) rejects nitems > STATS_MCVLIST_MAX_ITEMS. This is the same
family as 6d6348f0329 (CVE-2026-6575) and 0b8fa5fd37b, both of which
note import_mcv() was not affected by their issue -- the item-count
bound is a separate, still-open gap.

The attached patch adds that bound to import_mcv(), rejecting an
oversized list with a WARNING before anything is stored. A regression
test is added to stats_import.sql; "make check" passes.
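
For readers without the patch at hand, the kind of write-side check
described above can be sketched as a standalone C model. This is not
the attached patch: mcv_import_count_ok() is a hypothetical helper,
fprintf() stands in for a WARNING-level report, the message text is
illustrative, and the limit is the value quoted earlier in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Read-side limit as quoted in this thread. */
#define STATS_MCVLIST_MAX_ITEMS 10000

/*
 * Hypothetical helper (not a PostgreSQL function): models a write-side
 * bound that mirrors the read-side one.  Returns false, after printing
 * an illustrative warning, when the imported list has more items than
 * the read path would accept, so the caller can skip storing it.
 */
static bool
mcv_import_count_ok(int nitems)
{
    assert(nitems >= 0);        /* a negative count is a caller bug */

    if (nitems > STATS_MCVLIST_MAX_ITEMS)
    {
        fprintf(stderr,
                "WARNING: MCV list has %d items, maximum is %d\n",
                nitems, STATS_MCVLIST_MAX_ITEMS);
        return false;
    }
    return true;
}
```

With this model, a list of exactly 10000 items is accepted and the
10001-item list from the reproduction is refused before being stored.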

Thanks,
Ewan Young

Attachments:

v1-0001-Reject-oversized-MCV-lists-in-extended-statistics.patch (application/octet-stream, +52 -1)

#2 Michael Paquier
michael@paquier.xyz
In reply to: Ewan Young (#1)
Re: [PATCH] pg_restore_extended_stats() can store an MCV list that cannot be read back

On Tue, Jun 16, 2026 at 11:15:46AM +0800, Ewan Young wrote:

> The statistics object is then unusable until cleared. With more than
> 65535 items, an assertion-enabled build crashes instead (the Assert in
> mcv_get_match_bitmap()).
>
> The cause is a write/read asymmetry: import_mcv()
> (extended_stats_funcs.c) hands the input item count to
> statext_mcv_import() unbounded, while statext_mcv_deserialize()
> (mcv.c) rejects nitems > STATS_MCVLIST_MAX_ITEMS. This is the same
> family as 6d6348f0329 (CVE-2026-6575) and 0b8fa5fd37b, both of which
> note import_mcv() was not affected by their issue -- the item-count
> bound is a separate, still-open gap.

Hmm. While I was re-reading statext_mcv_[de]serialize(), my first
thought was whether we'd risk an out-of-bounds read when the data is
loaded back, but I don't see such a pattern here. So while the problem
is the same as 6d6348f0329, the consequences are not alarming. The case
where we have more than 65k items is problematic because we can have a
wraparound calculation in statext_mcv_serialize() (close to the
"compute index within the deduplicated array"), meaning that we would
read buggy data, not point at an incorrect memory area.
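
To illustrate the kind of wraparound meant here, the following
standalone C sketch assumes the per-item index is stored in a 16-bit
unsigned slot. store_index16() and wraparound_examples() are
hypothetical names for illustration only, not functions from mcv.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: stores a position in a 16-bit slot, the way a
 * serialized index would be stored under the assumption above.  Only
 * the low 16 bits of the position survive the conversion.
 */
static uint16_t
store_index16(uint32_t position)
{
    return (uint16_t) position;
}

/* A few positions around the 16-bit boundary. */
static void
wraparound_examples(void)
{
    assert(store_index16(65535) == 65535);  /* last position that fits */
    assert(store_index16(65536) == 0);      /* wraps around to position 0 */
}
```

The wrapped value is still a valid position in the array, which is
why the result is wrong data being read rather than an access outside
the allocated memory.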

> The attached patch adds that bound to import_mcv(), rejecting an
> oversized list with a WARNING before anything is stored. A regression
> test is added to stats_import.sql; "make check" passes.

Sounds good to me. Thanks for the report.

Before someone asks, the extstats restore code has inherited this
pattern from the attribute restore code, where functions like
var_eq_const() don't care about the limit on the number of MCV
items, even with an attstattarget of MAX_STATISTICS_TARGET (10k), which
caps the number of MCVs on ANALYZE. So one could inject more items than
10k, but contrary to the extstats case they can be loaded back without
an error.
--
Michael

#3 Ewan Young
kdbase.hack@gmail.com
In reply to: Michael Paquier (#2)
Re: [PATCH] pg_restore_extended_stats() can store an MCV list that cannot be read back

On Tue, Jun 16, 2026 at 1:02 PM Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Jun 16, 2026 at 11:15:46AM +0800, Ewan Young wrote:
>
>> The statistics object is then unusable until cleared. With more than
>> 65535 items, an assertion-enabled build crashes instead (the Assert in
>> mcv_get_match_bitmap()).
>>
>> The cause is a write/read asymmetry: import_mcv()
>> (extended_stats_funcs.c) hands the input item count to
>> statext_mcv_import() unbounded, while statext_mcv_deserialize()
>> (mcv.c) rejects nitems > STATS_MCVLIST_MAX_ITEMS. This is the same
>> family as 6d6348f0329 (CVE-2026-6575) and 0b8fa5fd37b, both of which
>> note import_mcv() was not affected by their issue -- the item-count
>> bound is a separate, still-open gap.
>
> Hmm. While I was re-reading statext_mcv_[de]serialize(), my first
> thought was whether we'd risk an out-of-bounds read when the data is
> loaded back, but I don't see such a pattern here. So while the problem
> is the same as 6d6348f0329, the consequences are not alarming. The case
> where we have more than 65k items is problematic because we can have a
> wraparound calculation in statext_mcv_serialize() (close to the
> "compute index within the deduplicated array"), meaning that we would
> read buggy data, not point at an incorrect memory area.
>
>> The attached patch adds that bound to import_mcv(), rejecting an
>> oversized list with a WARNING before anything is stored. A regression
>> test is added to stats_import.sql; "make check" passes.
>
> Sounds good to me. Thanks for the report.

Thanks for the review!

> Before someone asks, the extstats restore code has inherited this
> pattern from the attribute restore code, where functions like
> var_eq_const() don't care about the limit on the number of MCV
> items, even with an attstattarget of MAX_STATISTICS_TARGET (10k), which
> caps the number of MCVs on ANALYZE. So one could inject more items than
> 10k, but contrary to the extstats case they can be loaded back without
> an error.
> --
> Michael

Regards,
Ewan