Do table-level CHECK constraints affect the query optimizer?

Started by Ronalmost 5 years ago10 messagesgeneral
Jump to latest
#1Ron
ronljohnsonjr@gmail.com

Postgresql 12.5

I've got a big (about 50M rows, but 1.4TB because of xml attachments)
partitioned table full of data that we're seeing sequential scans on, even
though there are supporting indices. Will adding CHECK constraints on the
children, which match the partition ranges influence the query optimizer?

(We'd try this on the test system -- which has just a few weeks less data
than prod -- but queries on it use the proper indices, and thus perform as
expected.  Both "test" and "prod" have identical IO systems, and
postgresql.conf files.)

Partitions: request_xml_p2015_07 FOR VALUES FROM ('2015-07-01 00:00:00') TO
('2015-08-01 00:00:00'),
            request_xml_p2015_08 FOR VALUES FROM ('2015-08-01 00:00:00') TO
('2015-09-01 00:00:00'),
            request_xml_p2015_09 FOR VALUES FROM ('2015-09-01 00:00:00') TO
('2015-10-01 00:00:00'),
            request_xml_p2015_10 FOR VALUES FROM ('2015-10-01 00:00:00') TO
('2015-11-01 00:00:00'),
[snip]
            request_xml_p2021_06 FOR VALUES FROM ('2021-06-01 00:00:00') TO
('2021-07-01 00:00:00'),
            request_xml_p2021_07 FOR VALUES FROM ('2021-07-01 00:00:00') TO
('2021-08-01 00:00:00'),
            request_xml_p2021_08 FOR VALUES FROM ('2021-08-01 00:00:00') TO
('2021-09-01 00:00:00')

--
Angular momentum makes the world go 'round.

#2Michael Lewis
mlewis@entrata.com
In reply to: Ron (#1)
Re: Do table-level CHECK constraints affect the query optimizer?

Are vacuum and analyze happening regularly on the live system? What's an
example query that uses indexes on test and does not on live? Does the live
system show poor estimates when executing 'explain analyze select...' and
the test system show semi-accurate row estimates?

50 million seems to be a fairly low row count to be partitioned. What
version is this on?

#3Ron
ronljohnsonjr@gmail.com
In reply to: Michael Lewis (#2)
Re: Do table-level CHECK constraints affect the query optimizer?

On 6/29/21 10:41 AM, Michael Lewis wrote:

Are vacuum and analyze happening regularly on the live system?

Yes.  There's a nightly cron job which vacuums those it thinks needs it
(though it's INSERT-heavy), and ditto ANALYZE.

Specifically, I ran ANALYZE on the prod table just before running the query.

Also, the sampling rate on all tables in both prod and test is 60000 rows.

What's an example query that uses indexes on test and does not on live?

SELECT COUNT(*) FROM sep_info_report_extract;

On prod, there's a list of "Parallel Seq Scan on xxxx_partname" records in
the EXPLAIN output, while the test system has a list of "Parallel Index Only
Scan using ..._idx" records.

(Yes, this is a simple COUNT(*) but it's *a simple reproducer*, which
*demonstrates* the kind of problems we're having on *much more complex
queries*.)

Does the live system show poor estimates when executing 'explain analyze
select...' and the test system show semi-accurate row estimates?

They're within 2% of each other.

50 million seems to be a fairly low row count to be partitioned. What
version is this on?

As explained in the OP, each record has a (sometimes large) XML record;
months can have up to 240GB.  Besides, partitioning makes for easy archiving.

--
Angular momentum makes the world go 'round.

#4Michael Lewis
mlewis@entrata.com
In reply to: Ron (#3)
Re: Do table-level CHECK constraints affect the query optimizer?

Other than rows being frozen on test (and not on live), I'm not aware of
anything that would push the planner to choose to do an index scan on an
entire table. Maybe someone else will chime in. Or, if you try running
vacuum freeze on live and can verify if that changes the result.

I'm not sure why sampling rate would matter at all if you are reading the
entire set of data.

What version?

Show quoted text
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ron (#3)
Re: Do table-level CHECK constraints affect the query optimizer?

Ron <ronljohnsonjr@gmail.com> writes:

On 6/29/21 10:41 AM, Michael Lewis wrote:

What's an example query that uses indexes on test and does not on live?

SELECT COUNT(*) FROM sep_info_report_extract;

On prod, there's a list of "Parallel Seq Scan on xxxx_partname" records in
the EXPLAIN output, while the test system has a list of "Parallel Index Only
Scan using ..._idx" records.

It'd be worth checking pg_class.relallvisible page counts for the
partitions on both systems. If an IOS is possible, the main thing
that might push the planner to do a seqscan instead is if it thinks
that too little of the table is all-visible, which would tend to
inflate the index-only scan towards the same cost as a regular index
scan (which'll almost always be considered slower than seqscan).

If there's a significant difference in relallvisible fractions, that
would point to something different in your VACUUM housekeeping on
the two systems.

regards, tom lane

#6Ron
ronljohnsonjr@gmail.com
In reply to: Tom Lane (#5)
Re: Do table-level CHECK constraints affect the query optimizer?

On 6/29/21 11:42 AM, Tom Lane wrote:

Ron <ronljohnsonjr@gmail.com> writes:

On 6/29/21 10:41 AM, Michael Lewis wrote:

What's an example query that uses indexes on test and does not on live?

SELECT COUNT(*) FROM sep_info_report_extract;
On prod, there's a list of "Parallel Seq Scan on xxxx_partname" records in
the EXPLAIN output, while the test system has a list of "Parallel Index Only
Scan using ..._idx" records.

It'd be worth checking pg_class.relallvisible page counts for the
partitions on both systems.

Lots of 0 records in prod, and lots of "numbers" in test.

If an IOS is possible, the main thing
that might push the planner to do a seqscan instead is if it thinks
that too little of the table is all-visible, which would tend to
inflate the index-only scan towards the same cost as a regular index
scan (which'll almost always be considered slower than seqscan).

If there's a significant difference in relallvisible fractions, that
would point to something different in your VACUUM housekeeping on
the two systems.

Prod is brand new.  Loaded on Saturday; we saw this problem on Sunday during
pre-acceptance.  Thus, while running ANALYZE was top of the list of Things
To Do, running VACUUM was low.

Is that a mistaken belief?

--
Angular momentum makes the world go 'round.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ron (#6)
Re: Do table-level CHECK constraints affect the query optimizer?

Ron <ronljohnsonjr@gmail.com> writes:

On 6/29/21 11:42 AM, Tom Lane wrote:

If there's a significant difference in relallvisible fractions, that
would point to something different in your VACUUM housekeeping on
the two systems.

Prod is brand new.  Loaded on Saturday; we saw this problem on Sunday during
pre-acceptance.  Thus, while running ANALYZE was top of the list of Things
To Do, running VACUUM was low.

Is that a mistaken belief?

ANALYZE won't update relallvisible AFAIR, while VACUUM will.
So if you are depending on lots of IOS, you need a round of
vacuuming.

regards, tom lane

#8Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Ron (#6)
Re: Do table-level CHECK constraints affect the query optimizer?

On Jun 29, 2021, at 10:33 AM, Ron <ronljohnsonjr@gmail.com> wrote:

Prod is brand new. Loaded on Saturday; we saw this problem on Sunday during pre-acceptance. Thus, while running ANALYZE was top of the list of Things To Do, running VACUUM was low.

Is that a mistaken belief?

You might want to run VACUUM FREEZE and then retry your test query using EXPLAIN. See if it switches to an index only scan after the VACUUM FREEZE.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Ron
ronljohnsonjr@gmail.com
In reply to: Tom Lane (#7)
Re: Do table-level CHECK constraints affect the query optimizer?

On 6/29/21 12:46 PM, Tom Lane wrote:

Ron <ronljohnsonjr@gmail.com> writes:

On 6/29/21 11:42 AM, Tom Lane wrote:

If there's a significant difference in relallvisible fractions, that
would point to something different in your VACUUM housekeeping on
the two systems.

Prod is brand new.  Loaded on Saturday; we saw this problem on Sunday during
pre-acceptance.  Thus, while running ANALYZE was top of the list of Things
To Do, running VACUUM was low.
Is that a mistaken belief?

ANALYZE won't update relallvisible AFAIR, while VACUUM will.
So if you are depending on lots of IOS, you need a round of
vacuuming.

What's an IOS?

--
Angular momentum makes the world go 'round.

#10Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Ron (#9)
Re: Do table-level CHECK constraints affect the query optimizer?

On Jun 29, 2021, at 11:02 AM, Ron <ronljohnsonjr@gmail.com> wrote:

What's an IOS?

An Index Only Scan. See https://www.postgresql.org/docs/14/indexes-index-only-scans.html

Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company