Helping planner to choose sequential scan when it improves performance

Started by Patrick O'Toole · almost 3 years ago · 6 messages · general
#1 Patrick O'Toole
patrick.otoole@sturdy.ai

Hi all,

I recently started at a new firm and have been trying to understand
certain planner behavior. A stripped-down example of the sort of join we do
in the database looks like this, wherein we join two tables that each have
about 1 million rows:

-- VACUUM (FULL, VERBOSE, ANALYZE), run the query twice first, then...
EXPLAIN (ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, SUMMARY)
SELECT
    ci.conversation_uuid,
    ci.item_uuid,
    ci.tenant_id,
    it.item_subject,
    it.item_body_start
FROM
    conversation_item AS ci
    INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;

-- The necessary DDL that creates these tables and indexes is attached.
I've commented out some extra stuff that isn't directly related to the
above query.

Depending on config, we get different results in terms of performance
(EXPLAIN output attached):

PLAN A (default config, effective cache size just shy of 15GB): 3.829
seconds. A nested loop is used to probe the hash index
`conversation_item_item_hash_index` for each row of item_text. Although the
cost of probing once is low, a fair amount of time passes because the
operation is repeated ~1.3 million times.

PLAN B (enable_indexscan off, effective cache same as before): 3.254
seconds (~15% speedup, sometimes 30%). Both tables are scanned sequentially
and conversation_item is hashed before results are combined with a hash
join.

PLAN C (random_page_cost = 8.0 instead of the default 4.0, effective cache
same as before): 2.959 seconds (~23% speedup, sometimes 38%). Same overall
plan as PLAN B, with some differences in buffers and I/O. I'll note we had to
go to 8.0 before we saw a change in planner behavior; 5.0, 6.0, and 7.0 were
too low to make a difference.

Environment:

Postgres 15.2
Amazon RDS — db.m6g.2xlarge

Questions:

1. In Plan A, what factors are causing the planner to select a
substantially slower plan despite having recent stats about number of rows?
2. Is there a substantial difference between the on-the-fly hash done in
Plan B and Plan C compared to the hash-index used in Plan A? Can I assume
they are essentially the same? Perhaps there are differences in how
they're applied?
3. Is it common to see values for random_page_cost set as high as 8.0?
We would of course need to investigate whether we see a net positive or net
negative impact on other queries, to adopt this as a general setting, but
is it a proposal we should actually consider?
4. Maybe we are barking up the wrong tree with the previous questions.
Are there other configuration parameters we should consider first to
improve performance in situations like the one illustrated?
5. Are there other problems with our schema, query, or plans shown here?
Other approaches (or tools/analyses) we should consider?

Attachments:

PLANC.txt (text/plain; charset=US-ASCII)
PLANB.txt (text/plain; charset=US-ASCII)
PLANA.txt (text/plain; charset=US-ASCII)
create_statements.postgre.sql (application/octet-stream)
#2 Ruslan Zakirov
ruslan.zakirov@gmail.com
In reply to: Patrick O'Toole (#1)
Re: Helping planner to choose sequential scan when it improves performance

On Tue, Jun 13, 2023 at 10:28 PM Patrick O'Toole <patrick.otoole@sturdy.ai>
wrote:

Hi all,

Questions:

1. In Plan A, what factors are causing the planner to select a
substantially slower plan despite having recent stats about number of rows?

Estimated overall cost. For Plan A it is ~200k. For Plans B/C (I haven't
noticed any differences between these two) it is ~250k. The planner uses the
less expensive plan.

Also, in the plans you can see that Pg estimates the number of rows
correctly.

2. Is there a substantial difference between the on-the-fly hash done
in Plan B and Plan C compared to the hash-index used in Plan A? Can I
assume they are essentially the same? Perhaps there are differences
in how they're applied?

I don't see any difference in plans B and C, but you report timing
changes. To me this looks like just a fluctuation in measurements, so I
wouldn't trust any measurements for plan A either.

I'm not a big expert, but I would not say that plans A and B are essentially
the same.

Plan A: DB scans item_text table and for every record looks into the index
of conversation_item table, then looks into the table itself.

Plan B/C: DB scans conversation_item table without looking into its indexes
building a hash table on the fly.

3. Is it common to see values for random_page_cost set as high as 8.0?
We would of course need to investigate whether we see a net positive or net
negative impact on other queries, to adopt this as a general setting, but
is it a proposal we should actually consider?

No idea.

4. Maybe we are barking up the wrong tree with the previous questions.
Are there other configuration parameters we should consider first to
improve performance in situations like the one illustrated?

Recheck your numbers.

5. Are there other problems with our schema, query, or plans shown
here? Other approaches (or tools/analyses) we should consider?

You can try the following index:

CREATE INDEX conversation_item_ruz1
    ON conversation_item (item_uuid, conversation_uuid, tenant_id);

I believe this index would allow Pg to use an "index only scan" as a
variation of Plan A and avoid touching the conversation_item table completely.
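A sketch of how to verify this suggestion (assuming the tables from the
attached DDL); note that an index-only scan also depends on a reasonably
fresh visibility map, so a VACUUM helps:

```sql
-- The covering index suggested above
CREATE INDEX conversation_item_ruz1
    ON conversation_item (item_uuid, conversation_uuid, tenant_id);

-- Index-only scans rely on the visibility map; vacuum first so
-- "Heap Fetches" in the plan stays near zero.
VACUUM (ANALYZE) conversation_item;

-- Expect "Index Only Scan using conversation_item_ruz1" in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ci.conversation_uuid, ci.item_uuid, ci.tenant_id,
       it.item_subject, it.item_body_start
FROM conversation_item AS ci
INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;
```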

--
Best regards, Ruslan.

#3 Patrick O'Toole
patrick.otoole@sturdy.ai
In reply to: Patrick O'Toole (#1)
Fwd: Helping planner to choose sequential scan when it improves performance

Hello! I tried asking this over on the general listserv before realizing
pgsql-performance is probably better suited.

Hi all,

I recently started at a new firm and have been trying to understand
certain planner behavior. A stripped-down example of the sort of join we do
in the database looks like this, wherein we join two tables that each have
about 1 million rows:

-- VACUUM (FULL, VERBOSE, ANALYZE), run the query twice first, then...
EXPLAIN (ANALYZE, VERBOSE, COSTS, SETTINGS, BUFFERS, WAL, TIMING, SUMMARY)
SELECT
    ci.conversation_uuid,
    ci.item_uuid,
    ci.tenant_id,
    it.item_subject,
    it.item_body_start
FROM
    conversation_item AS ci
    INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;

-- The necessary DDL that creates these tables and indexes is attached.
I've commented out some extra stuff that isn't directly related to the
above query.

Depending on config, we get different results in terms of performance
(EXPLAIN output attached):

PLAN A (default config, effective cache size just shy of 15GB): 3.829
seconds. A nested loop is used to probe the hash index
`conversation_item_item_hash_index` for each row of item_text. Although the
cost of probing once is low, a fair amount of time passes because the
operation is repeated ~1.3 million times.

PLAN B (enable_indexscan off, effective cache same as before): 3.254
seconds (~15% speedup, sometimes 30%). Both tables are scanned sequentially
and conversation_item is hashed before results are combined with a hash
join.

PLAN C (random_page_cost = 8.0 instead of the default 4.0, effective cache
same as before): 2.959 seconds (~23% speedup, sometimes 38%). Same overall
plan as PLAN B, with some differences in buffers and I/O. I'll note we had to
go to 8.0 before we saw a change in planner behavior; 5.0, 6.0, and 7.0 were
too low to make a difference.

Environment:

Postgres 15.2
Amazon RDS — db.m6g.2xlarge

Questions:

1. In Plan A, what factors (read: which GUC settings) are causing the
planner to select a substantially slower plan despite having recent stats
about number of rows?
2. Is there a substantial difference between the on-the-fly hash done in
Plan B and Plan C compared to the hash-index used in Plan A? Can I assume
they are essentially the same? Perhaps there are differences in how
they're applied?
3. Is it common to see values for random_page_cost set as high as 8.0?
We would of course need to investigate whether we see a net positive or net
negative impact on other queries, to adopt this as a general setting, but
is it a proposal we should actually consider?
4. Maybe we are barking up the wrong tree with the previous questions.
Are there other configuration parameters we should consider first to
improve performance in situations like the one illustrated?
5. Are there other problems with our schema, query, or plans shown here?
Other approaches (or tools/analyses) we should consider?

Attached also are the results of

SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');

- Patrick

Attachments:

PLANC.txt (text/plain; charset=US-ASCII)
PLANB.txt (text/plain; charset=US-ASCII)
PLANA.txt (text/plain; charset=US-ASCII)
create_statements.postgre.sql (application/sql)
non-default_server_config_values.csv (text/csv; charset=US-ASCII)
#4 Jeff Janes
jeff.janes@gmail.com
In reply to: Patrick O'Toole (#1)
Re: Helping planner to choose sequential scan when it improves performance

On Tue, Jun 13, 2023 at 3:28 PM Patrick O'Toole <patrick.otoole@sturdy.ai>
wrote:

run the query twice first, then...

Is that a realistic way to run the test? Often forcing all the data needed
for the query into memory is going to make things less realistic, not more
realistic. Assuming the system has more stuff to do than just perform this
one query, it might be unusual for the query to find everything it needs in
memory. Also, if you really do want to do it this way, then you should do
this for every plan. Different plans need to access different
collections of buffers, so prewarming just one plan will privilege that one
over the others.
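If prewarming is wanted at all, it should at least be applied evenly across
the plans being compared. A sketch using the pg_prewarm extension (available
on RDS; relation names taken from the original post):

```sql
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load both heaps and the hash index into shared_buffers before
-- timing each candidate plan, so no plan inherits leftover cache state.
SELECT pg_prewarm('conversation_item');
SELECT pg_prewarm('item_text');
SELECT pg_prewarm('conversation_item_item_hash_index');
```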

PLAN A (default config, effective cache size just shy of 15GB): 3.829
seconds. A nested loop is used to probe the hash index
`conversation_item_item_hash_index` for each row of item_text. Although the
cost of probing once is low, a fair amount of time passes because the
operation is repeated ~1.3 million times.

PLAN B (enable_indexscan off, effective cache same as before): 3.254
seconds (~15% speedup, sometimes 30%). Both tables are scanned sequentially
and conversation_item is hashed before results are combined with a hash
join.

PLAN C: (random_page_cost = 8.0, instead of default 4, effective cache
same as before): 2.959 (~23% speedup, sometimes 38%). Same overall plan as
PLAN B, some differences in buffers and I/O. I'll note we had to get to 8.0
before we saw a change to planner behavior; 5.0, 6.0, and 7.0 were too low
to make a difference.

The difference between B and C looks like it is entirely noise, having to
do with how many buffers it found already in the cache and how many of them
needed cleaning (which causes the buffer to be dirty as the cleaned version
now needs to be written to disk) and how many dirty buffers it found that
needed to be written in order to make way to read other buffers it needs.
(This last number most generally reflects dirty buffers left around by
other things which this query encountered, not the buffers the query itself
dirtied). None of this is likely to be reproducible, and so not worth
investigating.

And the difference between A and BC is small enough that it is unlikely to
be worth pursuing, either, even if it is reproducible. If your app runs
this one exact query often enough that a 30% difference is worth worrying
about, you would probably be better served by questioning the business
case. What are you doing with 1.4 million rows once you do fetch them,
that it needs to be repeated so often?

If you think that taking a deep dive into this one query is going to
deliver knowledge which will pay off for other (so far unexamined) queries,
I suspect you are wrong. Look for queries where the misestimation is more
stark than 30% to serve as your case studies.

Environment:

Postgres 15.2
Amazon RDS — db.m6g.2xlarge

Questions:

In Plan A, what factors are causing the planner to select a substantially
slower plan despite having recent stats about number of rows?

Even if it were worth trying to answer this (which I think it is not),
there isn't much we can do with dummy tables containing no data. You would
need to include a script to generate data of a size and distribution which
reproduces the given behavior.

Is there a substantial difference between the on-the-fly hash done in

Plan B and Plan C compared to the hash-index used in Plan A? Can I assume
they are essentially the same? Perhaps there are differences in how
they're applied?

They are pretty much entirely different. One jumps all over the index on
disk, the other reads the table sequentially and (due to work_mem) parcels
it out into chunks where it expects each chunk can also be read back in
sequentially as well. About the only thing not different is that they both
involve computing a hash function.
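The work_mem-driven parceling described here is visible directly in EXPLAIN
output; a sketch (the work_mem values are illustrative, not recommendations):

```sql
-- With a small work_mem the Hash node splits its build side into
-- multiple batches that spill to disk and are read back sequentially.
SET work_mem = '4MB';

EXPLAIN (ANALYZE)
SELECT ci.conversation_uuid, it.item_subject
FROM conversation_item AS ci
INNER JOIN item_text AS it ON it.item_uuid = ci.item_uuid;
-- Look for a line like:  Hash (...)  Buckets: ...  Batches: 16 ...

-- With enough work_mem the whole hash table fits in memory (Batches: 1).
SET work_mem = '256MB';
```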

Is it common to see values for random_page_cost set as high as 8.0? We

would of course need to investigate whether we see a net positive or net
negative impact on other queries, to adopt this as a general setting, but
is it a proposal we should actually consider?

I've never needed to set it that high, but there is no a priori reason it
wouldn't make sense to do so. A setting that high would probably only be
suitable for HDD (rather than SSD) storage and when caching is not very
effective, which does seem to be the opposite of your situation. So I
certainly wouldn't do it just based on the evidence at hand.

Cheers,

Jeff

#5 David Rowley
dgrowleyml@gmail.com
In reply to: Patrick O'Toole (#1)
Re: Helping planner to choose sequential scan when it improves performance

On Wed, 14 Jun 2023 at 07:28, Patrick O'Toole <patrick.otoole@sturdy.ai> wrote:

Maybe we are barking up the wrong tree with the previous questions. Are there other configuration parameters we should consider first to improve performance in situations like the one illustrated?

random_page_cost and effective_cache_size are the main settings which
will influence plan A vs plan B. Larger values of
effective_cache_size will have the planner apply more seq_page_costs
to the index scan. Lower values of effective_cache_size will mean
more pages will be assumed to cost random_page_cost.
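Both knobs can be experimented with per-session before touching the server
configuration; the values below are illustrative only:

```sql
SHOW effective_cache_size;
SHOW random_page_cost;

-- A larger effective_cache_size makes the planner assume more index
-- pages will already be in cache, cheapening index scans; a higher
-- random_page_cost pushes in the opposite direction.
SET effective_cache_size = '15GB';
SET random_page_cost = 4.0;

-- Re-run EXPLAIN on the join to see which plan the planner now picks.
```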

David

#6 Jeff Janes
jeff.janes@gmail.com
In reply to: David Rowley (#5)
Re: Helping planner to choose sequential scan when it improves performance

On Sun, Jun 25, 2023 at 3:48 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 14 Jun 2023 at 07:28, Patrick O'Toole <patrick.otoole@sturdy.ai>
wrote:

Maybe we are barking up the wrong tree with the previous questions. Are
there other configuration parameters we should consider first to improve
performance in situations like the one illustrated?

random_page_cost and effective_cache_size are the main settings which
will influence plan A vs plan B. Larger values of
effective_cache_size will have the planner apply more seq_page_costs
to the index scan.

Squeezing otherwise-random page costs towards seq_page_costs is what bitmap
scans do, and what large index scans with high pg_stats.correlation do.
But effective_cache_size does something else, it squeezes the per page
costs towards zero, not towards seq_page_costs. This is surely not
accurate, as the costs of locking the buffer mapping partition, finding the
buffer or reading it from the kernel cache if not found, maybe faulting the
buffer from main memory into on-CPU memory, pinning the buffer, and
read-locking it are certainly well above zero, even if not nearly as high
as seq_page_cost. I'd guess they are truly about 2 to 5 times a
cpu_tuple_cost per buffer. But zero is what they currently get; there is
no knob to twist to change that.

Lower values of effective_cache_size will mean
more pages will be assumed to cost random_page_cost.

Sure, but it addresses the issue only obliquely (as does raising
random_page_cost) not directly. So the change you need to make to them
will be large, and will likely make other things worse.

Cheers,

Jeff