No access to TOAST tables shown in EXPLAIN ( ANALYZE, BUFFERS )
Hi,
I've found that EXPLAIN (ANALYZE,BUFFERS) does not show any access to the TOAST tables when a toasted column is only SELECTed and not used in any way in the query.
pg_statio_all_tables only shows access to the heap.
I imagine that some optimization does not bother to go into the TOAST table, as its content is neither printed neither used.
As a consequence, EXPLAIN plans can show huge differences against the actual query results, in buffers count and query time. As EXPLAIN is supposed to "display the true row counts and true run time accumulated within each plan node", I think there may be a problem.
I didn't see any related caveats in https://www.postgresql.org/docs/14/using-explain.html#USING-EXPLAIN-CAVEATS
For example, a simple "SELECT" from my one-column table takes ~ 20 ms and reads 9000 buffers, but the EXPLAIN ANALYZE is much much quicker:
QUERY PLAN
------------------------------------------------------------------------
Seq Scan on public.noises (actual time=0.025..0.393 rows=3000 loops=1)
Output: noise
Buffers: shared hit=20
Query Identifier: 2512658633781838584
Planning Time: 0.030 ms
Execution Time: 0.689 ms
If I force the use of the result in the query (concatenation, length…), I see the 9000 missing hits again in the EXPLAIN plan.
This seems to be this way since 9.1 at least.
I'm wondering if this can be changed, or at least documented.
SQL script for a full demo is joined. The main parts of the output are below (I hope this is readable):
"8 MB test table"
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+--------+-------+----------+-------------+---------------+---------+-------------
public | noises | table | postgres | unlogged | heap | 8384 kB |
"** Reset stats **"
"**"
"** Query : SELECT noise FROM noises : ~ about 10-20 ms **"
"**"
Time: 16.350 ms
relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
---------+------------+------------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
4697174 | public | noises | 0 | 20 | | | 0 | 3000 | 0 | 6001
4697177 | pg_toast | pg_toast_4697174 | 0 | 3000 | 0 | 6001 | | | |
"** Reset stats **"
"**"
"** Same query + EXPLAIN ANALYZE : < 1 ms , 20 hits **"
"**"
QUERY PLAN
------------------------------------------------------------------------
Seq Scan on public.noises (actual time=0.006..0.199 rows=3000 loops=1)
Output: noise
Buffers: shared hit=20
Query Identifier: 1313741123942831781
Planning Time: 0.028 ms
Execution Time: 0.334 ms
pg_sleep
----------
relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
---------+------------+------------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
4697174 | public | noises | 0 | 20 | | | 0 | 0 | 0 | 0
4697177 | pg_toast | pg_toast_4697174 | 0 | 0 | 0 | 0 | | | |
"** Reset stats **"
"**"
"** Force use of toast data + EXPLAIN ANALYZE : 10-20 ms, 9020 hits **"
"**"
QUERY PLAN
------------------------------------------------------------------------
Seq Scan on public.noises (actual time=0.023..9.679 rows=3000 loops=1)
Output: (noise || ''::text)
Buffers: shared hit=9020
Query Identifier: -6122032322937607654
Planning Time: 0.026 ms
Execution Time: 9.843 ms
relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
---------+------------+------------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
4697174 | public | noises | 0 | 20 | | | 0 | 3000 | 0 | 6000
4697177 | pg_toast | pg_toast_4697174 | 0 | 3000 | 0 | 6000 | | | |
--
Christophe Courtois
Consultant Dalibo
https://dalibo.com/
Attachments:
explain_court-circuite_toasts.sql.txttext/plain; charset=UTF-8; name=explain_court-circuite_toasts.sql.txtDownload
Christophe Courtois <christophe.courtois@dalibo.com> writes:
I've found that EXPLAIN (ANALYZE,BUFFERS) does not show any access to the TOAST tables when a toasted column is only SELECTed and not used in any way in the query.
This is probably because of the documented, long-standing behavior that
EXPLAIN does not convert the query's output rows to text, nor send them
to the client. If the datatype output functions aren't called, the
output datums won't get detoasted either.
I don't see anything here we want to change. The argument for not
bothering to model output costs has always been that they'd be the
same for every possible query plan, and I think that that applies to
detoasting costs as much as the actual output conversions. Moreover,
if we're not sending data to the client, the skipped network I/O could
easily represent a larger cost than anything else --- but there's no
reasonable way to account for that.
If you do care about those costs, a possible answer is to run
auto_explain, allowing you to capture data behind-the-scenes for
queries that really are sending data to clients.
regards, tom lane
Hi,
(redirecting to pgsql-docs, original discussion:
/messages/by-id/2586583.1643295137@sss.pgh.pa.us )
Le 27/01/2022 à 15:52, Tom Lane a écrit :
Christophe Courtois <christophe.courtois@dalibo.com> writes:
I've found that EXPLAIN (ANALYZE,BUFFERS) does not show any access to the TOAST tables when a toasted column is only SELECTed and not used in any way in the query.
This is probably because of the documented, long-standing behavior that
EXPLAIN does not convert the query's output rows to text, nor send them
to the client. If the datatype output functions aren't called, the
output datums won't get detoasted either.
So I suggest to add this case to the caveats
https://www.postgresql.org/docs/14/using-explain.html#USING-EXPLAIN-CAVEATS
"First, since no output rows are delivered to the client,
network transmission costs, I/O conversion costs
**(as TOAST tables access)**, are not included."
I don't see anything here we want to change. The argument for not
bothering to model output costs has always been that they'd be the
same for every possible query plan, and I think that that applies to
detoasting costs as much as the actual output conversions. Moreover,
if we're not sending data to the client, the skipped network I/O could
easily represent a larger cost than anything else --- but there's no
reasonable way to account for that.
OK.
If you do care about those costs, a possible answer is to run
auto_explain, allowing you to capture data behind-the-scenes for
queries that really are sending data to clients.
I'm afraid that auto_explain is not better in this case :-\
In the following example, auto_explain only shows the 30000 blocks
of the toast table when forced to use it,
although the whole table was dumped to a file through \o.
"*** SELECT without TOAST : short ; only heap table is read"
LOG: duration: 2.739 ms plan:
Query Text: SELECT from noises ;
Seq Scan on public.noises (cost=0.00..106.40 rows=10000 width=0) (actual time=0.006..1.668 rows=10000 loops=1)
Buffers: shared hit=64
-rw-rw-r-- 1 christ christ 19 févr. 4 14:33 /tmp/noise.txt
"*** SELECT including TOAST : long (the file is generated) but autoexplain shows only the heap!"
LOG: duration: 183.165 ms plan:
Query Text: SELECT noise from noises ;
Seq Scan on public.noises (cost=0.00..106.40 rows=10000 width=32) (actual time=0.005..1.311 rows=10000 loops=1)
Output: noise
Buffers: shared hit=64
-rw-rw-r-- 1 christ christ 24677211 févr. 4 14:33 /tmp/noise.txt
"*** SELECT with manipulated TOAST : same file and autoexplain shows all the hits"
LOG: duration: 198.416 ms plan:
Query Text: SELECT noise||'' from noises ;
Seq Scan on public.noises (cost=0.00..131.40 rows=10000 width=32) (actual time=0.034..51.882 rows=10000 loops=1)
Output: (noise || ''::text)
Buffers: shared hit=30064
-rw-rw-r-- 1 christ christ 24677211 févr. 4 14:33 /tmp/noise.txt
--
Christophe Courtois
Consultant Dalibo
https://dalibo.com/