pg_dump from v13 is slow
I ran into this while running pg_upgrade from beta2 to beta3:
$ time sudo -u postgres sh -c 'cd /var/lib/pgsql; /usr/pgsql-13/bin/pg_upgrade -b /usr/pgsql-13b2/bin/ -d ./13b2/data -D ./13/data --link'
real 94m18.335s
This instances has many table partitions, and the production instance uses
tablespaces. Some of our tables are wide. This VM is not idle, but does not
account for being 20x slower.
pg_dump -v --section=pre-data ts |wc
1846659 4697507 59575253
real 39m8.524s
Compare v12 and v13:
|$ /usr/pgsql-12/bin/initdb -D 12
|$ /usr/pgsql-12/bin/postgres -D 12 -c shared_buffers=256MB -c max_locks_per_transaction=128 -c port=5678 -c unix_socket_directories=/tmp&
|$ psql -h /tmp -p 5678 postgres </srv/cdrperfbackup/ts/2020-08-10/pg_dumpall-g
|$ time pg_restore /srv/cdrperfbackup/ts/2020-08-10/pg_dump-section\=pre-data -d postgres -h /tmp -p 5678 --no-tablespaces # --clean --if-exist
| real 4m56.627s
|$ time pg_dump --section=pre-data postgres -h /tmp -p 5678 |wc
| 1823612 4504584 58379810
| real 1m4.452s
|/usr/pgsql-13/bin/initdb -D 13
|/usr/pgsql-13/bin/postgres -D 13 -c shared_buffers=256MB -c max_locks_per_transaction=128 -c port=5678 -c unix_socket_directories=/tmp&
|psql -h /tmp -p 5678 postgres </srv/cdrperfbackup/ts/2020-08-10/pg_dumpall-g
|time pg_restore /srv/cdrperfbackup/ts/2020-08-10/pg_dump-section\=pre-data -d postgres -h /tmp -p 5678 --no-tablespaces # --clean --if-exist
| real 6m49.964s
|$ time pg_dump --section=pre-data postgres -h /tmp -p 5678 |wc
| 1823612 4504584 58379813
| real 19m42.918s
I'm trying to narrow this down, but I'd be very happy for suggestions.
--
Justin
On 2020-Aug-13, Justin Pryzby wrote:
I'm trying to narrow this down, but I'd be very happy for suggestions.
Maybe you can time "pg_dump --binary-upgrade" to see if the slowness
comes from there.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2020-08-13 17:48:23 -0500, Justin Pryzby wrote:
I'm trying to narrow this down, but I'd be very happy for suggestions.
Would be worth knowing how much of the time pgbench is 100% CPU
utilized, and how much of the time it is basically waiting for server
side queries and largely idle.
If it's close to 100% busy for a significant part of that time, it'd be
useful to get a perf profile. If it's largely queries to the server that
are the issue, logging those would be relevant.
Greetings,
Andres Freund
I can reproduce the issue with generated data:
pryzbyj=# SELECT format('create table t%s(i int)', i) FROM generate_series(1,9999)i;
\set ECHO errors
\set QUIET
\gexec
$ time pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5613 |wc
110015 240049 1577087
real 0m50.445s
$ time pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5612 |wc
110015 240049 1577084
real 0m11.203s
On Thu, Aug 13, 2020 at 04:30:14PM -0700, Andres Freund wrote:
Would be worth knowing how much of the time pgbench is 100% CPU
utilized, and how much of the time it is basically waiting for server
side queries and largely idle.
Good question - I guess you mean pg_dump.
$ command time -v pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5612 |wc
Command being timed: "pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5612"
User time (seconds): 0.65
System time (seconds): 0.52
Percent of CPU this job got: 9%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.85
$ command time -v pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5613 |wc
Command being timed: "pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5613"
User time (seconds): 0.79
System time (seconds): 0.49
Percent of CPU this job got: 2%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:48.51
So v13 was 4.5x slower and it seems to be all on the server side.
I looked queries like this:
time strace -ts999 -e sendto pg_dump --section=pre-data -d pryzbyj -h /tmp -p 5613 2>strace-13-3 |wc
cut -c1-66 strace-13-3 |sort |uniq |less
Most of the time is spent on these three queries:
|12:58:11 sendto(3, "Q\0\0\3\215SELECT\na.attnum,\na.attname,\na.atttypmod,\na.attstattarget,\na.attstorage,\nt.typstorage,\na.attn
|...
|12:58:30 sendto(3, "Q\0\0\3\215SELECT\na.attnum,\na.attname,\na.atttypmod,\na.attstattarget,\na.attstorage,\nt.typstorage,\na.attn
|12:58:32 sendto(3, "Q\0\0\1\314SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN pol.polroles = '{0}' TH
|...
|12:58:47 sendto(3, "Q\0\0\1\314SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN pol.polroles = '{0}' TH
|12:58:49 sendto(3, "Q\0\0\0\213SELECT pr.tableoid, pr.oid, p.pubname FROM pg_publication_rel pr, pg_publication p WHERE pr.prrelid
|...
|12:59:01 sendto(3, "Q\0\0\0\213SELECT pr.tableoid, pr.oid, p.pubname FROM pg_publication_rel pr, pg_publication p WHERE pr.prrelid
Compare with v12:
|12:57:58 sendto(3, "Q\0\0\3\215SELECT\na.attnum,\na.attname,\na.atttypmod,\na.attstattarget,\na.attstorage,\nt.typstorage,\na.attn
|...
|12:58:03 sendto(3, "Q\0\0\3\215SELECT\na.attnum,\na.attname,\na.atttypmod,\na.attstattarget,\na.attstorage,\nt.typstorage,\na.attn
|12:58:05 sendto(3, "Q\0\0\1\314SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN pol.polroles = '{0}' TH
|...
|12:58:07 sendto(3, "Q\0\0\1\314SELECT oid, tableoid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN pol.polroles = '{0}' TH
|12:58:09 sendto(3, "Q\0\0\0\213SELECT pr.tableoid, pr.oid, p.pubname FROM pg_publication_rel pr, pg_publication p WHERE pr.prrelid
|...
|12:58:11 sendto(3, "Q\0\0\0\213SELECT pr.tableoid, pr.oid, p.pubname FROM pg_publication_rel pr, pg_publication p WHERE pr.prrelid
The first query regressed the worst.
$ psql -h /tmp -Ap 5612 pryzbyj
psql (13beta3, server 12.4)
pryzbyj=# explain analyze SELECT a.attnum,a.attname,a.atttypmod,a.attstattarget,a.attstorage,t.typstorage,a.attnotnull,a.atthasdef,a.attisdropped,a.attlen,a.attalign,a.attislocal,pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname,a.attgenerated,CASE WHEN a.atthasmissing AND NOT a.attisdropped THEN a.attmissingval ELSE null END AS attmissingval,a.attidentity,pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name) || ' ' || pg_catalog.quote_literal(option_value) FROM pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name), E', ') AS attfdwoptions,CASE WHEN a.attcollation <> t.typcollation THEN a.attcollation ELSE 0 END AS attcollation,array_to_string(a.attoptions, ', ') AS attoptions FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_type t ON a.atttypid = t.oid WHERE a.attrelid = '191444'::pg_catalog.oid AND a.attnum > 0::pg_catalog.int2 ORDER BY a.attnum;
QUERY PLAN
Nested Loop Left Join (cost=0.58..16.72 rows=1 width=217) (actual time=0.205..0.209 rows=1 loops=1)
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a (cost=0.29..8.31 rows=1 width=189) (actual time=0.030..0.032 rows=1 loops=1)
Index Cond: ((attrelid = '191444'::oid) AND (attnum > '0'::smallint))
-> Index Scan using pg_type_oid_index on pg_type t (cost=0.29..8.30 rows=1 width=9) (actual time=0.011..0.011 rows=1 loops=1)
Index Cond: (oid = a.atttypid)
SubPlan 1
-> Sort (cost=0.09..0.09 rows=3 width=64) (actual time=0.119..0.119 rows=0 loops=1)
Sort Key: pg_options_to_table.option_name
Sort Method: quicksort Memory: 25kB
-> Function Scan on pg_options_to_table (cost=0.00..0.06 rows=3 width=64) (actual time=0.010..0.010 rows=0 loops=1)
Planning Time: 1.702 ms
Execution Time: 0.422 ms
$ psql -h /tmp -Ap 5613 pryzbyj
psql (13beta3)
pryzbyj=# explain analyze SELECT a.attnum,a.attname,a.atttypmod,a.attstattarget,a.attstorage,t.typstorage,a.attnotnull,a.atthasdef,a.attisdropped,a.attlen,a.attalign,a.attislocal,pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname,a.attgenerated,CASE WHEN a.atthasmissing AND NOT a.attisdropped THEN a.attmissingval ELSE null END AS attmissingval,a.attidentity,pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name) || ' ' || pg_catalog.quote_literal(option_value) FROM pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name), E', ') AS attfdwoptions,CASE WHEN a.attcollation <> t.typcollation THEN a.attcollation ELSE 0 END AS attcollation,array_to_string(a.attoptions, ', ') AS attoptions FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_type t ON a.atttypid = t.oid WHERE a.attrelid = '164518'::pg_catalog.oid AND a.attnum > 0::pg_catalog.int2 ORDER BY a.attnum;
QUERY PLAN
Nested Loop Left Join (cost=0.58..16.72 rows=1 width=217) (actual time=0.134..0.139 rows=1 loops=1)
-> Index Scan using pg_attribute_relid_attnum_index on pg_attribute a (cost=0.29..8.31 rows=1 width=189) (actual time=0.028..0.030 rows=1 loops=1)
Index Cond: ((attrelid = '164518'::oid) AND (attnum > '0'::smallint))
-> Index Scan using pg_type_oid_index on pg_type t (cost=0.29..8.30 rows=1 width=9) (actual time=0.008..0.008 rows=1 loops=1)
Index Cond: (oid = a.atttypid)
SubPlan 1
-> Sort (cost=0.09..0.09 rows=3 width=64) (actual time=0.065..0.065 rows=0 loops=1)
Sort Key: pg_options_to_table.option_name
Sort Method: quicksort Memory: 25kB
-> Function Scan on pg_options_to_table (cost=0.00..0.06 rows=3 width=64) (actual time=0.005..0.005 rows=0 loops=1)
Planning Time: 1.457 ms
Execution Time: 0.431 ms
I don't know if it's any issue, but I found that pg12 can process "null
statements" almost 2x as fast:
$ time for a in `seq 1 9999`; do echo ';'; done |psql -h /tmp -p 5613 postgres
real 0m0.745s
$ time for a in `seq 1 9999`; do echo ';'; done |psql -h /tmp -p 5612 postgres
real 0m0.444s
--
Justin
Hmm, I wonder if you're comparing an assert-enabled pg13 build to a
non-assert-enabled pg12 build, or something like that.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Aug 13, 2020 at 08:53:46PM -0400, Alvaro Herrera wrote:
Hmm, I wonder if you're comparing an assert-enabled pg13 build to a
non-assert-enabled pg12 build, or something like that.
Great question - I thought of it myself but then forgot to look..
$ rpm -q postgresql1{2,3}-server
postgresql12-server-12.4-1PGDG.rhel7.x86_64
postgresql13-server-13-beta3_1PGDG.rhel7.x86_64
$ /usr/pgsql-12/bin/pg_config |grep -o cassert || echo not found
not found
$ /usr/pgsql-13/bin/pg_config |grep -o cassert || echo not found
cassert
It looks like the beta packages are compiled with cassert, which makes sense.
Thanks and sorry for noise.
--
Justin
Justin Pryzby <pryzby@telsasoft.com> writes:
I can reproduce the issue with generated data:
pryzbyj=# SELECT format('create table t%s(i int)', i) FROM generate_series(1,9999)i;
Hm, I tried this case and didn't really detect much runtime difference
between v12 and HEAD.
I don't know if it's any issue, but I found that pg12 can process "null
statements" almost 2x as fast:
Now I'm suspicious that you're comparing an assert-enabled v13 build
to a non-assert-enabled v12 build. Check the output of
"pg_config --configure" from each installation to see if they're
configured alike.
regards, tom lane