Hot Standby Conflict on pg_attribute
Hello,
A client has recently had a couple of hot standby query conflict pile-ups
around AccessShare lock waits on pg_attribute. Here is an example from the
log:
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-1] sql_error_code = 00000 LOG: process 118946 still waiting for
AccessShareLock on relation 1249 of database 16401 after 1000.127 ms at
character 92
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-2] sql_error_code = 00000 DETAIL: Process holding the lock: 9. Wait
queue: 118948, 118950, 118708, 118818, 118886, 118961, 118960, 118806,
118963, 118959, 118881, 118887, 118878, 118896, 118964, 118965, 118945,
118949, 118946, 118743, 118966, 118947, 118967, 118968.
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-3] sql_error_code = 00000 STATEMENT: SELECT uc.id,
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-4] uc.some_id,
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-5] uc.utr_id,
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-6] utr.name
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-7] FROM usertable1 uc
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-8] INNER JOIN usertable2 utr
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-9] ON uc.utr_id = utr.id
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-10] WHERE uc.some_id = $1
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-11] ORDER BY name
Relation 1249 is pg_attribute and process 9 that was holding the lock was
RecoveryWalAll process. I've confirmed that autovacuum had removed some
pages from pg_attribute shortly before this, which happens somewhat
regularly since this client runs a couple thousand REFERSH MATARIALIZED
VIEW queries per day which look to cause inserts and deletes there so it
having an exclusive lock on pg_attribute makes sense.
The question then is: Why would these user queries be waiting on an
AccessShare lock on pg_attribute? Thus far we've been unable to recreate
any transacitons with the above query (and others) that show any
pg_attribute locks. There is no ORM in play here and these queries are
being sent as single query transactions via this Node.js postgres adapter:
https://github.com/brianc/node-postgres which is pretty bare bones.
--
Erik Jones
mage2k@gmail.com
Hi,
On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
The question then is: Why would these user queries be waiting on an
AccessShare lock on pg_attribute? Thus far we've been unable to recreate
any transacitons with the above query (and others) that show any
pg_attribute locks. There is no ORM in play here and these queries are
being sent as single query transactions via this Node.js postgres adapter:
https://github.com/brianc/node-postgres which is pretty bare bones.
Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.
You can reproduce that fairly easily:
S1: CREATE TABLE foo();
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;
S2: COMMIT;
S1 could execute the select, because it has a cached view of the way the
relation looks.
S2: ALTER TABLE foo ADD COLUMN bar INT;
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;
Here S1 is blocked, because it needs to look at pg_attribute to figure
out the "shape" of the table, but it's currently locked.
Greetings,
Andres Freund
Hi Andres,
Thank you very much! That's exactly what I needed.
On Fri, May 10, 2019 at 12:14 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
The question then is: Why would these user queries be waiting on an
AccessShare lock on pg_attribute? Thus far we've been unable to recreate
any transacitons with the above query (and others) that show any
pg_attribute locks. There is no ORM in play here and these queries are
being sent as single query transactions via this Node.js postgresadapter:
https://github.com/brianc/node-postgres which is pretty bare bones.
Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.You can reproduce that fairly easily:
S1: CREATE TABLE foo();
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;
S2: COMMIT;S1 could execute the select, because it has a cached view of the way the
relation looks.S2: ALTER TABLE foo ADD COLUMN bar INT;
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;Here S1 is blocked, because it needs to look at pg_attribute to figure
out the "shape" of the table, but it's currently locked.Greetings,
Andres Freund
--
Erik Jones
mage2k@gmail.com
Andres Freund <andres@anarazel.de> writes:
On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
The question then is: Why would these user queries be waiting on an
AccessShare lock on pg_attribute?
Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.
Also, it seems likely that what's really triggering the issue is
autovacuum on pg_attribute trying to truncate off empty pages
in pg_attribute (after a bunch of dead rows were generated there
by DDL activity). That requires exclusive lock on pg_attribute,
which would propagate down to the standby.
regards, tom lane
On Fri, May 10, 2019 at 12:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
The question then is: Why would these user queries be waiting on an
AccessShare lock on pg_attribute?Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.Also, it seems likely that what's really triggering the issue is
autovacuum on pg_attribute trying to truncate off empty pages
in pg_attribute (after a bunch of dead rows were generated there
by DDL activity). That requires exclusive lock on pg_attribute,
which would propagate down to the standby.regards, tom lane
Right, that part I understood after checking out pg_attribute's
insert/delete counts in pg_stat_sys_tables before and after some REFRESH
MATERIALIZED VIEW runs on an otherwise idle server. With them running 2k+
refreshes per day autovac is regularly working on their catalog tables.
Thanks!
--
Erik Jones
mage2k@gmail.com
Just a quick footnote: If autovac truncations are frequently causing replica lag, and if this is a problem for you, IIUC one way you can stop autovac from doing the truncations even on older versions is setting old_snapshot_threshold to any value at all besides zero. (On 12+ you can directly control the truncation behavior.)
-Jeremy
Sent from my TI-83
Show quoted text
On May 10, 2019, at 12:46, Erik Jones <mage2k@gmail.com> wrote:
On Fri, May 10, 2019 at 12:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
On 2019-05-09 13:03:50 -0700, Erik Jones wrote:
The question then is: Why would these user queries be waiting on an
AccessShare lock on pg_attribute?Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.Also, it seems likely that what's really triggering the issue is
autovacuum on pg_attribute trying to truncate off empty pages
in pg_attribute (after a bunch of dead rows were generated there
by DDL activity). That requires exclusive lock on pg_attribute,
which would propagate down to the standby.regards, tom lane
Right, that part I understood after checking out pg_attribute's insert/delete counts in pg_stat_sys_tables before and after some REFRESH MATERIALIZED VIEW runs on an otherwise idle server. With them running 2k+ refreshes per day autovac is regularly working on their catalog tables.
Thanks!
--
Erik Jones
mage2k@gmail.com
Thanks for the ti
On Sat, May 11, 2019 at 9:15 AM Jeremy Schneider <schneider@ardentperf.com>
wrote:
Just a quick footnote: If autovac truncations are frequently causing
replica lag, and if this is a problem for you, IIUC one way you can stop
autovac from doing the truncations even on older versions is setting
old_snapshot_threshold to any value at all besides zero. (On 12+ you can
directly control the truncation behavior.)-Jeremy
Thanks for the tip!
--
Erik Jones
mage2k@gmail.com