Hot Standby Conflict on pg_attribute

Started by Erik Jonesalmost 7 years ago7 messagesgeneral
Jump to latest
#1Erik Jones
mage2k@gmail.com

Hello,

A client has recently had a couple of hot standby query conflict pile-ups
around AccessShare lock waits on pg_attribute. Here is an example from the
log:

Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-1] sql_error_code = 00000 LOG: process 118946 still waiting for
AccessShareLock on relation 1249 of database 16401 after 1000.127 ms at
character 92
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-2] sql_error_code = 00000 DETAIL: Process holding the lock: 9. Wait
queue: 118948, 118950, 118708, 118818, 118886, 118961, 118960, 118806,
118963, 118959, 118881, 118887, 118878, 118896, 118964, 118965, 118945,
118949, 118946, 118743, 118966, 118947, 118967, 118968.
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-3] sql_error_code = 00000 STATEMENT: SELECT uc.id,
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-4] uc.some_id,
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-5] uc.utr_id,
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-6] utr.name
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-7] FROM usertable1 uc
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-8] INNER JOIN usertable2 utr
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-9] ON uc.utr_id = utr.id
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-10] WHERE uc.some_id = $1
Mar 27 12:06:37 ip-10-0-125-5 7dc68e48_fbd9_41d7_9ab1_65599036dd75[118946]:
[9-11] ORDER BY name

Relation 1249 is pg_attribute and process 9 that was holding the lock was
RecoveryWalAll process. I've confirmed that autovacuum had removed some
pages from pg_attribute shortly before this, which happens somewhat
regularly since this client runs a couple thousand REFERSH MATARIALIZED
VIEW queries per day which look to cause inserts and deletes there so it
having an exclusive lock on pg_attribute makes sense.

The question then is: Why would these user queries be waiting on an
AccessShare lock on pg_attribute? Thus far we've been unable to recreate
any transacitons with the above query (and others) that show any
pg_attribute locks. There is no ORM in play here and these queries are
being sent as single query transactions via this Node.js postgres adapter:
https://github.com/brianc/node-postgres which is pretty bare bones.

--
Erik Jones
mage2k@gmail.com

#2Andres Freund
andres@anarazel.de
In reply to: Erik Jones (#1)
Re: Hot Standby Conflict on pg_attribute

Hi,

On 2019-05-09 13:03:50 -0700, Erik Jones wrote:

The question then is: Why would these user queries be waiting on an
AccessShare lock on pg_attribute? Thus far we've been unable to recreate
any transacitons with the above query (and others) that show any
pg_attribute locks. There is no ORM in play here and these queries are
being sent as single query transactions via this Node.js postgres adapter:
https://github.com/brianc/node-postgres which is pretty bare bones.

Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.

You can reproduce that fairly easily:

S1: CREATE TABLE foo();
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;
S2: COMMIT;

S1 could execute the select, because it has a cached view of the way the
relation looks.

S2: ALTER TABLE foo ADD COLUMN bar INT;
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;

Here S1 is blocked, because it needs to look at pg_attribute to figure
out the "shape" of the table, but it's currently locked.

Greetings,

Andres Freund

#3Erik Jones
mage2k@gmail.com
In reply to: Andres Freund (#2)
Re: Hot Standby Conflict on pg_attribute

Hi Andres,

Thank you very much! That's exactly what I needed.

On Fri, May 10, 2019 at 12:14 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-05-09 13:03:50 -0700, Erik Jones wrote:

The question then is: Why would these user queries be waiting on an
AccessShare lock on pg_attribute? Thus far we've been unable to recreate
any transacitons with the above query (and others) that show any
pg_attribute locks. There is no ORM in play here and these queries are
being sent as single query transactions via this Node.js postgres

adapter:

https://github.com/brianc/node-postgres which is pretty bare bones.

Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.

You can reproduce that fairly easily:

S1: CREATE TABLE foo();
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;
S2: COMMIT;

S1 could execute the select, because it has a cached view of the way the
relation looks.

S2: ALTER TABLE foo ADD COLUMN bar INT;
S2: BEGIN; LOCK pg_attribute;
S1: SELECT * FROM foo;

Here S1 is blocked, because it needs to look at pg_attribute to figure
out the "shape" of the table, but it's currently locked.

Greetings,

Andres Freund

--
Erik Jones
mage2k@gmail.com

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: Hot Standby Conflict on pg_attribute

Andres Freund <andres@anarazel.de> writes:

On 2019-05-09 13:03:50 -0700, Erik Jones wrote:

The question then is: Why would these user queries be waiting on an
AccessShare lock on pg_attribute?

Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.

Also, it seems likely that what's really triggering the issue is
autovacuum on pg_attribute trying to truncate off empty pages
in pg_attribute (after a bunch of dead rows were generated there
by DDL activity). That requires exclusive lock on pg_attribute,
which would propagate down to the standby.

regards, tom lane

#5Erik Jones
mage2k@gmail.com
In reply to: Tom Lane (#4)
Re: Hot Standby Conflict on pg_attribute

On Fri, May 10, 2019 at 12:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2019-05-09 13:03:50 -0700, Erik Jones wrote:

The question then is: Why would these user queries be waiting on an
AccessShare lock on pg_attribute?

Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.

Also, it seems likely that what's really triggering the issue is
autovacuum on pg_attribute trying to truncate off empty pages
in pg_attribute (after a bunch of dead rows were generated there
by DDL activity). That requires exclusive lock on pg_attribute,
which would propagate down to the standby.

regards, tom lane

Right, that part I understood after checking out pg_attribute's
insert/delete counts in pg_stat_sys_tables before and after some REFRESH
MATERIALIZED VIEW runs on an otherwise idle server. With them running 2k+
refreshes per day autovac is regularly working on their catalog tables.

Thanks!
--
Erik Jones
mage2k@gmail.com

#6Jeremy Schneider
schneider@ardentperf.com
In reply to: Erik Jones (#5)
Re: Hot Standby Conflict on pg_attribute

Just a quick footnote: If autovac truncations are frequently causing replica lag, and if this is a problem for you, IIUC one way you can stop autovac from doing the truncations even on older versions is setting old_snapshot_threshold to any value at all besides zero. (On 12+ you can directly control the truncation behavior.)

-Jeremy

Sent from my TI-83

Show quoted text

On May 10, 2019, at 12:46, Erik Jones <mage2k@gmail.com> wrote:

On Fri, May 10, 2019 at 12:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2019-05-09 13:03:50 -0700, Erik Jones wrote:

The question then is: Why would these user queries be waiting on an
AccessShare lock on pg_attribute?

Queries that access a table for the *first* time after DDL happened
(including truncating the relation), need an AccessShareLock on
pg_attribute (and pg_class, pg_index, ...) for a short time.

Also, it seems likely that what's really triggering the issue is
autovacuum on pg_attribute trying to truncate off empty pages
in pg_attribute (after a bunch of dead rows were generated there
by DDL activity). That requires exclusive lock on pg_attribute,
which would propagate down to the standby.

regards, tom lane

Right, that part I understood after checking out pg_attribute's insert/delete counts in pg_stat_sys_tables before and after some REFRESH MATERIALIZED VIEW runs on an otherwise idle server. With them running 2k+ refreshes per day autovac is regularly working on their catalog tables.

Thanks!
--
Erik Jones
mage2k@gmail.com

#7Erik Jones
mage2k@gmail.com
In reply to: Jeremy Schneider (#6)
Re: Hot Standby Conflict on pg_attribute

Thanks for the ti

On Sat, May 11, 2019 at 9:15 AM Jeremy Schneider <schneider@ardentperf.com>
wrote:

Just a quick footnote: If autovac truncations are frequently causing
replica lag, and if this is a problem for you, IIUC one way you can stop
autovac from doing the truncations even on older versions is setting
old_snapshot_threshold to any value at all besides zero. (On 12+ you can
directly control the truncation behavior.)

-Jeremy

Thanks for the tip!

--
Erik Jones
mage2k@gmail.com