Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

Started by cbwalmost 6 years ago8 messagesbugs
Jump to latest
#1cbw
cbwhitebu@gmail.com

I have a DO block that has a couple of inserts (copying large amounts
of data from staging tables) that I am passing to the server using
JDBC,

When I execute the inserts independently, they work fine. But when I
submit them as part of the do block, the backend goes into
trigger.c:afterTriggerInvokeEvents and never returns. This happens
after the second insert starts running.

I have tried this in version 11.7 and 12.2 (on both Linux and Windows).

I grabbed the source from git and did a Linux build (REL_12_STABLE)
and can see that the method in question just keeps looping through
'chunks' and 'events'. I disabled the user triggers on the table in
question and the 'events' seem to be various foreign key references
and other constraints.

I bumped the log level up to debug5 and can see rows in the first
table getting inserted. Sometimes I see a few rows for the second
table and sometimes I don't. After that there are no additional log
entries from the backend process. I have let the process run overnight
(the inserts take about 12 minutes to complete when run
independently).

Any tips about how to go about debugging this would be appreciated. I
am struggling to see exactly what the events are so maybe so
suggestions on the best place to add some logging?

Here is the DO block:

DO
$$
BEGIN
PERFORM XeP_set_identifier('xi_batch_user');

alter table xe_patient_visit disable trigger USER;
alter table xe_auth disable trigger USER;

RAISE NOTICE '% : inserting visits...', clock_timestamp();

INSERT INTO
xe_patient_visit
(
createtstamp,
creationuser,
modifiedtstamp,
modifieduser,
active,
visitid,
sourcesystem,
status,
visittypeid,
ipid,
accountid,
ivid
)
SELECT (now() AT TIME ZONE 'utc') AS createtstamp,
'xi_batch_user' AS creationuser,
(now() AT TIME ZONE 'utc') AS modifiedtstamp,
'xi_batch_user' AS modifieduser,
'y' AS active,
authorizationid AS visitid,
'staging' AS sourcesystem,
a.status,
'AUTH' AS visittypeid,
p.ipid,
e.accountid,
nextval('seq_xe_patient_visit') AS ivid
FROM (SELECT authorizationid,
memberid,
CASE
WHEN authorizationstatus = 'Fully Approved'
THEN 'AUTH_APPROVED'
WHEN authorizationstatus = 'Partially
Approved' THEN 'AUTH_REDUCED'
WHEN authorizationstatus = 'Voided' THEN
'AUTH_COMPLETED'
WHEN authorizationstatus = 'Incomplete' THEN
'AUTH_PEND'
WHEN authorizationstatus = 'Pending Decision'
THEN 'AUTH_PEND'
WHEN authorizationstatus = 'Denied' THEN
'AUTH_DENIED' END

AS status,
row_number() OVER (PARTITION BY authorizationid
ORDER BY authorizationid) AS rownum
FROM staging."authorization") a
JOIN xe_patient p ON p.patientid = a.memberid
JOIN xe_enterprise_data e ON e.accountid =
p.accountid AND e.enterpriseid = 'staging'
WHERE rownum = 1
ON CONFLICT (accountid, visitid)
DO NOTHING;

RAISE NOTICE '% : inserting auths...', clock_timestamp();
INSERT INTO
xe_auth
(
createtstamp,
creationuser,
modifiedtstamp,
modifieduser,
active,
accountid,
receiveddate,
authnum,
authtypeid,
authsubtypeid,
umurgencyid,
ivid
)
SELECT (now() AT TIME ZONE 'utc') AS createtstamp,
'xi_batch_user' AS creationuser,
(now() AT TIME ZONE 'utc') AS modifiedtstamp,
'xi_batch_user' AS modifieduser,
'y' AS active,
ed.accountid,
receiveddate,
authnum,
a.authtypeid,
at.authtypeid,
umurgencyid,
ivid
FROM (
SELECT cast(receiveddate AS timestamp) AS receiveddate,
authorizationid AS authnum,
CASE
WHEN authorizationcategory = 'Inpatient'
THEN 'AUTH_IPA'
ELSE 'AUTH_SVC' END AS authtypeid,
authorizationtype,
CASE
WHEN authorizationurgency = 'ROUTINE' THEN 'STD'
WHEN authorizationurgency = 'EXPEDITED' THEN 'EXP'
END
AS umurgencyid
FROM staging."authorization"
) a
JOIN xe_patient_visit v ON v.visitid = a.authnum
JOIN xe_enterprise_data ed ON ed.accountid =
v.accountid AND ed.enterpriseid = 'staging'
JOIN xe_auth_type at ON at.name = a.authorizationtype
ON CONFLICT (authnum, accountid)
DO NOTHING;

END
$$

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: cbw (#1)
Re: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

cbw <cbwhitebu@gmail.com> writes:

I have a DO block that has a couple of inserts (copying large amounts
of data from staging tables) that I am passing to the server using
JDBC,
When I execute the inserts independently, they work fine. But when I
submit them as part of the do block, the backend goes into
trigger.c:afterTriggerInvokeEvents and never returns. This happens
after the second insert starts running.

Have you got deferred uniqueness or exclusion constraints on the
target table?

If so, perhaps a plausible theory is that when you submit the queries
separately, the unique_key_recheck trigger is never fired at all --- but
if they are in the same transaction, then recheck events get queued
because the index can't tell whether the earlier row should be treated
as committed. This requires some assumptions about the table schema
(which you haven't shown us) but it's a bit hard to see why the second
query would act differently in the two contexts otherwise.

A variant of that theory is that foreign key trigger firings are being
skipped in one case but not the other; but offhand I think those
optimizations only apply to update/delete cases not inserts. Anyway
that still requires some assumptions about moving parts that you
haven't shown us.

The short answer very likely is going to be that you need to perform
the queries as separate transactions, or the second one drowns in
trigger overhead. Tracing down exactly why might not be worth a
lot of trouble.

regards, tom lane

#3cbw
cbwhitebu@gmail.com
In reply to: Tom Lane (#2)
Re: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

Tom,

Thanks for that feedback.

I have tried adding a commit after the first insert but the problem
still persists so I don't think this problem is related to transaction
boundaries.

I am seeing this same problem with other table inserts as well.

My current hypothesis is that this bug happens when I am inserting
data that references rows in another table that I just inserted in the
same DO block. I don't know yet whether it only happens with the large
data sets I am processing.

I have also tried creating a procedure for the same code instead of
the DO block and the problem happens there as well.

This is a huge schema but I have included the definitions of the two
table in my examples. As you can see the second table references the
primary key of the first table (see constraint xe_auth_f5). There are
no deferred constraints.

--- first table-----
create table xe_patient_visit
(
    accountid numeric(10) not null,
    createtstamp timestamp not null,
    creationuser varchar(60) not null,
    modifiedtstamp timestamp not null,
    modifieduser varchar(60) not null,
    active varchar(1) default 'y'::character varying not null
        constraint xe_patient_visit_active_check
            check ((active)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    iscmgmtpatient varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_iscmgmtpatient_check
            check ((iscmgmtpatient)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    isinerror varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_isinerror_check
            check ((isinerror)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    ismlc varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_ismlc_check
            check ((ismlc)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    isreadmit varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_isreadmit_check
            check ((isreadmit)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text, ('u'::character
varying)::text])),
    isvisitlocked varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_isvisitlocked_check
            check ((isvisitlocked)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    iswalkin varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_iswalkin_check
            check ((iswalkin)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    needsbed varchar(1) default 'n'::character varying not null
        constraint xe_patient_visit_needsbed_check
            check ((needsbed)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    accidentdatetime timestamp,
    accidentdatetimesrv timestamp,
    authorizedadmitdate date,
    clearancedatetime timestamp,
    clearancedatetimesrv timestamp,
    closedate timestamp,
    closedatesrv timestamp,
    enddate timestamp,
    enddatesrv timestamp,
    expectedenddate timestamp,
    expectedenddatesrv timestamp,
    expireattemptdatetime timestamp,
    mergeverifieddate timestamp,
    prioradmissiondate date,
    referringletterdate date,
    requestedadmitdate date,
    startdate timestamp,
    startdatesrv timestamp,
    accidentdatetimeoff varchar(5),
    accidentstate varchar(100),
    accidenttypeid varchar(32)
        constraint xe_patient_visit_f0
            references xg_svc_code_catalog,
    admitcategoryid varchar(16)
        constraint xe_patient_visit_f1
            references xe_admit_cat,
    bedclassid varchar(32),
    bedtext varchar(100),
    clearancedatetimeoff varchar(5),
    closecomment varchar(200),
    closedateoff varchar(5),
    closeusername varchar(60),
    comments varchar(200),
    dischargeinstruction varchar(2000),
    dischargelocation varchar(100),
    displayroombed varchar(50),
    enddateoff varchar(5),
    errordescription varchar(1024),
    expectedenddateoff varchar(5),
    expectedlengthofstay varchar(16),
    expectedlengthofstayunits varchar(64)
        constraint xe_patient_visit_f10
            references xe_unit,
    mlcdescription varchar(500),
    mergecomments varchar(200),
    mergeverifiedby varchar(60),
    patientstatus varchar(16),
    performlocation varchar(100),
    priordiagnosisdesc varchar(200),
    reasonfordischarge varchar(100),
    reasonforvisit varchar(256),
    referralsourceid varchar(20),
    referraltype varchar(20),
    referringhospitalname varchar(200),
    referringhospitalpatientid varchar(100),
    referringlettercontext varchar(200),
    referringletterid varchar(200),
    referringphysicianname varchar(200),
    referringphysicianphoneno varchar(50),
    sourcesystem varchar(100) default 'SYNERGY'::character varying not null,
    startdateoff varchar(5),
    status varchar(16)
        constraint xe_patient_visit_f24
            references xe_visit_status,
    statusreasonid varchar(16)
        constraint xe_patient_visit_f25
            references xe_visit_status,
    statusreasontext varchar(200),
    transferstatus varchar(32),
    visitid varchar(30) not null,
    visittypeid varchar(16) not null
        constraint xe_patient_visit_f27
            references xe_visit_type,
    bedid numeric(18),
    clinicid numeric(18),
    dischargedisposition numeric(18)
        constraint xe_patient_visit_f5
            references xg_clinical_ref_data,
    dischargeorderowner numeric(18)
        constraint xe_patient_visit_f6
            references xe_resource,
    dischargereasonid numeric(18),
    dischargetounitid numeric(18),
    dischargetype numeric(18)
        constraint xe_patient_visit_f9
            references xg_clinical_ref_data,
    facilityid numeric(18),
    hcu numeric(18),
    ipid numeric(18) not null,
    incomingconditionid numeric(18)
        constraint xe_patient_visit_f14
            references xg_clinical_ref_data,
    locationid numeric(18)
        constraint xe_patient_visit_f15
            references xe_location,
    mergesourceipid numeric(18),
    orgunitid numeric(18),
    parentid numeric(18),
    patientmrdid numeric(18),
    patienttypeid numeric(18),
    previousivid numeric(18),
    readmitdays numeric(3),
    readmitivid numeric(18),
    restrictvalue numeric(5) default '1'::numeric not null,
    servicebookingid numeric(18),
    servicecenterid numeric(18),
    serviceid numeric(18)
        constraint xe_patient_visit_f22
            references xg_svc_base,
    sourceofadmitid numeric(18)
        constraint xe_patient_visit_f23
            references xg_pos,
    typeofcareid numeric(18)
        constraint xe_patient_visit_f26
            references xg_clinical_ref_data,
    workid numeric(18),
    ivid numeric(18) not null,
    constraint pk_xe_patient_visit
        primary key (ivid, accountid),
    constraint xe_patient_visit_eu
        unique (visitid, accountid),
    constraint xe_patient_visit_f11
        foreign key (facilityid, accountid) references xe_facility,
    constraint xe_patient_visit_f12
        foreign key (hcu, accountid) references xe_org_unit,
    constraint xe_patient_visit_f13
        foreign key (ipid, accountid) references xe_patient,
    constraint xe_patient_visit_f16
        foreign key (mergesourceipid, accountid) references xe_patient,
    constraint xe_patient_visit_f17
        foreign key (orgunitid, accountid) references xe_org_unit,
    constraint xe_patient_visit_f18
        foreign key (patientmrdid, accountid) references xe_patient_mrd,
    constraint xe_patient_visit_f19
        foreign key (patienttypeid, accountid) references xe_patient_type,
    constraint xe_patient_visit_f2
        foreign key (bedclassid, accountid) references xe_bed_class,
    constraint xe_patient_visit_f20
        foreign key (servicebookingid, accountid) references xe_srvc_booking,
    constraint xe_patient_visit_f21
        foreign key (servicecenterid, accountid) references xe_org_unit,
    constraint xe_patient_visit_f28
        foreign key (workid, accountid) references xe_workflow_inst,
    constraint xe_patient_visit_f3
        foreign key (bedid, accountid) references xe_bed,
    constraint xe_patient_visit_f4
        foreign key (clinicid, accountid) references xe_org_unit,
    constraint xe_patient_visit_f7
        foreign key (dischargereasonid, accountid) references
xe_discharge_reason,
    constraint xe_patient_visit_f8
        foreign key (dischargetounitid, accountid) references xe_org_unit
);

alter table xe_patient_visit owner to xowner;

create index xe_patient_visit_12
on xe_patient_visit (startdate, accountid, status, visittypeid);

create index xe_patient_visit_13
on xe_patient_visit (accountid, status, visittypeid, ivid, enddate);

create index xe_patient_visit_14
on xe_patient_visit (enddate, hcu, accountid);

create index xe_patient_visit_15
on xe_patient_visit (servicebookingid, accountid);

create index xe_patient_visit_16
on xe_patient_visit (parentid, accountid, ivid);

create index xe_patient_visit_i1
on xe_patient_visit (ipid, accountid);

create trigger xe_patient_visit_ht
after insert or update or delete
on xe_patient_visit
for each row
execute procedure xep_xe_patient_visit();

create trigger xe_patient_visit_xe_auth_te
after insert or update
on xe_patient_visit
for each row
execute procedure xep_xe_patient_visit_xe_auth_tef();

create trigger xe_patient_visit_xe_patient_issue_te
after insert or update
on xe_patient_visit
for each row
execute procedure xep_xe_patient_visit_xe_patient_issue_tef();

create trigger xe_patient_visit_xe_visit_case_mgmt_pgm_te
after insert or update
on xe_patient_visit
for each row
execute procedure xep_xe_patient_visit_xe_visit_case_mgmt_pgm_tef();

create trigger xe_patient_visit_xe_visit_case_mgmt_te
after insert or update
on xe_patient_visit
for each row
execute procedure xep_xe_patient_visit_xe_visit_case_mgmt_tef();

--- second table ---
create table xe_auth
(
    accountid numeric(10) not null,
    createtstamp timestamp not null,
    creationuser varchar(60) not null,
    modifiedtstamp timestamp not null,
    modifieduser varchar(60) not null,
    active varchar(1) default 'y'::character varying not null
        constraint xe_auth_active_check
            check ((active)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    additionalcomments varchar(4000),
    isappeal varchar(1) default 'n'::character varying not null
        constraint xe_auth_isappeal_check
            check ((isappeal)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    isextended varchar(1) default 'n'::character varying not null
        constraint xe_auth_isextended_check
            check ((isextended)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    ispatientreq varchar(1) default 'n'::character varying not null
        constraint xe_auth_ispatientreq_check
            check ((ispatientreq)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    ispreauth varchar(1) default 'n'::character varying not null
        constraint xe_auth_ispreauth_check
            check ((ispreauth)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    isreopen varchar(1) default 'n'::character varying not null
        constraint xe_auth_isreopen_check
            check ((isreopen)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    isselfreferral varchar(1) default 'n'::character varying not null
        constraint xe_auth_isselfreferral_check
            check ((isselfreferral)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    medicalnecessity varchar(4000),
    needsccr varchar(1) default 'y'::character varying not null
        constraint xe_auth_needsccr_check
            check ((needsccr)::text = ANY (ARRAY[('y'::character
varying)::text, ('n'::character varying)::text])),
    approveddate timestamp,
    cancelleddate timestamp,
    compliancedate timestamp,
    denieddate timestamp,
    duedate timestamp,
    expectedbydate timestamp,
    expirationdate timestamp,
    nextreviewdate timestamp,
    outboundqueuedtime timestamp,
    receiveddate timestamp,
    authnum varchar(9) not null,
    authsubtypeid varchar(30)
        constraint xe_auth_f0
            references xe_auth_type,
    authtypeid varchar(30) not null
        constraint xe_auth_f1
            references xe_auth_type,
    denialcode varchar(60),
    deniedbyusername varchar(60),
    guidelines varchar(200),
    pricingnote varchar(100),
    primaryusername varchar(60),
    umurgencyid varchar(32)
        constraint xe_auth_f13
            references xg_um_urgency,
    allowedlos numeric(3),
    authorizedlos numeric(3),
    denialreasonid numeric(18)
        constraint xe_auth_f2
            references xg_clinical_ref_data,
    expediterequestortypeid numeric(18)
        constraint xe_auth_f4
            references xg_clinical_ref_data,
    owner numeric(18),
    requestedlos numeric(3),
    requestedposid numeric(18)
        constraint xe_auth_f8
            references xg_pos,
    restrictvalue numeric(5) default '1'::numeric not null,
    servicetypeid numeric(18)
        constraint xe_auth_f9
            references xg_clinical_ref_data,
    sourceofadmitid numeric(18)
        constraint xe_auth_f10
            references xg_pos,
    treatmentcategoryid numeric(18)
        constraint xe_auth_f11
            references xg_clinical_ref_data,
    typeofcareid numeric(18)
        constraint xe_auth_f12
            references xg_clinical_ref_data,
    ivid numeric(18) not null,
    constraint pk_xe_auth
        primary key (ivid, accountid),
    constraint xe_auth_eu
        unique (authnum, accountid),
    constraint xe_auth_f3
        foreign key (deniedbyusername, accountid) references xe_user_data,
    constraint xe_auth_f5
        foreign key (ivid, accountid) references xe_patient_visit,
    constraint xe_auth_f6
        foreign key (owner, accountid) references xe_staff,
    constraint xe_auth_f7
        foreign key (primaryusername, accountid) references xe_user_data
);

alter table xe_auth owner to xowner;

create trigger xe_auth_ht
after insert or update or delete
on xe_auth
for each row
execute procedure xep_xe_auth();

create trigger xe_auth_xe_patient_visit_te
after insert or update
on xe_auth
for each row
execute procedure xep_xe_auth_xe_patient_visit_tef();

Show quoted text

On Tue, Apr 21, 2020 at 7:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

cbw <cbwhitebu@gmail.com> writes:

I have a DO block that has a couple of inserts (copying large amounts
of data from staging tables) that I am passing to the server using
JDBC,
When I execute the inserts independently, they work fine. But when I
submit them as part of the do block, the backend goes into
trigger.c:afterTriggerInvokeEvents and never returns. This happens
after the second insert starts running.

Have you got deferred uniqueness or exclusion constraints on the
target table?

If so, perhaps a plausible theory is that when you submit the queries
separately, the unique_key_recheck trigger is never fired at all --- but
if they are in the same transaction, then recheck events get queued
because the index can't tell whether the earlier row should be treated
as committed. This requires some assumptions about the table schema
(which you haven't shown us) but it's a bit hard to see why the second
query would act differently in the two contexts otherwise.

A variant of that theory is that foreign key trigger firings are being
skipped in one case but not the other; but offhand I think those
optimizations only apply to update/delete cases not inserts. Anyway
that still requires some assumptions about moving parts that you
haven't shown us.

The short answer very likely is going to be that you need to perform
the queries as separate transactions, or the second one drowns in
trigger overhead. Tracing down exactly why might not be worth a
lot of trouble.

regards, tom lane

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

On 2020-Apr-21, Tom Lane wrote:

A variant of that theory is that foreign key trigger firings are being
skipped in one case but not the other; but offhand I think those
optimizations only apply to update/delete cases not inserts. Anyway
that still requires some assumptions about moving parts that you
haven't shown us.

I wonder if there are any funny interactions between trigger tuple
acquisition and the ON CONFLICT stuff. The only thing that occurs to me
to explain the fact that it only fails with the two stmts in the DO
block is that the second insert can see rows as inserted by the same
transaction.

I would start by taking a few dozen backtraces and comparing them to see
if any progress is being made.

The fact that this reproduces in 11.2 would seem to discard theories
about tuple table slot changes and table AM.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#4)
Re: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2020-Apr-21, Tom Lane wrote:

A variant of that theory is that foreign key trigger firings are being
skipped in one case but not the other; but offhand I think those
optimizations only apply to update/delete cases not inserts. Anyway
that still requires some assumptions about moving parts that you
haven't shown us.

I wonder if there are any funny interactions between trigger tuple
acquisition and the ON CONFLICT stuff. The only thing that occurs to me
to explain the fact that it only fails with the two stmts in the DO
block is that the second insert can see rows as inserted by the same
transaction.

Yeah, I'll bet a good deal that the use of ON CONFLICT is part of the
issue here. That can queue firings of the unique_key_recheck trigger,
but I'm too lazy to go figure out exactly when that happens or gets
optimized away. (AfterTriggerSaveEvent definitely does know about
optimizing it away in some cases...)

regards, tom lane

#6Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#4)
Re: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

Hi,

On 2020-04-21 11:19:35 -0400, Alvaro Herrera wrote:

On 2020-Apr-21, Tom Lane wrote:

A variant of that theory is that foreign key trigger firings are being
skipped in one case but not the other; but offhand I think those
optimizations only apply to update/delete cases not inserts. Anyway
that still requires some assumptions about moving parts that you
haven't shown us.

I wonder if there are any funny interactions between trigger tuple
acquisition and the ON CONFLICT stuff. The only thing that occurs to me
to explain the fact that it only fails with the two stmts in the DO
block is that the second insert can see rows as inserted by the same
transaction.

I wonder if there's potentially some issue with the wrong snapshot being
used for the different statements...

I would start by taking a few dozen backtraces and comparing them to see
if any progress is being made.

It could also be informative to look at the walstream with pg_waldump
while the problem is occuring. Would tell us about row locks acquired
during on conflict / trigger handling etc.

The fact that this reproduces in 11.2 would seem to discard theories
about tuple table slot changes and table AM.

Phew ;)

Greetings,

Andres Freund

#7cbw
cbwhitebu@gmail.com
In reply to: Andres Freund (#6)
Re: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

Andres,

Thanks for taking a look at this.

Here is a sample of pg_waldump while the problem is occurring:

rmgr: Heap len (rec/tot): 185/ 185, tx: 35961, lsn:
A/D37435A8, prev A/D37415A0, desc: INSERT off 40 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D3743668, prev A/D37435A8, desc: INSERT_LEAF off 96, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap len (rec/tot): 54/ 54, tx: 35961, lsn:
A/D37436B0, prev A/D3743668, desc: LOCK off 41: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1792
rmgr: Heap len (rec/tot): 194/ 194, tx: 35961, lsn:
A/D37436E8, prev A/D37436B0, desc: INSERT off 41 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D37437B0, prev A/D37436E8, desc: INSERT_LEAF off 93, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap len (rec/tot): 59/ 8175, tx: 35961, lsn:
A/D37437F8, prev A/D37437B0, desc: LOCK off 33: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1799 FPW
rmgr: Heap len (rec/tot): 185/ 185, tx: 35961, lsn:
A/D3745800, prev A/D37437F8, desc: INSERT off 42 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D37458C0, prev A/D3745800, desc: INSERT_LEAF off 95, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Standby len (rec/tot): 66/ 66, tx: 0, lsn:
A/D3745908, prev A/D37458C0, desc: LOCK xid 35961 db 99095 rel 125491
xid 35961 db 99095 rel 125485 xid 35961 db 99095 rel 125490
rmgr: Standby len (rec/tot): 54/ 54, tx: 0, lsn:
A/D3745950, prev A/D3745908, desc: RUNNING_XACTS nextXid 35962
latestCompletedXid 35960 oldestRunningXid 35961; 1 xacts: 35961
rmgr: Heap len (rec/tot): 59/ 8175, tx: 35961, lsn:
A/D3745988, prev A/D3745950, desc: LOCK off 4: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1772 FPW
rmgr: Heap len (rec/tot): 194/ 194, tx: 35961, lsn:
A/D3747990, prev A/D3745988, desc: INSERT off 43 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D3747A58, prev A/D3747990, desc: INSERT_LEAF off 89, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap len (rec/tot): 54/ 54, tx: 35961, lsn:
A/D3747AA0, prev A/D3747A58, desc: LOCK off 3: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1799
rmgr: Heap len (rec/tot): 193/ 193, tx: 35961, lsn:
A/D3747AD8, prev A/D3747AA0, desc: INSERT off 44 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D3747BA0, prev A/D3747AD8, desc: INSERT_LEAF off 96, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap len (rec/tot): 54/ 54, tx: 35961, lsn:
A/D3747BE8, prev A/D3747BA0, desc: LOCK off 16: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1799
rmgr: Heap len (rec/tot): 185/ 185, tx: 35961, lsn:
A/D3747C20, prev A/D3747BE8, desc: INSERT off 45 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D3747CE0, prev A/D3747C20, desc: INSERT_LEAF off 97, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap len (rec/tot): 59/ 8167, tx: 35961, lsn:
A/D3747D28, prev A/D3747CE0, desc: LOCK off 16: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1801 FPW
rmgr: Heap len (rec/tot): 186/ 186, tx: 35961, lsn:
A/D3749D28, prev A/D3747D28, desc: INSERT off 46 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D3749DE8, prev A/D3749D28, desc: INSERT_LEAF off 99, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Standby len (rec/tot): 66/ 66, tx: 0, lsn:
A/D3749E30, prev A/D3749DE8, desc: LOCK xid 35961 db 99095 rel 125491
xid 35961 db 99095 rel 125485 xid 35961 db 99095 rel 125490
rmgr: Standby len (rec/tot): 54/ 54, tx: 0, lsn:
A/D3749E78, prev A/D3749E30, desc: RUNNING_XACTS nextXid 35962
latestCompletedXid 35960 oldestRunningXid 35961; 1 xacts: 35961
rmgr: Heap len (rec/tot): 59/ 8191, tx: 35961, lsn:
A/D3749EB0, prev A/D3749E78, desc: LOCK off 13: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1802 FPW
rmgr: Heap len (rec/tot): 190/ 190, tx: 35961, lsn:
A/D374BEC8, prev A/D3749EB0, desc: INSERT off 47 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D374BF88, prev A/D374BEC8, desc: INSERT_LEAF off 100, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap len (rec/tot): 59/ 8239, tx: 35961, lsn:
A/D374BFD0, prev A/D374BF88, desc: LOCK off 54: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104057 blk 1 FPW
rmgr: Heap len (rec/tot): 59/ 8167, tx: 35961, lsn:
A/D374E030, prev A/D374BFD0, desc: LOCK off 41: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1822 FPW
rmgr: Heap len (rec/tot): 178/ 178, tx: 35961, lsn:
A/D3750030, prev A/D374E030, desc: INSERT off 48 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D37500E8, prev A/D3750030, desc: INSERT_LEAF off 104, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Standby len (rec/tot): 66/ 66, tx: 0, lsn:
A/D3750130, prev A/D37500E8, desc: LOCK xid 35961 db 99095 rel 125491
xid 35961 db 99095 rel 125485 xid 35961 db 99095 rel 125490
rmgr: Standby len (rec/tot): 54/ 54, tx: 0, lsn:
A/D3750178, prev A/D3750130, desc: RUNNING_XACTS nextXid 35962
latestCompletedXid 35960 oldestRunningXid 35961; 1 xacts: 35961
rmgr: Heap len (rec/tot): 59/ 8199, tx: 35961, lsn:
A/D37501B0, prev A/D3750178, desc: LOCK off 42: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1826 FPW
rmgr: Heap len (rec/tot): 194/ 194, tx: 35961, lsn:
A/D37521D0, prev A/D37501B0, desc: INSERT off 49 flags 0x00, blkref
#0: rel 1663/99095/104041 blk 8
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D3752298, prev A/D37521D0, desc: INSERT_LEAF off 106, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap len (rec/tot): 59/ 8175, tx: 35961, lsn:
A/D37522E0, prev A/D3752298, desc: LOCK off 41: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1833 FPW
rmgr: Heap len (rec/tot): 185/ 185, tx: 35961, lsn:
A/D37542E8, prev A/D37522E0, desc: INSERT+INIT off 1 flags 0x00,
blkref #0: rel 1663/99095/104041 blk 9
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D37543A8, prev A/D37542E8, desc: INSERT_LEAF off 108, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap len (rec/tot): 59/ 8183, tx: 35961, lsn:
A/D37543F0, prev A/D37543A8, desc: LOCK off 4: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1834 FPW
rmgr: Heap len (rec/tot): 185/ 185, tx: 35961, lsn:
A/D3756400, prev A/D37543F0, desc: INSERT off 2 flags 0x00, blkref #0:
rel 1663/99095/104041 blk 9
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D37564C0, prev A/D3756400, desc: INSERT_LEAF off 109, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap len (rec/tot): 54/ 54, tx: 35961, lsn:
A/D3756508, prev A/D37564C0, desc: LOCK off 7: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1834
rmgr: Heap len (rec/tot): 185/ 185, tx: 35961, lsn:
A/D3756540, prev A/D3756508, desc: INSERT off 3 flags 0x00, blkref #0:
rel 1663/99095/104041 blk 9
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D3756600, prev A/D3756540, desc: INSERT_LEAF off 110, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Standby len (rec/tot): 66/ 66, tx: 0, lsn:
A/D3756648, prev A/D3756600, desc: LOCK xid 35961 db 99095 rel 125491
xid 35961 db 99095 rel 125485 xid 35961 db 99095 rel 125490
rmgr: Standby len (rec/tot): 54/ 54, tx: 0, lsn:
A/D3756690, prev A/D3756648, desc: RUNNING_XACTS nextXid 35962
latestCompletedXid 35960 oldestRunningXid 35961; 1 xacts: 35961
rmgr: Heap len (rec/tot): 59/ 8191, tx: 35961, lsn:
A/D37566C8, prev A/D3756690, desc: LOCK off 27: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1549 FPW
rmgr: Heap len (rec/tot): 185/ 185, tx: 35961, lsn:
A/D37586E0, prev A/D37566C8, desc: INSERT off 4 flags 0x00, blkref #0:
rel 1663/99095/104041 blk 9
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D37587A0, prev A/D37586E0, desc: INSERT_LEAF off 69, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap len (rec/tot): 59/ 8191, tx: 35961, lsn:
A/D37587E8, prev A/D37587A0, desc: LOCK off 30: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1554 FPW
rmgr: Heap len (rec/tot): 185/ 185, tx: 35961, lsn:
A/D375A800, prev A/D37587E8, desc: INSERT off 5 flags 0x00, blkref #0:
rel 1663/99095/104041 blk 9
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D375A8C0, prev A/D375A800, desc: INSERT_LEAF off 70, blkref #0: rel
1663/99095/113582 blk 2
rmgr: Heap len (rec/tot): 59/ 8175, tx: 35961, lsn:
A/D375A908, prev A/D375A8C0, desc: LOCK off 46: xid 35961: flags 0x00
LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/99095/104336 blk 1555 FPW
rmgr: Heap len (rec/tot): 199/ 199, tx: 35961, lsn:
A/D375C910, prev A/D375A908, desc: INSERT off 6 flags 0x00, blkref #0:
rel 1663/99095/104041 blk 9
rmgr: Btree len (rec/tot): 72/ 72, tx: 35961, lsn:
A/D375C9D8, prev A/D375C910, desc: INSERT_LEAF off 71, blkref #0: rel
1663/99095/113582 blk 2

Show quoted text

On Wed, Apr 22, 2020 at 8:39 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-04-21 11:19:35 -0400, Alvaro Herrera wrote:

On 2020-Apr-21, Tom Lane wrote:

A variant of that theory is that foreign key trigger firings are being
skipped in one case but not the other; but offhand I think those
optimizations only apply to update/delete cases not inserts. Anyway
that still requires some assumptions about moving parts that you
haven't shown us.

I wonder if there are any funny interactions between trigger tuple
acquisition and the ON CONFLICT stuff. The only thing that occurs to me
to explain the fact that it only fails with the two stmts in the DO
block is that the second insert can see rows as inserted by the same
transaction.

I wonder if there's potentially some issue with the wrong snapshot being
used for the different statements...

I would start by taking a few dozen backtraces and comparing them to see
if any progress is being made.

It could also be informative to look at the walstream with pg_waldump
while the problem is occuring. Would tell us about row locks acquired
during on conflict / trigger handling etc.

The fact that this reproduces in 11.2 would seem to discard theories
about tuple table slot changes and table AM.

Phew ;)

Greetings,

Andres Freund

#8cbw
cbwhitebu@gmail.com
In reply to: cbw (#1)
Re: Backend stuck in tirigger.c:afterTriggerInvokeEvents forever

This turned out to be the result of outdated table statistics.

We were adding a large number of rows to a table then adding a large
number of rows to a table that referenced the first table. Because the
table statistics were so wrong we ended up with triggers, etc using
catastrophically bad execution plans.

The fix was to add an 'analyze first_table' statement after adding the
rows to the first table.

Many thanks to Christophe Pettus of PGExperts for solving this issue.

Show quoted text

On Mon, Apr 20, 2020 at 9:07 PM cbw <cbwhitebu@gmail.com> wrote:

I have a DO block that has a couple of inserts (copying large amounts
of data from staging tables) that I am passing to the server using
JDBC,

When I execute the inserts independently, they work fine. But when I
submit them as part of the do block, the backend goes into
trigger.c:afterTriggerInvokeEvents and never returns. This happens
after the second insert starts running.

I have tried this in version 11.7 and 12.2 (on both Linux and Windows).

I grabbed the source from git and did a Linux build (REL_12_STABLE)
and can see that the method in question just keeps looping through
'chunks' and 'events'. I disabled the user triggers on the table in
question and the 'events' seem to be various foreign key references
and other constraints.

I bumped the log level up to debug5 and can see rows in the first
table getting inserted. Sometimes I see a few rows for the second
table and sometimes I don't. After that there are no additional log
entries from the backend process. I have let the process run overnight
(the inserts take about 12 minutes to complete when run
independently).

Any tips about how to go about debugging this would be appreciated. I
am struggling to see exactly what the events are so maybe so
suggestions on the best place to add some logging?

Here is the DO block:

DO
$$
BEGIN
PERFORM XeP_set_identifier('xi_batch_user');

alter table xe_patient_visit disable trigger USER;
alter table xe_auth disable trigger USER;

RAISE NOTICE '% : inserting visits...', clock_timestamp();

INSERT INTO
xe_patient_visit
(
createtstamp,
creationuser,
modifiedtstamp,
modifieduser,
active,
visitid,
sourcesystem,
status,
visittypeid,
ipid,
accountid,
ivid
)
SELECT (now() AT TIME ZONE 'utc') AS createtstamp,
'xi_batch_user' AS creationuser,
(now() AT TIME ZONE 'utc') AS modifiedtstamp,
'xi_batch_user' AS modifieduser,
'y' AS active,
authorizationid AS visitid,
'staging' AS sourcesystem,
a.status,
'AUTH' AS visittypeid,
p.ipid,
e.accountid,
nextval('seq_xe_patient_visit') AS ivid
FROM (SELECT authorizationid,
memberid,
CASE
WHEN authorizationstatus = 'Fully Approved'
THEN 'AUTH_APPROVED'
WHEN authorizationstatus = 'Partially
Approved' THEN 'AUTH_REDUCED'
WHEN authorizationstatus = 'Voided' THEN
'AUTH_COMPLETED'
WHEN authorizationstatus = 'Incomplete' THEN
'AUTH_PEND'
WHEN authorizationstatus = 'Pending Decision'
THEN 'AUTH_PEND'
WHEN authorizationstatus = 'Denied' THEN
'AUTH_DENIED' END

AS status,
row_number() OVER (PARTITION BY authorizationid
ORDER BY authorizationid) AS rownum
FROM staging."authorization") a
JOIN xe_patient p ON p.patientid = a.memberid
JOIN xe_enterprise_data e ON e.accountid =
p.accountid AND e.enterpriseid = 'staging'
WHERE rownum = 1
ON CONFLICT (accountid, visitid)
DO NOTHING;

RAISE NOTICE '% : inserting auths...', clock_timestamp();
INSERT INTO
xe_auth
(
createtstamp,
creationuser,
modifiedtstamp,
modifieduser,
active,
accountid,
receiveddate,
authnum,
authtypeid,
authsubtypeid,
umurgencyid,
ivid
)
SELECT (now() AT TIME ZONE 'utc') AS createtstamp,
'xi_batch_user' AS creationuser,
(now() AT TIME ZONE 'utc') AS modifiedtstamp,
'xi_batch_user' AS modifieduser,
'y' AS active,
ed.accountid,
receiveddate,
authnum,
a.authtypeid,
at.authtypeid,
umurgencyid,
ivid
FROM (
SELECT cast(receiveddate AS timestamp) AS receiveddate,
authorizationid AS authnum,
CASE
WHEN authorizationcategory = 'Inpatient'
THEN 'AUTH_IPA'
ELSE 'AUTH_SVC' END AS authtypeid,
authorizationtype,
CASE
WHEN authorizationurgency = 'ROUTINE' THEN 'STD'
WHEN authorizationurgency = 'EXPEDITED' THEN 'EXP'
END
AS umurgencyid
FROM staging."authorization"
) a
JOIN xe_patient_visit v ON v.visitid = a.authnum
JOIN xe_enterprise_data ed ON ed.accountid =
v.accountid AND ed.enterpriseid = 'staging'
JOIN xe_auth_type at ON at.name = a.authorizationtype
ON CONFLICT (authnum, accountid)
DO NOTHING;

END
$$