BUG #12918: Segfault in BackendIdGetTransactionIds

Started by Vladimir Borodin · about 11 years ago · 12 messages · pgsql-bugs
#1Vladimir Borodin
root@simply.name

The following bug has been logged on the website:

Bug reference: 12918
Logged by: Vladimir
Email address: root@simply.name
PostgreSQL version: 9.4.1
Operating system: RHEL 6.6
Description:

Hello.

After upgrading from 9.3.6 to 9.4.1 (both installed from packages on
yum.postgresql.org) we have started getting segfaults of different backends.
Backtraces of all coredumps look similar:
(gdb) bt
#0 0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value
optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
#1 0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2 0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342
#3 0x00000000006f9d5a in pg_stat_get_db_numbackends (fcinfo=<value
optimized out>) at pgstatfuncs.c:1080
#4 0x000000000059c345 in ExecMakeFunctionResultNoSets (fcache=0x1f4c270,
econtext=0x1f4bbe0, isNull=0x1f5e588 "", isDone=<value optimized out>) at
execQual.c:2023
#5 0x00000000005981a3 in ExecTargetList (projInfo=<value optimized out>,
isDone=0x0) at execQual.c:5304
#6 ExecProject (projInfo=<value optimized out>, isDone=0x0) at
execQual.c:5519
#7 0x00000000005a458d in advance_aggregates (aggstate=0x1f4bdc0,
pergroup=0x1f5e380) at nodeAgg.c:556
#8 0x00000000005a4da5 in agg_retrieve_direct (node=<value optimized out>)
at nodeAgg.c:1223
#9 ExecAgg (node=<value optimized out>) at nodeAgg.c:1115
#10 0x0000000000597638 in ExecProcNode (node=0x1f4bdc0) at
execProcnode.c:476
#11 0x0000000000596252 in ExecutePlan (queryDesc=0x1eae6d0, direction=<value
optimized out>, count=0) at execMain.c:1486
#12 standard_ExecutorRun (queryDesc=0x1eae6d0, direction=<value optimized
out>, count=0) at execMain.c:319
#13 0x0000000000686797 in PortalRunSelect (portal=0x1ea5660, forward=<value
optimized out>, count=0, dest=<value optimized out>) at pquery.c:946
#14 0x00000000006879c1 in PortalRun (portal=0x1ea5660,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x1f5a528,
altdest=0x1f5a528, completionTag=0x7fff277b3b80 "") at pquery.c:790
#15 0x000000000068404e in exec_simple_query (query_string=0x1e989d0 "SELECT
sum(numbackends) FROM pg_stat_database;") at postgres.c:1072
#16 0x00000000006856c8 in PostgresMain (argc=<value optimized out>,
argv=<value optimized out>, dbname=0x1e7f398 "postgres", username=<value
optimized out>) at postgres.c:4074
#17 0x0000000000632d7d in BackendRun (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:4155
#18 BackendStartup (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:3829
#19 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at
postmaster.c:1597
#20 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1244
#21 0x00000000005cadb8 in main (argc=3, argv=0x1e7e5e0) at main.c:228
(gdb)

Unfortunately, I can't give a clear sequence of steps to reproduce the
problem; the segfaults happen at quite random times and under random
workloads :( So I'm trying to reproduce it on a test stand where PostgreSQL
is built with the --enable-debug flag to give you more information (but
still no luck for the last two weeks).

The common conditions are:
1. it happens only on master hosts (never on any of the streaming
replicas),
2. it happens on simple queries to pg_catalog or system views as shown in
the backtrace above,
3. it happens only with direct connecting to PostgreSQL
(production-queries go through pgbouncer and no coredumps contain production
queries). And till now it happened only with python-psycopg2 (we have tried
versions 2.5.3-1.rhel6 with postgresql93-libs, 2.5.4-1.rhel6 and 2.6-1.rhel6
with postgresql94-libs). I've asked about it on the psycopg list [0] but it
doesn't seem to be a client problem.

[0]: /messages/by-id/CA+mi_8a246TK6YBLzf_7c5sc+XuiMaGafG0mhrFbp6Nq+SQt3w@mail.gmail.com

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vladimir Borodin (#1)
Re: BUG #12918: Segfault in BackendIdGetTransactionIds

root@simply.name writes:

After upgrading from 9.3.6 to 9.4.1 (both installed from packages on
yum.postgresql.org) we have started getting segfaults of different backends.
Backtraces of all coredumps look similar:
(gdb) bt
#0 0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value
optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
#1 0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2 0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342

Hmm ... looks to me like BackendIdGetTransactionIds is simply busted.
It supposes that there are no inactive entries in the sinval array
within the range 0 .. lastBackend. But there can be, in which case
dereferencing stateP->proc crashes. The reason it's hard to reproduce
is the relatively narrow window between where pgstat_read_current_status
saw the backend as active and where we're inspecting its sinval entry.
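
Tom's diagnosis can be illustrated outside PostgreSQL. The following is a hypothetical, self-contained C model, not actual server code (struct and field names are loosely borrowed from sinvaladt.c): an inactive sinval slot carries a NULL proc pointer, the unpatched lookup dereferences it unconditionally, and a NULL check is enough to make the lookup safe:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PostgreSQL structs involved. */
typedef unsigned int TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

typedef struct PGPROC    { int pgprocno; } PGPROC;
typedef struct PGXACT    { TransactionId xid; TransactionId xmin; } PGXACT;
typedef struct ProcState { PGPROC *proc; } ProcState;

/*
 * Guarded lookup, modeling the shape a fix would take.  An inactive
 * sinval slot has proc == NULL, so it must be checked before
 * dereferencing; 9.4.0/9.4.1 dereferenced it unconditionally.
 */
static void
get_transaction_ids(const ProcState *procState, const PGXACT *allPgXact,
                    int lastBackend, int backendID,
                    TransactionId *xid, TransactionId *xmin)
{
    *xid = InvalidTransactionId;
    *xmin = InvalidTransactionId;

    if (backendID > 0 && backendID <= lastBackend)
    {
        PGPROC *proc = procState[backendID - 1].proc;

        if (proc != NULL)       /* the check missing in the crash */
        {
            const PGXACT *xact = &allPgXact[proc->pgprocno];

            *xid = xact->xid;
            *xmin = xact->xmin;
        }
    }
}
```

In the real server the window is the one described above: pgstat_read_current_status observes the backend as active, the backend then exits and clears its sinval slot, and only afterwards is the entry inspected.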

regards, tom lane


#3Vladimir Borodin
root@simply.name
In reply to: Tom Lane (#2)
Re: BUG #12918: Segfault in BackendIdGetTransactionIds

On 30 March 2015, at 19:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

root@simply.name writes:

After upgrading from 9.3.6 to 9.4.1 (both installed from packages on
yum.postgresql.org) we have started getting segfaults of different backends.
Backtraces of all coredumps look similar:
(gdb) bt
#0 0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value
optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
#1 0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2 0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342

Hmm ... looks to me like BackendIdGetTransactionIds is simply busted.
It supposes that there are no inactive entries in the sinval array
within the range 0 .. lastBackend. But there can be, in which case
dereferencing stateP->proc crashes. The reason it's hard to reproduce
is the relatively narrow window between where pgstat_read_current_status
saw the backend as active and where we're inspecting its sinval entry.

I’ve also tried to revert dd1a3bcc, where this function appeared, but couldn’t do it :( If you would be able to make a build without this commit (if that is easier than fixing it the right way), I could install it on several production hosts to test it.

regards, tom lane

--
May the force be with you…
https://simply.name

#4Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#2)
Re: BUG #12918: Segfault in BackendIdGetTransactionIds

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

root@simply.name writes:

After upgrading from 9.3.6 to 9.4.1 (both installed from packages on
yum.postgresql.org) we have started getting segfaults of different backends.
Backtraces of all coredumps look similar:
(gdb) bt
#0 0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value
optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
#1 0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2 0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342

Hmm ... looks to me like BackendIdGetTransactionIds is simply busted.
It supposes that there are no inactive entries in the sinval array
within the range 0 .. lastBackend. But there can be, in which case
dereferencing stateP->proc crashes. The reason it's hard to reproduce
is the relatively narrow window between where pgstat_read_current_status
saw the backend as active and where we're inspecting its sinval entry.

As an immediate short-term workaround, from what I can tell, disabling
calls to pg_stat_activity and pg_stat_database (views), and to
pg_stat_get_activity, pg_stat_get_backend_idset, and
pg_stat_get_db_numbackends (functions), should prevent triggering this
bug.

These are likely being run by a monitoring system (eg: check_postgres
from Nagios).

Thanks!

Stephen

#5Vladimir Borodin
root@simply.name
In reply to: Stephen Frost (#4)
Re: BUG #12918: Segfault in BackendIdGetTransactionIds

On 30 March 2015, at 19:44, Stephen Frost <sfrost@snowman.net> wrote:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

root@simply.name writes:

After upgrading from 9.3.6 to 9.4.1 (both installed from packages on
yum.postgresql.org) we have started getting segfaults of different backends.
Backtraces of all coredumps look similar:
(gdb) bt
#0 0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value
optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
#1 0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2 0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342

Hmm ... looks to me like BackendIdGetTransactionIds is simply busted.
It supposes that there are no inactive entries in the sinval array
within the range 0 .. lastBackend. But there can be, in which case
dereferencing stateP->proc crashes. The reason it's hard to reproduce
is the relatively narrow window between where pgstat_read_current_status
saw the backend as active and where we're inspecting its sinval entry.

As an immediate short-term workaround, from what I can tell,
disabling calls to pg_stat_activity, and pg_stat_database (views), and
pg_stat_get_activity, pg_stat_get_backend_idset, and
pg_stat_get_db_numbackends (functions) should prevent triggering this
bug.

I suppose pg_stat_replication should not be queried either. We have already done that on our most critical databases, but it is hard to be blind :(

These are likely being run by a monitoring system (eg: check_postgres
from Nagios).

Thanks!

Stephen

--
May the force be with you…
https://simply.name

#6Stephen Frost
sfrost@snowman.net
In reply to: Vladimir Borodin (#5)
Re: BUG #12918: Segfault in BackendIdGetTransactionIds

* Vladimir Borodin (root@simply.name) wrote:

On 30 March 2015, at 19:44, Stephen Frost <sfrost@snowman.net> wrote:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:

root@simply.name writes:

After upgrading from 9.3.6 to 9.4.1 (both installed from packages on
yum.postgresql.org) we have started getting segfaults of different backends.
Backtraces of all coredumps look similar:
(gdb) bt
#0 0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value
optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
#1 0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2 0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342

Hmm ... looks to me like BackendIdGetTransactionIds is simply busted.
It supposes that there are no inactive entries in the sinval array
within the range 0 .. lastBackend. But there can be, in which case
dereferencing stateP->proc crashes. The reason it's hard to reproduce
is the relatively narrow window between where pgstat_read_current_status
saw the backend as active and where we're inspecting its sinval entry.

As an immediate short-term workaround, from what I can tell,
disabling calls to pg_stat_activity, and pg_stat_database (views), and
pg_stat_get_activity, pg_stat_get_backend_idset, and
pg_stat_get_db_numbackends (functions) should prevent triggering this
bug.

I suppose, pg_stat_replication should not be asked too. We have already done that on most critical databases but it is hard to be blind :(

Ah, yes, not sure where I dropped that; it was in my initial list but
didn't make it into the final email.

I would expect a fix to be included in the next point release, hopefully
released in the next couple of months.

Thanks!

Stephen

#7Stephen Frost
sfrost@snowman.net
In reply to: Vladimir Borodin (#3)
Re: BUG #12918: Segfault in BackendIdGetTransactionIds

* Vladimir Borodin (root@simply.name) wrote:

On 30 March 2015, at 19:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

root@simply.name writes:

After upgrading from 9.3.6 to 9.4.1 (both installed from packages on
yum.postgresql.org) we have started getting segfaults of different backends.
Backtraces of all coredumps look similar:
(gdb) bt
#0 0x000000000066bf9b in BackendIdGetTransactionIds (backendID=<value
optimized out>, xid=0x7f2a1b714798, xmin=0x7f2a1b71479c) at sinvaladt.c:426
#1 0x00000000006287f4 in pgstat_read_current_status () at pgstat.c:2871
#2 0x0000000000628879 in pgstat_fetch_stat_numbackends () at pgstat.c:2342

Hmm ... looks to me like BackendIdGetTransactionIds is simply busted.
It supposes that there are no inactive entries in the sinval array
within the range 0 .. lastBackend. But there can be, in which case
dereferencing stateP->proc crashes. The reason it's hard to reproduce
is the relatively narrow window between where pgstat_read_current_status
saw the backend as active and where we're inspecting its sinval entry.

I’ve also tried to revert dd1a3bcc where this function appeared but couldn’t do it :( If you would be able to make a build without this commit (if it is easier than fix it in right way), I could install it on several production hosts to test it.

Hopefully a fix will be forthcoming shortly. Reverting it won't work
though, no, as it included a catalog bump.

Thanks,

Stephen

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vladimir Borodin (#3)
Re: BUG #12918: Segfault in BackendIdGetTransactionIds

Vladimir Borodin <root@simply.name> writes:

I’ve also tried to revert dd1a3bcc where this function appeared but couldn’t do it :( If you would be able to make a build without this commit (if it is easier than fix it in right way), I could install it on several production hosts to test it.

Try this.

regards, tom lane

Attachments:

BackendIdGetTransactionIds-crash.patch (text/x-diff; charset=us-ascii) +13 −15
#9Vladimir Borodin
root@simply.name
In reply to: Tom Lane (#8)
Re: BUG #12918: Segfault in BackendIdGetTransactionIds

On 30 March 2015, at 20:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Vladimir Borodin <root@simply.name> writes:

I’ve also tried to revert dd1a3bcc where this function appeared but couldn’t do it :( If you would be able to make a build without this commit (if it is easier than fix it in right way), I could install it on several production hosts to test it.

Try this.

38 minutes from a bug report to a patch with a fix! You are fantastic. Thanks.

It compiles and passes 'make check' and 'make check-world' (I assume you have checked that, but just in case...). I've built a package and installed it on one host. If everything is OK, tomorrow I will install it on several hosts and slowly roll it out further. The problem reproduces across our hosts approximately once a week, so if it disappears I will let you know in a couple of weeks.

Thanks again.

regards, tom lane

diff --git a/src/backend/storage/ipc/sinvaladt.c b/src/backend/storage/ipc/sinvaladt.c
index 81b85c0..a2fde89 100644
*** a/src/backend/storage/ipc/sinvaladt.c
--- b/src/backend/storage/ipc/sinvaladt.c
*************** BackendIdGetProc(int backendID)
*** 403,411 ****
  void
  BackendIdGetTransactionIds(int backendID, TransactionId *xid, TransactionId *xmin)
  {
- 	ProcState  *stateP;
  	SISeg	   *segP = shmInvalBuffer;
- 	PGXACT	   *xact;
  
  	*xid = InvalidTransactionId;
  	*xmin = InvalidTransactionId;
--- 403,409 ----
*************** BackendIdGetTransactionIds(int backendID
*** 415,425 ****
  
  	if (backendID > 0 && backendID <= segP->lastBackend)
  	{
! 		stateP = &segP->procState[backendID - 1];
! 		xact = &ProcGlobal->allPgXact[stateP->proc->pgprocno];
  
! 		*xid = xact->xid;
! 		*xmin = xact->xmin;
  	}
  
  	LWLockRelease(SInvalWriteLock);
--- 413,428 ----
  
  	if (backendID > 0 && backendID <= segP->lastBackend)
  	{
! 		ProcState  *stateP = &segP->procState[backendID - 1];
! 		PGPROC	   *proc = stateP->proc;
  
! 		if (proc != NULL)
! 		{
! 			PGXACT	   *xact = &ProcGlobal->allPgXact[proc->pgprocno];
! 
! 			*xid = xact->xid;
! 			*xmin = xact->xmin;
! 		}
  	}
  
  	LWLockRelease(SInvalWriteLock);
--
May the force be with you…
https://simply.name

#10daveg
daveg@sonic.net
In reply to: Tom Lane (#8)
Re: BUG #12918: Segfault in BackendIdGetTransactionIds

On Mon, 30 Mar 2015 13:00:01 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Vladimir Borodin <root@simply.name> writes:

I’ve also tried to revert dd1a3bcc where this function appeared but couldn’t do it :( If you would be able to make a build without this commit (if it is easier than fix it in right way), I could install it on several production hosts to test it.

Try this.

Nice to see a patch, in advance of need ;-) Thanks!

We have had a couple of segfaults recently, but once we enabled core files it
stopped happening. Until just now. I can build with the
patch, but if a 9.4.2 is imminent it would be nice to know before
scheduling an extra round of downtimes.

This is apparently from a python trigger calling get_app_name(). I
can provide the rest of the stack if it would be useful.

Program terminated with signal 11, Segmentation fault.
#0 0x000000000066148b in BackendIdGetTransactionIds (backendID=<value optimized out>, xid=0x7f5d56ae1598, xmin=0x7f5d56ae159c)
at sinvaladt.c:426
426 sinvaladt.c: No such file or directory.
in sinvaladt.c
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.5.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x000000000066148b in BackendIdGetTransactionIds (backendID=<value optimized out>, xid=0x7f5d56ae1598, xmin=0x7f5d56ae159c)
at sinvaladt.c:426
#1 0x000000000061f064 in pgstat_read_current_status () at pgstat.c:2871
#2 0x000000000061f0e9 in pgstat_fetch_stat_numbackends () at pgstat.c:2342
#3 0x00000000006ef373 in pg_stat_get_activity (fcinfo=0x7fffd2e78f50) at pgstatfuncs.c:591
#4 0x00000000005977ec in ExecMakeTableFunctionResult (funcexpr=0x17fdae0, econtext=0x17fd770, argContext=<value optimized out>,
expectedDesc=0x17ffd70, randomAccess=0 '\000') at execQual.c:2193
#5 0x00000000005a91f2 in FunctionNext (node=0x17fd660) at nodeFunctionscan.c:95
#6 0x00000000005982ce in ExecScanFetch (node=0x17fd660, accessMtd=0x5a8f40 <FunctionNext>, recheckMtd=0x5a8870 <FunctionRecheck>)
at execScan.c:82
#7 ExecScan (node=0x17fd660, accessMtd=0x5a8f40 <FunctionNext>, recheckMtd=0x5a8870 <FunctionRecheck>) at execScan.c:167
#8 0x00000000005913c8 in ExecProcNode (node=0x17fd660) at execProcnode.c:426
#9 0x000000000058ff32 in ExecutePlan (queryDesc=0x17f81f0, direction=<value optimized out>, count=1) at execMain.c:1486
#10 standard_ExecutorRun (queryDesc=0x17f81f0, direction=<value optimized out>, count=1) at execMain.c:319
#11 0x00007f69a7d3867b in explain_ExecutorRun (queryDesc=0x17f81f0, direction=ForwardScanDirection, count=1) at auto_explain.c:243
#12 0x00007f69a7b33965 in pgss_ExecutorRun (queryDesc=0x17f81f0, direction=ForwardScanDirection, count=1)
at pg_stat_statements.c:873
#13 0x000000000059bd6c in postquel_getnext (fcinfo=<value optimized out>) at functions.c:853
#14 fmgr_sql (fcinfo=<value optimized out>) at functions.c:1148
#15 0x0000000000595f85 in ExecMakeFunctionResultNoSets (fcache=0x17ed920, econtext=0x17ed730, isNull=0x17ee2a8 " ",
isDone=<value optimized out>) at execQual.c:2023
#16 0x0000000000591e53 in ExecTargetList (projInfo=<value optimized out>, isDone=0x7fffd2e798fc) at execQual.c:5304
#17 ExecProject (projInfo=<value optimized out>, isDone=0x7fffd2e798fc) at execQual.c:5519
#18 0x00000000005a98fb in ExecResult (node=0x17ed620) at nodeResult.c:155
#19 0x0000000000591478 in ExecProcNode (node=0x17ed620) at execProcnode.c:373
#20 0x000000000058ff32 in ExecutePlan (queryDesc=0x166c610, direction=<value optimized out>, count=0) at execMain.c:1486
#21 standard_ExecutorRun (queryDesc=0x166c610, direction=<value optimized out>, count=0) at execMain.c:319
#22 0x00007f69a7d3867b in explain_ExecutorRun (queryDesc=0x166c610, direction=ForwardScanDirection, count=0) at auto_explain.c:243
#23 0x00007f69a7b33965 in pgss_ExecutorRun (queryDesc=0x166c610, direction=ForwardScanDirection, count=0)
at pg_stat_statements.c:873
#24 0x00000000005b39d0 in _SPI_pquery (plan=0x7fffd2e79d10, paramLI=0x0, snapshot=<value optimized out>, crosscheck_snapshot=0x0,
read_only=0 '\000', fire_triggers=1 '\001', tcount=0) at spi.c:2372
#25 _SPI_execute_plan (plan=0x7fffd2e79d10, paramLI=0x0, snapshot=<value optimized out>, crosscheck_snapshot=0x0,
read_only=0 '\000', fire_triggers=1 '\001', tcount=0) at spi.c:2160
#26 0x00000000005b4076 in SPI_execute (src=0x15f6054 "SELECT get_app_name() AS a", read_only=0 '\000', tcount=0) at spi.c:386
#27 0x00007f5d5672f702 in PLy_spi_execute_query (query=0x15f6054 "SELECT get_app_name() AS a", limit=0) at plpy_spi.c:357

-dg

--
David Gould 510 282 0869 daveg@sonic.net
If simplicity worked, the world would be overrun with insects.


#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: daveg (#10)
Re: BUG #12918: Segfault in BackendIdGetTransactionIds

David Gould <daveg@sonic.net> writes:

We have had a couple segfaults recently but once we enabled core files it
stopped happening. Until just now. I can build with the
patch, but if a 9.4.2 is immanent it would be nice to know before
scheduling an extra round of downtimes.

No plans for an imminent 9.4.2. There's been some discussion about a set
of releases in May; the only way something happens sooner than that is
if we find a staggeringly-bad bug.

regards, tom lane


#12Vladimir Borodin
root@simply.name
In reply to: Vladimir Borodin (#9)
Re: BUG #12918: Segfault in BackendIdGetTransactionIds

On 30 March 2015, at 20:54, Vladimir Borodin <root@simply.name> wrote:

On 30 March 2015, at 20:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Vladimir Borodin <root@simply.name> writes:

I’ve also tried to revert dd1a3bcc where this function appeared but couldn’t do it :( If you would be able to make a build without this commit (if it is easier than fix it in right way), I could install it on several production hosts to test it.

Try this.

38 minutes from a bug report to the patch with a fix! You are fantastic. Thanks.

It compiles, passes 'make check' and 'make check-world’ (I think, you have checked it but just in case...). I’ve built a package and installed it on one host. If everything would be ok, tomorrow I will install it on several hosts and slowly farther. The problem reproduces on our number of hosts approximately once a week. If the problem disappears I will let you know in a couple of weeks.

No segfaults for more than a week since I upgraded all hosts. It seems the patch is good. Thank you very much.

Thanks again.

regards, tom lane

diff --git a/src/backend/storage/ipc/sinvaladt.c b/src/backend/storage/ipc/sinvaladt.c
index 81b85c0..a2fde89 100644
*** a/src/backend/storage/ipc/sinvaladt.c
--- b/src/backend/storage/ipc/sinvaladt.c
*************** BackendIdGetProc(int backendID)
*** 403,411 ****
  void
  BackendIdGetTransactionIds(int backendID, TransactionId *xid, TransactionId *xmin)
  {
- 	ProcState  *stateP;
  	SISeg	   *segP = shmInvalBuffer;
- 	PGXACT	   *xact;
  
  	*xid = InvalidTransactionId;
  	*xmin = InvalidTransactionId;
--- 403,409 ----
*************** BackendIdGetTransactionIds(int backendID
*** 415,425 ****
  
  	if (backendID > 0 && backendID <= segP->lastBackend)
  	{
! 		stateP = &segP->procState[backendID - 1];
! 		xact = &ProcGlobal->allPgXact[stateP->proc->pgprocno];
  
! 		*xid = xact->xid;
! 		*xmin = xact->xmin;
  	}
  
  	LWLockRelease(SInvalWriteLock);
--- 413,428 ----
  
  	if (backendID > 0 && backendID <= segP->lastBackend)
  	{
! 		ProcState  *stateP = &segP->procState[backendID - 1];
! 		PGPROC	   *proc = stateP->proc;
  
! 		if (proc != NULL)
! 		{
! 			PGXACT	   *xact = &ProcGlobal->allPgXact[proc->pgprocno];
! 
! 			*xid = xact->xid;
! 			*xmin = xact->xmin;
! 		}
  	}
  
  	LWLockRelease(SInvalWriteLock);

--
May the force be with you…
https://simply.name

--
May the force be with you…
https://simply.name