atomic pin/unpin causing errors

Started by Jeff Janes | almost 10 years ago | 53 messages | hackers
#1Jeff Janes
jeff.janes@gmail.com

I've bisected the errors I was seeing, discussed in
/messages/by-id/CAMkU=1xQEhC0Ok4d+tkjFQ1nvUhO37PYRKhJP6Q8oxifMx7OwA@mail.gmail.com

It looks like they first appear in:

commit 48354581a49c30f5757c203415aa8412d85b0f70
Author: Andres Freund <andres@anarazel.de>
Date: Sun Apr 10 20:12:32 2016 -0700

Allow Pin/UnpinBuffer to operate in a lockfree manner.

I get the errors:

ERROR: attempted to delete invisible tuple
STATEMENT: update foo set count=count+1,text_array=$1 where text_array @> $2

And also:

ERROR: unexpected chunk number 1 (expected 2) for toast value
85223889 in pg_toast_16424
STATEMENT: update foo set count=count+1 where text_array @> $1
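
The second message comes from a consistency check made while a TOASTed value is reassembled: chunks are read in order, and each chunk's sequence number must match the next expected one. The following is only a minimal sketch of that kind of check, not PostgreSQL's actual code; `first_bad_chunk` and its array-based interface are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical check: chunk_seq[] holds the sequence numbers in the order
 * the chunks were fetched. Returns the position of the first chunk whose
 * number differs from the expected 0, 1, 2, ... order, or -1 if all match. */
int
first_bad_chunk(const int *chunk_seq, size_t nchunks)
{
    for (size_t i = 0; i < nchunks; i++)
    {
        if (chunk_seq[i] != (int) i)
            return (int) i;     /* "unexpected chunk number X (expected i)" */
    }
    return -1;
}
```

With the fetched sequence 0, 1, 1, 2 the function returns 2: chunk number 1 was seen where 2 was expected, which has the same shape as the error reported above.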

Once these errors start occurring, they happen often. Usually the
"attempted to delete invisible tuple" happens first.

These errors show up after about 9 hours of run time. The timing is
predictable enough that I don't think it is a purely stochastic race
condition. It seems like some counter variable is overflowing. But
it is not the ShmemVariableCache->nextXid counter, as I previously
speculated. This test does not advance that fast enough for it to
wrap around within 9 hours of run time. But I am at a loss of what
other variable it might be. Since the system goes through a crash and
recovery every few seconds, any backend-local counters or
shared-memory counters would get reset upon recovery. Right?
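
As a rough sanity check of the overflow hypothesis, the arithmetic below estimates how fast a counter would have to advance to wrap within the observed window. It is only a sketch: the 9-hour window comes from the report above, the counter widths are examples, and the helper name `increments_per_second_to_wrap` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: increments per second needed for a counter with
 * `bits` bits (1..63) to wrap once within `seconds` seconds. */
uint64_t
increments_per_second_to_wrap(unsigned bits, uint64_t seconds)
{
    uint64_t space = (uint64_t) 1 << bits;  /* number of distinct values */

    return space / seconds;                 /* integer division, rounds down */
}
```

For a 32-bit counter and 9 hours (32400 seconds) this gives about 132,560 increments per second, while a 16-bit counter would need only about 2 per second, so a narrow counter is a much more plausible candidate than a full 32-bit one.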

I think the invisible tuple referred to might be a tuple in the toast
table, not in the parent table.

I don't see the problem with a cassert-enabled build, probably because it
is just too slow to ever reach the point where the problem occurs.

Any suggestions about where or how to look? I don't know if the
"attempted to delete invisible tuple" is the bug itself, or is just
tripping over corruption left behind by someone else.

(This was all run using Teodor's test-enabling patch
gin_alone_cleanup-4.patch, so as not to change horses in midstream.
Now that a version of that patch has been committed, I will try to
repeat this in HEAD)

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#1)
Re: atomic pin/unpin causing errors

Hi,

On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:

I've bisected the errors I was seeing, discussed in
/messages/by-id/CAMkU=1xQEhC0Ok4d+tkjFQ1nvUhO37PYRKhJP6Q8oxifMx7OwA@mail.gmail.com

It looks like they first appear in:

commit 48354581a49c30f5757c203415aa8412d85b0f70
Author: Andres Freund <andres@anarazel.de>
Date: Sun Apr 10 20:12:32 2016 -0700

Allow Pin/UnpinBuffer to operate in a lockfree manner.

I get the errors:

ERROR: attempted to delete invisible tuple
STATEMENT: update foo set count=count+1,text_array=$1 where text_array @> $2

And also:

ERROR: unexpected chunk number 1 (expected 2) for toast value
85223889 in pg_toast_16424
STATEMENT: update foo set count=count+1 where text_array @> $1

Once these errors start occurring, they happen often. Usually the
"attempted to delete invisible tuple" happens first.

That kind of seems to implicate clog/vacuuming or something like that
being involved.

These errors show up after about 9 hours of run time. The timing is
predictable enough that I don't think it is a purely stochastic race
condition.

Hm. I've a bit of a hard time believing that such a timing could be
caused by the above patch. How confident are you that it's that patch, and not
just changed timing due to performance changes? And you definitely can
only reproduce the problem with the regular crash cycles?

It seems like some counter variable is overflowing. But
it is not the ShmemVariableCache->nextXid counter, as I previously
speculated. This test does not advance that fast enough for it to
wrap around within 9 hours of run time. But I am at a loss of what
other variable it might be. Since the system goes through a crash and
recovery every few seconds, any backend-local counters or
shared-memory counters would get reset upon recovery. Right?

A lot of those counters will be re-set based on WAL contents. So if
they're corrupted once, several of them are prone to continue to be
corrupted.

Greetings,

Andres Freund


#3Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#1)
Re: atomic pin/unpin causing errors

Hi Jeff,

On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:

I've bisected the errors I was seeing, discussed in
/messages/by-id/CAMkU=1xQEhC0Ok4d+tkjFQ1nvUhO37PYRKhJP6Q8oxifMx7OwA@mail.gmail.com

It looks like they first appear in:

commit 48354581a49c30f5757c203415aa8412d85b0f70
Author: Andres Freund <andres@anarazel.de>
Date: Sun Apr 10 20:12:32 2016 -0700

Allow Pin/UnpinBuffer to operate in a lockfree manner.

I get the errors:

ERROR: attempted to delete invisible tuple
STATEMENT: update foo set count=count+1,text_array=$1 where text_array @> $2

And also:

ERROR: unexpected chunk number 1 (expected 2) for toast value
85223889 in pg_toast_16424
STATEMENT: update foo set count=count+1 where text_array @> $1

Hm. I appear to have trouble reproducing this issue (continuing to try)
on master as of 8826d8507. Is there any chance you could package up a
data directory after the issue hit?

(This was all run using Teodor's test-enabling patch
gin_alone_cleanup-4.patch, so as not to change horses in midstream.
Now that a version of that patch has been committed, I will try to
repeat this in HEAD)

Any news on that front?

Greetings,

Andres Freund


#4Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andres Freund (#3)
Re: atomic pin/unpin causing errors

On Wed, May 4, 2016 at 2:05 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:

I've bisected the errors I was seeing, discussed in

/messages/by-id/CAMkU=1xQEhC0Ok4d+tkjFQ1nvUhO37PYRKhJP6Q8oxifMx7OwA@mail.gmail.com

It looks like they first appear in:

commit 48354581a49c30f5757c203415aa8412d85b0f70
Author: Andres Freund <andres@anarazel.de>
Date: Sun Apr 10 20:12:32 2016 -0700

Allow Pin/UnpinBuffer to operate in a lockfree manner.

I get the errors:

ERROR: attempted to delete invisible tuple
STATEMENT: update foo set count=count+1,text_array=$1 where text_array @> $2

And also:

ERROR: unexpected chunk number 1 (expected 2) for toast value
85223889 in pg_toast_16424
STATEMENT: update foo set count=count+1 where text_array @> $1

Hm. I appear to have trouble reproducing this issue (continuing to try)
on master as of 8826d8507. Is there any chance you could package up a
data directory after the issue hit?

FWIW, I'm also trying to reproduce it on a big x86 machine on 9888b34f.
I'll write about the results when I get any.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#5Teodor Sigaev
teodor@sigaev.ru
In reply to: Andres Freund (#3)
Re: atomic pin/unpin causing errors

I get the errors:

ERROR: attempted to delete invisible tuple
STATEMENT: update foo set count=count+1,text_array=$1 where text_array @> $2

And also:

ERROR: unexpected chunk number 1 (expected 2) for toast value
85223889 in pg_toast_16424
STATEMENT: update foo set count=count+1 where text_array @> $1

Hm. I appear to have trouble reproducing this issue (continuing to try)
on master as of 8826d8507. Is there any chance you could package up a
data directory after the issue hit?

I've got
ERROR: unexpected chunk number 0 (expected 1) for toast value 10192986 in
pg_toast_16424

The test required 10 hours to run on my notebook. postgresql was compiled with
-O0 --enable-debug --enable-cassert.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/


#6Andres Freund
andres@anarazel.de
In reply to: Teodor Sigaev (#5)
Re: atomic pin/unpin causing errors

On 2016-05-04 18:12:45 +0300, Teodor Sigaev wrote:

I get the errors:

ERROR: attempted to delete invisible tuple
STATEMENT: update foo set count=count+1,text_array=$1 where text_array @> $2

And also:

ERROR: unexpected chunk number 1 (expected 2) for toast value
85223889 in pg_toast_16424
STATEMENT: update foo set count=count+1 where text_array @> $1

Hm. I appear to have trouble reproducing this issue (continuing to try)
on master as of 8826d8507. Is there any chance you could package up a
data directory after the issue hit?

I've got
ERROR: unexpected chunk number 0 (expected 1) for toast value 10192986 in
pg_toast_16424

The test required 10 hours to run on my notebook. postgresql was compiled
with -O0 --enable-debug --enable-cassert.

Interesting. I just ran a test for a good bit longer, till it failed due
to a nearing wraparound, presumably because the crashes are too
frequent to finish vacuuming.

I did however, because Jeff said he couldn't reproduce with cassert, use
an optimized build. Wonder if there's some barrier related issue,
making this dependent on the compiler's exact code generation. That'd
explain why different people can reproduce it in different
circumstances.
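
To make the barrier concern concrete: a lock-free pin is typically a compare-and-swap loop on a single state word, and its correctness depends on the memory ordering attached to that atomic operation. The sketch below uses C11 atomics and an invented state layout (`REFCOUNT_MASK`, `FLAG_LOCKED`, `DemoBufferDesc`); it is not the code from commit 48354581a, only an illustration of the general technique.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Invented layout: low 18 bits hold the pin count, one flag bit above. */
#define REFCOUNT_ONE   ((uint32_t) 1)
#define REFCOUNT_MASK  ((uint32_t) 0x3FFFF)
#define FLAG_LOCKED    ((uint32_t) 1 << 22)

typedef struct
{
    _Atomic uint32_t state;
} DemoBufferDesc;

/* Pin without taking a lock; returns false if the header is locked. */
bool
demo_pin(DemoBufferDesc *buf)
{
    uint32_t old = atomic_load_explicit(&buf->state, memory_order_relaxed);

    for (;;)
    {
        if (old & FLAG_LOCKED)
            return false;       /* a real implementation would wait and retry */

        /* On failure, `old` is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&buf->state, &old,
                                                  old + REFCOUNT_ONE,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
            return true;
    }
}

/* Unpin; release ordering keeps earlier page accesses before the unpin. */
void
demo_unpin(DemoBufferDesc *buf)
{
    atomic_fetch_sub_explicit(&buf->state, REFCOUNT_ONE, memory_order_release);
}

/* Current pin count, masked out of the state word. */
uint32_t
demo_refcount(DemoBufferDesc *buf)
{
    return atomic_load_explicit(&buf->state, memory_order_acquire) & REFCOUNT_MASK;
}
```

If the orderings here were weakened, or if a hand-written atomics layer emitted a weaker barrier on some compiler, the failure would plausibly show up only with particular code generation, which is the kind of dependence suspected above.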

Any chance you could package up that data directory for me to download?

Andres


#7Jeff Janes
jeff.janes@gmail.com
In reply to: Andres Freund (#3)
Re: atomic pin/unpin causing errors

On Tue, May 3, 2016 at 4:05 PM, Andres Freund <andres@anarazel.de> wrote:

Hi Jeff,

On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:

I've bisected the errors I was seeing, discussed in
/messages/by-id/CAMkU=1xQEhC0Ok4d+tkjFQ1nvUhO37PYRKhJP6Q8oxifMx7OwA@mail.gmail.com

It looks like they first appear in:

commit 48354581a49c30f5757c203415aa8412d85b0f70
Author: Andres Freund <andres@anarazel.de>
Date: Sun Apr 10 20:12:32 2016 -0700

Allow Pin/UnpinBuffer to operate in a lockfree manner.

I get the errors:

ERROR: attempted to delete invisible tuple
STATEMENT: update foo set count=count+1,text_array=$1 where text_array @> $2

And also:

ERROR: unexpected chunk number 1 (expected 2) for toast value
85223889 in pg_toast_16424
STATEMENT: update foo set count=count+1 where text_array @> $1

Hm. I appear to have trouble reproducing this issue (continuing to try)
on master as of 8826d8507. Is there any chance you could package up a
data directory after the issue hit?

I'll look into it. I haven't been saving them, as they are very large
(tens of GB) by the time the errors happen. In case I can't find a
way to transfer that much data, is there something I could do in situ
to debug it?

(This was all run using Teodor's test-enabling patch
gin_alone_cleanup-4.patch, so as not to change horses in midstream.
Now that a version of that patch has been committed, I will try to
repeat this in HEAD)

Any news on that front?

I couldn't reproduce it in 82881b2b432c9433b45a (which is what HEAD
was at the time).

The last commit I saw the problem in was 8f1911d5e6d5, and in that
commit it took longer than usual to see the error, and I never saw it at
all in one run (which led me down the wrong path in git bisect) but
then saw errors upon another try. Up until that commit, it seemed to
give the errors like clockwork, always after 8 to 10 hours of running.

I also have never seen the errors with the crashing turned off. I
even tried it with autovac off and
autovacuum_freeze_max_age=1500000000 (to emulate the way vacuum never
gets a chance to run to completion in the crashing mode) and then I
don't get any errors up to the point I run out of disk space.

Cheers,

Jeff


#8Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#7)
Re: atomic pin/unpin causing errors

Hi Jeff,

On 2016-05-04 14:00:01 -0700, Jeff Janes wrote:

On Tue, May 3, 2016 at 4:05 PM, Andres Freund <andres@anarazel.de> wrote:

Hm. I appear to have trouble reproducing this issue (continuing to try)
on master as of 8826d8507. Is there any chance you could package up a
data directory after the issue hit?

I'll look into. I haven't been saving them, as they are very large
(tens of GB) by the time the errors happen.

Hm. Any chance that's SSH accessible?

What compiler-version & OS are you using, with what exact
CFLAGS/configure input? I'd like to try to replicate the setup as close
as possible; in the hope of just making it locally reproducible.

In case I can't find a way to transfer that much data, is there
something I could do in situ to debug it?

Yes. It'd be good to get a look at the borked page/tuple with
pageinspect. That might require some manual search to find the affected
tuple, and possibly the problem is transient. I was wondering whether
we could just put an Assert() into those error messages, to get a stack
dump. But unfortunately your tooling would likely generate far too many
of those.

Andres


#9Teodor Sigaev
teodor@sigaev.ru
In reply to: Andres Freund (#6)
Re: atomic pin/unpin causing errors

Any chance you could package up that data directory for me to download?

Sent by personal email to Alexander, Andres and Jeff

In /var/log/messages I found

May 4 22:04:07 xor kernel: pid 14010 (postgres), uid 1001: exited on signal 6 (core dumped)
May 4 22:04:25 xor kernel: pid 14032 (postgres), uid 1001: exited on signal 11 (core dumped)
May 4 22:04:52 xor kernel: pid 14037 (postgres), uid 1001: exited on signal 6 (core dumped)

Sometimes postgres crashes with SIGSEGV instead of SIGABRT (which
comes from abort() in an assert).
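
Signal 6 and signal 11 in the kernel log correspond to SIGABRT and SIGSEGV on the common Unix platforms. A small sketch of classifying such exits follows; `crash_kind` is a hypothetical helper, and the comment about assertion failures assumes a cassert-enabled build as used in this test.

```c
#include <signal.h>

/* Hypothetical helper: describe why a backend died, given the signal
 * number reported by the kernel or obtained from waitpid(). */
const char *
crash_kind(int signo)
{
    switch (signo)
    {
        case SIGABRT:
            return "abort";         /* e.g. abort() from a failed assertion */
        case SIGSEGV:
            return "segfault";      /* invalid memory access */
        default:
            return "other";
    }
}
```

Using the symbolic constants rather than the raw numbers keeps the classification correct on platforms that number signals differently.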

I'll try to get a coredump after SIGSEGV, but it could take some time.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/


#10Teodor Sigaev
teodor@sigaev.ru
In reply to: Teodor Sigaev (#9)
Re: atomic pin/unpin causing errors

I'll try to get a coredump after SIGSEGV, but it could take some time.

Got it!

#0 0x00000008014321d7 in sbrk () from /lib/libc.so.7
#1 0x0000000801431ddd in sbrk () from /lib/libc.so.7
#2 0x000000080142e5bb in sbrk () from /lib/libc.so.7
#3 0x000000080142e085 in sbrk () from /lib/libc.so.7
#4 0x000000080142de28 in sbrk () from /lib/libc.so.7
#5 0x000000080142e1cf in sbrk () from /lib/libc.so.7
#6 0x0000000801439815 in free () from /lib/libc.so.7
#7 0x000000080149e3d6 in nsdispatch () from /lib/libc.so.7
#8 0x00000008014a41c6 in __cxa_finalize () from /lib/libc.so.7
#9 0x000000080144525c in exit () from /lib/libc.so.7
#10 0x00000000008e1bc2 in quickdie (postgres_signal_arg=3) at postgres.c:2623
#11 <signal handler called>
#12 0x0000000801431847 in sbrk () from /lib/libc.so.7
#13 0x0000000801431522 in sbrk () from /lib/libc.so.7
#14 0x000000080142d47f in sbrk () from /lib/libc.so.7
#15 0x0000000801434628 in malloc () from /lib/libc.so.7
#16 0x0000000000aca278 in AllocSetAlloc (context=0x801c0bb88, size=24) at aset.c:853
#17 0x0000000000acca0e in MemoryContextAlloc (context=0x801c0bb88, size=24) at mcxt.c:764
#18 0x0000000000aebdb8 in PushActiveSnapshot (snap=0xf4ae10) at snapmgr.c:652
#19 0x00000000008e54bd in exec_bind_message (input_message=0x7fffffffdf60) at postgres.c:1602
#20 0x00000000008e3957 in PostgresMain (argc=1, argv=0x801d3c968, dbname=0x801d3c948 "teodor", username=0x801d3c928 "teodor") at postgres.c:4105
#21 0x0000000000839744 in BackendRun (port=0x801c991c0) at postmaster.c:4258
#22 0x0000000000838d54 in BackendStartup (port=0x801c991c0) at postmaster.c:3932
#23 0x0000000000835617 in ServerLoop () at postmaster.c:1690
#24 0x0000000000832c69 in PostmasterMain (argc=4, argv=0x7fffffffe420) at postmaster.c:1298
#25 0x000000000075f228 in main (argc=4, argv=0x7fffffffe420) at main.c:228

It seems we have some memory corruption, but it could be either a separate
problem or the same one.

Another one:

#0 0x00000008014321d7 in sbrk () from /lib/libc.so.7
#1 0x0000000801431ddd in sbrk () from /lib/libc.so.7
#2 0x000000080142e5bb in sbrk () from /lib/libc.so.7
#3 0x000000080142e085 in sbrk () from /lib/libc.so.7
#4 0x000000080142de28 in sbrk () from /lib/libc.so.7
#5 0x000000080142e1cf in sbrk () from /lib/libc.so.7
#6 0x0000000801439815 in free () from /lib/libc.so.7
#7 0x000000080149e3d6 in nsdispatch () from /lib/libc.so.7
#8 0x00000008014a41c6 in __cxa_finalize () from /lib/libc.so.7
#9 0x000000080144525c in exit () from /lib/libc.so.7
#10 0x00000000008e1bc2 in quickdie (postgres_signal_arg=3) at postgres.c:2623
#11 <signal handler called>
#12 0x000000080143277a in sbrk () from /lib/libc.so.7
#13 0x00000008014318b5 in sbrk () from /lib/libc.so.7
#14 0x000000080142e483 in sbrk () from /lib/libc.so.7
#15 0x000000080142e75b in sbrk () from /lib/libc.so.7
#16 0x00000008014398bd in free () from /lib/libc.so.7
#17 0x0000000000aca676 in AllocSetFree (context=0x801e710d0, pointer=0x801e65038) at aset.c:976
#18 0x0000000000acbe93 in pfree (pointer=0x801e65038) at mcxt.c:1015
#19 0x00000000004a7986 in ginendscan (scan=0x801e61de0) at ginscan.c:445
#20 0x0000000000504818 in index_endscan (scan=0x801e61de0) at indexam.c:339
#21 0x0000000000719d21 in ExecEndBitmapIndexScan (node=0x801e619c8) at nodeBitmapIndexscan.c:183
#22 0x00000000006fce9e in ExecEndNode (node=0x801e619c8) at execProcnode.c:685
#23 0x0000000000719195 in ExecEndBitmapHeapScan (node=0x801d63700) at nodeBitmapHeapscan.c:508
#24 0x00000000006fceaf in ExecEndNode (node=0x801d63700) at execProcnode.c:689
#25 0x000000000072b64a in ExecEndModifyTable (node=0x801d632a0) at nodeModifyTable.c:1978
#26 0x00000000006fcde3 in ExecEndNode (node=0x801d632a0) at execProcnode.c:638
#27 0x00000000006f6ed9 in ExecEndPlan (planstate=0x801d632a0, estate=0x801d63038) at execMain.c:1451
#28 0x00000000006f6e56 in standard_ExecutorEnd (queryDesc=0x801e42af0) at execMain.c:468
#29 0x00000008020038f2 in pgss_ExecutorEnd (queryDesc=0x801e42af0) at pg_stat_statements.c:938
#30 0x00000000006f6d3c in ExecutorEnd (queryDesc=0x801e42af0) at execMain.c:437
#31 0x00000000008ea387 in ProcessQuery (plan=0x801e43898, sourceText=0x801e42838 "update foo set count=count+1 where text_array @> $1", params=0x801e428b8, dest=0xf3fcc8, completionTag=0x7fffffffdd00 "UPDATE 1") at pquery.c:230
#32 0x00000000008e9540 in PortalRunMulti (portal=0x801dc5038, isTopLevel=1 '\001', dest=0xf3fcc8, altdest=0xf3fcc8, completionTag=0x7fffffffdd00 "UPDATE 1") at pquery.c:1267
#33 0x00000000008e8cd6 in PortalRun (portal=0x801dc5038, count=9223372036854775807, isTopLevel=1 '\001', dest=0x801c96450, altdest=0x801c96450, completionTag=0x7fffffffdd00 "UPDATE 1") at pquery.c:813
#34 0x00000000008e61ef in exec_execute_message (portal_name=0x801c96038 "", max_rows=9223372036854775807) at postgres.c:1979
#35 0x00000000008e39ae in PostgresMain (argc=1, argv=0x801d56bc8, dbname=0x801d56ba8 "teodor", username=0x801d56b88 "teodor") at postgres.c:4122
#36 0x0000000000839744 in BackendRun (port=0x801d571c0) at postmaster.c:4258
#37 0x0000000000838d54 in BackendStartup (port=0x801d571c0) at postmaster.c:3932
#38 0x0000000000835617 in ServerLoop () at postmaster.c:1690
#39 0x0000000000832c69 in PostmasterMain (argc=4, argv=0x7fffffffe420) at postmaster.c:1298
#40 0x000000000075f228 in main (argc=4, argv=0x7fffffffe420) at main.c:228

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/


#11Andres Freund
andres@anarazel.de
In reply to: Teodor Sigaev (#10)
Re: quickdie doing memory allocations (was atomic pin/unpin causing errors)

Hi Teodor,

Thanks for analyzing this.

On 2016-05-05 13:50:09 +0300, Teodor Sigaev wrote:

I'll try to get a coredump after SIGSEGV, but it could take some time.

Got it!

#0 0x00000008014321d7 in sbrk () from /lib/libc.so.7
#1 0x0000000801431ddd in sbrk () from /lib/libc.so.7
#2 0x000000080142e5bb in sbrk () from /lib/libc.so.7
#3 0x000000080142e085 in sbrk () from /lib/libc.so.7
#4 0x000000080142de28 in sbrk () from /lib/libc.so.7
#5 0x000000080142e1cf in sbrk () from /lib/libc.so.7
#6 0x0000000801439815 in free () from /lib/libc.so.7
#7 0x000000080149e3d6 in nsdispatch () from /lib/libc.so.7
#8 0x00000008014a41c6 in __cxa_finalize () from /lib/libc.so.7
#9 0x000000080144525c in exit () from /lib/libc.so.7
#10 0x00000000008e1bc2 in quickdie (postgres_signal_arg=3) at postgres.c:2623
#11 <signal handler called>
#12 0x0000000801431847 in sbrk () from /lib/libc.so.7
#13 0x0000000801431522 in sbrk () from /lib/libc.so.7
#14 0x000000080142d47f in sbrk () from /lib/libc.so.7
#15 0x0000000801434628 in malloc () from /lib/libc.so.7
#16 0x0000000000aca278 in AllocSetAlloc (context=0x801c0bb88, size=24) at aset.c:853
#17 0x0000000000acca0e in MemoryContextAlloc (context=0x801c0bb88, size=24) at mcxt.c:764
#18 0x0000000000aebdb8 in PushActiveSnapshot (snap=0xf4ae10) at snapmgr.c:652
#19 0x00000000008e54bd in exec_bind_message (input_message=0x7fffffffdf60) at postgres.c:1602
#20 0x00000000008e3957 in PostgresMain (argc=1, argv=0x801d3c968, dbname=0x801d3c948 "teodor", username=0x801d3c928 "teodor") at postgres.c:4105
#21 0x0000000000839744 in BackendRun (port=0x801c991c0) at postmaster.c:4258
#22 0x0000000000838d54 in BackendStartup (port=0x801c991c0) at postmaster.c:3932
#23 0x0000000000835617 in ServerLoop () at postmaster.c:1690
#24 0x0000000000832c69 in PostmasterMain (argc=4, argv=0x7fffffffe420) at postmaster.c:1298
#25 0x000000000075f228 in main (argc=4, argv=0x7fffffffe420) at main.c:228

It seems we have some memory corruption, but it could be either a separate
problem or the same one.

That looks like an independent issue, namely that we're triggering memory
allocations from a signal handler (see frames 12, 11, 10, 9). Presumably
due to system registered atexit handlers. I suspect we should be using
_exit() here? Tom?
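
The hazard described here is that exit() runs atexit callbacks, which may call malloc() or free() while the interrupted code is already inside the allocator. A minimal sketch of the _exit() alternative follows, assuming a POSIX system; `demo_quickdie`, `demo_atexit` and `run_quickdie_demo` are illustrative names, not PostgreSQL functions, and whether PostgreSQL should behave this way is exactly the open question.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* atexit callback: if exit() ran, the child would terminate with 99. */
static void
demo_atexit(void)
{
    _exit(99);
}

/* Handler restricted to async-signal-safe calls: write() and _exit(). */
static void
demo_quickdie(int signo)
{
    static const char msg[] = "terminating immediately\n";
    ssize_t     rc;

    (void) signo;
    rc = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void) rc;
    _exit(2);                   /* skips atexit callbacks and stdio cleanup */
}

/* Fork a child, deliver SIGQUIT to it, and return its exit status
 * (or -1 on failure). */
int
run_quickdie_demo(void)
{
    pid_t       pid = fork();
    int         status;

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        atexit(demo_atexit);
        signal(SIGQUIT, demo_quickdie);
        raise(SIGQUIT);
        _exit(1);               /* not reached if the handler ran */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The child exits with status 2 rather than 99, showing that the atexit callback never runs when the handler leaves through _exit().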

Greetings,

Andres Freund


#12Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#1)
Re: atomic pin/unpin causing errors

Hi Jeff,

On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:

I don't see the problem with a cassert-enabled build, probably because it
is just too slow to ever reach the point where the problem occurs.

Running the test with cassert enabled I actually get assertion failures,
due to the FATAL you added.

#1 0x0000000000958dde in ExceptionalCondition (conditionName=0xb36c2a "!(RefCountErrors == 0)", errorType=0xb361af "FailedAssertion",
fileName=0xb36170 "/home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c", lineNumber=2506) at /home/admin/src/postgresql/src/backend/utils/error/assert.c:54
#2 0x00000000007c9fc9 in CheckForBufferLeaks () at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2506
#3 0x00000000007c9f09 in AtProcExit_Buffers (code=1, arg=0) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2459
#4 0x00000000007d927f in shmem_exit (code=1) at /home/admin/src/postgresql/src/backend/storage/ipc/ipc.c:261
#5 0x00000000007d90dd in proc_exit_prepare (code=1) at /home/admin/src/postgresql/src/backend/storage/ipc/ipc.c:185
#6 0x00000000007d904b in proc_exit (code=1) at /home/admin/src/postgresql/src/backend/storage/ipc/ipc.c:102
#7 0x000000000095958d in errfinish (dummy=0) at /home/admin/src/postgresql/src/backend/utils/error/elog.c:543
#8 0x000000000080214b in mdwrite (reln=0x2e8b4a8, forknum=MAIN_FORKNUM, blocknum=154, buffer=0x2e8e5a8 "", skipFsync=0 '\000')
at /home/admin/src/postgresql/src/backend/storage/smgr/md.c:832
#9 0x0000000000804633 in smgrwrite (reln=0x2e8b4a8, forknum=MAIN_FORKNUM, blocknum=154, buffer=0x2e8e5a8 "", skipFsync=0 '\000')
at /home/admin/src/postgresql/src/backend/storage/smgr/smgr.c:650
#10 0x00000000007ca548 in FlushBuffer (buf=0x7f0285955330, reln=0x2e8b4a8) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2734
#11 0x00000000007c9d5a in SyncOneBuffer (buf_id=2503, skip_recently_used=0 '\000', wb_context=0x7ffe7305d290) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2377
#12 0x00000000007c964e in BufferSync (flags=64) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:1967
#13 0x00000000007ca185 in CheckPointBuffers (flags=64) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2561
#14 0x000000000052d497 in CheckPointGuts (checkPointRedo=382762776, flags=64) at /home/admin/src/postgresql/src/backend/access/transam/xlog.c:8644
#15 0x000000000052cede in CreateCheckPoint (flags=64) at /home/admin/src/postgresql/src/backend/access/transam/xlog.c:8430
#16 0x00000000007706ac in CheckpointerMain () at /home/admin/src/postgresql/src/backend/postmaster/checkpointer.c:488
#17 0x000000000053e0d5 in AuxiliaryProcessMain (argc=2, argv=0x7ffe7305ea40) at /home/admin/src/postgresql/src/backend/bootstrap/bootstrap.c:429
#18 0x000000000078099f in StartChildProcess (type=CheckpointerProcess) at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:5227
#19 0x000000000077dcc3 in reaper (postgres_signal_arg=17) at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:2781
#20 <signal handler called>
#21 0x00007f028ebbdac3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
#22 0x000000000077c049 in ServerLoop () at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:1654
#23 0x000000000077b7a9 in PostmasterMain (argc=4, argv=0x2e49f20) at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:1298
#24 0x00000000006c5849 in main (argc=4, argv=0x2e49f20) at /home/admin/src/postgresql/src/backend/main/main.c:228

You didn't see those?

The trigger here appears to be that the checkpointer doesn't have
an on-exit callback similar to a normal backend's ShutdownPostgres() et al,
and thus doesn't trigger a resource owner release. The normal ERROR
path has
    /* buffer pins are released here: */
    ResourceOwnerRelease(CurrentResourceOwner,
                         RESOURCE_RELEASE_BEFORE_LOCKS,
                         false, true);
    /* we needn't bother with the other ResourceOwnerRelease phases */

That clearly is a bug. But I'm not immediately seeing how this could
trigger the corruption issue you observed.
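
The failure mode can be pictured with a toy model: an exit-time check counts buffers that are still pinned, and it fires whenever the exiting process skipped the release step that the ERROR path performs. This is only a sketch under that reading of the backtrace; every `demo_` name is hypothetical and none of it is PostgreSQL code.

```c
#include <stdbool.h>

#define DEMO_NBUFFERS 8

/* Hypothetical per-process pin counts, one slot per buffer. */
static int demo_private_refcount[DEMO_NBUFFERS];

void
demo_pin_buffer(int buf)
{
    demo_private_refcount[buf]++;
}

/* Analogue of the resource-owner release done on the normal ERROR path. */
void
demo_release_all_pins(void)
{
    for (int i = 0; i < DEMO_NBUFFERS; i++)
        demo_private_refcount[i] = 0;
}

/* Analogue of the exit-time leak check: count buffers still pinned. */
int
demo_count_leaked_pins(void)
{
    int leaks = 0;

    for (int i = 0; i < DEMO_NBUFFERS; i++)
        if (demo_private_refcount[i] != 0)
            leaks++;
    return leaks;
}

/* Exit path; with release_first = false it mimics a process that has no
 * on-exit callback releasing its pins before the check runs. */
int
demo_exit_path(bool release_first)
{
    if (release_first)
        demo_release_all_pins();
    return demo_count_leaked_pins();
}
```

A process that pins one buffer and then exits through `demo_exit_path(false)` reports one leaked pin, which corresponds to the `RefCountErrors == 0` assertion failing in the trace above.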

Greetings,

Andres Freund


#13Andres Freund
andres@anarazel.de
In reply to: Teodor Sigaev (#5)
Re: atomic pin/unpin causing errors

On 2016-05-04 18:12:45 +0300, Teodor Sigaev wrote:

I get the errors:

ERROR: attempted to delete invisible tuple
STATEMENT: update foo set count=count+1,text_array=$1 where text_array @> $2

And also:

ERROR: unexpected chunk number 1 (expected 2) for toast value
85223889 in pg_toast_16424
STATEMENT: update foo set count=count+1 where text_array @> $1

Hm. I appear to have trouble reproducing this issue (continuing to try)
on master as of 8826d8507. Is there any chance you could package up a
data directory after the issue hit?

I've got
ERROR: unexpected chunk number 0 (expected 1) for toast value 10192986 in
pg_toast_16424

The test required 10 hours to run on my notebook. postgresql was compiled
with -O0 --enable-debug --enable-cassert.

Hm. And you're not seeing the asserts I reported in
http://archives.postgresql.org/message-id/20160505185246.2i7qftadwhzewykj%40alap3.anarazel.de
?

Greetings,

Andres Freund


#14Teodor Sigaev
teodor@sigaev.ru
In reply to: Andres Freund (#13)
Re: atomic pin/unpin causing errors

Hm. And you're not seeing the asserts I reported in
http://archives.postgresql.org/message-id/20160505185246.2i7qftadwhzewykj%40alap3.anarazel.de
?

I see it a lot, but I think that is a result of ereport(FATAL) after
FileWrite(BLCKSZ/3) added by Jeff.
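
For readers unfamiliar with the test harness: as described here, it injects a partial write of BLCKSZ/3 bytes followed by ereport(FATAL), leaving a torn page behind. A rough sketch of such an injection follows, assuming the default 8192-byte block size; `demo_torn_write` is a made-up name and this is not Jeff's actual patch.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_BLCKSZ 8192        /* assumed default block size */

/* Hypothetical fault injection: copy only the first third of a block to
 * its destination, as if the process died partway through the write.
 * Returns the number of bytes actually written. */
size_t
demo_torn_write(char *dst, const char *src)
{
    size_t partial = DEMO_BLCKSZ / 3;

    memcpy(dst, src, partial);
    return partial;             /* the caller would now report FATAL */
}
```

After the call, the first 2730 bytes of the destination hold the new page image and the rest still holds the old one.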

Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/


#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#11)
Re: quickdie doing memory allocations (was atomic pin/unpin causing errors)

Andres Freund <andres@anarazel.de> writes:

#0 0x00000008014321d7 in sbrk () from /lib/libc.so.7
#1 0x0000000801431ddd in sbrk () from /lib/libc.so.7
#2 0x000000080142e5bb in sbrk () from /lib/libc.so.7
#3 0x000000080142e085 in sbrk () from /lib/libc.so.7
#4 0x000000080142de28 in sbrk () from /lib/libc.so.7
#5 0x000000080142e1cf in sbrk () from /lib/libc.so.7
#6 0x0000000801439815 in free () from /lib/libc.so.7
#7 0x000000080149e3d6 in nsdispatch () from /lib/libc.so.7
#8 0x00000008014a41c6 in __cxa_finalize () from /lib/libc.so.7
#9 0x000000080144525c in exit () from /lib/libc.so.7
#10 0x00000000008e1bc2 in quickdie (postgres_signal_arg=3) at postgres.c:2623
#11 <signal handler called>
#12 0x0000000801431847 in sbrk () from /lib/libc.so.7

That looks like an independent issue, namely that we're triggering memory
allocations from a signal handler (see frames 12, 11, 10, 9). Presumably
due to system registered atexit handlers. I suspect we should be using
_exit() here? Tom?

I don't think that would improve matters. In the first place, if we use
_exit() here that might encourage third-party extension authors to believe
they should use _exit(), which would be bad. In the second place,
we don't know what it is we're skipping by not running atexit handlers,
and again that could be bad. We don't like people trying to bypass our
on-exit code, why would anyone else? In the third place, by the time we
get to the exit() call we've already exposed ourselves to a whole lot of
such hazards by running ereport() (including sending a message to the
client!). In the fourth place, if we've received a quickdie interrupt,
it doesn't actually matter if the process crashes; we just want it to
quit ASAP.

So I'd say that this is just a cosmetic problem and that trying to fix
it is likely to make things worse.

regards, tom lane


#16Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#15)
Re: quickdie doing memory allocations (was atomic pin/unpin causing errors)

Hi,

On 2016-05-05 15:56:45 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

#0 0x00000008014321d7 in sbrk () from /lib/libc.so.7
#1 0x0000000801431ddd in sbrk () from /lib/libc.so.7
#2 0x000000080142e5bb in sbrk () from /lib/libc.so.7
#3 0x000000080142e085 in sbrk () from /lib/libc.so.7
#4 0x000000080142de28 in sbrk () from /lib/libc.so.7
#5 0x000000080142e1cf in sbrk () from /lib/libc.so.7
#6 0x0000000801439815 in free () from /lib/libc.so.7
#7 0x000000080149e3d6 in nsdispatch () from /lib/libc.so.7
#8 0x00000008014a41c6 in __cxa_finalize () from /lib/libc.so.7
#9 0x000000080144525c in exit () from /lib/libc.so.7
#10 0x00000000008e1bc2 in quickdie (postgres_signal_arg=3) at postgres.c:2623
#11 <signal handler called>
#12 0x0000000801431847 in sbrk () from /lib/libc.so.7

That looks like an independent issue, namely that we're triggering memory
allocations from a signal handler (see frames 12, 11, 10, 9). Presumably
due to system registered atexit handlers. I suspect we should be using
_exit() here? Tom?

I don't think that would improve matters. In the first place, if we use
_exit() here that might encourage third-party extension authors to believe
they should use _exit(), which would be bad.

The source tree already has a number of _exit() calls, so I don't think
that'd make a meaningful difference.

In the second place, we don't know what it is we're skipping by not
running atexit handlers, and again that could be bad.

I've a hard time coming up with a scenario where that'd be a problem in
a PANIC case. Isn't it pretty common to use _exit after fatal errors
(and forks)?

In the third place, by the time we
get to the exit() call we've already exposed ourselves to a whole lot of
such hazards by running ereport() (including sending a message to the
client!).

True. And that's not good. But the magic of ErrorContext shields us from
a fair amount of issues.

In the fourth place, if we've received a quickdie interrupt,
it doesn't actually matter if the process crashes; we just want it to
quit ASAP.

If it always were crashing, that'd be somewhat fine. But sbrk internally
uses mutexes, so this can result in processes getting stuck. And that is
a problem. There've actually been reports about that every now and then.
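
To make the exit() versus _exit() distinction concrete, here is a minimal standalone sketch. It is not taken from the PostgreSQL source, and every function name in it is hypothetical. It only shows that exit() called from a signal handler runs atexit handlers while _exit() skips them; it does not try to reproduce the allocator deadlock, which depends on the signal arriving while the interrupted code holds a malloc lock.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical atexit handler standing in for "system registered" cleanup. */
static void cleanup_at_exit(void)
{
    /* Override the exit status so the parent can tell that this ran. */
    _exit(42);
}

/* Handler that leaves immediately; _exit() is async-signal-safe. */
static void quit_with_underscore_exit(int signo)
{
    (void) signo;
    _exit(2);
}

/* Handler that calls exit(), which runs atexit handlers first. */
static void quit_with_exit(int signo)
{
    (void) signo;
    exit(2);
}

/*
 * Fork a child that registers the atexit handler, installs the given
 * SIGQUIT handler, and signals itself. Returns the child's exit status,
 * or -1 on failure.
 */
static int child_status_for(void (*handler)(int))
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        atexit(cleanup_at_exit);
        signal(SIGQUIT, handler);
        raise(SIGQUIT);
        _exit(99);              /* not reached: the handler terminates us */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With this sketch, child_status_for(quit_with_underscore_exit) reports the handler's own status, while child_status_for(quit_with_exit) reports the status set by the atexit handler, showing that extra code ran inside the signal handler.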

Greetings,

Andres Freund


#17Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#11)
Re: quickdie doing memory allocations (was atomic pin/unpin causing errors)

On Thu, May 5, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:

#7 0x000000080149e3d6 in nsdispatch () from /lib/libc.so.7
#8 0x00000008014a41c6 in __cxa_finalize () from /lib/libc.so.7
#9 0x000000080144525c in exit () from /lib/libc.so.7
#10 0x00000000008e1bc2 in quickdie (postgres_signal_arg=3) at postgres.c:2623

Eh, doesn't this __cxa_finalize() stuff suggest that some C++
code was linked into the backend?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#18Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#17)
Re: quickdie doing memory allocations (was atomic pin/unpin causing errors)

On 2016-05-05 16:32:38 -0400, Robert Haas wrote:

On Thu, May 5, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:

#7 0x000000080149e3d6 in nsdispatch () from /lib/libc.so.7
#8 0x00000008014a41c6 in __cxa_finalize () from /lib/libc.so.7
#9 0x000000080144525c in exit () from /lib/libc.so.7
#10 0x00000000008e1bc2 in quickdie (postgres_signal_arg=3) at postgres.c:2623

Eh, doesn't this __cxa_finalize() stuff suggest that some C++
code was linked into the backend?

IIRC __cxa_finalize also handles atexit() (and gcc
__attribute__((destructor))).
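
As a rough illustration that plain C code can have work to do at exit() time without any C++ being involved, the sketch below registers an atexit() handler and a GCC/Clang __attribute__((destructor)) function, then checks which of them run in a child process. All names are hypothetical, and it demonstrates only the observable behaviour, not which libc-internal function dispatches the calls on a given platform.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pipe end the child reports to; -1 means "stay silent". */
static int report_fd = -1;

/* Write a one-byte tag if reporting is enabled. */
static void report(char tag)
{
    if (report_fd >= 0)
    {
        ssize_t rc = write(report_fd, &tag, 1);

        (void) rc;
    }
}

/* Registered with atexit() in the child. */
static void on_atexit(void)
{
    report('A');
}

/* Run by the C runtime at normal process exit (GCC/Clang extension). */
__attribute__((destructor))
static void on_destructor(void)
{
    report('D');
}

/*
 * Fork a child that leaves via exit() or _exit() and collect the tags it
 * reported into buf (NUL-terminated). Returns the number of tags, or -1.
 */
static int tags_from_child(int use_underscore_exit, char *buf, size_t buflen)
{
    int fds[2];
    pid_t pid;
    ssize_t n;
    size_t used = 0;

    if (buflen == 0 || pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0)
    {
        close(fds[0]);
        report_fd = fds[1];
        atexit(on_atexit);
        if (use_underscore_exit)
            _exit(0);           /* skips atexit handlers and destructors */
        exit(0);                /* runs both */
    }
    close(fds[1]);
    while (used + 1 < buflen &&
           (n = read(fds[0], buf + used, buflen - 1 - used)) > 0)
        used += (size_t) n;
    close(fds[0]);
    buf[used] = '\0';
    waitpid(pid, NULL, 0);
    return (int) used;
}
```

Calling tags_from_child(0, buf, sizeof buf) should collect both an 'A' and a 'D' tag, while tags_from_child(1, buf, sizeof buf) should collect none.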

Greetings,

Andres Freund


#19Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#12)
Re: Missing error handling for FATALs in checkpointer/bgwriter

On 2016-05-05 11:52:46 -0700, Andres Freund wrote:

Hi Jeff,

On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:

I don't see the problem with a cassert-enabled build, probably because it
is just too slow to ever reach the point where the problem occurs.

Running the test with cassert enabled I actually get assertion failures,
due to the FATAL you added.

#1 0x0000000000958dde in ExceptionalCondition (conditionName=0xb36c2a "!(RefCountErrors == 0)", errorType=0xb361af "FailedAssertion",
fileName=0xb36170 "/home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c", lineNumber=2506) at /home/admin/src/postgresql/src/backend/utils/error/assert.c:54
#2 0x00000000007c9fc9 in CheckForBufferLeaks () at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2506
#3 0x00000000007c9f09 in AtProcExit_Buffers (code=1, arg=0) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2459
#4 0x00000000007d927f in shmem_exit (code=1) at /home/admin/src/postgresql/src/backend/storage/ipc/ipc.c:261
#5 0x00000000007d90dd in proc_exit_prepare (code=1) at /home/admin/src/postgresql/src/backend/storage/ipc/ipc.c:185
#6 0x00000000007d904b in proc_exit (code=1) at /home/admin/src/postgresql/src/backend/storage/ipc/ipc.c:102
#7 0x000000000095958d in errfinish (dummy=0) at /home/admin/src/postgresql/src/backend/utils/error/elog.c:543
#8 0x000000000080214b in mdwrite (reln=0x2e8b4a8, forknum=MAIN_FORKNUM, blocknum=154, buffer=0x2e8e5a8 "", skipFsync=0 '\000')
at /home/admin/src/postgresql/src/backend/storage/smgr/md.c:832
#9 0x0000000000804633 in smgrwrite (reln=0x2e8b4a8, forknum=MAIN_FORKNUM, blocknum=154, buffer=0x2e8e5a8 "", skipFsync=0 '\000')
at /home/admin/src/postgresql/src/backend/storage/smgr/smgr.c:650
#10 0x00000000007ca548 in FlushBuffer (buf=0x7f0285955330, reln=0x2e8b4a8) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2734
#11 0x00000000007c9d5a in SyncOneBuffer (buf_id=2503, skip_recently_used=0 '\000', wb_context=0x7ffe7305d290) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2377
#12 0x00000000007c964e in BufferSync (flags=64) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:1967
#13 0x00000000007ca185 in CheckPointBuffers (flags=64) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2561
#14 0x000000000052d497 in CheckPointGuts (checkPointRedo=382762776, flags=64) at /home/admin/src/postgresql/src/backend/access/transam/xlog.c:8644
#15 0x000000000052cede in CreateCheckPoint (flags=64) at /home/admin/src/postgresql/src/backend/access/transam/xlog.c:8430
#16 0x00000000007706ac in CheckpointerMain () at /home/admin/src/postgresql/src/backend/postmaster/checkpointer.c:488
#17 0x000000000053e0d5 in AuxiliaryProcessMain (argc=2, argv=0x7ffe7305ea40) at /home/admin/src/postgresql/src/backend/bootstrap/bootstrap.c:429
#18 0x000000000078099f in StartChildProcess (type=CheckpointerProcess) at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:5227
#19 0x000000000077dcc3 in reaper (postgres_signal_arg=17) at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:2781
#20 <signal handler called>
#21 0x00007f028ebbdac3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
#22 0x000000000077c049 in ServerLoop () at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:1654
#23 0x000000000077b7a9 in PostmasterMain (argc=4, argv=0x2e49f20) at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:1298
#24 0x00000000006c5849 in main (argc=4, argv=0x2e49f20) at /home/admin/src/postgresql/src/backend/main/main.c:228

You didn't see those?

The trigger here appears to be that the checkpointer doesn't have an
on-exit callback similar to a normal backend's ShutdownPostgres() et al,
and thus doesn't trigger a resource owner release. The normal ERROR
path has

    /* buffer pins are released here: */
    ResourceOwnerRelease(CurrentResourceOwner,
                         RESOURCE_RELEASE_BEFORE_LOCKS,
                         false, true);
    /* we needn't bother with the other ResourceOwnerRelease phases */

That clearly is a bug. But I'm not immediately seeing how this could
trigger the corruption issue you observed.

The same issue exists in bgwriter afaics. ISTM that we need to provide
a before_shmem_exit (or on_shmem_exit?) handler for both, which essentially does

    /*
     * These operations are really just a minimal subset of
     * AbortTransaction(). We don't have very many resources to worry
     * about in bgwriter, but we do have LWLocks, buffers, and temp files.
     */
    LWLockReleaseAll();
    AbortBufferIO();
    UnlockBuffers();
    /* buffer pins are released here: */
    ResourceOwnerRelease(CurrentResourceOwner,
                         RESOURCE_RELEASE_BEFORE_LOCKS,
                         false, true);

It looks to me like that should be backpatched?

There's some question about how to make the ordering
vs. AtProcExit_Buffers robust, which is why I'm explicitly doing
LWLockReleaseAll/AbortBufferIO/UnlockBuffers above.

Any better ideas?
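
The ordering concern can be pictured with a small standalone toy model. It uses none of the real PostgreSQL headers, and every toy_* name is hypothetical. It assumes only that "before" exit callbacks run ahead of "on" exit callbacks, each list newest-first, and that the leak check is registered as an "on" callback. Under those assumptions, a cleanup registered as a "before" callback releases the pin before the leak check looks at it.

```c
#include <stddef.h>

#define MAX_CALLBACKS 8

typedef void (*exit_callback)(void);

/* Toy stand-ins for the two callback lists walked at process exit. */
static exit_callback before_list[MAX_CALLBACKS];
static exit_callback on_list[MAX_CALLBACKS];
static size_t before_count;
static size_t on_count;

static int pinned_buffers;      /* toy stand-in for process-local pin counts */
static int leaks_reported;      /* what the toy leak check observed */

static void toy_before_shmem_exit(exit_callback cb)
{
    if (before_count < MAX_CALLBACKS)
        before_list[before_count++] = cb;
}

static void toy_on_shmem_exit(exit_callback cb)
{
    if (on_count < MAX_CALLBACKS)
        on_list[on_count++] = cb;
}

/* Run "before" callbacks first, then "on" callbacks, each newest-first. */
static void toy_shmem_exit(void)
{
    while (before_count > 0)
        before_list[--before_count]();
    while (on_count > 0)
        on_list[--on_count]();
}

/* Models the leak check: record how many pins are still held. */
static void toy_check_for_buffer_leaks(void)
{
    leaks_reported = pinned_buffers;
}

/* Models the proposed cleanup: drop all pins held by this process. */
static void toy_release_pins(void)
{
    pinned_buffers = 0;
}

/*
 * Simulate a FATAL exit while one buffer is pinned. Returns the number
 * of leaked pins the check saw (an assertion failure in a cassert build).
 */
static int simulate_fatal_exit(int register_cleanup)
{
    before_count = 0;
    on_count = 0;
    leaks_reported = 0;

    toy_on_shmem_exit(toy_check_for_buffer_leaks);
    if (register_cleanup)
        toy_before_shmem_exit(toy_release_pins);

    pinned_buffers = 1;         /* e.g. the error hits in the middle of a write */
    toy_shmem_exit();
    return leaks_reported;
}
```

In this model simulate_fatal_exit(0) sees one leaked pin and simulate_fatal_exit(1) sees none. Whether the real handler should sit in the "before" or the "on" list is exactly the open question above.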


#20Jeff Janes
jeff.janes@gmail.com
In reply to: Andres Freund (#12)
Re: atomic pin/unpin causing errors

On Thu, May 5, 2016 at 11:52 AM, Andres Freund <andres@anarazel.de> wrote:

Hi Jeff,

On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:

I don't see the problem with a cassert-enabled build, probably because it
is just too slow to ever reach the point where the problem occurs.

Running the test with cassert enabled I actually get assertion failures,
due to the FATAL you added.

#1 0x0000000000958dde in ExceptionalCondition (conditionName=0xb36c2a "!(RefCountErrors == 0)", errorType=0xb361af "FailedAssertion",
fileName=0xb36170 "/home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c", lineNumber=2506) at /home/admin/src/postgresql/src/backend/utils/error/assert.c:54
#2 0x00000000007c9fc9 in CheckForBufferLeaks () at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2506

...

You didn't see those?

Yes, I have been seeing those on assert-enabled builds going back as
far as I can remember (long before this particular problem started
showing up). I just assumed it was a natural consequence of throwing
an ERROR from inside a critical section. I never really understood
it: why would a panicking process bother to check for buffer leaks in
the first place? It is leaking everything, which is why the entire
system has to be brought down immediately.

I have been trying (and failing) to reproduce the problem in more
recent releases, with and without cassert. Here is pg_config output
of one of my current attempts:

BINDIR = /home/jjanes/pgsql/torn_bisect/bin
DOCDIR = /home/jjanes/pgsql/torn_bisect/share/doc
HTMLDIR = /home/jjanes/pgsql/torn_bisect/share/doc
INCLUDEDIR = /home/jjanes/pgsql/torn_bisect/include
PKGINCLUDEDIR = /home/jjanes/pgsql/torn_bisect/include
INCLUDEDIR-SERVER = /home/jjanes/pgsql/torn_bisect/include/server
LIBDIR = /home/jjanes/pgsql/torn_bisect/lib
PKGLIBDIR = /home/jjanes/pgsql/torn_bisect/lib
LOCALEDIR = /home/jjanes/pgsql/torn_bisect/share/locale
MANDIR = /home/jjanes/pgsql/torn_bisect/share/man
SHAREDIR = /home/jjanes/pgsql/torn_bisect/share
SYSCONFDIR = /home/jjanes/pgsql/torn_bisect/etc
PGXS = /home/jjanes/pgsql/torn_bisect/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = 'CFLAGS=-ggdb' '--with-extra-version=-c1543a8'
'--enable-debug' '--with-libxml' '--with-perl' '--with-python'
'--with-ldap' '--with-openssl' '--with-gssapi' '--enable-cassert'
'--prefix=/home/jjanes/pgsql/torn_bisect/'
CC = gcc
CPPFLAGS = -DFRONTEND -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -g -ggdb
CFLAGS_SL = -fpic
LDFLAGS = -L../../src/common -Wl,--as-needed
-Wl,-rpath,'/home/jjanes/pgsql/torn_bisect/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lxml2 -lssl -lcrypto -lgssapi_krb5 -lz
-lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 9.6devel-c1543a8

The only difference between this and the ones that did find the errors
would be toggling --enable-cassert and changing which git commit was
used (and manually applying the gin_alone patch when testing commits
that precede that one's committal).

Linux: 2.6.32-573.22.1.el6.x86_64 #1 SMP Wed Mar 23 03:35:39 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux

/proc/cpuinfo:

processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-4640 v2 @ 2.20GHz
stepping : 4
microcode : 4294967295
cpu MHz : 2199.933
cache size : 20480 KB
physical id : 0
siblings : 8
core id : 7
cpu cores : 8
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm
constant_tsc rep_good unfair_spinlock pni pclmulqdq ssse3 cx16 sse4_1
sse4_2 popcnt aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt
fsgsbase smep erms
bogomips : 4399.86
clflush size : 64
cache_alignment : 64
address sizes : 42 bits physical, 48 bits virtual
power management:

Cheers,

Jeff


#21Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#20)
#22Jeff Janes
jeff.janes@gmail.com
In reply to: Andres Freund (#21)
#23Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#1)
#24Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#23)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#23)
#27Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#25)
#28Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#24)
#29Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#26)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#27)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#30)
#32Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#31)
#33Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#28)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#32)
#35Jeff Janes
jeff.janes@gmail.com
In reply to: Andres Freund (#27)
#36Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#27)
#37Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#32)
#38Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#34)
#39Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#35)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#38)
#41Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#40)
#42Jeff Janes
jeff.janes@gmail.com
In reply to: Andres Freund (#36)
#43Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#41)
#44Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#42)
#45Jeff Janes
jeff.janes@gmail.com
In reply to: Andres Freund (#44)
#46Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#45)
#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#45)
#48Teodor Sigaev
teodor@sigaev.ru
In reply to: Andres Freund (#23)
#49Andres Freund
andres@anarazel.de
In reply to: Teodor Sigaev (#48)
#50Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#46)
#51Jeff Janes
jeff.janes@gmail.com
In reply to: Andres Freund (#50)
#52Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andres Freund (#38)
#53David Fetter
david@fetter.org
In reply to: Tom Lane (#31)