Segmentation fault with core dump

Started by Glauco Torres, over 8 years ago, 10 messages, general
#1 Glauco Torres
torres.glauco@gmail.com

Hi group,

I'm using PG 9.6.6 and I am getting segmentation faults at intervals ranging
from every few days to up to two weeks.
This server is a replica; the other servers (the master and the other slaves) do
not have this problem.

I have not been able to identify what triggers the problem, but I have the
PostgreSQL log and the core dump generated by the crash.

The server has 60 GB RAM, PG is configured:

shared_buffers = 14GB
work_mem = 192MB

Below are the relevant details.

$ cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)

postgres=# select version();

version
----------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)

# cat postgresql-Mon.log | grep 'was terminated by signal 11: Segmentation fault'
2018-01-08 01:51:27.909 -03 [85039]: [102-1] user=,db=,app=,client= LOG: server process (PID 40286) was terminated by signal 11: Segmentation fault
2018-01-08 05:09:51.929 -03 [85039]: [107-1] user=,db=,app=,client= LOG: server process (PID 62427) was terminated by signal 11: Segmentation fault
2018-01-08 06:33:46.840 -03 [85039]: [112-1] user=,db=,app=,client= LOG: server process (PID 72156) was terminated by signal 11: Segmentation fault
2018-01-08 13:59:37.422 -03 [119484]: [4-1] user=,db=,app=,client= LOG: server process (PID 124190) was terminated by signal 11: Segmentation fault
2018-01-08 14:09:41.590 -03 [119484]: [9-1] user=,db=,app=,client= LOG: checkpointer process (PID 124528) was terminated by signal 11: Segmentation fault
2018-01-08 15:18:06.379 -03 [119484]: [13-1] user=,db=,app=,client= LOG: server process (PID 129026) was terminated by signal 11: Segmentation fault
2018-01-08 15:23:15.586 -03 [119484]: [18-1] user=,db=,app=,client= LOG: server process (PID 6528) was terminated by signal 11: Segmentation fault
2018-01-08 15:55:32.029 -03 [119484]: [23-1] user=,db=,app=,client= LOG: server process (PID 8762) was terminated by signal 11: Segmentation fault
2018-01-08 20:52:16.344 -03 [14804]: [5-1] user=,db=,app=,client= LOG: checkpointer process (PID 14828) was terminated by signal 11: Segmentation fault
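
The grep output above mixes ordinary backends ("server process") with the checkpointer. A minimal Python sketch for tallying the crashes by process type might look like the following. The function name `tally_segfaults` and the regular expression are illustrative assumptions that only fit the log_line_prefix shown here; the three sample lines are copied from the log above.

```python
import re
from collections import Counter

# Assumption: the message text follows the pattern seen in this log,
# "<kind> process (PID <n>) was terminated by signal 11".
CRASH_RE = re.compile(
    r"LOG:\s+(?P<kind>[\w ]+?) process \(PID (?P<pid>\d+)\) "
    r"was terminated by signal 11"
)

def tally_segfaults(lines):
    """Count signal-11 terminations per process kind (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            counts[match.group("kind")] += 1
    return counts

sample = [
    "2018-01-08 01:51:27.909 -03 [85039]: [102-1] user=,db=,app=,client= LOG: "
    "server process (PID 40286) was terminated by signal 11: Segmentation fault",
    "2018-01-08 14:09:41.590 -03 [119484]: [9-1] user=,db=,app=,client= LOG: "
    "checkpointer process (PID 124528) was terminated by signal 11: Segmentation fault",
    "2018-01-08 15:18:06.379 -03 [119484]: [13-1] user=,db=,app=,client= LOG: "
    "server process (PID 129026) was terminated by signal 11: Segmentation fault",
]

counts = tally_segfaults(sample)
print(dict(counts))
```

To run it against the real file, pass an open file object instead of the sample list, provided each log entry sits on a single line.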

(gdb) bt
#0 ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
pb=pb@entry=0x1be06d2d06644)
at bufmgr.c:4137
#1 0x0000000000801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
b=0x1be06d2d06644 <Address 0x1be06d2d06644 out of bounds>,
c=0x2fc9dfbb1815c <Address 0x2fc9dfbb1815c out of bounds>, cmp=0x6a4d20
<ckpt_buforder_comparator>)
at qsort.c:107
#2 0x0000000000801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
n=<optimized out>, es=es@entry=20, cmp=cmp@entry=0x6a4d20
<ckpt_buforder_comparator>) at qsort.c:157
#3 0x00000000008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=<optimized out>,
n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
<ckpt_buforder_comparator>) at qsort.c:203
#4 0x00000000006a81cf in BufferSync (flags=flags@entry=128) at
bufmgr.c:1863
#5 0x00000000006a8477 in CheckPointBuffers (flags=flags@entry=128) at
bufmgr.c:2578
#6 0x00000000004dd781 in CheckPointGuts (checkPointRedo=<optimized out>,
flags=<optimized out>) at xlog.c:8698
#7 0x00000000004e9faf in CreateRestartPoint (flags=<optimized out>) at
xlog.c:8856
#8 0x000000000066977c in CheckpointerMain () at checkpointer.c:490
#9 0x00000000004f2820 in AuxiliaryProcessMain (argc=argc@entry=2,
argv=argv@entry=0x7ffd8bac2b80) at bootstrap.c:429
#10 0x0000000000673330 in StartChildProcess (type=CheckpointerProcess) at
postmaster.c:5252
#11 0x0000000000674b1f in sigusr1_handler (postgres_signal_arg=<optimized
out>) at postmaster.c:4949
#12 <signal handler called>
#13 0x00007f6fc75a0b83 in __select_nocancel () from /lib64/libc.so.6
#14 0x000000000046ef32 in ServerLoop () at postmaster.c:1683
#15 0x0000000000675b69 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x27ca210)
at postmaster.c:1327
#16 0x000000000047053e in main (argc=3, argv=0x27ca210) at main.c:228

#0 ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
pb=pb@entry=0x1be06d2d06644)
at bufmgr.c:4137
a = 0x7f6fa9ef4b2c
b = 0x1be06d2d06644
#1 0x0000000000801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
b=0x1be06d2d06644 <Address 0x1be06d2d06644 out of bounds>,
c=0x2fc9dfbb1815c <Address 0x2fc9dfbb1815c out of bounds>, cmp=0x6a4d20
<ckpt_buforder_comparator>)
at qsort.c:107
No locals.
#2 0x0000000000801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
n=<optimized out>, es=es@entry=20, cmp=cmp@entry=0x6a4d20
<ckpt_buforder_comparator>) at qsort.c:157
d = 350293923535640
pa = <optimized out>
pb = <optimized out>
pc = <optimized out>
pd = <optimized out>
pl = 0x7f6fa9ef4b2c "\177\006"
pm = 0x7f6fa9f06174 "\177\006"
pn = 0xa7428f0f82428 <Address 0xa7428f0f82428 out of bounds>
d1 = <optimized out>
d2 = <optimized out>
r = <optimized out>
swaptype = 2
presorted = 0
#3 0x00000000008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=<optimized out>,
n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
<ckpt_buforder_comparator>) at qsort.c:203
pa = <optimized out>
pb = 0x7f6fa9f3a49c "\177\006"
pc = <optimized out>
pd = 0x7f6faa0c8840 "\177\006"
pl = <optimized out>
pm = <optimized out>
pn = 0x7f6faa0c8854 "\177\006"
d1 = <optimized out>
d2 = 1631160
r = <optimized out>
swaptype = 2
presorted = 0
#4 0x00000000006a81cf in BufferSync (flags=flags@entry=128) at
bufmgr.c:1863
buf_state = <optimized out>
buf_id = 1835008
num_to_scan = 111473
num_spaces = <optimized out>
num_processed = <optimized out>
num_written = <optimized out>
per_ts_stat = 0x0
last_tsid = <optimized out>
ts_heap = 0x7f6c266a8340
i = <optimized out>
mask = -2139095040
wb_context = {max_pending = 0xc287ec <checkpoint_flush_after>, nr_pending =
0, pending_writebacks = {{tag = {rnode = {spcNode = 1663, dbNode = 69060,
relNode = 412606246}, forkNum = MAIN_FORKNUM, blockNum = 428}}, {tag = {
rnode = {spcNode = 1663, dbNode = 69060, relNode =
412606246}, forkNum = MAIN_FORKNUM, blockNum = 429}}, {tag = {rnode =
{spcNode = 1663, dbNode = 69060, relNode = 412606252}, forkNum =
MAIN_FORKNUM,
blockNum = 54}}, {tag = {rnode = {spcNode = 1663, dbNode =
69060, relNode = 412606252}, forkNum = MAIN_FORKNUM, blockNum = 58}}, {tag
= {rnode = {spcNode = 1663, dbNode = 69060, relNode = 412606252},
forkNum = MAIN_FORKNUM, blockNum = 81}}, {tag = {rnode =
{spcNode = 1663, dbNode = 69060, relNode = 412606252}, forkNum =
MAIN_FORKNUM, blockNum = 98}}, {tag = {rnode = {spcNode = 1663, dbNode =
69060,
relNode = 412606252}, forkNum = MAIN_FORKNUM, blockNum =
158}}, {tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode =
412606252}, forkNum = MAIN_FORKNUM, blockNum = 160}}, {tag = {rnode =
{spcNode = 1663,
dbNode = 69060, relNode = 412606252}, forkNum =
MAIN_FORKNUM, blockNum = 173}}, {tag = {rnode = {spcNode = 1663, dbNode =
69060, relNode = 412606252}, forkNum = MAIN_FORKNUM, blockNum = 177}}, {tag
= {rnode = {
spcNode = 1663, dbNode = 69060, relNode = 412606252},
forkNum = MAIN_FORKNUM, blockNum = 191}}, {tag = {rnode = {spcNode = 1663,
dbNode = 69060, relNode = 412606257}, forkNum = MAIN_FORKNUM, blockNum =
24}}, {
tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode =
412606257}, forkNum = MAIN_FORKNUM, blockNum = 31}}, {tag = {rnode =
{spcNode = 1663, dbNode = 69060, relNode = 412606257}, forkNum =
MAIN_FORKNUM,
blockNum = 36}}, {tag = {rnode = {spcNode = 1663, dbNode =
8156905, relNode = 0}, forkNum = VISIBILITYMAP_FORKNUM, blockNum = 1}},
{tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode = 20971520},
forkNum = 14828, blockNum = 825242228}}, {tag = {rnode =
{spcNode = 825240888, dbNode = 540553261, relNode = 893005874}, forkNum =
942815793, blockNum = 909194542}}, {tag = {rnode = {spcNode = 858795296,
dbNode = 875649824, relNode = 1563963960}, forkNum =
844832826, blockNum = 825046578}}, {tag = {rnode = {spcNode = 1937055837,
dbNode = 742224485, relNode = 742220388}, forkNum = 1030778977,
blockNum = 1768710956}}, {tag = {rnode = {spcNode =
1031040613, dbNode = 1196379168, relNode = 1914708026}, forkNum =
1635021669, blockNum = 8156905}}, {tag = {rnode = {spcNode = 0, dbNode = 2,
relNode = 1},
forkNum = 1920409658, blockNum = 543519855}}, {tag = {rnode
= {spcNode = 6684672, dbNode = 14828, relNode = 825242228}, forkNum =
825240888, blockNum = 540553261}}, {tag = {rnode = {spcNode = 893005874,
dbNode = 943012401, relNode = 942945582}, forkNum =
858795296, blockNum = 875649824}}, {tag = {rnode = {spcNode = 1563963960,
dbNode = 844832826, relNode = 825047346}, forkNum = 1937055837,
blockNum = 742224485}}, {tag = {rnode = {spcNode =
742220388, dbNode = 1030778977, relNode = 1768710956}, forkNum =
1031040613, blockNum = 1196379168}}, {tag = {rnode = {spcNode = 1914708026,
dbNode = 1635021669,
relNode = 1869640818}, forkNum = 544501353, blockNum =
1918989427}}, {tag = {rnode = {spcNode = 1735289204, dbNode = 1769218106,
relNode = 1862952301}, forkNum = 1030513012, blockNum = 775501362}}, {tag =
{
rnode = {spcNode = 540096568, dbNode = 1931492211, relNode
= 543387257}, forkNum = 1701603686, blockNum = 859323763}}, {tag = {rnode =
{spcNode = 1814047799, dbNode = 1701277295, relNode = 809333875},
forkNum = 875573294, blockNum = 539783968}}, {tag = {rnode
= {spcNode = 1919252065, dbNode = 1030055777, relNode = 808463920}, forkNum
= 997400624, blockNum = 1936286752}}, {tag = {rnode = {spcNode =
1668178292,
dbNode = 809057637, relNode = 808794162}, forkNum =
742550304, blockNum = 1953719584}}, {tag = {rnode = {spcNode = 1952542057,
dbNode = 808533349, relNode = 960049456}, forkNum = 1114316856, blockNum =
8476938}},
{tag = {rnode = {spcNode = 0, dbNode = 8477036, relNode = 0},
forkNum = -951086535, blockNum = 32623}}, {tag = {rnode = {spcNode = 6,
dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag =
{rnode = {
spcNode = 0, dbNode = 4294967295, relNode = 4294967295},
forkNum = -951080648, blockNum = 32623}}, {tag = {rnode = {spcNode = 2,
dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag =
{rnode = {
spcNode = 0, dbNode = 0, relNode = 0}, forkNum =
-1951655728, blockNum = 32765}}, {tag = {rnode = {spcNode = 2343311504,
dbNode = 32765, relNode = 32}, forkNum = MAIN_FORKNUM, blockNum =
8459196}}, {tag = {
rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum =
MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 8156905, dbNode =
0, relNode = 2}, forkNum = FSM_FORKNUM, blockNum = 0}}, {tag = {rnode = {
spcNode = 0, dbNode = 17235968, relNode = 14828}, forkNum
= 825242228, blockNum = 825240888}}, {tag = {rnode = {spcNode = 540553261,
dbNode = 893005874, relNode = 942815793}, forkNum = 909194542,
blockNum = 858795296}}, {tag = {rnode = {spcNode =
875649824, dbNode = 1563963960, relNode = 844832826}, forkNum = 825046834,
blockNum = 1937055837}}, {tag = {rnode = {spcNode = 742224485, dbNode =
742220388,
relNode = 1030778977}, forkNum = 1768710956, blockNum =
1031040613}}, {tag = {rnode = {spcNode = 1196379168, dbNode = 1914708026,
relNode = 1987011429}, forkNum = 544830053, blockNum = 1953719666}}, {tag =
{
rnode = {spcNode = 544502369, dbNode = 1852403568, relNode
= 1952522356}, forkNum = 1110782752, blockNum = 959983411}}, {tag = {rnode
= {spcNode = 825312577, dbNode = 839528498, relNode = 758657328},
forkNum = 808268080, blockNum = 808591416}}, {tag = {rnode
= {spcNode = 976303418, dbNode = 892221490, relNode = 757085745}, forkNum =
1528836912, blockNum = 842544177}}, {tag = {rnode = {spcNode = 540695864,
dbNode = 942813787, relNode = 542978349}, forkNum =
1919251317, blockNum = 1650732093}}, {tag = {rnode = {spcNode = 1885416509,
dbNode = 1663843696, relNode = 1852139884}, forkNum = 1142963572,
blockNum = 1229018181}}, {tag = {rnode = {spcNode =
538982988, dbNode = 1953718636, relNode = 1836016416}, forkNum =
1952803952, blockNum = 1948279909}}, {tag = {rnode = {spcNode = 1936613746,
dbNode = 1769235297,
relNode = 1998614127}, forkNum = 1629516641, blockNum =
1869357172}}, {tag = {rnode = {spcNode = 1769218151, dbNode = 840983917,
relNode = 758657328}, forkNum = 808268080, blockNum = 808591416}}, {tag =
{rnode = {
spcNode = 976303418, dbNode = 858667058, relNode =
909523250}, forkNum = 171126829, blockNum = 10}}, {tag = {rnode = {spcNode
= 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM,
blockNum = 0}} <repeats 13 times>, {tag = {rnode = {spcNode
= 6987421, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum =
2343312048}}, {tag = {rnode = {spcNode = 32765, dbNode = 0, relNode = 0},
forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode =
{spcNode = 0, dbNode = 0, relNode = 15}, forkNum = MAIN_FORKNUM, blockNum =
3957080064}}, {tag = {rnode = {spcNode = 13747, dbNode = 6991741, relNode =
0},
forkNum = 15, blockNum = 0}}, {tag = {rnode = {spcNode =
6985347, dbNode = 0, relNode = 707}, forkNum = MAIN_FORKNUM, blockNum =
6992616}}, {tag = {rnode = {spcNode = 0, dbNode = 15, relNode = 0}, forkNum
= 15,
blockNum = 0}}, {tag = {rnode = {spcNode = 2343313312,
dbNode = 32765, relNode = 2343314560}, forkNum = 32765, blockNum = 1}},
{tag = {rnode = {spcNode = 0, dbNode = 6992908, relNode = 0}, forkNum =
2019518320,
blockNum = 6778732}}, {tag = {rnode = {spcNode = 808464432,
dbNode = 858796080, relNode = 808464432}, forkNum = 876754227, blockNum =
808464432}}, {tag = {rnode = {spcNode = 909193264, dbNode = 0, relNode =
0},
forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode =
{spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum =
0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0},
forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode =
{spcNode = 0, dbNode = 0, relNode = 1}, forkNum = MAIN_FORKNUM, blockNum =
0}}, {tag = {rnode = {spcNode = 0, dbNode = 2343313776, relNode = 32765},
forkNum = -949801968, blockNum = 32623}}, {tag = {rnode =
{spcNode = 8449408, dbNode = 0, relNode = 3344033888}, forkNum = 32623,
blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0},
forkNum = -1951653520, blockNum = 32765}}, {tag = {rnode =
{spcNode = 2343313760, dbNode = 32765, relNode = 8449415}, forkNum =
MAIN_FORKNUM, blockNum = 4}}, {tag = {rnode = {spcNode = 0, dbNode =
8449408,
relNode = 0}, forkNum = -951086535, blockNum = 32623}},
{tag = {rnode = {spcNode = 8459912, dbNode = 0, relNode = 42329571},
forkNum = MAIN_FORKNUM, blockNum = 8459902}}, {tag = {rnode = {spcNode = 0,
dbNode = 3343880761, relNode = 32623}, forkNum =
MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 3343886648, dbNode
= 32623, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag =
{rnode = {
spcNode = 0, dbNode = 0, relNode = 0}, forkNum =
-1951654688, blockNum = 32765}}, {tag = {rnode = {spcNode = 3343886648,
dbNode = 32623, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 1}}, {tag
= {rnode = {
spcNode = 0, dbNode = 0, relNode = 0}, forkNum = -4,
blockNum = 4294967295}}, {tag = {rnode = {spcNode = 0, dbNode = 16, relNode
= 6796}, forkNum = MAIN_FORKNUM, blockNum = 3344033888}}, {tag = {rnode = {
spcNode = 32623, dbNode = 9616944, relNode = 0}, forkNum
= MAIN_FORKNUM, blockNum = 32623}}, {tag = {rnode = {spcNode = 0, dbNode =
32765, relNode = 0}, forkNum = 32765, blockNum = 2343313700}}, {tag =
{rnode = {
spcNode = 32765, dbNode = 0, relNode = 0}, forkNum = 4,
blockNum = 32623}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0},
forkNum = MAIN_FORKNUM, blockNum = 4294967295}}, {tag = {rnode = {
spcNode = 4294967295, dbNode = 3343886648, relNode =
32623}, forkNum = FSM_FORKNUM, blockNum = 32623}}, {tag = {rnode = {spcNode
= 9616896, dbNode = 0, relNode = 3344033888}, forkNum = 32623, blockNum =
0}}, {
tag = {rnode = {spcNode = 0, dbNode = 8449408, relNode = 0},
forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 25,
dbNode = 32765, relNode = 8449415}, forkNum = MAIN_FORKNUM, blockNum = 2}},
{tag = {
rnode = {spcNode = 0, dbNode = 9616896, relNode = 0},
forkNum = -951086535, blockNum = 32623}}, {tag = {rnode = {spcNode =
9434791, dbNode = 0, relNode = 3343880761}, forkNum = 32623, blockNum =
28}}, {tag = {
rnode = {spcNode = 0, dbNode = 2343312880, relNode =
32765}, forkNum = -951080648, blockNum = 32623}}, {tag = {rnode = {spcNode
= 3343886648, dbNode = 32623, relNode = 0}, forkNum = MAIN_FORKNUM,
blockNum = 0}}, {
tag = {rnode = {spcNode = 4294967295, dbNode = 40, relNode =
48}, forkNum = -1951652928, blockNum = 32765}}, {tag = {rnode = {spcNode =
2343314176, dbNode = 32765, relNode = 2343314208}, forkNum = 32765,
blockNum = 9434794}}, {tag = {rnode = {spcNode = 0, dbNode
= 3, relNode = 0}, forkNum = 9434791, blockNum = 0}}, {tag = {rnode =
{spcNode = 3343880761, dbNode = 32623, relNode = 58}, forkNum =
MAIN_FORKNUM,
blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 48,
relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode =
{spcNode = 0, dbNode = 0, relNode = 0}, forkNum = InvalidForkNumber,
blockNum = 2343314320}}, {tag = {rnode = {spcNode = 32765,
dbNode = 2343314304, relNode = 32765}, forkNum = 9564263, blockNum = 0}},
{tag = {rnode = {spcNode = 2343313056, dbNode = 32765, relNode =
3343886648},
forkNum = 32623, blockNum = 3343880761}}, {tag = {rnode =
{spcNode = 32623, dbNode = 3343886648, relNode = 32623}, forkNum =
MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 4294967293, dbNode
= 4294967295,
relNode = 0}, forkNum = 10, blockNum = 229}}, {tag =
{rnode = {spcNode = 0, dbNode = 3343886648, relNode = 32623}, forkNum = 32,
blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0},
forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode =
{spcNode = 32765, dbNode = 2343314149, relNode = 32765}, forkNum =
MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 2343314464, dbNode
= 32765,
relNode = 2343314448}, forkNum = 32765, blockNum =
9734857}}, {tag = {rnode = {spcNode = 0, dbNode = 8448198, relNode = 0},
forkNum = 9734855, blockNum = 0}}, {tag = {rnode = {spcNode = 3343880761,
dbNode = 32623, relNode = 9616944}, forkNum =
MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 32623, dbNode = 0,
relNode = 0}, forkNum = 9434791, blockNum = 0}}, {tag = {rnode = {spcNode =
0, dbNode = 0,
relNode = 3}, forkNum = MAIN_FORKNUM, blockNum =
9434794}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum
= MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 2343313296,
dbNode = 32765,
relNode = 2343313232}, forkNum = 32765, blockNum = 0}},
{tag = {rnode = {spcNode = 32765, dbNode = 2343314608, relNode = 32765},
forkNum = MAIN_FORKNUM, blockNum = 32765}}, {tag = {rnode = {spcNode = 0,
dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM,
blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 4, relNode = 32623},
forkNum = 32, blockNum = 48}}, {tag = {rnode = {spcNode = 0, dbNode =
32765,
relNode = 0}, forkNum = 32765, blockNum = 0}}, {tag =
{rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = INIT_FORKNUM,
blockNum = 4294967295}}, {tag = {rnode = {spcNode = 3343886648, dbNode =
32623,
relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}},
{tag = {rnode = {spcNode = 0, dbNode = 4294967295, relNode = 4294967295},
forkNum = -951080648, blockNum = 32623}}, {tag = {rnode = {spcNode = 0,
dbNode = 32623, relNode = 32}, forkNum = 48, blockNum =
2343314848}}, {tag = {rnode = {spcNode = 32765, dbNode = 0, relNode = 0},
forkNum = 9734855, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode =
0,
relNode = 6}, forkNum = 32623, blockNum = 9734860}}, {tag
= {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM,
blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 32765,
relNode = 2343314533}, forkNum = 32765, blockNum = 0}},
{tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 32765}, forkNum =
MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0,
relNode = 4294967295}, forkNum = InvalidForkNumber,
blockNum = 3343886648}}, {tag = {rnode = {spcNode = 32623, dbNode = 0,
relNode = 32623}, forkNum = 16, blockNum = 48}}, {tag = {rnode = {spcNode =
2343315152,
dbNode = 32765, relNode = 2343314944}, forkNum = 32765,
blockNum = 9434791}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode =
0}, forkNum = INIT_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode =
9434794,
dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM,
blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 4294967295, relNode
= 4294967295}, forkNum = -1951652320, blockNum = 32765}}, {tag = {rnode = {
spcNode = 2343314960, dbNode = 32765, relNode =
41731220}, forkNum = MAIN_FORKNUM, blockNum = 9795134}}, {tag = {rnode =
{spcNode = 0, dbNode = 41731192, relNode = 0}, forkNum = -951086535,
blockNum = 32623}}, {
tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 4},
forkNum = 10, blockNum = 16}}, {tag = {rnode = {spcNode = 48, dbNode =
2343315296, relNode = 32765}, forkNum = -1951652208, blockNum = 32765}},
{tag = {rnode = {
spcNode = 0, dbNode = 0, relNode = 0}, forkNum =
MAIN_FORKNUM, blockNum = 3}}, {tag = {rnode = {spcNode = 32623, dbNode =
3343886648, relNode = 32623}, forkNum = -1951653488, blockNum = 32765}},
{tag = {rnode = {
spcNode = 2343313744, dbNode = 32765, relNode =
4294967295}, forkNum = InvalidForkNumber, blockNum = 3343886648}}, {tag =
{rnode = {spcNode = 32623, dbNode = 0, relNode = 32623}, forkNum =
MAIN_FORKNUM,
blockNum = 48}}, {tag = {rnode = {spcNode = 0, dbNode = 0,
relNode = 0}, forkNum = 32765, blockNum = 9734855}}, {tag = {rnode =
{spcNode = 0, dbNode = 32, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum =
0}}, {
tag = {rnode = {spcNode = 3343886648, dbNode = 32623, relNode
= 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0,
dbNode = 0, relNode = 32765}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag
= {
rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum =
MAIN_FORKNUM, blockNum = 4294967295}}, {tag = {rnode = {spcNode =
4294967295, dbNode = 3343886648, relNode = 32623}, forkNum = 8, blockNum =
16}}, {tag = {
rnode = {spcNode = 2309886240, dbNode = 1127760177, relNode
= 2343313776}, forkNum = 32765, blockNum = 2343314384}}, {tag = {rnode =
{spcNode = 32765, dbNode = 2343313776, relNode = 32765}, forkNum = 8449408,
blockNum = 0}}, {tag = {rnode = {spcNode = 2343314152,
dbNode = 32765, relNode = 1}, forkNum = MAIN_FORKNUM, blockNum = 1023}},
{tag = {rnode = {spcNode = 0, dbNode = 2343314384, relNode = 32765},
forkNum = -950277931, blockNum = 32623}}, {tag = {rnode =
{spcNode = 4222451713, dbNode = 0, relNode = 2343314384}, forkNum = 32765,
blockNum = 2343314384}}, {tag = {rnode = {spcNode = 32765, dbNode =
2343314384,
relNode = 32765}, forkNum = -1951652912, blockNum =
32765}}, {tag = {rnode = {spcNode = 2343314409, dbNode = 32765, relNode =
2343315407}, forkNum = 32765, blockNum = 2343314384}}, {tag = {rnode = {
spcNode = 32765, dbNode = 2343315407, relNode = 32765},
forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0,
dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag =
{rnode = {
spcNode = 0, dbNode = 0, relNode = 0}, forkNum = 12,
blockNum = 4}}, {tag = {rnode = {spcNode = 9616896, dbNode = 0, relNode =
2343305216}, forkNum = 942833661, blockNum = 0}}, {tag = {rnode = {spcNode
= 0,
dbNode = 2343314512, relNode = 32765}, forkNum =
-1951652784, blockNum = 32765}}, {tag = {rnode = {spcNode = 12, dbNode = 0,
relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 8}}, {tag = {rnode =
{spcNode = 48,
dbNode = 2343315808, relNode = 32765}, forkNum =
InvalidForkNumber, blockNum = 808615933}}, {tag = {rnode = {spcNode =
2343314048, dbNode = 32765, relNode = 2343314576}, forkNum = 32765,
blockNum = 3347493952}}, {
tag = {rnode = {spcNode = 32623, dbNode = 0, relNode = 0},
forkNum = -947468368, blockNum = 32623}}, {tag = {rnode = {spcNode =
1951653217, dbNode = 4294934530, relNode = 2343314080}, forkNum = 32765,
blockNum = 2343314079}}, {tag = {rnode = {spcNode = 32765, dbNode =
2343314624, relNode = 32765}, forkNum = 12, blockNum = 0}}, {tag = {rnode =
{spcNode = 9616896, dbNode = 0, relNode = 2343314408},
forkNum = 32765, blockNum = 1}}, {tag = {rnode = {spcNode =
0, dbNode = 2343314096, relNode = 32765}, forkNum = 6796, blockNum = 0}},
{tag = {rnode = {spcNode = 155648, dbNode = 0, relNode = 526916352},
forkNum = 32620, blockNum = 6}}, {tag = {rnode = {spcNode =
0, dbNode = 3347498848, relNode = 32623}, forkNum = -947468448, blockNum =
32623}}, {tag = {rnode = {spcNode = 1, dbNode = 0, relNode = 1951653025},
forkNum = -32766, blockNum = 2052}}, {tag = {rnode =
{spcNode = 0, dbNode = 513, relNode = 0}, forkNum = 64, blockNum = 0}},
{tag = {rnode = {spcNode = 561, dbNode = 155, relNode = 2343314272},
forkNum = 32765,
blockNum = 8}}, {tag = {rnode = {spcNode = 0, dbNode = 1,
relNode = 0}, forkNum = 118, blockNum = 120}}, {tag = {rnode = {spcNode =
0, dbNode = 0, relNode = 2343314271}, forkNum = 32765, blockNum = 1}}, {tag
= {
rnode = {spcNode = 0, dbNode = 118, relNode = 120}, forkNum
= MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 124, dbNode =
32765, relNode = 6987421}, forkNum = MAIN_FORKNUM, blockNum = 4036731840}},
{
tag = {rnode = {spcNode = 13747, dbNode = 2343314384, relNode
= 32765}, forkNum = 124, blockNum = 0}}, {tag = {rnode = {spcNode = 384,
dbNode = 0, relNode = 6}, forkNum = MAIN_FORKNUM, blockNum = 3347498848}}, {
tag = {rnode = {spcNode = 32623, dbNode = 42324488, relNode =
0}, forkNum = 42324506, blockNum = 0}}, {tag = {rnode = {spcNode = 4,
dbNode = 0, relNode = 3344354262}, forkNum = 32623, blockNum = 42324416}},
{tag = {
rnode = {spcNode = 0, dbNode = 42324464, relNode = 0},
forkNum = -192, blockNum = 4294967295}}, {tag = {rnode = {spcNode = 0,
dbNode = 0, relNode = 3}, forkNum = MAIN_FORKNUM, blockNum = 3344353169}},
{tag = {
rnode = {spcNode = 32623, dbNode = 128, relNode = 0},
forkNum = INIT_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 128,
dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 4029030200}},
{tag = {
rnode = {spcNode = 13747, dbNode = 3, relNode = 0}, forkNum
= INIT_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 3344353005,
dbNode = 32623, relNode = 41743440}, forkNum = MAIN_FORKNUM, blockNum =
6985401}}, {
tag = {rnode = {spcNode = 0, dbNode = 42324416, relNode = 0},
forkNum = 4895616, blockNum = 0}}, {tag = {rnode = {spcNode = 41743440,
dbNode = 0, relNode = 6985401}, forkNum = MAIN_FORKNUM, blockNum =
42324416}}, {
tag = {rnode = {spcNode = 0, dbNode = 6876528, relNode = 0},
forkNum = 41733255, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode =
0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {
spcNode = 0, dbNode = 0, relNode = 0}, forkNum =
MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 2343314816, dbNode
= 4, relNode = 3343787445}, forkNum = 32623, blockNum = 41680896}}, {tag =
{rnode = {
spcNode = 0, dbNode = 0, relNode = 0}, forkNum =
41867936, blockNum = 0}}...}}
#5 0x00000000006a8477 in CheckPointBuffers (flags=flags@entry=128) at
bufmgr.c:2578
No locals.
#6 0x00000000004dd781 in CheckPointGuts (checkPointRedo=<optimized out>,
flags=<optimized out>) at xlog.c:8698
No locals.
#7 0x00000000004e9faf in CreateRestartPoint (flags=<optimized out>) at
xlog.c:8856
lastCheckPointRecPtr = <optimized out>
lastCheckPointEndPtr = <optimized out>
lastCheckPoint = <optimized out>
PriorRedoPtr = <optimized out>
xtime = <optimized out>
__func__ = "CreateRestartPoint"
#8 0x000000000066977c in CheckpointerMain () at checkpointer.c:490
ckpt_performed = 0 '\000'
do_restartpoint = 1 '\001'
flags = 128
do_checkpoint = <optimized out>
now = 1515455518
elapsed_secs = 300
cur_timeout = <optimized out>
rc = <optimized out>
local_sigjmp_buf = {{__jmpbuf = {140726946769872,
-6702958939962280073, 2, 8447397, 0, 41868768, -6702958940010514569,
6701736465640912759}, __mask_was_saved = 1, __saved_mask = {__val =
{18446744066192964103,
140726946769792, 2, 8447397, 0, 41868768, 6968122, 0, 0, 0,
0, 0, 4, 8, 6700557, 1}}}}
checkpointer_context = 0x27cbb88
__func__ = "CheckpointerMain"
#9 0x00000000004f2820 in AuxiliaryProcessMain (argc=argc@entry=2,
argv=argv@entry=0x7ffd8bac2b80) at bootstrap.c:429
progname = 0x80e5a5 "postgres"
flag = <optimized out>
userDoption = 0x0
__func__ = "AuxiliaryProcessMain"
#10 0x0000000000673330 in StartChildProcess (type=CheckpointerProcess) at
postmaster.c:5252
pid = <optimized out>
av = {0x80e5a5 "postgres", 0x7ffd8bac2bd0 "-x4", 0x0,
0xffec105a240ead00 <Address 0xffec105a240ead00 out of bounds>, 0x1 <Address
0x1 out of bounds>, 0x39e6 <Address 0x39e6 out of bounds>, 0x7ffd8bac2bf4
"",
0x27ede00 "\260\215\301", 0x7f6fc9248800 "", 0x0}
ac = 2
typebuf =
"-x4\000\000\000\000\000vCg\000\000\000\000\000\360>N\307o\177\000\000\065\233N\307o\177\000"
#11 0x0000000000674b1f in sigusr1_handler (postgres_signal_arg=<optimized
out>) at postmaster.c:4949
save_errno = 0
__func__ = "sigusr1_handler"
#12 <signal handler called>
No symbol table info available.
#13 0x00007f6fc75a0b83 in __select_nocancel () from /lib64/libc.so.6
No symbol table info available.
#14 0x000000000046ef32 in ServerLoop () at postmaster.c:1683
timeout = {tv_sec = 59, tv_usec = 999708}
rmask = {fds_bits = {120, 0 <repeats 15 times>}}
selres = <optimized out>
now = <optimized out>
readmask = {fds_bits = {120, 0 <repeats 15 times>}}
last_lockfile_recheck_time = 1515438348
last_touch_time = 1515438348
__func__ = "ServerLoop"
#15 0x0000000000675b69 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x27ca210)
at postmaster.c:1327
opt = <optimized out>
status = <optimized out>
userDoption = <optimized out>
listen_addr_saved = <optimized out>
i = <optimized out>
output_config_variable = <optimized out>
__func__ = "PostmasterMain"
#16 0x000000000047053e in main (argc=3, argv=0x27ca210) at main.c:228
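
As a sanity check on the outer frames, some of the numbers in the trace can be cross-checked against the configuration. This is a small sketch, assuming the default 8 kB block size (the post does not state it) and assuming that `buf_id` in frame #4 is the value left over after a scan of all shared buffers, and that `pn` in frame #3 points one past the end of the array being sorted.

```python
# Values copied from the post; BLCKSZ is an assumption (PostgreSQL default).
BLCKSZ = 8192
shared_buffers_bytes = 14 * 1024**3   # shared_buffers = 14GB

buf_id = 1835008                      # frame #4, BufferSync
num_to_scan = 111473                  # frame #4, also n@entry in frame #3
es = 20                               # element size passed to pg_qsort
outer_a = 0x7f6fa9ea8380              # frame #3, start of the sorted array
outer_pn = 0x7f6faa0c8854             # frame #3, pn

nbuffers = shared_buffers_bytes // BLCKSZ
array_bytes = outer_pn - outer_a

# buf_id equals the number of shared buffers implied by the configuration.
print(nbuffers, nbuffers == buf_id)
# The outer array spans exactly num_to_scan elements of es bytes each.
print(array_bytes, array_bytes == num_to_scan * es)
```

Under those assumptions both comparisons hold, which suggests the outer pg_qsort call (frame #3) was working with sane arguments and the bad values appear only in the inner call.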

Kind Regards,
Glauco Torres

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Glauco Torres (#1)
Re: Segmentation fault with core dump

Glauco Torres <torres.glauco@gmail.com> writes:

(gdb) bt
#0 ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
pb=pb@entry=0x1be06d2d06644)
at bufmgr.c:4137
#1 0x0000000000801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
b=0x1be06d2d06644 <Address 0x1be06d2d06644 out of bounds>,
c=0x2fc9dfbb1815c <Address 0x2fc9dfbb1815c out of bounds>, cmp=0x6a4d20
<ckpt_buforder_comparator>)
at qsort.c:107
#2 0x0000000000801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
n=<optimized out>, es=es@entry=20, cmp=cmp@entry=0x6a4d20
<ckpt_buforder_comparator>) at qsort.c:157
#3 0x00000000008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=<optimized out>,
n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
<ckpt_buforder_comparator>) at qsort.c:203
#4 0x00000000006a81cf in BufferSync (flags=flags@entry=128) at
bufmgr.c:1863

Hm. I'm not normally one to jump to the conclusion that something is a
compiler bug, but it's hard to explain this stack trace any other way.
The value of "n" passed to the inner invocation of pg_qsort should not
have been more than 29914, but working from either the value of d or the
value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
looks a lot more like an address in the array than a proper value of n.
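
For readers who want to reproduce that arithmetic, here is a minimal Python sketch using only values printed in frames #1 and #2 of the full backtrace. It assumes that pg_qsort computes `d = (n / 8) * es` and `pn = a + (n - 1) * es` before calling med3, as the classic Bentley-McIlroy qsort does; the first assertion checks that the med3 arguments in the trace are consistent with that reading.

```python
# Values copied from frames #1 and #2 of the backtrace.
es = 20                      # element size
a = 0x7f6fa9ef4b2c           # start of the inner sub-array (also pl)
b = 0x1be06d2d06644          # second med3 argument, out of bounds
c = 0x2fc9dfbb1815c          # third med3 argument, out of bounds
d = 350293923535640          # local variable d in frame #2
pn = 0xa7428f0f82428         # local variable pn in frame #2

# Assumption: med3 was called as med3(pl, pl + d, pl + 2 * d, cmp).
assert b == a + d and c == a + 2 * d

# Assumption: d = (n / 8) * es with integer division, so this recovers
# n rounded down to a multiple of 8.
n_from_d = (d // es) * 8

# Assumption: pn = a + (n - 1) * es, so this recovers n exactly.
n_from_pn = (pn - a) // es + 1

print(hex(n_from_d))    # 0x7f6fa9f3a470
print(hex(n_from_pn))   # 0x7f6fa9f3a474
```

Both routes land within a few units of each other on a value that looks like a pointer into the array, far above any plausible element count.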

I suppose this might be due to a corrupted copy of the postgres executable
rather than an actual compiler bug. Did you build it yourself?

BTW, I notice that ckpt_buforder_comparator assumes it can't possibly
see the same block ID twice in the array, which I think is an
unsupportable assumption. But I cannot see a way that that could lead
to a crash in pg_qsort --- at worst it might cause a little inefficiency.
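
To illustrate the point about a comparator that never reports equality, here is a toy Python sketch. It is not the PostgreSQL code: `never_equal_cmp` is a hypothetical stand-in that orders (tablespace, relation, fork, block) tuples and returns 1 even for identical items, and Python's built-in sort is used in place of pg_qsort.

```python
from functools import cmp_to_key

def never_equal_cmp(a, b):
    """Toy comparator that never returns 0 (hypothetical, for illustration)."""
    if a < b:
        return -1
    return 1  # also returned when a == b

# (tablespace, relation, fork, block); one block appears twice on purpose.
items = [
    (1663, 7, 0, 2),
    (1663, 7, 0, 1),
    (1663, 7, 0, 2),
    (1663, 3, 0, 9),
]

ordered = sorted(items, key=cmp_to_key(never_equal_cmp))
print(ordered)
```

The result is still correctly ordered despite the duplicate entry, which fits the expectation that such a comparator costs some efficiency at most and does not by itself explain a crash.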

regards, tom lane

#3 Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#2)
Re: Segmentation fault with core dump

On Wed, Jan 10, 2018 at 11:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Glauco Torres <torres.glauco@gmail.com> writes:

(gdb) bt
#0 ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
pb=pb@entry=0x1be06d2d06644)
at bufmgr.c:4137
#1 0x0000000000801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
b=0x1be06d2d06644 <Address 0x1be06d2d06644 out of bounds>,
c=0x2fc9dfbb1815c <Address 0x2fc9dfbb1815c out of bounds>, cmp=0x6a4d20
<ckpt_buforder_comparator>)
at qsort.c:107
#2 0x0000000000801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
n=<optimized out>, es=es@entry=20, cmp=cmp@entry=0x6a4d20
<ckpt_buforder_comparator>) at qsort.c:157
#3 0x00000000008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=<optimized out>,
n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
<ckpt_buforder_comparator>) at qsort.c:203
#4 0x00000000006a81cf in BufferSync (flags=flags@entry=128) at
bufmgr.c:1863

Hm. I'm not normally one to jump to the conclusion that something is a
compiler bug, but it's hard to explain this stack trace any other way.
The value of "n" passed to the inner invocation of pg_qsort should not
have been more than 29914, but working from either the value of d or the
value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
looks a lot more like an address in the array than a proper value of n.

I suppose this might be due to a corrupted copy of the postgres executable
rather than an actual compiler bug. Did you build it yourself?

BTW, I notice that ckpt_buforder_comparator assumes it can't possibly
see the same block ID twice in the array, which I think is an
unsupportable assumption. But I cannot see a way that that could lead
to a crash in pg_qsort --- at worst it might cause a little inefficiency.

simple
SELECT version();
...can give a lot of hints on who/what compiled the database if you don't know.

merlin

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: Segmentation fault with core dump

Tom Lane wrote:

Glauco Torres <torres.glauco@gmail.com> writes:

(gdb) bt
#0 ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
pb=pb@entry=0x1be06d2d06644)
at bufmgr.c:4137
#1 0x0000000000801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
b=0x1be06d2d06644 <Address 0x1be06d2d06644 out of bounds>,
c=0x2fc9dfbb1815c <Address 0x2fc9dfbb1815c out of bounds>, cmp=0x6a4d20
<ckpt_buforder_comparator>)
at qsort.c:107
#2 0x0000000000801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
n=<optimized out>, es=es@entry=20, cmp=cmp@entry=0x6a4d20
<ckpt_buforder_comparator>) at qsort.c:157
#3 0x00000000008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=<optimized out>,
n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
<ckpt_buforder_comparator>) at qsort.c:203
#4 0x00000000006a81cf in BufferSync (flags=flags@entry=128) at
bufmgr.c:1863

Hm. I'm not normally one to jump to the conclusion that something is a
compiler bug, but it's hard to explain this stack trace any other way.
The value of "n" passed to the inner invocation of pg_qsort should not
have been more than 29914, but working from either the value of d or the
value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
looks a lot more like an address in the array than a proper value of n.

I suppose this might be due to a corrupted copy of the postgres executable
rather than an actual compiler bug. Did you build it yourself?

Hmm, is this something that can be explained by using a different
postgres executable in GDB than the one that produced the core file?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#4)
Re: Segmentation fault with core dump

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

Hm. I'm not normally one to jump to the conclusion that something is a
compiler bug, but it's hard to explain this stack trace any other way.
The value of "n" passed to the inner invocation of pg_qsort should not
have been more than 29914, but working from either the value of d or the
value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
looks a lot more like an address in the array than a proper value of n.

Hmm, is this something that can be explained by using a different
postgres executable in GDB than the one that produced the core file?

That would result in nonsensical gdb output, most likely; but Glauco's
trace is internally consistent enough that I doubt gdb is lying to us.
In any case, the crash is an observable fact :-(

regards, tom lane

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Merlin Moncure (#3)
Re: Segmentation fault with core dump

Merlin Moncure wrote:

simple
SELECT version();
...can give a lot of hints on who/what compiled the database if you don't know.

Probably, this is why Glauco included the output in his opening letter.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Glauco Torres
torres.glauco@gmail.com
In reply to: Tom Lane (#5)
Re: Segmentation fault with core dump

That would result in nonsensical gdb output, most likely; but Glauco's
trace is internally consistent enough that I doubt gdb is lying to us.
In any case, the crash is an observable fact :-(

The system is a CentOS 7, and PG was installed using PGDG's YUM repository.

We are pretty sure that the binary that crashed is the same one we passed
to the `gdb` command. More specifically, the path used was
`/usr/pgsql-9.6/bin/postmaster`, and we have been running 9.6.6 (the most
recent 9.6 minor release as of today) for a few weeks, so there should not
have been any upgrade of the binaries since the server came up, especially
because we restarted the service in order to enable core dump creation.
This is not the first crash (although it is the only one with a core dump
so far). We can send a new gdb stack trace if it happens again.

More information:
$ uname -a
Linux pg-iii.br 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC
2017 x86_64 x86_64 x86_64 GNU/Linux

$ /usr/pgsql-9.6/bin/pg_config
BINDIR = /usr/pgsql-9.6/bin
DOCDIR = /usr/pgsql-9.6/doc
HTMLDIR = /usr/pgsql-9.6/doc/html
INCLUDEDIR = /usr/pgsql-9.6/include
PKGINCLUDEDIR = /usr/pgsql-9.6/include
INCLUDEDIR-SERVER = /usr/pgsql-9.6/include/server
LIBDIR = /usr/pgsql-9.6/lib
PKGLIBDIR = /usr/pgsql-9.6/lib
LOCALEDIR = /usr/pgsql-9.6/share/locale
MANDIR = /usr/pgsql-9.6/share/man
SHAREDIR = /usr/pgsql-9.6/share
SYSCONFDIR = /etc/sysconfig/pgsql
PGXS = /usr/pgsql-9.6/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--enable-rpath' '--prefix=/usr/pgsql-9.6'
'--includedir=/usr/pgsql-9.6/include' '--mandir=/usr/pgsql-9.6/share/man'
'--datadir=/usr/pgsql-9.6/share' '--libdir=/usr/pgsql-9.6/lib'
'--with-perl' '--with-python' '--with-tcl' '--with-tclconfig=/usr/lib64'
'--with-openssl' '--with-pam' '--with-gssapi'
'--with-includes=/usr/include' '--with-libraries=/usr/lib64' '--enable-nls'
'--enable-dtrace' '--with-uuid=e2fs' '--with-libxml' '--with-libxslt'
'--with-ldap' '--with-selinux' '--with-systemd'
'--with-system-tzdata=/usr/share/zoneinfo'
'--sysconfdir=/etc/sysconfig/pgsql' '--docdir=/usr/pgsql-9.6/doc'
'--htmldir=/usr/pgsql-9.6/doc/html' 'CFLAGS=-O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic'
'LDFLAGS=-Wl,--as-needed'
CC = gcc
CPPFLAGS = -DFRONTEND -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
-m64 -mtune=generic
CFLAGS_SL = -fPIC
LDFLAGS = -L../../src/common -Wl,--as-needed -L/usr/lib64 -Wl,--as-needed
-Wl,-rpath,'/usr/pgsql-9.6/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lselinux -lxslt -lxml2 -lpam -lssl -lcrypto
-lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 9.6.6

Regards,
Glauco

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Glauco Torres (#7)
Re: Segmentation fault with core dump

Glauco Torres <torres.glauco@gmail.com> writes:

The system is a CentOS 7, and PG was installed using PGDG's YUM repository.

Might be worth comparing sha1sum's of the postgres executable between
this server and one that's not having the problem, just to eliminate
the corrupted-binary theory.

regards, tom lane
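
Where sha1sum is not convenient, the same comparison can be scripted. This is a minimal sketch using Python's standard hashlib module. The function names sha1_of_file and matches_digest are hypothetical helpers, not part of any PostgreSQL tooling. The idea is to compute the digest on each server and compare the hex strings.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA-1 equals a digest obtained on another host."""
    return sha1_of_file(path) == expected_hex.strip().lower()


# Example use (path as given in this thread; run on the suspect server and
# pass in the digest printed on a healthy one):
#   matches_digest("/usr/pgsql-9.6/bin/postmaster", "<digest from other host>")
```

A matching digest only shows that the file on disk is identical. It says nothing about what is loaded in memory or how the hardware executes it.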

#9Glauco Torres
torres.glauco@gmail.com
In reply to: Tom Lane (#8)
Re: Segmentation fault with core dump

Might be worth comparing sha1sum's of the postgres executable between
this server and one that's not having the problem, just to eliminate
the corrupted-binary theory.

The output is the same on both servers:

$ sha1sum /usr/pgsql-9.6/bin/postmaster
56bcb4d644a8b00f07e9bd42f9a3f02be7ff2523 /usr/pgsql-9.6/bin/postmaster

Today I left the server running to capture another core dump. The backtrace follows:

(gdb) bt
#0 tbm_comparator (left=left@entry=0x1d5ca08, right=right@entry=0x3acdb70)
at tidbitmap.c:1031
#1 0x0000000000801268 in med3 (a=0x1d5ca08 "\350>\337\001", b=0x3acdb70
<Address 0x3acdb70 out of bounds>, c=0x583ecd8 <Address 0x583ecd8 out of
bounds>, cmp=0x603ca0 <tbm_comparator>) at qsort.c:107
#2 0x0000000000801621 in pg_qsort (a=0x1d5ca08, n=<optimized out>,
n@entry=10477,
es=es@entry=8, cmp=cmp@entry=0x603ca0 <tbm_comparator>) at qsort.c:157
#3 0x0000000000604a7b in tbm_begin_iterate (tbm=tbm@entry=0x1dd8a00) at
tidbitmap.c:635
#4 0x00000000005d3a89 in BitmapHeapNext (node=node@entry=0x1dc2ef0) at
nodeBitmapHeapscan.c:110
#5 0x00000000005caf1a in ExecScanFetch (recheckMtd=0x5d35b0
<BitmapHeapRecheck>, accessMtd=0x5d35f0 <BitmapHeapNext>, node=0x1dc2ef0)
at execScan.c:95
#6 ExecScan (node=node@entry=0x1dc2ef0, accessMtd=accessMtd@entry=0x5d35f0
<BitmapHeapNext>, recheckMtd=recheckMtd@entry=0x5d35b0 <BitmapHeapRecheck>)
at execScan.c:180
#7 0x00000000005d3cff in ExecBitmapHeapScan (node=node@entry=0x1dc2ef0) at
nodeBitmapHeapscan.c:440
#8 0x00000000005c3fb8 in ExecProcNode (node=node@entry=0x1dc2ef0) at
execProcnode.c:437
#9 0x00000000005de877 in ExecNestLoop (node=node@entry=0x1dc0148) at
nodeNestloop.c:174
#10 0x00000000005c3f28 in ExecProcNode (node=node@entry=0x1dc0148) at
execProcnode.c:476
#11 0x00000000005de7d7 in ExecNestLoop (node=node@entry=0x1dbfdd8) at
nodeNestloop.c:123
#12 0x00000000005c3f28 in ExecProcNode (node=node@entry=0x1dbfdd8) at
execProcnode.c:476
#13 0x00000000005d624d in MultiExecHash (node=node@entry=0x1dbf9b8) at
nodeHash.c:104
#14 0x00000000005c40c0 in MultiExecProcNode (node=node@entry=0x1dbf9b8) at
execProcnode.c:577
#15 0x00000000005d6cb9 in ExecHashJoin (node=node@entry=0x1dbe688) at
nodeHashjoin.c:178
#16 0x00000000005c3f08 in ExecProcNode (node=node@entry=0x1dbe688) at
execProcnode.c:484
#17 0x00000000005de7d7 in ExecNestLoop (node=node@entry=0x1dbc6e0) at
nodeNestloop.c:123
#18 0x00000000005c3f28 in ExecProcNode (node=node@entry=0x1dbc6e0) at
execProcnode.c:476
#19 0x00000000005de7d7 in ExecNestLoop (node=node@entry=0x1dbc520) at
nodeNestloop.c:123
#20 0x00000000005c3f28 in ExecProcNode (node=0x1dbc520) at
execProcnode.c:476
#21 0x00000000005cf619 in fetch_input_tuple (aggstate=aggstate@entry=0x1dbbc48)
at nodeAgg.c:598
#22 0x00000000005d10ff in agg_retrieve_direct (aggstate=0x1dbbc48) at
nodeAgg.c:2067
#23 ExecAgg (node=node@entry=0x1dbbc48) at nodeAgg.c:1892
#24 0x00000000005c3ec8 in ExecProcNode (node=node@entry=0x1dbbc48) at
execProcnode.c:503
#25 0x00000000005c03a7 in ExecutePlan (dest=0x1b607c8, direction=<optimized
out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT,
use_parallel_mode=<optimized out>, planstate=0x1dbbc48, estate=0x1dbba58)
at execMain.c:1566
#26 standard_ExecutorRun (queryDesc=0x1c98de0, direction=<optimized out>,
count=0) at execMain.c:338
#27 0x00007f016577e0a5 in pgss_ExecutorRun (queryDesc=0x1c98de0,
direction=ForwardScanDirection, count=0) at pg_stat_statements.c:877
#28 0x00000000006d3a97 in PortalRunSelect (portal=portal@entry=0x1ad9278,
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807,
dest=dest@entry=0x1b607c8) at pquery.c:948
#29 0x00000000006d4eab in PortalRun (portal=0x1ad9278,
count=9223372036854775807, isTopLevel=<optimized out>, dest=0x1b607c8,
altdest=0x1b607c8, completionTag=0x7ffdfe32a700 "") at pquery.c:789
#30 0x00000000006d2371 in PostgresMain (argc=<optimized out>,
argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at
postgres.c:1969
#31 0x000000000046f8d4 in BackendRun (port=0x1add6d0) at postmaster.c:4294
#32 BackendStartup (port=0x1add6d0) at postmaster.c:3968
#33 ServerLoop () at postmaster.c:1719
#34 0x0000000000675b69 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x1ab1210)
at postmaster.c:1327
#35 0x000000000047053e in main (argc=3, argv=0x1ab1210) at main.c:228

Regards,
Glauco

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Glauco Torres (#9)
Re: Segmentation fault with core dump

Glauco Torres <torres.glauco@gmail.com> writes:

Today I left the server running to capture another core dump. The backtrace follows:

(gdb) bt
#0 tbm_comparator (left=left@entry=0x1d5ca08, right=right@entry=0x3acdb70)
at tidbitmap.c:1031
#1 0x0000000000801268 in med3 (a=0x1d5ca08 "\350>\337\001", b=0x3acdb70
<Address 0x3acdb70 out of bounds>, c=0x583ecd8 <Address 0x583ecd8 out of
bounds>, cmp=0x603ca0 <tbm_comparator>) at qsort.c:107
#2 0x0000000000801621 in pg_qsort (a=0x1d5ca08, n=<optimized out>,
n@entry=10477,
es=es@entry=8, cmp=cmp@entry=0x603ca0 <tbm_comparator>) at qsort.c:157
#3 0x0000000000604a7b in tbm_begin_iterate (tbm=tbm@entry=0x1dd8a00) at
tidbitmap.c:635

Oh ho! I was wondering to myself "if pg_qsort is broken, why isn't
his system falling over everywhere?". The answer evidently is
"yes, it is falling over everywhere". This symptom looks pretty much
like what you had before, ie far-out-of-range addresses getting passed
to med3(), but the qsort call site is completely different.

Since you've eliminated the idea that the executable file per se
is different from your working servers, I think we're now down to
the conclusion that there's something flaky about the hardware
on this server. Maybe it's misexecuting integer divide every so
often --- though it's hard to guess why only pg_qsort would be
affected.

regards, tom lane
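
The claim that the second crash shows the same symptom can be checked numerically as well. The sketch below again assumes the med3() sampling rule from the 9.6 pg_qsort source (candidates at a, a + d, a + 2*d with d = (n / 8) * es). The variable names are illustrative, and the numbers are copied from frames #1 and #2 of the tbm_comparator backtrace.

```python
# Values copied from frames #1 and #2 of the tbm_comparator backtrace.
ES = 8              # element size (es)
N_ENTRY = 10477     # n@entry, the element count the caller passed in
A = 0x1d5ca08       # med3 argument a (start of the array)
B = 0x3acdb70       # med3 argument b
C = 0x583ecd8       # med3 argument c

spacing = B - A
array_end = A + N_ENTRY * ES            # one byte past the last element
last_element = A + (N_ENTRY - 1) * ES   # address of the final element

# ASSUMPTION: spacing == (n // 8) * es, so n lies in this 8-wide window.
implied_low = 8 * (spacing // ES)
implied_high = implied_low + 7

print("samples equally spaced:", C - B == spacing)
print("b lies beyond the end of the array:", B >= array_end)
print("implied n between", implied_low, "and", implied_high,
      "versus n@entry =", N_ENTRY)
print("implied n window contains the last element's address:",
      implied_low <= last_element <= implied_high)
```

With these inputs the implied count is in the tens of millions against an n@entry of 10477, and the window happens to contain the address of the array's last element. That fits the earlier remark that the bad "n" looks like an address in the array, but it is arithmetic on the posted values, not a diagnosis of the cause.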