pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Started by Bruce Momjianover 14 years ago30 messages

Bruce Momjian

bruce@momjian.us

over 14 years ago

Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo
builfarm member.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/d5321842528dfb73f8254a48556b4adb1b6d1c5a

Modified Files
--------------
contrib/cube/cubedata.h | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Bruce Momjian (#1)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Bruce Momjian <bruce@momjian.us> writes:

Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo
builfarm member.

mongoose is still crashing, so it must have been some other aspect of
commit 4bd7333 that caused this.

regards, tom lane

Bruce Momjian

bruce@momjian.us

over 14 years ago

In reply to: Tom Lane (#2)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo
builfarm member.

mongoose is still crashing, so it must have been some other aspect of
commit 4bd7333 that caused this.

Agreed. Let me look some more. Odd this succeeds:

okapi Gentoo 1.12.14 icc 11.1.072 x86_64

but this fails:

mongoose Gentoo 1.6.14 icc 9.0.032 i686

The backtrace:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2011-09-01%2013%3A45%3A01

shows it failing on this line:

size = offsetof(NDBOX, x[0]) +sizeof(double) * 2;

so I wonder if this is some compiler bug. offsetof is:

((long) &((type *)0)->field)

and the struct is:

typedef struct NDBOX
{
int32 vl_len_; /* varlena header (do not touch directly!) */
unsigned int dim;
double x[1];
} NDBOX;

That "x" is quite a common symbol. Is there any way to get access to
this machine? Should I just revert it all and see what happens?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Jeremy Drake

pgsql@jdrake.com

over 14 years ago

In reply to: Bruce Momjian (#3)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

On Thu, 1 Sep 2011, Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo
builfarm member.

mongoose is still crashing, so it must have been some other aspect of
commit 4bd7333 that caused this.

Agreed. Let me look some more. Odd this succeeds:

okapi Gentoo 1.12.14 icc 11.1.072 x86_64

but this fails:

mongoose Gentoo 1.6.14 icc 9.0.032 i686

The backtrace:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2011-09-01%2013%3A45%3A01

shows it failing on this line:

size = offsetof(NDBOX, x[0]) +sizeof(double) * 2;

so I wonder if this is some compiler bug. offsetof is:

((long) &((type *)0)->field)

and the struct is:

typedef struct NDBOX
{
int32 vl_len_; /* varlena header (do not touch directly!) */
unsigned int dim;
double x[1];
} NDBOX;

That "x" is quite a common symbol. Is there any way to get access to
this machine? Should I just revert it all and see what happens?

I am the owner of both mongoose and okapi. Let me know if there's
anything you want me to try.

Bruce Momjian

bruce@momjian.us

over 14 years ago

In reply to: Jeremy Drake (#4)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Jeremy Drake wrote:

On Thu, 1 Sep 2011, Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Remove "fmgr.h" include in cube contrib --- caused crash on a Gentoo
builfarm member.

mongoose is still crashing, so it must have been some other aspect of
commit 4bd7333 that caused this.

Agreed. Let me look some more. Odd this succeeds:

okapi Gentoo 1.12.14 icc 11.1.072 x86_64

but this fails:

mongoose Gentoo 1.6.14 icc 9.0.032 i686

The backtrace:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2011-09-01%2013%3A45%3A01

shows it failing on this line:

size = offsetof(NDBOX, x[0]) +sizeof(double) * 2;

so I wonder if this is some compiler bug. offsetof is:

((long) &((type *)0)->field)

and the struct is:

typedef struct NDBOX
{
int32 vl_len_; /* varlena header (do not touch directly!) */
unsigned int dim;
double x[1];
} NDBOX;

That "x" is quite a common symbol. Is there any way to get access to
this machine? Should I just revert it all and see what happens?

I am the owner of both mongoose and okapi. Let me know if there's
anything you want me to try.

Thanks. I would either like to email you patches to test or get ssh
access so I can compile it myself.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Jeremy Drake

pgsql@jdrake.com

over 14 years ago

In reply to: Bruce Momjian (#5)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

On Thu, 1 Sep 2011, Bruce Momjian wrote:

Jeremy Drake wrote:

I am the owner of both mongoose and okapi. Let me know if there's
anything you want me to try.

Thanks. I would either like to email you patches to test or get ssh
access so I can compile it myself.

You can send me patches if you want, but I spent a little time with it
tonight and it seems to be the change to src/include/access/xlog.h:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/include/access/xlog.h;h=1fd60fb98d7362b677112517a20a41e32227a34f;hp=cdbf63fa76e0e7b154c084191d0df6138e1cbfcc;hb=4bd7333;hpb=d010391ac8f706e17998671534ca1230f68d2f38

Unfortunately, I also had to revert commit
6416a82a62db4e66b2edb0fa8fc83a580c3f1931 to fix compile errors. I expect
you would be able to do something a little more surgical than that...

Bruce Momjian

bruce@momjian.us

over 14 years ago

In reply to: Jeremy Drake (#6)

1 attachment(s)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Jeremy Drake wrote:

On Thu, 1 Sep 2011, Bruce Momjian wrote:

Jeremy Drake wrote:

I am the owner of both mongoose and okapi. Let me know if there's
anything you want me to try.

Thanks. I would either like to email you patches to test or get ssh
access so I can compile it myself.

You can send me patches if you want, but I spent a little time with it
tonight and it seems to be the change to src/include/access/xlog.h:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/include/access/xlog.h;h=1fd60fb98d7362b677112517a20a41e32227a34f;hp=cdbf63fa76e0e7b154c084191d0df6138e1cbfcc;hb=4bd7333;hpb=d010391ac8f706e17998671534ca1230f68d2f38

Unfortunately, I also had to revert commit
6416a82a62db4e66b2edb0fa8fc83a580c3f1931 to fix compile errors. I expect
you would be able to do something a little more surgical than that...

Wow, that is interesting. So the problem is the inclusion of
replication/walsender.h in xlog.h. Hard to see how that could cause the
cube regression tests to fail, but of course, it is.

I noticed you are using these compile flags:

'CFLAGS' => '-O3 -xN -parallel -ip',
'CC' => 'icc'

Can you test it with lower optimizations?

I looked at the contrib/cube compile messages and didn't see anything
unusual.

The only other idea I have is to try the attached patch which changes
the offsetof() call to mention a struct field name, and not the first
element of the field. However, I see other uses of accessing the
element of a struct field, so I might be wrong here.

I will say that our buildfarm is great at giving developers information
to diagnose the cause of failures! It just isn't helping me find the
cause in this particular case.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

/rtmp/cube.difftext/x-diffDownload

diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c
new file mode 100644
index b0564f7..5159be4
*** a/contrib/cube/cube.c
--- b/contrib/cube/cube.c
*************** cube_a_f8_f8(PG_FUNCTION_ARGS)
*** 199,205 ****
  	dur = ARRPTR(ur);
  	dll = ARRPTR(ll);
  
! 	size = offsetof(NDBOX, x[0]) +sizeof(double) * 2 * dim;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = dim;
--- 199,205 ----
  	dur = ARRPTR(ur);
  	dll = ARRPTR(ll);
  
! 	size = offsetof(NDBOX, x) +sizeof(double) * 2 * dim;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = dim;
*************** cube_a_f8(PG_FUNCTION_ARGS)
*** 235,241 ****
  
  	dur = ARRPTR(ur);
  
! 	size = offsetof(NDBOX, x[0]) +sizeof(double) * 2 * dim;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = dim;
--- 235,241 ----
  
  	dur = ARRPTR(ur);
  
! 	size = offsetof(NDBOX, x) +sizeof(double) * 2 * dim;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = dim;
*************** cube_subset(PG_FUNCTION_ARGS)
*** 268,274 ****
  	dx = (int4 *) ARR_DATA_PTR(idx);
  
  	dim = ARRNELEMS(idx);
! 	size = offsetof(NDBOX, x[0]) +sizeof(double) * 2 * dim;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = dim;
--- 268,274 ----
  	dx = (int4 *) ARR_DATA_PTR(idx);
  
  	dim = ARRNELEMS(idx);
! 	size = offsetof(NDBOX, x) +sizeof(double) * 2 * dim;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = dim;
*************** cube_enlarge(PG_FUNCTION_ARGS)
*** 1373,1379 ****
  		dim = n;
  	if (a->dim > dim)
  		dim = a->dim;
! 	size = offsetof(NDBOX, x[0]) +sizeof(double) * dim * 2;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = dim;
--- 1373,1379 ----
  		dim = n;
  	if (a->dim > dim)
  		dim = a->dim;
! 	size = offsetof(NDBOX, x) +sizeof(double) * dim * 2;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = dim;
*************** cube_f8(PG_FUNCTION_ARGS)
*** 1414,1420 ****
  	NDBOX	   *result;
  	int			size;
  
! 	size = offsetof(NDBOX, x[0]) +sizeof(double) * 2;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = 1;
--- 1414,1420 ----
  	NDBOX	   *result;
  	int			size;
  
! 	size = offsetof(NDBOX, x) +sizeof(double) * 2;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = 1;
*************** cube_f8_f8(PG_FUNCTION_ARGS)
*** 1432,1438 ****
  	NDBOX	   *result;
  	int			size;
  
! 	size = offsetof(NDBOX, x[0]) +sizeof(double) * 2;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = 1;
--- 1432,1438 ----
  	NDBOX	   *result;
  	int			size;
  
! 	size = offsetof(NDBOX, x) +sizeof(double) * 2;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = 1;
*************** cube_c_f8(PG_FUNCTION_ARGS)
*** 1453,1459 ****
  	int			size;
  	int			i;
  
! 	size = offsetof(NDBOX, x[0]) +sizeof(double) * (c->dim + 1) *2;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = c->dim + 1;
--- 1453,1459 ----
  	int			size;
  	int			i;
  
! 	size = offsetof(NDBOX, x) +sizeof(double) * (c->dim + 1) *2;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = c->dim + 1;
*************** cube_c_f8_f8(PG_FUNCTION_ARGS)
*** 1480,1486 ****
  	int			size;
  	int			i;
  
! 	size = offsetof(NDBOX, x[0]) +sizeof(double) * (c->dim + 1) *2;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = c->dim + 1;
--- 1480,1486 ----
  	int			size;
  	int			i;
  
! 	size = offsetof(NDBOX, x) +sizeof(double) * (c->dim + 1) *2;
  	result = (NDBOX *) palloc0(size);
  	SET_VARSIZE(result, size);
  	result->dim = c->dim + 1;

Alvaro Herrera

alvherre@commandprompt.com

over 14 years ago

In reply to: Bruce Momjian (#7)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Excerpts from Bruce Momjian's message of vie sep 02 12:20:50 -0300 2011:

The only other idea I have is to try the attached patch which changes
the offsetof() call to mention a struct field name, and not the first
element of the field. However, I see other uses of accessing the
element of a struct field, so I might be wrong here.

I wonder if this would be the right time to start using the
FLEXIBLE_ARRAY_MEMBER stuff in contrib/cube. Note pg_config.h.in says

/* Define to nothing if C supports flexible array members, and to 1 if it does
not. That way, with a declaration like `struct s { int n; double
d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99
compilers. When computing the size of such an object, don't use 'sizeof
(struct s)' as it overestimates the size. Use 'offsetof (struct s, d)'
instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with
MSVC and with C++ compilers. */
#undef FLEXIBLE_ARRAY_MEMBER

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Alvaro Herrera (#8)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Alvaro Herrera <alvherre@commandprompt.com> writes:

I wonder if this would be the right time to start using the
FLEXIBLE_ARRAY_MEMBER stuff in contrib/cube. Note pg_config.h.in says

/* Define to nothing if C supports flexible array members, and to 1 if it does
not. That way, with a declaration like `struct s { int n; double
d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99
compilers. When computing the size of such an object, don't use 'sizeof
(struct s)' as it overestimates the size. Use 'offsetof (struct s, d)'
instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with
MSVC and with C++ compilers. */

D'oh ... I bet that last sentence is pointing us at the problem. cube
is using exactly that construct, and for some reason it's crashing.
The most likely explanation for why it's crashing is that the compiler
is trying to dereference NULL instead of successfully reducing
offsetof() to a compile-time constant. It's still not too clear to me
why the inclusion changes cause that, but certainly walsender.h is
pulling in a crapload of other stuff that was not previously included
there ... which connects back to my previous complaints that I think
Bruce was way too aggressive in adding #includes to headers.

Jeremy, could you look at the preprocessor output for cube.c (ie,
use -E instead of -c in the gcc call) and see how the relevant line
of cube_f8_f8 looks in both broken and non-broken cases? What I see
on a Fedora box is

size = __builtin_offsetof (NDBOX, x[0]) +sizeof(double) * 2;

but I'm thinking you might be getting something different.

regards, tom lane

#10

Bruce Momjian

bruce@momjian.us

over 14 years ago

In reply to: Tom Lane (#9)

1 attachment(s)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I wonder if this would be the right time to start using the
FLEXIBLE_ARRAY_MEMBER stuff in contrib/cube. Note pg_config.h.in says

/* Define to nothing if C supports flexible array members, and to 1 if it does
not. That way, with a declaration like `struct s { int n; double
d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99
compilers. When computing the size of such an object, don't use 'sizeof
(struct s)' as it overestimates the size. Use 'offsetof (struct s, d)'
instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with
MSVC and with C++ compilers. */

D'oh ... I bet that last sentence is pointing us at the problem. cube
is using exactly that construct, and for some reason it's crashing.
The most likely explanation for why it's crashing is that the compiler
is trying to dereference NULL instead of successfully reducing
offsetof() to a compile-time constant. It's still not too clear to me
why the inclusion changes cause that, but certainly walsender.h is
pulling in a crapload of other stuff that was not previously included
there ... which connects back to my previous complaints that I think
Bruce was way too aggressive in adding #includes to headers.

Jeremy, could you look at the preprocessor output for cube.c (ie,
use -E instead of -c in the gcc call) and see how the relevant line
of cube_f8_f8 looks in both broken and non-broken cases? What I see
on a Fedora box is

size = __builtin_offsetof (NDBOX, x[0]) +sizeof(double) * 2;

but I'm thinking you might be getting something different.

I see 35 instances of this coding, and only 12 are in contrib/cube;
examples attached.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#11

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Bruce Momjian (#10)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with
MSVC and with C++ compilers. */

D'oh ... I bet that last sentence is pointing us at the problem. cube
is using exactly that construct, and for some reason it's crashing.

I see 35 instances of this coding, and only 12 are in contrib/cube;
examples attached.

Yeah, so the next question would be why those other ones aren't showing
problems. But at least now we have a potential mechanism for getting
from "the include list changed" to "cube is crashing on an offsetof",
namely that something is affecting the expansion of the offsetof macro.
Up to now it's been black magic, and I don't like patching around
problems we don't understand any better than that.

regards, tom lane

#12

Alvaro Herrera

alvherre@commandprompt.com

over 14 years ago

In reply to: Bruce Momjian (#7)

Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Excerpts from Bruce Momjian's message of vie sep 02 12:20:50 -0300 2011:

Wow, that is interesting. So the problem is the inclusion of
replication/walsender.h in xlog.h. Hard to see how that could cause the
cube regression tests to fail, but of course, it is.

Hmm, so you included walsender.h into xlog.h? That seems a bit funny
considering that walsender.h already includes xlog.h. It seems the
reason for this is only the AllowCascadeReplication() definition. Maybe
that should go elsewhere instead, for example walsender.h?

I wonder if there should be a new header, something like
walsender_internal.h, for stuff like WalSnd and WalSndCtlData struct
defs.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#13

Jeremy Drake

pgsql@jdrake.com

over 14 years ago

In reply to: Tom Lane (#11)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

On Fri, 2 Sep 2011, Tom Lane wrote:

Yeah, so the next question would be why those other ones aren't showing
problems. But at least now we have a potential mechanism for getting
from "the include list changed" to "cube is crashing on an offsetof",
namely that something is affecting the expansion of the offsetof macro.
Up to now it's been black magic, and I don't like patching around
problems we don't understand any better than that.

I'm going to try answering about 3 emails at once here, so I apologize for
the confusion.

Checking preprocessor output on cube_f8_f8 between working and broken, the
output was identical for that function.

Changing offsetof(NDBOX, x[0]) to offsetof(NDBOX, x), no change in
behavior.

Then I started investigating the disassembly. The PIC disassembly was
making my head hurt (why is the compiler calling the next instruction,
popping the return address into a register, and using it as a pointer?
Oh yeah, PIC, duh!), but I'm pretty sure that what it crashed on was
attempting to access the global external variable CurrentMemoryContext.
The odd thing is, that the disassembly code between the working and
non-working was the same, except for the offsets.

Here's the disassembly output from the core dump:
Program terminated with signal 11, Segmentation fault.
#0 cube_f8_f8 () at cube.c:1435
1435 size = offsetof(NDBOX, x) +sizeof(double) * 2;
(gdb) set disassembly-flavor intel
(gdb) disass
Dump of assembler code for function cube_f8_f8:
0xb776ab50 <+0>: push ebp
0xb776ab51 <+1>: mov ebp,esp
0xb776ab53 <+3>: and esp,0xfffffff8
0xb776ab56 <+6>: push ebx
0xb776ab57 <+7>: sub esp,0x14
0xb776ab5a <+10>: mov ecx,DWORD PTR [ebp+0x8]
0xb776ab5d <+13>: mov DWORD PTR [esp+0x10],ebx
0xb776ab61 <+17>: mov edx,DWORD PTR [ecx+0x14]
0xb776ab64 <+20>: movsd xmm0,QWORD PTR [edx]
0xb776ab68 <+24>: mov edx,DWORD PTR [ecx+0x18]
0xb776ab6b <+27>: movsd xmm1,QWORD PTR [edx]
0xb776ab6f <+31>: movsd QWORD PTR [esp],xmm0
0xb776ab74 <+36>: movsd QWORD PTR [esp+0x8],xmm1
0xb776ab7a <+42>: push 0x18
0xb776ab7c <+44>: call 0xb776ab81 <cube_f8_f8+49>
0xb776ab81 <+49>: pop eax
0xb776ab82 <+50>: add eax,0x9472
0xb776ab87 <+55>: mov edx,DWORD PTR [eax-0x58]
=> 0xb776ab8d <+61>: push DWORD PTR [edx]
0xb776ab8f <+63>: mov DWORD PTR [esp+0x18],ebx
0xb776ab93 <+67>: mov ebx,eax
0xb776ab95 <+69>: call 0xb776745c <MemoryContextAllocZero@plt>

With this knowledge, I thought maybe cube wasn't the actual problem, but
just happened to be early in the list of contrib modules being tested. So
I arbitrarily picked another contrib module to test, intarray.

============== running regression test queries ==============
test _int ... FAILED (test process exited with exit
code 2)

Program terminated with signal 11, Segmentation fault.
#0 0xb785a1fa in g_intbig_union ()
from
/buildfarm/test/pgsql_test/contrib/intarray/tmp_check/install/usr/local/pgsql/lib/_int.so
(gdb) set disassembly-flavor intel
(gdb) disass
Dump of assembler code for function g_intbig_union:
<SNIP UNRELATED ASSEMBLY CODE>
0xb785a087 <+31>: call 0xb785a08c <g_intbig_union+36>
0xb785a08c <+36>: pop eax
0xb785a08d <+37>: add eax,0x6f67
0xb785a092 <+42>: mov DWORD PTR [esp+0x104],eax
<SNIP MORE UNRELATED ASSEMBLY CODE (this is a much longer function)>
0xb785a1e5 <+381>: mov eax,DWORD PTR [esp+0x104]
0xb785a1ec <+388>: mov esi,DWORD PTR [eax-0x24]
0xb785a1f2 <+394>: mov DWORD PTR [esp+0x108],ebx
0xb785a1f9 <+401>: push ebx
=> 0xb785a1fa <+402>: push DWORD PTR [esi]
0xb785a1fc <+404>: mov DWORD PTR [esp+0x104],ebx
0xb785a203 <+411>: mov ebx,eax
0xb785a205 <+413>: call 0xb7853ef0 <MemoryContextAlloc@plt>

Looks pretty familiar, right?

Still, I have no idea why adding an include would cause issues accessing
CurrentMemoryContext. But at least we're not blaming offsetof or cube
anymore...

#14

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Jeremy Drake (#13)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Jeremy Drake <pgsql@jdrake.com> writes:

... I'm pretty sure that what it crashed on was
attempting to access the global external variable CurrentMemoryContext.

Ah-hah, good insight!

The odd thing is, that the disassembly code between the working and
non-working was the same, except for the offsets.

The code seems to be fetching a pointer to CurrentMemoryContext from a
PC-relative location; presumably that's a literal that the dynamic
linker is supposed to update at shlib load time. I guess that pointer
is not correctly computed in the other case, or else it's fetching the
wrong pointer value.

Still, I have no idea why adding an include would cause issues accessing
CurrentMemoryContext.

Me either, but at least it's something to work from. You might try
diffing the working and non-working -E output from cube.c to see if
there are any changes that obviously affect CurrentMemoryContext.

regards, tom lane

#15

Bruce Momjian

bruce@momjian.us

over 14 years ago

In reply to: Jeremy Drake (#13)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Did you test with a lower optimization level? Gentoo is notorious for
buggy/bleeding edge tools.

---------------------------------------------------------------------------

Jeremy Drake wrote:

On Fri, 2 Sep 2011, Tom Lane wrote:

Yeah, so the next question would be why those other ones aren't showing
problems. But at least now we have a potential mechanism for getting
from "the include list changed" to "cube is crashing on an offsetof",
namely that something is affecting the expansion of the offsetof macro.
Up to now it's been black magic, and I don't like patching around
problems we don't understand any better than that.

I'm going to try answering about 3 emails at once here, so I apologize for
the confusion.

Checking preprocessor output on cube_f8_f8 between working and broken, the
output was identical for that function.

Changing offsetof(NDBOX, x[0]) to offsetof(NDBOX, x), no change in
behavior.

Then I started investigating the disassembly. The PIC disassembly was
making my head hurt (why is the compiler calling the next instruction,
popping the return address into a register, and using it as a pointer?
Oh yeah, PIC, duh!), but I'm pretty sure that what it crashed on was
attempting to access the global external variable CurrentMemoryContext.
The odd thing is, that the disassembly code between the working and
non-working was the same, except for the offsets.

Here's the disassembly output from the core dump:
Program terminated with signal 11, Segmentation fault.
#0 cube_f8_f8 () at cube.c:1435
1435 size = offsetof(NDBOX, x) +sizeof(double) * 2;
(gdb) set disassembly-flavor intel
(gdb) disass
Dump of assembler code for function cube_f8_f8:
0xb776ab50 <+0>: push ebp
0xb776ab51 <+1>: mov ebp,esp
0xb776ab53 <+3>: and esp,0xfffffff8
0xb776ab56 <+6>: push ebx
0xb776ab57 <+7>: sub esp,0x14
0xb776ab5a <+10>: mov ecx,DWORD PTR [ebp+0x8]
0xb776ab5d <+13>: mov DWORD PTR [esp+0x10],ebx
0xb776ab61 <+17>: mov edx,DWORD PTR [ecx+0x14]
0xb776ab64 <+20>: movsd xmm0,QWORD PTR [edx]
0xb776ab68 <+24>: mov edx,DWORD PTR [ecx+0x18]
0xb776ab6b <+27>: movsd xmm1,QWORD PTR [edx]
0xb776ab6f <+31>: movsd QWORD PTR [esp],xmm0
0xb776ab74 <+36>: movsd QWORD PTR [esp+0x8],xmm1
0xb776ab7a <+42>: push 0x18
0xb776ab7c <+44>: call 0xb776ab81 <cube_f8_f8+49>
0xb776ab81 <+49>: pop eax
0xb776ab82 <+50>: add eax,0x9472
0xb776ab87 <+55>: mov edx,DWORD PTR [eax-0x58]
=> 0xb776ab8d <+61>: push DWORD PTR [edx]
0xb776ab8f <+63>: mov DWORD PTR [esp+0x18],ebx
0xb776ab93 <+67>: mov ebx,eax
0xb776ab95 <+69>: call 0xb776745c <MemoryContextAllocZero@plt>

With this knowledge, I thought maybe cube wasn't the actual problem, but
just happened to be early in the list of contrib modules being tested. So
I arbitrarily picked another contrib module to test, intarray.

============== running regression test queries ==============
test _int ... FAILED (test process exited with exit
code 2)

Program terminated with signal 11, Segmentation fault.
#0 0xb785a1fa in g_intbig_union ()
from
/buildfarm/test/pgsql_test/contrib/intarray/tmp_check/install/usr/local/pgsql/lib/_int.so
(gdb) set disassembly-flavor intel
(gdb) disass
Dump of assembler code for function g_intbig_union:
<SNIP UNRELATED ASSEMBLY CODE>
0xb785a087 <+31>: call 0xb785a08c <g_intbig_union+36>
0xb785a08c <+36>: pop eax
0xb785a08d <+37>: add eax,0x6f67
0xb785a092 <+42>: mov DWORD PTR [esp+0x104],eax
<SNIP MORE UNRELATED ASSEMBLY CODE (this is a much longer function)>
0xb785a1e5 <+381>: mov eax,DWORD PTR [esp+0x104]
0xb785a1ec <+388>: mov esi,DWORD PTR [eax-0x24]
0xb785a1f2 <+394>: mov DWORD PTR [esp+0x108],ebx
0xb785a1f9 <+401>: push ebx
=> 0xb785a1fa <+402>: push DWORD PTR [esi]
0xb785a1fc <+404>: mov DWORD PTR [esp+0x104],ebx
0xb785a203 <+411>: mov ebx,eax
0xb785a205 <+413>: call 0xb7853ef0 <MemoryContextAlloc@plt>

Looks pretty familiar, right?

Still, I have no idea why adding an include would cause issues accessing
CurrentMemoryContext. But at least we're not blaming offsetof or cube
anymore...

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#16

Bruce Momjian

bruce@momjian.us

over 14 years ago

In reply to: Tom Lane (#14)

Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Tom Lane wrote:

Jeremy Drake <pgsql@jdrake.com> writes:

... I'm pretty sure that what it crashed on was
attempting to access the global external variable CurrentMemoryContext.

Ah-hah, good insight!

The odd thing is, that the disassembly code between the working and
non-working was the same, except for the offsets.

The code seems to be fetching a pointer to CurrentMemoryContext from a
PC-relative location; presumably that's a literal that the dynamic
linker is supposed to update at shlib load time. I guess that pointer
is not correctly computed in the other case, or else it's fetching the
wrong pointer value.

This would explain why the regular regression test work but the /contrib
modules, which do dynamic loading, do not. Good to know the problem is
more the contrib/cube. FYI, I noticed in the contrib/cube failure that
palloc0() was the next line after the reported crash line. Are the
contrib's crashing on the first access of any backend/DLL function?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#17

Bruce Momjian

bruce@momjian.us

over 14 years ago

In reply to: Alvaro Herrera (#12)

Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Alvaro Herrera wrote:

Excerpts from Bruce Momjian's message of vie sep 02 12:20:50 -0300 2011:

Wow, that is interesting. So the problem is the inclusion of
replication/walsender.h in xlog.h. Hard to see how that could cause the
cube regression tests to fail, but of course, it is.

Hmm, so you included walsender.h into xlog.h? That seems a bit funny
considering that walsender.h already includes xlog.h. It seems the
reason for this is only the AllowCascadeReplication() definition. Maybe
that should go elsewhere instead, for example walsender.h?

Moved to walsender.h. Thanks.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#18

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Bruce Momjian (#17)

Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Bruce Momjian <bruce@momjian.us> writes:

Alvaro Herrera wrote:

Hmm, so you included walsender.h into xlog.h? That seems a bit funny
considering that walsender.h already includes xlog.h. It seems the
reason for this is only the AllowCascadeReplication() definition. Maybe
that should go elsewhere instead, for example walsender.h?

Moved to walsender.h. Thanks.

You seem to have entirely missed the point of Alvaro's remark, which is
that you've got xlog.h including walsender.h (still) as well as
walsender.h including xlog.h. That's broken.

regards, tom lane

#19

Bruce Momjian

bruce@momjian.us

over 14 years ago

In reply to: Tom Lane (#18)

Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Alvaro Herrera wrote:

Hmm, so you included walsender.h into xlog.h? That seems a bit funny
considering that walsender.h already includes xlog.h. It seems the
reason for this is only the AllowCascadeReplication() definition. Maybe
that should go elsewhere instead, for example walsender.h?

Moved to walsender.h. Thanks.

You seem to have entirely missed the point of Alvaro's remark, which is
that you've got xlog.h including walsender.h (still) as well as
walsender.h including xlog.h. That's broken.

Oh, OK, done. xlog.h removed from walsender.h and tested.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#20

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Bruce Momjian (#19)

Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

You seem to have entirely missed the point of Alvaro's remark, which is
that you've got xlog.h including walsender.h (still) as well as
walsender.h including xlog.h. That's broken.

Oh, OK, done. xlog.h removed from walsender.h and tested.

I would have gone the other way on that one, if possible. Seems like
xlog.h ought to be the lower-level file.

Some quick mechanical analysis shows that's not the only circular
inclusion path in our headers, either: we have storage/proc.h including
replication/syncrep.h which includes storage/proc.h. That one appears
to be Simon's fault not yours though.

regards, tom lane

#21

Bruce Momjian

bruce@momjian.us

over 14 years ago

In reply to: Tom Lane (#20)

Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

You seem to have entirely missed the point of Alvaro's remark, which is
that you've got xlog.h including walsender.h (still) as well as
walsender.h including xlog.h. That's broken.

Oh, OK, done. xlog.h removed from walsender.h and tested.

I would have gone the other way on that one, if possible. Seems like
xlog.h ought to be the lower-level file.

Agreed. Let me work on that.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#22

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Bruce Momjian (#21)

Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

I would have gone the other way on that one, if possible. Seems like
xlog.h ought to be the lower-level file.

Agreed. Let me work on that.

Uh, I just did it. Painful. It would have been a lot easier before
the pgrminclude run, because that baked-in a whole lot of bad decisions.

regards, tom lane

#23

Jeremy Drake

pgsql@jdrake.com

over 14 years ago

In reply to: Bruce Momjian (#10)

Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

On Sun, 4 Sep 2011, Tom Lane wrote:

Jeremy Drake <jeremyd@jdrake.com> writes:

I didn't see any changes that looked like they affected
CurrentMemoryContext, but I attached the compressed context diff in case
you want to look at it.

Right now I have a feeling that this is a compiler bug.

That's my feeling, also.

Don't know
whether you have the interest/energy to try to reduce it to a reportable
test case.

If you mean reporting it to the compiler vendor (Intel), I doubt that
would be worthwhile. The version of the compiler on this machine is very
out of date. It is version 9.0 20060222. I would bet that if I did track
down and report an issue in a 5-year-old compiler version, their first
question would be, does the issue duplicate in the current version. Given
that my other buildfarm member is running 11.1 20100414 (albeit for the
x64 platform instead of the x86) and had no issue, I would expect that the
current x86 version would also have no problem.

I have intentionally been keeping the compiler versions on my buildfarm
members pretty much fixed, for the benefit of reproducable results.
However, I would be interested in hearing any guidelines on "how old is
too old" for buildfarm member versions.

Import Notes

Reply to msg id not found: 4012.1315118833@sss.pgh.pa.us

#24

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Jeremy Drake (#23)

Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Jeremy Drake <pgsql@jdrake.com> writes:

On Sun, 4 Sep 2011, Tom Lane wrote:

Right now I have a feeling that this is a compiler bug.

That's my feeling, also.

Don't know
whether you have the interest/energy to try to reduce it to a reportable
test case.

If you mean reporting it to the compiler vendor (Intel), I doubt that
would be worthwhile. The version of the compiler on this machine is very
out of date. It is version 9.0 20060222. I would bet that if I did track
down and report an issue in a 5-year-old compiler version, their first
question would be, does the issue duplicate in the current version. Given
that my other buildfarm member is running 11.1 20100414 (albeit for the
x64 platform instead of the x86) and had no issue, I would expect that the
current x86 version would also have no problem.

I have intentionally been keeping the compiler versions on my buildfarm
members pretty much fixed, for the benefit of reproducable results.
However, I would be interested in hearing any guidelines on "how old is
too old" for buildfarm member versions.

Well, I'm still running this on one development machine:

$ gcc -v
Reading specs from /usr/local/lib/gcc-lib/hppa2.0-hp-hpux10.20/2.95.3/specs
gcc version 2.95.3 20010315 (release)

so I'm not in the "it's too old" camp. My perspective on tool bugs
is whether there is something (a) well defined and (b) not costly
that we can do to work around it. So far, this issue is failing
test (a) ... we know that one header inclusion order triggers the
crash and another not, but there is no theory as to why.

What I would suggest is to see whether a more recent x86 version shows
the problem or not. If not, let's just write it off as an already-fixed
compiler bug.

regards, tom lane

#25

Jeremy Drake

pgsql@jdrake.com

over 14 years ago

In reply to: Tom Lane (#24)

Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

On Sun, 4 Sep 2011, Tom Lane wrote:

What I would suggest is to see whether a more recent x86 version shows
the problem or not. If not, let's just write it off as an already-fixed
compiler bug.

I have installed the most recent version in the home directory of a
purpose-made user on that machine.

I did
git checkout 6416a82a62db4e66b2edb0fa8fc83a580c3f1931
to get a revision I knew was right in the broken range for mongoose.

Apparently they deprecated one of my compiler flags: -xN (N is for
Nocona), seems they renamed it to -xSSE2. Since this is a one-off run, I
ignored that warning.

The result is no crash in the cube test.

I think tomorrow I'll try to get the 9.0 compiler set up on a clean VM,
and if the issue duplicates there, I can see about setting up SSH access
if anyone is still interested in investigating this further.

#26

Bruce Momjian

bruce@momjian.us

over 14 years ago

In reply to: Jeremy Drake (#25)

Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Jeremy Drake wrote:

On Sun, 4 Sep 2011, Tom Lane wrote:

What I would suggest is to see whether a more recent x86 version shows
the problem or not. If not, let's just write it off as an already-fixed
compiler bug.

I have installed the most recent version in the home directory of a
purpose-made user on that machine.

configure:3252: icc --version >&5
icc (ICC) 12.0.5 20110719
Copyright (C) 1985-2011 Intel Corporation. All rights reserved.

I did
git checkout 6416a82a62db4e66b2edb0fa8fc83a580c3f1931
to get a revision I knew was right in the broken range for mongoose.

Apparently they deprecated one of my compiler flags: -xN (N is for
Nocona), seems they renamed it to -xSSE2. Since this is a one-off run, I
ignored that warning.

The result is no crash in the cube test.

I think tomorrow I'll try to get the 9.0 compiler set up on a clean VM,
and if the issue duplicates there, I can see about setting up SSH access
if anyone is still interested in investigating this further.

What would we investigate except a compiler bug?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#27

Jeremy Drake

pgsql@jdrake.com

over 14 years ago

In reply to: Bruce Momjian (#26)

Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

On Mon, 5 Sep 2011, Bruce Momjian wrote:

Jeremy Drake wrote:

I think tomorrow I'll try to get the 9.0 compiler set up on a clean VM,
and if the issue duplicates there, I can see about setting up SSH access
if anyone is still interested in investigating this further.

What would we investigate except a compiler bug?

To me, simply chalking it up to some uncharacterized compiler bug is still
quite a bit of black magic.

But, if that explanation is good enough for you, I've certainly got
better things to do with my holiday than spending time on this :)

#28

Bruce Momjian

bruce@momjian.us

over 14 years ago

In reply to: Jeremy Drake (#27)

Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Jeremy Drake wrote:

On Mon, 5 Sep 2011, Bruce Momjian wrote:

Jeremy Drake wrote:

I think tomorrow I'll try to get the 9.0 compiler set up on a clean VM,
and if the issue duplicates there, I can see about setting up SSH access
if anyone is still interested in investigating this further.

What would we investigate except a compiler bug?

To me, simply chalking it up to some uncharacterized compiler bug is still
quite a bit of black magic.

But, if that explanation is good enough for you, I've certainly got
better things to do with my holiday than spending time on this :)

If the underlying tools are buggy, the system can't be reliable. We
can't invest time to track down every compiler bug, especially when
there are later compiler versions available.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#29

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Jeremy Drake (#27)

Re: Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Jeremy Drake <pgsql@jdrake.com> writes:

On Mon, 5 Sep 2011, Bruce Momjian wrote:

What would we investigate except a compiler bug?

To me, simply chalking it up to some uncharacterized compiler bug is still
quite a bit of black magic.

If there were some reason to believe either that it wasn't a compiler
bug, or that there would be something reasonable we could do to work
around it, then I'd be interested in pressing further. But on the
strength of what we have now, neither of those things seem real likely.
I'm with Bruce on thinking that it's probably not going to repay further
effort.

regards, tom lane

#30

Alvaro Herrera

alvherre@commandprompt.com

over 14 years ago

In reply to: Alvaro Herrera (#12)

1 attachment(s)

Re: [COMMITTERS] pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

Excerpts from Alvaro Herrera's message of vie sep 02 13:53:12 -0300 2011:

I wonder if there should be a new header, something like
walsender_internal.h, for stuff like WalSnd and WalSndCtlData struct
defs.

... as in the attached patch.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachments:

0001-Split-walsender.h-taking-out-private-declarations.patchapplication/octet-stream; name=0001-Split-walsender.h-taking-out-private-declarations.patchDownload

From 5dfb9b91fcfe13bf28b4635da8cf578199b5d206 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Mon, 12 Sep 2011 15:24:29 -0300
Subject: [PATCH] Split walsender.h, taking out private declarations

This dramatically cuts short the number of headers this one brings into
whatever includes it.
---
 src/backend/replication/basebackup.c        |    1 +
 src/backend/replication/repl_gram.y         |    1 +
 src/backend/replication/syncrep.c           |    1 +
 src/backend/replication/walsender.c         |    1 +
 src/include/replication/walsender.h         |   93 +----------------------
 src/include/replication/walsender_private.h |  109 +++++++++++++++++++++++++++
 6 files changed, 115 insertions(+), 91 deletions(-)
 create mode 100644 src/include/replication/walsender_private.h

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 785b999..4841095 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -25,6 +25,7 @@
 #include "nodes/pg_list.h"
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
+#include "replication/walsender_private.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "utils/builtins.h"
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 70b34ff..b3fd051 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -19,6 +19,7 @@
 #include "nodes/makefuncs.h"
 #include "nodes/replnodes.h"
 #include "replication/walsender.h"
+#include "replication/walsender_private.h"
 
 
 /* Result of the parsing is returned here */
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 542bdf2..95de6c7 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -49,6 +49,7 @@
 #include "miscadmin.h"
 #include "replication/syncrep.h"
 #include "replication/walsender.h"
+#include "replication/walsender_private.h"
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "tcop/tcopprot.h"
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 6e1d82a..5d1c518 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -51,6 +51,7 @@
 #include "replication/walprotocol.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
+#include "replication/walsender_private.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/pmsignal.h"
diff --git a/src/include/replication/walsender.h b/src/include/replication/walsender.h
index a386ea9..f5ea9e2 100644
--- a/src/include/replication/walsender.h
+++ b/src/include/replication/walsender.h
@@ -12,84 +12,9 @@
 #ifndef _WALSENDER_H
 #define _WALSENDER_H
 
-#include "access/xlog.h"
-#include "fmgr.h"
-#include "nodes/nodes.h"
-#include "storage/latch.h"
-#include "storage/shmem.h"
-#include "storage/spin.h"
-
-typedef enum WalSndState
-{
-	WALSNDSTATE_STARTUP = 0,
-	WALSNDSTATE_BACKUP,
-	WALSNDSTATE_CATCHUP,
-	WALSNDSTATE_STREAMING
-} WalSndState;
-
-/*
- * Each walsender has a WalSnd struct in shared memory.
- */
-typedef struct WalSnd
-{
-	pid_t		pid;			/* this walsender's process id, or 0 */
-	WalSndState state;			/* this walsender's state */
-	XLogRecPtr	sentPtr;		/* WAL has been sent up to this point */
-	bool		needreload;		/* does currently-open file need to be reloaded? */
-
-	/*
-	 * The xlog locations that have been written, flushed, and applied by
-	 * standby-side. These may be invalid if the standby-side has not offered
-	 * values yet.
-	 */
-	XLogRecPtr	write;
-	XLogRecPtr	flush;
-	XLogRecPtr	apply;
-
-	/* Protects shared variables shown above. */
-	slock_t		mutex;
-
-	/*
-	 * Latch used by backends to wake up this walsender when it has work to
-	 * do.
-	 */
-	Latch		latch;
-
-	/*
-	 * The priority order of the standby managed by this WALSender, as listed
-	 * in synchronous_standby_names, or 0 if not-listed. Protected by
-	 * SyncRepLock.
-	 */
-	int			sync_standby_priority;
-} WalSnd;
-
-extern WalSnd *MyWalSnd;
+#include <signal.h>
 
-/* There is one WalSndCtl struct for the whole database cluster */
-typedef struct
-{
-	/*
-	 * Synchronous replication queue. Protected by SyncRepLock.
-	 */
-	SHM_QUEUE	SyncRepQueue;
-
-	/*
-	 * Current location of the head of the queue. All waiters should have a
-	 * waitLSN that follows this value. Protected by SyncRepLock.
-	 */
-	XLogRecPtr	lsn;
-
-	/*
-	 * Are any sync standbys defined?  Waiting backends can't reload the
-	 * config file safely, so WAL writer updates this value as needed.
-	 * Protected by SyncRepLock.
-	 */
-	bool		sync_standbys_defined;
-
-	WalSnd		walsnds[1];		/* VARIABLE LENGTH ARRAY */
-} WalSndCtlData;
-
-extern WalSndCtlData *WalSndCtl;
+#include "fmgr.h"
 
 /* global state */
 extern bool am_walsender;
@@ -106,22 +31,8 @@ extern void WalSndSignals(void);
 extern Size WalSndShmemSize(void);
 extern void WalSndShmemInit(void);
 extern void WalSndWakeup(void);
-extern void WalSndSetState(WalSndState state);
-extern void XLogRead(char *buf, XLogRecPtr startptr, Size count);
 extern void WalSndRqstFileReload(void);
 
 extern Datum pg_stat_get_wal_senders(PG_FUNCTION_ARGS);
 
-/*
- * Internal functions for parsing the replication grammar, in repl_gram.y and
- * repl_scanner.l
- */
-extern int	replication_yyparse(void);
-extern int	replication_yylex(void);
-extern void replication_yyerror(const char *str);
-extern void replication_scanner_init(const char *query_string);
-extern void replication_scanner_finish(void);
-
-extern Node *replication_parse_result;
-
 #endif   /* _WALSENDER_H */
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
new file mode 100644
index 0000000..be7a341
--- /dev/null
+++ b/src/include/replication/walsender_private.h
@@ -0,0 +1,109 @@
+/*-------------------------------------------------------------------------
+ *
+ * walsender_private.h
+ *	  Private definitions from replication/walsender.c.
+ *
+ * Portions Copyright (c) 2010-2011, PostgreSQL Global Development Group
+ *
+ * src/include/replication/walsender_private.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef _WALSENDER_PRIVATE_H
+#define _WALSENDER_PRIVATE_H
+
+#include "access/xlog.h"
+#include "nodes/nodes.h"
+#include "storage/latch.h"
+#include "storage/shmem.h"
+#include "storage/spin.h"
+
+typedef enum WalSndState
+{
+	WALSNDSTATE_STARTUP = 0,
+	WALSNDSTATE_BACKUP,
+	WALSNDSTATE_CATCHUP,
+	WALSNDSTATE_STREAMING
+} WalSndState;
+
+/*
+ * Each walsender has a WalSnd struct in shared memory.
+ */
+typedef struct WalSnd
+{
+	pid_t		pid;			/* this walsender's process id, or 0 */
+	WalSndState state;			/* this walsender's state */
+	XLogRecPtr	sentPtr;		/* WAL has been sent up to this point */
+	bool		needreload;		/* does currently-open file need to be reloaded? */
+
+	/*
+	 * The xlog locations that have been written, flushed, and applied by
+	 * standby-side. These may be invalid if the standby-side has not offered
+	 * values yet.
+	 */
+	XLogRecPtr	write;
+	XLogRecPtr	flush;
+	XLogRecPtr	apply;
+
+	/* Protects shared variables shown above. */
+	slock_t		mutex;
+
+	/*
+	 * Latch used by backends to wake up this walsender when it has work to
+	 * do.
+	 */
+	Latch		latch;
+
+	/*
+	 * The priority order of the standby managed by this WALSender, as listed
+	 * in synchronous_standby_names, or 0 if not-listed. Protected by
+	 * SyncRepLock.
+	 */
+	int			sync_standby_priority;
+} WalSnd;
+
+extern WalSnd *MyWalSnd;
+
+/* There is one WalSndCtl struct for the whole database cluster */
+typedef struct
+{
+	/*
+	 * Synchronous replication queue. Protected by SyncRepLock.
+	 */
+	SHM_QUEUE	SyncRepQueue;
+
+	/*
+	 * Current location of the head of the queue. All waiters should have a
+	 * waitLSN that follows this value. Protected by SyncRepLock.
+	 */
+	XLogRecPtr	lsn;
+
+	/*
+	 * Are any sync standbys defined?  Waiting backends can't reload the
+	 * config file safely, so WAL writer updates this value as needed.
+	 * Protected by SyncRepLock.
+	 */
+	bool		sync_standbys_defined;
+
+	WalSnd		walsnds[1];		/* VARIABLE LENGTH ARRAY */
+} WalSndCtlData;
+
+extern WalSndCtlData *WalSndCtl;
+
+
+extern void WalSndSetState(WalSndState state);
+extern void XLogRead(char *buf, XLogRecPtr startptr, Size count);
+
+/*
+ * Internal functions for parsing the replication grammar, in repl_gram.y and
+ * repl_scanner.l
+ */
+extern int	replication_yyparse(void);
+extern int	replication_yylex(void);
+extern void replication_yyerror(const char *str);
+extern void replication_scanner_init(const char *query_string);
+extern void replication_scanner_finish(void);
+
+extern Node *replication_parse_result;
+
+#endif   /* _WALSENDER_PRIVATE_H */
-- 
1.7.5.4