windows regression failure - prepared xacts

Started by Andrew Dunstan over 20 years ago, 15 messages
#1Andrew Dunstan
andrew@dunslane.net

I am consistently seeing the regression failure shown below on my
Windows machine. See
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-07-07%2013:54:13

(On the plus side, I am now building happily and passing regression
tests with ASPerl, and hope to add ASPython and ASTcl to the list shortly).

cheers

andrew

================== pgsql.2072/src/test/regress/regression.diffs ===================
*** ./expected/prepared_xacts.out	Thu Jul  7 09:55:18 2005
--- ./results/prepared_xacts.out	Thu Jul  7 10:20:37 2005
***************
*** 179,189 ****
  -- Commit table creation
  COMMIT PREPARED 'regress-one';
  \d pxtest2
!     Table "public.pxtest2"
!  Column |  Type   | Modifiers 
! --------+---------+-----------
!  a      | integer | 
! 
  SELECT * FROM pxtest2;
   a 
  ---
--- 179,185 ----
  -- Commit table creation
  COMMIT PREPARED 'regress-one';
  \d pxtest2
! ERROR:  cache lookup failed for relation 27240
  SELECT * FROM pxtest2;
   a 
  ---

======================================================================

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#1)
Re: windows regression failure - prepared xacts

I never got a reply to this, but I am still seeing it from time to time
- twice today in fact. Any suggestions?

cheers

andrew


#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#2)
Re: windows regression failure - prepared xacts

Andrew Dunstan <andrew@dunslane.net> writes:

I never got a reply to this, but I am still seeing it from time to time
- twice today in fact. Any suggestions?

I've been puzzled by that too. It seems to indicate that the syscache
inval message that the COMMIT should send is either not getting sent at
all, or is being processed too late. Neither of these ideas seems very
promising, especially considering that we're looking at a single backend
as both source and recipient of the message --- a race condition doesn't
seem credible. And there's nothing very platform-specific in that code
either. (I tried for a while to explain it as some kind of deficiency
in the signal emulation we use on Windows, but no signals are used
for normal sinval processing, so that doesn't seem to hold water.)
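To make the two hypotheses concrete (the invalidation message is never sent, or it is processed too late), here is a toy single-process model in Python. It is not PostgreSQL code: the class, its methods, the idea that a "not found" answer gets cached, and the OID 16400 are all made up for illustration. It only shows that a backend which is both sender and receiver of a message still returns a stale answer if it looks something up before draining its own queue.

```python
from collections import deque

# Hypothetical model only; it is not PostgreSQL's syscache/sinval code.
class ToyBackend:
    def __init__(self, catalog):
        self.catalog = catalog    # shared catalog "truth": relname -> OID
        self.cache = {}           # this backend's cached answers (may go stale)
        self.pending = deque()    # invalidation messages not yet processed

    def send_inval(self, relname):
        # The backend that commits also receives its own message.
        self.pending.append(relname)

    def process_invals(self):
        # Forget every cached answer named in a pending message.
        while self.pending:
            self.cache.pop(self.pending.popleft(), None)

    def lookup(self, relname):
        if relname not in self.cache:
            # Cache miss: read the catalog; a miss there is cached as None.
            self.cache[relname] = self.catalog.get(relname)
        return self.cache[relname]


def demo(process_before_lookup):
    catalog = {}
    backend = ToyBackend(catalog)
    backend.lookup("pxtest2")       # table not yet committed: caches None
    catalog["pxtest2"] = 16400      # commit makes the table visible (made-up OID)
    backend.send_inval("pxtest2")   # ...and queues the invalidation message
    if process_before_lookup:
        backend.process_invals()
    return backend.lookup("pxtest2")


print(demo(True))    # 16400
print(demo(False))   # None
```

In this model a message that is never sent and a message that is processed too late look the same to the reader: the stale cached answer wins.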

Are we sure that it only happens on Windows? Anyone else seen a similar
failure in the prepared_xacts test?


regards, tom lane

#4Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#3)
Re: windows regression failure - prepared xacts

Further (anecdotal) data point: I have usually seen this after doing a
number of builds. Rebooting seems to cure the problem (and that's
happened today again - I have just seen 2 builds work). Maybe some sort
of strange shmem corruption?

cheers

andrew


#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#4)
Re: windows regression failure - prepared xacts

Andrew Dunstan <andrew@dunslane.net> writes:

further (anecdotal) data point: I have usually seen this after doing a
number of builds. Rebooting seems to cure the problem (and that's
happened today again - I have just seen 2 builds work). Maybe some sort
of strange shmem corruption?

Hmmm ... that still doesn't make any sense, given that the test is
being run on a freshly started postmaster. Unless it's a hardware
problem? Have you seen this on more than one machine?

regards, tom lane

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#5)
Re: windows regression failure - prepared xacts

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

further (anecdotal) data point: I have usually seen this after doing a
number of builds. Rebooting seems to cure the problem (and that's
happened today again - I have just seen 2 builds work). Maybe some sort
of strange shmem corruption?

Hmmm ... that still doesn't make any sense, given that the test is
being run on a freshly started postmaster. Unless it's a hardware
problem? Have you seen this on more than one machine?

No :-( But I find it hard to believe that a hardware failure would lead
to this precise error repeatedly. Stranger things have happened, I guess.

On a related note, we need more Windows boxes on the buildfarm - ideally
living in a data center somewhere so we can automate builds, rather than
relying on my laptop and Jim's Windows box, which seems to build
intermittently.

cheers

andrew

#7Dave Page
dpage@vale-housing.co.uk
In reply to: Andrew Dunstan (#6)
Re: windows regression failure - prepared xacts

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of
Andrew Dunstan
Sent: 14 July 2005 13:35
To: Tom Lane
Cc: PostgreSQL-development
Subject: Re: [HACKERS] windows regression failure - prepared xacts

On a related note, we need more Windows boxes on the buildfarm - ideally
living in a data center somewhere so we can automate builds, rather than
relying on my laptop and Jim's Windows box, which seems to build
intermittently.

I might be able to help out there. What's required to be part of the
build farm?

Regards, Dave.

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Dave Page (#7)
Re: windows regression failure - prepared xacts

Dave Page wrote:


On a related note, we need more Windows boxes on the buildfarm - ideally
living in a data center somewhere so we can automate builds, rather than
relying on my laptop and Jim's Windows box, which seems to build
intermittently.

I might be able to help out there.

Excellent.

What's required to be part of the build farm?

Short answer:

. your box will need to be able to contact http://www.pgbuildfarm.org
either directly or via proxy, and it will need access to a CVS repo,
either the one at postgresql.org or a mirror (you can set up your own
mirror using CVSup on a Linux or FreeBSD box).
. have a working postgresql build environment for your platform (for
Windows this means MSys/MinGW with the libz and libintl stuff, and
ideally native Python and Tcl).
. Windows only: you will need a native perl installed as well as the one
in the MSys DTK. The one from ActiveState works fine.
. download and unpack the latest release of client code from
http://pgfoundry.org/frs/?group_id=1000040
. read instructions at
http://pgfoundry.org/docman/view.php/1000040/4/PGBuildFarm-HOWTO.txt
. get the software running locally using flags --force --nostatus --nosend
. register your machine at http://www.pgbuildfarm.org/register.html
. when you receive credentials, put them in the config file, and
schedule regular builds (without those flags) for the branches you want
to support. (For Windows that should be HEAD and optionally REL8_0_STABLE).

Feel free to ask me questions if anything isn't clear.

There's a short description of how it works at
http://www.onlamp.com/pub/a/onlamp/2005/02/24/pg_buildfarm.html

cheers

andrew

#9Dave Page
dpage@vale-housing.co.uk
In reply to: Andrew Dunstan (#8)
Re: windows regression failure - prepared xacts

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: 14 July 2005 14:36
To: Dave Page
Cc: Tom Lane; PostgreSQL-development
Subject: Re: [HACKERS] windows regression failure - prepared xacts

Short answer:

. your box will need to be able to contact http://www.pgbuildfarm.org
either directly or via proxy, and it will need access to a CVS repo,
either the one at postgresql.org or a mirror (you can set up your own
mirror using CVSup on a Linux or FreeBSD box).

Right, that should be OK. As long as you don't need access /to/ the box.

. have a working postgresql build environment for your platform (for
Windows this means MSys/MinGW with the libz and libintl stuff, and
ideally native Python and Tcl).
. Windows only: you will need a native perl installed as well as the one
in the MSys DTK. The one from ActiveState works fine.

Yep, no problem there. Well, I say that - I find that if I have the DTK
perl installed, --with-perl fails miserably on my laptop, so I normally
only have ActiveState installed.

. download and unpack the latest release of client code from
http://pgfoundry.org/frs/?group_id=1000040
. read instructions at
http://pgfoundry.org/docman/view.php/1000040/4/PGBuildFarm-HOWTO.txt
. get the software running locally using flags --force --nostatus --nosend
. register your machine at http://www.pgbuildfarm.org/register.html
. when you receive credentials, put them in the config file, and
schedule regular builds (without those flags) for the branches you want
to support. (For Windows that should be HEAD and optionally REL8_0_STABLE).

Feel free to ask me questions if anything isn't clear.

There's a short description of how it works at
http://www.onlamp.com/pub/a/onlamp/2005/02/24/pg_buildfarm.html

OK. I'll have to run it past one of my colleagues (who is out until
Monday) as he technically 'owns' our Windows dev server. It will be a
2K3 Server in case you're interested.

I'll let you know either way.

Regards, Dave

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Dave Page (#9)
Re: windows regression failure - prepared xacts

Dave Page wrote:

Short answer:

. your box will need to be able to contact http://www.pgbuildfarm.org
either directly or via proxy, and it will need access to a CVS repo,
either the one at postgresql.org or a mirror (you can set up your own
mirror using CVSup on a Linux or FreeBSD box).

Right, that should be OK. As long as you don't need access /to/ the box.

No. You own the box, and you run it. I never touch it.

. Windows only: you will need a native perl installed as well as the one
in the MSys DTK. The one from ActiveState works fine.

Yep, no problem there. Well, I say that - I find that if I have the DTK
perl installed, --with-perl fails miserably on my laptop, so I normally
only have ActiveState installed.

For buildfarm you need both, as the main buildfarm script needs to run
under the DTK perl, while the auxiliary script that does the web txn
needs to run under native perl.

It all works quite happily - the trick is that in the buildfarm config
you say something like this:

'build_env' => {
    PATH => "/c/tcl/bin:/c/python24:/c/perl/bin/:$ENV{PATH}",
}

Then you will build with the native perl while running under the DTK perl.
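The `build_env` trick relies on ordinary PATH search order: the first directory that contains the program wins. The Python sketch below models that lookup with an in-memory table instead of the real file system. The directory contents, in particular `/usr/bin` as the location of the DTK perl, are assumptions for illustration; the buildfarm script itself still runs under whichever perl started it, and only programs launched with the modified PATH see the native perl first.

```python
def first_match(path_value, command, installed):
    """Return the first PATH entry whose directory holds `command`, else None.

    `installed` maps a directory to the set of programs it contains; it is a
    stand-in for the real file system so the example is self-contained.
    """
    for directory in path_value.split(":"):
        # Treat "/c/perl/bin/" and "/c/perl/bin" as the same directory.
        if command in installed.get(directory.rstrip("/"), set()):
            return directory
    return None


installed = {
    "/usr/bin": {"perl"},       # assumed location of the MSys DTK perl
    "/c/perl/bin": {"perl"},    # native (ActiveState) perl
    "/c/python24": {"python"},
    "/c/tcl/bin": {"tclsh"},
}
base_path = "/usr/bin:/bin"
build_path = "/c/tcl/bin:/c/python24:/c/perl/bin/:" + base_path

print(first_match(base_path, "perl", installed))    # /usr/bin
print(first_match(build_path, "perl", installed))   # /c/perl/bin/
```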

OK. I'll have to run it past one of my colleagues (who is out until
Monday) as he technically 'owns' our Windows dev server. It will be a
2K3 Server in case you're interested.

I'll let you know either way.

Thanks

andrew

#11Dave Page
dpage@vale-housing.co.uk
In reply to: Andrew Dunstan (#10)
Re: windows regression failure - prepared xacts

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: 14 July 2005 15:17
To: Dave Page
Cc: Tom Lane; PostgreSQL-development
Subject: Re: [HACKERS] windows regression failure - prepared xacts

Yep, no problem there. Well, I say that - I find that if I have the DTK
perl installed, --with-perl fails miserably on my laptop, so I normally
only have ActiveState installed.

For buildfarm you need both, as the main buildfarm script needs to run
under the DTK perl, while the auxiliary script that does the web txn
needs to run under native perl.

It all works quite happily - the trick is that in the buildfarm config
you say something like this:

'build_env' => {
    PATH => "/c/tcl/bin:/c/python24:/c/perl/bin/:$ENV{PATH}",
}

Then you will build with the native perl while running under the DTK perl.

Right. I'll sort it out - it's going to become a PITA real soon now
anyway as this release I will be doing the pgInstaller builds again,
plus I'm working on Slony for which I need DTK's autoconf, so I can't
just uninstall DTK this time.

Regards, Dave.

#12Petr Jelinek
pjmodos@parba.cz
In reply to: Dave Page (#11)
Re: windows regression failure - prepared xacts

I did some testing and I think that I can confirm this - on my
workstation under Windows 2000 with latest CVS and gcc 3.2.3 it randomly
fails (it sometimes works and sometimes fails even with same binary) on
prepared_xacts and always fails on rules.

I tested 3 rebuilds, with about 10 checks on each, and see about a 50%
failure rate on prepared_xacts.
Using configure without any parameters, the error is the same as in your
mail (cache lookup failed ...).

I have another 2 Windows machines with different hardware and different
Windows versions (the first is XP and the second is Win 2003). I could do
some testing on them tomorrow if you are interested.

--
Regards
Petr Jelinek (PJMODOS)

#13Andrew Dunstan
andrew@dunslane.net
In reply to: Petr Jelinek (#12)
Re: windows regression failure - prepared xacts

So it's not hardware and doesn't seem to be Windows-version-specific (my
tests were run on XP Pro).

Looks like we have some major digging to do :-(

cheers

andrew

Petr Jelinek wrote:

Show quoted text

I did some testing and I think that I can confirm this - on my
workstation under Windows 2000 with latest CVS and gcc 3.2.3 it
randomly fails (it sometimes works and sometimes fails even with same
binary) on prepared_xacts and always fails on rules.

Tested 3 rebuilds and about 10 checks on each and I have about 50%
failure on prepared_xacts.
Using configure without any parameters, error is same as in you mail
(cache lookup failed ...).

I have another 2 windows machines with different hardware and
different windows versions (first is XP and second is Win 2003) I
could do some testing on them tomorrow if you are interested.

#14Petr Jelinek
pjmodos@parba.cz
In reply to: Andrew Dunstan (#13)
Re: The reason for loris' intermittent prepared_xacts failures

Andrew Dunstan wrote:

How things will work with that setting on a non-English locale box I
don't know - it might help if we had a Windows buildfarm member from
one of our colleagues in a non-English speaking country.

Actually my workstation is non-English (Czech) and it works (All 98
tests passed several times). It also fixed my rules test problem - it
always failed before.

About Tom's conclusion - I was trying to analyze it too, but I lack
knowledge of Postgres internals, so I didn't come up with any final
conclusion. I can however agree on a few things: that OID isn't the OID of
pxtest2, and it seems to happen only if a concurrent DROP TABLE occurs in
one of the parallel tests. And I also agree with that PS.
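The two observations above (an OID that does not belong to pxtest2, and a concurrent DROP TABLE in a parallel test) would fit together if the catalog query behind `\d` applied a by-OID check to every row it scanned before testing the name. That evaluation order is an assumption here, not something established in this thread. The toy Python model below, with made-up OIDs, table names and function names, only shows the effect: testing the name first never touches the dropped row, while checking by OID first trips over it.

```python
class CacheLookupError(Exception):
    """Raised when a row can no longer be fetched by OID (toy model)."""


def fetch_by_oid(live_oids, oid):
    # Stand-in for a per-row check that re-reads the catalog entry by OID.
    if oid not in live_oids:
        raise CacheLookupError(f"cache lookup failed for relation {oid}")
    return True


def find_table(scanned_rows, live_oids, name, check_oid_first):
    """Filter rows a scan already returned; a row may be dropped meanwhile.

    scanned_rows: (oid, relname) pairs captured by the scan.
    live_oids:    OIDs that still exist when the per-row check runs.
    """
    matches = []
    for oid, relname in scanned_rows:
        if check_oid_first:
            keep = fetch_by_oid(live_oids, oid) and relname == name
        else:
            # `and` short-circuits, so non-matching names are never fetched.
            keep = relname == name and fetch_by_oid(live_oids, oid)
        if keep:
            matches.append(oid)
    return matches


# Made-up OIDs: 16500 belongs to a table a parallel test has just dropped.
rows = [(16400, "pxtest2"), (16500, "some_other_table")]
live = {16400}

print(find_table(rows, live, "pxtest2", check_oid_first=False))  # [16400]
try:
    find_table(rows, live, "pxtest2", check_oid_first=True)
except CacheLookupError as err:
    print(err)  # cache lookup failed for relation 16500
```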

On a related note, I was also testing that signal emulation when analyzing
this, and it seems to work without any errors (except for start, of course,
but that's known) - I think Tom had some doubts about it after what he
wrote in his first reply.

--
Regards
Petr Jelinek (PJMODOS)

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Petr Jelinek (#14)
Re: The reason for loris' intermittent prepared_xacts failures

Petr Jelinek <pjmodos@parba.cz> writes:

On a related note, I was also testing that signal emulation when analyzing
this, and it seems to work without any errors (except for start, of course,
but that's known) - I think Tom had some doubts about it after what he
wrote in his first reply.

It was just one of the first things that came to mind when wondering
"why does this only seem to happen on Windows?" At this point I'm
convinced that the answer is "because pg_regress sets up the locale
differently on Windows than anywhere else". But at the time I hadn't
suspected a locale issue ...

regards, tom lane