cvs head initdb hangs on unixware

Started by Olivier PRENANT, over 17 years ago, 39 messages, hackers
#1Olivier PRENANT
ohp@pyrenet.fr

Hi all,

CVS HEAD configured without --enable-debug hangs in initdb while running
make check.

warthog doesn't exhibit it because it's configured with debug.

When it hangs, postmaster takes 100% CPU doing nothing. initdb waits for
it while creating the template db.

According to truss, the last useful thing postmaster does is write 8K of
zeroes to disk.

If someone needs access to a UnixWare machine, let me know.

regards,

--
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
15, Chemin des Monges +33-5-61-50-97-01 (Fax)
31190 AUTERIVE +33-6-07-63-80-64 (GSM)
FRANCE Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)

#2Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Olivier PRENANT (#1)
Re: cvs head initdb hangs on unixware

Could you generate a core and send a stacktrace?

kill -ABRT <pid> should do that.
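
The same request can be made from C with kill(2). The sketch below is only an illustration, assuming a POSIX system: it forks an idle child that stands in for the stuck postmaster, sends it SIGABRT, and reports which signal ended it. The helper name abort_child_and_report is made up for this example, and whether a core file is actually written depends on local settings such as the core size limit.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that idles forever (a stand-in for
 * the hung postmaster), send it SIGABRT, and return the number of the
 * signal that terminated it.  Returns -1 if fork/kill/waitpid fails or
 * the child did not die from a signal.
 */
int abort_child_and_report(void)
{
    pid_t pid = fork();
    int status = 0;

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        for (;;)
            pause();            /* child: sleep until a signal arrives */
    }
    if (kill(pid, SIGABRT) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

A caller would expect the return value to equal SIGABRT; the core file, if one is produced, can then be loaded into a debugger to get the stack trace.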

Zdenek

#3Olivier PRENANT
ohp@pyrenet.fr
In reply to: Zdenek Kotala (#2)
Re: cvs head initdb hangs on unixware

On Tue, 2 Dec 2008, Zdenek Kotala wrote:

Date: Tue, 02 Dec 2008 17:22:25 +0100
From: Zdenek Kotala <Zdenek.Kotala@Sun.COM>
To: ohp@pyrenet.fr
Cc: pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware

Could you generate a core and send a stacktrace?

kill -ABRT <pid> should do that.

Zdenek

Hmm. No point doing it: it's not debug enabled, so I'm afraid a stack trace
won't show us anything useful.

#4Olivier PRENANT
ohp@pyrenet.fr
In reply to: Zdenek Kotala (#2)
Re: cvs head initdb hangs on unixware

On Tue, 2 Dec 2008, Zdenek Kotala wrote:

Date: Tue, 02 Dec 2008 17:22:25 +0100
From: Zdenek Kotala <Zdenek.Kotala@Sun.COM>
To: ohp@pyrenet.fr
Cc: pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware

Could you generate a core and send a stacktrace?

kill -ABRT <pid> should do that.

Zdenek

Zdenek,

On second thought, I tried and got that:
Stack trace for p1, program postmaster
*[0] fsm_rebuild_page( presumed: 0xbd9731a0, 0, 0xbd9731a0) [0x81e6a97]
[1]: fsm_search_avail( presumed: 0x2, 0x6, 0x1) [0x81e68d9]
[2]: fsm_set_and_search(0x84b2250, 0, 0, 0x2e, 0x5, 0x6, 0x2e, 0x8047416, 0xb4) [0x81e6385]
[3]: RecordAndGetPageWithFreeSpace(0x84b2250, 0x2e, 0xa0, 0xb4) [0x81e5a00]
[4]: RelationGetBufferForTuple( presumed: 0x84b2250, 0xb4, 0) [0x8099b59]
[5]: heap_insert(0x84b2250, 0x853a338, 0, 0, 0) [0x8097042]
[6]: simple_heap_insert( presumed: 0x84b2250, 0x853a338, 0x853a310) [0x8097297]
[7]: InsertOneTuple( presumed: 0xb80, 0x84057b0, 0x8452fb8) [0x80cb210]
[8]: boot_yyparse( presumed: 0xffffffff, 0x3, 0x8047ab8) [0x80c822b]
[9]: BootstrapModeMain( presumed: 0x66, 0x8454600, 0x4) [0x80ca233]
[10]: AuxiliaryProcessMain(0x4, 0x8047ab4) [0x80cab3b]
[11]: main(0x4, 0x8047ab4, 0x8047ac8) [0x8177dce]
[12]: _start() [0x807ff96]

seems interesting!

We've had problems already with unixware optimizer, hope this one is
fixable!

regards

From: "Hitoshi Harada" <umi.tanuki@gmail.com>
Date: Wed, 3 Dec 2008 02:46:46 +0900
To: "Heikki Linnakangas" <heikki.linnakangas@enterprisedb.com>
Cc: "David Rowley" <dgrowley@gmail.com>, pgsql-hackers@postgresql.org
Subject: Re: Windowing Function Patch Review -> Standard Conformance

2008/11/26 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:

Hitoshi Harada wrote:

I read more, and your spooling approach seems flexible both for now
and for the future. Looking only at the current release, the frame with ORDER
BY is handled by detecting peers in WinFrameGetArg() and adding the row number
of the peers to winobj->currentpos. Actually, if we have the capability to
spool all the rows we need on demand, the frame would only be a boundary
problem.

Yeah, we could do that. I'm afraid it would be pretty slow, though, if
there's a lot of peers. That could probably be alleviated with some sort of
caching, though.

I added code for this issue. See
http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=blobdiff;f=src/backend/executor/nodeWindow.c;h=f2144bf73a94829cd7a306c28064fa5454f8d369;hp=50a6d6ca4a26cd4854c445364395ed183b61f831;hb=895f1e615352dfc733643a701d1da3de7f91344b;hpb=843e34f341f0e824fd2cc0f909079ad943e3815b

This process is very similar to your aggregatedupto in the window
aggregate, so could they be shared as a general "way to detect the frame
boundary"?

I am working on assorted issues rather than the aggregate common code (which I
now doubt is worth sharing), so tell me if you restart your hacking.
I'll send the whole patch.

Regards,

--
Hitoshi Harada

#5Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Olivier PRENANT (#4)
Re: cvs head initdb hangs on unixware

ohp@pyrenet.fr wrote:

Stack trace for p1, program postmaster
*[0] fsm_rebuild_page( presumed: 0xbd9731a0, 0, 0xbd9731a0) [0x81e6a97]
[1] fsm_search_avail( presumed: 0x2, 0x6, 0x1) [0x81e68d9]
[2] fsm_set_and_search(0x84b2250, 0, 0, 0x2e, 0x5, 0x6, 0x2e,
0x8047416, 0xb4) [0x81e6385]
[3] RecordAndGetPageWithFreeSpace(0x84b2250, 0x2e, 0xa0, 0xb4) [0x81e5a00]
[4] RelationGetBufferForTuple( presumed: 0x84b2250, 0xb4, 0) [0x8099b59]
[5] heap_insert(0x84b2250, 0x853a338, 0, 0, 0) [0x8097042]
[6] simple_heap_insert( presumed: 0x84b2250, 0x853a338, 0x853a310)
[0x8097297]
[7] InsertOneTuple( presumed: 0xb80, 0x84057b0, 0x8452fb8) [0x80cb210]
[8] boot_yyparse( presumed: 0xffffffff, 0x3, 0x8047ab8) [0x80c822b]
[9] BootstrapModeMain( presumed: 0x66, 0x8454600, 0x4) [0x80ca233]
[10] AuxiliaryProcessMain(0x4, 0x8047ab4) [0x80cab3b]
[11] main(0x4, 0x8047ab4, 0x8047ac8) [0x8177dce]
[12] _start() [0x807ff96]

seems interesting!

We've had problems already with unixware optimizer, hope this one is
fixable!

Looking at fsm_rebuild_page, I wonder if the compiler is treating "int"
as an unsigned integer? That would cause an infinite loop.
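
To make the hypothesis concrete, here is a minimal sketch of why a countdown loop that tests "counter >= 0" never ends if the counter is unsigned. The loop shape and the function names are assumptions for illustration, not a quote of fsm_rebuild_page; the unsigned variant takes a guard argument only so that the example terminates.

```c
/* Signed counter: visits count-1, count-2, ..., 0 and then stops. */
int countdown_signed(int count)
{
    int steps = 0;
    int nodeno;

    for (nodeno = count - 1; nodeno >= 0; nodeno--)
        steps++;
    return steps;
}

/*
 * Unsigned counter: "nodeno >= 0" is always true, because decrementing
 * 0 wraps around to the largest unsigned value.  Only the artificial
 * guard stops the loop here; without it the loop would spin forever.
 */
int countdown_unsigned(unsigned int count, int guard)
{
    int steps = 0;
    unsigned int nodeno;

    for (nodeno = count - 1; nodeno >= 0 && steps < guard; nodeno--)
        steps++;
    return steps;
}
```

With a count of 4096 the signed version performs exactly 4096 iterations, while the unsigned version runs until the guard is hit, however large the guard is.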

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#6Olivier PRENANT
ohp@pyrenet.fr
In reply to: Heikki Linnakangas (#5)
Re: cvs head initdb hangs on unixware

On Tue, 2 Dec 2008, Heikki Linnakangas wrote:

Date: Tue, 02 Dec 2008 20:47:19 +0200
From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
To: ohp@pyrenet.fr
Cc: Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware

Looking at fsm_rebuild_page, I wonder if the compiler is treating "int" as an
unsigned integer? That would cause an infinite loop.

No, a simple printf of nodeno shows it starting at 4096, going all the way down
to 0, then starting back at 4096...

I wonder if the leftchild/rightchild definitions have something to do with
it...

From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
Date: Wed, 03 Dec 2008 15:23:23 +0200
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
CC: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: Visibility map, partial vacuums

Heikki Linnakangas wrote:

Here's an updated version, with a lot of smaller cleanups, and using
relcache invalidation to notify other backends when the visibility map
fork is extended. I already committed the change to FSM to do the same.
I'm feeling quite satisfied to commit this patch early next week.

Committed.

I haven't done any doc changes for this yet. I think a short section in
the "database internal storage" chapter is probably in order, and the
fact that plain VACUUM skips pages should be mentioned somewhere. I'll
skim through references to vacuum and see what needs to be changed.

Hmm. It just occurred to me that I think this circumvented the
anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid
anymore. We'll need to disable the skipping when autovacuum is triggered
to prevent wraparound. VACUUM FREEZE does that already, but it's
unnecessarily aggressive in freezing.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Olivier PRENANT (#6)
Re: cvs head initdb hangs on unixware

ohp@pyrenet.fr wrote:

Looking at fsm_rebuild_page, I wonder if the compiler is treating
"int" as an unsigned integer? That would cause an infinite loop.

No, a simple printf of nodeno shows it starting at 4096 all the way
down to 0, starting back at 4096...

I wonder if the leftchild/rightchild definitions have something to do with
it...

With probably no relevance at all, I notice that this routine is
declared extern, although it is only referenced in its own file
apparently. Don't we have a tool that checks that?

cheers

andrew

#8Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Olivier PRENANT (#6)
Re: cvs head initdb hangs on unixware

ohp@pyrenet.fr wrote:

Looking at fsm_rebuild_page, I wonder if the compiler is treating
"int" as an unsigned integer? That would cause an infinite loop.

No, a simple printf of nodeno shows it starting at 4096 all the way
down to 0, starting back at 4096...

Hmm, it's probably looping in fsm_search_avail then. In a fresh cluster,
there shouldn't be any broken FSM pages that need rebuilding.

I'd like to see what the FSM page in question looks like. Could you try
to run initdb with "-d -n" options? I bet you'll get an infinite number
of lines like:

DEBUG: fixing corrupt FSM block 1, relation 123/456/789

Could you zip up the FSM file of that relation (a file called e.g.
"789_fsm") and send it over? Or the whole data directory; it shouldn't
be that big.
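
For readers unfamiliar with what "rebuilding" an FSM page means, here is a rough sketch on a toy 7-node tree stored in an array. It assumes the usual layout in which each inner node holds the maximum of its two children; the names toy_rebuild, TOY_NODES and TOY_NON_LEAF are hypothetical, and the real page is much larger.

```c
#include <stdint.h>

#define TOY_NODES     7     /* hypothetical toy page: 3 inner nodes + 4 leaves */
#define TOY_NON_LEAF  3
#define leftchild(x)  (2 * (x) + 1)

/*
 * Recompute every inner node, bottom up, as the maximum of its children.
 * Returns 1 if any inner node had to be corrected, 0 if the page was
 * already consistent.
 */
int toy_rebuild(uint8_t nodes[TOY_NODES])
{
    int changed = 0;
    int nodeno;

    for (nodeno = TOY_NON_LEAF - 1; nodeno >= 0; nodeno--)
    {
        int lchild = leftchild(nodeno);
        int rchild = lchild + 1;
        uint8_t newvalue = 0;

        if (lchild < TOY_NODES)
            newvalue = nodes[lchild];
        if (rchild < TOY_NODES && nodes[rchild] > newvalue)
            newvalue = nodes[rchild];
        if (nodes[nodeno] != newvalue)
            changed = 1;
        nodes[nodeno] = newvalue;
    }
    return changed;
}
```

On a consistent page the function changes nothing and returns 0, so a search that keeps declaring such a page corrupt would gain nothing from rebuilding it again.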

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#9Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#7)
Re: cvs head initdb hangs on unixware

Andrew Dunstan wrote:

ohp@pyrenet.fr wrote:

Looking at fsm_rebuild_page, I wonder if the compiler is treating
"int" as an unsigned integer? That would cause an infinite loop.

No, a simple printf of nodeno shows it starting at 4096 all the way
down to 0, starting back at 4096...

I wonder if the leftchild/rightchild definitions have something to do with
it...

With probably no relevance at all, I notice that this routine is
declared extern, although it is only referenced in its own file
apparently. Don't we have a tool that checks that?

Sure, src/tools/find_static.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#10Olivier PRENANT
ohp@pyrenet.fr
In reply to: Heikki Linnakangas (#8)
Re: cvs head initdb hangs on unixware

On Wed, 3 Dec 2008, Heikki Linnakangas wrote:

Date: Wed, 03 Dec 2008 20:29:01 +0200
From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
To: ohp@pyrenet.fr
Cc: Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware

Hmm, it's probably looping in fsm_search_avail then. In a fresh cluster,
there shouldn't be any broken FSM pages that need rebuilding.

You're right!

I'd like to see what the FSM page in question looks like. Could you try to
run initdb with "-d -n" options? I bet you'll get an infinite number of lines
like:

DEBUG: fixing corrupt FSM block 1, relation 123/456/789

right again!
DEBUG: fixing corrupt FSM block 2, relation 1663/1/1255

Could you zip up the FSM file of that relation (a file called e.g
"789_fsm"), and send it over? Or the whole data directory, it shouldn't be
that big.

you get both.
BTW, this is an optimizer problem, not anything wrong with the code, but
I'd hate to have a -g compiled postmaster in prod :)

best regards,

Attachments:

1255_fsm (application/octet-stream)
db.tgz (application/octet-stream)

#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Olivier PRENANT (#10)
Re: cvs head initdb hangs on unixware

ohp@pyrenet.fr wrote:

On Wed, 3 Dec 2008, Heikki Linnakangas wrote:

Could you zip up the FSM file of that relation (a file called e.g
"789_fsm"), and send it over? Or the whole data directory, it
shouldn't be that big.

you get both.

Thanks. Hmm, the FSM pages are full of zeros, as I would expect for a
just-created relation. fsm_search_avail should've returned quickly at
the top of the function in that case. Can you put an extra printf or
something at the top of the function, to print all the arguments? And
the value of fsmpage->fp_nodes[0].

BTW, this is an optimizer problem, not anything wrong with the code, but
I'd hate to have a -g compiled postmaster in prod :)

Yes, so it seems, although I wouldn't be surprised if it turns out to be
a bug in the new FSM code either..

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#12Olivier PRENANT
ohp@pyrenet.fr
In reply to: Heikki Linnakangas (#11)
Re: cvs head initdb hangs on unixware

On Thu, 4 Dec 2008, Heikki Linnakangas wrote:

Date: Thu, 04 Dec 2008 13:19:15 +0200
From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
To: ohp@pyrenet.fr
Cc: Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware

As you can see in attached initdb.log, it seems fsm_search_avail is called
repeatedly and args are sort of looping...

Attachments:

initdb.log (text/plain; charset=US-ASCII)

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Olivier PRENANT (#12)
Re: cvs head initdb hangs on unixware

ohp@pyrenet.fr writes:

As you can see in attached initdb.log, it seems fsm_search_avail is called
repeatedly and args are sort of looping...

That's expected, since the system is inserting a lot of tuples
successively. What it looks like to me is that the failing call is the
first one where the initial test *doesn't* result in falling out
immediately. So the probability is that there's something wrong with
the code that descends the tree.

Note that the all-zeroes pages in your dump are uninformative because
none of the real FSM data has been written to disk yet. We can see
from this trace that the code is dealing with not-all-zero pages.

regards, tom lane

#14Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#13)
Re: cvs head initdb hangs on unixware

Tom Lane wrote:

ohp@pyrenet.fr writes:

As you can see in attached initdb.log, it seems fsm_search_avail is called
repeatedly and args are sort of looping...

That's expected, since the system is inserting a lot of tuples
successively.

Right. I suspect it wasn't in the infinite loop yet. Try to run it for
*much* longer (it'll probably take much longer than usual because it's
printing all the debug stuff), until it gets stuck looping over the same
pages in the same relation.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#15Olivier PRENANT
ohp@pyrenet.fr
In reply to: Heikki Linnakangas (#14)
Re: cvs head initdb hangs on unixware

Dear all,
On Mon, 8 Dec 2008, Heikki Linnakangas wrote:

Date: Mon, 08 Dec 2008 09:17:52 +0200
From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: ohp@pyrenet.fr, Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware

Right. I suspect it wasn't in the infinite loop yet. Try to run it for *much*
longer (it'll probably take much longer than usual because it's printing all
the debug stuff), until it gets stuck looping over the same pages in the same
relation.

The infinite loop occurs in fsm_search_avail when it is called for the 32nd
time.

It loops between the restart: label and the goto restart.

the long (95M) initdb.log can be found at
ftp://ftp.pyrenet.fr/private/initdb.log

regards,

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Olivier PRENANT (#15)
Re: cvs head initdb hangs on unixware

ohp@pyrenet.fr writes:

the infinite loop occurs in fsm_search_avail when called for the 32nd
time.

... which is the first time that the initial test doesn't make it fall
out immediately.

Would you add a couple more printouts, along the line of

     nodeno = target;
     while (nodeno > 0)
     {
+        fprintf(stderr, "ascend at node %d value %d\n",
+                nodeno, fsmpage->fp_nodes[nodeno]);

         if (fsmpage->fp_nodes[nodeno] >= minvalue)
             break;

         /*
          * Move to the right, wrapping around on same level if necessary,
          * then climb up.
          */
         nodeno = parentof(rightneighbor(nodeno));
     }

     /*
      * We're now at a node with enough free space, somewhere in the middle of
      * the tree. Descend to the bottom, following a path with enough free
      * space, preferring to move left if there's a choice.
      */
     while (nodeno < NonLeafNodesPerPage)
     {
         int  leftnodeno = leftchild(nodeno);
         int  rightnodeno = leftnodeno + 1;
         bool leftok = (leftnodeno < NodesPerPage) &&
                       (fsmpage->fp_nodes[leftnodeno] >= minvalue);
         bool rightok = (rightnodeno < NodesPerPage) &&
                        (fsmpage->fp_nodes[rightnodeno] >= minvalue);

+        fprintf(stderr, "descend at node %d value %d, leftnode %d value %d, rightnode %d value %d\n",
+                nodeno, fsmpage->fp_nodes[nodeno],
+                leftnodeno, fsmpage->fp_nodes[leftnodeno],
+                rightnodeno, fsmpage->fp_nodes[rightnodeno]);

         if (leftok)
             nodeno = leftnodeno;
         else if (rightok)
             nodeno = rightnodeno;
         else

(I'm assuming we can print possibly-off-the-end array elements without dumping
core; which is bogus in general but I expect we can get away with it
for this purpose.)
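
For anyone following the ascend/descend logic without the source at hand, the following is a self-contained toy version on a 7-node array (3 inner nodes, 4 leaves). It is a sketch under assumptions, not the PostgreSQL code: the index macros follow the usual array-heap layout, right_neighbor is written to wrap around within one tree level, and where the snippet above is cut off at the final "else", this version simply returns -2 instead of repairing the page and retrying. All TOY_ and toy_ names are made up.

```c
#include <stdint.h>

#define TOY_NODES     7     /* hypothetical toy page: 3 inner nodes + 4 leaves */
#define TOY_NON_LEAF  3
#define leftchild(x)  (2 * (x) + 1)
#define parentof(x)   (((x) - 1) / 2)

/* Step right on the same level, wrapping to that level's first node. */
static int right_neighbor(int x)
{
    x++;
    if (((x + 1) & x) == 0)     /* x + 1 is a power of two: fell off the level */
        x = parentof(x);
    return x;
}

/*
 * Find a leaf slot whose value is >= minvalue, starting the search at
 * node number 'target'.  Returns the slot (0..3), -1 if the root says
 * nothing on the page is large enough, or -2 if an inner node promised
 * more than either child holds (an inconsistent page).
 */
int toy_search(const uint8_t nodes[TOY_NODES], uint8_t minvalue, int target)
{
    int nodeno;

    if (nodes[0] < minvalue)
        return -1;              /* quick exit at the top of the function */

    /* Climb until some node has enough space; the root is known to. */
    nodeno = target;
    while (nodeno > 0)
    {
        if (nodes[nodeno] >= minvalue)
            break;
        nodeno = parentof(right_neighbor(nodeno));
    }

    /* Descend to a leaf, preferring the left child. */
    while (nodeno < TOY_NON_LEAF)
    {
        int leftnodeno = leftchild(nodeno);
        int rightnodeno = leftnodeno + 1;

        if (leftnodeno < TOY_NODES && nodes[leftnodeno] >= minvalue)
            nodeno = leftnodeno;
        else if (rightnodeno < TOY_NODES && nodes[rightnodeno] >= minvalue)
            nodeno = rightnodeno;
        else
            return -2;
    }
    return nodeno - TOY_NON_LEAF;
}
```

With the page {5, 3, 5, 1, 3, 0, 5}, a search for 4 starting at node 3 climbs to the root, descends through node 2 and ends at slot 3. The -2 case is the one of interest in this thread: if leftok and rightok were both computed as false on a page that is in fact consistent, the search would report corruption every time it was retried.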

Also, we don't really need 94MB of log to convince us it's an
infinite loop ;-)

regards, tom lane

#17Olivier PRENANT
ohp@pyrenet.fr
In reply to: Tom Lane (#16)
Re: cvs head initdb hangs on unixware

Hi Tom,
On Mon, 8 Dec 2008, Tom Lane wrote:

Date: Mon, 08 Dec 2008 13:15:28 -0500
From: Tom Lane <tgl@sss.pgh.pa.us>
To: ohp@pyrenet.fr
Cc: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>,
Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware

ohp@pyrenet.fr writes:

the infinite loop occurs in fsm_search_avail when called for the 32nd
time.

... which is the first time that the initial test doesn't make it fall
out immediately.

Also, we don't really need 94MB of log to convince us it's an
infinite loop ;-)

oops, sorry

regards, tom lane

I first misread your mail and added only the first fprintf. While I was
uploading a 400M initdb.log, I went back to add the second one.

Guess what! With the "descend at node" fprintf in place, everything
goes well. The optimizer definitely does something weird around the
definition/assignment of leftok/rightok.


#18Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Olivier PRENANT (#17)
Re: cvs head initdb hangs on unixware

ohp@pyrenet.fr wrote:

I first misread your mail and added only the first fprintf. While I
was uploading a 400M initdb.log, I went back to add the second one.

Guess what! With the "descend at node" fprintf in place, everything
goes well. The optimizer definitely does something weird around the
definition/assignment of leftok/rightok.

Could you generate the assembler code for the fsmSearch function, with and
without optimization? Without the extra printf, of course :-). It should
show the difference.

Zdenek

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Olivier PRENANT (#17)
Re: cvs head initdb hangs on unixware

ohp@pyrenet.fr writes:

Guess what! With the "descend at node" fprintf in place, everything
goes well. The optimizer definitely does something weird around the
definition/assignment of leftok/rightok.

Hmm, so the problem is in that second loop. The trick is to pick some
reasonably non-ugly code change that makes the problem go away.

The first thing I'd try is to get rid of the overly cute optimization

int rightnodeno = leftnodeno + 1;

and make it just read

int rightnodeno = rightchild(nodeno);

If that doesn't work, we might try refactoring the code enough to get
rid of the goto, but that looks a little bit tedious.

regards, tom lane
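
Note that this rewrite is not expected to change what the code computes, only what the compiler sees. Assuming the usual implicit-heap macros (the definitions below are an assumption, written from memory of the layout, and `spellings_agree` is a hypothetical helper), `rightchild(nodeno)` and `leftchild(nodeno) + 1` name the same index. A small sketch to check that:

```c
/* Implicit binary-heap child macros (assumed definitions). */
#define leftchild(x)   (2 * (x) + 1)
#define rightchild(x)  (2 * (x) + 2)

/*
 * Return 1 if "leftnodeno + 1" and "rightchild(nodeno)" agree for every
 * node number below limit, 0 otherwise.
 */
static int
spellings_agree(int limit)
{
    int nodeno;

    for (nodeno = 0; nodeno < limit; nodeno++)
    {
        int leftnodeno = leftchild(nodeno);

        if (leftnodeno + 1 != rightchild(nodeno))
            return 0;
    }
    return 1;
}
```

If the two spellings agree and the hang still changes with the spelling, the difference has to come from the generated code, which is what comparing the assembler output would show.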

#20Olivier PRENANT
ohp@pyrenet.fr
In reply to: Tom Lane (#19)
Re: cvs head initdb hangs on unixware

I tried that, and also moving the leftok/rightok declarations outside the
loop and refactoring the code that assigns leftok and rightok. Nothing worked!

Regards,
