Debug strategy for musl Postgres?

Started by John Mudd almost 12 years ago, 10 messages
#1John Mudd
johnbmudd@gmail.com

I built Postgres 9.3.4 from source on top of the musl C library,
http://www.musl-libc.org/
I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
as a foundation for Postgres.

I'm using musl to increase the portability of the Postgres binary. I build
on Ubuntu 13.10, but the binaries will run on older Linux boxes.

So far I get better results with the musl Postgres built on modern Ubuntu
and running on an old kernel than by building Postgres directly on the old
Linux with its standard C library. But the musl Postgres is still not working
fully. I'm not getting responses from the server.

Here's the tail end "strace pg_isready" output for musl Postgres built and
running on Ubuntu 13.10:

clock_gettime(CLOCK_REALTIME, {1397359337, 426941692}) = 0
poll([{fd=4, events=POLLOUT|POLLERR}], 1, 3000) = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "\0\0\0=\0\3\0\0user\0mudd\0database\0mudd\0"..., 61, MSG_NOSIGNAL, NULL, 0) = 61
clock_gettime(CLOCK_REALTIME, {1397359337, 427070343}) = 0
poll([{fd=4, events=POLLIN|POLLERR}], 1, 3000) = 1 ([{fd=4, revents=POLLIN}])
recvfrom(4, "R\0\0\0\10\0\0\0\0E\0\0\0RSFATAL\0C3D000\0Mdat"..., 16384, 0, NULL, NULL) = 92
close(4) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
writev(1, [{"/tmp:5432 - accepting connection"..., 33}, {"\n", 1}], 2) = 34
exit_group(0) = ?

Here's the tail end "strace pg_isready" output for musl Postgres built on
Ubuntu 13.10 but running on old Linux:

clock_gettime(0, 0xbfffa5a8) = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0}) = 0
poll([{fd=3, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, 3000) = 1
sendto(3, "\0\0\0?\0\3\0\0user\0jmudd\0database\0jmud"..., 63, 0x4000, NULL, 0) = 63

clock_gettime(0, 0xbfffa5a8) = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0}) = 0
poll([{fd=3, events=POLLIN|POLLERR}], 1, 3000) = 0
close(3) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
writev(1, [{"/tmp:5432 - no response", 23}, {"\n", 1}], 2) = 24
exit_group(2) = ?

For my next step I'll try building musl Postgres with the --enable-cassert
option. What else can I do to debug this?

John

#2Euler Taveira
euler@timbira.com.br
In reply to: John Mudd (#1)
Re: Debug strategy for musl Postgres?

On 13-04-2014 00:40, John Mudd wrote:

I built Postgres 9.3.4 from source on top of the musl C library,
http://www.musl-libc.org/
I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
as a foundation for Postgres.

This is not a bug. This kind of discussion belongs to -hackers.

While reading this email, I gave musl a try. I'm using Debian jessie,
which contains musl 1.0.0. I compiled the source (git master) using
CC="musl-gcc" and disabled zlib and readline. It passed all regression
tests. I also tried pgbench, which ran like a charm. (After installing
the binaries, I had to set the library path for musl in
/etc/ld-musl-x86_64.d.)

I'm using musl to increase the portability of the Postgres binary. I build
on Ubuntu 13.10, but the binaries will run on older Linux boxes.

Could you give details about your architecture?

For my next step I'll try building musl Postgres with the --enable-cassert
option. What else can I do to debug this?

Is postgres running and listening on port 5432? Did you try other binaries
(e.g. psql), or even postgres in single-user mode?

--
Euler Taveira Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3John Mudd
johnbmudd@gmail.com
In reply to: Euler Taveira (#2)
Re: Debug strategy for musl Postgres?

I agree, not a bug. I was just following the instructions to post as a bug
first and then move to hackers if directed. I'll repost on hackers and give
the rest of my reply there.

On Sun, Apr 13, 2014 at 12:04 PM, Euler Taveira <euler@timbira.com.br> wrote:

On 13-04-2014 00:40, John Mudd wrote:

I built Postgres 9.3.4 from source on top of the musl C library,
http://www.musl-libc.org/
I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
as a foundation for Postgres.

This is not a bug. This kind of discussion belongs to -hackers.


#4John Mudd
johnbmudd@gmail.com
In reply to: John Mudd (#1)
Fwd: Debug strategy for musl Postgres?

Reposting from pgsql-bugs since this is not a bug.

I built Postgres 9.3.4 from source on top of the musl C library,
http://www.musl-libc.org/
I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
as a foundation for Postgres.

I'm using musl to increase the portability of the Postgres binary. I build
on Ubuntu 13.10, but the binaries will run on older Linux boxes.

So far I get better results with the musl Postgres built on modern Ubuntu
and running on an old kernel than by building Postgres directly on the old
Linux with its standard C library. But the musl Postgres is still not working
fully. I'm not getting responses from the server.

Here's the tail end "strace pg_isready" output for musl Postgres built and
running on Ubuntu 13.10:

clock_gettime(CLOCK_REALTIME, {1397359337, 426941692}) = 0
poll([{fd=4, events=POLLOUT|POLLERR}], 1, 3000) = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "\0\0\0=\0\3\0\0user\0mudd\0database\0mudd\0"..., 61, MSG_NOSIGNAL, NULL, 0) = 61
clock_gettime(CLOCK_REALTIME, {1397359337, 427070343}) = 0
poll([{fd=4, events=POLLIN|POLLERR}], 1, 3000) = 1 ([{fd=4, revents=POLLIN}])
recvfrom(4, "R\0\0\0\10\0\0\0\0E\0\0\0RSFATAL\0C3D000\0Mdat"..., 16384, 0, NULL, NULL) = 92
close(4) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
writev(1, [{"/tmp:5432 - accepting connection"..., 33}, {"\n", 1}], 2) = 34
exit_group(0) = ?

Here's the tail end "strace pg_isready" output for musl Postgres built on
Ubuntu 13.10 but running on old Linux:

clock_gettime(0, 0xbfffa5a8) = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0}) = 0
poll([{fd=3, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, 3000) = 1
sendto(3, "\0\0\0?\0\3\0\0user\0jmudd\0database\0jmud"..., 63, 0x4000, NULL, 0) = 63

clock_gettime(0, 0xbfffa5a8) = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0}) = 0
poll([{fd=3, events=POLLIN|POLLERR}], 1, 3000) = 0
close(3) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
writev(1, [{"/tmp:5432 - no response", 23}, {"\n", 1}], 2) = 24
exit_group(2) = ?

For my next step I'll try building musl Postgres with the --enable-cassert
option. What else can I do to debug this?

John

#5John Mudd
johnbmudd@gmail.com
In reply to: Euler Taveira (#2)
Re: Debug strategy for musl Postgres?

On Sun, Apr 13, 2014 at 12:04 PM, Euler Taveira <euler@timbira.com.br> wrote:

On 13-04-2014 00:40, John Mudd wrote:

I built Postgres 9.3.4 from source on top of the musl C library,
http://www.musl-libc.org/
I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
as a foundation for Postgres.

This is not a bug. This kind of discussion belongs to -hackers.

While reading this email, I gave musl a try. I'm using Debian jessie,
which contains musl 1.0.0. I compiled the source (git master) using
CC="musl-gcc" and disabled zlib and readline. It passed all regression
tests. I also tried pgbench, which ran like a charm. (After installing
the binaries, I had to set the library path for musl in
/etc/ld-musl-x86_64.d.)

I'm using musl to increase the portability of the Postgres binary. I build
on Ubuntu 13.10, but the binaries will run on older Linux boxes.

Could you give details about your architecture?

Built on 3.8.0-35-generic #50-Ubuntu SMP Tue Dec 3 01:25:33 UTC 2013 i686 i686 i686 GNU/Linux
Runs fine there.

Moved the postgres install directory to 2.4.21-4.EL #1 Fri Oct 3 18:13:58 EDT 2003 i686 i686 i386 GNU/Linux
Not fully working there.
Note: It says 2.4 kernel, but I've been told that's misleading. The kernel
has upgrades that make it effectively 2.6.

For my next step I'll try building musl Postgres with the --enable-cassert
option. What else can I do to debug this?

Is postgres running and listening on port 5432? Did you try other binaries
(e.g. psql), or even postgres in single-user mode?

I rebuilt with --enable-cassert and reran; no difference on the 2.4 machine.

It's listening even on the 2.4 machine. I ran strace on the main postgres
process and got the following while running pg_isready:

Process 23811 attached - interrupt to quit
Process 23811 detached

But pg_isready just reports "/tmp:5432 - no response" after a few seconds.


#6John Mudd
johnbmudd@gmail.com
In reply to: John Mudd (#5)
Re: Debug strategy for musl Postgres?

On Sun, Apr 13, 2014 at 4:19 PM, John Mudd <johnbmudd@gmail.com> wrote:

It's listening even on the 2.4 machine. I ran strace on the main postgres
process and got the following while running pg_isready:

Process 23811 attached - interrupt to quit
Process 23811 detached

Correction: the main postgres process shows no sign that pg_isready is
trying to connect. The messages I listed above come only from strace
attaching and detaching.

The same happens if I try psql. Psql just waits indefinitely.

#7Andres Freund
andres@2ndquadrant.com
In reply to: John Mudd (#4)
Re: Fwd: Debug strategy for musl Postgres?

Hi,

On 2014-04-13 16:08:00 -0400, John Mudd wrote:

I built Postgres 9.3.4 from source on top of the musl C library,
http://www.musl-libc.org/
I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
as a foundation for Postgres.

I'm using musl to increase the portability of the Postgres binary. I build
on Ubuntu 13.10, but the binaries will run on older Linux boxes.

So far I get better results with the musl Postgres built on modern Ubuntu
and running on an old kernel than by building Postgres directly on the old
Linux with its standard C library. But the musl Postgres is still not working
fully. I'm not getting responses from the server.

I tend to think that this is more a matter for the musl devs than for
postgres. Postgres works on a fair number of libcs, and musl is pretty
new and rough around the edges.

clock_gettime(0, 0xbfffa5a8) = -1 ENOSYS (Function not implemented)

This looks suspicious.

gettimeofday(NULL, {300, 0}) = 0
poll([{fd=3, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, 3000) = 1
sendto(3, "\0\0\0?\0\3\0\0user\0jmudd\0database\0jmud"..., 63, 0x4000, NULL, 0) = 63

clock_gettime(0, 0xbfffa5a8) = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0}) = 0
poll([{fd=3, events=POLLIN|POLLERR}], 1, 3000) = 0

Here the poll timed out (returned 0) without the server sending anything.
You'll likely have to look at the server side.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8John Mudd
johnbmudd@gmail.com
In reply to: John Mudd (#1)
Fwd: [HACKERS] Fwd: Debug strategy for musl Postgres?

On Sun, Apr 13, 2014 at 4:28 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Hi,

On 2014-04-13 16:08:00 -0400, John Mudd wrote:

I built Postgres 9.3.4 from source on top of the musl C library,
http://www.musl-libc.org/
I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
as a foundation for Postgres.

I'm using musl to increase the portability of the Postgres binary. I build
on Ubuntu 13.10, but the binaries will run on older Linux boxes.

So far I get better results with the musl Postgres built on modern Ubuntu
and running on an old kernel than by building Postgres directly on the old
Linux with its standard C library. But the musl Postgres is still not working
fully. I'm not getting responses from the server.

I tend to think that this is more a matter for the musl devs than for
postgres. Postgres works on a fair number of libcs, and musl is pretty
new and rough around the edges.

Okay. I just wanted to check here too.

clock_gettime(0, 0xbfffa5a8) = -1 ENOSYS (Function not implemented)

This looks suspicious.

gettimeofday(NULL, {300, 0}) = 0
poll([{fd=3, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, 3000) = 1
sendto(3, "\0\0\0?\0\3\0\0user\0jmudd\0database\0jmud"..., 63, 0x4000, NULL, 0) = 63

clock_gettime(0, 0xbfffa5a8) = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0}) = 0
poll([{fd=3, events=POLLIN|POLLERR}], 1, 3000) = 0

Here the poll timed out (returned 0) without the server sending anything.
You'll likely have to look at the server side.

Yes, it's the server: it's in a tight loop, and this is all it's doing.
Thanks, I'll look into this.

clock_gettime(0, 0xbfffded8) = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0}) = 0
clock_gettime(0, 0xbfffded8) = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0}) = 0
clock_gettime(0, 0xbfffded8) = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0}) = 0
clock_gettime(0, 0xbfffded8) = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0}) = 0
clock_gettime(0, 0xbfffded8) = -1 ENOSYS (Function not implemented)
gettimeofday(NULL, {300, 0}) = 0

Greetings,

Andres Freund


#9Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: John Mudd (#5)
Re: Debug strategy for musl Postgres?

On 04/13/2014 10:19 PM, John Mudd wrote:

On Sun, Apr 13, 2014 at 12:04 PM, Euler Taveira <euler@timbira.com.br> wrote:

On 13-04-2014 00:40, John Mudd wrote:

I built Postgres 9.3.4 from source on top of the musl C library,
http://www.musl-libc.org/
I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
as a foundation for Postgres.

This is not a bug. This kind of discussion belongs to -hackers.

While reading this email, I gave musl a try. I'm using Debian jessie,
which contains musl 1.0.0. I compiled the source (git master) using
CC="musl-gcc" and disabled zlib and readline. It passed all regression
tests. I also tried pgbench, which ran like a charm. (After installing
the binaries, I had to set the library path for musl in
/etc/ld-musl-x86_64.d.)

I'm using musl to increase the portability of the Postgres binary. I build
on Ubuntu 13.10, but the binaries will run on older Linux boxes.

Could you give details about your architecture?

Built on 3.8.0-35-generic #50-Ubuntu SMP Tue Dec 3 01:25:33 UTC 2013 i686 i686 i686 GNU/Linux
Runs fine there.

Moved the postgres install directory to 2.4.21-4.EL #1 Fri Oct 3 18:13:58 EDT 2003 i686 i686 i386 GNU/Linux
Not fully working there.
Note: It says 2.4 kernel, but I've been told that's misleading. The kernel
has upgrades that make it effectively 2.6.

This looks like a RHEL3 version number. While that kernel was kind of a
creepy thing, with a lot of backported patches (some from the 2.6 era), it
is definitely not a 2.6 kernel (also note that 2.6.0 was released in
December 2003, while RHEL 3 was released in October of that year). Judging
from the version number, this also seems to be based on the very first
RHEL3 kernel, missing all the follow-up bugfixes made during the RHEL3
lifetime.

So I would not be at all surprised if a modern, young C library running
on a >10-year-old kernel that never resembled the upstream kernel
misbehaved with a complex userspace app like PostgreSQL.

Stefan


#10John Mudd
johnbmudd@gmail.com
In reply to: John Mudd (#1)
Fwd: [BUGS] Debug strategy for musl Postgres?

On Mon, Apr 14, 2014 at 2:06 PM, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:

On 04/13/2014 10:19 PM, John Mudd wrote:

On Sun, Apr 13, 2014 at 12:04 PM, Euler Taveira <euler@timbira.com.br> wrote:

On 13-04-2014 00:40, John Mudd wrote:

I built Postgres 9.3.4 from source on top of the musl C library,
http://www.musl-libc.org/
I also built zlib, bzip2, ncurses, openssl, readline and Python using musl
as a foundation for Postgres.

This is not a bug. This kind of discussion belongs to -hackers.

While reading this email, I gave musl a try. I'm using Debian jessie,
which contains musl 1.0.0. I compiled the source (git master) using
CC="musl-gcc" and disabled zlib and readline. It passed all regression
tests. I also tried pgbench, which ran like a charm. (After installing
the binaries, I had to set the library path for musl in
/etc/ld-musl-x86_64.d.)

I'm using musl to increase the portability of the Postgres binary. I build
on Ubuntu 13.10, but the binaries will run on older Linux boxes.

Could you give details about your architecture?

Built on 3.8.0-35-generic #50-Ubuntu SMP Tue Dec 3 01:25:33 UTC 2013 i686 i686 i686 GNU/Linux
Runs fine there.

Moved the postgres install directory to 2.4.21-4.EL #1 Fri Oct 3 18:13:58 EDT 2003 i686 i686 i386 GNU/Linux
Not fully working there.
Note: It says 2.4 kernel, but I've been told that's misleading. The kernel
has upgrades that make it effectively 2.6.

This looks like a RHEL3 version number. While that kernel was kind of a
creepy thing, with a lot of backported patches (some from the 2.6 era), it
is definitely not a 2.6 kernel (also note that 2.6.0 was released in
December 2003, while RHEL 3 was released in October of that year). Judging
from the version number, this also seems to be based on the very first
RHEL3 kernel, missing all the follow-up bugfixes made during the RHEL3
lifetime.

So I would not be at all surprised if a modern, young C library running
on a >10-year-old kernel that never resembled the upstream kernel
misbehaved with a complex userspace app like PostgreSQL.

Update:
I contacted the musl developers and received a one-line patch to the
gettimeofday() fallback code. After rebuilding musl libc and copying the
library to my old Linux box, Postgres is running well now.

=======================
All 136 tests passed.
=======================

Interestingly, when I build Postgres natively on this same old Linux, it
fails to run:

============== removing existing temp installation ==============
============== creating temporary installation ==============
============== initializing database system ==============
============== starting postmaster ==============

pg_regress: postmaster did not respond within 60 seconds

Building with musl on a modern Linux produces binaries that work on the old
Linux, but building Postgres on the old Linux with the native libc gives me
a broken Postgres. That's why I'm interested in musl libc.
