2nd update on TOAST

Started by 'JanWieck@t-online.de'almost 26 years ago36 messageshackers

JanWieck@t-online.de

almost 26 years ago

Uh oh -

on one side I'm a happy camper. pg_dump already ignored pg_*
tables and since the TOAST tables are named pg_toast_...,
nothing to be done. I loaded my test DB with TOAST entries,
dumped and restored it. Anything is there, works perfectly.

But then I added the ALTER TABLE for unlimited rewrite rule
size to initdb and the problems started. I can create a table
with 500+ attributes. Also I can create a view on it (the
rules size is 170K - whow). Anything works pretty well, just
a pg_dump output is garbage.

Seems the dynamic string buffers used in pg_dump aren't as
bullet proof as they should. I'm still busy with other
things, so can someone please take a look on it? Attached is
an SQL script that creates the table and the view that I
cannot dump.

Jan

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

Tom Lane

tgl@sss.pgh.pa.us

almost 26 years ago

In reply to: 'JanWieck@t-online.de' (#1)

Re: 2nd update on TOAST

JanWieck@t-online.de (Jan Wieck) writes:

size to initdb and the problems started. I can create a table
with 500+ attributes. Also I can create a view on it (the
rules size is 170K - whow). Anything works pretty well, just
a pg_dump output is garbage.

Are you using current sources? Bruce committed Philip Warner's
pg_dump rewrite a day or so ago (which I thought was way premature,
but anyway...). Just want to know which pg_dump we're talking about.

regards, tom lane

Philip Warner

pjw@rhyme.com.au

almost 26 years ago

In reply to: 'JanWieck@t-online.de' (#1)

Re: 2nd update on TOAST

At 22:01 5/07/00 +0200, Jan Wieck wrote:

Seems the dynamic string buffers used in pg_dump aren't as
bullet proof as they should. I'm still busy with other
things, so can someone please take a look on it? Attached is
an SQL script that creates the table and the view that I
cannot dump.

If the TOAST patch on the FTP site the most recent?

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

Philip Warner

pjw@rhyme.com.au

almost 26 years ago

In reply to: Tom Lane (#2)

Re: 2nd update on TOAST

At 17:18 5/07/00 -0400, Tom Lane wrote:

JanWieck@t-online.de (Jan Wieck) writes:

size to initdb and the problems started. I can create a table
with 500+ attributes. Also I can create a view on it (the
rules size is 170K - whow). Anything works pretty well, just
a pg_dump output is garbage.

Are you using current sources? Bruce committed Philip Warner's
pg_dump rewrite a day or so ago (which I thought was way premature,
but anyway...). Just want to know which pg_dump we're talking about.

It's good that any bugs had a chnace to come out ASAP.

However, I'd be interested to know what actually happens: I do recall that
pg_dump (both versions) have seomething like '#define COPY_BUFFER_SIZE
8192'; if I were to start looking, that's where I'd go first.

Philip Warner

pjw@rhyme.com.au

almost 26 years ago

In reply to: Philip Warner (#4)

Re: 2nd update on TOAST

At 12:50 6/07/00 +1000, Philip Warner wrote:

Are you using current sources? Bruce committed Philip Warner's
pg_dump rewrite a day or so ago (which I thought was way premature,
but anyway...). Just want to know which pg_dump we're talking about.

It's good that any bugs had a chnace to come out ASAP.

However, I'd be interested to know what actually happens: I do recall that
pg_dump (both versions) have seomething like '#define COPY_BUFFER_SIZE
8192'; if I were to start looking, that's where I'd go first.

Further to this, looking at the code, it now uses 'archputs', which
replaces 'fputs', to output the copy buffer (assuming the error is in the
copy, not while dumping the definitions); if PQgetline fills the entire
buffer (no trailing \0), then I could imagine both would croak. The
simplest test would be to tell PQgetline that the buffer size if 1 byte
smaller, and see if it fixes the problem.

I'd be interested to hear from someone...at least to know a little more of
the circumstaces of the error.

Philip Warner

pjw@rhyme.com.au

almost 26 years ago

In reply to: Philip Warner (#5)

Re: 2nd update on TOAST

At 13:15 6/07/00 +1000, Philip Warner wrote:

At 12:50 6/07/00 +1000, Philip Warner wrote:

Are you using current sources? Bruce committed Philip Warner's
pg_dump rewrite a day or so ago (which I thought was way premature,
but anyway...). Just want to know which pg_dump we're talking about.

OK, I've built from the latest CVS, run initdb etc, created a new DB,
created the megatable & megaview, and done a pg_dump. It works, except that
I introduced a but that caused the view to be dumped as a table as well. I
will keep looking, and submit a patch shortly. In the mean time, if anyone
can reproduce Jan's problem, that would help...

'JanWieck@t-online.de'

JanWieck@t-online.de

almost 26 years ago

In reply to: Philip Warner (#3)

Re: 2nd update on TOAST

Philip Warner wrote:

At 22:01 5/07/00 +0200, Jan Wieck wrote:

Seems the dynamic string buffers used in pg_dump aren't as
bullet proof as they should. I'm still busy with other
things, so can someone please take a look on it? Attached is
an SQL script that creates the table and the view that I
cannot dump.

If the TOAST patch on the FTP site the most recent?

No. It's all in the current CVS tree. I removed that patch
already.

Jan

Philip Warner

pjw@rhyme.com.au

almost 26 years ago

In reply to: 'JanWieck@t-online.de' (#7)

Re: 2nd update on TOAST

At 11:09 6/07/00 +0200, Jan Wieck wrote:

No. It's all in the current CVS tree. I removed that patch
already.

OK, I've updated from CVS and rebuilt & it worked. The, to be sure, I did a
'make distclean' then a 'make' & 'make install' again, and now postmaster
wont start (SIGSEGV). I have rebuild with '-O0 -g', gone into gdb, but the
process dies so I cant get a backtrace.

I have now rebuilt with SYSLOG support, and get the following:

Jul 6 20:54:54 Cerberus2 kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000038
Jul 6 20:54:54 Cerberus2 kernel: current->tss.cr3 = 02467000, %cr3 = 02467000
Jul 6 20:54:54 Cerberus2 kernel: *pde = 00000000
Jul 6 20:54:54 Cerberus2 kernel: Oops: 0000
Jul 6 20:54:54 Cerberus2 kernel: CPU: 0
Jul 6 20:54:54 Cerberus2 kernel: EIP: 0010:[fcntl_setlk+327/404]
Jul 6 20:54:54 Cerberus2 kernel: EFLAGS: 00000202
Jul 6 20:54:54 Cerberus2 kernel: eax: 00000000 ebx: c15485b0 ecx:
c2494000 edx: c0d49a50
Jul 6 20:54:54 Cerberus2 kernel: esi: bffff574 edi: 00000004 ebp:
fffffff7 esp: c2495f34
Jul 6 20:54:54 Cerberus2 kernel: ds: 0018 es: 0018 ss: 0018
Jul 6 20:54:54 Cerberus2 kernel: Process postmaster (pid: 5661, process
nr: 50, stackpage=c2495000)
Jul 6 20:54:54 Cerberus2 kernel: Stack: 00000000 c15485b0 c2495f40
00000001 00000000 00000000 4000bc74 00000000
Jul 6 20:54:54 Cerberus2 kernel: 00000000 00000000 00000000
00000000 c0d49a50 0000161d 00000000 c15485b0
Jul 6 20:54:54 Cerberus2 kernel: 00000101 00000000 7fffffff
00000000 00000000 00000000 c012c687 00000004
Jul 6 20:54:54 Cerberus2 kernel: Call Trace: [sys_fcntl+595/772]
[sys_open+94/124] [system_call+52/56]
Jul 6 20:54:54 Cerberus2 kernel: Code: 8b 50 38 85 d2 74 15 8d 44 24 24 50
ff 74 24 6c 53 ff d2 89

If anyone can give me some tips on tracking this down, I would appreciate
it....

Import Notes

Resolved by subject fallback

'JanWieck@t-online.de'

JanWieck@t-online.de

almost 26 years ago

In reply to: Philip Warner (#8)

Re: 2nd update on TOAST

Philip Warner wrote:

At 11:09 6/07/00 +0200, Jan Wieck wrote:

No. It's all in the current CVS tree. I removed that patch
already.

OK, I've updated from CVS and rebuilt & it worked. The, to be sure, I did a
'make distclean' then a 'make' & 'make install' again, and now postmaster
wont start (SIGSEGV). I have rebuild with '-O0 -g', gone into gdb, but the
process dies so I cant get a backtrace.

Have the same symptom with a completely fresh cvs checkout.

If anyone can give me some tips on tracking this down, I would appreciate
it....

Bruce applied a patch to configure.in yesterday. Read the
comments from the cvslog. It tells that it triggers a bug in
the Linux kernels fcntl(SETLK) code when used with unix
domain sockets, and that the bug is present in Linux kernels
<= 2.2.16. I'm running a 2.2.12 here, and so it exactly dies
in pqcomm.c line 229 on fcntl() against the socket.

Undefining HAVE_FCNTL_SETLK in config.h did it for me
temporary. Don't know how to deal with it finally.

With this setup I did

initdb
createdb
psql <megaview.sql
pg_dump pgsql >megaview.dump

In the dump file, the first 2183 bytes look OK. What's
following then looks like internal tables where pg_dump holds
the info of the schema analyzing.

And don't worry that the view is dumped as table with a later
CREATE RULE. That's correct this way.

Jan

#10

Philip Warner

pjw@rhyme.com.au

almost 26 years ago

In reply to: 'JanWieck@t-online.de' (#9)

Re: 2nd update on TOAST

At 14:04 6/07/00 +0200, Jan Wieck wrote:

Philip Warner wrote:

In the dump file, the first 2183 bytes look OK. What's
following then looks like internal tables where pg_dump holds
the info of the schema analyzing.

Any chance you could mail it direct to me?

And don't worry that the view is dumped as table with a later
CREATE RULE. That's correct this way.

I figured this out the hard way!

#11

Philip Warner

pjw@rhyme.com.au

almost 26 years ago

In reply to: 'JanWieck@t-online.de' (#9)

Re: 2nd update on TOAST

At 14:04 6/07/00 +0200, Jan Wieck wrote:

With this setup I did

initdb
createdb
psql <megaview.sql
pg_dump pgsql >megaview.dump

In the dump file, the first 2183 bytes look OK. What's
following then looks like internal tables where pg_dump holds
the info of the schema analyzing.

Just in case there is some other factor, can you let me know what your
choices in 'configure' were?

#12

'JanWieck@t-online.de'

JanWieck@t-online.de

almost 26 years ago

In reply to: Philip Warner (#10)

Re: 2nd update on TOAST

Philip Warner wrote:

At 14:04 6/07/00 +0200, Jan Wieck wrote:

Philip Warner wrote:

In the dump file, the first 2183 bytes look OK. What's
following then looks like internal tables where pg_dump holds
the info of the schema analyzing.

Any chance you could mail it direct to me?

Attached.

And don't worry that the view is dumped as table with a later
CREATE RULE. That's correct this way.

I figured this out the hard way!

:-)

Will be off after this until approx. 1:00 UCT.

Jan

#13

'JanWieck@t-online.de'

JanWieck@t-online.de

almost 26 years ago

In reply to: Philip Warner (#11)

Re: 2nd update on TOAST

Philip Warner wrote:

At 14:04 6/07/00 +0200, Jan Wieck wrote:

With this setup I did

initdb
createdb
psql <megaview.sql
pg_dump pgsql >megaview.dump

In the dump file, the first 2183 bytes look OK. What's
following then looks like internal tables where pg_dump holds
the info of the schema analyzing.

Just in case there is some other factor, can you let me know what your
choices in 'configure' were?

--with-tcl
--enable-cassert
--enable-debug

Jan

#14

Thomas Lockhart

lockhart@alumni.caltech.edu

almost 26 years ago

In reply to: 'JanWieck@t-online.de' (#9)

Re: 2nd update on TOAST

Bruce applied a patch to configure.in yesterday. Read the
comments from the cvslog. It tells that it triggers a bug in
the Linux kernels fcntl(SETLK) code when used with unix
domain sockets, and that the bug is present in Linux kernels
<= 2.2.16. I'm running a 2.2.12 here, and so it exactly dies
in pqcomm.c line 229 on fcntl() against the socket.
Undefining HAVE_FCNTL_SETLK in config.h did it for me
temporary. Don't know how to deal with it finally.

Thanks Jan for the workaround. I'll see if this gets me up and going
again.

- Thomas

#15

Bruce Momjian

bruce@momjian.us

almost 26 years ago

In reply to: Tom Lane (#2)

Re: 2nd update on TOAST

JanWieck@t-online.de (Jan Wieck) writes:

size to initdb and the problems started. I can create a table
with 500+ attributes. Also I can create a view on it (the
rules size is 170K - whow). Anything works pretty well, just
a pg_dump output is garbage.

Are you using current sources? Bruce committed Philip Warner's
pg_dump rewrite a day or so ago (which I thought was way premature,
but anyway...). Just want to know which pg_dump we're talking about.

I committed pg_dump so people could see his changes and start making
additions. I can then have him supply incremental patches. Keeping
stuff out of the tree usually causes the author to get frustrated.

Of course, if it comes in too quickly, it can cause chaos if he needs to
make major changes.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#16

Bruce Momjian

bruce@momjian.us

almost 26 years ago

In reply to: 'JanWieck@t-online.de' (#9)

Re: 2nd update on TOAST

OK, I've updated from CVS and rebuilt & it worked. The, to be sure, I did a
'make distclean' then a 'make' & 'make install' again, and now postmaster
wont start (SIGSEGV). I have rebuild with '-O0 -g', gone into gdb, but the
process dies so I cant get a backtrace.

Have the same symptom with a completely fresh cvs checkout.

If anyone can give me some tips on tracking this down, I would appreciate
it....

Bruce applied a patch to configure.in yesterday. Read the
comments from the cvslog. It tells that it triggers a bug in
the Linux kernels fcntl(SETLK) code when used with unix
domain sockets, and that the bug is present in Linux kernels
<= 2.2.16. I'm running a 2.2.12 here, and so it exactly dies
in pqcomm.c line 229 on fcntl() against the socket.

I thought when he said flock() bug, he meant only on the new IA64
platform, not on all Linux platforms. Yikes, I enable flock(), and it
breaks initdb for all the Linux users. This is a problem!

Tom was mentioning the configure check for flock() was broken recently,
so I was glad to fix it.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#17

'JanWieck@t-online.de'

JanWieck@t-online.de

almost 26 years ago

In reply to: Bruce Momjian (#16)

Re: 2nd update on TOAST

Bruce Momjian wrote:

OK, I've updated from CVS and rebuilt & it worked. The, to be sure, I did a
'make distclean' then a 'make' & 'make install' again, and now postmaster
wont start (SIGSEGV). I have rebuild with '-O0 -g', gone into gdb, but the
process dies so I cant get a backtrace.

Have the same symptom with a completely fresh cvs checkout.

If anyone can give me some tips on tracking this down, I would appreciate
it....

Bruce applied a patch to configure.in yesterday. Read the
comments from the cvslog. It tells that it triggers a bug in
the Linux kernels fcntl(SETLK) code when used with unix
domain sockets, and that the bug is present in Linux kernels
<= 2.2.16. I'm running a 2.2.12 here, and so it exactly dies
in pqcomm.c line 229 on fcntl() against the socket.

I thought when he said flock() bug, he meant only on the new IA64
platform, not on all Linux platforms. Yikes, I enable flock(), and it
breaks initdb for all the Linux users. This is a problem!

Not initdb, but postmaster. That's the one who tries (after a
successful initdb) to do the fcntl(F_SETLK) on the unix
domain socket. Causing the kernel saying "go to hell, go
directly, don't write a core, don't leave useful info in
gdb".

The only reason I see for the entire section is to detect if
it would be safe to unlink the socket because it's left by
another postmaster in case of abnormal termination. Tell me
if I've misread it. So why not doing it on the Linux
platform different, using a separate file like
.s.PGSQL.5432.LCK?

Jan

#18

Bruce Momjian

bruce@momjian.us

almost 26 years ago

In reply to: 'JanWieck@t-online.de' (#17)

Re: 2nd update on TOAST

Not initdb, but postmaster. That's the one who tries (after a
successful initdb) to do the fcntl(F_SETLK) on the unix
domain socket. Causing the kernel saying "go to hell, go
directly, don't write a core, don't leave useful info in
gdb".

The only reason I see for the entire section is to detect if
it would be safe to unlink the socket because it's left by
another postmaster in case of abnormal termination. Tell me
if I've misread it. So why not doing it on the Linux
platform different, using a separate file like
.s.PGSQL.5432.LCK?

But how do you know if that file still belongs to an active postmaster?
What if it exited before removing the file. Seems we would have to
write the PID into the file, and do a kill() to see if it is running.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#19

Tom Lane

tgl@sss.pgh.pa.us

almost 26 years ago

In reply to: Bruce Momjian (#18)

fcntl(SETLK) [was Re: 2nd update on TOAST]

The only reason I see for the entire section is to detect if
it would be safe to unlink the socket because it's left by
another postmaster in case of abnormal termination. Tell me
if I've misread it.

That's exactly what it's for. We need to tell whether there is
still another postmaster running on the same port number. Too
bad the kernel is not bright enough to unlink the socket file
automatically when it's no longer in use...

So why not doing it on the Linux
platform different, using a separate file like
.s.PGSQL.5432.LCK?

I think it's a bad idea to do it differently on Linux than other
platforms. If we fix this (other than by just disabling the fcntl
call again on old Linuxen) we should use the new method everywhere.

But how do you know if that file still belongs to an active postmaster?
What if it exited before removing the file. Seems we would have to
write the PID into the file, and do a kill() to see if it is running.

Well, if we wanted to continue to depend on fcntl(SETLK) then we could
use an empty plain file. I read the bug report as being that old Linux
kernels fail if fcntl(SETLK) is applied to a Unix-socket file. They'd
surely have noticed long before if the feature didn't work on plain
files.

But if we are going to change this at all, I'd vote for storing pids
in the lock files the way we are now doing in the data-directory pid
lock files. Then we wouldn't have to depend on fcntl at all, which
would be a Good Thing from a portability point of view.

However, I think it would be a really bad idea to keep the lock files
in /tmp --- that's way too open to accidental removals, not to mention
deliberate denial-of-service attacks. They need to be in a more secure
directory; but where? See the past discussions summarized in the
TODO.detail file.

regards, tom lane

#20

Peter Eisentraut

peter_e@gmx.net

almost 26 years ago

In reply to: Tom Lane (#19)

Re: fcntl(SETLK) [was Re: 2nd update on TOAST]

Tom Lane writes:

However, I think it would be a really bad idea to keep the lock files
in /tmp --- that's way too open to accidental removals, not to mention
deliberate denial-of-service attacks. They need to be in a more secure
directory; but where? See the past discussions summarized in the
TODO.detail file.

Quoth the file system standard:

`sharedstatedir'
The directory for installing architecture-independent data files
which the programs modify while they run. This should normally be
`/usr/local/com', but write it as `$(prefix)/com'. (If you are
using Autoconf, write it as `@sharedstatedir@'.)

The problem with this approach is making that directory writeable by the
server account. Solutions:

1) Making the postmaster executable as root but later drop root
privileges. (This looks to be the cleanest solution, but it is
probably a security problem waiting to happen.)

2) Making initdb executable as root but with some --user switch. Have it
create a subdirectory of $sharedstatedir writable by the server
account, possibly with sticky bit and whatnot. Use `su' to invoke
`postgres'.

This approach might be convenient also in terms of creating the data
directory.

3) Making "initialize lock file area" a separate initialization step,
possibly encapsulated into a shell script.

Btw., what would happen if we did start a second postmaster at the same
TCP port? Or more interestingly, what happens if some completely different
program already runs at that port? How do we protect against that? This
has something to do with SO_REUSEADDR, but I don't understand those things
too well.

--
Peter Eisentraut Sernanders vï¿½g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden

#21

Tom Lane

tgl@sss.pgh.pa.us

almost 26 years ago

In reply to: Peter Eisentraut (#20)

#22

'JanWieck@t-online.de'

JanWieck@t-online.de

almost 26 years ago

In reply to: Tom Lane (#21)

#23