Re: Hung postmaster (8.3.9)

Started by Ed L.about 16 years ago25 messageshackersgeneral
Jump to latest
#1Ed L.
pgsql@bluepolka.net
hackersgeneral

On Monday 01 March 2010 @ 15:41, Ed Loehr (LoehrTech.com) wrote:

"Ed L." <pgsql@bluepolka.net> writes:

(gdb) bt
#0 0x000000346f8c43a0 in __read_nocancel () from
/lib64/libc.so.6 #1 0x000000346f86c747 in
_IO_new_file_underflow () from /lib64/libc.so.6 #2
0x000000346f86d10e in _IO_default_uflow_internal () from
/lib64/libc.so.6 #3 0x000000346f8689cb in getc () from
/lib64/libc.so.6 #4 0x0000000000531ee8 in next_token
(fp=0x5b90f20, buf=0x7fff59bef330 "", bufsz=4096) at
hba.c:128 #5 0x0000000000532233 in tokenize_file
(filename=0x5b8f3f0 "global", file=0x5b90f20,
lines=0x7fff59bef5c8, line_nums=0x7fff59bef5c0) at hba.c:232
#6 0x00000000005322e9 in tokenize_file (filename=0x5b8f3d0
"global/pg_auth", file=0x5b90ce0, lines=0x98b168,
line_nums=0x98b170) at hba.c:358
#7 0x00000000005327ff in load_role () at hba.c:959
#8 0x000000000057f300 in reaper (postgres_signal_arg=<value
optimized out>) at postmaster.c:2145 #9 <signal handler
called>
#10 0x000000346f8cb323 in __select_nocancel () from
/lib64/libc.so.6 #11 0x000000000057cc33 in ServerLoop () at
postmaster.c:1236 #12 0x000000000057dfdf in PostmasterMain
(argc=6, argv=0x5b73fe0) at postmaster.c:1031 #13
0x00000000005373de in main (argc=6, argv=<value optimized
out>) at main.c:188

The postmaster seems to be stuck trying to read
$PGDATA/global/pg_auth (which would be an expected thing for
it to do at this point in the startup sequence). Does that
file exist? Is it an ordinary file? Do its contents look
sane (a list of your userids and their passwords and group
memberships)?

This just happened again ~24 hours after full reload from backup.
Arrrgh.

Backtrace looks the same again, same file, same
__read_nocancel(). $PGDATA/global/pg_auth looks fine to me,
permissions are 600, entries are 3 or more double-quoted items
per line each separated by a space, items 3 and beyond being
groups.

Any clues?

#2Ed L.
pgsql@bluepolka.net
In reply to: Ed L. (#1)
hackersgeneral

On Monday 01 March 2010 @ 15:46, Ed L. wrote:

On Monday 01 March 2010 @ 15:41, Ed Loehr (LoehrTech.com) wrote:

"Ed L." <pgsql@bluepolka.net> writes:

(gdb) bt
#0 0x000000346f8c43a0 in __read_nocancel () from
/lib64/libc.so.6 #1 0x000000346f86c747 in
_IO_new_file_underflow () from /lib64/libc.so.6 #2
0x000000346f86d10e in _IO_default_uflow_internal () from
/lib64/libc.so.6 #3 0x000000346f8689cb in getc () from
/lib64/libc.so.6 #4 0x0000000000531ee8 in next_token
(fp=0x5b90f20, buf=0x7fff59bef330 "", bufsz=4096) at
hba.c:128 #5 0x0000000000532233 in tokenize_file
(filename=0x5b8f3f0 "global", file=0x5b90f20,
lines=0x7fff59bef5c8, line_nums=0x7fff59bef5c0) at
hba.c:232 #6 0x00000000005322e9 in tokenize_file
(filename=0x5b8f3d0 "global/pg_auth", file=0x5b90ce0,
lines=0x98b168, line_nums=0x98b170) at hba.c:358
#7 0x00000000005327ff in load_role () at hba.c:959
#8 0x000000000057f300 in reaper
(postgres_signal_arg=<value optimized out>) at
postmaster.c:2145 #9 <signal handler called>
#10 0x000000346f8cb323 in __select_nocancel () from
/lib64/libc.so.6 #11 0x000000000057cc33 in ServerLoop ()
at postmaster.c:1236 #12 0x000000000057dfdf in
PostmasterMain (argc=6, argv=0x5b73fe0) at
postmaster.c:1031 #13 0x00000000005373de in main (argc=6,
argv=<value optimized out>) at main.c:188

The postmaster seems to be stuck trying to read
$PGDATA/global/pg_auth (which would be an expected thing
for it to do at this point in the startup sequence). Does
that file exist? Is it an ordinary file? Do its contents
look sane (a list of your userids and their passwords and
group memberships)?

This just happened again ~24 hours after full reload from
backup. Arrrgh.

Backtrace looks the same again, same file, same
__read_nocancel(). $PGDATA/global/pg_auth looks fine to me,
permissions are 600, entries are 3 or more double-quoted items
per line each separated by a space, items 3 and beyond being
groups.

Any clues?

Watching the server logs, the system is continuing to process
data on existing connections. Just can't get any new ones.
Here's a backtrace for a hung psql -c "select version()":

$ gdb `which psql`
GNU gdb Fedora (6.8-37.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html&gt;
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) attach 9631
Attaching to program: /opt/pgsql/installs/postgresql-8.3.9/bin/psql, process 9631
Reading symbols from /opt/pgsql/installs/postgresql-8.3.9/lib/libpq.so.5...done.
Loaded symbols for /opt/pgsql/installs/postgresql-8.3.9/lib/libpq.so.5
Reading symbols from /usr/lib64/libz.so.1...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /usr/lib64/libreadline.so.5...done.
Loaded symbols for /usr/lib64/libreadline.so.5
Reading symbols from /lib64/libtermcap.so.2...done.
Loaded symbols for /lib64/libtermcap.so.2
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
0x000000346f8c92af in poll () from /lib64/libc.so.6
(gdb) bt
#0 0x000000346f8c92af in poll () from /lib64/libc.so.6
#1 0x00002b03826e5e6f in pqSocketCheck (conn=0x655eef0, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1046
#2 0x00002b03826e5f10 in pqWaitTimed (forRead=1, forWrite=-1, conn=0x655eef0, finish_time=-1) at fe-misc.c:920
#3 0x00002b03826e1752 in connectDBComplete (conn=0x655eef0) at fe-connect.c:930
#4 0x00002b03826e2c60 in PQsetdbLogin (pghost=0x0, pgport=0x0, pgoptions=0x0, pgtty=0x0, dbName=0x0, login=0x0, pwd=0x0) at fe-connect.c:678
#5 0x000000000040e319 in main (argc=<value optimized out>, argv=0x7fff283ce6e8) at startup.c:195

#3Ed L.
pgsql@bluepolka.net
In reply to: Ed L. (#2)
hackersgeneral

On Monday 01 March 2010 @ 15:59, Ed L. wrote:

This just happened again ~24 hours after full reload from
backup. Arrrgh.

Backtrace looks the same again, same file, same
__read_nocancel(). $PGDATA/global/pg_auth looks fine to me,
permissions are 600, entries are 3 or more double-quoted
items per line each separated by a space, items 3 and beyond
being groups.

Any clues?

Also seeing lots of postmaster zombies (190 and growing)...

Ed
--

#4Ed L.
pgsql@bluepolka.net
In reply to: Ed L. (#3)
hackersgeneral

On Monday 01 March 2010 @ 16:03, Ed L. wrote:

On Monday 01 March 2010 @ 15:59, Ed L. wrote:

This just happened again ~24 hours after full reload from
backup. Arrrgh.

Backtrace looks the same again, same file, same
__read_nocancel(). $PGDATA/global/pg_auth looks fine to
me, permissions are 600, entries are 3 or more
double-quoted items per line each separated by a space,
items 3 and beyond being groups.

Any clues?

Also seeing lots of postmaster zombies (190 and growing)...

While new connections are hanging, top shows postmaster using
100% of cpu. SIGTERM/SIGQUIT do nothing. Here's a backtrace
of this busy postmaster:

(gdb) bt
#0 0x000000346f8c43a0 in __read_nocancel () from /lib64/libc.so.6
#1 0x000000346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x000000346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x000000346f8689cb in getc () from /lib64/libc.so.6
#4 0x0000000000531ee8 in next_token (fp=0x10377ae0, buf=0x7fff32230e60 "", bufsz=4096) at hba.c:128
#5 0x0000000000532233 in tokenize_file (filename=0x10359b70 "global", file=0x10377ae0, lines=0x7fff322310f8, line_nums=0x7fff322310f0) at hba.c:232
#6 0x00000000005322e9 in tokenize_file (filename=0x2b1c8cbf5800 "global/pg_auth", file=0x103767a0, lines=0x98b168, line_nums=0x98b170) at hba.c:358
#7 0x00000000005327ff in load_role () at hba.c:959
#8 0x000000000057f878 in sigusr1_handler (postgres_signal_arg=<value optimized out>) at postmaster.c:3830
#9 <signal handler called>
#10 0x000000346f8cb323 in __select_nocancel () from /lib64/libc.so.6
#11 0x000000000057cc33 in ServerLoop () at postmaster.c:1236
#12 0x000000000057dfdf in PostmasterMain (argc=6, argv=0x1033f000) at postmaster.c:1031
#13 0x00000000005373de in main (argc=6, argv=<value optimized out>) at main.c:188

...and more from the server logs, fwiw:

2010-03-01 17:30:24.213 CST [32238] WARNING: worker took too long to start; cancelled
2010-03-01 17:30:31.250 CST [32236] DEBUG: transaction log switch forced (archive_timeout=300)
2010-03-01 17:31:24.216 CST [32238] WARNING: worker took too long to start; cancelled
2010-03-01 17:32:24.219 CST [32238] WARNING: worker took too long to start; cancelled
2010-03-01 17:33:24.222 CST [32238] WARNING: worker took too long to start; cancelled
2010-03-01 17:34:24.225 CST [32238] WARNING: worker took too long to start; cancelled
2010-03-01 17:35:19.061 CST [32236] LOG: checkpoint starting: time
2010-03-01 17:35:19.185 CST [32236] DEBUG: recycled transaction log file "000000010000001C00000071"
2010-03-01 17:35:19.185 CST [32236] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled;
write=0.028 s, sync=0.000 s, total=0.124 s
2010-03-01 17:35:24.328 CST [32238] WARNING: worker took too long to start; cancelled
2010-03-01 17:35:31.224 CST [32236] DEBUG: transaction log switch forced (archive_timeout=300)
2010-03-01 17:36:44.332 CST [32238] WARNING: worker took too long to start; cancelled
2010-03-01 17:37:44.434 CST [32238] WARNING: worker took too long to start; cancelled
2010-03-01 17:37:47.378 CST [3692] dba 10....(42816) dba LOG: could not receive data from client: Connection timed out
2010-03-01 17:37:47.378 CST [3692] dba 10....(42816) dba LOG: unexpected EOF on client connection
2010-03-01 17:37:47.380 CST [3692] dba 10....(42816) dba LOG: disconnection: session time: 2:11:15.303 user=dba database=dba host=... port=428

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed L. (#4)
hackersgeneral

"Ed L." <pgsql@bluepolka.net> writes:

Also seeing lots of postmaster zombies (190 and growing)...

Yeah, that and the lack of service for new connections would both be
expected if the postmaster is stuck. And the autovac worker start
failures, too. There's only one bug here.

While new connections are hanging, top shows postmaster using
100% of cpu.

Oh, for some reason I thought it was sitting idle. That sounds more
like an infinite loop. Try reattaching to the postmaster, confirm the
stack trace, and then see how many times you can do "fin" before it
doesn't return control. That will tell us which level of subroutine is
looping.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed L. (#4)
hackersgeneral

"Ed L." <pgsql@bluepolka.net> writes:

#4 0x0000000000531ee8 in next_token (fp=0x10377ae0, buf=0x7fff32230e60 "", bufsz=4096) at hba.c:128
#5 0x0000000000532233 in tokenize_file (filename=0x10359b70 "global", file=0x10377ae0, lines=0x7fff322310f8, line_nums=0x7fff322310f0) at hba.c:232
#6 0x00000000005322e9 in tokenize_file (filename=0x2b1c8cbf5800 "global/pg_auth", file=0x103767a0, lines=0x98b168, line_nums=0x98b170) at hba.c:358
#7 0x00000000005327ff in load_role () at hba.c:959

Now that I look more closely at those line numbers, it looks like the
thing thinks it is processing an include file. Are there any @ signs
in your global/pg_auth file?

regards, tom lane

#7Ed L.
pgsql@bluepolka.net
In reply to: Tom Lane (#6)
hackersgeneral

On Monday 01 March 2010 @ 16:57, Tom Lane wrote:>

Now that I look more closely at those line numbers, it looks
like the thing thinks it is processing an include file. Are
there any @ signs in your global/pg_auth file?

Yes, indeed, there are many. My user names are "user@host" form,
and have been for years. Would that be a problem now?

Ed

#8Ed L.
pgsql@bluepolka.net
In reply to: Tom Lane (#5)
hackersgeneral

On Monday 01 March 2010 @ 16:49, Tom Lane wrote:

Oh, for some reason I thought it was sitting idle. That
sounds more like an infinite loop. Try reattaching to the
postmaster, confirm the stack trace, and then see how many
times you can do "fin" before it doesn't return control.
That will tell us which level of subroutine is looping.

Yeah, it took me a bit to notice it was at 100% cpu. Here's
gdb output with the 'fin's you suggested:

$ gdb `which postgres`
GNU gdb Fedora (6.8-37.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html&gt;
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) attach 21670
Attaching to program: /opt/pgsql/installs/postgresql-8.3.9/bin/postgres, process 21670
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
0x000000346f8c43a0 in __read_nocancel () from /lib64/libc.so.6
(gdb) bt
#0 0x000000346f8c43a0 in __read_nocancel () from /lib64/libc.so.6
#1 0x000000346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x000000346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x000000346f8689cb in getc () from /lib64/libc.so.6
#4 0x0000000000531ee8 in next_token (fp=0x11da7f30, buf=0x7fff5cb9e300 "", bufsz=4096) at hba.c:128
#5 0x0000000000532233 in tokenize_file (filename=0x11da5940 "global", file=0x11da7f30, lines=0x7fff5cb9e598, line_nums=0x7fff5cb9e590) at hba.c:232
#6 0x00000000005322e9 in tokenize_file (filename=0x11da5920 "global/pg_auth", file=0x11da7cf0, lines=0x98b168, line_nums=0x98b170) at hba.c:358
#7 0x00000000005327ff in load_role () at hba.c:959
#8 0x000000000057f300 in reaper (postgres_signal_arg=<value optimized out>) at postmaster.c:2145
#9 <signal handler called>
#10 0x000000346f8cb323 in __select_nocancel () from /lib64/libc.so.6
#11 0x000000000057cc33 in ServerLoop () at postmaster.c:1236
#12 0x000000000057dfdf in PostmasterMain (argc=6, argv=0x11d8afb0) at postmaster.c:1031
#13 0x00000000005373de in main (argc=6, argv=<value optimized out>) at main.c:188
(gdb) fin
Run till exit from #0 0x000000346f8c43a0 in __read_nocancel () from /lib64/libc.so.6
0x000000346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6
(gdb) fin
Run till exit from #0 0x000000346f86c747 in _IO_new_file_underflow () from /lib64/libc.so.6
0x000000346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6
(gdb) fin
Run till exit from #0 0x000000346f86d10e in _IO_default_uflow_internal () from /lib64/libc.so.6
0x000000346f8689cb in getc () from /lib64/libc.so.6
(gdb) fin
Run till exit from #0 0x000000346f8689cb in getc () from /lib64/libc.so.6
0x0000000000531ee8 in next_token (fp=0x11da7f30, buf=0x7fff5cb9e300 "", bufsz=<value optimized out>) at hba.c:128
128 while ((c = getc(fp)) != EOF && (pg_isblank(c) || c == ','))
(gdb) fin
Run till exit from #0 0x0000000000531ee8 in next_token (fp=0x11da7f30, buf=0x7fff5cb9e300 "", bufsz=<value optimized out>) at hba.c:128
0x0000000000532233 in tokenize_file (filename=0x11da5940 "global", file=0x11da7f30, lines=0x7fff5cb9e598, line_nums=0x7fff5cb9e590) at hba.c:232
232 if (!next_token(file, buf, sizeof(buf)))
Value returned is $1 = 0 '\0'
(gdb) fin
Run till exit from #0 0x0000000000532233 in tokenize_file (filename=0x11da5940 "global", file=0x11da7f30, lines=0x7fff5cb9e598,
line_nums=0x7fff5cb9e590) at hba.c:232

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed L. (#7)
hackersgeneral

"Ed L." <pgsql@bluepolka.net> writes:

On Monday 01 March 2010 @ 16:57, Tom Lane wrote:>

Now that I look more closely at those line numbers, it looks
like the thing thinks it is processing an include file. Are
there any @ signs in your global/pg_auth file?

Yes, indeed, there are many. My user names are "user@host" form,
and have been for years. Would that be a problem now?

user@host shouldn't be a problem, but if there were an @ by itself or
starting a token, it might possibly cause something like this.

regards, tom lane

#10Ed L.
pgsql@bluepolka.net
In reply to: Tom Lane (#9)
hackersgeneral

On Monday 01 March 2010 @ 17:15, Tom Lane wrote:

"Ed L." <pgsql@bluepolka.net> writes:

On Monday 01 March 2010 @ 16:57, Tom Lane wrote:>

Now that I look more closely at those line numbers, it
looks like the thing thinks it is processing an include
file. Are there any @ signs in your global/pg_auth file?

Yes, indeed, there are many. My user names are "user@host"
form, and have been for years. Would that be a problem now?

user@host shouldn't be a problem, but if there were an @ by
itself or starting a token, it might possibly cause something
like this.

There is one, looks like a typo got in. How do I fix it?

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed L. (#10)
hackersgeneral

"Ed L." <pgsql@bluepolka.net> writes:

On Monday 01 March 2010 @ 17:15, Tom Lane wrote:

user@host shouldn't be a problem, but if there were an @ by
itself or starting a token, it might possibly cause something
like this.

There is one, looks like a typo got in. How do I fix it?

Hmm, a user named @, or what?

regards, tom lane

#12Ed L.
pgsql@bluepolka.net
In reply to: Tom Lane (#11)
hackersgeneral

On Monday 01 March 2010 @ 17:18, Tom Lane wrote:

"Ed L." <pgsql@bluepolka.net> writes:

On Monday 01 March 2010 @ 17:15, Tom Lane wrote:

user@host shouldn't be a problem, but if there were an @ by
itself or starting a token, it might possibly cause
something like this.

There is one, looks like a typo got in. How do I fix it?

Hmm, a user named @, or what?

Yes, a bogus user:

"@" "" ""

#13Ed L.
pgsql@bluepolka.net
In reply to: Ed L. (#12)
hackersgeneral

On Monday 01 March 2010 @ 17:23, Ed L. wrote:

On Monday 01 March 2010 @ 17:18, Tom Lane wrote:

"Ed L." <pgsql@bluepolka.net> writes:

On Monday 01 March 2010 @ 17:15, Tom Lane wrote:

user@host shouldn't be a problem, but if there were an @
by itself or starting a token, it might possibly cause
something like this.

There is one, looks like a typo got in. How do I fix it?

Hmm, a user named @, or what?

Yes, a bogus user:

"@" "" ""

Correction. Here's the line:

"@" "" "" "agent_group"

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed L. (#12)
hackersgeneral

"Ed L." <pgsql@bluepolka.net> writes:

On Monday 01 March 2010 @ 17:18, Tom Lane wrote:

"Ed L." <pgsql@bluepolka.net> writes:

There is one, looks like a typo got in. How do I fix it?

Hmm, a user named @, or what?

Yes, a bogus user:

"@" "" ""

Mph. We really ought to fix things so that a quoted @ doesn't get taken
as an include file reference. Odd that it's never come up before.

Anyway, if you still have any open superuser sessions, the best thing
would be an ALTER USER RENAME. If you don't, you'll have to resort to
manually editing the pg_auth file, and then rename the user as soon as
you can get in.

regards, tom lane

#15Ed L.
pgsql@bluepolka.net
In reply to: Ed L. (#13)
hackersgeneral

On Monday 01 March 2010 @ 17:25, Ed L. wrote:

On Monday 01 March 2010 @ 17:23, Ed L. wrote:

On Monday 01 March 2010 @ 17:18, Tom Lane wrote:

"Ed L." <pgsql@bluepolka.net> writes:

On Monday 01 March 2010 @ 17:15, Tom Lane wrote:

user@host shouldn't be a problem, but if there were an
@ by itself or starting a token, it might possibly
cause something like this.

There is one, looks like a typo got in. How do I fix
it?

Hmm, a user named @, or what?

Yes, a bogus user:

"@" "" ""

Correction. Here's the line:

"@" "" "" "agent_group"

It is the first line in the pg_auth file.

#16Ed L.
pgsql@bluepolka.net
In reply to: Tom Lane (#14)
hackersgeneral

On Monday 01 March 2010 @ 17:26, Tom Lane wrote:

"Ed L." <pgsql@bluepolka.net> writes:

On Monday 01 March 2010 @ 17:18, Tom Lane wrote:

"Ed L." <pgsql@bluepolka.net> writes:

There is one, looks like a typo got in. How do I fix it?

Hmm, a user named @, or what?

Yes, a bogus user:

"@" "" ""

Mph. We really ought to fix things so that a quoted @ doesn't
get taken as an include file reference. Odd that it's never
come up before.

Anyway, if you still have any open superuser sessions, the
best thing would be an ALTER USER RENAME. If you don't,
you'll have to resort to manually editing the pg_auth file,
and then rename the user as soon as you can get in.

Killed the stuck postmaster with sigkill, edited the file,
restarted postmaster, and it re-wrote the file with the bogus
entry. I don't have any superuser sessions open. Is there
another route?

Ed

#17Ed L.
pgsql@bluepolka.net
In reply to: Ed L. (#16)
hackersgeneral

On Monday 01 March 2010 @ 17:36, Ed L. wrote:

On Monday 01 March 2010 @ 17:26, Tom Lane wrote:

"Ed L." <pgsql@bluepolka.net> writes:

On Monday 01 March 2010 @ 17:18, Tom Lane wrote:

"Ed L." <pgsql@bluepolka.net> writes:

There is one, looks like a typo got in. How do I fix
it?

Hmm, a user named @, or what?

Yes, a bogus user:

"@" "" ""

Mph. We really ought to fix things so that a quoted @
doesn't get taken as an include file reference. Odd that
it's never come up before.

Anyway, if you still have any open superuser sessions, the
best thing would be an ALTER USER RENAME. If you don't,
you'll have to resort to manually editing the pg_auth file,
and then rename the user as soon as you can get in.

Killed the stuck postmaster with sigkill, edited the file,
restarted postmaster, and it re-wrote the file with the bogus
entry. I don't have any superuser sessions open. Is there
another route?

I have source code and can patch and rebuild/reinstall if I had
the right patch.

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed L. (#15)
hackersgeneral

"Ed L." <pgsql@bluepolka.net> writes:

Correction. Here's the line:
"@" "" "" "agent_group"

It is the first line in the pg_auth file.

BTW, there seems to be some other contributing factor here besides
the weird username, because I don't see any looping when I try
CREATE USER "@". What's your platform exactly, and what type of
filesystem is $PGDATA on?

regards, tom lane

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed L. (#16)
hackersgeneral

"Ed L." <pgsql@bluepolka.net> writes:

Killed the stuck postmaster with sigkill, edited the file,
restarted postmaster, and it re-wrote the file with the bogus
entry. I don't have any superuser sessions open. Is there
another route?

What you're going to need to do is stop the postmaster, start
a standalone backend (see the "postgres" man page entry if you
never did that before) and issue the ALTER USER in the standalone
backend. Then you can restart normal operations.

regards, tom lane

#20Ed L.
pgsql@bluepolka.net
In reply to: Tom Lane (#19)
hackersgeneral
[SOLVED] Re: Hung postmaster (8.3.9)

On Monday 01 March 2010 @ 17:58, Tom Lane wrote:

"Ed L." <pgsql@bluepolka.net> writes:

Killed the stuck postmaster with sigkill, edited the file,
restarted postmaster, and it re-wrote the file with the
bogus entry. I don't have any superuser sessions open. Is
there another route?

What you're going to need to do is stop the postmaster, start
a standalone backend (see the "postgres" man page entry if you
never did that before) and issue the ALTER USER in the
standalone backend. Then you can restart normal operations.

That did the trick. Thank you very much, Sensei.

Ed

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed L. (#20)
hackersgeneral
#22Ed L.
pgsql@bluepolka.net
In reply to: Tom Lane (#18)
hackersgeneral
#23Ed L.
pgsql@bluepolka.net
In reply to: Tom Lane (#21)
hackersgeneral
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed L. (#22)
hackersgeneral
#25Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#24)
hackersgeneral