SQL command to dump the contents of table failed: PQendcopy() failed. Error message from server: socket not open

Started by rubenover 20 years ago11 messagesgeneral
Jump to latest
#1ruben
ruben20@superguai.com

Hi:

I run a cron job every day to dump all the tables in my 7.4 Postgres
database. For one of the tables (sample) the command returns this error:

-bash-2.05b$ /usr/local/pgsql/bin/pg_dump -Fc -t sample heos -f
/home/bu/5/sample.dump
pg_dump: socket not open
pg_dump: SQL command to dump the contents of table "sample" failed:
PQendcopy() failed.
pg_dump: Error message from server: socket not open
pg_dump: The command was: COPY public.sample (field1, field2, field3,
field4, field5) TO stdout;

This is what the log shows:

-bash-2.05b$ more /usr/local/pgsql/logfile
LOG: database system was shut down at 2005-09-23 13:43:16 CEST
LOG: checkpoint record is at 122/E90B52AC
LOG: redo record is at 122/E90B52AC; undo record is at 0/0; shutdown TRUE
LOG: next transaction ID: 2227508; next OID: 30689327
LOG: database system is ready
LOG: server process (PID 27688) was terminated by signal 11
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted at 2005-09-23 13:54:37 CEST
LOG: checkpoint record is at 122/E90C3D38
LOG: redo record is at 122/E90C3D38; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 2227617; next OID: 30697519
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: record with zero length at 122/E90C3D78
LOG: redo is not required
LOG: database system is ready

This happens every day consistently. The table is quite big, about 1
million tuples and it is vacuumed.

Any help would be appreciated.

Cheers, Ruben.

#2Michael Fuhr
mike@fuhr.org
In reply to: ruben (#1)
Re: SQL command to dump the contents of table failed: PQendcopy() failed. Error message from server: socket not open

On Fri, Sep 23, 2005 at 01:34:18PM +0200, ruben wrote:

LOG: server process (PID 27688) was terminated by signal 11

This suggests a bug in the backend. There should be a core dump
somewhere under $PGDATA (unless resource limits prevent it or your
system is configured to put core dumps elsewhere) -- can you use a
debugger to get a stack trace from it? What exact version of
PostgreSQL are you running and on what operating system? Do you
have any non-standard extensions to PostgreSQL (custom types,
third-party modules, etc.)?

--
Michael Fuhr

#3ruben
ruben20@superguai.com
In reply to: Michael Fuhr (#2)
Re: SQL command to dump the contents of table failed: PQendcopy()

Hi Michael:

The operating system is Red Hat Linux release 8.0 (Psyche) and
PostgreSQL version is 7.4.6., without non-standard extensions.

I cannot find any core dump in the PGDATA directory /usr/local/pgsql (I
don't know how to debug it to get a stack trace, I'll find out).

Thanks, Ruben.

Michael Fuhr wrote:

Show quoted text

On Fri, Sep 23, 2005 at 01:34:18PM +0200, ruben wrote:

LOG: server process (PID 27688) was terminated by signal 11

This suggests a bug in the backend. There should be a core dump
somewhere under $PGDATA (unless resource limits prevent it or your
system is configured to put core dumps elsewhere) -- can you use a
debugger to get a stack trace from it? What exact version of
PostgreSQL are you running and on what operating system? Do you
have any non-standard extensions to PostgreSQL (custom types,
third-party modules, etc.)?

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: ruben (#3)
Re: SQL command to dump the contents of table failed: PQendcopy()

ruben <ruben20@superguai.com> writes:

I cannot find any core dump in the PGDATA directory /usr/local/pgsql (I
don't know how to debug it to get a stack trace, I'll find out).

It would normally be in the per-database subdirectory
($PGDATA/base/NNN/) for the database where the problem occurs. However,
if you don't see a core file there either, that probably means the
postmaster was started under "ulimit -c 0" to prevent core dumps. Add
"ulimit -c unlimited" to the postmaster start script and restart it.

(For the sake of the archives, I'll mention that as of PG 8.1 core dumps
will appear directly in $PGDATA, not in its subdirectories.)

regards, tom lane

#5Michael Fuhr
mike@fuhr.org
In reply to: ruben (#3)
Re: SQL command to dump the contents of table failed: PQendcopy() failed. Error message from server: socket not open

On Fri, Sep 23, 2005 at 05:11:19PM +0200, ruben wrote:

The operating system is Red Hat Linux release 8.0 (Psyche) and
PostgreSQL version is 7.4.6., without non-standard extensions.

I cannot find any core dump in the PGDATA directory /usr/local/pgsql (I
don't know how to debug it to get a stack trace, I'll find out).

Did you look everywhere under $PGDATA or just in that directory?
As I recall, released versions of PostgreSQL usually dump core under
$PGDATA/base/<database oid>. However, it's also possible that your
coredumpsize resource limit prevents core dumps; you could fix that
by putting a command like "ulimit -c unlimited" in your PostgreSQL
startup script and then stopping and restarting PostgreSQL.

Once you have a core dump, you can get a stack trace with gdb:

$ gdb /path/to/postgres /path/to/core
...
(gdb) bt

If your postgres binary was built with debugging symbols then the
stack trace should show function names, file names, and line numbers.

Can you duplicate the backend crash from psql if you issue the COPY
command that pg_dump complained about? What about if you issue a
SELECT for all records in the table? What does "\d tablename" show
for the table in question?

--
Michael Fuhr

#6ruben
ruben20@superguai.com
In reply to: Michael Fuhr (#5)
Re: SQL command to dump the contents of table failed: PQendcopy()

Thanks Tom and Michael:

Michael Fuhr wrote:

On Fri, Sep 23, 2005 at 05:11:19PM +0200, ruben wrote:

The operating system is Red Hat Linux release 8.0 (Psyche) and
PostgreSQL version is 7.4.6., without non-standard extensions.

I cannot find any core dump in the PGDATA directory /usr/local/pgsql (I
don't know how to debug it to get a stack trace, I'll find out).

Did you look everywhere under $PGDATA or just in that directory?
As I recall, released versions of PostgreSQL usually dump core under
$PGDATA/base/<database oid>. However, it's also possible that your
coredumpsize resource limit prevents core dumps; you could fix that
by putting a command like "ulimit -c unlimited" in your PostgreSQL
startup script and then stopping and restarting PostgreSQL.

Once you have a core dump, you can get a stack trace with gdb:

$ gdb /path/to/postgres /path/to/core
...
(gdb) bt

If your postgres binary was built with debugging symbols then the
stack trace should show function names, file names, and line numbers.

I cannot find a core file, ulimit is set to unlimit. I guess I'm doing
something wrong:

-bash-2.05b$ ulimit
unlimited
-bash-2.05b$ find /usr/local/pgsql -name '*core*' -print
-bash-2.05b$

Can you duplicate the backend crash from psql if you issue the COPY
command that pg_dump complained about?

-bash-2.05b$ /usr/local/pgsql/bin/pg_dump -Fc -t llamadas heos -f
/home/buheos/5/llamadas3.dump
pg_dump: socket not open
pg_dump: SQL command to dump the contents of table "llamadas" failed:
PQendcopy() failed.
pg_dump: Error message from server: socket not open
pg_dump: The command was: COPY public.llamadas (cod_empresa,
fecha_llamada, tfno_origen, tfno_destino, duracion_llamada,
hora_llamada, cod_destino_llamada, cod_pais_destino,
cod_destino_internacional, franja_horaria, importe, cod_fuente,
precio_coste_llamada, observaciones_llamada, coment_llamada,
fecha_factura, num_factura, fecha_alta, fecha_ult_mod, fecha_sis_alta,
usuario_alta, i_a_alta, fecha_sis_ult_mod, usuario_ult_mod, i_a_ult_mod,
periodicidad_facturacion, cod_operador, franja_horaria_operador,
fichero_origen, cod_destino_internacional_operador) TO stdout;

What about if you issue a SELECT for all records in the table?

heos=# select * from llamadas;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: WARNING:
terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
Failed.

In the logfile:

LOG: server process (PID 7069) was terminated by signal 11
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted at 2005-09-26 16:49:43 CEST
LOG: checkpoint record is at 125/858E0144
LOG: redo record is at 125/858E0144; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 2270061; next OID: 30820346
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: record with zero length at 125/858E0184
LOG: redo is not required
LOG: database system is ready
LOG: unexpected EOF on client connection

What does "\d tablename" show for the table in question?

heos=# \d llamadas;
Table "public.llamadas"
Column | Type |
Modifiers
------------------------------------+--------------------------+----------------------------------------------------
cod_empresa | smallint | not null
fecha_llamada | date |
tfno_origen | character(15) |
tfno_destino | character(15) |
duracion_llamada | integer |
hora_llamada | time without time zone |
default ('now'::text)::time(6) with time zone
cod_destino_llamada | character(1) |
cod_pais_destino | integer |
cod_destino_internacional | character(15) |
franja_horaria | character(1) |
importe | real |
cod_fuente | integer |
precio_coste_llamada | real |
observaciones_llamada | character varying(100) |
coment_llamada | character varying(100) |
fecha_factura | date |
num_factura | integer |
fecha_alta | date |
fecha_ult_mod | date |
fecha_sis_alta | timestamp with time zone |
default ('now'::text)::timestamp(6) with time zone
usuario_alta | character(10) |
i_a_alta | character(15) |
fecha_sis_ult_mod | timestamp with time zone |
usuario_ult_mod | character(10) |
i_a_ult_mod | character(15) |
periodicidad_facturacion | character(1) |
cod_operador | character(2) |
franja_horaria_operador | character(1) |
fichero_origen | character varying(100) |
cod_destino_internacional_operador | character(15) |
Indexes:
"llamadas_i01" btree (cod_empresa, fecha_llamada, tfno_origen)

#7Peter Wiersig
peter@friesenpeter.de
In reply to: ruben (#6)
Re: SQL command to dump the contents of table failed: PQendcopy()

On Mon, Sep 26, 2005 at 07:03:06PM +0200, ruben wrote:

I guess I'm doing something wrong:

-bash-2.05b$ ulimit
unlimited

Please read manpages, in this case bash: ulimit -a

Peter

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: ruben (#6)
Re: SQL command to dump the contents of table failed: PQendcopy()

ruben <ruben20@superguai.com> writes:

As I recall, released versions of PostgreSQL usually dump core under
$PGDATA/base/<database oid>. However, it's also possible that your
coredumpsize resource limit prevents core dumps; you could fix that
by putting a command like "ulimit -c unlimited" in your PostgreSQL
startup script and then stopping and restarting PostgreSQL.

I cannot find a core file, ulimit is set to unlimit. I guess I'm doing
something wrong:

-bash-2.05b$ ulimit
unlimited
-bash-2.05b$ find /usr/local/pgsql -name '*core*' -print

(1) The fact that it's unlimited in your user environment doesn't prove
that it's unlimited in the environment the postmaster is started in.

(2) I forget which constraint ulimit-with-no-argument prints, but it's
not core file size. (Try "ulimit -a")

Please actually follow the advice given to you above.

regards, tom lane

#9ruben
ruben20@superguai.com
In reply to: Tom Lane (#8)
Re: SQL command to dump the contents of table failed: PQendcopy()

Hi Tom:

No way to get the core dump, this is what I did:

1) Add "ulimit" to /etc/rc.d/init.d/postgresql just before pg_ctl start
in the "start" section of the script:

...
ulimit -c unlimited
su -l postgres -s /bin/sh -c "/usr/local/pgsql/bin/pg_ctl -D $PGDATA
-p /usr/local/pgsql/bin/postmaster -l $DIRLOG/logfile start > /dev/null
2>&1" < /dev/null
...

I tried "man ulimit" but it didn't work, "man bash" returns the command
help.

2) Restart postmaster

/etc/rc.d/init.d/postgresql restart

3) Run the query that produces the crash.

4) Search core file:

-bash-2.05b$ find /usr/local/pgsql/ -name '*core*' -print
-bash-2.05b$

Nothing comes up.

Thanks for your help.
Ruben.

Tom Lane wrote:

Show quoted text

ruben <ruben20@superguai.com> writes:

As I recall, released versions of PostgreSQL usually dump core under
$PGDATA/base/<database oid>. However, it's also possible that your
coredumpsize resource limit prevents core dumps; you could fix that
by putting a command like "ulimit -c unlimited" in your PostgreSQL
startup script and then stopping and restarting PostgreSQL.

I cannot find a core file, ulimit is set to unlimit. I guess I'm doing
something wrong:

-bash-2.05b$ ulimit
unlimited
-bash-2.05b$ find /usr/local/pgsql -name '*core*' -print

(1) The fact that it's unlimited in your user environment doesn't prove
that it's unlimited in the environment the postmaster is started in.

(2) I forget which constraint ulimit-with-no-argument prints, but it's
not core file size. (Try "ulimit -a")

Please actually follow the advice given to you above.

regards, tom lane

#10ruben
ruben20@superguai.com
In reply to: ruben (#1)
Re: SQL command to dump the contents of table failed: PQendcopy()

Thanks Martijn:

Martijn van Oosterhout wrote:

On Wed, Sep 28, 2005 at 11:40:11AM +0200, ruben wrote:

Hi Tom:

No way to get the core dump, this is what I did:

1) Add "ulimit" to /etc/rc.d/init.d/postgresql just before pg_ctl start
in the "start" section of the script:

<snip>

2) Restart postmaster

/etc/rc.d/init.d/postgresql restart

Does that run the start section? Maybe you need to stop/start. Maybe
the bash_profile/bashrc for the postgres user resets the core limit.

The server has been rebooted.

This is the postgres start script ($PGDATA is /usr/local/pgsql/):
http://80.33.3.245/temp/postgres.txt

Regading the core limit:

-bash-2.05b$ whoami
postgres

-bash-2.05b$ cat ~/.bash_profile
PGDATA=/var/lib/pgsql/data
[ -f $PGDATA/../initdb.i18n ] && source $PGDATA/../initdb.i18n
export PGDATA

-bash-2.05b$ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 7168
virtual memory (kbytes, -v) unlimited

4) Search core file:

-bash-2.05b$ find /usr/local/pgsql/ -name '*core*' -print
-bash-2.05b$

It'll be under $PGDATA, where is that? It may be under
/var/lib/postgres, depending on how you installed...

Indeed I searched the whole file structure for core files.

Regards.

#11Martijn van Oosterhout
kleptog@svana.org
In reply to: ruben (#9)
Re: SQL command to dump the contents of table failed: PQendcopy()

On Wed, Sep 28, 2005 at 11:40:11AM +0200, ruben wrote:

Hi Tom:

No way to get the core dump, this is what I did:

1) Add "ulimit" to /etc/rc.d/init.d/postgresql just before pg_ctl start
in the "start" section of the script:

<snip>

2) Restart postmaster

/etc/rc.d/init.d/postgresql restart

Does that run the start section? Maybe you need to stop/start. Maybe
the bash_profile/bashrc for the postgres user resets the core limit.

4) Search core file:

-bash-2.05b$ find /usr/local/pgsql/ -name '*core*' -print
-bash-2.05b$

It'll be under $PGDATA, where is that? It may be under
/var/lib/postgres, depending on how you installed...

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.