help getting a backtrace from 9.2 on Ubuntu 13.04?
I have a production server running PG 8.4 on RHEL6. I have a development server running PG 9.2 on Ubuntu 13.04. Periodically, I like to take a dump of the production server and load it on the development machine.
But I'm having troubles with the 9.2 server crashing when I'm restoring the dump. I'm using the 9.2 version of pg_dump. I've tried restoring a custom-format dump with pg_restore, and I've tried restoring a text-format dump with pqsl, and both of them are crashing on me.
The data is too sensitive for me to submit a database dump to the community, but I'd like to submit a stack trace, in the hopes that someone might be able to figure out what's going on. But I'm having some trouble getting this done.
I'm looking at the instructions at http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
Which points me to https://wiki.edubuntu.org/DebuggingProgramCrash
Which tells me to install the “dbg” version of my package, so I have installed postgresql-9.2-dbg
The PG wiki page tells me to download and run list-dbgsym-packages-v2.sh, but the Edubuntu page says that that script has been supplanted by list-symbols-packages-v2.sh. Okay, so I run
chris@mu:~/Downloads$ sudo bash list-symbols-packages-v2.sh -p $(pidof -s postgres)
and get
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffcbae5000
any idea where I go from here?
Disclaimer
THIS IS A CONFIDENTIAL COMMUNICATION. The information contained in this e-mail is attorney-client privileged and confidential, intended only for the use of the individual or entity named above. If the reader of this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are here by notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by reply e-mail or by telephone (908) 233-8500.Thank you..
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 9/9/2013 7:00 AM, Chris Curvey wrote:
any idea where I go from here?
don't develop on a newer version of the database than you are deploying on.
*maybe* you can use the pgdump from 8.4 to connect to and dump the 9.2
database, but the 9.2 dump is NOT guaranteed to generate 8.4 compatible
SQL, in fact its highly likely it won't, as you can already attest to.
--
john r pierce 37N 122W
somewhere on the middle of the left coast
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 9/9/2013 4:53 PM, John R Pierce wrote:
On 9/9/2013 7:00 AM, Chris Curvey wrote:
any idea where I go from here?
don't develop on a newer version of the database than you are
deploying on.*maybe* you can use the pgdump from 8.4 to connect to and dump the 9.2
database, but the 9.2 dump is NOT guaranteed to generate 8.4
compatible SQL, in fact its highly likely it won't, as you can already
attest to.
oops, I re-read yur request, you're going the other way.
use the pg_dump from 9.2 to dump the 8.4 database, via either a direct
network connection, or an ssh tunnel, or whatever works best. this is
guaranteed to work
--
john r pierce 37N 122W
somewhere on the middle of the left coast
On Mon, Sep 9, 2013 at 4:00 PM, Chris Curvey <ccurvey@zuckergoldberg.com>wrote:
But I'm having troubles with the 9.2 server crashing when I'm restoring
the dump. I'm using the 9.2 version of pg_dump. I've tried restoring a
custom-format dump with pg_restore, and I've tried restoring a text-format
dump with pqsl, and both of them are crashing on me.The data is too sensitive for me to submit a database dump to the
community, but I'd like to submit a stack trace, in the hopes that someone
might be able to figure out what's going on. But I'm having some trouble
getting this done.
Is it crashing on a specific database object? pg_restore -v will tell you
how far it went. Then try to restore only that object. Is it perhaps
crashing on a specific row?
Try producing a self contained test case (like only the culprit table,
anonymized).
Regards
Marcin Mańk
From: Marcin Mańk [mailto:marcin.mank@gmail.com]
Sent: Monday, September 09, 2013 8:30 PM
To: Chris Curvey
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] help getting a backtrace from 9.2 on Ubuntu 13.04?
On Mon, Sep 9, 2013 at 4:00 PM, Chris Curvey <ccurvey@zuckergoldberg.com<mailto:ccurvey@zuckergoldberg.com>> wrote:
But I'm having troubles with the 9.2 server crashing when I'm restoring the dump. I'm using the 9.2 version of pg_dump. I've tried restoring a custom-format dump with pg_restore, and I've tried restoring a text-format dump with pqsl, and both of them are crashing on me.
The data is too sensitive for me to submit a database dump to the community, but I'd like to submit a stack trace, in the hopes that someone might be able to figure out what's going on. But I'm having some trouble getting this done.
Is it crashing on a specific database object? pg_restore -v will tell you how far it went. Then try to restore only that object. Is it perhaps crashing on a specific row?
Try producing a self contained test case (like only the culprit table, anonymized).
Regards
Marcin Mańk
Good advice. I turned on –verbose, and got a ton of output, ending with:
pg_restore: setting owner and privileges for FK CONSTRAINT user_id_refs_id_7ceef80f
pg_restore: setting owner and privileges for FK CONSTRAINT user_id_refs_id_dfbab7d
pg_restore: [archiver (db)] could not execute query: no connection to the server
Command was: -- Completed on 2013-09-09 11:35:16 EDT
pg_restore: [archiver (db)] could not execute query: no connection to the server
Command was: --
-- PostgreSQL database dump complete
–
Which I find really odd, because I specified –no-owner –no-privileges –no-tablespace
chris@mu:/sdb$ pg_restore --dbname=certified_mail_ccc2 --format=c --verbose --clean --no-owner --no-privileges --no-tablespaces -h mu -p 5434 cm_Mon.backup
So now I’m up to three questions. (Why the crash? How to get backtrace? Why are we applying permissions when I said not to?) I guess that’s the nature of the universe. Let me see if I can figure out which table that is and try to create a test case.
Disclaimer
THIS IS A CONFIDENTIAL COMMUNICATION. The information contained in this e-mail is attorney-client privileged and confidential, intended only for the use of the individual or entity named above. If the reader of this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are here by notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by reply e-mail or by telephone (908) 233-8500.Thank you..
On 09/10/2013 06:57 AM, Chris Curvey wrote:
*From:*Marcin Mańk [mailto:marcin.mank@gmail.com]
*Sent:* Monday, September 09, 2013 8:30 PM
*To:* Chris Curvey
*Cc:* pgsql-general@postgresql.org
*Subject:* Re: [GENERAL] help getting a backtrace from 9.2 on Ubuntu 13.04?On Mon, Sep 9, 2013 at 4:00 PM, Chris Curvey <ccurvey@zuckergoldberg.com
<mailto:ccurvey@zuckergoldberg.com>> wrote:But I'm having troubles with the 9.2 server crashing when I'm
restoring the dump. I'm using the 9.2 version of pg_dump. I've
tried restoring a custom-format dump with pg_restore, and I've tried
restoring a text-format dump with pqsl, and both of them are
crashing on me.The data is too sensitive for me to submit a database dump to the
community, but I'd like to submit a stack trace, in the hopes that
someone might be able to figure out what's going on. But I'm having
some trouble getting this done.Is it crashing on a specific database object? pg_restore -v will tell
you how far it went. Then try to restore only that object. Is it perhaps
crashing on a specific row?Try producing a self contained test case (like only the culprit table,
anonymized).Regards
Marcin Mańk
Good advice. I turned on –verbose, and got a ton of output, ending with:
pg_restore: setting owner and privileges for FK CONSTRAINT
user_id_refs_id_7ceef80fpg_restore: setting owner and privileges for FK CONSTRAINT
user_id_refs_id_dfbab7dpg_restore: [archiver (db)] could not execute query: no connection to
the serverCommand was: -- Completed on 2013-09-09 11:35:16 EDT
pg_restore: [archiver (db)] could not execute query: no connection to
the server
At this point I would be more worried about the above, 'no connection to
server'.
Command was: --
-- PostgreSQL database dump complete
–
Which I find really odd, because I specified –no-owner –no-privileges
–no-tablespace
--no-owner does not mean that ownership is not set, just that the
ownership from the source database is not carried over.
http://www.postgresql.org/docs/9.2/interactive/app-pgrestore.html
-O
--no-owner
Do not output commands to set ownership of objects to match the original
database. By default, pg_restore issues ALTER OWNER or SET SESSION
AUTHORIZATION statements to set ownership of created schema elements.
These statements will fail unless the initial connection to the database
is made by a superuser (or the same user that owns all of the objects in
the script). With -O, any user name can be used for the initial
connection, and this user will own all the created objects.
chris@mu:/sdb$ pg_restore --dbname=certified_mail_ccc2 --format=c
--verbose --clean --no-owner --no-privileges --no-tablespaces -h mu -p
5434 cm_Mon.backupSo now I’m up to three questions. (Why the crash? How to get
backtrace? Why are we applying permissions when I said not to?) I
guess that’s the nature of the universe. Let me see if I can figure out
which table that is and try to create a test case.
--
Adrian Klaver
adrian.klaver@gmail.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Good advice. I turned on –verbose, and got a ton of output, ending with:
pg_restore: setting owner and privileges for FK CONSTRAINT
user_id_refs_id_7ceef80fpg_restore: setting owner and privileges for FK CONSTRAINT
user_id_refs_id_dfbab7dpg_restore: [archiver (db)] could not execute query: no connection to
the serverCommand was: -- Completed on 2013-09-09 11:35:16 EDT
pg_restore: [archiver (db)] could not execute query: no connection to
the server
This was a red herring. That last constraint that was listed is the last thing in the dump/restore (at least according to --list). Perhaps at the end of pg_restore, it's trying to send the comment to the database ("completed on...") and that's confusing things? It does appear that the entire database has been restored, but the cluster restart is disconcerting.
At this point I would be more worried about the above, 'no connection to
server'.
Yep, that would be the point where the cluster is restarting.
--no-owner does not mean that ownership is not set, just that the ownership
from the source database is not carried over.http://www.postgresql.org/docs/9.2/interactive/app-pgrestore.html
-O
--no-owner
Do not output commands to set ownership of objects to match the original
database. By default, pg_restore issues ALTER OWNER or SET SESSION
AUTHORIZATION statements to set ownership of created schema elements.
These statements will fail unless the initial connection to the database is
made by a superuser (or the same user that owns all of the objects in the
script). With -O, any user name can be used for the initial connection, and
this user will own all the created objects.
Does this mean that the --no-owner is a command flag for pg_dump, but is ignored by pg_restore? Let me run a test and see...
Disclaimer
THIS IS A CONFIDENTIAL COMMUNICATION. The information contained in this e-mail is attorney-client privileged and confidential, intended only for the use of the individual or entity named above. If the reader of this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are here by notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by reply e-mail or by telephone (908) 233-8500.Thank you..
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Tue, Sep 10, 2013 at 6:57 AM, Chris Curvey <ccurvey@zuckergoldberg.com>
wrote:
From: Marcin Mańk [mailto:marcin.mank@gmail.com]
Is it crashing on a specific database object? pg_restore -v will tell
you
how far it went. Then try to restore only that object. Is it perhaps
crashing on a specific row?Good advice. I turned on –verbose, and got a ton of output, ending with:
pg_restore: setting owner and privileges for FK CONSTRAINT
user_id_refs_id_7ceef80fpg_restore: setting owner and privileges for FK CONSTRAINT
user_id_refs_id_dfbab7dpg_restore: [archiver (db)] could not execute query: no connection to the
serverCommand was: -- Completed on 2013-09-09 11:35:16 EDT
What does the server log say? It should tell you why the server is
restarting.
Cheers,
Jeff
From: Jeff Janes [mailto:jeff.janes@gmail.com]
Sent: Tuesday, September 10, 2013 1:26 PM
To: Chris Curvey
Cc: Marcin Mańk; pgsql-general@postgresql.org
Subject: Re: [GENERAL] help getting a backtrace from 9.2 on Ubuntu 13.04?
On Tue, Sep 10, 2013 at 6:57 AM, Chris Curvey <ccurvey@zuckergoldberg.com<mailto:ccurvey@zuckergoldberg.com>> wrote:
From: Marcin Mańk [mailto:marcin.mank@gmail.com<mailto:marcin.mank@gmail.com>]
Is it crashing on a specific database object? pg_restore -v will tell you
how far it went. Then try to restore only that object. Is it perhaps
crashing on a specific row?Good advice. I turned on –verbose, and got a ton of output, ending with:
pg_restore: setting owner and privileges for FK CONSTRAINT
user_id_refs_id_7ceef80fpg_restore: setting owner and privileges for FK CONSTRAINT
user_id_refs_id_dfbab7dpg_restore: [archiver (db)] could not execute query: no connection to the
serverCommand was: -- Completed on 2013-09-09 11:35:16 EDT
What does the server log say? It should tell you why the server is restarting.
Cheers,
Jeff
Great thought. Looking through the logs, it appears that all my failures are on a CREATE INDEX. Usually on my biggest table, but often on another table.
2013-09-10 10:09:46 EDT ERROR: canceling autovacuum task
2013-09-10 10:09:46 EDT CONTEXT: automatic analyze of table "certified_mail_ccc2.public.cm_status_history"
2013-09-10 10:15:13 EDT LOG: server process (PID 14386) was terminated by signal 11: Segmentation fault
2013-09-10 10:15:13 EDT DETAIL: Failed process was running: CREATE INDEX cm_envelope_tracking_number ON cm_envelope USING btree (tracking_number);
2013-09-10 10:15:13 EDT LOG: terminating any other active server processes
2013-09-10 10:15:13 EDT WARNING: terminating connection because of crash of another server process
2013-09-10 10:15:13 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
I cannot square this with the fact that when I echo the commands, the last echoed command is about setting privileges.
Another development (possibly unrelated): I tried *dumping* with –no-privileges –no-tablespace –no-owner, and the restore went fine.
Disclaimer
THIS IS A CONFIDENTIAL COMMUNICATION. The information contained in this e-mail is attorney-client privileged and confidential, intended only for the use of the individual or entity named above. If the reader of this message is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are here by notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by reply e-mail or by telephone (908) 233-8500.Thank you..
On 09/10/2013 10:37 AM, Chris Curvey wrote:
Another development (possibly unrelated): I tried **dumping** with
–no-privileges –no-tablespace –no-owner, and the restore went fine.
Probably has to do with whether you are dumping plain text or custom format:
http://www.postgresql.org/docs/9.2/interactive/app-pgdump.html
-O
--no-owner
Do not output commands to set ownership of objects to match the original
database. By default, pg_dump issues ALTER OWNER or SET SESSION
AUTHORIZATION statements to set ownership of created database objects.
These statements will fail when the script is run unless it is started
by a superuser (or the same user that owns all of the objects in the
script). To make a script that can be restored by any user, but will
give that user ownership of all the objects, specify -O.
This option is only meaningful for the plain-text format. For the
archive formats, you can specify the option when you call pg_restore.
--
Adrian Klaver
adrian.klaver@gmail.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Tue, Sep 10, 2013 at 10:37 AM, Chris Curvey
<ccurvey@zuckergoldberg.com>wrote:
Great thought. Looking through the logs, it appears that all my failures
are on a CREATE INDEX. Usually on my biggest table, but often on another
table.2013-09-10 10:09:46 EDT ERROR: canceling autovacuum task
2013-09-10 10:09:46 EDT CONTEXT: automatic analyze of table
"certified_mail_ccc2.public.cm_status_history"2013-09-10 10:15:13 EDT LOG: server process (PID 14386) was terminated by
signal 11: Segmentation fault2013-09-10 10:15:13 EDT DETAIL: Failed process was running: CREATE INDEX
cm_envelope_tracking_number ON cm_envelope USING btree (tracking_number);2013-09-10 10:15:13 EDT LOG: terminating any other active server processes
2013-09-10 10:15:13 EDT WARNING: terminating connection because of crash
of another server process2013-09-10 10:15:13 EDT DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.I cannot square this with the fact that when I echo the commands, the last
echoed command is about setting privileges.
A backend is crashing, and taking down the entire PostgreSQL system. The
commands you see being echoed are from a different process from the one
that triggered the crash, so it is just an innocent bystander which has no
useful information. Are you using parallel restore? (If not, why is there
someone indexing your biggest table during the restore?)
You will want to get the backtrace of the coredump generated by the crashed
backend, not of the running process. Have you tried taking a bt with gdb?
You said you couldn't find the symbols, but have you tried it anyway? On
CentOS and openSuse I often get warnings about some symbols not being
found, but all the symbols I actually need to interpret the backtrace end
up being there.
Cheers,
Jeff
On Sun, Sep 15, 2013 at 7:49 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Sep 10, 2013 at 10:37 AM, Chris Curvey <ccurvey@zuckergoldberg.com
wrote:
Great thought. Looking through the logs, it appears that all my failures
are on a CREATE INDEX. Usually on my biggest table, but often on another
table.2013-09-10 10:09:46 EDT ERROR: canceling autovacuum task
2013-09-10 10:09:46 EDT CONTEXT: automatic analyze of table
"certified_mail_ccc2.public.cm_status_history"2013-09-10 10:15:13 EDT LOG: server process (PID 14386) was terminated
by signal 11: Segmentation fault2013-09-10 10:15:13 EDT DETAIL: Failed process was running: CREATE INDEX
cm_envelope_tracking_number ON cm_envelope USING btree (tracking_number);2013-09-10 10:15:13 EDT LOG: terminating any other active server
processes2013-09-10 10:15:13 EDT WARNING: terminating connection because of crash
of another server process2013-09-10 10:15:13 EDT DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.I cannot square this with the fact that when I echo the commands, the
last echoed command is about setting privileges.A backend is crashing, and taking down the entire PostgreSQL system. The
commands you see being echoed are from a different process from the one
that triggered the crash, so it is just an innocent bystander which has no
useful information. Are you using parallel restore? (If not, why is there
someone indexing your biggest table during the restore?)
I'm the only person doing anything, and the only thing going on is the
restore. And I'm not using parallel restore (I thought that might be part
of the issue.)
You will want to get the backtrace of the coredump generated by the
crashed backend, not of the running process. Have you tried taking a bt
with gdb? You said you couldn't find the symbols, but have you tried it
anyway? On CentOS and openSuse I often get warnings about some symbols not
being found, but all the symbols I actually need to interpret the backtrace
end up being there.
I can try, but the instructions said that installing the -dev package by
itself was not sufficient, so I stopped at that point. But as they say in
the lottery ads, "Hey, you never know!"
Cheers,
Jeff
Many thanks!