Reproducible "Bus error" in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)

Started by Dmitry Koterov almost 13 years ago, 5 messages
#1Dmitry Koterov
dmitry@koterov.ru

Hello.

I have a database dump file (unfortunately containing proprietary
information) whose restoration produces the following error in the logs.
It is reliably reproducible, even after a fresh initdb: it always fails on
the same large table, at the same point in time:

LOG: server process (PID 18705) was terminated by signal 7: Bus error
DETAIL: Failed process was running: COPY br_agent_log (id, agent_id,
stamp, trace, message) FROM stdin;
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
...
and then, after recovery:
...
redo done at 0/12DDB7A8
...
LOG: database system is ready to accept connections
ERROR: could not read block 1 in file "base/57390/11783": read only 4448
of 8192 bytes at character 39
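
The last message can be read as simple arithmetic. The following is a minimal sketch, assuming the default 8192-byte block size and assuming the short read stopped at end of file (neither is confirmed by the log itself); the function names are made up for illustration. Under those assumptions the file "base/57390/11783" is 12640 bytes long, which is not a whole number of blocks, so it looks truncated rather than merely unreadable.

```python
BLCKSZ = 8192  # assumed: PostgreSQL's default block size (custom builds can differ)


def implied_file_size(block_number: int, bytes_read: int, block_size: int = BLCKSZ) -> int:
    """Return the file length implied by a short read of the given block.

    Assumes the read came up short because it hit end of file.
    """
    if not 0 <= bytes_read < block_size:
        raise ValueError("a short read must return fewer bytes than one block")
    return block_number * block_size + bytes_read


def is_block_aligned(size: int, block_size: int = BLCKSZ) -> bool:
    """A relation file normally holds a whole number of blocks."""
    return size % block_size == 0


# Values taken from the ERROR line above: block 1, 4448 of 8192 bytes read.
size = implied_file_size(block_number=1, bytes_read=4448)
print(size, is_block_aligned(size))  # prints: 12640 False
```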

Could this be memory corruption in PG? BTW, 9.1.8 does not have this
problem: the restoration succeeds there.

Could I help investigate this crash? What is the best way to do that?
Is there a tutorial article that describes the preferred error reporting
format?

#2Kevin Grittner
kgrittn@ymail.com
In reply to: Dmitry Koterov (#1)
Re: Reproducible "Bus error" in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)

Dmitry Koterov <dmitry@koterov.ru> wrote:

LOG:  server process (PID 18705) was terminated by signal 7: Bus error

So far I have only heard of this sort of error when PostgreSQL is
running in a virtual machine and the VM software is buggy.  If you
are not running in a VM, my next two suspects would be
hardware/BIOS configuration issues, or an antivirus product.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Merlin Moncure
mmoncure@gmail.com
In reply to: Kevin Grittner (#2)
Re: Reproducible "Bus error" in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)

On Tue, Mar 5, 2013 at 3:04 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Dmitry Koterov <dmitry@koterov.ru> wrote:

LOG: server process (PID 18705) was terminated by signal 7: Bus error

So far I have only heard of this sort of error when PostgreSQL is
running in a virtual machine and the VM software is buggy. If you
are not running in a VM, my next two suspects would be
hardware/BIOS configuration issues, or an antivirus product.

For posterity, what's the hardware platform? Software bus errors are
more likely on non-x86 hardware.

merlin

#4Dmitry Koterov
dmitry@koterov.ru
In reply to: Merlin Moncure (#3)
Re: Reproducible "Bus error" in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)

x86_64. PostgreSQL 9.2 runs inside an OpenVZ container and generates
SIGBUS.
PostgreSQL 9.1 has no such problem.

(OpenVZ is Linux kernel-level virtualization which adds namespaces for
processes, networking, quotas, etc. It does not work like, e.g., Xen or
VMware, because all containers share the same kernel.)

#5Craig Ringer
craig@2ndquadrant.com
In reply to: Dmitry Koterov (#4)
Re: Reproducible "Bus error" in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)

On 03/11/2013 09:20 PM, Dmitry Koterov wrote:

x86_64. PostgreSQL 9.2 runs inside an OpenVZ container and
generates SIGBUS.
PostgreSQL 9.1 has no such problem.

(OpenVZ is Linux kernel-level virtualization which adds namespaces
for processes, networking, quotas, etc. It does not work like, e.g.,
Xen or VMware, because all containers share the same kernel.)

Related to SHM vs mmapped files? Seems unlikely, but I guess it could
affect low-enough level work like kernel TLB usage.

At what point in Pg's execution does the SIGBUS occur? Is it always at
the same place, or at a few places, in the code? It would be helpful if
you could enable core file writing and get backtraces from the core
files or (since it's reproducible) by attaching a debugger directly to a
Pg backend. See
http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
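
As a rough illustration of the "attach a debugger" route, here is a minimal sketch that builds a gdb command line for a given backend PID. It assumes gdb is installed and that you run it as a user allowed to trace the backend (typically the postgres user or root); the function names are hypothetical helpers, not part of PostgreSQL. The gdb options used (-p, -batch, -ex) are standard gdb command-line options.

```python
import subprocess
from typing import List


def gdb_backtrace_command(backend_pid: int) -> List[str]:
    """Build a gdb command line that attaches to a backend, lets it run
    until something stops it (such as a fatal signal), then prints a backtrace."""
    if backend_pid <= 0:
        raise ValueError("backend_pid must be a positive process ID")
    return [
        "gdb",
        "-p", str(backend_pid),  # attach to the already running backend
        "-batch",                # exit after the commands below have run
        "-ex", "continue",       # resume the backend until it stops, e.g. on SIGBUS
        "-ex", "bt",             # print the call stack at the point of the stop
    ]


def capture_backtrace(backend_pid: int) -> str:
    """Run gdb and return whatever it printed (stdout followed by stderr)."""
    result = subprocess.run(
        gdb_backtrace_command(backend_pid),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr
```

One way to use it: open a psql session, run SELECT pg_backend_pid(); to learn the backend's PID, start capture_backtrace(pid) in another terminal, and only then begin the restore in that same psql session so the debugger is already attached when the crash happens.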

If you restore just the first half of the table or just the last half
does the crash still happen? If it still happens in one part but not in
another, can you do a binary search* to isolate the smallest chunk of
the input file that still reliably causes the crash?

* (i.e., split the file roughly in half at a record boundary and test each
half. Discard the half that doesn't crash, keep the half that crashes.
Repeat the process using the kept half as input until you find the
smallest chunk that still crashes, or get down to a single record that
causes the problem.)
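
The halving procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the records are already split at record boundaries (one COPY data line per list element), and crashes is a hypothetical callback you would write yourself, for example one that loads the chunk into a scratch database and reports whether the backend died. It also assumes the full input is already known to crash.

```python
from typing import Callable, List, Sequence


def smallest_crashing_chunk(
    records: Sequence[str],
    crashes: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Repeatedly halve the input, keeping whichever half still crashes.

    Stops at a single record, or earlier if neither half crashes on its own
    (which suggests the crash depends on more than one record).
    """
    chunk = list(records)
    while len(chunk) > 1:
        mid = len(chunk) // 2
        first, second = chunk[:mid], chunk[mid:]
        if crashes(first):
            chunk = first
        elif crashes(second):
            chunk = second
        else:
            break  # neither half crashes alone; keep the current chunk
    return chunk


# Toy demonstration with a fake crash test: a single "bad" record.
rows = [f"row {i}" for i in range(1000)]
culprit = smallest_crashing_chunk(rows, lambda c: "row 637" in c)
print(culprit)  # prints: ['row 637']
```

Each call to crashes means a full restore attempt of that chunk, so the number of restores grows only with the logarithm of the table size; for a large table that is still far cheaper than testing records one by one.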

Does the same data cause a crash when restored in another container on
the same OpenVZ host? What about when restored to another machine with
the same OS and Pg version outside OpenVZ?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services