pg_restore out of memory

Started by Miguel Ramos, over 9 years ago. 31 messages, pgsql-general.
#1Miguel Ramos
org.postgresql@miguel.ramos.name

Hi,

We have backed up a database and now when trying to restore it to the
same server we get this:

# pg_restore -d recovery /mnt/paysdeloire2013_convertida2.1.dump
pg_restore: [custom archiver] out of memory
12:09:56.58 9446.593u+1218.508s 24.3% 167+2589k 6+0io 0pf+0sw 6968822cs

Some information about the application:

- We have sensor data, including pictures, and number-crunching output,
so the large tables on this database are 319, 279, 111 and 26 GB.
Mostly on TOAST pages, but the 279 GB one divides it evenly. This
database is 765 GB. We try to keep them under 4 TB.
- Transactions are large, some 100 MB at a time.
- We also use PostGIS.

About the server (dedicated):

- FreeBSD 9.1-RELEASE #0 on AMD64
- 16 GB of RAM
- 8x3TB hardware RAID 10
- 10TB slice for pgdata, UFS-formatted with 32kB blocks
- PostgreSQL 9.1.8 custom compiled to get 32kB blocks
- Installed in 2013 with ~10 people working with it, 145 days uptime today.

I found two relevant threads on the mailing-lists.
The most recent one suggested that PostgreSQL was configured to use
more memory than was available.
The older one suggested that the system limits on the size of the data
or stack segments were lower than required.

So here are some server parameters (relevant or otherwise):

max_connections = 100
shared_buffers = 4GB -- 25% of RAM
temp_buffers = 32MB -- irrelevant?
work_mem = 64MB
maintenance_work_mem = 64MB -- was 1GB, lowered to 256MB then 64MB
wal_buffers = -1 -- auto-tuned: 1/32 of shared_buffers, capped at one 16MB WAL segment
checkpoint_segments = 64 -- WAL segments are 16MB
effective_cache_size = 8GB -- irrelevant?

I suspect that the restore fails when constructing the indices. After
the process is aborted, the data appears to be all or mostly there, but
no indices.
So all I did so far was lower maintenance_work_mem, and it didn't work.

System limits, as you can see, are at defaults (32GB for data and 512MB
for stack):

# limit
cputime unlimited
filesize unlimited
datasize 33554432 kbytes
stacksize 524288 kbytes
coredumpsize unlimited
memoryuse unlimited
vmemoryuse unlimited
descriptors 11095
memorylocked unlimited
maxproc 5547
sbsize unlimited
swapsize unlimited

Shared memory is configured to allow for the single shared memory
segment postgresql appears to use, plus a bit of extra (8GB):

# ipcs -M
shminfo:
shmmax: 8589934592 (max shared memory segment size)
shmmin: 1 (min shared memory segment size)
shmmni: 192 (max number of shared memory identifiers)
shmseg: 128 (max shared memory segments per process)
shmall: 2097152 (max amount of shared memory in pages)

And semaphores (irrelevant?)...

# ipcs -S
seminfo:
semmni: 256 (# of semaphore identifiers)
semmns: 512 (# of semaphores in system)
semmnu: 256 (# of undo structures in system)
semmsl: 340 (max # of semaphores per id)
semopm: 100 (max # of operations per semop call)
semume: 50 (max # of undo entries per process)
semusz: 632 (size in bytes of undo structure)
semvmx: 32767 (semaphore maximum value)
semaem: 16384 (adjust on exit max value)

I don't know what else to try.
I lowered maintenance_work_mem without restarting the server.
In some of the attempts, but not all, the restore was done while people
were working.

Each attempt takes 12 hours...
We couldn't use the directory -Fd dump because it's postgresql 9.1.
The original database is still on the server, this is a test restore.

We have about one or two months of slack before we really need to remove
them from the server to recover space.

--
Miguel Ramos

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Felipe Santos
felipepts@gmail.com
In reply to: Miguel Ramos (#1)
Re: pg_restore out of memory

2016-07-12 8:25 GMT-03:00 Miguel Ramos <org.postgresql@miguel.ramos.name>:

[quotes #1 in full]

Hi Miguel,

I would try lowering max_connections to 50 and then setting work_mem to 128MB.

After that restart your server and retry the restore.

Tell us if that helps.

Regards,
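For reference, applying Felipe's suggested values on 9.1 means editing postgresql.conf and restarting, since ALTER SYSTEM only arrived in 9.4. A sketch against a fabricated minimal conf file; the real file lives in the server's data directory:

```shell
# Fabricated stand-in for the real postgresql.conf:
cat > postgresql.conf <<'EOF'
max_connections = 100
work_mem = 64MB
EOF

# Apply the suggested temporary values (a .bak copy is kept).
sed -i.bak \
    -e 's/^max_connections *=.*/max_connections = 50/' \
    -e 's/^work_mem *=.*/work_mem = 128MB/' \
    postgresql.conf

cat postgresql.conf
# On the real server: pg_ctl -D "$PGDATA" restart
# (max_connections only takes effect after a full restart).
```

Afterwards the original values can be restored from the .bak copy, which matches the later reminder in this thread to revert the settings.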

#3Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: Felipe Santos (#2)
Re: pg_restore out of memory

Às 12:32 de 12-07-2016, Felipe Santos escreveu:

I would try lowering max_connections to 50 and then setting work_mem to 128MB.

After that restart your server and retry the restore.

Ok, I will try restarting tonight.
work_mem is the parameter I was most afraid of.

I'll post some news in 24h...

--
Miguel Ramos


#4Felipe Santos
felipepts@gmail.com
In reply to: Miguel Ramos (#3)
Re: pg_restore out of memory

2016-07-12 8:54 GMT-03:00 Miguel Ramos <org.postgresql@miguel.ramos.name>:

[quotes #3 in full]

After the restore (being it a success or failure) don't forget to set the
parameters back to their original values (max_conn = 100 and work_mem=64MB).

BR

#5Sameer Kumar
sameer.kumar@ashnik.com
In reply to: Miguel Ramos (#1)
Re: pg_restore out of memory

On Tue, 12 Jul 2016, 7:25 p.m. Miguel Ramos, <
org.postgresql@miguel.ramos.name> wrote:

[quotes #1: the error and the application/server details]
So here are some server parameters (relevant or otherwise):

max_connections = 100
shared_buffers = 4GB -- 25% of RAM
temp_buffers = 32MB -- irrelevant?
work_mem = 64MB
maintenance_work_mem = was 1G lowered to 256M then 64M

Why did you lower it? I think increasing it should help more. But 1GB
seems fine.

wal_buffers = -1 -- auto-tuned: 1/32 of shared_buffers, capped at 16MB

Increase this during the restore, maybe to 512MB.

checkpoint_segments = 64 -- WAL segments are 16MB

effective_cache_size = 8GB -- irrelevant?

I suspect that the restore fails when constructing the indices. After
the process is aborted, the data appears to be all or most there, but no
indices.

What is logged in database log files? Have you checked that?

So all I did so far was lower maintenance_work_mem, and it didn't work.

[quotes #1: system limits, shared memory and semaphore settings]

What are your vm.dirty_ratio and vm.dirty_background_ratio? I think
reducing them may help. But I can't really say what exactly would help
unless you are able to find the error source in the DB logs.

[quotes the remainder of #1]

--
--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Miguel Ramos (#1)
Re: pg_restore out of memory

Miguel Ramos <org.postgresql@miguel.ramos.name> writes:

We have backed up a database and now when trying to restore it to the
same server we get this:

# pg_restore -d recovery /mnt/paysdeloire2013_convertida2.1.dump
pg_restore: [custom archiver] out of memory

It looks to me like this error is pg_restore itself running out of memory,
not reporting a server-side OOM condition. You could verify that by
looking in the server log to see whether any out-of-memory error appeared
there. But assuming that I'm right, the other responses suggesting
tweaking server configurations are not on-point at all.

Unless you're running pg_restore under a really small ulimit, this would
seem to suggest some kind of memory leak in pg_restore itself. I wonder
how many objects in your dump (how long is "pg_restore -l" output)?

- PostgreSQL 9.1.8 custom compiled to get 32kB blocks

9.1.8 is pretty old ...

regards, tom lane


#7Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: Tom Lane (#6)
Re: pg_restore out of memory

Às 15:40 de 12-07-2016, Tom Lane escreveu:

Miguel Ramos <org.postgresql@miguel.ramos.name> writes:

We have backed up a database and now when trying to restore it to the
same server we get this:

# pg_restore -d recovery /mnt/paysdeloire2013_convertida2.1.dump
pg_restore: [custom archiver] out of memory

It looks to me like this error is pg_restore itself running out of memory,
not reporting a server-side OOM condition. You could verify that by
looking in the server log to see whether any out-of-memory error appeared
there. But assuming that I'm right, the other responses suggesting
tweaking server configurations are not on-point at all.

The logs of the last attempt are already gone.
Obviously, I'll keep tonight's logs.

Unless you're running pg_restore under a really small ulimit, this would
seem to suggest some kind of memory leak in pg_restore itself. I wonder
how many objects in your dump (how long is "pg_restore -l" output)?

pg_restore -l | wc gives me:
1055 7984 70675

It looks small to me.
We don't have a lot of tables, instead we have really huge tables.
We try to keep the schema normalized where possible; we only denormalize
when the resulting tuples would become too small for PostgreSQL, i.e.
when the row overhead becomes prohibitive.

- PostgreSQL 9.1.8 custom compiled to get 32kB blocks

9.1.8 is pretty old ...

Oh, I'll forward your email to those who are older than I, the many far
wiser than I, but whose love of database servers is certainly no
stronger than mine, by far.

Thanks,

--
Miguel Ramos


#8Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: Miguel Ramos (#7)
Re: pg_restore out of memory

Às 16:23 de 12-07-2016, Miguel Ramos escreveu:

It looks to me like this error is pg_restore itself running out of
memory, not reporting a server-side OOM condition. You could verify
that by looking in the server log to see whether any out-of-memory
error appeared there. But assuming that I'm right, the other responses
suggesting tweaking server configurations are not on-point at all.

[...]

- PostgreSQL 9.1.8 custom compiled to get 32kB blocks

9.1.8 is pretty old ...

Of course, I could try a recent pg_restore.
That'll take me a bit of time too, but I can try.

--
Miguel Ramos


#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Miguel Ramos (#7)
Re: pg_restore out of memory

Miguel Ramos <org.postgresql@miguel.ramos.name> writes:

Às 15:40 de 12-07-2016, Tom Lane escreveu:

Unless you're running pg_restore under a really small ulimit, this would
seem to suggest some kind of memory leak in pg_restore itself. I wonder
how many objects in your dump (how long is "pg_restore -l" output)?

pg_restore -l | wc gives me:
1055 7984 70675

It looks small to me.

Yeah, the archive TOC is clearly not large enough to cause any problem in
itself. I'm wondering at this point about libpq buffer bloat. We've
fixed a number of problems in that area over the last few years, though
in a quick review of the commit logs I don't see anything that clearly
bears on your problem. (Commit 86888054a92aeca4 is pretty interesting
but applies to mostly-server-to-client data transfer, which is the wrong
direction.) I wonder whether your data is such that loading it would
trigger a bunch of NOTICEs from the server?

Anyway, it would be useful to try running the restore with a more modern
version of pg_restore, to see if that helps.

regards, tom lane
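Trying a newer pg_restore, as suggested, does not require touching the 9.1 server itself: a newer client binary can connect to the old server over libpq. A sketch, where the 9.5-era install path, host and role are assumptions:

```shell
# Newer client binary pointed at the existing 9.1 server:
/usr/local/pgsql95/bin/pg_restore --version
/usr/local/pgsql95/bin/pg_restore -h localhost -p 5432 -U postgres \
    -d recovery /mnt/paysdeloire2013_convertida2.1.dump
```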


#10Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: Tom Lane (#9)
Re: pg_restore out of memory

A Ter, 12-07-2016 às 11:58 -0400, Tom Lane escreveu:

Anyway, it would be useful to try running the restore with a more
modern version of pg_restore, to see if that helps.

regards, tom lane

I have the scheduled restart tonight.
So, I will do the other test first, nevertheless.

This is because I have the impression that it fails during index
creation, where I think the client's role would be minimal.
Maybe I saw something in the logs when this problem was first reported
internally.

I can't spare a second 700G for doing both tests concurrently.

I will get back when I have more news.

Thanks,

--
Miguel


#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Miguel Ramos (#10)
Re: pg_restore out of memory

Miguel Ramos <org.postgresql@miguel.ramos.name> writes:

This because I have the impression that it is during index creation,
where I think client role would be minimal.

Hard to believe really, given the spelling of the message. But anyway,
be sure you do the run with log_statement = all so that it's clear what
is being worked on when the error happens.

regards, tom lane
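On 9.1 the log_statement = all run can be prepared with a config edit and a reload; no restart is needed, since log_statement is reloadable. A sketch; the data directory path is an assumption:

```shell
# Append the setting and have the postmaster re-read its configuration.
echo "log_statement = 'all'" >> /usr/local/pgsql/data/postgresql.conf
pg_ctl -D /usr/local/pgsql/data reload
```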


#12Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Miguel Ramos (#1)
Re: pg_restore out of memory

On Tue, Jul 12, 2016 at 12:25:08PM +0100, Miguel Ramos wrote:

# pg_restore -d recovery /mnt/paysdeloire2013_convertida2.1.dump
pg_restore: [custom archiver] out of memory
12:09:56.58 9446.593u+1218.508s 24.3% 167+2589k 6+0io 0pf+0sw 6968822cs

...

I suspect that the restore fails when constructing the indices. After the
process is aborted, the data appears to be all or most there, but no
indices.

...

I don't know what else to try.

You could try restoring w/o indices and re-adding them later.

Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
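Karsten's approach (restore everything except the indices, then add them back later) can be driven from the dump's table of contents with pg_restore's -l/-L options. A sketch using a fabricated TOC; the real entries come from running pg_restore -l on the actual dump, and the entry numbers here are made up:

```shell
# Real usage: pg_restore -l /mnt/paysdeloire2013_convertida2.1.dump > restore.list
# Fabricated TOC lines for illustration:
cat > restore.list <<'EOF'
6; 2615 16386 SCHEMA - public postgres
2606; 0 16390 TABLE DATA public positioned_scan miguel
3001; 1259 16400 INDEX public positioned_scan_pkey miguel
EOF

# Keep everything except INDEX entries.
grep -v ' INDEX ' restore.list > restore-noindex.list
cat restore-noindex.list

# Real usage: pg_restore -L restore-noindex.list -d recovery \
#     /mnt/paysdeloire2013_convertida2.1.dump
# The skipped entries can be restored later with the inverse filter
# (grep ' INDEX ') and a second pg_restore -L pass.
```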


#13Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: Tom Lane (#11)
Re: pg_restore out of memory

Finally, here are the log messages at the moment of the error.
It is clearly not while building indices.

The table in question is a big one, 111GB.
Fields latitude, longitude and height are arrays of length around 500-
700 on each row (double and real).

So, what does this mean?
Was it the client that aborted? I think I saw that "unexpected message
type 0x58" on other types of interruptions.

Jul 13 20:10:10 ema postgres[97889]: [867-1] LOG: statement: COPY positioned_scan (id_dataset, id_acquired_set, sequence_number, id_scan_dataset, latitude, longitude, height, srid, srid_vertical) FROM stdin;
Jul 13 20:10:10 ema postgres[97889]: [867-2]
Jul 13 21:08:06 ema postgres[97889]: [868-1] ERROR: unexpected message type 0x58 during COPY from stdin
Jul 13 21:08:06 ema postgres[97889]: [868-2] CONTEXT: COPY positioned_scan, line 2779323
Jul 13 21:08:06 ema postgres[97889]: [868-3] STATEMENT: COPY positioned_scan (id_dataset, id_acquired_set, sequence_number, id_scan_dataset, latitude, longitude, height, srid, srid_vertical) FROM stdin;
Jul 13 21:08:06 ema postgres[97889]: [868-4]
Jul 13 21:08:06 ema postgres[97889]: [869-1] LOG: could not send data to client: Broken pipe
Jul 13 21:08:06 ema postgres[97889]: [869-2] STATEMENT: COPY positioned_scan (id_dataset, id_acquired_set, sequence_number, id_scan_dataset, latitude, longitude, height, srid, srid_vertical) FROM stdin;
Jul 13 21:08:06 ema postgres[97889]: [869-3]
Jul 13 21:08:06 ema postgres[97889]: [870-1] LOG: unexpected EOF on client connection

Thanks,

-- Miguel

A Ter, 12-07-2016 às 15:10 -0400, Tom Lane escreveu:

Miguel Ramos <org.postgresql@miguel.ramos.name> writes:

This because I have the impression that it is during index creation,
where I think client role would be minimal.

Hard to believe really, given the spelling of the message. But anyway,
be sure you do the run with log_statement = all so that it's clear what
is being worked on when the error happens.

regards, tom lane


#14John R Pierce
pierce@hogranch.com
In reply to: Miguel Ramos (#13)
Re: pg_restore out of memory

On 7/13/2016 1:51 PM, Miguel Ramos wrote:

[quotes #13]

Are pg_restore and the postgres server the same version?

$ pg_restore --version
pg_restore (PostgreSQL) 9.3.13

$ su - postgres
-bash-4.1$ psql -c "select version()"
version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.13 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)

--
john r pierce, recycling bits in santa cruz


#15Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: Sameer Kumar (#5)
Re: pg_restore out of memory

A Ter, 12-07-2016 às 13:08 +0000, Sameer Kumar escreveu:

On Tue, 12 Jul 2016, 7:25 p.m. Miguel Ramos,
<org.postgresql@miguel.ramos.name> wrote:

I found two relevant threads on the mailing-lists. The most recent one
suggested that PostgreSQL was configured to use more memory than was
available. The older one suggested that the system limits on the size
of the data or stack segments were lower than required.

So here are some server parameters (relevant or otherwise):

max_connections = 100
shared_buffers = 4GB -- 25% of RAM
temp_buffers = 32MB -- irrelevant?
work_mem = 64MB
maintenance_work_mem = was 1G lowered to 256M then 64M

Why did you lower it? I think increasing it should help more. But 1GB
seems fine.

The advice was in that thread, and maybe the problem there was very
different. The idea I get is that PostgreSQL can always find a way to
do its work, perhaps using an out-of-core algorithm.

If you tell it to use a lot of memory, then it will try to use RAM, and
then it really may run out of memory.

So, basically, increasing the memory available is a performance
improvement, if you feel safe that the memory really is available.

But maybe that logic applies only to work_mem...
And it's also work_mem that is difficult to bound, according to the
manual.
I don't really know...

wal_buffers = -1 -- auto-tuned: 1/32 of shared_buffers, capped at 16MB

Increase this during the restore, maybe to 512MB.

I'll keep that advice in mind, but I have now posted the log messages
to the list.

checkpoint_segments = 64 -- WAL segments are 16MB
effective_cache_size = 8GB -- irrelevant?

I suspect that the restore fails when constructing the indices. After
the process is aborted, the data appears to be all or mostly there, but
no indices.

What is logged in database log files? Have you checked that?

This time I collected the logs.
I posted the 10 relevant lines as a reply to another message.
I'll repeat only the ERROR line here:

Jul 13 21:08:06 ema postgres[97889]: [868-1] ERROR: unexpected message type 0x58 during COPY from stdin

What are your vm.dirty_ratio and vm.dirty_background_ratio? I think
reducing them may help. But I can't really say what exactly would help
unless you are able to find the error source in the DB logs.

This is a FreeBSD server.
I'm not really sure what the equivalent would be.
Also, I don't think tuning the VM would help.

This is quite a deterministic abort, 12-13 hours after the beginning of
the restore, and does not change much whether it is done during the
night or during the day with 10 people working intensively.

Thanks,

-- Miguel


#16Miguel Ramos
mail@miguel.ramos.name
In reply to: John R Pierce (#14)
Re: pg_restore out of memory

Yes.
Both 9.1.8, I checked right now.

-- Miguel

A Qua, 13-07-2016 às 13:59 -0700, John R Pierce escreveu:

[quotes #14 in full]

#17Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: John R Pierce (#14)
Re: pg_restore out of memory

[duplicate of #16]

#18Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Miguel Ramos (#13)
Re: pg_restore out of memory

On 07/13/2016 01:51 PM, Miguel Ramos wrote:

Finally, here are the log messages at the moment of the error.
It is clearly not while building indices.

The table in question is a big one, 111GB.
Fields latitude, longitude and height are arrays of length around 500-
700 on each row (double and real).

So, what does this mean?
Was it the client that aborted? I think I saw that "unexpected message
type 0x58" on other types of interruptions.

So where are you running the pg_restore from, manually from command line
or from within a script?

[quotes the remainder of #13: the log lines and Tom's reply]

--
Adrian Klaver
adrian.klaver@aklaver.com


#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Miguel Ramos (#13)
Re: pg_restore out of memory

Miguel Ramos <org.postgresql@miguel.ramos.name> writes:

So, what does this mean?
Was it the client that aborted? I think I saw that "unexpected message
type 0x58" on other types of interruptions.

Yeah, 0x58 is ASCII 'X' which is a Terminate message. Between that and
the unexpected-EOF report, it's quite clear that the client side went
belly-up, not the server. We still don't know exactly why, but given
that pg_restore reports "out of memory" before quitting, there must be
some kind of memory leak going on inside pg_restore.

Jul 13 20:10:10 ema postgres[97889]: [867-1] LOG: statement: COPY positioned_scan (id_dataset, id_acquired_set, sequence_number, id_scan_dataset, latitude, longitude, height, srid, srid_vertical) FROM stdin;

I'm guessing from the column names that you've got some PostGIS data
types in this table. I wonder if that's a contributing factor.

I'm still suspicious that this might be some sort of NOTICE-processing-
related buffer bloat. Could you try loading the data with the server's
log_min_messages level cranked down to NOTICE, so you can see from the
postmaster log whether any NOTICEs are being issued to the pg_restore
session?

regards, tom lane
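The 0x58 decoding above can be checked from a shell. A small sketch; the COPY-phase type bytes in the comment are from the frontend/backend protocol (partial list):

```shell
# 0x58 is octal 130, ASCII 'X': the frontend's Terminate message type.
# During COPY the client normally sends 'd' (CopyData) followed by
# 'c' (CopyDone) or 'f' (CopyFail); an 'X' mid-COPY means the client
# gave up and closed the session.
byte=$(printf '\130')
echo "message type: $byte"
```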


#20John R Pierce
pierce@hogranch.com
In reply to: Miguel Ramos (#17)
Re: pg_restore out of memory

On 7/13/2016 2:11 PM, Miguel Ramos wrote:

Yes.
Both 9.1.8, I checked right now.

9.1 is up to 9.1.22; that's a lot of bug fixes you're missing. 9.1.8
was released 2013-02-07, 9.1.22 on 2016-05-12.

--
john r pierce, recycling bits in santa cruz


#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#19)
#22Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: John R Pierce (#20)
#23Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: Tom Lane (#19)
#24Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: Tom Lane (#21)
#25Miguel Ramos
mail@miguel.ramos.name
In reply to: Miguel Ramos (#24)
#26Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: Miguel Ramos (#24)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Miguel Ramos (#25)
#28Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Miguel Ramos (#26)
#29Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: Adrian Klaver (#28)
#30Miguel Ramos
mail@miguel.ramos.name
In reply to: Miguel Ramos (#29)
#31Miguel Ramos
org.postgresql@miguel.ramos.name
In reply to: Miguel Ramos (#30)