Trouble Upgrading Postgres
I'd be grateful for some help. I am trying to move a large database from
PostgreSQL 9.6 on CentOS 6 to a different server using PostgreSQL 11 on
CentOS 7. I can't do a pg_dump because it always fails on the largest
table. So I tried pg_basebackup and copied that to the new PG 11 server.
Except that pg_upgrade expects the new and old versions of PG to be
side-by-side. So I installed 9.6 on the new server, ran initdb, verified
that it started, then stopped it and edited postgresql.conf data path to
the location of the pg_basebackup files. Then 9.6 would no longer start. So
how can I get my PG 9.6 data into a new PG 11 database?
Probably related to my troubles are my attempts to get replication set up.
But before I dive back into that, I thought I'd better try getting my 9.6
data into the new 9.6 server, then run PG 11's pg_upgrade and mount the
data in PG 11. Then maybe I can get replication started.
I've read that logical replication can be used to migrate from 9.6 to 11,
but haven't found any documentation on doing that.
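For what it's worth, one documented route for a 9.6-to-11 migration via logical replication is the pglogical extension (native logical replication publications only exist from PG 10 onward, so plain 9.6 can't publish on its own). A minimal sketch, not tested against this database, assuming pglogical is installed on both servers, wal_level = logical and shared_preload_libraries = 'pglogical' are set, and the host names are placeholders:

```sql
-- On the old 9.6 server (provider):
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(
    node_name := 'provider',
    dsn := 'host=old-server dbname=martinandjones');
-- Add all tables in schema public to the default replication set.
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

-- On the new PG 11 server (subscriber), after creating the schema there:
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(
    node_name := 'subscriber',
    dsn := 'host=new-server dbname=martinandjones');
SELECT pglogical.create_subscription(
    subscription_name := 'upgrade_sub',
    provider_dsn := 'host=old-server dbname=martinandjones');
```

Note that tables need a primary key (or replica identity) for logical replication to pick them up.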
Chuck Martin
On 11/03/2018 02:57 PM, Charles Martin wrote:
I'd be grateful for some help. I am trying to move a large database from
PostgreSQL 9.6 on Centos 6 to a different server using PostgreSQL 11 on
Centos 7. I can't do a pg_dump because it always fails on the largest table.
What error message?
--
Angular momentum makes the world go 'round.
On 11/3/18 12:57 PM, Charles Martin wrote:
I'd be grateful for some help. I am trying to move a large database from
PostgreSQL 9.6 on Centos 6 to a different server using PostgreSQL 11 on
Centos 7. I can't do a pg_dump because it always fails on the largest
table.
I would answer Ron's question on this first as solving it would be the
easiest fix.
So tried to do pg_basebackup and copy that to the new PG 11
server. Except that pg_upgrade expects the new and old versions of PG to
be side-by-side. So I installed 9.6 on the new server, ran initdb,
This is probably the issue: you now have two 9.6 data directory
instances, the one you created with initdb and the one that came over
with pg_basebackup. I am guessing the editing below has left the server
in a confused state about which directory to use. The error messages you
got when trying to restart the server would be helpful.
verified that it started, then stopped it and edited postgresql.conf
data path to the location of the pg_basebackup files. Then 9.6 would no
longer start. So how can I get my PG 9.6 data into a new PG 11 database?

Probably related to my troubles are my attempts to get replication set
up. But before I dive back into that, I thought I'd better try getting
my 9.6 data into the new 9.6 server, then run PG 11's pg_upgrade and
mount the data in PG 11. Then maybe I can get replication started.

I've read that logical replication can be used to migrate from 9.6 to
11, but haven't found any documentation on doing that.

Chuck Martin
--
Adrian Klaver
adrian.klaver@aklaver.com
When I did a pg_dump using PG 9.6, I got this:
pg_dump: Dumping the contents of table "docfile" failed: PQgetCopyData()
failed.
pg_dump: Error message from server: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: COPY public.docfile (docfile_pkey,
docfileoriginalname, ordernumber, versionnum, docfilecontents, docfilepath,
docfileextension, enddatetime, endby, editnum, insby, insdatetime, modby,
moddatetime, active, doc_fkey) TO stdout;
I've looked and been unable to find where Centos 7, or Postgres 9.6, stores
the path to the config/data directory outside the data/postgresql.conf
file. But I agree there must be something somewhere.
Chuck
On Sat, Nov 3, 2018 at 6:06 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
On 11/3/18 2:56 PM, Charles Martin wrote:
Please reply to list also.
Ccing list.

Yes, if I could get pg_dump to work, I think I'd be fine. Maybe. But it
doesn't.

Post the error you got to the list and we may be able to help.
Also, which version of Postgres were you using to take the dump?

I agree that I've confused Postgres, but I don't know how to resolve the
confusion. It is complicated by the fact that my original CentOS 7
install included Postgres 9.2, so those files are hanging around, along
with 9.6 and 11.

I posted the error messages I got when postgresql.conf had the data
directory set to my basebackup data:

postgresql-9.6.service: main process exited, code=exited, status=1/FAILURE

Not very helpful.
systemctl status postgresql-9.6 provided a bit more info:
● postgresql-9.6.service - PostgreSQL 9.6 database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql-9.6.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sat 2018-11-03 15:05:30 EDT; 15s ago
     Docs: https://www.postgresql.org/docs/9.6/static/
  Process: 32570 ExecStart=/usr/pgsql-9.6/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)
  Process: 32563 ExecStartPre=/usr/pgsql-9.6/bin/postgresql96-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
 Main PID: 32570 (code=exited, status=1/FAILURE)
Yet this went away, and PG 9.6 started, when I changed postgresql.conf
to point to the new (empty) data directory, which is confusing.

No, not confusing. I am not as familiar with RPM packaging as I am with
Debian/Ubuntu packaging. Still, if I remember correctly it also allows
multiple instances of Postgres to run, and to do that it has its own
system of tracking the data directories. Where you created the new data
directory is obviously where the package scripts expect to find it. The
pg_basebackup directory is not.

Chuck
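Concretely, the PGDG RPM packaging records the data directory in the systemd unit (Environment=PGDATA=...), not in postgresql.conf, so to point the 9.6 service at the basebackup directory you would override PGDATA rather than edit the config file. A sketch, with the basebackup path as a placeholder assumption:

```ini
; Created via "systemctl edit postgresql-9.6"; lands in
; /etc/systemd/system/postgresql-9.6.service.d/override.conf
[Service]
Environment=PGDATA=/var/lib/pgsql/9.6/basebackup-data
```

followed by systemctl daemon-reload and systemctl start postgresql-9.6. The stock unit sets PGDATA=/var/lib/pgsql/9.6/data, which is why the freshly-initdb'd directory started and the basebackup one did not.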
On 11/3/18 3:47 PM, Charles Martin wrote:
When I do a pg_dump using PG 9.6, I got this:
pg_dump: Dumping the contents of table "docfile" failed: PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection
unexpectedly
Is this the error the client is reporting?
Is this the same as what is showing up in the server log?
This probably means the server terminated abnormally
So where is the server located relative to the pg_dump client?
On the same machine?
If so, is it a virtual machine, e.g. AWS?
Across a local or remote network?
before or while processing the request.
pg_dump: The command was: COPY public.docfile (docfile_pkey,
docfileoriginalname, ordernumber, versionnum, docfilecontents,
docfilepath, docfileextension, enddatetime, endby, editnum, insby,
insdatetime, modby, moddatetime, active, doc_fkey) TO stdout;

I've looked and been unable to find where CentOS 7, or Postgres 9.6,
stores the path to the config/data directory outside the
data/postgresql.conf file. But I agree there must be something somewhere.

Chuck
--
Adrian Klaver
adrian.klaver@aklaver.com
Charles Martin wrote:
When I do a pg_dump using PG 9.6, I got this:
pg_dump: Dumping the contents of table "docfile" failed: PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: COPY public.docfile (docfile_pkey, docfileoriginalname,
ordernumber, versionnum, docfilecontents, docfilepath, docfileextension, enddatetime,
endby, editnum, insby, insdatetime, modby, moddatetime, active, doc_fkey) TO stdout;
You probably have a corrupted database.
You should get that fixed first, then you can upgrade.
Maybe you should hire a professional for that.
The other alternative is that you have a buggy extension installed
that causes PostgreSQL to crash.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
On 03.11.2018 at 23:47, Charles Martin wrote:
When I do a pg_dump using PG 9.6, I got this:
pg_dump: Dumping the contents of table "docfile" failed: PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: COPY public.docfile (docfile_pkey,
docfileoriginalname, ordernumber, versionnum, docfilecontents,
docfilepath, docfileextension, enddatetime, endby, editnum, insby,
insdatetime, modby, moddatetime, active, doc_fkey) TO stdout;
which exact minor version please?
Regards, Andreas
--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com
Adrian said:
pg_dump: Error message from server: server closed the connection
unexpectedly
Is this the error the client is reporting?
Is this the same as what is showing up in the server log?
Yes, that's the client message, i.e. what appeared in the terminal window
that gave the command. The server log shows:
2018-11-02 20:30:46 EDT [20405]: [4-1] user=,db= LOG: server process (PID
30438) was terminated by signal 9: Killed
2018-11-02 20:30:46 EDT [20405]: [5-1] user=,db= DETAIL: Failed process
was running: COPY public.docfile (docfile_pkey, docfileoriginalname,
ordernumber, versionnum, docfilecontents, docfilepath, d$
2018-11-02 20:30:46 EDT [20405]: [6-1] user=,db= LOG: terminating any
other active server processes
2018-11-02 20:30:46 EDT [20415]: [10-1] user=,db= WARNING: terminating
connection because of crash of another server process
2018-11-02 20:30:46 EDT [20415]: [11-1] user=,db= DETAIL: The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnor$
So where is the server located relative to the pg_dump client?
On the same machine?
If so is it a virtual machine e.g AWS?
Across a local or remote network?
I gave the command in a terminal session after SSHing to the server from
the same network. It is not a virtual machine.
Laurenz said:
You probably have a corrupted database.
You should get that fixed first, then you can upgrade.
Maybe you should hire a professional for that.
I suspect this is correct, both that there is corruption in the table
and that I need a professional to help. If someone here is available, I'm
interested.
Andreas said:
which exact minor version please?
PostgreSQL 9.6.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-23), 64-bit
On 04.11.2018 at 17:38, Charles Martin wrote:
Andreas said:
which exact minor version please?
PostgreSQL 9.6.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-23), 64-bit
Okay, I asked just to rule out a stale version. 9.6.7, for instance,
contains some important bug fixes.
Regards, Andreas
--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com
On 11/4/18 8:38 AM, Charles Martin wrote:
I suspect this is correct, both that there is corruption in the table
and that I need a professional to help. If someone here is available,
I'm interested.
Given that this involves your largest table I would confirm that the
signal 9 kill was not coming from the system OOM killer. Take a look at
the system logs to see what they show over the same time period.
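A quick way to check, assuming kernel messages land in /var/log/messages as they do on stock CentOS (the optional path argument is just for testing against another file):

```shell
# Scan a syslog-style log for OOM-killer activity around the crash time.
logfile="${1:-/var/log/messages}"
if grep -iE 'out of memory|killed process' "$logfile" 2>/dev/null; then
    echo "OOM killer fired; see the lines above."
else
    echo "No OOM-killer entries found in $logfile."
fi
```

`dmesg | grep -i 'out of memory'` is an alternative if the syslog has already rotated.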
--
Adrian Klaver
adrian.klaver@aklaver.com
Yep, you called it:
Nov 2 20:30:45 localhost kernel: Out of memory: Kill process 30438
(postmaster) score 709 or sacrifice child
Nov 2 20:30:45 localhost kernel: Killed process 30438, UID 26,
(postmaster) total-vm:3068900kB, anon-rss:1695392kB, file-rss:1074692kB
So it's running out of memory when trying to dump this table. The "old"
server has 4GB of ram, the "new" server 20GB.
Not enough swap space?
--
Angular momentum makes the world go 'round.
Charles Martin <ssappeals@gmail.com> writes:
So it's running out of memory when trying to dump this table. The "old"
server has 4GB of ram, the "new" server 20GB.
That's kind of odd: a COPY shouldn't really consume very much working
memory. I suspect that much of the process's apparent VM consumption may
be shared buffers ... what have you got shared_buffers set to on the old
server? If it's more than half a GB or so, maybe reducing it would help.
regards, tom lane
On 11/4/18 2:55 PM, Charles Martin wrote:
Yep, you called it:
Nov 2 20:30:45 localhost kernel: Out of memory: Kill process 30438
(postmaster) score 709 or sacrifice child
Nov 2 20:30:45 localhost kernel: Killed process 30438, UID 26,
(postmaster) total-vm:3068900kB, anon-rss:1695392kB, file-rss:1074692kB

So it's running out of memory when trying to dump this table. The "old"
server has 4GB of ram, the "new" server 20GB.
In addition to the other suggestions, what is the exact pg_dump command
you are using?
--
Adrian Klaver
adrian.klaver@aklaver.com
Tom said:
That's kind of odd: a COPY shouldn't really consume very much working
memory. I suspect that much of the process's apparent VM consumption may
be shared buffers ... what have you got shared_buffers set to on the old
server? If it's more than half a GB or so, maybe reducing it would help.
This is not a VM, but hardware.
Adrian said:
In addition to the other suggestions, what is the exact pg_dump command
you are using?
The last time:
[postgres@mandj tmp]$ pg_dump martinandjones >
/mnt/4tbB/pgbackup/2018-11-02/mandj.bak
On 11/5/18 5:56 AM, Charles Martin wrote:
Tom said:
That's kind of odd: a COPY shouldn't really consume very much working
memory. I suspect that much of the process's apparent VM consumption may
be shared buffers ... what have you got shared_buffers set to on the old
server? If it's more than half a GB or so, maybe reducing it would help.

This is not a VM, but hardware.
Tom was referring to this from your previous post:
(postmaster) total-vm:3068900kB,
where vm(VM) is Virtual Memory:
https://stackoverflow.com/questions/18845857/what-does-anon-rss-and-total-vm-mean#22326766
So what is your shared_buffers:
https://www.postgresql.org/docs/10/static/runtime-config-resource.html
set to?
--
Adrian Klaver
adrian.klaver@aklaver.com
Charles Martin wrote:
So where is the server located relative to the pg_dump client?
On the same machine?
If so is it a virtual machine e.g AWS?
Across a local or remote network?

I gave the command in a terminal session after SSHing to the server from
the same network. It is not a virtual machine.
That means that pg_dump runs on the same machine as the server.
It's plausible that, with only 4GB of RAM, the table that fails
to dump has some very large rows that can't be allocated, especially
since both the backend and pg_dump need to have it simultaneously
in memory.
pg_dump: The command was: COPY public.docfile (docfile_pkey,
docfileoriginalname, ordernumber, versionnum, docfilecontents,
docfilepath, docfileextension, enddatetime, endby, editnum, insby,
insdatetime, modby, moddatetime, active, doc_fkey) TO stdout;
The "docfilecontents" column suggests that it might contain
large contents. If its type is bytea, it's going to be expanded
to twice its size to build the hex representation.
You may get a sense of how big the biggest row is, expressed
as text, with this query:

SELECT max(length(docfile::text)) FROM public.docfile;
If it's big enough that it might cause the OOM issue,
try to run pg_dump remotely through an SSH tunnel [1], which you
can already do in terms of network permissions since you log in with
SSH, so pg_dump itself does not use any memory on the server.
Also, if the machine doesn't have swap space, it might be
that just adding a few GB's of swap would make the operation
succeed.
[1]: https://www.postgresql.org/docs/current/static/ssh-tunnels.html
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
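Daniel's tunnel suggestion might look like the sketch below; the host name, user, and output path are illustrative assumptions, and by default the commands are only printed (dry run) so they can be reviewed before executing:

```shell
# Run pg_dump on a machine with more RAM, tunneled to the old server,
# so pg_dump's own memory use does not land on the 4GB box.
# DRY_RUN=1 just prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# Forward local port 5433 to PostgreSQL (5432) on the old server.
run ssh -f -N -L 5433:localhost:5432 postgres@old-server
# Dump through the tunnel; only the backend's memory is used on the old server.
run pg_dump -h localhost -p 5433 -f /mnt/4tbB/pgbackup/mandj.bak martinandjones
```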
Adrian said:
Tom was referring to this from your previous post:
(postmaster) total-vm:3068900kB,
where vm(VM) is Virtual Memory:
https://stackoverflow.com/questions/18845857/what-does-anon-rss-and-total-vm-mean#22326766
So what is your shared_buffers:
https://www.postgresql.org/docs/10/static/runtime-config-resource.html
set to?
Ok, thanks for explaining this. Here is the current value:
"shared_buffers" "131072" "8kB"
On 11/5/18 7:04 AM, Charles Martin wrote:
Adrian said:
Tom was referring to this from your previous post:
(postmaster) total-vm:3068900kB,
where vm(VM) is Virtual Memory:
https://stackoverflow.com/questions/18845857/what-does-anon-rss-and-total-vm-mean#22326766
So what is your shared_buffers:
https://www.postgresql.org/docs/10/static/runtime-config-resource.html
set to?

Ok, thanks for explaining this. Here is the current value:

"shared_buffers" "131072" "8kB"
It should be a single value something like this for the default:
shared_buffers = 128MB
--
Adrian Klaver
adrian.klaver@aklaver.com
Charles Martin <ssappeals@gmail.com> writes:
Ok, thanks for explaining this. Here is the current value:
"shared_buffers" "131072" "8kB"
Well, that's 1GB, which might be ambitious inside a VM with a hard
restriction to 4GB total RAM. Postgres can get by with a *lot* less.
Try knocking it down to a tenth of that and see if it makes a difference.
regards, tom lane
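Tom's suggestion could be applied either by editing shared_buffers in postgresql.conf or, on 9.6, with ALTER SYSTEM; the 128MB value here is a placeholder for "roughly a tenth of 1GB", and a server restart is required for shared_buffers to take effect:

```sql
-- As a superuser; this writes postgresql.auto.conf.
ALTER SYSTEM SET shared_buffers = '128MB';
-- After restarting the server (e.g. systemctl restart postgresql-9.6), verify:
SHOW shared_buffers;
```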