pg_upgrade --jobs

Started by senor · about 7 years ago · 15 messages · general
#1 senor
frio_cervesa@hotmail.com

The pg_upgrade --jobs option is not passed as an argument when it calls pg_dump. I haven't found anything in the docs or forums mentioning a reason it isn't supported under certain circumstances, other than possibly for pre-9.2. The pg_upgrade docs page states that it allows multiple CPUs to be used for dump and reload of schemas. Some databases I'm upgrading have 500,000+ tables, and running with a single process greatly increases the upgrade time.

I am also using the --link option.
I have tried "--jobs 20", "--jobs=20", placing this option first and last and many other variations.
I am upgrading 9.2.4 to 9.6.12 on CentOS 6.
Varying hardware but all with 32+ CPU cores.

su - postgres -c "/usr/pgsql-9.6/bin/pg_upgrade --jobs=20 --link \
--old-bindir=/usr/pgsql-9.2/bin/ \
--new-bindir=/usr/pgsql-9.6/bin/ \
--old-datadir=/var/lib/pgsql/9.2/data/ \
--new-datadir=/var/lib/pgsql/9.6/data/"

I feel like there's a simple reason I've missed but this seems pretty straightforward.
A secondary plan would be to find instructions for doing the same as "pg_upgrade --link" manually so I can run "pg_dump --jobs 20".
Any assist is appreciated.
Thanks,
S. Cervesa

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: senor (#1)
Re: pg_upgrade --jobs

On 4/6/19 11:44 AM, senor wrote:

[snip]

https://www.postgresql.org/docs/9.6/app-pgdump.html

"--jobs=njobs

Run the dump in parallel by dumping njobs tables simultaneously.
This option reduces the time of the dump but it also increases the load
on the database server. You can only use this option with the directory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
output format because this is the only output format where multiple
^^^^^^^^^^^^^
processes can write their data at the same time."
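For anyone hitting this limitation with data dumps rather than pg_upgrade's schema-only pass, the doc excerpt above translates to something like the following sketch. The database name and paths are hypothetical, and the commands are only printed, not executed:

```shell
# Illustration only: outside pg_upgrade, parallel pg_dump requires the
# directory output format. Database name and paths are hypothetical, and
# the commands are printed rather than executed so the sketch is safe to
# run anywhere.
DB=mydb
DIR=/backup/${DB}.dir
JOBS=20

# Parallel data dump: $JOBS worker processes write into one directory.
echo pg_dump --format=directory --jobs=$JOBS --file=$DIR $DB

# Parallel restore from the same directory:
echo pg_restore --jobs=$JOBS --dbname=$DB $DIR
```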


--
Adrian Klaver
adrian.klaver@aklaver.com

#3 senor
frio_cervesa@hotmail.com
In reply to: Adrian Klaver (#2)
Re: pg_upgrade --jobs

Thank you for responding. I did see that note and should have included that as part of my question.

Since pg_upgrade is in control of how it is calling pg_dump, is there a reason pg_upgrade cannot use the directory output format when calling pg_dump? Is the schema-only operation incompatible?


#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: senor (#3)
Re: pg_upgrade --jobs

senor <frio_cervesa@hotmail.com> writes:

Since pg_upgrade is in control of how it is calling pg_dump, is there a reason pg_upgrade cannot use the directory output format when calling pg_dump? Is the schema-only operation incompatible?

Well, there's no point in it. pg_dump can only parallelize data dumping,
and there's none to be done in the --schema-only case that pg_upgrade
uses.

Also, since pg_upgrade *does* use parallelism across multiple pg_dump
calls (if you've got multiple databases in the cluster), it'd be a bit
problematic to have another layer of parallelism below that, if it did
indeed do anything. You don't want "--jobs=10" to suddenly turn into
100 sessions.

regards, tom lane

#5 senor
frio_cervesa@hotmail.com
In reply to: Tom Lane (#4)
Re: pg_upgrade --jobs

Thanks Tom. I suppose "pg_dump can only parallelize data dumping" answers my original question as "expected behavior" but I would like to understand the reason better.

My knowledge of Postgres and other DBMSs is at casual admin level with the occasional deep dive on specific errors or analysis. I'm not averse to getting into the code. Before my OP I searched for reasons that the schema-only option would prevent pg_dump from being able to run multiple jobs and didn't find anything that I understood to confirm either way.

Is the limitation simply the state of development to date or is there something about dumping the schemas that conflicts with parallelizing? I'm willing to do some studying if provided links to relevant articles.

The --link option to pg_upgrade would be so much more useful if it weren't still bound to serially dumping the schemas of half a million tables. As already mentioned, if there is an alternate process that mimics pg_upgrade but allows for parallelizing, I'm open to that.

Thanks all


#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: senor (#5)
Re: pg_upgrade --jobs

senor <frio_cervesa@hotmail.com> writes:

Is the limitation simply the state of development to date or is there
something about dumping the schemas that conflicts with parallelizing?

At minimum, it'd take a complete redesign of pg_dump's output format,
and I'm not even very sure what such a redesign would look like. All
the schema information goes into a single file that has to be written
serially. Trying to make it be one file per table definition wouldn't
really fix much: somewhere there has to be a "table of contents", plus
where are you going to put the dependency info that shows what ordering
is required for restore?
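The "table of contents" Tom mentions is a real, inspectable object: every pg_dump archive carries a single serial TOC that includes the dependency-ordered entries. A sketch of inspecting it (the archive path is hypothetical; the command is printed, not executed):

```shell
# Illustration only: the archive path is hypothetical and the command is
# printed rather than executed.
ARCHIVE=/backup/mydb.dir
# pg_restore --list prints one TOC entry per dumped object; this single
# serial structure is what a parallel schema dump would have to rethink.
echo pg_restore --list "$ARCHIVE"
```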

The --link option to pg_upgrade would be so much more useful if it
weren't still bound to serially dumping the schemas of half a million
tables.

To be perfectly blunt, if you've got a database with half a million
tables, You're Doing It Wrong.

regards, tom lane

#7 Ron
ronljohnsonjr@gmail.com
In reply to: Tom Lane (#6)
Re: pg_upgrade --jobs

On 4/6/19 6:50 PM, Tom Lane wrote:

senor <frio_cervesa@hotmail.com> writes:

[snip]

The --link option to pg_upgrade would be so much more useful if it
weren't still bound to serially dumping the schemas of half a million
tables.

To be perfectly blunt, if you've got a database with half a million
tables, You're Doing It Wrong.

Heavy (*really heavy*) partitioning?

--
Angular momentum makes the world go 'round.

#8 senor
frio_cervesa@hotmail.com
In reply to: Ron (#7)
Re: pg_upgrade --jobs

Thanks Tom for the explanation. I assumed it was my ignorance of how the schema was handled that was making this look like a problem that had already been solved and I was missing something.

I fully expected the "You're Doing It Wrong" part. That is out of my control but not beyond my influence.

I suspect I know the answer to this but have to ask. Using a simplified example where there are 100K sets of 4 tables, each representing the output of a single job, are there any shortcuts to upgrading that would circumvent exporting the entire schema? I'm sure a different DB design would be better but that's not what I'm working with.

Thanks


#9 Sherrylyn Branchaw
sbranchaw@gmail.com
In reply to: senor (#8)
Re: pg_upgrade --jobs

are there any shortcuts to upgrading that would circumvent exporting the
entire schema?

By "shortcuts," do you mean you want to minimize the time and energy you
put into the upgrade, or that you want to minimize database downtime? If
you mean downtime, I was able to upgrade a customer-facing database with
~350,000 tables from Postgres 9.0 to 9.6 last year with only 86 seconds of
downtime, using Slony, but I had to make many custom modifications to Slony
and test thoroughly beforehand, and it was not for the faint of heart, the
pressed for time, or the inexperienced. There may be better ways (and if
so, I would be curious to learn about them), but Slony was the tool with
which I was most familiar at the time.

This method does, of course, require exporting the entire schema, but
because our only constraint was to minimize customer downtime, and the
database was online while the schema was being exported, we didn't care how
long it took. Your constraints may be different.
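The "schema export while online" step above is an ordinary schema-only dump taken against the live cluster; a sketch, with a hypothetical database name and path (the command is printed, not executed):

```shell
# Illustration only: a schema-only dump can run while the source database
# keeps serving traffic. Name and path are hypothetical; the command is
# printed rather than executed.
DB=mydb
OUT=/backup/${DB}-schema.sql
echo pg_dump --schema-only --file=$OUT $DB
```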

For those reading: we do know that 350,000 tables is Doing It Wrong, and
we're getting rid of them, but we decided being on an EOLed version of
Postgres was worse and should be fixed first.

Sherrylyn

#10 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: senor (#8)
Re: pg_upgrade --jobs

On 4/6/19 5:47 PM, senor wrote:

[snip]

I suspect I know the answer to this but have to ask. Using a simplified example where there are 100K sets of 4 tables, each representing the output of a single job, are there any shortcuts to upgrading that would circumvent exporting the entire schema? I'm sure a different DB design would be better but that's not what I'm working with.

An answer is going to depend on more information:

1) What is the time frame for moving from one version to another?
Both the setup and the actual downtime.

2) There are 500,000+ tables, but what is the amount of data involved?

3) Are all the tables active?

4) How are the tables distributed across databases in the cluster and
schemas in each database?
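Questions 2 and 4 can be answered from the catalogs; a sketch with a hypothetical database name (commands printed, not executed):

```shell
# Illustration only: catalog queries for data volume and table
# distribution. The database name is hypothetical; the commands are
# printed rather than executed.
DB=mydb
# 2) Total on-disk size of the database:
echo "psql -d $DB -c \"SELECT pg_size_pretty(pg_database_size(current_database()));\""
# 4) Table count per schema:
echo "psql -d $DB -c \"SELECT schemaname, count(*) FROM pg_tables GROUP BY 1 ORDER BY 2 DESC;\""
```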


--
Adrian Klaver
adrian.klaver@aklaver.com

#11 senor
frio_cervesa@hotmail.com
In reply to: Adrian Klaver (#10)
Re: pg_upgrade --jobs

Thank you Adrian. I'm not sure if I can provide as much as you'd need for a definite answer but I'll give you what I have.

The original scheduled downtime for one installation was 24 hours. By 21 hours it had not completed the pg_dump schema-only so it was returned to operation.
The amount of data per table is widely varied. Some daily tables are 100-200GB and thousands of reports tables with stats are much smaller. I'm not connected to check now but I'd guess 1GB max. We chose to use the --link option partly because some servers do not have the disk space to copy. The time necessary to copy 1-2TB was also going to be an issue.
The vast majority of activity is on current day inserts and stats reports of that data. All previous days and existing reports are read only.
As is all too common, the DB usage grew with no redesign so it is a single database on a single machine with a single schema.
I get the impression there may be an option of getting the schema dump while in service but possibly not in this scenario. Plan B is to drop a lot of tables and deal with imports later.

I appreciate the help.


#12 senor
frio_cervesa@hotmail.com
In reply to: Sherrylyn Branchaw (#9)
Re: pg_upgrade --jobs

I just noticed I missed Sherrylyn's post.
I did some reading about Slony and believe it would be useful if I had the time to dig in. As pointed out, it's not an out-of-the-box solution. It is included on the TODO list though. For now I can only dream of the 86-second downtime.

Thanks


#13 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: senor (#11)
Re: pg_upgrade --jobs

On 4/7/19 12:05 PM, senor wrote:

Thank you Adrian. I'm not sure if I can provide as much as you'd need for a definite answer but I'll give you what I have.

The original scheduled downtime for one installation was 24 hours. By 21 hours it had not completed the pg_dump schema-only so it was returned to operation.

So this is more than one cluster?

I am assuming the below was repeated at different sites?

The amount of data per table is widely varied. Some daily tables are 100-200GB and thousands of reports tables with stats are much smaller. I'm not connected to check now but I'd guess 1GB max. We chose to use the --link option partly because some servers do not have the disk space to copy. The time necessary to copy 1-2TB was also going to be an issue.
The vast majority of activity is on current day inserts and stats reports of that data. All previous days and existing reports are read only.
As is all too common, the DB usage grew with no redesign so it is a single database on a single machine with a single schema.
I get the impression there may be an option of getting the schema dump while in service but possibly not in this scenario. Plan B is to drop a lot of tables and deal with imports later.

I take the above to mean that a lot of the tables are cruft, correct?

I appreciate the help.

--
Adrian Klaver
adrian.klaver@aklaver.com

#14 Melvin Davidson
melvin6925@gmail.com
In reply to: Adrian Klaver (#13)
Re: pg_upgrade --jobs

The original scheduled downtime for one installation was 24 hours. By 21 hours it had not completed the pg_dump schema-only so it was returned to operation.

To me, your best option is to create a Slony cluster with the version you need to upgrade to. When Slony is in sync, simply make it the master and switch to it. It may take a while for Slony replication to be in sync, but when it is, there will be very little downtime to switch over.

Slony <http://www.slony.info/>


--
*Melvin Davidson*
*Maj. Database & Exploration Specialist*
*Universe Exploration Command – UXC*
Employment by invitation only!

#15 Sherrylyn Branchaw
sbranchaw@gmail.com
In reply to: Melvin Davidson (#14)
Re: pg_upgrade --jobs

It may take a while for Slony replication to be in sync, but when it is, there will be very little downtime to switch over.

I agree in principle, which is why I chose Slony over pg_upgrade for my
company's very similar situation, but my experience was that, out of the
box, Slony was projected to take unacceptably long (months) to sync our
350,000 tables, and downtime was going to be many hours. In order to get
those numbers down, I had to reduce the number of per-table steps Slony was
executing, e.g. by eliminating unnecessary-for-us ones and by rewriting
others to happen in bulk.

Here's something I didn't know Slony did when I didn't have 350,000 tables
to worry about: add a table to replication, run through the existing list
of replicated tables to make sure it's captured any changes that have
happened in the meantime, add one more table to replication, run through
the existing list of tables to make sure no changes have happened, and so
on. The more tables you add, the longer it takes to add the next table.
Here's another thing I didn't know it did: during the switchover, manage 4
triggers per table serially on primary and standby. 4 * 350000 * 2 = 2.8
million triggers. (I knew it managed 4 triggers per table, but not that it
was serial or how long that would take when pg_trigger had almost 3 million
relevant rows.)

I would love to help the OP out in a more hands-on way (I have
upgrade-via-Slony consulting experience), as well as to open source the
custom changes I came up with, but I'm debating whether I have the time to
offer to take on another project right now. I'm also reluctant to summarize
exactly what I did, because messing with pg_catalog directly is very
delicate and likely to go wrong, and I don't recommend it to the
inexperienced at all, or to the experienced if they have alternatives.

Plan B is to drop a lot of tables and deal with imports later.

If it were me, I would take a copy of my database, restore it to a sandbox
environment, set up Slony, and get an estimate for the projected sync time.
Let it run for a few hours to see how dramatically the sync time slows down
as the number of tables grows. I.e. don't count the number of replicated
tables after an hour and assume that's your hourly rate of syncing. If it
looks like you can sync your entire database in a few days or less, then
let it run and test out your failover time. If that's acceptable, you're
good to go.

If sync time looks like it's going to be incredibly long on your schema +
hardware + Postgres version, etc., then failover time probably would be
too. In that case, temporarily dropping the tables you can drop may be
preferable to the complexity of making changes to speed up Slony, if you
don't have a seriously experienced DBA on hand.

Sherrylyn