DDL Damage Assessment
Hi fellow hackers,
I would like to work on a new feature allowing our users to assess the
amount of trouble they will run into when running a DDL script on their
production setups, *before* actually getting their services down.
The main practical example I can offer here is the ALTER TABLE command.
Recent releases have included very nice optimisations to it, so much so
that it's becoming increasingly hard to answer some very basic
questions:
- what kind of locks will be taken? (exclusive, shared)
- on what objects? (foreign keys, indexes, sequences, etc)
- will the table have to be rewritten? the indexes?
Of course the docs answer parts of those, but the table rewriting
rules in particular are complex enough that “accidental DBAs” will
fail to predict whether the target data type is binary coercible to
the current one.
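A concrete illustration of how hard this is to predict (sketched from
memory against a recent release; worth re-checking on yours):

```sql
CREATE TABLE events (id integer, label varchar(50));

-- varchar is binary coercible to text: no table rewrite, only
-- catalog updates (though ACCESS EXCLUSIVE is still taken, briefly).
ALTER TABLE events ALTER COLUMN label TYPE text;

-- integer is NOT binary coercible to bigint: every row is rewritten,
-- holding ACCESS EXCLUSIVE for the whole duration.
ALTER TABLE events ALTER COLUMN id TYPE bigint;
```

Both statements look equally innocent in a migration script, which is
exactly the problem.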
Questions:
1. Do you agree that a systematic way to report what a DDL command (or
script, or transaction) is going to do on your production database
is a feature we should provide to our growing user base?
2. What do you think such a feature should look like?
3. Does it make sense to support the whole set of DDL commands from the
get go (or ever) when most of them are only taking locks in their
own pg_catalog entry anyway?
Provided that we are able to converge towards a common enough answer to
those questions, I propose to hack my way around and send patches to
have it (the common answer) available in the next PostgreSQL release.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr>
wrote:
[snip]
What you are proposing is some kind of "dry-run" with verbose output?
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog: http://fabriziomello.github.io
Linkedin: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Github: http://github.com/fabriziomello
On Thu, Oct 2, 2014 at 1:46 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
[snip]
What you are proposing is some kind of "dry-run" with verbose output?
EXPLAIN ALTER TABLE ?
I think the main issue is when a table rewrite is triggered on a DDL
command on a large table, as this is what frequently leads to
unavailability. The idea of introducing a NOREWRITE keyword to DDL
commands then came up (credit: Peter Geoghegan). When the NOREWRITE
keyword is used and the DDL statement would rewrite the table, the
command errors and exits.
This would allow ORM and framework authors to enable the NOREWRITE
option by default, to be disabled on a per-statement basis by the
developer once they have assessed that the rewrite is safe, or that
they want to proceed with it anyway. The workflow for an app
developer then becomes:
* Write offending data migration (eg: add a column with a NOT NULL
constraint and default value)
* Test it locally, either by running automated test suite or running on staging
* See that it fails because of NOREWRITE option
* Assess the situation. If it's a small table, or the rewrite is
  acceptable, override the option; otherwise rewrite the migration to
  avoid the rewrite.
* Repeat
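To make that concrete, a purely hypothetical sketch of what such an
assertion could look like (NOREWRITE is not an existing keyword, and
the error text is invented):

```sql
-- Hypothetical syntax sketch only; nothing here exists today.
ALTER TABLE users
  ADD COLUMN status text NOT NULL DEFAULT 'active'
  NOREWRITE;
-- ERROR:  statement would rewrite table "users"
-- HINT:   remove the NOREWRITE option to proceed anyway
```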
I like this a lot just because it's simple, limited in scope, and can
be easily integrated into ORMs saving users hours of downtime and
frustration.
Thoughts?
On Thu, Oct 2, 2014 at 9:46 AM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
[snip]
On 10/02/2014 11:30 AM, Dimitri Fontaine wrote:
Questions:
1. Do you agree that a systematic way to report what a DDL command
(or script, or transaction) is going to do on your production
database is a feature we should provide to our growing user base?
+1
I really like the idea and would find it useful/time-saving
2. What do you think such a feature should look like?
Elsewhere on this thread EXPLAIN was suggested. That makes a certain
amount of sense.
Maybe something like EXPLAIN IMPACT [...]
3. Does it make sense to support the whole set of DDL commands from
the get go (or ever) when most of them are only taking locks in
their own pg_catalog entry anyway?
Yes, I think it should cover all commands that can have an
availability impact.
Joe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
On 10/02/2014 09:30 AM, Dimitri Fontaine wrote:
Questions:
1. Do you agree that a systematic way to report what a DDL command (or
script, or transaction) is going to do on your production database
is a feature we should provide to our growing user base?
I would say it is late to the game and a great feature.
2. What do you think such a feature should look like?
I liked the other post that said: EXPLAIN <ALTER TABLE> or whatever.
Heck it could even be useful to have EXPLAIN ANALYZE <ALTER TABLE> in
case people want to run it on staging/test/dev environments to judge impact.
3. Does it make sense to support the whole set of DDL commands from the
get go (or ever) when most of them are only taking locks in their
own pg_catalog entry anyway?
I would think that introducing this incrementally makes sense.
JD
--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
"If we send our children to Caesar for their education, we should
not be surprised when they come back as Romans."
* Dimitri Fontaine (dimitri@2ndQuadrant.fr) wrote:
1. Do you agree that a systematic way to report what a DDL command (or
script, or transaction) is going to do on your production database
is a feature we should provide to our growing user base?
I definitely like the idea of such a 'dry-run' kind of operation to get
an idea of what would happen.
2. What do you think such a feature should look like?
My thinking is that this would be implemented as a new kind of read-only
transaction type.
3. Does it make sense to support the whole set of DDL commands from the
get go (or ever) when most of them are only taking locks in their
own pg_catalog entry anyway?
On the fence about this one.. In general, I'd say "yes", but I've not
looked at every case and I imagine there are DDL commands which really
aren't all that interesting for this case.
Provided that we are able to converge towards a common enough answer to
those questions, I propose to hack my way around and send patches to
have it (the common answer) available in the next PostgreSQL release.
That feels a bit ambitious, given that we've not really nailed down
the feature definition yet, but I do like where you're going. :)
Thanks!
Stephen
* Harold Giménez (harold@heroku.com) wrote:
I think the main issue is when a table rewrite is triggered on a DDL
command on a large table, as this is what frequently leads to
unavailability. [snip]
Not against it, but feels like an independent thing to consider - what
Dimitri is suggesting is broader and encompasses the issue of locks,
which are certainly important to consider also.
In short, seems like having both would be worthwhile.
Thanks,
Stephen
* Joshua D. Drake (jd@commandprompt.com) wrote:
2. What do you think such a feature should look like?
I liked the other post that said: EXPLAIN <ALTER TABLE> or whatever.
Heck it could even be useful to have EXPLAIN ANALYZE <ALTER TABLE>
in case people want to run it on staging/test/dev environments to
judge impact.
The downside of the 'explain' approach is that the script then has to be
modified to put 'explain' in front of everything and then you have to go
through each statement and consider it. Having a 'dry-run' transaction
type which then produces a report at the end feels like it'd be both
easier to assess the overall implications, and less error-prone as you
don't have to prefix every statement with 'explain'. It might even be
possible to have the local "view" of post-alter statements be available
inside of this 'dry-run' option- that is, if you add a column in the
transaction then the column exists to the following commands, so it
doesn't just error out. Having 'explain <whatever>' wouldn't give you
that and so you really wouldn't be able to have whole scripts run by
just pre-pending each command with 'explain'.
3. Does it make sense to support the whole set of DDL commands from the
   get go (or ever) when most of them are only taking locks in their
   own pg_catalog entry anyway?

I would think that introducing this incrementally makes sense.
Agreed.
Thanks,
Stephen
Questions:
1. Do you agree that a systematic way to report what a DDL command (or
script, or transaction) is going to do on your production database
is a feature we should provide to our growing user base?
Yes.
2. What do you think such a feature should look like?
As with others, I think EXPLAIN is a good way to do this without adding
a keyword. So you'd do:
EXPLAIN
ALTER TABLE ....
... and it would produce a bunch of actions, available in either text or
JSON formats. For example:
{ locks: [ { lock_type: relation,
             relation: table1,
             lock_mode: ACCESS EXCLUSIVE },
           { lock_type: transaction },
           { lock_type: catalog,
             catalogs: [pg_class, pg_attribute, pg_statistic],
             lock_mode: EXCLUSIVE } ]
}
{ writes: [
    { object: relation files,
      action: rewrite },
    { object: catalogs,
      action: update }
  ]
}
... etc. Would need a lot of refinement, but you get the idea.
3. Does it make sense to support the whole set of DDL commands from the
get go (or ever) when most of them are only taking locks in their
own pg_catalog entry anyway?
Well, eventually we'd want to support all of them just to avoid having
things be weird for users. However, here's a priority order:
ALTER TABLE
CREATE TABLE
DROP TABLE
ALTER VIEW
CREATE VIEW
CREATE INDEX
DROP INDEX
... since all of the above can have unexpected secondary effects on
locking. For example, if you create a table with FKs it will take an
ACCESS EXCLUSIVE lock on the FK targets. And if you DROP a partition,
it takes an A.E. lock on the parent table.
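Those secondary effects can at least be observed today by running the
DDL in a throwaway transaction and inspecting pg_locks before rolling
back (a sketch; the table names are made up):

```sql
BEGIN;
ALTER TABLE child ADD CONSTRAINT child_parent_fk
  FOREIGN KEY (parent_id) REFERENCES parent (id);

-- Show every relation lock this backend now holds; note that locks
-- appear on "parent" as well as on "child".
SELECT locktype, relation::regclass AS rel, mode
FROM pg_locks
WHERE pid = pg_backend_pid()
  AND relation IS NOT NULL;

ROLLBACK;  -- nothing is kept, but the locks were real while held
```

Of course that's exactly the kind of manual archaeology the proposed
feature would replace.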
Provided that we are able to converge towards a common enough answer to
those questions, I propose to hack my way around and send patches to
have it (the common answer) available in the next PostgreSQL release.
Great!
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 10/02/2014 06:30 PM, Dimitri Fontaine wrote:
Hi fellow hackers,
[snip]
Questions:

1. Do you agree that a systematic way to report what a DDL command (or
script, or transaction) is going to do on your production database
is a feature we should provide to our growing user base?
Yes, please
2. What do you think such a feature should look like?
EXPLAIN [(verbose, format)] [DDL_COMMAND]
as in:
EXPLAIN (verbose on, format text, impact on)
ALTER TABLE emp
ADD COLUMN foo2 jsonb NOT NULL DEFAULT '{}';
where the output would include something like:
...
EXCLUSIVE LOCK ON TABLE emp; // due to "IMPACT ON"
REWRITE TABLE emp due to adding column foo2 (default='{}'::jsonb)
// due to "VERBOSE on"
...
3. Does it make sense to support the whole set of DDL commands from the
get go (or ever) when most of them are only taking locks in their
own pg_catalog entry anyway?
For completeness sake, yes.
But, unless the "impact" and "verbose" modifiers are specified, most
would be quite self-explanatory:
EXPLAIN (verbose on, impact on) TRUNCATE TABLE emp;
Execution plan:
-> EXCLUSIVE LOCK ON TABLE emp;
....
-> truncate index: IIIIII (file=NNNNN)            // NNNNN = relfilenode
-> truncate main fork: NNNNN (tablespace: TTTTT)  // NNNNN = relfilenode
-> truncate visibility map
....
-> RELEASE LOCK ON TABLE emp;
....
Summary: ZZZZZ pages ( MMM MB ) would be freed
versus a simple:
EXPLAIN TRUNCATE TABLE emp;
Execution plan:
-> truncate index: emp_pkey
-> truncate index: emp_foo2_idx
-> truncate relation emp
Provided that we are able to converge towards a common enough answer to
those questions, I propose to hack my way around and send patches to
have it (the common answer) available in the next PostgreSQL release.
Sounds very good, indeed.
Count on me as tester :)
--
José Luis Tallón
EXPLAIN ALTER TABLE ?
Good thing: People recognize it.
Bad thing: People might not be able to tell the difference between
a DDL and DML result.
What about "EXPLAIN DDL ..."?
The extra keyword ("DDL") makes it a bit more explicit that the
results are not comparable to the standard explain output.
--
Steven Lembark 3646 Flora Pl
Workhorse Computing St Louis, MO 63110
lembark@wrkhors.com +1 888 359 3508
On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
[snip]
That sounds extremely complex. You'd have to implement the fake
columns, foreign keys, indexes, etc on most execution nodes, the
planner, and even system views.
IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
locks. I don't think you can simulate the side effects without locks,
so getting the local view of changes will be extremely difficult
unless you limit the scope considerably.
On Thu, Oct 2, 2014 at 12:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
[snip]
It's kind of tricky to implement a patch to figure this out ahead of
time. Some of the actual lock acquisitions are well hidden, in terms
of how the code is structured. In other cases, it may not even be
possible to determine ahead of time exactly what locks will be taken.
As Harold mentioned, another idea along the same lines would be to
decorate DDL with a NOWAIT "no locking assertion" and/or "no rewrite
assertion". Basically, if this DDL (or perhaps any DDL, if this is
implemented as a GUC instead) necessitates a table rewrite (and
requires an AccessExclusiveLock), throw an error. That's the case that
most people care about.
This may not even be good enough, though. Consider:
Session 1 is a long running transaction. Maybe it's a spurious
idle-in-transaction situation, but it could also be totally
reasonable. It holds an AccessShareLock on some relation, as long
running transactions are inclined to do.
Session 2 is our migration. It needs an AccessExclusiveLock to ALTER
TABLE on the same relation (or whatever). But it doesn't need a
rewrite, which is good. It comes along and attempts to acquire the
lock, blocking on session 1.
Session 3 is an innocent bystander. It goes to query the same table in
an ordinary, routine way - a SELECT statement. Even though session 2's
lock is not granted yet, session 3 is not at liberty to skip the queue
and get its own AccessShareLock. The effect is about the same as if
session 2 did need to hold an AccessExclusiveLock for ages: read
queries block for a long time. And yet, in theory session 2's impact
on production should not be minimal, if we consider something like
EXPLAIN output.
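For the record, the scenario above is easy to reproduce by hand with
three psql sessions on some table t:

```sql
-- Session 1: long-running transaction, holds ACCESS SHARE on t
BEGIN;
SELECT count(*) FROM t;
-- ...left idle in transaction...

-- Session 2: the migration; queues behind session 1's ACCESS SHARE
ALTER TABLE t ADD COLUMN flag boolean;  -- wants ACCESS EXCLUSIVE, blocks

-- Session 3: innocent bystander; queues behind session 2's ungranted
-- ACCESS EXCLUSIVE request rather than jumping the queue
SELECT * FROM t LIMIT 1;                -- blocks too
```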
Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a
particularly bad case. NOWAIT might be the wrong thing for DDL
generally.
--
Peter Geoghegan
On Thu, Oct 2, 2014 at 1:37 PM, Peter Geoghegan <pg@heroku.com> wrote:
And yet, in theory session 2's impact
on production should not be minimal, if we consider something like
EXPLAIN output.
"Should have been minimal", I mean.
--
Peter Geoghegan
On Thu, Oct 2, 2014 at 5:37 PM, Peter Geoghegan <pg@heroku.com> wrote:
Session 3 is an innocent bystander. It goes to query the same table in
an ordinary, routine way - a SELECT statement. Even though session 2's
lock is not granted yet, session 3 is not at liberty to skip the queue
and get its own AccessShareLock. The effect is about the same as if
session 2 did need to hold an AccessExclusiveLock for ages: read
queries block for a long time. And yet, in theory session 2's impact
on production should not be minimal, if we consider something like
EXPLAIN output.
The explain would show the AccessExclusiveLock, so it would be enough
of a heads-up to kill all idle-in-transaction sessions holding locks
on the target relation (if killable, or just wait).
Granted, it's something that's not easily automatable, whereas a nowait is.
However, rather than nowait, I'd prefer "cancellable" semantics, that
would cancel voluntarily if any other transaction requests a
conflicting lock, like autovacuum does.
On Thu, Oct 2, 2014 at 1:52 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
The explain would show the AccessExclusiveLock, so it would be enough
of a heads-up to kill all idle-in-transaction sessions holding locks
on the target relation (if killable, or just wait).
I think that there are very few problems with recognizing when an
AccessExclusiveLock is needed or not needed. The exceptions to the
rule that DDL needs such a lock are narrow enough that I have a hard
time believing that most people think about it, or even need to think
about it. I wish that wasn't the case, but it is.
Granted, it's something that's not easily automatable, whereas a nowait is.
However, rather than nowait, I'd prefer "cancellable" semantics, that
would cancel voluntarily if any other transaction requests a
conflicting lock, like autovacuum does.
I think the problem you'll have with NOWAIT is: you have an error from
having to wait...what now? Do you restart? I imagine this would
frequently result in what is effectively lock starvation. Any old
AccessShareLock-er is going to make our migration tool restart. We'll
never finish.
--
Peter Geoghegan
* Claudio Freire (klaussfreire@gmail.com) wrote:
[snip]
That sounds extremely complex. You'd have to implement the fake
columns, foreign keys, indexes, etc on most execution nodes, the
planner, and even system views.
Eh? We have MVCC catalog access.
IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
locks. I don't think you can simulate the side effects without locks,
Why? If you know the transaction is going to roll back and you only add
entries to the catalog which aren't visible to any other transactions
than your own, and you make sure that nothing you do actually writes
data out which is visible to other transactions..
so getting the local view of changes will be extremely difficult
unless you limit the scope considerably.
I agree that there may be complexities, but I'm not sure this is really
the issue..
Thanks,
Stephen
On Thu, Oct 2, 2014 at 6:00 PM, Peter Geoghegan <pg@heroku.com> wrote:
[snip]
I think the problem you'll have with NOWAIT is: you have an error from
having to wait...what now? Do you restart? I imagine this would
frequently result in what is effectively lock starvation. Any old
AccessShareLock-er is going to make our migration tool restart. We'll
never finish.
I've done that manually (throw the DDL, and cancel if it takes more
than a couple of seconds) on modest but relatively busy servers with
quite some success.
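For what it's worth, that manual throw-and-cancel dance can be
approximated with lock_timeout (available since 9.3), with the retry
loop living in the migration tool:

```sql
-- Give up on the lock after 2 seconds instead of queueing forever
-- (and stalling every reader queued behind us); the client retries
-- on SQLSTATE 55P03 (lock_not_available).
SET lock_timeout = '2s';
ALTER TABLE t ADD COLUMN flag boolean;
RESET lock_timeout;
```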
On Thu, Oct 2, 2014 at 6:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
[snip]
Eh? We have MVCC catalog access.
And that needs locks, especially if you modify the underlying filesystem layout.
IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
locks. I don't think you can simulate the side effects without locks,

Why? If you know the transaction is going to roll back and you only add
entries to the catalog which aren't visible to any other transactions
than your own, and you make sure that nothing you do actually writes
data out which is visible to other transactions..
But that's not the scope. If you want a dry-run of table-rewriting
DDL, or DDL interspersed with DML like:
alter table blargh add foo integer;
update blargh set foo = coalesce(bar, baz);
You really cannot hope not to have to write data. The above is also
the case with defaulted columns btw.
so getting the local view of changes will be extremely difficult
unless you limit the scope considerably.

I agree that there may be complexities, but I'm not sure this is really
the issue..
In essence, if you want MVCC catalog access without AEL, you're in for
a rough ride. I'm not as experienced with pg's core as you, so you
tell me, but I imagine it will be the case.