vacuumdb parallel has a deadlock detected in 9.5.4

Started by huangover 9 years ago9 messagesbugs
Jump to latest
#1huang
foggyglass@163.com

Hi friends,

There is a error deadlock detected in vacuumdb parallel .
but if I do not use parallel , it has not deadlock detected .

[postgres@vdedu1 ~]$ vacuumdb -fazev -j 10
SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;
vacuumdb: vacuuming database "mydb"
SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns
WHERE relkind IN ('r', 'm') AND c.relnamespace = ns.oid
ORDER BY c.relpages DESC;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_proc;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_depend;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_attribute;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_description;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_collation;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_statistic;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_class;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_operator;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_rewrite;
VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_type;
ERROR: deadlock detected
DETAIL: Process 19840 waits for AccessShareLock on relation 1249 of
database 16387; blocked by process 19841.
Process 19841 waits for RowExclusiveLock on relation 2608 of
database 16387; blocked by process 19840.
Process 19840: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_depend;
Process 19841: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_attribute;
HINT: See server log for query details.
STATEMENT: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_depend;
vacuumdb: vacuuming of database "mydb" failed: ERROR: deadlock detected
DETAIL: Process 19840 waits for AccessShareLock on relation 1249 of
database 16387; blocked by process 19841.
Process 19841 waits for RowExclusiveLock on relation 2608 of database
16387; blocked by process 19840.
HINT: See server log for query details.
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_proc;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_attribute;
LOG: could not send data to client: Broken pipe
FATAL: connection to client lost
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_description;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_collation;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_statistic;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_operator;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_rewrite;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_type;
[postgres@vdedu1 ~]$ LOG: could not send data to client: Broken pipe
FATAL: connection to client lost
LOG: could not send data to client: Broken pipe
STATEMENT: VACUUM (FULL, VERBOSE, ANALYZE) pg_catalog.pg_class;
ERROR: canceling statement due to user request
FATAL: connection to client lost
LOG: could not send data to client: Broken pipe
ERROR: canceling statement due to user request
FATAL: connection to client lost

[postgres@vdedu1 ~]$
[postgres@vdedu1 ~]$ cat /proc/v
version vmallocinfo vmstat
[postgres@vdedu1 ~]$ cat /proc/version
Linux version 2.6.32-504.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org)
(gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) ) #1 SMP Wed Oct 15
04:27:16 UTC 2014
[postgres@vdedu1 ~]$ vacuumdb --version
vacuumdb (PostgreSQL) 9.5.4
[postgres@vdedu1 ~]$ psql -c"select version();"
version

--------------------------------------------------------------------------------
--------------------------
PostgreSQL 9.5.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (
Red Hat 4.4.7-11), 64-bit
(1 row)

[postgres@vdedu1 ~]$

2016-09-28
TT

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

In reply to: huang (#1)
Re: vacuumdb parallel has a deadlock detected in 9.5.4

On Wed, Sep 28, 2016 at 09:46:22PM +0800, huang wrote:

Hi friends,

There is a error deadlock detected in vacuumdb parallel .
but if I do not use parallel , it has not deadlock detected .

man vacuumdb says about -j option:

Note that using this mode together with the -f (FULL) option might cause
deadlock failures if certain system catalogs are processed in parallel.

so, while I understand it's not pretty, it's well documented.

Best regards,

depesz

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: hubert depesz lubaczewski (#2)
Re: vacuumdb parallel has a deadlock detected in 9.5.4

hubert depesz lubaczewski wrote:

On Wed, Sep 28, 2016 at 09:46:22PM +0800, huang wrote:

Hi friends,

There is a error deadlock detected in vacuumdb parallel .
but if I do not use parallel , it has not deadlock detected .

man vacuumdb says about -j option:

Note that using this mode together with the -f (FULL) option might cause
deadlock failures if certain system catalogs are processed in parallel.

so, while I understand it's not pretty, it's well documented.

Yeah. However I think this behavior is rather unhelpful. Maybe we
should try to fix it somehow, but I'm not sure how. We could say that
pg_catalog tables can only be processed one at a time, instead of trying
to paralelize them? For example: have vacuumdb fill two lists of
tables, one for pg_catalog and one for tables in other schemas. Each
worker chooses a table from the pg_catalog list first if it's not empty
and there's no other worker doing one of these, otherwise one from the
other table.

I think that'd fix it, while not destroying paralelizability too badly.

I'm not volunteering, though. Patches welcome.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Francisco Olarte
folarte@peoplecall.com
In reply to: Alvaro Herrera (#3)
Re: vacuumdb parallel has a deadlock detected in 9.5.4

Hi:

On Wed, Sep 28, 2016 at 10:39 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

There is a error deadlock detected in vacuumdb parallel .
but if I do not use parallel , it has not deadlock detected .

man vacuumdb says about -j option:

Note that using this mode together with the -f (FULL) option might cause
deadlock failures if certain system catalogs are processed in parallel.

so, while I understand it's not pretty, it's well documented.

Yeah. However I think this behavior is rather unhelpful. Maybe we
should try to fix it somehow, but I'm not sure how. We could say that
pg_catalog tables can only be processed one at a time, instead of trying
to paralelize them? For example: have vacuumdb fill two lists of
tables, one for pg_catalog and one for tables in other schemas. Each
worker chooses a table from the pg_catalog list first if it's not empty
and there's no other worker doing one of these, otherwise one from the
other table.

I think that'd fix it, while not destroying paralelizability too badly.

I would propose another behaviour, which I think can solve the problem
without introducing more complex code. Put a couple of flags to vacuum
only catalog tables / non catalog tables ( I believe this can be
served by include/exclude schemas too, which will be even more useful
for other things ). This way one can do a full paralell vacuum of
non-catalog objects followed by a serial one on catalog objects (
which should not be too slow on 'normal' databases ) ( I propose this
order because IIRC normal vacuum updates catalog tables so they get
vacuumed after to tidy them )

Francisco Olarte

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5huang
foggyglass@163.com
In reply to: Francisco Olarte (#4)
Re: vacuumdb parallel has a deadlock detected in 9.5.4

At 2016-09-29 15:57:58, "Francisco Olarte" <folarte@peoplecall.com> wrote:

Hi:

On Wed, Sep 28, 2016 at 10:39 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

There is a error deadlock detected in vacuumdb parallel .
but if I do not use parallel , it has not deadlock detected .

man vacuumdb says about -j option:

Note that using this mode together with the -f (FULL) option might cause
deadlock failures if certain system catalogs are processed in parallel.

so, while I understand it's not pretty, it's well documented.

Yeah. However I think this behavior is rather unhelpful. Maybe we
should try to fix it somehow, but I'm not sure how. We could say that
pg_catalog tables can only be processed one at a time, instead of trying
to paralelize them? For example: have vacuumdb fill two lists of
tables, one for pg_catalog and one for tables in other schemas. Each
worker chooses a table from the pg_catalog list first if it's not empty
and there's no other worker doing one of these, otherwise one from the
other table.

I think that'd fix it, while not destroying paralelizability too badly.

I would propose another behaviour, which I think can solve the problem
without introducing more complex code. Put a couple of flags to vacuum
only catalog tables / non catalog tables ( I believe this can be
served by include/exclude schemas too, which will be even more useful
for other things ). This way one can do a full paralell vacuum of
non-catalog objects followed by a serial one on catalog objects (
which should not be too slow on 'normal' databases ) ( I propose this
order because IIRC normal vacuum updates catalog tables so they get
vacuumed after to tidy them )

Francisco Olarte

hi ,
I just hope the vacuumdb parallel can quickly finishing work .
our database has a lot of tables .
they need vacuumdb . but the time is too long .
so , I used the vacuumdb parallel .

thanks to everyone . : )

Best regards,
TT
28-09-2016

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Francisco Olarte (#4)
Re: vacuumdb parallel has a deadlock detected in 9.5.4

Francisco Olarte wrote:

I would propose another behaviour, which I think can solve the problem
without introducing more complex code. Put a couple of flags to vacuum
only catalog tables / non catalog tables ( I believe this can be
served by include/exclude schemas too, which will be even more useful
for other things ). This way one can do a full paralell vacuum of
non-catalog objects followed by a serial one on catalog objects (
which should not be too slow on 'normal' databases )

OK, that sounds easier to implement. Are you going to submit a patch?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Francisco Olarte
folarte@peoplecall.com
In reply to: Alvaro Herrera (#6)
Re: vacuumdb parallel has a deadlock detected in 9.5.4

Hello all:

On Thu, Sep 29, 2016 at 7:10 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Francisco Olarte wrote:

I would propose another behaviour, which I think can solve the problem
without introducing more complex code. Put a couple of flags to vacuum
only catalog tables / non catalog tables ( I believe this can be
served by include/exclude schemas too, which will be even more useful
for other things ). This way one can do a full paralell vacuum of
non-catalog objects followed by a serial one on catalog objects (
which should not be too slow on 'normal' databases )

OK, that sounds easier to implement. Are you going to submit a patch?

After looking at vacuumdb source I'm trying to develop and submit a
patch implementing the following:

Two new flags --include-schema=NAME, --exclude-schema=NAME

Two new flags --system-schemas --non-system-schemas, as alias to
--include-schema=pg_catalog and --exclude-schema=pg_catalog

The intended behaviour will be:

- if a table list (--table) is specified, include / exclude schema
will be ignored ( potentially with a warning or may be an error if
more knowledgeable people steer me that way ).

- if both include and exclude are specified, include will take
preference (with warning or error ? ).

- if not the already existing code used to enumerate tables for
paralell vacuum will be augmented to either use only tables in the
include schemas or not use tables in the exclude schemas. If, after
enumeration, the table list becomes empty an error will be produced (
think on --include-schema=oops-I-mistyped-it ). The filtering will be
done client side and will be case sensitive against the schema name
name returned by the queries to the catalogs ( no case folding, to
make it simple ).

- After that vacuumdb will proceed as it would if the table list where
given in the command line, either in serial or paralell mode.

This does not intend to solve the paralell vacuum deadlocks, but it
can let people avoid them by just doing

vacuumdb -f --exclude-schema=pg_catalog -j 20
vacuumdb -f --include-schema=pg_catalog

And vaccuuming catalogs is not going to benefit that much from
paralellism as it is normally fast.

Any comment ?

Francisco Olarte.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

In reply to: Francisco Olarte (#7)
Re: vacuumdb parallel has a deadlock detected in 9.5.4

On 01-10-2016 12:57, Francisco Olarte wrote:

On Thu, Sep 29, 2016 at 7:10 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Francisco Olarte wrote:

I would propose another behaviour, which I think can solve the problem
without introducing more complex code. Put a couple of flags to vacuum
only catalog tables / non catalog tables ( I believe this can be
served by include/exclude schemas too, which will be even more useful
for other things ). This way one can do a full paralell vacuum of
non-catalog objects followed by a serial one on catalog objects (
which should not be too slow on 'normal' databases )

OK, that sounds easier to implement. Are you going to submit a patch?

After looking at vacuumdb source I'm trying to develop and submit a
patch implementing the following:

Two new flags --include-schema=NAME, --exclude-schema=NAME

Two new flags --system-schemas --non-system-schemas, as alias to
--include-schema=pg_catalog and --exclude-schema=pg_catalog

This is another feature. Also, it will *sometimes* solve the problem.
Why don't you implement your parallel + serialization idea or Alvaro's
two lists idea? I'm not against adding schema options, I just think it's
not the right way to solve the deadlock problem.

--
Euler Taveira Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#9Francisco Olarte
folarte@peoplecall.com
In reply to: Euler Taveira de Oliveira (#8)
Re: vacuumdb parallel has a deadlock detected in 9.5.4

Hi Euler:

On Sun, Oct 2, 2016 at 5:33 AM, Euler Taveira <euler@timbira.com.br> wrote:

This is another feature.

Yes, which allows an easier workaround around the problem. Personally
I do not need it because I do not vacuumm paralell and I can easily
write a simple dozen lines in perl which achieves it, but I feel it
can help people like the OP.

Also, it will *sometimes* solve the problem.

No, it does not solve the problem ( paralell vacuum of system catalogs
can still deadlock ). What it does is let you transform a deadlockish
operation ( full paralell database wide vacuum ) into a non
deadlockish one ( full paralell non-system plus linear system )
without too much effort.

Why don't you implement your parallel + serialization idea or Alvaro's
two lists idea?

Well, I want a servant, not a nanny. If I command vacuum db to do full
paralell system catalog vacuum, I want it to do it ( although I WILL
tolerate a warning and MAY tolerate a confirmation request ).

Besides, I proposed the schema solution because I thought that
paralell vacuum would need some code building a table list, and the -t
option will need code to process tables from a list. Code examination
revealed I was correct in both. Adding the options is a simple matter
of adding some simple code ( a couple options, a couple string lists
to hold their data, a couple ifs in the table scanning loop used when
no table list given, a modification to make it happen when schema
optins are given ( presently it plays only when no -t options and
paralell is requested, I need to activate it when schema options are
present too, either in serial or paralell mode ), and the working code
remains untouched). I could refactor a bit to make several runs
easier, have a built in list of system data, and many other things,
but this is a deeper modification which I do not feel presently
prepared to attack. I may do it in the future, but I feel anyone
needing to run full paralell vacuums has a big system and should have
this kind of ops scripted, so it is not going to be that useful.

Bear in mind my work is not specifically in postgres like yours, Its
just a tool. I've been using it since before it had SQL, and have some
experience with C and libpq, but I do not feel confortable adding
complex code even to a simple client program.

I'm not against adding schema options, I just think it's
not the right way to solve the deadlock problem.

As I said, it does not solve the problem, just builds an easier path
than the present ones around it ( you can do exactly the same by
querying the catalog on a script and emitting two vacuumdb commands
with a table list ).

Francisco Olarte.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs