pg_dump assertion failure with "-n pg_catalog"

Started by Jeff Davisalmost 3 years ago8 messagesbugs
Jump to latest
#1Jeff Davis
pgsql@j-davis.com

On version 15 or later:

pg_dump -n pg_catalog postgres > /dev/null
pg_dump: pg_dump.c:13291: dumpCollation: Assertion `collcollate !=
((void *)0)' failed.

Patch attached.

I tried adding a test, but dumping pg_catalog seems to only be
supported for research purposes, so it's not clear what the test should
be testing.

If we went almost two release cycles without anyone noticing, then
perhaps we should just get rid of the ability to dump pg_catalog. But I
think the attached patch clarifies things regardless.

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachments:

0001-Fix-pg_dump-assertion-failure-when-dumping-pg_catalo.patchtext/x-patch; charset=UTF-8; name=0001-Fix-pg_dump-assertion-failure-when-dumping-pg_catalo.patchDownload+25-9
#2Michael Paquier
michael@paquier.xyz
In reply to: Jeff Davis (#1)
Re: pg_dump assertion failure with "-n pg_catalog"

On Wed, Jun 07, 2023 at 10:36:29AM -0700, Jeff Davis wrote:

If we went almost two release cycles without anyone noticing, then
perhaps we should just get rid of the ability to dump pg_catalog. But I
think the attached patch clarifies things regardless.

That looks correct for v15 and HEAD, not when dumping from older
versions.

Attempting the same command with a patched pg_dump on ~14 triggers
more failures. The first assertion hit is this one, for instance:
$ pg_dump -n pg_catalog
pg_dump: pg_dump.c:13460: dumpCollation: Assertion `colliculocale != ((void *)0)' failed.

colliculocale and collicurules don't exist there, but the code is
forcing the value to NULL as PQfnumber() returns -1 for an attribute
that does not exist.

else if (collprovider[0] == 'i')
{
+       Assert(colliculocale != NULL);
+       Assert(collcollate == NULL);
+       Assert(collctype == NULL);

If ICU provides the collation, colliculocale is NULL when dumping from
~14, while both collcollate and collctype are not NULL. Hence all
three assertions are incorrect for ~14, while they are correct for
15~. It seems to me that you are going to require a per-version
handling here if you wish to keep collprovider on top of the rest.

I'd like to agree with you about the fact that having collprovider
handled before the rest makes things easier to follow in the future.
At least that's my feeling after looking at your patch.
--
Michael

#3Jeff Davis
pgsql@j-davis.com
In reply to: Michael Paquier (#2)
Re: pg_dump assertion failure with "-n pg_catalog"

On Mon, 2023-06-12 at 15:52 +0900, Michael Paquier wrote:

Attempting the same command with a patched pg_dump on ~14 triggers
more failures.  The first assertion hit is this one, for instance:
$ pg_dump -n pg_catalog
pg_dump: pg_dump.c:13460: dumpCollation: Assertion `colliculocale !=
((void *)0)' failed.

Thank you, new patch attached.

I changed the Assert()s (the old ones as well as the ones I added) into
pg_log_warning()s. I don't think it makes sense to assert based on the
catalog contents in pg_dump -- other places just warn.

Also added a test.

I'd like to agree with you about the fact that having collprovider
handled before the rest makes things easier to follow in the future.
At least that's my feeling after looking at your patch.

The new patch does add about 40 net lines, though.

Given that it fixes an obscure issue, I'm not even sure if it should be
backported -- perhaps it could just go in 17 as more of a cleanup than
a bugfix?

Even aside from the issue fixed by this patch, there are other warnings
emitted when dumping pg_catalog related to types with typtype='p' and
typisdefined=t.

Regards,
Jeff Davis

Attachments:

v2-0001-Fix-pg_dump-assertion-failure-when-dumping-pg_cat.patchtext/x-patch; charset=UTF-8; name=v2-0001-Fix-pg_dump-assertion-failure-when-dumping-pg_cat.patchDownload+61-17
#4Michael Paquier
michael@paquier.xyz
In reply to: Jeff Davis (#3)
Re: pg_dump assertion failure with "-n pg_catalog"

On Thu, Jun 15, 2023 at 03:47:59PM -0700, Jeff Davis wrote:

The new patch does add about 40 net lines, though.

I have looked at dumps generated from servers down to 9.2, and these
seemed OK.

Given that it fixes an obscure issue, I'm not even sure if it should be
backported -- perhaps it could just go in 17 as more of a cleanup than
a bugfix?

Not sure to agree about this point. The first version where this
happens is v15, meaning that two branches are impacted currently. So
the cost of backpatch is minimal.

Even aside from the issue fixed by this patch, there are other warnings
emitted when dumping pg_catalog related to types with typtype='p' and
typisdefined=t.

Right, there are 14 of them, except _record. This is actually much,
much older than the collation bits.
--
Michael

#5Jeff Davis
pgsql@j-davis.com
In reply to: Michael Paquier (#4)
Re: pg_dump assertion failure with "-n pg_catalog"

On Fri, 2023-06-16 at 14:29 +0900, Michael Paquier wrote:

Not sure to agree about this point.  The first version where this
happens is v15, meaning that two branches are impacted currently.  So
the cost of backpatch is minimal.

OK, I will plan to commit and backport early next week unless others
have additional input.

Regards,
Jeff Davis

#6Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#5)
Re: pg_dump assertion failure with "-n pg_catalog"

On Fri, 2023-06-16 at 10:29 -0700, Jeff Davis wrote:

OK, I will plan to commit and backport early next week unless others
have additional input.

Done. Sorry for the long delay -- lost track of this one.

Regards,
Jeff Davis

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Jeff Davis (#3)
Re: pg_dump assertion failure with "-n pg_catalog"

On 16.06.23 00:47, Jeff Davis wrote:

else
{
appendPQExpBufferStr(q, ", lc_collate = ");
-			appendStringLiteralAH(q, collcollate, fout);
+			appendStringLiteralAH(q, collcollate ? collcollate : "", fout);
appendPQExpBufferStr(q, ", lc_ctype = ");
-			appendStringLiteralAH(q, collctype, fout);
+			appendStringLiteralAH(q, collctype ? collctype : "", fout);
}

This new code replaces an (unexpected) null value with an empty string.
Is that correct? Empty locale strings have specific behaviors under the
libc and icu providers. It seems to me it would be better to error out
or don't print a CREATE COLLATION command at all rather than
substituting a value that is not a correct substitute.

Alternatively, a more correct way to produce a null value would be to
not issue the assignment at all (omit the " lc_collation = " etc.).
Then we can leave it up to the backend to accept the command or not, but
at least the dump closely reflects the original catalog state.

#8Jeff Davis
pgsql@j-davis.com
In reply to: Peter Eisentraut (#7)
Re: pg_dump assertion failure with "-n pg_catalog"

On Tue, 2023-09-05 at 11:45 +0200, Peter Eisentraut wrote:

This new code replaces an (unexpected) null value with an empty
string.
Is that correct?  Empty locale strings have specific behaviors under
the
libc and icu providers.  It seems to me it would be better to error
out
or don't print a CREATE COLLATION command at all rather than
substituting a value that is not a correct substitute.

It already issues a warning, which seemed more consistent with other
unexpected situations in pg_dump.

Alternatively, a more correct way to produce a null value would be to
not issue the assignment at all (omit the " lc_collation = " etc.).
Then we can leave it up to the backend to accept the command or not,
but
at least the dump closely reflects the original catalog state.

The backend does not accept a collation specified with lc_ctype and not
lc_collate (or vice versa), so it would be an un-restorable dump. That
doesn't seem great.

Regards,
Jeff Davis