BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped

Started by PG Bug reporting formover 4 years ago15 messages
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17098
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 14beta2
Operating system: Ubuntu 20.04
Description:

When trying to add to an extension a type that is already exists in the
extension while the extension is being dropped I get a failed assertion with
the following stack:
Core was generated by `postgres: law regression [local] ALTER EXTENSION
'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f59c4321859 in __GI_abort () at abort.c:79
#2 0x000055c3c74912e0 in ExceptionalCondition
(conditionName=conditionName@entry=0x55c3c76a024c "strvalue != NULL",
errorType=errorType@entry=0x55c3c74f000b "FailedAssertion",
fileName=0x7ffc8db80390 "\301\022I\307\303U",
fileName@entry=0x55c3c76a0241 "snprintf.c",
lineNumber=lineNumber@entry=442) at assert.c:69
#3 0x000055c3c74ee489 in dopr (target=target@entry=0x7ffc8db80970,
format=0x55c3c7521ee7 "\"",
format@entry=0x55c3c7521ec0 "%s is already a member of extension
\"%s\"", args=0x7ffc8db80a20) at snprintf.c:442
#4 0x000055c3c74eeb5c in pg_vsnprintf (str=<optimized out>,
count=<optimized out>, count@entry=1024,
fmt=fmt@entry=0x55c3c7521ec0 "%s is already a member of extension
\"%s\"", args=args@entry=0x7ffc8db80a20)
at snprintf.c:195
#5 0x000055c3c74e4e33 in pvsnprintf (buf=<optimized out>,
len=len@entry=1024,
fmt=fmt@entry=0x55c3c7521ec0 "%s is already a member of extension
\"%s\"", args=args@entry=0x7ffc8db80a20)
at psprintf.c:110
#6 0x000055c3c74e6221 in appendStringInfoVA (str=str@entry=0x7ffc8db80a00,

fmt=fmt@entry=0x55c3c7521ec0 "%s is already a member of extension
\"%s\"", args=args@entry=0x7ffc8db80a20)
at stringinfo.c:149
#7 0x000055c3c7495fbe in errmsg (fmt=fmt@entry=0x55c3c7521ec0 "%s is
already a member of extension \"%s\"")
at elog.c:919
#8 0x000055c3c7120cda in ExecAlterExtensionContentsStmt
(stmt=stmt@entry=0x55c3c7810f60,
objAddr=objAddr@entry=0x7ffc8db80c14) at extension.c:3342
#9 0x000055c3c7357837 in ProcessUtilitySlow
(pstate=pstate@entry=0x55c3c7831d60, pstmt=pstmt@entry=0x55c3c7811270,
queryString=queryString@entry=0x55c3c7810450 "ALTER EXTENSION cube ADD
TYPE side;",
context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0,
queryEnv=queryEnv@entry=0x0,
dest=0x55c3c7811340, qc=0x7ffc8db81130) at utility.c:1550
#10 0x000055c3c7356b48 in standard_ProcessUtility (pstmt=0x55c3c7811270,
queryString=0x55c3c7810450 "ALTER EXTENSION cube ADD TYPE side;",
readOnlyTree=<optimized out>,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55c3c7811340, qc=0x7ffc8db81130)
at utility.c:1049
#11 0x000055c3c7356c31 in ProcessUtility (pstmt=pstmt@entry=0x55c3c7811270,
queryString=<optimized out>,
readOnlyTree=<optimized out>,
context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=<optimized out>,
queryEnv=<optimized out>, dest=0x55c3c7811340, qc=0x7ffc8db81130) at
utility.c:527
#12 0x000055c3c7354157 in PortalRunUtility
(portal=portal@entry=0x55c3c78725b0, pstmt=pstmt@entry=0x55c3c7811270,
isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x55c3c7811340,

qc=qc@entry=0x7ffc8db81130) at pquery.c:1147
#13 0x000055c3c7354459 in PortalRunMulti
(portal=portal@entry=0x55c3c78725b0, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false,
dest=dest@entry=0x55c3c7811340, altdest=altdest@entry=0x55c3c7811340,
qc=qc@entry=0x7ffc8db81130) at pquery.c:1304
#14 0x000055c3c735488d in PortalRun (portal=portal@entry=0x55c3c78725b0,
count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true,
dest=dest@entry=0x55c3c7811340,
altdest=altdest@entry=0x55c3c7811340, qc=0x7ffc8db81130) at
pquery.c:786
#15 0x000055c3c7350ada in exec_simple_query (
query_string=query_string@entry=0x55c3c7810450 "ALTER EXTENSION cube ADD
TYPE side;") at postgres.c:1214
#16 0x000055c3c7352aac in PostgresMain (argc=argc@entry=1,
argv=argv@entry=0x7ffc8db81320, dbname=<optimized out>,
username=<optimized out>) at postgres.c:4486
#17 0x000055c3c72ada9c in BackendRun (port=port@entry=0x55c3c7831220) at
postmaster.c:4507
#18 0x000055c3c72b0cb1 in BackendStartup (port=port@entry=0x55c3c7831220) at
postmaster.c:4229
#19 0x000055c3c72b0ef8 in ServerLoop () at postmaster.c:1745
#20 0x000055c3c72b2445 in PostmasterMain (argc=3, argv=<optimized out>) at
postmaster.c:1417
#21 0x000055c3c71f309e in main (argc=3, argv=0x55c3c780a4c0) at main.c:209

Reproduced with the following script:
echo "
CREATE EXTENSION cube;
DROP EXTENSION cube;
" >/tmp/ce.sql

echo "
CREATE TYPE side AS ENUM('front');
ALTER EXTENSION cube ADD TYPE side;
" >/tmp/ae.sql

for n in `seq 10`; do
echo "iteration $n"
( { for f in `seq 1000`; do cat /tmp/ce.sql; done } | psql ) >psql-1.log
2>&1 &
( { for f in `seq 1000`; do cat /tmp/ae.sql; done } | psql ) >psql-2.log
2>&1 &
wait
coredumpctl --no-pager && break
done

Reproduced on REL_12_STABLE..master. The assert was added by 6d842be6c, so
on REL_11_STABLE I see just:
ERROR: type side is already a member of extension "(null)"

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped

PG Bug reporting form <noreply@postgresql.org> writes:

When trying to add to an extension a type that is already exists in the
extension while the extension is being dropped I get a failed assertion with
the following stack:

I think that the root issue here is that ExecAlterExtensionContentsStmt
fails to acquire any sort of lock on the extension. Considering that
it *does* lock the object to be added/dropped, that's a rather glaring
oversight. Fortunately it seems easily fixable ... though I wonder
how many other similar oversights we have.

However, that root issue is converted from a relatively minor bug into
a server crash because snprintf.c treats a NULL pointer passed to %s
as a crash-worthy error. I have advocated for that behavior in the
past, but I'm starting to wonder if it wouldn't be wiser to change
over to the glibc-ish behavior of printing "(null)" or the like.
It seems like we've long since found all the interesting bugs that
the assert-or-crash behavior could reveal, and now we're down to
weird corner cases where its main effect is to weaken our robustness.

Thoughts?

regards, tom lane

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

[ moved from -bugs list for more visibility ]

I wrote:

However, that root issue is converted from a relatively minor bug into
a server crash because snprintf.c treats a NULL pointer passed to %s
as a crash-worthy error. I have advocated for that behavior in the
past, but I'm starting to wonder if it wouldn't be wiser to change
over to the glibc-ish behavior of printing "(null)" or the like.
It seems like we've long since found all the interesting bugs that
the assert-or-crash behavior could reveal, and now we're down to
weird corner cases where its main effect is to weaken our robustness.

I did a little more thinking about this. I believe the strongest
argument for having snprintf.c crash on NULL is that it keeps us
from relying on having more-forgiving behavior in case we're using
platform-supplied *printf functions (cf commit 0c62356cc). However,
that is only relevant for code that's meant to go into pre-v12 branches,
since we stopped using libc's versions of these functions in v12.

So one plausible way to approach this is to say that we should wait
until v11 is EOL and then change it.

However, that feels overly conservative to me. I doubt that anyone
is *intentionally* relying on *printf not crashing on a NULL pointer.
For example, in the case that started this thread:

if (OidIsValid(oldExtension))
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("%s is already a member of extension \"%s\"",
getObjectDescription(&object, false),
get_extension_name(oldExtension))));

the problem is failure to consider the possibility that
get_extension_name could return NULL due to a just-committed
concurrent DROP EXTENSION. I'm afraid there are a lot of
corner cases like that still lurking.

So my feeling about this is that switching snprintf.c's behavior
would produce some net gain in robustness for v12 and up, while
not making things any worse for the older branches. I still hold
to the opinion that we've already flushed out all the cases of
passing NULL that we're likely to find via ordinary testing.

Thoughts?

regards, tom lane

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#3)
Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

On Mon, 2021-07-12 at 13:20 -0400, Tom Lane wrote:

However, that root issue is converted from a relatively minor bug into
a server crash because snprintf.c treats a NULL pointer passed to %s
as a crash-worthy error. I have advocated for that behavior in the
past, but I'm starting to wonder if it wouldn't be wiser to change
over to the glibc-ish behavior of printing "(null)" or the like.

So my feeling about this is that switching snprintf.c's behavior
would produce some net gain in robustness for v12 and up, while
not making things any worse for the older branches. I still hold
to the opinion that we've already flushed out all the cases of
passing NULL that we're likely to find via ordinary testing.

New cases could be introduced in the future and might remain undetected.

What about adding an Assert that gags on NULLs, but still printing them
as "(null)"? That would help find such problems in a debug build.

Yours,
Laurenz Albe

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Laurenz Albe (#4)
Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

Laurenz Albe <laurenz.albe@cybertec.at> writes:

On Mon, 2021-07-12 at 13:20 -0400, Tom Lane wrote:

So my feeling about this is that switching snprintf.c's behavior
would produce some net gain in robustness for v12 and up, while
not making things any worse for the older branches. I still hold
to the opinion that we've already flushed out all the cases of
passing NULL that we're likely to find via ordinary testing.

New cases could be introduced in the future and might remain undetected.
What about adding an Assert that gags on NULLs, but still printing them
as "(null)"? That would help find such problems in a debug build.

I think you're missing my main point, which is that it seems certain that
there are corner cases that do this *now*. I'm proposing that we redefine
this as not being a crash case, full stop.

Now, what we don't have control of is what will happen in pre-v12
branches on platforms where we use the system's *printf. However,
note what I wrote in the log for 0c62356cc:

Per commit e748e902d, we appear to have little or no coverage in the
buildfarm of machines that will dump core when asked to printf a
null string pointer.

Thus it appears that a large fraction of the world is already either
using glibc or following glibc's lead on this point. If we do likewise,
it will remove some crash cases and not introduce any new ones.

In hindsight I feel like 0c62356cc was an overreaction to the unusual
property e748e902d's bug had, namely that "(null)" was getting printed
in a place where it would not show up in any visible output. Since
we certainly wouldn't consider that behavior OK if we saw it, you'd
really have to assume that there are more lurking bugs with that same
property in order to conclude that the Assert is worth its keep.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
1 attachment(s)
Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

I wrote:

Now, what we don't have control of is what will happen in pre-v12
branches on platforms where we use the system's *printf. However,
note what I wrote in the log for 0c62356cc:
Per commit e748e902d, we appear to have little or no coverage in the
buildfarm of machines that will dump core when asked to printf a
null string pointer.
Thus it appears that a large fraction of the world is already either
using glibc or following glibc's lead on this point.

Further to that point: I just ran around and verified that the system
printf prints "(null)" rather than crashing on FreeBSD 12.2, NetBSD 8.99,
OpenBSD 6.8, macOS 11.4, and Solaris 11.3. AIX 7.2 and HPUX 10.20 print
"", but still don't crash. If we change snprintf.c then we will also be
okay on Windows, because we've always used our own snprintf on that
platform. In short, the only place I can find where there is actually
any hazard is Solaris 10 [1]Per experimentation locally and on the GCC compile farm, using the attached.. I do not think we should let the risk of
obscure bugs in pre-v12 versions on one obsolete OS drive our
decision-making about this.

regards, tom lane

[1]: Per experimentation locally and on the GCC compile farm, using the attached.
the attached.

Attachments:

pnull.ctext/x-c; charset=us-ascii; name=pnull.cDownload
#7Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#5)
Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

Em ter., 13 de jul. de 2021 às 11:29, Tom Lane <tgl@sss.pgh.pa.us> escreveu:

Laurenz Albe <laurenz.albe@cybertec.at> writes:

On Mon, 2021-07-12 at 13:20 -0400, Tom Lane wrote:

So my feeling about this is that switching snprintf.c's behavior
would produce some net gain in robustness for v12 and up, while
not making things any worse for the older branches. I still hold
to the opinion that we've already flushed out all the cases of
passing NULL that we're likely to find via ordinary testing.

New cases could be introduced in the future and might remain undetected.
What about adding an Assert that gags on NULLs, but still printing them
as "(null)"? That would help find such problems in a debug build.

I think you're missing my main point, which is that it seems certain that
there are corner cases that do this *now*. I'm proposing that we redefine
this as not being a crash case, full stop.

I agree with Laurenz Albe, that on Debug builds, *printf with NULL, must
crash.
On production builds, fine, printing (null).
This will put a little more pressure on support, "Hey what mean's (null) in
my logs?",
but it's ok.

regards,
Ranier Vilela

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ranier Vilela (#7)
Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

Ranier Vilela <ranier.vf@gmail.com> writes:

Em ter., 13 de jul. de 2021 às 11:29, Tom Lane <tgl@sss.pgh.pa.us> escreveu:

I think you're missing my main point, which is that it seems certain that
there are corner cases that do this *now*. I'm proposing that we redefine
this as not being a crash case, full stop.

I agree with Laurenz Albe, that on Debug builds, *printf with NULL, must
crash.

Did you see my followup? The vast majority of live systems do not do
that, so we are accomplishing nothing of value by insisting it's a
crash-worthy bug.

I flat out don't agree that "crash on debug builds but it's okay on
production" is a useful way to define this. I spend way too much
time already on bug reports that only manifest with asserts enabled.

regards, tom lane

#9Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#8)
Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

Em ter., 13 de jul. de 2021 às 15:26, Tom Lane <tgl@sss.pgh.pa.us> escreveu:

Ranier Vilela <ranier.vf@gmail.com> writes:

Em ter., 13 de jul. de 2021 às 11:29, Tom Lane <tgl@sss.pgh.pa.us>

escreveu:

I think you're missing my main point, which is that it seems certain

that

there are corner cases that do this *now*. I'm proposing that we

redefine

this as not being a crash case, full stop.

I agree with Laurenz Albe, that on Debug builds, *printf with NULL, must
crash.

Did you see my followup?

I am trying.

The vast majority of live systems do not do
that, so we are accomplishing nothing of value by insisting it's a
crash-worthy bug.

I agreed.

I flat out don't agree that "crash on debug builds but it's okay on
production" is a useful way to define this. I spend way too much
time already on bug reports that only manifest with asserts enabled.

I understand.
Bug reports will decrease, because of that, people will lose interest and
motivation to report,
because (null), it doesn't seem like a serious error and my server didn't
crash.

It's a tricky tradeoff.

regards,
Ranier Vilela

#10Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#8)
Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

On Tue, 2021-07-13 at 14:26 -0400, Tom Lane wrote:

Did you see my followup? The vast majority of live systems do not do
that, so we are accomplishing nothing of value by insisting it's a
crash-worthy bug.

I flat out don't agree that "crash on debug builds but it's okay on
production" is a useful way to define this. I spend way too much
time already on bug reports that only manifest with asserts enabled.

You convinced my that printing "(null)" is better than crashing.
Having a "(null)" show up in a weird place is certainly a minor inconvenience.

But I don't buy your second point: if it is like that, why do we have
Asserts at all?

Yours,
Laurenz Albe

#11Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Tom Lane (#8)
Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

On 13.07.21 20:26, Tom Lane wrote:

Did you see my followup? The vast majority of live systems do not do
that, so we are accomplishing nothing of value by insisting it's a
crash-worthy bug.

But there are no guarantees that that will be maintained in the future.
In the past, it has often come back to bite us when we relied on
implementation-dependent behavior in the C library or the compiler,
because no optimization might invalidate old assumptions.

In this particular case, I would for example be quite curious how those
alternative minimal C libraries such as musl-libc handle this.

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#11)
Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

In this particular case, I would for example be quite curious how those
alternative minimal C libraries such as musl-libc handle this.

Interesting question, so I took a look:

https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593

case 's':
a = arg.p ? arg.p : "(null)";
...

Any others you'd like to consider?

regards, tom lane

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#12)
Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

I wrote:

Interesting question, so I took a look:
https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593
case 's':
a = arg.p ? arg.p : "(null)";

BTW, the adjacent code shows that musl is also supporting glibc's
"%m" extension, so I imagine that they are endeavoring to be
compatible with glibc, and this goes along with that. But that
just supports my larger point: printing "(null)" is clearly the
de facto standard now, whether or not POSIX has caught up with it.

regards, tom lane

#14Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Tom Lane (#12)
Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

On 14.07.21 18:26, Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

In this particular case, I would for example be quite curious how those
alternative minimal C libraries such as musl-libc handle this.

Interesting question, so I took a look:

https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593

case 's':
a = arg.p ? arg.p : "(null)";
...

Any others you'd like to consider?

Similar here:
https://github.com/ensc/dietlibc/blob/master/lib/__v_printf.c#L188

I think unless we get counterexamples, this is all good.

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#14)
Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

On 14.07.21 18:26, Tom Lane wrote:

https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593
case 's':
a = arg.p ? arg.p : "(null)";

Similar here:
https://github.com/ensc/dietlibc/blob/master/lib/__v_printf.c#L188

I also took a look at μClibc, as well as glibc itself, and learned some
additional facts. glibc's behavior is not just 'print "(null)" instead'.
It is 'print "(null)" if the field width allows at least six characters,
otherwise print nothing'. μClibc is bug-compatible with this, but other
implementations seem to generally just substitute "(null)" for the input
string and run with that. I'm inclined to side with the latter camp.
I'd rather see something like "(nu" than empty because the latter looks
too much like it might be correct output; so I think glibc is expending
extra code to produce a less-good result.

I think unless we get counterexamples, this is all good.

Barring objections, I'll press ahead with changing snprintf.c to do this.

regards, tom lane