Refactoring SysCacheGetAttr to know when attr cannot be NULL

Started by Daniel Gustafssonabout 3 years ago19 messageshackers
Jump to latest
#1Daniel Gustafsson
daniel@yesql.se

Today we have two fairly common patterns around extracting an attr from a
cached tuple:

a = SysCacheGetAttr(OID, tuple, Anum_pg_foo_bar, &isnull);
Assert(!isnull);

a = SysCacheGetAttr(OID, tuple, Anum_pg_foo_bar, &isnull);
if (isnull)
elog(ERROR, "..");

The error message in the elog() cases also vary quite a lot. I've been unable
to find much in terms of guidelines for when to use en elog or an Assert, with
the likelyhood of a NULL value seemingly being the guiding principle (but not
in all cases IIUC).

The attached refactoring introduce SysCacheGetAttrNotNull as a wrapper around
SysCacheGetAttr where a NULL value triggers an elog(). This removes a lot of
boilerplate error handling which IMO leads to increased readability as the
error handling *in these cases* don't add much (there are other cases where
checking isnull does a lot of valuable work of course). Personally I much
prefer the error-out automatically style of APIs like how palloc saves a ton of
checking the returned allocation for null, this aims at providing a similar
abstraction.

This will reduce granularity of error messages, and as the patch sits now it
does so a lot since the message is left to work on - I wanted to see if this
was at all seen as a net positive before spending time on that part. I chose
an elog since I as a user would prefer to hit an elog instead of a silent keep
going with an assert, this is of course debateable.

Thoughts?

--
Daniel Gustafsson

Attachments:

0001-Add-SysCacheGetAttrNotNull-for-guarnteed-not-null-at.patchapplication/octet-stream; name=0001-Add-SysCacheGetAttrNotNull-for-guarnteed-not-null-at.patch; x-unix-mode=0644Download+233-452
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#1)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

Daniel Gustafsson <daniel@yesql.se> writes:

The attached refactoring introduce SysCacheGetAttrNotNull as a wrapper around
SysCacheGetAttr where a NULL value triggers an elog().

+1, seems like a good idea. (I didn't review the patch details.)

This will reduce granularity of error messages, and as the patch sits now it
does so a lot since the message is left to work on - I wanted to see if this
was at all seen as a net positive before spending time on that part. I chose
an elog since I as a user would prefer to hit an elog instead of a silent keep
going with an assert, this is of course debateable.

I'd venture that the Assert cases are mostly from laziness, and
that once we centralize this it's plenty worthwhile to generate
a decent elog message. You ought to be able to look up the
table and column name from the info that is at hand.

Also ... at least in assert-enabled builds, maybe we could check that
the column being fetched this way is actually marked attnotnull?
That would help to catch misuse.

regards, tom lane

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Daniel Gustafsson (#1)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On 28.02.23 21:14, Daniel Gustafsson wrote:

Today we have two fairly common patterns around extracting an attr from a
cached tuple:

a = SysCacheGetAttr(OID, tuple, Anum_pg_foo_bar, &isnull);
Assert(!isnull);

a = SysCacheGetAttr(OID, tuple, Anum_pg_foo_bar, &isnull);
if (isnull)
elog(ERROR, "..");

The attached refactoring introduce SysCacheGetAttrNotNull as a wrapper around
SysCacheGetAttr where a NULL value triggers an elog(). This removes a lot of
boilerplate error handling which IMO leads to increased readability as the
error handling *in these cases* don't add much (there are other cases where
checking isnull does a lot of valuable work of course). Personally I much
prefer the error-out automatically style of APIs like how palloc saves a ton of
checking the returned allocation for null, this aims at providing a similar
abstraction.

Yes please!

I have occasionally wondered whether just passing the isnull argument as
NULL would be sufficient, so we don't need a new function.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#3)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

Yes please!

I have occasionally wondered whether just passing the isnull argument as
NULL would be sufficient, so we don't need a new function.

I thought about that too. I think I prefer Daniel's formulation
with the new function, but I'm not especially set on that.

An advantage of using a new function name is it'd be more obvious
what's wrong if you try to back-patch such code into a branch that
lacks the feature. (Or, of course, we could back-patch the feature.)

regards, tom lane

#5Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#4)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On 1 Mar 2023, at 21:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

Yes please!

I have occasionally wondered whether just passing the isnull argument as
NULL would be sufficient, so we don't need a new function.

I thought about that too. I think I prefer Daniel's formulation
with the new function, but I'm not especially set on that.

I prefer the new function since the name makes the code self documented rather
than developers not used to the API having to look up what the last NULL
actually means.

--
Daniel Gustafsson

#6Peter Eisentraut
peter_e@gmx.net
In reply to: Daniel Gustafsson (#1)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On 28.02.23 21:14, Daniel Gustafsson wrote:

The attached refactoring introduce SysCacheGetAttrNotNull as a wrapper around
SysCacheGetAttr where a NULL value triggers an elog(). This removes a lot of
boilerplate error handling which IMO leads to increased readability as the
error handling *in these cases* don't add much (there are other cases where
checking isnull does a lot of valuable work of course). Personally I much
prefer the error-out automatically style of APIs like how palloc saves a ton of
checking the returned allocation for null, this aims at providing a similar
abstraction.

I looked through the patch. The changes look ok to me. In some cases,
more line breaks could be removed (that is, the whole call could be put
on one line now).

This will reduce granularity of error messages, and as the patch sits now it
does so a lot since the message is left to work on - I wanted to see if this
was at all seen as a net positive before spending time on that part. I chose
an elog since I as a user would prefer to hit an elog instead of a silent keep
going with an assert, this is of course debateable.

I think an error message like

"unexpected null value in system cache %d column %d"

is sufficient. Since these are "can't happen" errors, we don't need to
spend too much extra effort to make it prettier.

I don't think the unlikely() is going to buy much. If you are worried
on that level, SysCacheGetAttrNotNull() ought to be made inline.
Looking through the sites of the changes, I didn't find any callers
where I'd be worried on that level.

#7Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#2)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On 1 Mar 2023, at 00:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also ... at least in assert-enabled builds, maybe we could check that
the column being fetched this way is actually marked attnotnull?
That would help to catch misuse.

We could, but that would limit the API to attnotnull columns, rather than when
the caller knows that the attr cannot be NULL either due to attnotnull or due
to intrinsic knowledge based on what is being extracted.

An example of the latter is build_function_result_tupdesc_t() which knows that
proallargtypes cannot be NULL when calling SysCacheGetAttr.

I think I prefer to allow those cases rather than the strict mode where
attnotnull has to be true, do you think it's preferrable to align the API with
attnotnull and keep the current coding for cases like the above?

--
Daniel Gustafsson

#8Daniel Gustafsson
daniel@yesql.se
In reply to: Peter Eisentraut (#6)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On 2 Mar 2023, at 10:59, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

On 28.02.23 21:14, Daniel Gustafsson wrote:

The attached refactoring introduce SysCacheGetAttrNotNull as a wrapper around
SysCacheGetAttr where a NULL value triggers an elog(). This removes a lot of
boilerplate error handling which IMO leads to increased readability as the
error handling *in these cases* don't add much (there are other cases where
checking isnull does a lot of valuable work of course). Personally I much
prefer the error-out automatically style of APIs like how palloc saves a ton of
checking the returned allocation for null, this aims at providing a similar
abstraction.

I looked through the patch.

Thanks!

The changes look ok to me. In some cases, more line breaks could be removed (that is, the whole call could be put on one line now).

I've tried to find those that would fit on a single line in the attached v2.

This will reduce granularity of error messages, and as the patch sits now it
does so a lot since the message is left to work on - I wanted to see if this
was at all seen as a net positive before spending time on that part. I chose
an elog since I as a user would prefer to hit an elog instead of a silent keep
going with an assert, this is of course debateable.

I think an error message like

"unexpected null value in system cache %d column %d"

is sufficient. Since these are "can't happen" errors, we don't need to spend too much extra effort to make it prettier.

They really should never happen, but since we have all the information we need
it seems reasonable to ease debugging. I've made a slightly extended elog in
the attached patch.

Callsites which had a detailed errormessage have been left passing isnull, like
for example statext_expressions_load().

I don't think the unlikely() is going to buy much. If you are worried on that level, SysCacheGetAttrNotNull() ought to be made inline. Looking through the sites of the changes, I didn't find any callers where I'd be worried on that level.

Fair enough, removed.

--
Daniel Gustafsson

Attachments:

v2-0001-Add-SysCacheGetAttrNotNull-for-guarnteed-not-null.patchapplication/octet-stream; name=v2-0001-Add-SysCacheGetAttrNotNull-for-guarnteed-not-null.patch; x-unix-mode=0644Download+232-453
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#7)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

Daniel Gustafsson <daniel@yesql.se> writes:

On 1 Mar 2023, at 00:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also ... at least in assert-enabled builds, maybe we could check that
the column being fetched this way is actually marked attnotnull?

We could, but that would limit the API to attnotnull columns, rather than when
the caller knows that the attr cannot be NULL either due to attnotnull or due
to intrinsic knowledge based on what is being extracted.
An example of the latter is build_function_result_tupdesc_t() which knows that
proallargtypes cannot be NULL when calling SysCacheGetAttr.

OK, if there are counterexamples then never mind that. I don't think
we want to discourage call sites from using this function.

regards, tom lane

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#6)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

I think an error message like
"unexpected null value in system cache %d column %d"
is sufficient. Since these are "can't happen" errors, we don't need to
spend too much extra effort to make it prettier.

I'd at least like to see it give the catalog's OID. That's easily
convertible to a name, and it doesn't tend to move around across PG
versions, neither of which are true for syscache IDs.

Also, I'm fairly unconvinced that it's a "can't happen" --- this
would be very likely to fire as a result of catalog corruption,
so it would be good if it's at least minimally interpretable
by a non-expert. Given that we'll now have just one copy of the
code, ISTM there's a good case for doing the small extra work
to report catalog and column by name.

regards, tom lane

#11Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#10)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On 2 Mar 2023, at 15:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

I think an error message like
"unexpected null value in system cache %d column %d"
is sufficient. Since these are "can't happen" errors, we don't need to
spend too much extra effort to make it prettier.

I'd at least like to see it give the catalog's OID. That's easily
convertible to a name, and it doesn't tend to move around across PG
versions, neither of which are true for syscache IDs.

Also, I'm fairly unconvinced that it's a "can't happen" --- this
would be very likely to fire as a result of catalog corruption,
so it would be good if it's at least minimally interpretable
by a non-expert. Given that we'll now have just one copy of the
code, ISTM there's a good case for doing the small extra work
to report catalog and column by name.

Rebased v3 on top of recent conflicting ICU changes causing the patch to not
apply anymore. Also took another look around the tree to see if there were
missed callsites but found none new.

--
Daniel Gustafsson

Attachments:

v3-0001-Add-SysCacheGetAttrNotNull-for-guaranteed-not-nul.patchapplication/octet-stream; name=v3-0001-Add-SysCacheGetAttrNotNull-for-guaranteed-not-nul.patch; x-unix-mode=0644Download+232-453
#12Peter Eisentraut
peter_e@gmx.net
In reply to: Daniel Gustafsson (#11)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On 13.03.23 14:19, Daniel Gustafsson wrote:

On 2 Mar 2023, at 15:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

I think an error message like
"unexpected null value in system cache %d column %d"
is sufficient. Since these are "can't happen" errors, we don't need to
spend too much extra effort to make it prettier.

I'd at least like to see it give the catalog's OID. That's easily
convertible to a name, and it doesn't tend to move around across PG
versions, neither of which are true for syscache IDs.

Also, I'm fairly unconvinced that it's a "can't happen" --- this
would be very likely to fire as a result of catalog corruption,
so it would be good if it's at least minimally interpretable
by a non-expert. Given that we'll now have just one copy of the
code, ISTM there's a good case for doing the small extra work
to report catalog and column by name.

Rebased v3 on top of recent conflicting ICU changes causing the patch to not
apply anymore. Also took another look around the tree to see if there were
missed callsites but found none new.

I think the only open question here was the granularity of the error
message, which I think you had addressed in v2.

+	if (isnull)
+	{
+		elog(ERROR,
+			 "unexpected NULL value in cached tuple for pg_catalog.%s.%s",
+			 get_rel_name(cacheinfo[cacheId].reloid),
+			 NameStr(TupleDescAttr(SysCache[cacheId]->cc_tupdesc, 
attributeNumber - 1)->attname));
+	}

I prefer to use "null value" for SQL null values, and NULL for the C symbol.

I'm a bit hesitant about hardcoding pg_catalog here. That happens to be
true, of course, but isn't actually enforced, I think. I think that
could be left off. It's not like people will be confused about which
schema "pg_class.relname" is in.

Also, the cached tuple isn't really for the attribute, so maybe split
that up a bit, like

"unexpected null value in cached tuple for catalog %s column %s"

#13David Rowley
dgrowleyml@gmail.com
In reply to: Daniel Gustafsson (#11)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On Tue, 14 Mar 2023 at 02:19, Daniel Gustafsson <daniel@yesql.se> wrote:

Rebased v3 on top of recent conflicting ICU changes causing the patch to not
apply anymore. Also took another look around the tree to see if there were
missed callsites but found none new.

I had a look at this. It generally seems like a good change.

One thing I thought about while looking is it stage 2 might do
something similar for SearchSysCacheN. I then wondered if we're more
likely to want to keep the localised __FILE__, __LINE__ and __func__
in the elog for those or not. It's probably less important that we're
losing those for this change, but worth noting here at least in case
nobody else thought of it.

I only noticed in a couple of places you have a few lines at 80 chars
before the LF. Ideally those would wrap at 79 so that it's 80
including LF. No big deal though.

David

#14Peter Eisentraut
peter_e@gmx.net
In reply to: David Rowley (#13)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On 23.03.23 09:52, David Rowley wrote:

One thing I thought about while looking is it stage 2 might do
something similar for SearchSysCacheN. I then wondered if we're more
likely to want to keep the localised __FILE__, __LINE__ and __func__
in the elog for those or not. It's probably less important that we're
losing those for this change, but worth noting here at least in case
nobody else thought of it.

I don't follow what you are asking for here.

#15David Rowley
dgrowleyml@gmail.com
In reply to: Peter Eisentraut (#14)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On Fri, 24 Mar 2023 at 20:31, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 23.03.23 09:52, David Rowley wrote:

One thing I thought about while looking is it stage 2 might do
something similar for SearchSysCacheN. I then wondered if we're more
likely to want to keep the localised __FILE__, __LINE__ and __func__
in the elog for those or not. It's probably less important that we're
losing those for this change, but worth noting here at least in case
nobody else thought of it.

I don't follow what you are asking for here.

I had two points:

1. Doing something similar for SearchSysCache1 and co might be a good
phase two to this change.
2. With the change Daniel is proposing here, \set VERBOSITY verbose is
not going to print as useful information to tracking down where any
unexpected nulls in the catalogue originates.

For #2, I don't think that's necessarily a problem. I can think of two
reasons why SysCacheGetAttrNotNull might throw an ERROR:

a) We used SysCacheGetAttrNotNull() when we should have used SysCacheGetAttr().
b) Catalogue corruption.

A more localised ERROR message might just help more easily tracking
down type a) problems. I imagine it won't be too difficult to just
grep for all the SysCacheGetAttrNotNull calls for the particular
nullable column to find the one causing the issue. For b), the error
message in SysCacheGetAttrNotNull is sufficient without needing to
know where the SysCacheGetAttrNotNull call came from.

David

#16Daniel Gustafsson
daniel@yesql.se
In reply to: David Rowley (#15)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On 24 Mar 2023, at 21:59, David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 24 Mar 2023 at 20:31, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

On 23.03.23 09:52, David Rowley wrote:

One thing I thought about while looking is it stage 2 might do
something similar for SearchSysCacheN. I then wondered if we're more
likely to want to keep the localised __FILE__, __LINE__ and __func__
in the elog for those or not. It's probably less important that we're
losing those for this change, but worth noting here at least in case
nobody else thought of it.

I don't follow what you are asking for here.

I had two points:

1. Doing something similar for SearchSysCache1 and co might be a good
phase two to this change.

Quite possibly yes, they do follow a pretty repeatable pattern.

2. With the change Daniel is proposing here, \set VERBOSITY verbose is
not going to print as useful information to tracking down where any
unexpected nulls in the catalogue originates.

Thats a fair point for the elog() removals, for the rather many assertions it
might be a net positive to get a non-local elog when failing in production.

--
Daniel Gustafsson

#17Daniel Gustafsson
daniel@yesql.se
In reply to: Peter Eisentraut (#12)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On 14 Mar 2023, at 08:00, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

I prefer to use "null value" for SQL null values, and NULL for the C symbol.

Thats a fair point, I agree with that.

I'm a bit hesitant about hardcoding pg_catalog here. That happens to be true, of course, but isn't actually enforced, I think. I think that could be left off. It's not like people will be confused about which schema "pg_class.relname" is in.

Also, the cached tuple isn't really for the attribute, so maybe split that up a bit, like

"unexpected null value in cached tuple for catalog %s column %s"

No objections, so changed to that wording.

With these changes and a pgindent run across it per Davids comment downthread,
I've pushed this now. Thanks for review!

I'm keeping a watchful eye on the buildfarm; francolin has errored in
recoveryCheck which I'm looking into but at first glance I don't think it's
related (other animals have since passed it and it works locally, but I'll keep
digging at it to make sure).

--
Daniel Gustafsson

#18Justin Pryzby
pryzby@telsasoft.com
In reply to: Daniel Gustafsson (#11)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

On Mon, Mar 13, 2023 at 02:19:07PM +0100, Daniel Gustafsson wrote:

On 2 Mar 2023, at 15:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

I think an error message like
"unexpected null value in system cache %d column %d"
is sufficient. Since these are "can't happen" errors, we don't need to
spend too much extra effort to make it prettier.

I'd at least like to see it give the catalog's OID. That's easily
convertible to a name, and it doesn't tend to move around across PG
versions, neither of which are true for syscache IDs.

Also, I'm fairly unconvinced that it's a "can't happen" --- this
would be very likely to fire as a result of catalog corruption,
so it would be good if it's at least minimally interpretable
by a non-expert. Given that we'll now have just one copy of the
code, ISTM there's a good case for doing the small extra work
to report catalog and column by name.

Rebased v3 on top of recent conflicting ICU changes causing the patch to not
apply anymore. Also took another look around the tree to see if there were
missed callsites but found none new.

+++ b/src/backend/utils/cache/syscache.c
@@ -77,6 +77,7 @@
 #include "catalog/pg_user_mapping.h"
 #include "lib/qunique.h"
 #include "utils/catcache.h"
+#include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/syscache.h"
@@ -1099,6 +1100,32 @@ SysCacheGetAttr(int cacheId, HeapTuple tup,
+               elog(ERROR,
+                        "unexpected NULL value in cached tuple for pg_catalog.%s.%s",
+                        get_rel_name(cacheinfo[cacheId].reloid),

Question: Is it safe to be making catalog access inside an error
handler, when one of the most likely reason for hitting the error is
catalog corruption ?

Maybe the answer is that it's not "safe" but "safe enough" - IOW, if
you're willing to throw an assertion, it's good enough to try to show
the table name, and if the error report crashes the server, that's "not
much worse" than having Asserted().

--
Justin

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#18)
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL

Justin Pryzby <pryzby@telsasoft.com> writes:

On Mon, Mar 13, 2023 at 02:19:07PM +0100, Daniel Gustafsson wrote:

+               elog(ERROR,
+                        "unexpected NULL value in cached tuple for pg_catalog.%s.%s",
+                        get_rel_name(cacheinfo[cacheId].reloid),

Question: Is it safe to be making catalog access inside an error
handler, when one of the most likely reason for hitting the error is
catalog corruption ?

I don't see a big problem here. If there were catalog corruption
preventing fetching the catalog's pg_class row, it's highly unlikely
that you'd have managed to retrieve a catalog row to complain about.
(That is, corruption in this particular catalog entry probably does
not extend to the metadata about the catalog containing it.)

Maybe the answer is that it's not "safe" but "safe enough"

Right.

If we got concerned about this we could dodge the extra catalog access
by adding the catalog's name to CatCache entries. I doubt it's worth
it though. We can always re-evaluate if we see actual evidence of
problems.

regards, tom lane