Patch: Code comments: why some text-handling functions are leakproof

Started by Gurjeet Singhabout 4 years ago4 messages

gurjeet@singh.im

about 4 years ago

1 attachment(s)

Please see attached a small patch to document why some text-processing
functions are marked as leakproof, while some others are not.

This is more or less a verbatim copy of Tom's comment in email thread at
[1]: /messages/by-id/673096.1630006990@sss.pgh.pa.us

I could not find an appropriate spot to place these comments, so I placed
them on bttextcmp() function, The only other place that I could see we can
place these comments is in the file src/backend/optimizer/README, because
there is some consideration given to leakproof functions in optimizer docs.
But these comments seem quite out of place in optimizer docs.

[1]: /messages/by-id/673096.1630006990@sss.pgh.pa.us
/messages/by-id/673096.1630006990@sss.pgh.pa.us

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/

Attachments:

leakproof_comments.patchapplication/octet-stream; name=leakproof_comments.patchDownload

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index bd3091bbfb..4e464ad09d 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1974,6 +1974,25 @@ text_starts_with(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(result);
 }
 
+/*
+ * Generally speaking, we should be resistant to marking anything leakproof
+ * unless it has a very small code footprint that can be easily audited. In
+ * particular, anything that shares a lot of infrastructure with not-leakproof
+ * functions seems quite hazardous. Even if we go through the code and
+ * convince ourselves that it's OK today, innocent changes to the shared
+ * infrastructure could break the leakproofness tomorrow.
+ *
+ * The function bttextcmp() and its cohorts are marked as leakproof, but the
+ * user-visible string processing functions, like upper(), are not, because:
+ *
+ * 1. The query-optimization usefulness of having those be leakproof
+ * is extremely high.
+ *
+ * 2. btree comparison functions should really not have any user-reachable
+ * failure modes (which comes close to being the definition of leakproof). If one
+ * did, that would mean there were legal values of the type that couldn't be put
+ * into a btree index.
+ */
 Datum
 bttextcmp(PG_FUNCTION_ARGS)
 {

Robert Haas

robertmhaas@gmail.com

almost 4 years ago

In reply to: Gurjeet Singh (#1)

Re: Patch: Code comments: why some text-handling functions are leakproof

On Tue, Jan 11, 2022 at 2:07 AM Gurjeet Singh <gurjeet@singh.im> wrote:

Please see attached a small patch to document why some text-processing functions are marked as leakproof, while some others are not.

This is more or less a verbatim copy of Tom's comment in email thread at [1].

I could not find an appropriate spot to place these comments, so I placed them on bttextcmp() function, The only other place that I could see we can place these comments is in the file src/backend/optimizer/README, because there is some consideration given to leakproof functions in optimizer docs. But these comments seem quite out of place in optimizer docs.

It doesn't seem particularly likely that someone who is thinking about
changing this in the future would notice the comment in the place
where you propose to put it, nor that they would read the optimizer
README.

Furthermore, I don't know that everyone agrees with Tom about this. I
do agree that it's more important to mark relational operators
leakproof than other things, and I also agree that conservatism is
warranted. But that does not mean that someone could not make a
compelling argument for marking other functions leakproof.

I think we will be better off leaving this alone.

--
Robert Haas
EDB: http://www.enterprisedb.com

Tom Lane

tgl@sss.pgh.pa.us

almost 4 years ago

In reply to: Robert Haas (#2)

Re: Patch: Code comments: why some text-handling functions are leakproof

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 11, 2022 at 2:07 AM Gurjeet Singh <gurjeet@singh.im> wrote:

This is more or less a verbatim copy of Tom's comment in email thread at [1].

I could not find an appropriate spot to place these comments, so I placed them on bttextcmp() function, The only other place that I could see we can place these comments is in the file src/backend/optimizer/README, because there is some consideration given to leakproof functions in optimizer docs. But these comments seem quite out of place in optimizer docs.

It doesn't seem particularly likely that someone who is thinking about
changing this in the future would notice the comment in the place
where you propose to put it, nor that they would read the optimizer
README.

Agreed. I think if we wanted to make an upgrade in the way function
leakproofness is documented, we ought to add a <sect1> about it in
xfunc.sgml, adjacent to the one about function volatility categories.
This could perhaps consolidate some of the existing documentation mentions
of leakproofness, as well as adding text similar to what Gurjeet suggests.

Furthermore, I don't know that everyone agrees with Tom about this. I
do agree that it's more important to mark relational operators
leakproof than other things, and I also agree that conservatism is
warranted. But that does not mean that someone could not make a
compelling argument for marking other functions leakproof.

ISTM the proposed text does a reasonable job of explaining why
we made the decisions currently embedded in pg_proc.proleakproof.
If we make some other decisions in future, updating the rationale
in the docs would be an appropriate part of that.

regards, tom lane

Greg Stark

stark@mit.edu

almost 4 years ago

In reply to: Tom Lane (#3)

Re: Patch: Code comments: why some text-handling functions are leakproof

I'm going to mark this returned with feedback.

If you have a chance to update the patch moving the documentation to
xfunc.sgml the way Tom describes make sure to create a new commitfest
entry. I would suggest submitting the patch as a followup on this
thread so when it's added to the commitfest it links to this whole
discussion.

On Mon, 28 Feb 2022 at 17:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 11, 2022 at 2:07 AM Gurjeet Singh <gurjeet@singh.im> wrote:

This is more or less a verbatim copy of Tom's comment in email thread at [1].

I could not find an appropriate spot to place these comments, so I placed them on bttextcmp() function, The only other place that I could see we can place these comments is in the file src/backend/optimizer/README, because there is some consideration given to leakproof functions in optimizer docs. But these comments seem quite out of place in optimizer docs.

It doesn't seem particularly likely that someone who is thinking about
changing this in the future would notice the comment in the place
where you propose to put it, nor that they would read the optimizer
README.

Agreed. I think if we wanted to make an upgrade in the way function
leakproofness is documented, we ought to add a <sect1> about it in
xfunc.sgml, adjacent to the one about function volatility categories.
This could perhaps consolidate some of the existing documentation mentions
of leakproofness, as well as adding text similar to what Gurjeet suggests.

Furthermore, I don't know that everyone agrees with Tom about this. I
do agree that it's more important to mark relational operators
leakproof than other things, and I also agree that conservatism is
warranted. But that does not mean that someone could not make a
compelling argument for marking other functions leakproof.

ISTM the proposed text does a reasonable job of explaining why
we made the decisions currently embedded in pg_proc.proleakproof.
If we make some other decisions in future, updating the rationale
in the docs would be an appropriate part of that.

regards, tom lane

--
greg