Replace remaining StrNCpy() by strlcpy()

Started by Peter Eisentrautover 5 years ago12 messageshackers
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

I propose to replace the remaining uses of StrNCpy() with strlcpy() and
remove the former. It's clear that strlcpy() has won the popularity
contest, and the presence of the former is just confusing now.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Replace-remaining-StrNCpy-by-strlcpy.patchtext/plain; charset=UTF-8; name=0001-Replace-remaining-StrNCpy-by-strlcpy.patch; x-mac-creator=0; x-mac-type=0Download+24-62
#2David Rowley
dgrowleyml@gmail.com
In reply to: Peter Eisentraut (#1)
Re: Replace remaining StrNCpy() by strlcpy()

On Mon, 3 Aug 2020 at 18:59, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

I propose to replace the remaining uses of StrNCpy() with strlcpy() and
remove the former. It's clear that strlcpy() has won the popularity
contest, and the presence of the former is just confusing now.

It certainly would be good to get rid of some of these, but are some
of the changes not a bit questionable?

e.g:

@@ -4367,7 +4367,7 @@ pgstat_send_archiver(const char *xlog, bool failed)
  */
  pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_ARCHIVER);
  msg.m_failed = failed;
- StrNCpy(msg.m_xlog, xlog, sizeof(msg.m_xlog));
+ strlcpy(msg.m_xlog, xlog, sizeof(msg.m_xlog));
  msg.m_timestamp = GetCurrentTimestamp();
  pgstat_send(&msg, sizeof(msg));

Will mean that we'll now no longer zero the full length of the m_xlog
field after the end of the string. Won't that mean we'll start writing
junk bytes to the stats collector?

David

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#2)
Re: Replace remaining StrNCpy() by strlcpy()

David Rowley <dgrowleyml@gmail.com> writes:

- StrNCpy(msg.m_xlog, xlog, sizeof(msg.m_xlog));
+ strlcpy(msg.m_xlog, xlog, sizeof(msg.m_xlog));

Will mean that we'll now no longer zero the full length of the m_xlog
field after the end of the string. Won't that mean we'll start writing
junk bytes to the stats collector?

StrNCpy doesn't zero-fill the destination today either (except for
the very last byte). If you need that, you need to memset the
dest buffer ahead of time.

I didn't review the patch in complete detail, but the principle
seems sound to me, and strlcpy is surely more standard than StrNCpy.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
Re: Replace remaining StrNCpy() by strlcpy()

I wrote:

David Rowley <dgrowleyml@gmail.com> writes:

Will mean that we'll now no longer zero the full length of the m_xlog
field after the end of the string. Won't that mean we'll start writing
junk bytes to the stats collector?

StrNCpy doesn't zero-fill the destination today either (except for
the very last byte).

Oh, no, I take that back --- didn't read all of the strncpy man
page :-(. Yeah, this is a point. We'd need to check each call
site to see whether the zero-padding matters.

In the specific case of the stats collector, if you don't want
to be sending junk bytes then you'd better be memset'ing the
whole message buffer not just this string field. So I'm not
sure that the argument has any force there. But in places
like namecpy() and namestrcpy() we absolutely do mean to be
zeroing the whole destination buffer.

memset plus strlcpy might still be preferable to StrNCpy for
readability by people new to Postgres; but it's less of a
slam dunk than I thought.

regards, tom lane

#5Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#4)
Re: Replace remaining StrNCpy() by strlcpy()

On 2020-08-03 14:12, Tom Lane wrote:

I wrote:

David Rowley <dgrowleyml@gmail.com> writes:

Will mean that we'll now no longer zero the full length of the m_xlog
field after the end of the string. Won't that mean we'll start writing
junk bytes to the stats collector?

StrNCpy doesn't zero-fill the destination today either (except for
the very last byte).

Oh, no, I take that back --- didn't read all of the strncpy man
page :-(. Yeah, this is a point. We'd need to check each call
site to see whether the zero-padding matters.

Oh, that's easy to miss.

In the specific case of the stats collector, if you don't want
to be sending junk bytes then you'd better be memset'ing the
whole message buffer not just this string field. So I'm not
sure that the argument has any force there. But in places
like namecpy() and namestrcpy() we absolutely do mean to be
zeroing the whole destination buffer.

That's easy to fix, but it's perhaps wondering briefly why it needs to
be zero-padded. hashname() doesn't care, heap_form_tuple() doesn't
care. Does anything care?

While we're here, shouldn't namestrcpy() do some pg_mbcliplen() stuff
like namein()?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#5)
Re: Replace remaining StrNCpy() by strlcpy()

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

On 2020-08-03 14:12, Tom Lane wrote:

In the specific case of the stats collector, if you don't want
to be sending junk bytes then you'd better be memset'ing the
whole message buffer not just this string field. So I'm not
sure that the argument has any force there. But in places
like namecpy() and namestrcpy() we absolutely do mean to be
zeroing the whole destination buffer.

That's easy to fix, but it's perhaps wondering briefly why it needs to
be zero-padded. hashname() doesn't care, heap_form_tuple() doesn't
care. Does anything care?

We do have an expectation that there are no undefined bytes in values to
be stored on-disk. There's even some code in coerce_type() that will
complain about this:

* For pass-by-reference data types, repeat the conversion to see if
* the input function leaves any uninitialized bytes in the result. We
* can only detect that reliably if RANDOMIZE_ALLOCATED_MEMORY is
* enabled, so we don't bother testing otherwise. The reason we don't
* want any instability in the input function is that comparison of
* Const nodes relies on bytewise comparison of the datums, so if the
* input function leaves garbage then subexpressions that should be
* identical may not get recognized as such. See pgsql-hackers
* discussion of 2008-04-04.

While we're here, shouldn't namestrcpy() do some pg_mbcliplen() stuff
like namein()?

Excellent point --- probably so, unless the callers are all truncating
in advance, which I doubt.

regards, tom lane

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#6)
Re: Replace remaining StrNCpy() by strlcpy()

On 2020-08-03 19:39, Tom Lane wrote:

That's easy to fix, but it's perhaps wondering briefly why it needs to
be zero-padded. hashname() doesn't care, heap_form_tuple() doesn't
care. Does anything care?

We do have an expectation that there are no undefined bytes in values to
be stored on-disk. There's even some code in coerce_type() that will
complain about this:

Okay, here is a new patch with improved implementations of namecpy() and
namestrcpy(). I didn't see any other places that relied on the
zero-filling behavior of strncpy().

While we're here, shouldn't namestrcpy() do some pg_mbcliplen() stuff
like namein()?

Excellent point --- probably so, unless the callers are all truncating
in advance, which I doubt.

I will look into that separately.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v2-0001-Replace-remaining-StrNCpy-by-strlcpy.patchtext/plain; charset=UTF-8; name=v2-0001-Replace-remaining-StrNCpy-by-strlcpy.patch; x-mac-creator=0; x-mac-type=0Download+26-62
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#7)
Re: Replace remaining StrNCpy() by strlcpy()

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

Okay, here is a new patch with improved implementations of namecpy() and
namestrcpy(). I didn't see any other places that relied on the
zero-filling behavior of strncpy().

I've looked through this patch, and I concur with your conclusion that
noplace else is depending on zero-fill, with the exception of the one
place in pgstat.c that David already noted. But the issue there is only
that valgrind might bitch about send()'ing undefined bytes, and ISTM
that the existing suppressions in our valgrind.supp should already
handle it, since we already have other pgstat messages with pad bytes.

However I do see one remaining nit to pick, in CreateInitDecodingContext:

 	/* register output plugin name with slot */
 	SpinLockAcquire(&slot->mutex);
-	StrNCpy(NameStr(slot->data.plugin), plugin, NAMEDATALEN);
+	namestrcpy(&slot->data.plugin, plugin);
 	SpinLockRelease(&slot->mutex);

This is already a pro-forma violation of our rule about "only
straight-line code inside a spinlock". Now I'm not terribly concerned
about that right now, and the patch as it stands is only changing things
cosmetically. But if you modify namestrcpy to do pg_mbcliplen then all
of a sudden there is a whole lot of code that could get reached within
the spinlock, and I'm not a bit happy about that prospect.

The least-risk fix would be to namestrcpy() into a local variable
and then just use a plain memcpy() inside the spinlock. There might
be better ways if we're willing to make assumptions about what the
plugin name can be. For that matter, do we really need a spinlock
here at all? Why is the plugin name critical but the rest of the
slot not?

BTW, while we're here I think we ought to change namecpy and namestrcpy
to return void (no caller checks their results) and drop their checks
for null-pointer inputs. AFAICS a null pointer would be a caller bug in
every case, and if it isn't, why is failing to initialize the
destination an OK outcome? I find the provisions for nulls in namestrcmp
pretty useless too, although it looks like at least some thought has
been spent there.

I think this is committable once these points are addressed.

regards, tom lane

#9David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#4)
Re: Replace remaining StrNCpy() by strlcpy()

On Tue, 4 Aug 2020 at 00:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

David Rowley <dgrowleyml@gmail.com> writes:

Will mean that we'll now no longer zero the full length of the m_xlog
field after the end of the string. Won't that mean we'll start writing
junk bytes to the stats collector?

StrNCpy doesn't zero-fill the destination today either (except for
the very last byte).

Oh, no, I take that back --- didn't read all of the strncpy man
page :-(. Yeah, this is a point. We'd need to check each call
site to see whether the zero-padding matters.

I just had a thought that even strlcpy() is not really ideal for many
of our purposes for it.

Currently we still have cruddy code like:

strlcpy(fullname, pg_TZDIR(), sizeof(fullname));
if (strlen(fullname) + 1 + strlen(name) >= MAXPGPATH)
return -1; /* not gonna fit */
strcat(fullname, "/");
strcat(fullname, name);

If strlcpy() had been designed differently to take a signed size and
do nothing when the size is <= 0 then we could have had much cleaner
implementations of the above:

size_t len;
len = strlcpy(fullname, pg_TZDIR(), sizeof(fullname));
len += strlcpy(fullname + len, "/", sizeof(fullname) - len);
len += strlcpy(fullname + len, name, sizeof(fullname) - len);
if (len >= sizeof(fullname))
return -1; /* didn't fit */

This should be much more efficient, in general, due to the lack of
strlen() calls and the concatenation not having to refind the end of
the string again each time.

Now, I'm not saying we should change strlcpy() to take a signed type
instead of size_t to allow this to work. Reusing that name for another
purpose is likely a bad idea that will lead to misuse and confusion.
What I am saying is that perhaps strlcpy() is not all that it's
cracked up to be and it could have been done better. Perhaps we can
invent our own version that fixes the shortcomings.

Just a thought.

On the other hand, perhaps we're not using the return value of
strlcpy() enough for such a change to make sense. However, a quick
glance shows we certainly could use it more often, e.g:

if (parsed->xinfo & XACT_XINFO_HAS_GID)
{
strlcpy(parsed->twophase_gid, data, sizeof(parsed->twophase_gid));
data += strlen(data) + 1;
}

David

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#8)
Re: Replace remaining StrNCpy() by strlcpy()

On 2020-08-05 17:49, Tom Lane wrote:

However I do see one remaining nit to pick, in CreateInitDecodingContext:

/* register output plugin name with slot */
SpinLockAcquire(&slot->mutex);
-	StrNCpy(NameStr(slot->data.plugin), plugin, NAMEDATALEN);
+	namestrcpy(&slot->data.plugin, plugin);
SpinLockRelease(&slot->mutex);

This is already a pro-forma violation of our rule about "only
straight-line code inside a spinlock". Now I'm not terribly concerned
about that right now, and the patch as it stands is only changing things
cosmetically. But if you modify namestrcpy to do pg_mbcliplen then all
of a sudden there is a whole lot of code that could get reached within
the spinlock, and I'm not a bit happy about that prospect.

fixed

BTW, while we're here I think we ought to change namecpy and namestrcpy
to return void (no caller checks their results) and drop their checks
for null-pointer inputs. AFAICS a null pointer would be a caller bug in
every case, and if it isn't, why is failing to initialize the
destination an OK outcome? I find the provisions for nulls in namestrcmp
pretty useless too, although it looks like at least some thought has
been spent there.

fixed

I removed namecpy() altogether because you can just use struct assignment.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v3-0001-Replace-remaining-StrNCpy-by-strlcpy.patchtext/plain; charset=UTF-8; name=v3-0001-Replace-remaining-StrNCpy-by-strlcpy.patch; x-mac-creator=0; x-mac-type=0Download+34-107
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#10)
Re: Replace remaining StrNCpy() by strlcpy()

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

I removed namecpy() altogether because you can just use struct assignment.

Makes sense, and I notice it was unused anyway.

v3 passes eyeball examination (I didn't bother running tests), with
only one remaining nit: the proposed commit message says

They are equivalent,

which per this thread is incorrect. Somebody might possibly refer to this
commit for guidance in updating third-party code, so I don't think we want
to leave a misleading claim here. Perhaps something like

They are equivalent, except that StrNCpy zero-fills the entire
destination buffer instead of providing just one trailing zero.
For all but a tiny number of callers, that's just overhead rather
than being desirable.

regards, tom lane

#12Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#11)
Re: Replace remaining StrNCpy() by strlcpy()

On 2020-08-08 18:09, Tom Lane wrote:

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

I removed namecpy() altogether because you can just use struct assignment.

Makes sense, and I notice it was unused anyway.

v3 passes eyeball examination (I didn't bother running tests), with
only one remaining nit: the proposed commit message says

They are equivalent,

which per this thread is incorrect. Somebody might possibly refer to this
commit for guidance in updating third-party code, so I don't think we want
to leave a misleading claim here. Perhaps something like

They are equivalent, except that StrNCpy zero-fills the entire
destination buffer instead of providing just one trailing zero.
For all but a tiny number of callers, that's just overhead rather
than being desirable.

Committed with that change.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services