[PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

Started by Aleksander Alekseevover 9 years ago6 messageshackers
Jump to latest
#1Aleksander Alekseev
aleksander@timescale.com

Hi.

I noticed that there is a lot of repeating code like this:

```
if (strspn(str, " \t\n\r\f") == strlen(str))
```

I personally don't find it particularly readable, not mentioning that
traversing a string twice doesn't look as a good idea (you can check
using objdump that latest GCC 6.2 doesn't optimize this code).

How about rewriting such a code like this?

```
if (pg_str_containsonly(str, " \t\n\r\f"))
```

Corresponding patch is attached. I don't claim that my implementation of
pg_str_containsonly procedure is faster that strspn + strlen, but at
least such refactoring makes code a little more readable.

--
Best regards,
Aleksander Alekseev

Attachments:

pg_str_containsonly-v1.patchtext/x-diff; charset=us-asciiDownload+70-22
#2Geoff Winkless
pgsqladmin@geoff.dj
In reply to: Aleksander Alekseev (#1)
Re: [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

On 8 December 2016 at 15:54, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:

Hi.

I noticed that there is a lot of repeating code like this:

```
if (strspn(str, " \t\n\r\f") == strlen(str))
```

I personally don't find it particularly readable, not mentioning that
traversing a string twice doesn't look as a good idea (you can check
using objdump that latest GCC 6.2 doesn't optimize this code).

You could just change it to

if (str[strspn(str, " \t\n\r\f")] == '\0')

to mitigate calling strlen. It's safe to do so because strspn will
only return values from 0 to strlen(str).

Geoff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Aleksander Alekseev (#1)
Re: [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

Aleksander Alekseev <a.alekseev@postgrespro.ru> writes:

How about rewriting such a code like this?
if (pg_str_containsonly(str, " \t\n\r\f"))

Function name seems weirdly spelled. Also, I believe that in nearly all
use-cases the number of data characters that would typically be examined
is small, so I have serious doubts that the "optimized" implementation
you propose is actually faster than a naive one; it may be slower, and
it's certainly longer and harder to understand/test.

Whether it's worth worrying about, I dunno. This is hardly the only
C idiom you need to be familiar with to read the PG code.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Aleksander Alekseev
aleksander@timescale.com
In reply to: Tom Lane (#3)
Re: [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

Tom, Geoff,

Thanks for your feedback! Here is version 2 of this patch.

You could just change it to
if (str[strspn(str, " \t\n\r\f")] == '\0')
to mitigate calling strlen. It's safe to do so because strspn will
only return values from 0 to strlen(str).

[...] I have serious doubts that the "optimized" implementation
you propose is actually faster than a naive one; it may be slower, and
it's certainly longer and harder to understand/test.

I would like to point out that I never said it's optimized. However I
like the code Geoff proposed. It definitely doesn't make anything worse.
I decided to keep pg_str_contansonly procedure (i.e. not to inline this
code) for now. Code with strspn looks OK in a simple example. However in
a concrete context it looks like a bad Perl code in ROT13 to me:

```
/* try to figure out what's exactly going on */
if(somelongname[strspn(somelongname /* yes, again */, "longlistofchars")] != '\0')
```

Function name seems weirdly spelled.

I named it the same way pg_str_endswith is named. However I'm open for
better suggestions here.

Whether it's worth worrying about, I dunno. This is hardly the only
C idiom you need to be familiar with to read the PG code.

Well, at least this v2 version of the patch removes second string
scanning. And I still believe that this inlined strspn is not readable
or obvious at all.

--
Best regards,
Aleksander Alekseev

Attachments:

pg_str_containsonly-v2.patchtext/x-diff; charset=us-asciiDownload+43-22
#5Michael Paquier
michael@paquier.xyz
In reply to: Aleksander Alekseev (#4)
Re: [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

On Fri, Dec 9, 2016 at 3:23 AM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:

You could just change it to
if (str[strspn(str, " \t\n\r\f")] == '\0')
to mitigate calling strlen. It's safe to do so because strspn will
only return values from 0 to strlen(str).

[...] I have serious doubts that the "optimized" implementation
you propose is actually faster than a naive one; it may be slower, and
it's certainly longer and harder to understand/test.

I would like to point out that I never said it's optimized. However I
like the code Geoff proposed. It definitely doesn't make anything worse.
I decided to keep pg_str_contansonly procedure (i.e. not to inline this
code) for now. Code with strspn looks OK in a simple example. However in
a concrete context it looks like a bad Perl code in ROT13 to me:

Looking at this patch, I am not sure that it is worth worrying about.
This is a receipt to make back-patching a bit more complicated, and it
makes the code more complicated to understand. So I would vote for
rejecting it and move on.

By the way, as you are placing this routine in src/common/, you may
want to consider updating the code in src/bin/ that use libpqcommon.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#5)
Re: [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

On Wed, Jan 11, 2017 at 4:50 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Looking at this patch, I am not sure that it is worth worrying about.
This is a receipt to make back-patching a bit more complicated, and it
makes the code more complicated to understand. So I would vote for
rejecting it and move on.

I have done so for now to make the CF move, if somebody wants to
complain feel free...
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers