Non-compliant SASLprep implementation for ASCII characters

Started by Michael Paquier5 months ago9 messageshackers

michael@paquier.xyz

5 months ago

Hi all,

While reviewing some of the SCRAM code, I have been reminded about the
following bit of code in saslprep.c:
/*
* Quick check if the input is pure ASCII. An ASCII string requires no
* further processing.
*/
if (pg_is_ascii(input))
{
*output = STRDUP(input);
if (!(*output))
goto oom;
return SASLPREP_SUCCESS;
}

And after cross-checking that with RFCs 3454 (Stringprep) and 4013
(SASLprep), I got reminded of the fact that this implementation
artifact is wrong because not all ASCII characters are allowed:
- 0x00~0x1F (0~31), control characters, are prohibited.
- 0x7F (127, DEL) is prohibited.

The rest of the ASCII character range is OK.

Another question one may ask is: does making our SCRAM implementation
compliant impact our SCRAM implementation at all? The answer to this
question is no. If we are dealing with an ASCII-only password with
prohibited characters, our calls of pg_saslprep() deal with the SCRAM
verifiers generated by CREATE/ALTER role and the SASLprep() calls done
during an exchange the same way: even if we have prohibited ASCII
characters, the bytes are fed as-is to the scram build code. All our
callers of pg_saslprep() make sure that the same thing happens.

In short, I see no downside in just making our implementation
compliant, which should be actually beneficial for future callers of
this routine, should we have any. One point can be made for the
efficiency of checking ASCII-only passwords, but the default count of
4096 used for the computation of the SCRAM verifiers outweights that
point by far IMO: the SCRAM computation is more expensive than this
ASCII-only shortcut anyway.

Attached are two patches, that I'd like to propose for this commit
fest:
- 0001 is a test suite that I have been relying on for some time,
introduced as the test module test_saslprep. One artifact that Heikki
has mentioned to me offline while discussing this tool is that we
could also have a check for the entire range of valid UTF8 codepoints
to make sure that we never return an empty password for all these
codepoints. This check is slightly expensive (3s on my laptop, which
is not bad still a bit expensive), so I have implemented that as a TAP
test controlled by a PG_TEST_EXTRA. The only exception for the empty
password case is the nul character, that we disallow in CREATE/ALTER
ROLE. This test suite also adds a test to cover 390b3cbbb2af with an
incomplete UTF8 sequence, as a nice bonus.
- 0002 is the change to make the implementation compliant, impacting
the tests. This removes nul from the list of valid cases, and the SQL
tests show the compliant behavior.

Even if we don't do 0002, 0001 shows benefits of its own.

I am adding that to the upcoming CF.
Thanks,
--
Michael

Michael Paquier

michael@paquier.xyz

5 months ago

In reply to: Michael Paquier (#1)

Re: Non-compliant SASLprep implementation for ASCII characters

On Fri, Feb 27, 2026 at 12:05:28PM +0900, Michael Paquier wrote:

- 0001 is a test suite that I have been relying on for some time,
introduced as the test module test_saslprep. One artifact that Heikki
has mentioned to me offline while discussing this tool is that we
could also have a check for the entire range of valid UTF8 codepoints
to make sure that we never return an empty password for all these
codepoints. This check is slightly expensive (3s on my laptop, which
is not bad still a bit expensive), so I have implemented that as a TAP
test controlled by a PG_TEST_EXTRA. The only exception for the empty
password case is the nul character, that we disallow in CREATE/ALTER
ROLE. This test suite also adds a test to cover 390b3cbbb2af with an
incomplete UTF8 sequence, as a nice bonus.

While thinking more about this one, I have come up with a smarter
query based on set_byte() to build a full range of byteas for the
ASCII characters to check, leading to this simpler pattern:
SELECT set_byte('\x00'::bytea, 0, a) FROM generate_series(0, 127);

A second thing that I have adjusted is the output for non-printable
characters, using a CASE/WHEN shortcut. Attached is an updated
version of the patch set with these adjustments.
--
Michael

John Naylor

john.naylor@enterprisedb.com

4 months ago

In reply to: Michael Paquier (#1)

Re: Non-compliant SASLprep implementation for ASCII characters

On Fri, Feb 27, 2026 at 10:05 AM Michael Paquier <michael@paquier.xyz> wrote:

Even if we don't do 0002, 0001 shows benefits of its own.

Seems sensible to me. I only have minor nitpicks:

+operation for a single byte as well as a range of these, acting as thin
+wrappers standing on top of pg_saslprep().

It's more natural to say "wrappers around", at least that's what comes to me.

+ if (unlikely(utf8_len == 0))

The exceptional path only has two lines of code, so it's unclear what
this hint is trying to do. This module isn't run by default anyway

+ MemSet(nulls, false, sizeof(nulls));

Regular "memset" with a 4-byte constant input is easily inline-able by
the compiler, and I think we should use our homegrown implementation
only when there is a specific reason for it. (I know there are many
dozens of uses without a reason already, but...)

-is($result, 'U+0000|SUCCESS|\x00|\x', "Only nul authorized for all
valid UTF8 codepoints");
+is($result, '', "No empty or NULL values for all valid UTF8 codepoints");

I don't quite understand "only nul authorized..." -- I understand the
explanation in your email, but I having difficulty with the way it's
phrased here. (Although it'll be moot if we go ahead with 0002)

--
John Naylor
Amazon Web Services

Michael Paquier

michael@paquier.xyz

4 months ago

In reply to: John Naylor (#3)

Re: Non-compliant SASLprep implementation for ASCII characters

On Wed, Mar 18, 2026 at 06:34:03PM +0700, John Naylor wrote:

Seems sensible to me. I only have minor nitpicks:

Thanks for the review of the module.

+operation for a single byte as well as a range of these, acting as thin
+wrappers standing on top of pg_saslprep().
It's more natural to say "wrappers around", at least that's what comes to me.

Fixed.

+ if (unlikely(utf8_len == 0))

The exceptional path only has two lines of code, so it's unclear what
this hint is trying to do. This module isn't run by default anyway

Removed that.

+ MemSet(nulls, false, sizeof(nulls));

Regular "memset" with a 4-byte constant input is easily inline-able by
the compiler, and I think we should use our homegrown implementation
only when there is a specific reason for it. (I know there are many
dozens of uses without a reason already, but...)

Removed that.

-is($result, 'U+0000|SUCCESS|\x00|\x', "Only nul authorized for all
valid UTF8 codepoints");
+is($result, '', "No empty or NULL values for all valid UTF8 codepoints");
I don't quite understand "only nul authorized..." -- I understand the
explanation in your email, but I having difficulty with the way it's
phrased here. (Although it'll be moot if we go ahead with 0002)

Yes, still better to keep the state of the tree cleaner at all times,
especially if 0002 gets reverted. I have used a simpler "valid
codepoints returning an empty password".

Applied the result for the module, to have at least the coverage part.
The last piece is refreshed, and attached for now.
--
Michael

Michael Paquier

michael@paquier.xyz

4 months ago

In reply to: Michael Paquier (#4)

Re: Non-compliant SASLprep implementation for ASCII characters

On Thu, Mar 19, 2026 at 01:25:52PM +0900, Michael Paquier wrote:

Applied the result for the module, to have at least the coverage part.
The last piece is refreshed, and attached for now.

I have worked on the final piece of this thread, and applied it.

I am also attaching a small module, called scram_utils(), that I have
used to validate this change by creating SCRAM verifiers with
non-printable ASCII characters, like:
SELECT scram_utils_verifier_bytea('myrole', '\x010203', 200, 10);

This function passes down the password data to scram_build_secret()
after applying pg_saslprep(), reusing the original password if
the SASLprep was not a success. That's the same as what we do in
pg_be_scram_build_secret() but I wanted control over the salt length
and the number of iterations for each function call (implemented that
years ago with tested SCRAM), hence the split.

Then use for example something like that for the input:
export PGPASSWORD=$(printf '%b%b%b' '\01\02\03')

The validation between the non-compliant and the compliant
implementation then comes down to:
- Generate the rolpassword on HEAD patched (new) and unpatched (old).
- Check connections with libpq patched (new) and unpatched (old), with
client->server as of new->old, old->new, new->new.
--
Michael

Alexander Lakhin

exclusion@gmail.com

3 months ago

In reply to: Michael Paquier (#4)

Re: Non-compliant SASLprep implementation for ASCII characters

Hello Michael,

19.03.2026 06:25, Michael Paquier wrote:

Applied the result for the module, to have at least the coverage part.
The last piece is refreshed, and attached for now.
--

When running make check for src/test/modules/test_saslprep under Valgrind,
I've discovered:
# --- /pgtest/postgresql.git/src/test/modules/test_saslprep/expected/test_saslprep.out 2026-04-12 07:44:47.090517505 +0300
# +++ /pgtest/postgresql.git/src/test/modules/test_saslprep/results/test_saslprep.out 2026-04-12 08:03:29.353348951 +0300
# @@ -2,151 +2,7 @@
#  CREATE EXTENSION test_saslprep;
#  -- Incomplete UTF-8 sequence.
#  SELECT test_saslprep('\xef');
# -  test_saslprep
# ------------------
# - (,INVALID_UTF8)
# -(1 row)
# -
...
-
-DROP EXTENSION test_saslprep;
+server closed the connection unexpectedly
+   This probably means the server terminated abnormally
+   before or while processing the request.
+connection to server was lost

src/test/modules/test_saslprep/log/postmaster.log
2026-04-12 08:03:26.064 EEST postmaster[1043298] LOG: database system is ready to accept connections
2026-04-12 08:03:26.078 EEST dead-end client backend[1043325] [unknown] FATAL: the database system is starting up
==00:00:00:04.413 1043360== Invalid read of size 1
==00:00:00:04.413 1043360==    at 0x484F234: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==00:00:00:04.413 1043360==    by 0x5277C6F: strlcpy (strlcpy.c:24)
==00:00:00:04.413 1043360==    by 0x48648CF: test_saslprep (test_saslprep.c:87)
==00:00:00:04.413 1043360==    by 0x541A45: ExecInterpExpr (execExprInterp.c:979)
==00:00:00:04.413 1043360==    by 0x5446A1: ExecInterpExprStillValid (execExprInterp.c:2301)
==00:00:00:04.413 1043360==    by 0x5AD1FD: ExecEvalExprNoReturn (executor.h:433)
==00:00:00:04.413 1043360==    by 0x5AD2BB: ExecEvalExprNoReturnSwitchContext (executor.h:474)
==00:00:00:04.413 1043360==    by 0x5AD31C: ExecProject (executor.h:506)
==00:00:00:04.413 1043360==    by 0x5AD53F: ExecResult (nodeResult.c:135)
==00:00:00:04.413 1043360==    by 0x55F5E4: ExecProcNodeFirst (execProcnode.c:469)
==00:00:00:04.413 1043360==    by 0x55108F: ExecProcNode (executor.h:327)
==00:00:00:04.413 1043360==    by 0x554149: ExecutePlan (execMain.c:1736)
==00:00:00:04.413 1043360== Address 0x7439295 is 6,917 bytes inside a block of size 8,192 alloc'd
==00:00:00:04.413 1043360==    at 0x4846828: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==00:00:00:04.413 1043360==    by 0xADBE27: AllocSetContextCreateInternal (aset.c:444)
==00:00:00:04.413 1043360==    by 0x762E31: PostmasterMain (postmaster.c:534)
==00:00:00:04.413 1043360==    by 0x5E9543: main (main.c:231)
==00:00:00:04.413 1043360==
...
==00:00:00:04.413 1043360==
==00:00:00:04.414 1043360== Exit program on first error (--exit-on-first-error=yes)
2026-04-12 08:03:29.349 EEST postmaster[1043298] LOG: client backend (PID 1043360) exited with exit code 1
2026-04-12 08:03:29.349 EEST postmaster[1043298] DETAIL: Failed process was running: SELECT test_saslprep('\xef');

The corresponding code:
src_len = VARSIZE_ANY_EXHDR(string);
src = VARDATA_ANY(string);

    /*
     * Copy the input given, to make SASLprep() act on a sanitized string.
     */
    input_data = palloc0(src_len + 1);
    strlcpy(input_data, src, src_len + 1);

That is, strlcpy() tries to evaluate strlen() for src, which contains only
one byte without null terminator.

skink tests this module successfully [1]https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2026-04-11%2011%3A31%3A01&stg=misc-check by some reason:
================================== 151/394 ===================================
test:         test_saslprep - postgresql:test_saslprep/regress
start time:   13:26:00
duration:     8.04s
result:       exit status 0
command:      ...
----------------------------------- stdout -----------------------------------
# executing test in /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/test_saslprep/regress group test_saslprep
test regress
# initializing database system by copying initdb template
# using temp instance on port 40096 with PID 3982971
ok 1         - test_saslprep                            1971 ms
1..1
# All 1 tests passed.
# test succeeded

[1]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2026-04-11%2011%3A31%3A01&stg=misc-check

Best regards,
Alexander

Michael Paquier

michael@paquier.xyz

3 months ago

In reply to: Alexander Lakhin (#6)

Re: Non-compliant SASLprep implementation for ASCII characters

On Sun, Apr 12, 2026 at 09:00:00AM +0300, Alexander Lakhin wrote:

That is, strlcpy() tries to evaluate strlen() for src, which contains only
one byte without null terminator.

Thanks for the report. I don't know why skink is not complaining, but
I do see the failure, and I am able to fix it with the attached. Does
it work on your side?
--
Michael

Alexander Lakhin

exclusion@gmail.com

3 months ago

In reply to: Michael Paquier (#7)

Re: Non-compliant SASLprep implementation for ASCII characters

12.04.2026 14:47, Michael Paquier wrote:

On Sun, Apr 12, 2026 at 09:00:00AM +0300, Alexander Lakhin wrote:

That is, strlcpy() tries to evaluate strlen() for src, which contains only
one byte without null terminator.

Thanks for the report. I don't know why skink is not complaining, but
I do see the failure, and I am able to fix it with the attached. Does
it work on your side?

Yes, it works. Thank you for paying attention to the issue!

Maybe it would make sense to find out why skink doesn't detect this (just
in case there are or will be similar defects hiding) before pushing the
fix...

Best regards,
Alexander

Michael Paquier

michael@paquier.xyz

3 months ago

In reply to: Alexander Lakhin (#8)

Re: Non-compliant SASLprep implementation for ASCII characters

On Sun, Apr 12, 2026 at 04:00:00PM +0300, Alexander Lakhin wrote:

Maybe it would make sense to find out why skink doesn't detect this (just
in case there are or will be similar defects hiding) before pushing the
fix...

Other fixes can also be applied separately, tackled by their
respective committers.

Saying that, I have also done an installcheck with an instance running
with valgrind, and did not spot something popping out. The log file I
have used for the output was looking a bit weird, as if valgrind had
the idea to overwrite some portions of it, so perhaps I have missed
something.

I have fixed this one for now, thanks for the report.
--
Michael

Non-compliant SASLprep implementation for ASCII characters

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: