Non-compliant SASLprep implementation for ASCII characters

Started by Michael Paquier20 days ago3 messages
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,

While reviewing some of the SCRAM code, I have been reminded about the
following bit of code in saslprep.c:
/*
* Quick check if the input is pure ASCII. An ASCII string requires no
* further processing.
*/
if (pg_is_ascii(input))
{
*output = STRDUP(input);
if (!(*output))
goto oom;
return SASLPREP_SUCCESS;
}

And after cross-checking that with RFCs 3454 (Stringprep) and 4013
(SASLprep), I got reminded of the fact that this implementation
artifact is wrong because not all ASCII characters are allowed:
- 0x00~0x1F (0~31), control characters, are prohibited.
- 0x7F (127, DEL) is prohibited.

The rest of the ASCII character range is OK.

Another question one may ask is: does making our SCRAM implementation
compliant impact our SCRAM implementation at all? The answer to this
question is no. If we are dealing with an ASCII-only password with
prohibited characters, our calls of pg_saslprep() deal with the SCRAM
verifiers generated by CREATE/ALTER role and the SASLprep() calls done
during an exchange the same way: even if we have prohibited ASCII
characters, the bytes are fed as-is to the scram build code. All our
callers of pg_saslprep() make sure that the same thing happens.

In short, I see no downside in just making our implementation
compliant, which should be actually beneficial for future callers of
this routine, should we have any. One point can be made for the
efficiency of checking ASCII-only passwords, but the default count of
4096 used for the computation of the SCRAM verifiers outweights that
point by far IMO: the SCRAM computation is more expensive than this
ASCII-only shortcut anyway.

Attached are two patches, that I'd like to propose for this commit
fest:
- 0001 is a test suite that I have been relying on for some time,
introduced as the test module test_saslprep. One artifact that Heikki
has mentioned to me offline while discussing this tool is that we
could also have a check for the entire range of valid UTF8 codepoints
to make sure that we never return an empty password for all these
codepoints. This check is slightly expensive (3s on my laptop, which
is not bad still a bit expensive), so I have implemented that as a TAP
test controlled by a PG_TEST_EXTRA. The only exception for the empty
password case is the nul character, that we disallow in CREATE/ALTER
ROLE. This test suite also adds a test to cover 390b3cbbb2af with an
incomplete UTF8 sequence, as a nice bonus.
- 0002 is the change to make the implementation compliant, impacting
the tests. This removes nul from the list of valid cases, and the SQL
tests show the compliant behavior.

Even if we don't do 0002, 0001 shows benefits of its own.

I am adding that to the upcoming CF.
Thanks,
--
Michael

Attachments:

v1-0002-Make-implementation-of-SASLprep-compliant-for-ASC.patchtext/plain; charset=us-asciiDownload+33-48
v1-0001-test_saslprep-Add-test-module-to-stress-SASLprep.patchtext/plain; charset=us-asciiDownload+618-3
#2Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#1)
Re: Non-compliant SASLprep implementation for ASCII characters

On Fri, Feb 27, 2026 at 12:05:28PM +0900, Michael Paquier wrote:

- 0001 is a test suite that I have been relying on for some time,
introduced as the test module test_saslprep. One artifact that Heikki
has mentioned to me offline while discussing this tool is that we
could also have a check for the entire range of valid UTF8 codepoints
to make sure that we never return an empty password for all these
codepoints. This check is slightly expensive (3s on my laptop, which
is not bad still a bit expensive), so I have implemented that as a TAP
test controlled by a PG_TEST_EXTRA. The only exception for the empty
password case is the nul character, that we disallow in CREATE/ALTER
ROLE. This test suite also adds a test to cover 390b3cbbb2af with an
incomplete UTF8 sequence, as a nice bonus.

While thinking more about this one, I have come up with a smarter
query based on set_byte() to build a full range of byteas for the
ASCII characters to check, leading to this simpler pattern:
SELECT set_byte('\x00'::bytea, 0, a) FROM generate_series(0, 127);

A second thing that I have adjusted is the output for non-printable
characters, using a CASE/WHEN shortcut. Attached is an updated
version of the patch set with these adjustments.
--
Michael

Attachments:

v2-0001-test_saslprep-Add-test-module-to-stress-SASLprep.patchtext/plain; charset=us-asciiDownload+625-3
v2-0002-Make-implementation-of-SASLprep-compliant-for-ASC.patchtext/plain; charset=us-asciiDownload+34-49
#3John Naylor
john.naylor@enterprisedb.com
In reply to: Michael Paquier (#1)
Re: Non-compliant SASLprep implementation for ASCII characters

On Fri, Feb 27, 2026 at 10:05 AM Michael Paquier <michael@paquier.xyz> wrote:

Even if we don't do 0002, 0001 shows benefits of its own.

Seems sensible to me. I only have minor nitpicks:

+operation for a single byte as well as a range of these, acting as thin
+wrappers standing on top of pg_saslprep().

It's more natural to say "wrappers around", at least that's what comes to me.

+ if (unlikely(utf8_len == 0))

The exceptional path only has two lines of code, so it's unclear what
this hint is trying to do. This module isn't run by default anyway

+ MemSet(nulls, false, sizeof(nulls));

Regular "memset" with a 4-byte constant input is easily inline-able by
the compiler, and I think we should use our homegrown implementation
only when there is a specific reason for it. (I know there are many
dozens of uses without a reason already, but...)

-is($result, 'U+0000|SUCCESS|\x00|\x', "Only nul authorized for all
valid UTF8 codepoints");
+is($result, '', "No empty or NULL values for all valid UTF8 codepoints");

I don't quite understand "only nul authorized..." -- I understand the
explanation in your email, but I having difficulty with the way it's
phrased here. (Although it'll be moot if we go ahead with 0002)

--
John Naylor
Amazon Web Services