Potential buffer overrun in spell.c's CheckAffix()
CheckAffix is used by our ispell text search dictionaries to attach a
prefix or suffix to a given base word. The input word is known to be
no longer than MAXNORMLEN (256), and an output buffer of size
MAXNORMLEN * 2 is provided. But there's not any a-priori limit on the
length of a prefix or suffix string, so in principle a buffer overflow
could occur.
In practice these limits seem like more than plenty for any real-world
word, so I think it's sufficient to just reject the prefix or suffix
if an overflow would occur, as attached.
This bug was reported to pgsql-security by Xint Code as a potential
security issue. However we decided it doesn't seem worth the CVE
treatment, because exploiting it would require getting a malicious
ispell dictionary installed in a PG server. Putting the .dict file
into the installation's file tree would require superuser privileges,
and so would creating a text dictionary SQL object that references it.
Maybe an attacker could persuade a gullible DBA to do that, but there
are plenty of other attack pathways available if you're that
persuasive.
Despite that, it seems worth fixing as a run-of-the-mill bug.
Any objections to the attached?
regards, tom lane
Attachments:
v1-0001-Prevent-buffer-overrun-in-spell.c-s-CheckAffix.patchtext/x-diff; charset=us-ascii; name*0=v1-0001-Prevent-buffer-overrun-in-spell.c-s-CheckAffix.patc; name*1=hDownload+33-2
Further to that ... I found another item in the pgsql-security
archives concerning a buffer overrun in ispell affix-file parsing,
which we had likewise deemed not a security vulnerability because
text search configuration files are assumed trustworthy.
But if we're going to tighten up CheckAffix() then it's pretty
silly not to fix these issues too.
regards, tom lane
Attachments:
v1-0001-Prevent-some-buffer-overruns-in-spell.c-s-parsing.patchtext/x-diff; charset=us-ascii; name*0=v1-0001-Prevent-some-buffer-overruns-in-spell.c-s-parsing.p; name*1=atchDownload+24-11
On 21 Apr 2026, at 22:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
if (Affix->type == FF_SUFFIX) { + /* protect against buffer overrun */ + if (len < Affix->replen || len >= 2 * MAXNORMLEN || + len - Affix->replen + findlen >= 2 * MAXNORMLEN) + return NULL; + strcpy(newword, word); strcpy(newword + len - Affix->replen, Affix->find); if (baselen) /* store length of non-changed part of word */ @@ -2112,11 +2139,16 @@ CheckAffix(const char *word, size_t len, AFFIX *Affix, int flagflags, char *neww } else { + /* protect against buffer overrun */ + if (len < Affix->replen || + findlen + len - Affix->replen >= 2 * MAXNORMLEN) + return NULL;
Is there a reason for an asymmetric check "len >= 2 * MAXNORMLEN ||”?
Both cases seem symmetrical and we could move it out of “if".
On 22 Apr 2026, at 03:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I chose to do this by silently truncating the input before it can
overrun the buffer, using logic comparable to the existing logic in
get_nextfield(). Certainly there's at least as good an argument for
raising an error, but for now let's follow the existing precedent.
Is there a reason not to emit WARNING? The data is obviously suspicious…
Perhaps, there’s a reason, so maybe just document it then.
Both patches look good to me, AFAICT.
Best regards, Andrey Borodin.
Andrey Borodin <x4mmm@yandex-team.ru> writes:
On 21 Apr 2026, at 22:32, Tom Lane <tgl@sss.pgh.pa.us> wrote: + /* protect against buffer overrun */ + if (len < Affix->replen || len >= 2 * MAXNORMLEN || + len - Affix->replen + findlen >= 2 * MAXNORMLEN) + return NULL; + strcpy(newword, word); strcpy(newword + len - Affix->replen, Affix->find);
Is there a reason for an asymmetric check "len >= 2 * MAXNORMLEN ||”?
Yes. Because of that initial "strcpy(newword, word);", we do actually
need "word" to fit in the output buffer, even if the final output
string is shorter. The other path does not require that.
I suppose we could replace the strcpy with
memcpy(newword, word, len - Affix->replen);
and then we would not need the "len >= 2 * MAXNORMLEN" test
and both paths could share the same check. There's something
to be said for that, though it would be changing the logic to
a greater extent than just "add some safety checks".
I chose to do this by silently truncating the input before it can
overrun the buffer, using logic comparable to the existing logic in
get_nextfield(). Certainly there's at least as good an argument for
raising an error, but for now let's follow the existing precedent.
Is there a reason not to emit WARNING? The data is obviously suspicious…
Perhaps, there’s a reason, so maybe just document it then.
I could agree with changing all of these cases (including the existing
get_nextfield checks) to throw ERROR; but I don't think that's
something to do in a back-patched bug fix.
Another thing I don't love, but wouldn't change in back branches,
is the use of BUFSIZ for the string lengths. That's far more than
necessary (why not just MAXNORMLEN?), causing these functions to eat
much more stack space than they need. Also it seems like an
unnecessary platform dependency. Maybe BUFSIZ is 8K everywhere,
but I'm not sure of that.
regards, tom lane
I wrote:
I suppose we could replace the strcpy with
memcpy(newword, word, len - Affix->replen);
and then we would not need the "len >= 2 * MAXNORMLEN" test
and both paths could share the same check. There's something
to be said for that, though it would be changing the logic to
a greater extent than just "add some safety checks".
Concretely, about like this, where I also tried to make the actual
byte-copying steps a bit more uniform.
regards, tom lane
Attachments:
v2-0001-Prevent-buffer-overrun-in-spell.c-s-CheckAffix.patchtext/x-diff; charset=us-ascii; name*0=v2-0001-Prevent-buffer-overrun-in-spell.c-s-CheckAffix.patc; name*1=hDownload+41-7
On 22 Apr 2026, at 18:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrey Borodin <x4mmm@yandex-team.ru> writes:
On 21 Apr 2026, at 22:32, Tom Lane <tgl@sss.pgh.pa.us> wrote: + /* protect against buffer overrun */ + if (len < Affix->replen || len >= 2 * MAXNORMLEN || + len - Affix->replen + findlen >= 2 * MAXNORMLEN) + return NULL; + strcpy(newword, word); strcpy(newword + len - Affix->replen, Affix->find);Is there a reason for an asymmetric check "len >= 2 * MAXNORMLEN ||”?
Yes. Because of that initial "strcpy(newword, word);", we do actually
need "word" to fit in the output buffer, even if the final output
string is shorter. The other path does not require that.I suppose we could replace the strcpy with
memcpy(newword, word, len - Affix->replen);
and then we would not need the "len >= 2 * MAXNORMLEN" test
and both paths could share the same check. There's something
to be said for that, though it would be changing the logic to
a greater extent than just "add some safety checks".
The argument about not changing behavior in back branches is very convincing.
But IMO v2 of the patch is better. Also I think changes are correct.
I chose to do this by silently truncating the input before it can
overrun the buffer, using logic comparable to the existing logic in
get_nextfield(). Certainly there's at least as good an argument for
raising an error, but for now let's follow the existing precedent.Is there a reason not to emit WARNING? The data is obviously suspicious…
Perhaps, there’s a reason, so maybe just document it then.I could agree with changing all of these cases (including the existing
get_nextfield checks) to throw ERROR; but I don't think that's
something to do in a back-patched bug fix.
Makes sense.
Another thing I don't love, but wouldn't change in back branches,
is the use of BUFSIZ for the string lengths. That's far more than
necessary (why not just MAXNORMLEN?), causing these functions to eat
much more stack space than they need. Also it seems like an
unnecessary platform dependency. Maybe BUFSIZ is 8K everywhere,
but I'm not sure of that.
On my machine (MacOS) BUFSIZ is 1Kb.
Yes, 40Kb in NIImportOOAffixes() is a lot. But is it important in grand scheme of
things? Minimum max_stack_depth is 100Kb, ought to be enough…
Perhaps, “replen" and “find" should not exceed MAXNORMLEN.
My limited understanding of affixes is not enough to confidently tell that
MAXNORMLEN is the limit. e.g. I see this:
* newword: output buffer (MUST be of length 2 * MAXNORMLEN)
So general rule “MAXNORMLEN is an upper bound everywhere” is not uphold.
I’m still under impression of understanding why NUM_MAX_ITEM_SIZ == 8 in
the nearby thread. Now I know a lot more about Roman numbers. Digging
deeper here might be a similar rabbit hole :)
Best regards, Andrey Borodin.
On 23 Apr 2026, at 12:58, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Yes, 40Kb in NIImportOOAffixes() is a lot. But is it important in grand scheme of
things? Minimum max_stack_depth is 100Kb, ought to be enough…
IsAffixFlagInUse(), addCompoundAffixFlagValue() and getCompoundAffixFlagValue()
also allocate 8Kb on stack...
Would it make sense to add check_stack_depth() into addCompoundAffixFlagValue()?
Other prominent allocators (NIImportOOAffixes(),NIImportAffixes()) call it anyway.
At least we will know if disaster is around the corner.
Best regards, Andrey Borodin.