CREATE TEXT SEARCH DICTIONARY segfaulting on 9.6+

Started by Tomas Vondraover 6 years ago5 messages
#1Tomas Vondra
tomas.vondra@2ndquadrant.com

Hi,

over in pgsql-bugs [1]/messages/by-id/16050-024ae722464ab604@postgresql.org we got a report about CREATE TEXT SEARCH
DICTIONARY causing segfaults on 12.0. Simply running

CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,
DictFile=hunspell_sample_num, AffFile=hunspell_sample_long);

does trigger a crash, 100% of the time. The crash was reported on 12.0,
but it's in fact present since 9.6.

On 9.5 the example does not work, because that version does not (a)
include the hunspell dictionaries used in the example, and (b) it does
not support long flags. So even after copying the dictionaries and
tweaking them a bit it still passes without a crash.

Looking at the commit history of spell.c, there seems to be a bunch of
commits in 2016 (e.g. f4ceed6ceba3) touching exactly this part of the
code (hunspell), and it also correlates quite nicely with the affected
branches (9.6+). So my best guess is it's a bug in those changes.

A complete backtrace looks like this:

Program received signal SIGSEGV, Segmentation fault.
0x00000000008fca10 in getCompoundAffixFlagValue (Conf=0x20dd3b8, s=0x7f7f7f7f7f7f7f7f <error: Cannot access memory at address 0x7f7f7f7f7f7f7f7f>) at spell.c:1126
1126 while (*flagcur)
(gdb) bt
#0 0x00000000008fca10 in getCompoundAffixFlagValue (Conf=0x20dd3b8, s=0x7f7f7f7f7f7f7f7f <error: Cannot access memory at address 0x7f7f7f7f7f7f7f7f>) at spell.c:1126
#1 0x00000000008fdd1c in makeCompoundFlags (Conf=0x20dd3b8, affix=303) at spell.c:1608
#2 0x00000000008fe04e in mkSPNode (Conf=0x20dd3b8, low=0, high=1, level=3) at spell.c:1680
#3 0x00000000008fe113 in mkSPNode (Conf=0x20dd3b8, low=0, high=1, level=2) at spell.c:1692
#4 0x00000000008fde89 in mkSPNode (Conf=0x20dd3b8, low=0, high=4, level=1) at spell.c:1652
#5 0x00000000008fde89 in mkSPNode (Conf=0x20dd3b8, low=0, high=9, level=0) at spell.c:1652
#6 0x00000000008fe50b in NISortDictionary (Conf=0x20dd3b8) at spell.c:1785
#7 0x00000000008f9e14 in dispell_init (fcinfo=0x7ffdda6abc90) at dict_ispell.c:89
#8 0x0000000000a6210a in FunctionCall1Coll (flinfo=0x7ffdda6abcf0, collation=0, arg1=34478896) at fmgr.c:1140
#9 0x0000000000a62c72 in OidFunctionCall1Coll (functionId=3731, collation=0, arg1=34478896) at fmgr.c:1418
#10 0x00000000006c2dcb in verify_dictoptions (tmplId=3733, dictoptions=0x20e1b30) at tsearchcmds.c:402
#11 0x00000000006c2f4c in DefineTSDictionary (names=0x20ba278, parameters=0x20ba458) at tsearchcmds.c:463
#12 0x00000000008eb274 in ProcessUtilitySlow (pstate=0x20db518, pstmt=0x20bab88, queryString=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x20bac80,
completionTag=0x7ffdda6ac540 "") at utility.c:1272
#13 0x00000000008ea7e5 in standard_ProcessUtility (pstmt=0x20bab88, queryString=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x20bac80,
completionTag=0x7ffdda6ac540 "") at utility.c:927
#14 0x00000000008e991a in ProcessUtility (pstmt=0x20bab88, queryString=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x20bac80, completionTag=0x7ffdda6ac540 "")
at utility.c:360
#15 0x00000000008e88e1 in PortalRunUtility (portal=0x2121368, pstmt=0x20bab88, isTopLevel=true, setHoldSnapshot=false, dest=0x20bac80, completionTag=0x7ffdda6ac540 "") at pquery.c:1175
#16 0x00000000008e8afe in PortalRunMulti (portal=0x2121368, isTopLevel=true, setHoldSnapshot=false, dest=0x20bac80, altdest=0x20bac80, completionTag=0x7ffdda6ac540 "") at pquery.c:1321
#17 0x00000000008e8032 in PortalRun (portal=0x2121368, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x20bac80, altdest=0x20bac80, completionTag=0x7ffdda6ac540 "") at pquery.c:796
#18 0x00000000008e1f51 in exec_simple_query (query_string=0x20b97a8 "CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,\nDictFile=hunspell_sample_num, AffFile=hunspell_sample_long);") at postgres.c:1215
#19 0x00000000008e6243 in PostgresMain (argc=1, argv=0x20e54f8, dbname=0x20e5340 "test", username=0x20b53e8 "user") at postgres.c:4236
#20 0x000000000083c5e2 in BackendRun (port=0x20dd980) at postmaster.c:4437
#21 0x000000000083bdb3 in BackendStartup (port=0x20dd980) at postmaster.c:4128
#22 0x00000000008381d7 in ServerLoop () at postmaster.c:1704
#23 0x0000000000837a83 in PostmasterMain (argc=3, argv=0x20b3350) at postmaster.c:1377
#24 0x0000000000759507 in main (argc=3, argv=0x20b3350) at main.c:228
(gdb) up
#1 0x00000000008fdd1c in makeCompoundFlags (Conf=0x20dd3b8, affix=303) at spell.c:1608
1608 return (getCompoundAffixFlagValue(Conf, str) & FF_COMPOUNDFLAGMASK);
(gdb) p *Conf
$1 = {maffixes = 16, naffixes = 10, Affix = 0x2181fd0, Suffix = 0x0, Prefix = 0x0, Dictionary = 0x0, AffixData = 0x20e1fa8, lenAffixData = 12, nAffixData = 12, useFlagAliases = true, CompoundAffix = 0x0, usecompound = true, flagMode = FM_LONG, CompoundAffixFlags = 0x217d328, nCompoundAffixFlag = 6,
mCompoundAffixFlag = 10, buildCxt = 0x217cf20, Spell = 0x7bd99b4f6050, nspell = 9, mspell = 20480, firstfree = 0x217f1b8 "", avail = 7608}
(gdb) p affix
$2 = 303

So the affix value is rather strange, because it's clearly outside the
set of flags in Conf (it only has 12 items, so 303 is waaaay too high).

I don't have time to investigate this further and I'm getting lost in
spell.c, so I'm adding Teodor who committed f4ceed6ceba3 in 2016. One
interesting fact is that this is likely due to some discrepancy between
the dictfile and afffile - the segfaulting command appers to mix
hunspell_sample_num and hunspell_sample_long:

CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,
DictFile=hunspell_sample_num, AffFile=hunspell_sample_long);

But when using the "same" group for both dictfile and afffile, it seems
to work just fine.

[1]: /messages/by-id/16050-024ae722464ab604@postgresql.org

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#1)
Re: CREATE TEXT SEARCH DICTIONARY segfaulting on 9.6+

I spent a bit of time investigating this, and it seems the new code is
somewhat too trusting when it comes to data from the affix/dict files.
In this particular case, it boils down to this code in NISortDictionary:

if (Conf->useFlagAliases)
{
for (i = 0; i < Conf->nspell; i++)
{
char *end;

if (*Conf->Spell[i]->p.flag != '\0')
{
curaffix = strtol(Conf->Spell[i]->p.flag, &end, 10);
if (Conf->Spell[i]->p.flag == end || errno == ERANGE)
ereport(ERROR,
(errcode(ERRCODE_CONFIG_FILE_ERROR),
errmsg("invalid affix alias \"%s\"",
Conf->Spell[i]->p.flag)));
}
...
Conf->Spell[i]->p.d.affix = curaffix;
...
}
...
}

So it simply grabs whatever it finds in the dict file, parses it and
then (later) we use it as index to access the AffixData array, even if
the value is way out of bounds.

For example in the example, hunspell_sample_long.affix contains about
10 affixes, but then we parse the hunspell_sample_num.dict file, and we
stumble upon

book/302,301,202,303

and we parse the flags as integers, and interpret them as indexes in the
AffixData array. Clearly, 303 is waaaay out of bounds, triggering the
segfault crash.

So I think we need some sort of cross-check here. We certainly need to
make NISortDictionary() check the affix value is within AffixData
bounds, and error out when the index is non-sensical (maybe negative
and/or exceeding nAffixData). Maybe there's a simple way to check if the
affix/dict files match. The failing affix has

FLAG num

while with

FLAG long

it works just fine. But I'm not sure that's actually possible, because I
don't see anything in hunspell_sample_num.dict that would allow us to
decide that it expects "FLAG num" and not "FLAG long". Furthermore, we
certainly can't rely on this - we still need to check the range.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Arthur Zakirov
zaartur@gmail.com
In reply to: Tomas Vondra (#1)
1 attachment(s)
Re: CREATE TEXT SEARCH DICTIONARY segfaulting on 9.6+

Hello Tomas,

On 2019/10/13 10:26, Tomas Vondra wrote:

over in pgsql-bugs [1] we got a report about CREATE TEXT SEARCH
DICTIONARY causing segfaults on 12.0. Simply running

   CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,
   DictFile=hunspell_sample_num, AffFile=hunspell_sample_long);

does trigger a crash, 100% of the time. The crash was reported on 12.0,
but it's in fact present since 9.6.

On 9.5 the example does not work, because that version does not (a)
include the hunspell dictionaries used in the example, and (b) it does
not support long flags. So even after copying the dictionaries and
tweaking them a bit it still passes without a crash.

This crash is not because of long flags, but because of aliases (more
thoughts below).

Looking at the commit history of spell.c, there seems to be a bunch of
commits in 2016 (e.g. f4ceed6ceba3) touching exactly this part of the
code (hunspell), and it also correlates quite nicely with the affected
branches (9.6+). So my best guess is it's a bug in those changes.

Yeah, there was a lot changes.

So it simply grabs whatever it finds in the dict file, parses it and
then (later) we use it as index to access the AffixData array, even if
the value is way out of bounds.

Yes, we enter this code if an affix file defines aliases (AF parameter).
AffixData array is used to store those aliases.

More about hunspell format you can find here:
https://linux.die.net/man/4/hunspell

In the example we have the following aliases:
AF 11
AF cZ #1
AF cL #2
...
AF sB #11

And in the dictionary file we should use their indexes (from 1 to 11).
These aliases defines set of flags and in the dict file we can use only
single index:
book/3
book/11

but not:
book/3,4
book/2,11

I added checking of this last case in the attached patch. PostgreSQL
will raise an error if it sees non-numeric and non-whitespace character
after the index.

Aliases can be used with all flag types: 'default' (i.e. FM_CHAR),
'long', and if I'm not mistaken 'num'.

So I think we need some sort of cross-check here. We certainly need to
make NISortDictionary() check the affix value is within AffixData
bounds, and error out when the index is non-sensical (maybe negative
and/or exceeding nAffixData).

I agree, I attached the patch which do this. I also added couple
asserts, tests and fixed condition in getAffixFlagSet():

-		if (curaffix > 0 && curaffix <= Conf->nAffixData)
+		if (curaffix > 0 && curaffix < Conf->nAffixData)

I think it could be a bug, because curaffix can't be equal to
Conf->nAffixData.

Maybe there's a simple way to check if the affix/dict files match.

I'm not sure how to properly fix this either. The only thing we could
check is commas in affix flags in a dict file:

book/302,301,202,303

FM_CHAR and FM_LONG dictionaries can't have commas. They should have the
following affix flags:

book/sGsJpUsS # 4 affixes for FM_LONG
book/GJUS # 4 affixes for FM_CHAR

But I guess they could have numbers in flags (as help says "Set flag
type. Default type is the extended ASCII (8-bit) character.") and other
non alphanumeric characters (as some language dictionaries have):

book/s1s2s3s4 # 4 affixes for FM_LONG

--
Artur

Attachments:

fix_alias_index_handling.difftext/plain; charset=UTF-8; name=fix_alias_index_handling.diffDownload
diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index c715f06b8e..1b8766659c 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -458,6 +458,8 @@ IsAffixFlagInUse(IspellDict *Conf, int affix, const char *affixflag)
 	if (*affixflag == 0)
 		return true;
 
+	Assert(affix < Conf->nAffixData);
+
 	flagcur = Conf->AffixData[affix];
 
 	while (*flagcur)
@@ -1160,13 +1162,17 @@ getAffixFlagSet(IspellDict *Conf, char *s)
 					(errcode(ERRCODE_CONFIG_FILE_ERROR),
 					 errmsg("invalid affix alias \"%s\"", s)));
 
-		if (curaffix > 0 && curaffix <= Conf->nAffixData)
+		if (curaffix > 0 && curaffix < Conf->nAffixData)
 
 			/*
 			 * Do not subtract 1 from curaffix because empty string was added
 			 * in NIImportOOAffixes
 			 */
 			return Conf->AffixData[curaffix];
+		else if (curaffix > Conf->nAffixData)
+			ereport(ERROR,
+					(errcode(ERRCODE_CONFIG_FILE_ERROR),
+					 errmsg("invalid affix alias \"%s\"", s)));
 		else
 			return VoidString;
 	}
@@ -1561,6 +1567,8 @@ MergeAffix(IspellDict *Conf, int a1, int a2)
 {
 	char	  **ptr;
 
+	Assert(a1 < Conf->nAffixData && a2 < Conf->nAffixData);
+
 	/* Do not merge affix flags if one of affix flags is empty */
 	if (*Conf->AffixData[a1] == '\0')
 		return a2;
@@ -1603,9 +1611,10 @@ MergeAffix(IspellDict *Conf, int a1, int a2)
 static uint32
 makeCompoundFlags(IspellDict *Conf, int affix)
 {
-	char	   *str = Conf->AffixData[affix];
+	Assert(affix < Conf->nAffixData);
 
-	return (getCompoundAffixFlagValue(Conf, str) & FF_COMPOUNDFLAGMASK);
+	return (getCompoundAffixFlagValue(Conf, Conf->AffixData[affix]) &
+			FF_COMPOUNDFLAGMASK);
 }
 
 /*
@@ -1725,6 +1734,16 @@ NISortDictionary(IspellDict *Conf)
 							(errcode(ERRCODE_CONFIG_FILE_ERROR),
 							 errmsg("invalid affix alias \"%s\"",
 									Conf->Spell[i]->p.flag)));
+				if (curaffix < 0 || curaffix >= Conf->nAffixData)
+					ereport(ERROR,
+							(errcode(ERRCODE_CONFIG_FILE_ERROR),
+							 errmsg("invalid affix alias \"%s\"",
+									Conf->Spell[i]->p.flag)));
+				if (*end != '\0' && !t_isdigit(end) && !t_isspace(end))
+					ereport(ERROR,
+							(errcode(ERRCODE_CONFIG_FILE_ERROR),
+							 errmsg("invalid affix alias \"%s\"",
+									Conf->Spell[i]->p.flag)));
 			}
 			else
 			{
diff --git a/src/test/regress/expected/tsdicts.out b/src/test/regress/expected/tsdicts.out
index 2524ec2768..5a927be948 100644
--- a/src/test/regress/expected/tsdicts.out
+++ b/src/test/regress/expected/tsdicts.out
@@ -413,6 +413,40 @@ SELECT ts_lexize('hunspell_num', 'footballyklubber');
  {foot,ball,klubber}
 (1 row)
 
+-- Test suitability of affix and dict files
+CREATE TEXT SEARCH DICTIONARY hunspell_err (
+						Template=ispell,
+						DictFile=ispell_sample,
+						AffFile=hunspell_sample_long
+);
+ERROR:  invalid affix alias "GJUS"
+CREATE TEXT SEARCH DICTIONARY hunspell_err (
+						Template=ispell,
+						DictFile=ispell_sample,
+						AffFile=hunspell_sample_num
+);
+ERROR:  invalid affix flag "SZ\"
+CREATE TEXT SEARCH DICTIONARY hunspell_invalid_1 (
+						Template=ispell,
+						DictFile=hunspell_sample_long,
+						AffFile=ispell_sample
+);
+CREATE TEXT SEARCH DICTIONARY hunspell_invalid_2 (
+						Template=ispell,
+						DictFile=hunspell_sample_long,
+						AffFile=hunspell_sample_num
+);
+CREATE TEXT SEARCH DICTIONARY hunspell_invalid_3 (
+						Template=ispell,
+						DictFile=hunspell_sample_num,
+						AffFile=ispell_sample
+);
+CREATE TEXT SEARCH DICTIONARY hunspell_err (
+						Template=ispell,
+						DictFile=hunspell_sample_num,
+						AffFile=hunspell_sample_long
+);
+ERROR:  invalid affix alias "302,301,202,303"
 -- Synonym dictionary
 CREATE TEXT SEARCH DICTIONARY synonym (
 						Template=synonym,
diff --git a/src/test/regress/sql/tsdicts.sql b/src/test/regress/sql/tsdicts.sql
index 60906f6549..908e675501 100644
--- a/src/test/regress/sql/tsdicts.sql
+++ b/src/test/regress/sql/tsdicts.sql
@@ -101,6 +101,43 @@ SELECT ts_lexize('hunspell_num', 'footballklubber');
 SELECT ts_lexize('hunspell_num', 'ballyklubber');
 SELECT ts_lexize('hunspell_num', 'footballyklubber');
 
+-- Test suitability of affix and dict files
+CREATE TEXT SEARCH DICTIONARY hunspell_err (
+						Template=ispell,
+						DictFile=ispell_sample,
+						AffFile=hunspell_sample_long
+);
+
+CREATE TEXT SEARCH DICTIONARY hunspell_err (
+						Template=ispell,
+						DictFile=ispell_sample,
+						AffFile=hunspell_sample_num
+);
+
+CREATE TEXT SEARCH DICTIONARY hunspell_invalid_1 (
+						Template=ispell,
+						DictFile=hunspell_sample_long,
+						AffFile=ispell_sample
+);
+
+CREATE TEXT SEARCH DICTIONARY hunspell_invalid_2 (
+						Template=ispell,
+						DictFile=hunspell_sample_long,
+						AffFile=hunspell_sample_num
+);
+
+CREATE TEXT SEARCH DICTIONARY hunspell_invalid_3 (
+						Template=ispell,
+						DictFile=hunspell_sample_num,
+						AffFile=ispell_sample
+);
+
+CREATE TEXT SEARCH DICTIONARY hunspell_err (
+						Template=ispell,
+						DictFile=hunspell_sample_num,
+						AffFile=hunspell_sample_long
+);
+
 -- Synonym dictionary
 CREATE TEXT SEARCH DICTIONARY synonym (
 						Template=synonym,
#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Arthur Zakirov (#3)
Re: CREATE TEXT SEARCH DICTIONARY segfaulting on 9.6+

Arthur Zakirov <zaartur@gmail.com> writes:

On 2019/10/13 10:26, Tomas Vondra wrote:

So I think we need some sort of cross-check here. We certainly need to
make NISortDictionary() check the affix value is within AffixData
bounds, and error out when the index is non-sensical (maybe negative
and/or exceeding nAffixData).

I agree, I attached the patch which do this. I also added couple
asserts, tests and fixed condition in getAffixFlagSet():

-		if (curaffix > 0 && curaffix <= Conf->nAffixData)
+		if (curaffix > 0 && curaffix < Conf->nAffixData)

Looks reasonable to me, and we need to get something done before
the upcoming releases, so I pushed this. Perhaps there's more
that could be done later.

regards, tom lane

#5Artur Zakirov
zaartur@gmail.com
In reply to: Tom Lane (#4)
Re: CREATE TEXT SEARCH DICTIONARY segfaulting on 9.6+

On Sun, Nov 3, 2019 at 5:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Arthur Zakirov <zaartur@gmail.com> writes:

On 2019/10/13 10:26, Tomas Vondra wrote:

So I think we need some sort of cross-check here. We certainly need to
make NISortDictionary() check the affix value is within AffixData
bounds, and error out when the index is non-sensical (maybe negative
and/or exceeding nAffixData).

I agree, I attached the patch which do this. I also added couple
asserts, tests and fixed condition in getAffixFlagSet():

-             if (curaffix > 0 && curaffix <= Conf->nAffixData)
+             if (curaffix > 0 && curaffix < Conf->nAffixData)

Looks reasonable to me, and we need to get something done before
the upcoming releases, so I pushed this. Perhaps there's more
that could be done later.

Great, thank you!

--
Artur