Memory bug in dsnowball_lexize
Hackers,
In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses
malloc (not palloc) to allocate memory, and on memory exhaustion
returns NULL rather than throwing an exception. In this same
file, 'replace_s' calls 'create_s' and if it gets back NULL, returns
the error code -1. Otherwise, it sets z->p to the allocated
memory.
In src/backend/snowball/libstemmer/api.c, 'SN_set_current' calls
'replace_s' and returns whatever 'replace_s' returned, which in
the case of memory exhaustion will be -1.
In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
calls 'SN_set_current' and ignores the return value, thereby
failing to notice the error, if any.
I checked one of the stemmers, stem_ISO_8859_1_english.c,
and it treats z->p as an array without checking whether it is
NULL. This will crash the backend in the above error case.
There is something else weird here, though. The call to
'SN_set_current' is wrapped in a memory context switch, along
with a call to the stemmer, as if the caller expects any allocated
memory to be palloc'd, which it is not, given the underlying code's
use of malloc and calloc.
There is a comment higher up in dict_snowball.c that seems to
use some handwaving about all this, or perhaps it is documenting
something else entirely. In any event, I find the documentation
about dictCtx insufficient to explain why this memory handling
is correct.
mark
Mark Dilger <hornschnorter@gmail.com> writes:
In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses
malloc (not palloc) to allocate memory, and on memory exhaustion
returns NULL rather than throwing an exception.
Actually not, see macros in src/include/snowball/header.h.
In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
calls 'SN_set_current' and ignores the return value, thereby
failing to notice the error, if any.
Hm. This seems like possibly a bug, in that even if we cover the
malloc issue, there's no API guarantee that OOM is the only possible
reason for reporting failure.
There is a comment higher up in dict_snowball.c that seems to
use some handwaving about all this, or perhaps it is documenting
something else entirely. In any event, I find the documentation
about dictCtx insufficient to explain why this memory handling
is correct.
Fair complaint --- do you want to propose some new wording that
references what header.h does?
regards, tom lane
On Thu, May 23, 2019 at 8:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mark Dilger <hornschnorter@gmail.com> writes:
In src/backend/snowball/libstemmer/utilities.c, 'create_s' uses
malloc (not palloc) to allocate memory, and on memory exhaustion
returns NULL rather than throwing an exception.Actually not, see macros in src/include/snowball/header.h.
You are correct. Thanks for the pointer.
In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
calls 'SN_set_current' and ignores the return value, thereby
failing to notice the error, if any.Hm. This seems like possibly a bug, in that even if we cover the
malloc issue, there's no API guarantee that OOM is the only possible
reason for reporting failure.
Ok, that sounds fair. Since the memory is being palloc'd, I suppose
it would be safe to just ereport when the return value is -1?
There is a comment higher up in dict_snowball.c that seems to
use some handwaving about all this, or perhaps it is documenting
something else entirely. In any event, I find the documentation
about dictCtx insufficient to explain why this memory handling
is correct.Fair complaint --- do you want to propose some new wording that
references what header.h does?
Perhaps something along these lines?
/*
- * snowball saves alloced memory between calls, so we should
run it in our
- * private memory context. Note, init function is executed in long lived
- * context, so we just remember CurrentMemoryContext
+ * snowball saves alloced memory between calls, which we force to be
+ * allocated using palloc and friends via preprocessing macros (see
+ * snowball/header.h), so we should run snowball in our private memory
+ * context. Note, init function is executed in long lived
context, so we
+ * just remember CurrentMemoryContext.
*/
Mark Dilger <hornschnorter@gmail.com> writes:
On Thu, May 23, 2019 at 8:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mark Dilger <hornschnorter@gmail.com> writes:
In src/backend/snowball/dict_snowball.c, 'dsnowball_lexize'
calls 'SN_set_current' and ignores the return value, thereby
failing to notice the error, if any.
Hm. This seems like possibly a bug, in that even if we cover the
malloc issue, there's no API guarantee that OOM is the only possible
reason for reporting failure.
Ok, that sounds fair. Since the memory is being palloc'd, I suppose
it would be safe to just ereport when the return value is -1?
Yeah ... I'd just make it an elog really, since whatever it is
would presumably not be a user-facing error.
Fair complaint --- do you want to propose some new wording that
references what header.h does?
Perhaps something along these lines?
Seems reasonable, please include in patch covering the other thing.
regards, tom lane