BUG #16814: Invalid memory access on regexp_match with .* and BRE

Started by PG Bug reporting form, over 5 years ago, 2 messages, bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16814
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 13.1
Operating system: Ubuntu 20.04
Description:

When executing the following regexp call:
select regexp_match('abc', '.*', 'b');
valgrind detects an error:
==00:00:00:46.767 138746== Conditional jump or move depends on uninitialised value(s)
==00:00:00:46.767 138746== at 0x4657A9: parseqatom (regcomp.c:990)
==00:00:00:46.767 138746== by 0x465CBD: parsebranch (regcomp.c:753)
==00:00:00:46.767 138746== by 0x465E84: parse (regcomp.c:683)
==00:00:00:46.767 138746== by 0x467F24: pg_regcomp (regcomp.c:404)
==00:00:00:46.767 138746== by 0x57D100: RE_compile_and_cache (regexp.c:185)
==00:00:00:46.767 138746== by 0x57D3D9: setup_regexp_matches (regexp.c:1114)
==00:00:00:46.767 138746== by 0x57DF86: regexp_match (regexp.c:985)
==00:00:00:46.767 138746== by 0x36839A: ExecInterpExpr (execExprInterp.c:699)
==00:00:00:46.767 138746== by 0x3657C9: ExecInterpExprStillValid (execExprInterp.c:1802)
==00:00:00:46.767 138746== by 0x42A172: ExecEvalExprSwitchContext (executor.h:316)
==00:00:00:46.767 138746== by 0x42A172: evaluate_expr (clauses.c:4809)
==00:00:00:46.767 138746== by 0x42A34B: evaluate_function (clauses.c:4339)
==00:00:00:46.767 138746== by 0x42C1ED: simplify_function (clauses.c:3969)

(This was discovered on the back of the new test module test_regex with a
slightly modified version of test 30.4:
select * from test_regex('.*b', 'aab', 'b');
)

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #16814: Invalid memory access on regexp_match with .* and BRE

PG Bug reporting form <noreply@postgresql.org> writes:

> When executing the following regexp call:
> select regexp_match('abc', '.*', 'b');
> valgrind detects an error:

Hah, nice one. It gives the wrong answer too, at least it does most of
the time for me:

# select regexp_match('abc', '.*', 'b');
 regexp_match
--------------
 {""}
(1 row)

That's because it's acting like the pattern is '.*?' (prefer shortest
match) rather than '.*'.
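
The difference between the two quantifiers can be seen outside PostgreSQL as well. The following is only a minimal sketch using Python's re module, which is a different engine from the Spencer code discussed here and has no BRE mode; it does not reproduce the bug, it just shows what a greedy '.*' and a shortest-match '.*?' are each expected to return for the subject 'abc'. The variable names are illustrative.

```python
import re

subject = "abc"

# Greedy quantifier: '.*' consumes as much of the subject as it can.
greedy = re.match(r".*", subject).group(0)

# Non-greedy quantifier: '.*?' consumes as little as it can, here nothing.
lazy = re.match(r".*?", subject).group(0)

print(repr(greedy))  # 'abc'
print(repr(lazy))    # ''
```

The empty match from '.*?' corresponds to the {""} result shown above, whereas a correctly handled '.*' would be expected to match the whole string.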

This bug is well over the age of consent, btw. Tcl's got it too,
so it surely is aboriginal in Henry Spencer's code.

Thanks for the report!

regards, tom lane