BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

Started by PG Bug reporting formover 5 years ago18 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16801
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 13.1
Operating system: Ubuntu 20.04
Description:

When executing the following query:
WITH RECURSIVE rec(x) AS (
WITH outermost(x) AS (
SELECT (
WITH innermost as (SELECT 1)
SELECT * FROM innermost
)
)
SELECT * FROM outermost
)
SELECT * FROM rec;

valgrind detects an invalid read:
==00:00:00:04.145 217144== Invalid read of size 8
==00:00:00:04.145 217144== at 0x302CB7: makeDependencyGraphWalker
(parse_cte.c:549)
==00:00:00:04.145 217144== by 0x302EA1: makeDependencyGraph
(parse_cte.c:439)
==00:00:00:04.145 217144== by 0x304557: transformWithClause
(parse_cte.c:176)
==00:00:00:04.145 217144== by 0x2DD70A: transformSelectStmt
(analyze.c:1202)
==00:00:00:04.145 217144== by 0x2DDAB4: transformStmt (analyze.c:301)
==00:00:00:04.145 217144== by 0x2DEDDA: transformOptionalSelectInto
(analyze.c:246)
==00:00:00:04.145 217144== by 0x2DEE0F: transformTopLevelStmt
(analyze.c:196)
==00:00:00:04.145 217144== by 0x2DEE71: parse_analyze (analyze.c:116)
==00:00:00:04.145 217144== by 0x55E69F: pg_analyze_and_rewrite
(postgres.c:691)
==00:00:00:04.145 217144== by 0x55ED66: exec_simple_query
(postgres.c:1155)
==00:00:00:04.145 217144== by 0x560D83: PostgresMain (postgres.c:4315)
==00:00:00:04.145 217144== by 0x4CC6B8: BackendRun (postmaster.c:4526)
==00:00:00:04.145 217144== Address 0x50890a8 is 24 bytes inside a block of
size 32 client-defined
==00:00:00:04.145 217144== at 0x6B4831: palloc (mcxt.c:974)
==00:00:00:04.145 217144== by 0x42B624: new_list (list.c:134)
==00:00:00:04.145 217144== by 0x42BF1B: lcons (list.c:458)
==00:00:00:04.145 217144== by 0x302C73: makeDependencyGraphWalker
(parse_cte.c:542)
==00:00:00:04.145 217144== by 0x302EA1: makeDependencyGraph
(parse_cte.c:439)
==00:00:00:04.145 217144== by 0x304557: transformWithClause
(parse_cte.c:176)
==00:00:00:04.145 217144== by 0x2DD70A: transformSelectStmt
(analyze.c:1202)
==00:00:00:04.145 217144== by 0x2DDAB4: transformStmt (analyze.c:301)
==00:00:00:04.145 217144== by 0x2DEDDA: transformOptionalSelectInto
(analyze.c:246)
==00:00:00:04.145 217144== by 0x2DEE0F: transformTopLevelStmt
(analyze.c:196)
==00:00:00:04.145 217144== by 0x2DEE71: parse_analyze (analyze.c:116)
==00:00:00:04.145 217144== by 0x55E69F: pg_analyze_and_rewrite
(postgres.c:691)
==00:00:00:04.145 217144==

The first bad commit is 1cff1b95.

#2Michael Paquier
michael@paquier.xyz
In reply to: PG Bug reporting form (#1)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

On Sat, Jan 02, 2021 at 03:00:00PM +0000, PG Bug reporting form wrote:

valgrind detects an invalid read:
==00:00:00:04.145 217144== Invalid read of size 8
==00:00:00:04.145 217144== at 0x302CB7: makeDependencyGraphWalker
(parse_cte.c:549)
==00:00:00:04.145 217144== by 0x302EA1: makeDependencyGraph
(parse_cte.c:439)
==00:00:00:04.145 217144== by 0x304557: transformWithClause
(parse_cte.c:176)

The first bad commit is 1cff1b95.

The same kind of list manipulation is done in two places in
parse_cte.c, and there are extra ones in split_pathtarget_walker(). I
cannot reproduce that here, and I have just tried with different
optimization levels on HEAD and REL_13_STABLE. Are you using specific
options for valgrind?
--
Michael

#3Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#2)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

Hello Michael,
03.01.2021 08:15, Michael Paquier wrote:

On Sat, Jan 02, 2021 at 03:00:00PM +0000, PG Bug reporting form wrote:

valgrind detects an invalid read:
==00:00:00:04.145 217144== Invalid read of size 8
==00:00:00:04.145 217144== at 0x302CB7: makeDependencyGraphWalker
(parse_cte.c:549)
==00:00:00:04.145 217144== by 0x302EA1: makeDependencyGraph
(parse_cte.c:439)
==00:00:00:04.145 217144== by 0x304557: transformWithClause
(parse_cte.c:176)

The first bad commit is 1cff1b95.

The same kind of list manipulation is done in two places in
parse_cte.c, and there are extra ones in split_pathtarget_walker(). I
cannot reproduce that here, and I have just tried with different
optimization levels on HEAD and REL_13_STABLE. Are you using specific
options for valgrind?

I'm using gcc 10.2.0 and valgrind-3.15.0, and building REL_13_STABLE
(c09f6882) with:
CPPFLAGS="-DUSE_VALGRIND -Og" ./configure --enable-debug
--enable-cassert && make -j8
Also I'm using a� patch [1]/messages/by-id/10dae4a1-e714-601d-7518-c19414255180@gmail.com to inject valgrind into the `make check`
procedure. So it's possible to reproduce the issue with the extended
with.sql:
echo "
WITH RECURSIVE rec(x) AS (

��� WITH outermost(x) AS (
����� SELECT (
������� WITH innermost as (SELECT 1)
������� SELECT * FROM innermost
����� )
��� )
��� SELECT * FROM outermost
)
SELECT * FROM rec;
" >>src/test/regress/sql/with.sql
make check

'make check' fails and src/test/regress/log/postmaster.log contains the
aforementioned valgrind message.

If it's still not reproduced for you, please let me know your
OS/compiler/valgrind version and I'll try it in your environment too.

[1]: /messages/by-id/10dae4a1-e714-601d-7518-c19414255180@gmail.com
/messages/by-id/10dae4a1-e714-601d-7518-c19414255180@gmail.com

Best regards,
Alexander

#4Michael Paquier
michael@paquier.xyz
In reply to: Alexander Lakhin (#3)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

On Sun, Jan 03, 2021 at 09:00:00AM +0300, Alexander Lakhin wrote:

I'm using gcc 10.2.0 and valgrind-3.15.0, and building REL_13_STABLE
(c09f6882) with:
CPPFLAGS="-DUSE_VALGRIND -Og" ./configure --enable-debug
--enable-cassert && make -j8
Also I'm using a  patch [1] to inject valgrind into the `make check`
procedure.

Thanks, -DUSE_VALGRIND is the part that mattered here. I can now
reproduce it. I'll try to look at that more in details.
--
Michael

#5Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#2)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

Hello Michael,
03.01.2021 08:15, Michael Paquier wrote:

On Sat, Jan 02, 2021 at 03:00:00PM +0000, PG Bug reporting form wrote:

valgrind detects an invalid read:
==00:00:00:04.145 217144== Invalid read of size 8
==00:00:00:04.145 217144== at 0x302CB7: makeDependencyGraphWalker
(parse_cte.c:549)
==00:00:00:04.145 217144== by 0x302EA1: makeDependencyGraph
(parse_cte.c:439)
==00:00:00:04.145 217144== by 0x304557: transformWithClause
(parse_cte.c:176)

The first bad commit is 1cff1b95.

The same kind of list manipulation is done in two places in
parse_cte.c, and there are extra ones in split_pathtarget_walker(). I
cannot reproduce that here, and I have just tried with different
optimization levels on HEAD and REL_13_STABLE. Are you using specific
options for valgrind?

I've found out that the list implementation doesn't support the
following usage pattern:
List *testList = NIL;
ListCell *testCell;

testList = lcons(NIL, testList);
testCell = list_head(testList);
...
testList = lcons(NIL, testList);

elog(INFO, "lfirst(testCell): %p", lfirst(testCell)); // prints
0x7f7f7f7f7f7f7f7f when compiled with -DUSE_VALGRIND

(Such list manipulation is happening in that makeDependencyGraphWalker
call.)

Best regards,
Alexander

#6Michael Paquier
michael@paquier.xyz
In reply to: Alexander Lakhin (#5)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

On Tue, Feb 23, 2021 at 09:00:00AM +0300, Alexander Lakhin wrote:

I've found out that the list implementation doesn't support the
following usage pattern:
List *testList = NIL;
ListCell *testCell;

testList = lcons(NIL, testList);
testCell = list_head(testList);
...
testList = lcons(NIL, testList);

elog(INFO, "lfirst(testCell): %p", lfirst(testCell)); // prints
0x7f7f7f7f7f7f7f7f when compiled with -DUSE_VALGRIND

(Such list manipulation is happening in that makeDependencyGraphWalker
call.)

Thanks for the reminder, I was not able to get around this issue.
Anyway.. What's happening here is that the second lcons() call does
new_head_cell() as testList is not NIL. This itself calls
enlarge_list(), followed by wipe_mem() which invalidates the position
of the list head previously stored. Oops.

Coming back to the original problem, as you say there is some
confusion about the list operation that we had better clarify. From
what I can see cstate->innerwiths stores a List of Lists, and in each
passage the code tries to build a new list appended to
cstate->innerwiths. Using a ListCell to be able to track where the
new list head is located is awkward on HEAD (the business with cell1),
and there should be no need to keep around the reference to the first
element innerwiths, as long as you save the result in a new, separate,
List.

The second case in checkWellFormedRecursionWalker() is equally
dangerous. I have been playing with subselects and more recursions
and could not trigger a problem with -DUSE_VALGRIND, but let's be
safe.

So attached is a patch to take care of this, with a regression test
based on what has been sent upthread. This solves the issue for me.

Thoughts?
--
Michael

Attachments:

valgrind-cte-fix.patchtext/x-diff; charset=us-asciiDownload+32-8
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#6)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

Michael Paquier <michael@paquier.xyz> writes:

So attached is a patch to take care of this, with a regression test
based on what has been sent upthread. This solves the issue for me.

Surely that breaks things entirely (if it doesn't, then we are badly
under-testing this area). A nil list is just a null pointer, so
appending to "new_cte_list" later isn't going to affect what was
previously put into the innerwiths list.

I haven't tested, but I think a more correct fix would be

- ListCell *cell1;

                cstate->innerwiths = lcons(NIL, cstate->innerwiths);
-               cell1 = list_head(cstate->innerwiths);
                foreach(lc, stmt->withClause->ctes)
                {
                    CommonTableExpr *cte = (CommonTableExpr *) lfirst(lc);
+                   ListCell   *cell1;
                    (void) makeDependencyGraphWalker(cte->ctequery, cstate);
+                   /* note innerwiths list can change during recursion */
+                   cell1 = list_head(cstate->innerwiths);
                    lfirst(cell1) = lappend((List *) lfirst(cell1), cte);
                }

ie, recompute the "cell1" pointer each time it's needed instead of
assuming that the original value is good throughout the loop.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

I wrote:

Surely that breaks things entirely (if it doesn't, then we are badly
under-testing this area). A nil list is just a null pointer, so
appending to "new_cte_list" later isn't going to affect what was
previously put into the innerwiths list.

Here's a patch that I think fixes it correctly, including a test
case that doesn't work with your patch.

regards, tom lane

Attachments:

bug-16801-fix.patchtext/x-diff; charset=us-ascii; name=bug-16801-fix.patchDownload+48-6
#9Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#8)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

On Wed, Feb 24, 2021 at 07:36:26PM -0500, Tom Lane wrote:

Here's a patch that I think fixes it correctly, including a test
case that doesn't work with your patch.

Yes, thanks. While looking at that this morning, I have been able to
get a crash with my previous patch once I used more nesting in those
CTEs and your test is much simpler. I also got a test case able to
break things the same way in checkWellFormedRecursionWalker():
WITH RECURSIVE outermost(x) AS (
SELECT 1
UNION (WITH innermost as (WITH innermost2 AS (SELECT 2) SELECT * FROM innermost2)
SELECT * FROM outermost
UNION SELECT * FROM innermost)
)
SELECT * FROM outermost ORDER BY 1;

Your patch does not have a test for that, but it fixes the list
handling. With more nested levels and some UNIONs, the patch I sent
previously would equally break, though I am not sure we need more than
what I am sending here. What do you think about this extra test?
--
Michael

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#9)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

Michael Paquier <michael@paquier.xyz> writes:

Yes, thanks. While looking at that this morning, I have been able to
get a crash with my previous patch once I used more nesting in those
CTEs and your test is much simpler. I also got a test case able to
break things the same way in checkWellFormedRecursionWalker():

WITH RECURSIVE outermost(x) AS (
SELECT 1
UNION (WITH innermost as (WITH innermost2 AS (SELECT 2) SELECT * FROM innermost2)
SELECT * FROM outermost
UNION SELECT * FROM innermost)
)
SELECT * FROM outermost ORDER BY 1;

Hmm, I don't see any failure from that...

In my understanding of the bug, you need at least half a dozen levels of
WITH nesting to provoke a problem, because nothing will go wrong until the
innerwiths list gets to be six entries long, causing list.c to move its
elements array to somewhere else. If it is possible to fail without that
then there's still something here that I don't get.

regards, tom lane

#11Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#9)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

On Thu, Feb 25, 2021 at 10:30:41AM +0900, Michael Paquier wrote:

Your patch does not have a test for that, but it fixes the list
handling. With more nested levels and some UNIONs, the patch I sent
previously would equally break, though I am not sure we need more than
what I am sending here. What do you think about this extra test?

By the way, here is a fancier test case to make the list handling
recurse much more in checkWellFormedRecursionWalker():
WITH RECURSIVE outermost(x) AS (
SELECT 1
UNION (WITH innermost1 AS (
SELECT 2
UNION (WITH innermost2 AS (
SELECT 3
UNION (WITH innermost3 AS (
SELECT 4
UNION (WITH innermost4 AS (
SELECT 5
UNION (WITH innermost5 AS (SELECT 6)
SELECT * FROM innermost5))
SELECT * FROM innermost4))
SELECT * FROM innermost3))
SELECT * FROM innermost2))
SELECT * FROM outermost
UNION SELECT * FROM innermost1)
)
SELECT * FROM outermost ORDER BY 1;
--
Michael

#12Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#10)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

On Wed, Feb 24, 2021 at 09:13:46PM -0500, Tom Lane wrote:

Michael Paquier <michael@paquier.xyz> writes:

WITH RECURSIVE outermost(x) AS (
SELECT 1
UNION (WITH innermost as (WITH innermost2 AS (SELECT 2) SELECT * FROM innermost2)
SELECT * FROM outermost
UNION SELECT * FROM innermost)
)
SELECT * FROM outermost ORDER BY 1;

Hmm, I don't see any failure from that...

Perhaps because you are not compiling with -DUSE_VALGRIND which is why
this fails with only two nested levels? I get a failure once I do
that.

In my understanding of the bug, you need at least half a dozen levels of
WITH nesting to provoke a problem, because nothing will go wrong until the
innerwiths list gets to be six entries long, causing list.c to move its
elements array to somewhere else. If it is possible to fail without that
then there's still something here that I don't get.

Anyway, I also get a failure without -DUSE_VALGRIND once I apply 6
nested levels:
#1 0x00007ff47eb5b537 in __GI_abort () at abort.c:79
#2 0x000055b24c7b340e in ExceptionalCondition
(conditionName=0x55b24c8e917a "cstate->innerwiths == NIL",
errorType=0x55b24c8e8803 "FailedAssertion",
fileName=0x55b24c8e8a08 "parse_cte.c", lineNumber=869) at
assert.c:69
#3 0x000055b24c2db9a5 in checkWellFormedRecursion
(cstate=0x7ffe2fce8a30) at parse_cte.c:869

Here you go with a test case:
WITH RECURSIVE outermost(x) AS (
SELECT 1
UNION (WITH innermost1 AS (
SELECT 2
UNION (WITH innermost2 AS (
SELECT 3
UNION (WITH innermost3 AS (
SELECT 4
UNION (WITH innermost4 AS (
SELECT 5
UNION (WITH innermost5 AS (
SELECT 6
UNION (WITH innermost6 AS (SELECT 7)
SELECT * FROM innermost6))
SELECT * FROM innermost5))
SELECT * FROM innermost4))
SELECT * FROM innermost3))
SELECT * FROM innermost2))
SELECT * FROM outermost
UNION SELECT * FROM innermost1)
)
SELECT * FROM outermost ORDER BY 1;

My previous patch and HEAD break on that in
checkWellFormedRecursionWalker(), not your patch.
--
Michael

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#12)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

Michael Paquier <michael@paquier.xyz> writes:

On Wed, Feb 24, 2021 at 09:13:46PM -0500, Tom Lane wrote:

Hmm, I don't see any failure from that...

Perhaps because you are not compiling with -DUSE_VALGRIND which is why
this fails with only two nested levels?

Ah, right, because that enables DEBUG_LIST_MEMORY_USAGE. I did run
it under valgrind but I hadn't bothered to recompile.

Anyway, I think we're better off with a test that doesn't require
DEBUG_LIST_MEMORY_USAGE, or preferably not even CLOBBER_FREED_MEMORY,
to show the problem. The test I showed misbehaves even in non-debug
builds, because it actually depends on what's in the innerwiths lists.

regards, tom lane

#14Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#13)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

On Wed, Feb 24, 2021 at 09:48:03PM -0500, Tom Lane wrote:

Anyway, I think we're better off with a test that doesn't require
DEBUG_LIST_MEMORY_USAGE, or preferably not even CLOBBER_FREED_MEMORY,
to show the problem. The test I showed misbehaves even in non-debug
builds, because it actually depends on what's in the innerwiths lists.

Yeah. What I sent previously does not break in non-debug builds, but
it does with just --enable-cassert.

Hmm. I'd like to think that this would be enough for this thread, and
I cannot come up now with a test able to break the lists for non-debug
builds. But perhaps you have an idea?
--
Michael

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#14)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

Michael Paquier <michael@paquier.xyz> writes:

On Wed, Feb 24, 2021 at 09:48:03PM -0500, Tom Lane wrote:

Anyway, I think we're better off with a test that doesn't require
DEBUG_LIST_MEMORY_USAGE, or preferably not even CLOBBER_FREED_MEMORY,
to show the problem. The test I showed misbehaves even in non-debug
builds, because it actually depends on what's in the innerwiths lists.

Yeah. What I sent previously does not break in non-debug builds, but
it does with just --enable-cassert.

Hmm. I'd like to think that this would be enough for this thread, and
I cannot come up now with a test able to break the lists for non-debug
builds. But perhaps you have an idea?

For me, the example I gave fails in a non-debug build. With the code as
of HEAD, I get "ERROR: stack depth limit exceeded" or a segfault. With
your patch applied, it complains about w6 not having the correct form for
a recursive CTE.

regards, tom lane

#16Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#15)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

On Thu, Feb 25, 2021 at 10:19:43AM -0500, Tom Lane wrote:

For me, the example I gave fails in a non-debug build. With the code as
of HEAD, I get "ERROR: stack depth limit exceeded" or a segfault. With
your patch applied, it complains about w6 not having the correct form for
a recursive CTE.

Yes, same here for the test stressing makeDependencyGraphWalker().

The second test I posted for checkWellFormedRecursionWalker() passes
on HEAD in a non-debug build, and triggers an assertion with debug
builds. So the second test requires CLOBBER_FREED_MEMORY but not
DEBUG_LIST_MEMORY_USAGE. That's not as good as the first one, but I
would vote for having this second test than none to stress more the
list handling when looking after invalid self-references in a CTE.
This gives me the attached.

I would rather have both tests in the version committed, but if you
think that this addition is not necessary I won't fight hard either :)
--
Michael

Attachments:

bug-16801-fix-v2.patchtext/x-diff; charset=us-asciiDownload+109-6
#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#16)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

Michael Paquier <michael@paquier.xyz> writes:

I would rather have both tests in the version committed, but if you
think that this addition is not necessary I won't fight hard either :)

I'm doubtful that the extra test is really testing an independent
issue, but I don't wanna fight about it either. Will push this
in a few.

regards, tom lane

#18Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#17)
Re: BUG #16801: Invalid memory access on WITH RECURSIVE with nested WITHs

On Thu, Feb 25, 2021 at 08:31:26PM -0500, Tom Lane wrote:

I'm doubtful that the extra test is really testing an independent
issue, but I don't wanna fight about it either. Will push this
in a few.

Thanks!
--
Michael