[PATCH] nodeindexscan with reorder memory leak
Hey!
I was investigating a leak reported in the PostGIS issues tracker [1]https://trac.osgeo.org/postgis/ticket/4720 which
led me to the Postgres side where the problem really is. The leak is
reproducible with query from original ticket [1]https://trac.osgeo.org/postgis/ticket/4720:
WITH latitudes AS (
SELECT generate_series AS latitude
FROM generate_series(-90, 90, 0.1)
), longitudes AS (
SELECT generate_series AS longitude
FROM generate_series(-180, 180, 0.1)
), points AS (
SELECT ST_SetSRID(ST_Point(longitude, latitude), 4326)::geography AS geog
FROM latitudes
CROSS JOIN longitudes
)
SELECT
geog,
(
SELECT name
FROM ne_110m_admin_0_countries AS ne
ORDER BY p.geog <-> ne.geog
LIMIT 1
)
FROM points AS p
;
The leak is only noticeable when index scan with reorder happens as part of
subquery plan which is explained by the fact that heap tuples cloned in
reorderqueue_push are not freed during flush of reorder queue in
ExecReScanIndex.
Attachments:
0001-nodeindexscan_with_reorder_memory_leak.patchapplication/octet-stream; name=0001-nodeindexscan_with_reorder_memory_leak.patchDownload+5-2
Aliaksandr Kalenik <akalenik@kontur.io> writes:
The leak is only noticeable when index scan with reorder happens as part of
subquery plan which is explained by the fact that heap tuples cloned in
reorderqueue_push are not freed during flush of reorder queue in
ExecReScanIndex.
Hmm ... I see from the code coverage report[1]https://coverage.postgresql.org/src/backend/executor/nodeIndexscan.c.gcov.html#577 that that part of
ExecReScanIndexScan isn't even reached by our existing regression
tests. Seems bad :-(
regards, tom lane
[1]: https://coverage.postgresql.org/src/backend/executor/nodeIndexscan.c.gcov.html#577
Aliaksandr Kalenik <akalenik@kontur.io> writes:
I was investigating a leak reported in the PostGIS issues tracker [1] which
led me to the Postgres side where the problem really is. The leak is
reproducible with query from original ticket [1]:
...
The leak is only noticeable when index scan with reorder happens as part of
subquery plan which is explained by the fact that heap tuples cloned in
reorderqueue_push are not freed during flush of reorder queue in
ExecReScanIndex.
Actually, that code has got worse problems than that. I tried to improve
our regression tests to exercise that code path, as attached. What I got
was
+SELECT point(x,x), (SELECT circle_center(f1) FROM gcircle_tbl ORDER BY f1 <-> p
oint(x,x) LIMIT 1) as c FROM generate_series(0,1000,1) x;
+ERROR: index returned tuples in wrong order
(The error doesn't always appear depending on what generate_series
parameters you use, but it seems to show up consistently with
a step of 1 and a limit of 1000 or more.)
Fixing this is well beyond my knowledge of that code, so I'm punting
it to the original authors.
regards, tom lane
Attachments:
failing-test-case.patchtext/x-diff; charset=us-ascii; name=failing-test-case.patchDownload+4-0
Thanks for your review!
On Sun, Jan 30, 2022 at 7:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Actually, that code has got worse problems than that. I tried to improve
our regression tests to exercise that code path, as attached. What I got
was+SELECT point(x,x), (SELECT circle_center(f1) FROM gcircle_tbl ORDER BY f1 <-> p oint(x,x) LIMIT 1) as c FROM generate_series(0,1000,1) x; +ERROR: index returned tuples in wrong order
I tried to figure out what is the problem with this query. This error
happens when actual distance is less than estimated distance.
For this specific query it happened while comparing these values:
50.263279680219099532223481219262 (actual distance returned by
dist_cpoint in geo_ops.c) and 50.263279680219113743078196421266
(bounding box distance returned by computeDistance in gistproc.c).
So for me it looks like this error is not really related to KNN scan
code but to some floating-arithmetic issue in distance calculation
functions for geometry type.Would be great to figure out a fix but
for now I didn’t manage to find a better way than comparing the
difference of distance with FLT_EPSILON which definitely doesn't seem
like the way to fix :(
Hi!
On Mon, Feb 7, 2022 at 11:42 AM Aliaksandr Kalenik <akalenik@kontur.io> wrote:
Thanks for your review!
On Sun, Jan 30, 2022 at 7:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Actually, that code has got worse problems than that. I tried to improve
our regression tests to exercise that code path, as attached. What I got
was+SELECT point(x,x), (SELECT circle_center(f1) FROM gcircle_tbl ORDER BY f1 <-> p oint(x,x) LIMIT 1) as c FROM generate_series(0,1000,1) x; +ERROR: index returned tuples in wrong orderI tried to figure out what is the problem with this query. This error
happens when actual distance is less than estimated distance.
For this specific query it happened while comparing these values:
50.263279680219099532223481219262 (actual distance returned by
dist_cpoint in geo_ops.c) and 50.263279680219113743078196421266
(bounding box distance returned by computeDistance in gistproc.c).So for me it looks like this error is not really related to KNN scan
code but to some floating-arithmetic issue in distance calculation
functions for geometry type.Would be great to figure out a fix but
for now I didn’t manage to find a better way than comparing the
difference of distance with FLT_EPSILON which definitely doesn't seem
like the way to fix :(
Probably, this is caused by some compiler optimization. Could you
re-check the issue with different compilers and optimization levels?
Regarding the memory leak, could you add a corresponding regression
test to the patch (probably similar to Tom's query upthread)?
------
Regards,
Alexander Korotkov
On Mon, Feb 7, 2022 at 11:20 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Regarding the memory leak, could you add a corresponding regression
test to the patch (probably similar to Tom's query upthread)?
Yes, added similar to Tom's query but with polygons instead of circles.
Attachments:
0002-nodeindexscan_with_reorder_memory_leak.patchapplication/octet-stream; name=0002-nodeindexscan_with_reorder_memory_leak.patchDownload+36-2
On Thu, Feb 10, 2022 at 1:19 AM Aliaksandr Kalenik <akalenik@kontur.io> wrote:
On Mon, Feb 7, 2022 at 11:20 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Regarding the memory leak, could you add a corresponding regression
test to the patch (probably similar to Tom's query upthread)?Yes, added similar to Tom's query but with polygons instead of circles.
I've rechecked that the now patched code branch is covered by
regression tests. I think the memory leak issue is independent of the
computational errors we've observed.
So, I'm going to push and backpatch this if no objections.
------
Regards,
Alexander Korotkov
Alexander Korotkov <aekorotkov@gmail.com> writes:
I've rechecked that the now patched code branch is covered by
regression tests. I think the memory leak issue is independent of the
computational errors we've observed.
So, I'm going to push and backpatch this if no objections.
+1. We should work on the roundoff-error issue as well, but
as you say, it's an independent problem.
regards, tom lane
On Thu, Feb 10, 2022 at 2:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
I've rechecked that the now patched code branch is covered by
regression tests. I think the memory leak issue is independent of the
computational errors we've observed.
So, I'm going to push and backpatch this if no objections.+1. We should work on the roundoff-error issue as well, but
as you say, it's an independent problem.
Pushed, thank you.
------
Regards,
Alexander Korotkov