BUG #16183: PREPARED STATEMENT slowed down by jit

Started by PG Bug reporting form · over 6 years ago · 9 messages · pgsql-bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16183
Logged by: Christian Quest
Email address: github@cquest.org
PostgreSQL version: 12.1
Operating system: Ubuntu 18.04
Description:

I'm running several OpenStreetMap map rendering servers. They are all based
on PG/postgis.
After an upgrade from v10 to 12, the usual update process took 10x more time
than it used to.

After digging for a few days, it appears that the default jit=on is the cause.

The update tool (osm2pgsql) makes heavy use of PREPARED statements. Could
there be a bug in jit+PREPARED causing the statement to be recompiled on
each execute?

The other side of the issue is described here:
https://github.com/openstreetmap/osm2pgsql/issues/1045

#2 Jeff Janes
jeff.janes@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #16183: PREPARED STATEMENT slowed down by jit

The update tool (osm2pgsql) makes heavy use of PREPARED statements. Could
there be a bug in jit+PREPARED causing the statement to be recompiled on
each execute ?

I don't think that that is a bug, rather it is just how JIT works at the
moment (that it is recompiled for each execution). That might be improved
in a future version.

For the slowdown to be 10x, though, does seem rather extreme. Could you
isolate the query and post an execution with "EXPLAIN (ANALYZE)"? I
suspect it is a very cheap query (e.g. a single-row update), but for some
reason the planner thinks it is expensive, and so JIT kicks in when it
shouldn't. Because of the high overhead, JIT is only supposed to activate
for expensive queries, see jit_above_cost.
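Since the trigger here is the estimated plan cost crossing jit_above_cost, a quick way to check this theory in a psql session is to inspect (and, if needed, adjust) the relevant settings. A sketch, using the standard PostgreSQL 12 JIT GUCs:

```sql
-- Inspect the current JIT settings
SHOW jit;                     -- on by default since PostgreSQL 12
SHOW jit_above_cost;          -- default 100000; JIT kicks in above this estimated cost
SHOW jit_optimize_above_cost; -- default 500000
SHOW jit_inline_above_cost;   -- default 500000

-- Either raise the threshold or disable JIT for the session:
SET jit_above_cost = 1000000;
SET jit = off;
```

If EXPLAIN (ANALYZE) stops showing a "JIT:" section after `SET jit = off`, the overhead was indeed coming from compilation per execution.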

Cheers,

Jeff

#3 Christian Quest
cquest@cquest.org
In reply to: Jeff Janes (#2)
Re: BUG #16183: PREPARED STATEMENT slowed down by jit

The queries are very simple, like:

PREPARE mark_ways_by_node(" POSTGRES_OSMID_TYPE ") AS select id from
planet_osm_ways WHERE nodes && ARRAY[$1];

They are all located here in osm2pgsql source code:

https://github.com/openstreetmap/osm2pgsql/blob/master/src/middle-pgsql.cpp#L786

Here are the EXPLAIN/ANALYZE without jit and with jit:

osm=# explain analyze execute mark_ways_by_node(1836953770);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on planet_osm_ways  (cost=2468.37..305182.32 rows=301467 width=8) (actual time=0.039..0.042 rows=2 loops=1)
   Recheck Cond: (nodes && '{1836953770}'::bigint[])
   Heap Blocks: exact=2
   ->  Bitmap Index Scan on planet_osm_ways_nodes  (cost=0.00..2393.00 rows=301467 width=0) (actual time=0.029..0.029 rows=2 loops=1)
         Index Cond: (nodes && '{1836953770}'::bigint[])
 Planning Time: 0.171 ms
 Execution Time: 0.077 ms
(7 rows)

osm=# PREPARE mark_ways_by_node(bigint) AS select id from
planet_osm_ways WHERE nodes && ARRAY[$1];
PREPARE
osm=# explain analyze execute mark_ways_by_node(1836953770);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on planet_osm_ways  (cost=2468.37..305182.32 rows=301467 width=8) (actual time=5.775..5.905 rows=2 loops=1)
   Recheck Cond: (nodes && '{1836953770}'::bigint[])
   Heap Blocks: exact=2
   ->  Bitmap Index Scan on planet_osm_ways_nodes  (cost=0.00..2393.00 rows=301467 width=0) (actual time=0.512..0.512 rows=2 loops=1)
         Index Cond: (nodes && '{1836953770}'::bigint[])
 Planning Time: 3.667 ms
 JIT:
   Functions: 4
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.466 ms, Inlining 0.000 ms, Optimization 0.354 ms, Emission 4.634 ms, Total 5.454 ms
 Execution Time: 30.393 ms
(11 rows)


#4 Andres Freund
andres@anarazel.de
In reply to: Christian Quest (#3)
Re: BUG #16183: PREPARED STATEMENT slowed down by jit

Hi,

On 2020-01-02 22:29:31 +0100, Christian Quest wrote:

osm=# PREPARE mark_ways_by_node(bigint) AS select id from planet_osm_ways
WHERE nodes && ARRAY[$1];
PREPARE
osm=# explain analyze execute mark_ways_by_node(1836953770);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on planet_osm_ways  (cost=2468.37..305182.32 rows=301467 width=8) (actual time=5.775..5.905 rows=2 loops=1)
   Recheck Cond: (nodes && '{1836953770}'::bigint[])
   Heap Blocks: exact=2
   ->  Bitmap Index Scan on planet_osm_ways_nodes (cost=0.00..2393.00 rows=301467 width=0) (actual time=0.512..0.512 rows=2 loops=1)
         Index Cond: (nodes && '{1836953770}'::bigint[])
 Planning Time: 3.667 ms
 JIT:
   Functions: 4
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.466 ms, Inlining 0.000 ms, Optimization 0.354 ms, Emission 4.634 ms, Total 5.454 ms
 Execution Time: 30.393 ms
(11 rows)

I'm not too surprised the heuristics don't work if the row estimate is
off by ~5 orders of magnitude... Caching could "fix" this by making the
JIT cost less noticeable, but you're very liable to get bad plan
decisions as long as your estimated plan costs are that far off.

Greetings,

Andres Freund

#5 Jeff Janes
jeff.janes@gmail.com
In reply to: Christian Quest (#3)
Re: BUG #16183: PREPARED STATEMENT slowed down by jit

On Thu, Jan 2, 2020 at 5:03 PM Christian Quest <cquest@cquest.org> wrote:

osm=# explain analyze execute mark_ways_by_node(1836953770);

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on planet_osm_ways (cost=2468.37..305182.32 rows=301467 width=8) (actual time=0.039..0.042 rows=2 loops=1)
Recheck Cond: (nodes && '{1836953770}'::bigint[])

I think your estimation here is falling victim to a deficiency in how
stats are computed on array types when all values in the array (across all
rows) are rare. See the discussion of this at
/messages/by-id/CAMkU=1x2W1gpEP3AQsrSA30uxQk1Sau5VDOLL4LkhWLwrOY8Lw@mail.gmail.com

(My quick and dirty patch posted there still compiles and works, if you
would like to test that it fixes the problem for you.)

Because the number of rows is vastly overestimated, so is the cost, which
then causes JIT to kick in counter-productively, due to the deranged cost
exceeding jit_above_cost.
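One way to see the deficiency Jeff describes is to check what ANALYZE actually stored for the array column. A hedged sketch for inspecting it (pg_stats and its most_common_elems / most_common_elem_freqs columns are the standard statistics view; the table and column names are from the report):

```sql
-- If most_common_elems is NULL for the array column, the && selectivity
-- estimate has nothing element-level to work from and falls back to a
-- default, which is where the huge row estimate can come from.
SELECT most_common_elems IS NULL AS no_element_stats,
       most_common_elem_freqs
FROM pg_stats
WHERE tablename = 'planet_osm_ways'
  AND attname = 'nodes';
```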

Cheers,

Jeff

#6 Christian Quest
cquest@cquest.org
In reply to: Jeff Janes (#5)
Re: BUG #16183: PREPARED STATEMENT slowed down by jit


This wrong cost may have other side effects, like launching parallel workers.

Another person hit the same problem, but my simple fix of disabling jit
alone was not enough for him. My tests were done on a smaller database
(an OpenStreetMap extract covering only France); his was on a full
planet dataset, where the computed row estimates were 10x higher.

We found a workaround (disabling jit and parallel workers for the
session), but a more global fix for this wrong row estimate should
be considered for other cases ;)
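For anyone hitting the same problem, the session-level workaround can be expressed as follows (a sketch; both are standard GUCs and can be set per session, per role, or in postgresql.conf):

```sql
-- Workaround: disable JIT and parallel query for the updating session,
-- so the inflated cost estimate cannot trigger either of them.
SET jit = off;
SET max_parallel_workers_per_gather = 0;
```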

Thanks for your time on this issue.

Christian

#7 Ryan Lambert
ryan@rustprooflabs.com
In reply to: Jeff Janes (#5)
Re: BUG #16183: PREPARED STATEMENT slowed down by jit


Hi Jeff,

I tried applying your original patch and the later patch from that thread
to REL_12_STABLE, unfortunately no luck on either. The original patch
errors with:

$ git apply -p1 < array_type_analyze_MCE_V001.patch
error: src/backend/utils/adt/array_typanalyze.c: already exists in working
directory

The other with:

$ git apply -p1 < array_typanalyze_0_mce.patch
error: src/backend/utils/adt/array_typanalyze.c: already exists in working
directory
error: src/backend/utils/cache/lsyscache.c: already exists in working
directory

I am happy to review and test patches that address this particular issue.
Thanks,

*Ryan Lambert*

#8 Jeff Janes
jeff.janes@gmail.com
In reply to: Ryan Lambert (#7)
Re: BUG #16183: PREPARED STATEMENT slowed down by jit

On Sun, Jan 5, 2020 at 7:21 PM Ryan Lambert <ryan@rustprooflabs.com> wrote:

I tried applying your original patch and the later patch from that thread
to REL_12_STABLE, unfortunately no luck on either. The original patch
errors with:

$ git apply -p1 < array_type_analyze_MCE_V001.patch
error: src/backend/utils/adt/array_typanalyze.c: already exists in working
directory

I just use the GNU "patch -p1" command. I don't know what "git apply" is
up to here, and the error message isn't very informative.
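The practical difference is in how the two tools read the `a/` and `b/` path prefixes. A minimal, self-contained sketch of what `patch -p1` does (hypothetical file names, purely to show the path stripping; not the actual PostgreSQL patch):

```shell
set -e
# Build a tiny source tree and a unified diff with a/ and b/ prefixes.
mkdir -p demo/src
printf 'old line\n' > demo/src/file.c
cat > demo/fix.patch <<'EOF'
--- a/src/file.c
+++ b/src/file.c
@@ -1 +1 @@
-old line
+new line
EOF
# -d enters the tree first; -p1 strips the leading a/ or b/ component,
# which is what "patch -p1" from the tree root relies on.
patch -d demo -p1 < demo/fix.patch
cat demo/src/file.c   # now "new line"
```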

By the way, the patch says "However, there is no histogram to fall back on
when the MCE array is NULL", which is a bit of nonsense. It is ndistinct,
not histogram, which the scalar case falls back on when MCV is NULL. But
there isn't an ndistinct for MCE either, so the conclusion is still sound.

Cheers,

Jeff

#9 Ryan Lambert
ryan@rustprooflabs.com
In reply to: Jeff Janes (#8)
Re: BUG #16183: PREPARED STATEMENT slowed down by jit


Thank you, using patch instead of git apply works. Your patch [1] applies
cleanly and passes make installcheck-world against REL_12_STABLE. It
greatly improves the estimated row counts in the prepared statements [2]
that seem to be causing the issue. I have not yet verified this patch
fixes the originally reported performance issue regarding updating
data with osm2pgsql [3].

The following query uses an ID from the Colorado region's ways table. A gzipped
plain pg_dump (97 MB) is available [4] to make this easier to replicate if
helpful.

PREPARE mark_ways_by_node(BIGINT) AS
SELECT id FROM public.planet_osm_ways
WHERE nodes && ARRAY[$1];
EXPLAIN(ANALYZE)
EXECUTE mark_ways_by_node(736973985);

The explain results from unpatched Postgres 12.1 show an estimated 10,222
rows when the actual rows=1.

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on planet_osm_ways (cost=119.22..28648.09 rows=10222 width=8) (actual time=0.045..0.047 rows=1 loops=1)
  Recheck Cond: (nodes && '{736973985}'::bigint[])
  Heap Blocks: exact=1
  -> Bitmap Index Scan on planet_osm_ways_nodes (cost=0.00..116.67 rows=10222 width=0) (actual time=0.032..0.033 rows=1 loops=1)
        Index Cond: (nodes && '{736973985}'::bigint[])
Planning Time: 0.288 ms
Execution Time: 0.102 ms

The same data with the patch [1] applied is much closer, with an estimated
68 rows. The OP's query plan showed an estimated row count > 300k when
the actual row count was 2. It's logical to suspect the costly JIT / parallel
query would not be chosen if their row count estimates improved by similar
margins.

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on planet_osm_ways (cost=28.53..295.41 rows=68 width=8) (actual time=0.045..0.046 rows=1 loops=1)
  Recheck Cond: (nodes && '{736973985}'::bigint[])
  Heap Blocks: exact=1
  -> Bitmap Index Scan on planet_osm_ways_nodes (cost=0.00..28.51 rows=68 width=0) (actual time=0.034..0.035 rows=1 loops=1)
        Index Cond: (nodes && '{736973985}'::bigint[])
Planning Time: 0.209 ms
Execution Time: 0.092 ms

I'll try to do more testing in a few days to verify if this patch fixes the
issue with osm2pgsql updates.

Thanks,
Ryan

[1]: /messages/by-id/attachment/39315/array_type_analyze_MCE_V001.patch
[2]: https://github.com/openstreetmap/osm2pgsql/blob/master/src/middle-pgsql.cpp#L786
[3]: https://github.com/openstreetmap/osm2pgsql/issues/1045
[4]: https://drive.google.com/file/d/1hDpiTiAazG_g1ObMBTRpe-4B_r8gidch/view?usp=sharing