PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

Started by Andres Freund · almost 4 years ago · 75 messages · pgsql-hackers
#1 Andres Freund
andres@anarazel.de

Hi,

Mark Callaghan reported a regression in 15, in the post linked here (with
comments in the twitter thread, hence linked here)
https://twitter.com/MarkCallaghanDB/status/1537475430227161098

A differential flame graph shows increased time spent doing memory
allocations, below ExecInitExpr():
https://mdcallag.github.io/reports/22_06_06_ibench.20m.pg.all/l.i0.0.143.15b1.n.svg

Quite the useful comparison.

That led me to look at the size of ExprEvalStep - and indeed, it has
*exploded* in 15. 10-13: 64 bytes; 14: 88 bytes; 15: 320 bytes.

The comment above the union for data fields says:
/*
 * Inline data for the operation. Inline data is faster to access, but
 * also bloats the size of all instructions. The union should be kept to
 * no more than 40 bytes on 64-bit systems (so that the entire struct is
 * no more than 64 bytes, a single cacheline on common systems).
 */

However, jsonexpr/EEOP_JSONEXPR is 296 bytes, and
hashedscalararrayop/EEOP_HASHED_SCALARARRAYOP is 64 bytes, even though the
limit is 40 bytes.

The EEOP_JSONEXPR stuff was added during 15 development in:

commit 1a36bc9dba8eae90963a586d37b6457b32b2fed4
Author: Andrew Dunstan <andrew@dunslane.net>
Date: 2022-03-03 13:11:14 -0500

SQL/JSON query functions

the EEOP_HASHED_SCALARARRAYOP stuff was added during 14 development in:

commit 50e17ad281b8d1c1b410c9833955bc80fbad4078
Author: David Rowley <drowley@postgresql.org>
Date: 2021-04-08 23:51:22 +1200

Speedup ScalarArrayOpExpr evaluation

Unfortunately ExprEvalStep is public, so I don't think we can fix the 14
regression. If somebody installed updated server packages while the server is
running, we could end up loading extensions referencing ExprEvalStep (at least
plpgsql and LLVMJIT).

It's not great to have an ABI break at this point of the 15 cycle, but I don't
think we have a choice. Exploding the size of ExprEvalStep by ~4x is bad -
both for memory overhead (expressions often have dozens to hundreds of steps)
and expression evaluation performance.

The Hashed SAO case can perhaps be squeezed sufficiently to fit inline, but
clearly that's not going to happen for the json case. So we should just move
that out of line.

Maybe it's worth sticking a StaticAssert() for the struct size somewhere. I'm
a bit wary about that being too noisy, there are some machines with odd
alignment requirements. Perhaps worth restricting the assertion to x86-64 +
armv8 or such?

It very well might be that this isn't the full explanation of the regression
Mark observed. E.g. the difference in DecodeDateTime() looks more likely to be
caused by 591e088dd5b - but we need to fix the above issue, whether it's the
cause of the regression Mark observed or not.

Greetings,

Andres Freund

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#1)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

Andres Freund <andres@anarazel.de> writes:

However, jsonexpr/EEOP_JSONEXPR is 296 bytes, and
hashedscalararrayop/EEOP_HASHED_SCALARARRAYOP is 64 bytes, even though the
limit is 40 bytes.

Oops.

Maybe it's worth sticking a StaticAssert() for the struct size
somewhere.

Indeed. I thought we had one already.

I'm a bit wary about that being too noisy, there are some machines with
odd alignment requirements. Perhaps worth restricting the assertion to
x86-64 + armv8 or such?

I'd put it in first and only reconsider if it shows unfixable problems.

regards, tom lane

#3 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

Hi,

On 2022-06-16 16:31:30 -0700, Andres Freund wrote:

The EEOP_JSONEXPR stuff was added during 15 development in:

commit 1a36bc9dba8eae90963a586d37b6457b32b2fed4
Author: Andrew Dunstan <andrew@dunslane.net>
Date: 2022-03-03 13:11:14 -0500

SQL/JSON query functions

I'm quite confused about part of the struct definition of this:

struct JsonCoercionsState
{
    struct JsonCoercionState
    {
        JsonCoercion *coercion; /* coercion expression */
        ExprState  *estate;     /* coercion expression state */
    }           null,
                string,
                numeric,
                boolean,
                date,
                time,
                timetz,
                timestamp,
                timestamptz,
                composite;
}           coercions;          /* states for coercion from SQL/JSON item
                                 * types directly to the output type */

Why on earth do we have coercion state for all these different types? That
really adds up:

struct {
JsonExpr * jsexpr; /* 24 8 */
struct {
FmgrInfo func; /* 32 48 */
/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
Oid typioparam; /* 80 4 */
} input; /* 32 56 */

/* XXX last struct has 4 bytes of padding */

NullableDatum * formatted_expr; /* 88 8 */
NullableDatum * res_expr; /* 96 8 */
NullableDatum * coercion_expr; /* 104 8 */
NullableDatum * pathspec; /* 112 8 */
ExprState * result_expr; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
ExprState * default_on_empty; /* 128 8 */
ExprState * default_on_error; /* 136 8 */
List * args; /* 144 8 */
void * cache; /* 152 8 */
struct JsonCoercionsState coercions; /* 160 160 */
} jsonexpr; /* 24 296 */

And why is FmgrInfo stored inline in the struct? Everything else just stores
pointers to FmgrInfo.

Now that I look at this: It's a *bad* idea to have subsidiary ExprState inside
an ExprState. Nearly always the correct thing to do is to build those
expressions. There's plenty of memory and evaluation overhead in jumping to a
different expression. And I see no reason for doing it that way here?

This stuff doesn't look ready.

Greetings,

Andres Freund

#4 David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#1)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

On Fri, 17 Jun 2022 at 11:31, Andres Freund <andres@anarazel.de> wrote:

hashedscalararrayop/EEOP_HASHED_SCALARARRAYOP is 64 bytes, even though the
limit is 40 bytes.

commit 50e17ad281b8d1c1b410c9833955bc80fbad4078
Author: David Rowley <drowley@postgresql.org>
Date: 2021-04-08 23:51:22 +1200

Speedup ScalarArrayOpExpr evaluation

I've put together the attached patch which removes 4 fields from the
hashedscalararrayop portion of the struct which, once the JSON part is
fixed, will put sizeof(ExprEvalStep) back down to 64 bytes again.

The attached patch causes some extra pointer dereferencing to perform
a hashed saop step, so I tested the performance on f4fb45d15 (prior to
the JSON patch that pushed the sizeof(ExprEvalStep) up further). I
found:

setup:
create table a (a int);
insert into a select x from generate_series(1000000,2000000) x;

bench.sql
select * from a where a in(1,2,3,4,5,6,7,8,9,10);

f4fb45d15 + reduce_sizeof_hashedsaop_ExprEvalStep.patch
drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 -M prepared postgres
tps = 44.841851 (without initial connection time)
tps = 44.986472 (without initial connection time)
tps = 44.944315 (without initial connection time)

f4fb45d15
drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 -M prepared postgres
tps = 44.446127 (without initial connection time)
tps = 44.614134 (without initial connection time)
tps = 44.895011 (without initial connection time)

(Patched is ~0.61% faster here)

So, there appears to be no performance regression due to the extra
indirection. There's maybe even some gains due to the smaller step
size.

David

Attachments:

reduce_sizeof_hashedsaop_ExprEvalStep.patch (text/plain; +4/-19)
#5 Peter Geoghegan
pg@bowt.ie
In reply to: David Rowley (#4)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

On Thu, Jun 16, 2022 at 7:15 PM David Rowley <dgrowleyml@gmail.com> wrote:

So, there appears to be no performance regression due to the extra
indirection. There's maybe even some gains due to the smaller step
size.

Have you tried this with the insert benchmark [1]?

I've run it myself in the past (when working on B-Tree deduplication).
It's quite straightforward to set up and run.

[1]: http://smalldatum.blogspot.com/2017/06/the-insert-benchmark.html
--
Peter Geoghegan

#6 David Rowley
dgrowleyml@gmail.com
In reply to: Peter Geoghegan (#5)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

On Fri, 17 Jun 2022 at 15:33, Peter Geoghegan <pg@bowt.ie> wrote:

Have you tried this with the insert benchmark [1]?

I was mostly focusing on the performance of the hashed saop feature
after having removed the additional fields that pushed ExprEvalStep
over 64 bytes in 14.

I agree it would be good to do further benchmarking to see if there's
anything else that's snuck into 15 that's slowed that benchmark down,
but we can likely work on that after we get the ExprEvalStep size back
to 64 bytes again.

David

#7 Andres Freund
andres@anarazel.de
In reply to: David Rowley (#6)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

Hi,

On 2022-06-17 16:53:31 +1200, David Rowley wrote:

On Fri, 17 Jun 2022 at 15:33, Peter Geoghegan <pg@bowt.ie> wrote:

Have you tried this with the insert benchmark [1]?

I was mostly focusing on the performance of the hashed saop feature
after having removed the additional fields that pushed ExprEvalStep
over 64 bytes in 14.

I agree it would be good to do further benchmarking to see if there's
anything else that's snuck into 15 that's slowed that benchmark down,
but we can likely work on that after we get the ExprEvalStep size back
to 64 bytes again.

I did reproduce a regression between 14 and 15, using both pgbench -Mprepared
-S (scale 1) and TPC-H Q01 (scale 5). Between 7-10% - not good, particularly
since that hasn't been found so far. Fixing the json size issue gets that down
to ~2%. Not sure what that's caused by yet.

Greetings,

Andres Freund

Attachments:

v1-0001-wip-fix-EEOP_HASHED_SCALARARRAYOP-state-width.patch (text/x-diff; +14/-20)
v1-0002-WIP-Fix-EEOP_JSON_CONSTRUCTOR-and-EEOP_JSONEXPR-s.patch (text/x-diff; +2/-1)
#8 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#7)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

Hi,

On 2022-06-16 22:22:28 -0700, Andres Freund wrote:

On 2022-06-17 16:53:31 +1200, David Rowley wrote:

On Fri, 17 Jun 2022 at 15:33, Peter Geoghegan <pg@bowt.ie> wrote:

Have you tried this with the insert benchmark [1]?

I was mostly focusing on the performance of the hashed saop feature
after having removed the additional fields that pushed ExprEvalStep
over 64 bytes in 14.

I agree it would be good to do further benchmarking to see if there's
anything else that's snuck into 15 that's slowed that benchmark down,
but we can likely work on that after we get the ExprEvalStep size back
to 64 bytes again.

I did reproduce a regression between 14 and 15, using both pgbench -Mprepared
-S (scale 1) and TPC-H Q01 (scale 5). Between 7-10% - not good, particularly
since that hasn't been found so far. Fixing the json size issue gets that down
to ~2%. Not sure what that's caused by yet.

The remaining difference looks like it's largely caused by the
enable_timeout_after(IDLE_STATS_UPDATE_TIMEOUT, ...) introduced as part of the
pgstats patch. It's only really visible when I pin a single connection pgbench
to the same CPU core as the server (which gives a ~16% boost here).

It's not the timeout itself - that we amortize nicely (via 09cf1d522). It's
that enable_timeout_after() does a GetCurrentTimestamp().

Not sure yet what the best way to fix that is.

We could just leave the timer active and add some gating condition indicating
idleness to the IdleStatsUpdateTimeoutPending body in ProcessInterrupts()?

Or we could add a timeout.c API that specifies the timeout?
pgstat_report_stat() uses GetCurrentTransactionStopTimestamp(), it seems like
it'd make sense to use the same for arming the timeout?

- Andres

#9 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Andres Freund (#8)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

At Thu, 16 Jun 2022 23:24:56 -0700, Andres Freund <andres@anarazel.de> wrote in

The remaining difference looks like it's largely caused by the
enable_timeout_after(IDLE_STATS_UPDATE_TIMEOUT, ...) introduced as part of the
pgstats patch. It's only really visible when I pin a single connection pgbench
to the same CPU core as the server (which gives a ~16% boost here).

It's not the timeout itself - that we amortize nicely (via 09cf1d522). It's
that enable_timeout_after() does a GetCurrentTimestamp().

Not sure yet what the best way to fix that is.

We could just leave the timer active and add some gating condition indicating
idleness to the IdleStatsUpdateTimeoutPending body in ProcessInterrupts()?

Or we could add a timeout.c API that specifies the timeout?

I sometimes wanted this, but I don't see a simple way to sort multiple
relative timeouts in absolute time order. Maybe we can skip
GetCurrentTimestamp only when inserting the first timeout, but I don't
think it benefits this case.

pgstat_report_stat() uses GetCurrentTransactionStopTimestamp(), it seems like
it'd make sense to use the same for arming the timeout?

This seems like the feasible best fix for this specific issue.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#10 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#9)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

At Fri, 17 Jun 2022 15:54:13 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

Or we could add a timeout.c API that specifies the timeout?

I sometimes wanted this, but I don't see a simple way to sort multiple
relative timeouts in absolute time order. Maybe we can skip
GetCurrentTimestamp only when inserting the first timeout, but I don't
think it benefits this case.

Or we can use a free-running interval timer and an individual down-counter
for each timeout. I think we need at most 0.1s resolution, and the error
of a long-running timer doesn't harm?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#11 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#10)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

At Fri, 17 Jun 2022 15:59:26 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Fri, 17 Jun 2022 15:54:13 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

Or we could add a timeout.c API that specifies the timeout?

I sometimes wanted this, but I don't see a simple way to sort multiple
relative timeouts in absolute time order. Maybe we can skip
GetCurrentTimestamp only when inserting the first timeout, but I don't
think it benefits this case.

Or we can use a free-running interval timer and an individual down-counter
for each timeout. I think we need at most 0.1s resolution, and the error
of a long-running timer doesn't harm?

Yeah, stupid. We don't want to wake the process at such a high frequency..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12 James Coleman
jtc331@gmail.com
In reply to: David Rowley (#4)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

On Thu, Jun 16, 2022 at 10:15 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 17 Jun 2022 at 11:31, Andres Freund <andres@anarazel.de> wrote:

hashedscalararrayop/EEOP_HASHED_SCALARARRAYOP is 64 bytes, even though the
limit is 40 bytes.

commit 50e17ad281b8d1c1b410c9833955bc80fbad4078
Author: David Rowley <drowley@postgresql.org>
Date: 2021-04-08 23:51:22 +1200

Speedup ScalarArrayOpExpr evaluation

I've put together the attached patch which removes 4 fields from the
hashedscalararrayop portion of the struct which, once the JSON part is
fixed, will put sizeof(ExprEvalStep) back down to 64 bytes again.

The attached patch causes some extra pointer dereferencing to perform
a hashed saop step, so I tested the performance on f4fb45d15 (prior to
the JSON patch that pushed the sizeof(ExprEvalStep) up further). I
found:

setup:
create table a (a int);
insert into a select x from generate_series(1000000,2000000) x;

bench.sql
select * from a where a in(1,2,3,4,5,6,7,8,9,10);

f4fb45d15 + reduce_sizeof_hashedsaop_ExprEvalStep.patch
drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 -M prepared postgres
tps = 44.841851 (without initial connection time)
tps = 44.986472 (without initial connection time)
tps = 44.944315 (without initial connection time)

f4fb45d15
drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 -M prepared postgres
tps = 44.446127 (without initial connection time)
tps = 44.614134 (without initial connection time)
tps = 44.895011 (without initial connection time)

(Patched is ~0.61% faster here)

So, there appears to be no performance regression due to the extra
indirection. There's maybe even some gains due to the smaller step
size.

I didn't see that comment when working on this (it's quite a long
unioned struct; I concur on adding an assert to catch it).

This patch looks very reasonable to me though.

James Coleman

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#8)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

Andres Freund <andres@anarazel.de> writes:

The remaining difference looks like it's largely caused by the
enable_timeout_after(IDLE_STATS_UPDATE_TIMEOUT, ...) introduced as part of the
pgstats patch. It's only really visible when I pin a single connection pgbench
to the same CPU core as the server (which gives a ~16% boost here).

It's not the timeout itself - that we amortize nicely (via 09cf1d522). It's
that enable_timeout_after() does a GetCurrentTimestamp().

Not sure yet what the best way to fix that is.

Maybe not queue a new timeout if the old one is still active?

BTW, it looks like that patch also falsified this comment
(postgres.c:4478):

* At most one of these timeouts will be active, so there's no need to
* worry about combining the timeout.c calls into one.

Maybe fixing that end of things would be a simpler way of buying back
the delta.

Or we could add a timeout.c API that specifies the timeout?

Don't think that will help: it'd be morally equivalent to
enable_timeout_at(), which also has to do GetCurrentTimestamp().

regards, tom lane

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#13)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

I wrote:

Andres Freund <andres@anarazel.de> writes:

Or we could add a timeout.c API that specifies the timeout?

Don't think that will help: it'd be morally equivalent to
enable_timeout_at(), which also has to do GetCurrentTimestamp().

BTW, if we were willing to drop get_timeout_start_time(), it might
be possible to avoid doing GetCurrentTimestamp() in enable_timeout_at,
in the common case where the specified timestamp is beyond signal_due_at
so that no setitimer call is needed. But getting the race conditions
right could be tricky. On the whole this doesn't sound like something
to tackle post-beta.

regards, tom lane

#15 Andres Freund
andres@anarazel.de
In reply to: David Rowley (#4)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

Hi,

On 2022-06-17 14:14:54 +1200, David Rowley wrote:

I've put together the attached patch which removes 4 fields from the
hashedscalararrayop portion of the struct which, once the JSON part is
fixed, will put sizeof(ExprEvalStep) back down to 64 bytes again.

The attached patch causes some extra pointer dereferencing to perform
a hashed saop step, so I tested the performance on f4fb45d15 (prior to
the JSON patch that pushed the sizeof(ExprEvalStep) up further). I
found:

What do you think about the approach prototyped in my patch to move the hash
FunctionCallInfo into the element_tab? With a tiny bit more work that should
reduce the amount of dereferencing over the state today, while also keeping
below the limit?

setup:
create table a (a int);
insert into a select x from generate_series(1000000,2000000) x;

bench.sql
select * from a where a in(1,2,3,4,5,6,7,8,9,10);

f4fb45d15 + reduce_sizeof_hashedsaop_ExprEvalStep.patch
drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 -M prepared postgres
tps = 44.841851 (without initial connection time)
tps = 44.986472 (without initial connection time)
tps = 44.944315 (without initial connection time)

f4fb45d15
drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 -M prepared postgres
tps = 44.446127 (without initial connection time)
tps = 44.614134 (without initial connection time)
tps = 44.895011 (without initial connection time)

(Patched is ~0.61% faster here)

So, there appears to be no performance regression due to the extra
indirection. There's maybe even some gains due to the smaller step
size.

"smaller step size"?

Greetings,

Andres Freund

#16 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#13)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

Hi,

On 2022-06-17 10:33:08 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

The remaining difference looks like it's largely caused by the
enable_timeout_after(IDLE_STATS_UPDATE_TIMEOUT, ...) introduced as part of the
pgstats patch. It's only really visible when I pin a single connection pgbench
to the same CPU core as the server (which gives a ~16% boost here).

It's not the timeout itself - that we amortize nicely (via 09cf1d522). It's
that enable_timeout_after() does a GetCurrentTimestamp().

Not sure yet what the best way to fix that is.

Maybe not queue a new timeout if the old one is still active?

Right now we disable the timer after ReadCommand(). We can of course change
that. At first I thought we might need more bookkeeping to do so, to avoid
ProcessInterrupts() triggering pgstat_report_stat() when the timer fires
later, but we probably can jury-rig something with DoingCommandRead &&
IsTransactionOrTransactionBlock() or such.

I guess one advantage of something like this could be that we could possibly
move the arming of the timeout to pgstat.c. But that looks like it might be
more complicated than really worth it.

BTW, it looks like that patch also falsified this comment
(postgres.c:4478):

* At most one of these timeouts will be active, so there's no need to
* worry about combining the timeout.c calls into one.

Hm, yea. I guess we can just disable them at once.

Maybe fixing that end of things would be a simpler way of buying back
the delta.

I don't think that'll do the trick - in the case I'm looking at none of the
other timers are active...

Or we could add a timeout.c API that specifies the timeout?

Don't think that will help: it'd be morally equivalent to
enable_timeout_at(), which also has to do GetCurrentTimestamp().

I should have been more precise - what I meant was a timeout.c API that allows
the caller to pass in "now", which in this case we'd get from
GetCurrentTransactionStopTimestamp(), which would avoid the additional
timestamp computation.

Greetings,

Andres Freund

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#16)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

Andres Freund <andres@anarazel.de> writes:

I should have been more precise - what I meant was a timeout.c API that allows
the caller to pass in "now", which in this case we'd get from
GetCurrentTransactionStopTimestamp(), which would avoid the additional
timestamp computation.

I don't care for that one bit: it makes the accuracy of all timeouts
dependent on how careful that caller is to provide an up-to-date "now".
In the example at hand, there is WAY too much code between
SetCurrentTransactionStopTimestamp() and the timer arming to make me
think the results will be acceptable.

regards, tom lane

#18 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#17)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

Hi,

On 2022-06-17 13:43:49 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

I should have been more precise - what I meant was a timeout.c API that allows
the caller to pass in "now", which in this case we'd get from
GetCurrentTransactionStopTimestamp(), which would avoid the additional
timestamp computation.

I don't care for that one bit: it makes the accuracy of all timeouts
dependent on how careful that caller is to provide an up-to-date "now".

I don't think it'd necessarily have to influence the accuracy of all timeouts
- but I've not looked at timeout.c much before. From what I understand we use
'now' for two things: First, to set ->start_time in enable_timeout() and
second to schedule the alarm in schedule_alarm(). An inaccurate start_time
won't cause problems for other timers afaics and it looks to me that it
wouldn't be too hard to only require an accurate 'now' if the new timeout is
nearest_timeout and now + nearest_timeout < signal_due_at?

It's probably too complicated to tinker with now, though.

Greetings,

Andres Freund

#19 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#16)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

Hi,

On 2022-06-17 10:30:55 -0700, Andres Freund wrote:

On 2022-06-17 10:33:08 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

The remaining difference looks like it's largely caused by the
enable_timeout_after(IDLE_STATS_UPDATE_TIMEOUT, ...) introduced as part of the
pgstats patch. It's only really visible when I pin a single connection pgbench
to the same CPU core as the server (which gives a ~16% boost here).

It's not the timeout itself - that we amortize nicely (via 09cf1d522). It's
that enable_timeout_after() does a GetCurrentTimestamp().

Not sure yet what the best way to fix that is.

Maybe not queue a new timeout if the old one is still active?

Right now we disable the timer after ReadCommand(). We can of course change
that. At first I thought we might need more bookkeeping to do so, to avoid
ProcessInterrupts() triggering pgstat_report_stat() when the timer fires
later, but we probably can jury-rig something with DoingCommandRead &&
IsTransactionOrTransactionBlock() or such.

Here's a patch for that.

One thing I noticed is that disable_timeout() calls do
schedule_alarm(GetCurrentTimestamp()) if there's any other active timeout,
even if the to-be-disabled timer is already disabled. Of course callers of
disable_timeout() can guard against that using get_timeout_active(), but that
spreads repetitive code around...

I opted to add a fastpath for that, instead of using
get_timeout_active(). Afaics that's safe to do without disarming the signal
handler, but I'd welcome a look from somebody that knows this code.

I guess one advantage of something like this could be that we could possibly
move the arming of the timeout to pgstat.c. But that looks like it might be
more complicated than really worth it.

I didn't do that yet, but am curious whether others think this would be
preferable.

BTW, it looks like that patch also falsified this comment
(postgres.c:4478):

* At most one of these timeouts will be active, so there's no need to
* worry about combining the timeout.c calls into one.

Hm, yea. I guess we can just disable them at once.

With the proposed change we don't need to combine the separate timeout.c calls
into one, or update the comment, as it should now look the same as in 14.

I also attached my heavily-WIP patches for the ExprEvalStep issues, I
accidentally had only included a small part of the contents of the json fix.

Greetings,

Andres Freund

Attachments:

v2-0001-Add-already-disabled-fastpath-to-disable_timeout.patch (text/x-diff; +5/-1)
v2-0002-WIP-pgstat-reduce-timer-overhead.patch (text/x-diff; +31/-19)
v2-0003-WIP-expression-eval-fix-EEOP_HASHED_SCALARARRAYOP.patch (text/x-diff; +14/-20)
v2-0004-WIP-expression-eval-Fix-EEOP_JSON_CONSTRUCTOR-and.patch (text/x-diff; +168/-141)
#20 Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#19)
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

On 2022-06-17 Fr 16:06, Andres Freund wrote:

I also attached my heavily-WIP patches for the ExprEvalStep issues,

Many thanks

I
accidentally had only included a small part of the contents of the json fix.

Yeah, that confused me mightily last week :-)

I and a couple of colleagues have looked it over. As far as it goes the
json fix looks kosher to me. I'll play with it some more.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#21 Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#20)
#22 Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#21)
#23 Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#22)
#24 Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#23)
#25 Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#24)
#26 David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#15)
#27 David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#19)
#28 Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#25)
#29 Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#28)
#30 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#24)
#31 Andres Freund
andres@anarazel.de
In reply to: David Rowley (#27)
#32 David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#31)
#33 Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#29)
#34 Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#33)
#35 Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#33)
#36 Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#35)
#37 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andrew Dunstan (#36)
#38 Andres Freund
andres@anarazel.de
In reply to: Amit Langote (#37)
#39 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andres Freund (#38)
#40 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#39)
#41 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#40)
#42 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#40)
#43 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#42)
#44 David Rowley
dgrowleyml@gmail.com
In reply to: Amit Langote (#43)
#45 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#44)
#46 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#45)
#47 Andrew Dunstan
andrew@dunslane.net
In reply to: Amit Langote (#46)
#48 Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#47)
#49 Andres Freund
andres@anarazel.de
In reply to: Amit Langote (#46)
#50 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andres Freund (#49)
#51 Andres Freund
andres@anarazel.de
In reply to: Amit Langote (#50)
#52 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andres Freund (#51)
#53 Andres Freund
andres@anarazel.de
In reply to: Amit Langote (#52)
#54 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andres Freund (#53)
#55 Jonathan S. Katz
jkatz@postgresql.org
In reply to: Andres Freund (#53)
#56 Andrew Dunstan
andrew@dunslane.net
In reply to: Jonathan S. Katz (#55)
#57 Jonathan S. Katz
jkatz@postgresql.org
In reply to: Andrew Dunstan (#56)
#58 Andres Freund
andres@anarazel.de
In reply to: Jonathan S. Katz (#57)
#59 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonathan S. Katz (#57)
#60 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#59)
#61 Jonathan S. Katz
jkatz@postgresql.org
In reply to: Andres Freund (#60)
#62 Jonathan S. Katz
jkatz@postgresql.org
In reply to: Andres Freund (#58)
#63 Andrew Dunstan
andrew@dunslane.net
In reply to: Jonathan S. Katz (#61)
#64 Jonathan S. Katz
jkatz@postgresql.org
In reply to: Andrew Dunstan (#63)
#65 Andrew Dunstan
andrew@dunslane.net
In reply to: Jonathan S. Katz (#64)
#66 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andres Freund (#58)
#67 Jonathan S. Katz
jkatz@postgresql.org
In reply to: Amit Langote (#66)
#68 Andres Freund
andres@anarazel.de
In reply to: Jonathan S. Katz (#67)
#69 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
#70 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#69)
#71 Corey Huinker
corey.huinker@gmail.com
In reply to: Andres Freund (#70)
#72 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Corey Huinker (#71)
#73 Andres Freund
andres@anarazel.de
In reply to: Corey Huinker (#71)
#74 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#72)
#75 Corey Huinker
corey.huinker@gmail.com
In reply to: Andres Freund (#74)