Modernizing our GUC infrastructure

Started by Tom Laneover 3 years ago18 messages

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

4 attachment(s)

Attached is a patch series that attempts to modernize our GUC
infrastructure, in particular removing the performance bottlenecks
it has when there are lots of GUC variables. I wrote this because
I am starting to question the schema-variables patch [1]/messages/by-id/CAFj8pRD053CY_N4=6SvPe7ke6xPbh=K50LUAOwjC3jm8Me9Obg@mail.gmail.com --- that's
getting to be quite a large patch and I grow less and less sure
that it's solving a problem our users want solved. I think what
people actually want is better support of the existing mechanism
for ad-hoc session variables, namely abusing custom GUCs for that
purpose. One of the big reasons we have been resistant to formally
supporting that is fear of the poor performance guc.c would have
with lots of variables. But we already have quite a lot of them:

regression=# select count(*) from pg_settings;
count
-------
353
(1 row)

and more are getting added all the time. I think this patch series
could likely be justified just in terms of positive effect on core
performance, never mind user-added GUCs.

0001 and 0002 below are concerned with converting guc.c to store its
data in a dedicated memory context (GUCMemoryContext) instead of using
raw malloc(). This is not directly a performance improvement, and
I'm prepared to drop the idea if there's a lot of pushback, but I
think it would be a good thing to do. The only hard reason for using
malloc() there was the lack of ability to avoid throwing elog(ERROR)
on out-of-memory in palloc(). But mcxt.c grew that ability years ago.
Switching to a dedicated context would greatly improve visibility and
accountability of GUC-related allocations. Also, the 0003 patch will
switch guc.c to relying on a palloc-based hashtable, and it seems a
bit silly to have part of the data structure in palloc and part in
malloc. However 0002 is a bit invasive, in that it forces code
changes in GUC check callbacks, if they want to reallocate the new
value or create an "extra" data structure. My feeling is that not
enough external modules use those facilities for this to pose a big
problem. However, the ones that are subject to it will have a
non-fun time tracking down why their code is crashing. (The recent
context-header changes mean that you get a very obscure failure when
trying to pfree() a malloc'd chunk -- for me, that typically ends
in an assertion failure in generation.c. Can we make that less
confusing?)

0003 replaces guc.c's bsearch-a-sorted-array lookup infrastructure
with a dynahash hash table. (I also looked at simplehash, but it
has no support for not elog'ing on OOM, and it increases the size
of guc.o by 10KB or so.) This fixes the worse-than-O(N^2) time
needed to create N new GUCs, as in

do $$
begin
for i in 1..10000 loop
perform set_config('foo.bar' || i::text, i::text, false);
end loop;
end $$;

On my machine, this takes about 4700 ms in HEAD, versus 23 ms
with this patch. However, the places that were linearly scanning
the array now need to use hash_seq_search, so some other things
like transaction shutdown (AtEOXact_GUC) get slower.

To address that, 0004 adds some auxiliary lists that link together
just the variables that are interesting for specific purposes.
This is helpful even without considering the possibility of a
lot of user-added GUCs: in a typical session, for example, not
many of those 353 GUCs have non-default values, and even fewer
get modified in any one transaction (typically, anyway).

As an example of the speedup from 0004, these DO loops:

create or replace function func_with_set(int) returns int
strict immutable language plpgsql as
$$ begin return $1; end $$
set enable_seqscan = false;

do $$
begin
for i in 1..100000 loop
perform func_with_set(i);
end loop;
end $$;

do $$
begin
for i in 1..100000 loop
begin
perform func_with_set(i);
exception when others then raise;
end;
end loop;
end $$;

take about 260 and 320 ms respectively for me, in HEAD with
just the stock set of variables. But after creating 10000
GUCs with the previous DO loop, they're up to about 3200 ms.
0004 brings that back down to being indistinguishable from the
speed with few GUCs.

So I think this is good cleanup in its own right, plus it
removes one major objection to considering user-defined GUCs
as a supported feature.

regards, tom lane

[1]: /messages/by-id/CAFj8pRD053CY_N4=6SvPe7ke6xPbh=K50LUAOwjC3jm8Me9Obg@mail.gmail.com

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Tom Lane (#1)

Re: Modernizing our GUC infrastructure

Hi,

I wrote this because I am starting to question the schema-variables patch
[1] --- that's getting to be quite a large patch and I grow less and less
sure that it's solving a problem our users want solved. --- that's getting
to be quite a large patch and I grow less and less sure that it's solving a
problem our users want solved. I think what people actually want is better
support of the existing mechanism for ad-hoc session variables, namely
abusing custom GUCs for that purpose.

I don't really have an opinion on the highlevel directional question, yet
anyway. But the stuff you're talking about changing in guc.c seem like a good
idea independently.

On 2022-09-05 18:27:46 -0400, Tom Lane wrote:

0001 and 0002 below are concerned with converting guc.c to store its
data in a dedicated memory context (GUCMemoryContext) instead of using
raw malloc(). This is not directly a performance improvement, and
I'm prepared to drop the idea if there's a lot of pushback, but I
think it would be a good thing to do.

+1 - I've been annoyed at this a couple times, even just because it makes it
harder to identify memory leaks etc.

The only hard reason for using
malloc() there was the lack of ability to avoid throwing elog(ERROR)
on out-of-memory in palloc(). But mcxt.c grew that ability years ago.
Switching to a dedicated context would greatly improve visibility and
accountability of GUC-related allocations. Also, the 0003 patch will
switch guc.c to relying on a palloc-based hashtable, and it seems a
bit silly to have part of the data structure in palloc and part in
malloc. However 0002 is a bit invasive, in that it forces code
changes in GUC check callbacks, if they want to reallocate the new
value or create an "extra" data structure. My feeling is that not
enough external modules use those facilities for this to pose a big
problem. However, the ones that are subject to it will have a
non-fun time tracking down why their code is crashing.

That sucks, but I think it's a bullet we're going to have to bite at some
point.

Perhaps we could do something like checking MemoryContextContains() and assert
if not allocated in the right context? That way the crash is at least
obvious. Or perhaps even transparently reallocate in that case? It does look
like MemoryContextContains() currently is broken, I've raised that in the
other thread.

(The recent context-header changes mean that you get a very obscure failure
when trying to pfree() a malloc'd chunk -- for me, that typically ends in an
assertion failure in generation.c. Can we make that less confusing?)

Hm. We can do better in assert builds, but I'm not sure we want to add the
overhead of explicit checks in normal builds, IIRC David measured the overhead
of additional branches in pfree, and it was noticable.

0003 replaces guc.c's bsearch-a-sorted-array lookup infrastructure
with a dynahash hash table. (I also looked at simplehash, but it
has no support for not elog'ing on OOM, and it increases the size
of guc.o by 10KB or so.)

Dynahash seems reasonable here. Hard to believe raw lookup speed is a relevant
bottleneck and due to the string names the key would be pretty wide (could
obviously just be done via pointer, but then the locality benefits aren't as
big).

However, the places that were linearly scanning the array now need to use
hash_seq_search, so some other things like transaction shutdown
(AtEOXact_GUC) get slower.

To address that, 0004 adds some auxiliary lists that link together
just the variables that are interesting for specific purposes.

Seems sane.

It's only half related, but since we're talking about renovating guc.c: I
think it'd be good if we split the list of GUCs from the rest of the guc
machinery. Both for humans and compilers it's getting pretty large. And
commonly one either wants to edit the definition of GUCs or wants to edit the
GUC machinery.

Greetings,

Andres Freund

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Andres Freund (#2)

Re: Modernizing our GUC infrastructure

Andres Freund <andres@anarazel.de> writes:

It's only half related, but since we're talking about renovating guc.c: I
think it'd be good if we split the list of GUCs from the rest of the guc
machinery. Both for humans and compilers it's getting pretty large. And
commonly one either wants to edit the definition of GUCs or wants to edit the
GUC machinery.

I don't mind doing that, but it seems like an independent patch.

regards, tom lane

Junwang Zhao

zhjwpku@gmail.com

over 3 years ago

In reply to: Tom Lane (#1)

Re: Modernizing our GUC infrastructure

Hi Tom,

@@ -5836,74 +5865,106 @@ build_guc_variables(void)
}

/*
- * Create table with 20% slack
+ * Create hash table with 20% slack
*/
size_vars = num_vars + num_vars / 4;

Should we change 20% to 25%, I thought that might be
a typo.

On Tue, Sep 6, 2022 at 6:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Attached is a patch series that attempts to modernize our GUC
infrastructure, in particular removing the performance bottlenecks
it has when there are lots of GUC variables. I wrote this because
I am starting to question the schema-variables patch [1] --- that's
getting to be quite a large patch and I grow less and less sure
that it's solving a problem our users want solved. I think what
people actually want is better support of the existing mechanism
for ad-hoc session variables, namely abusing custom GUCs for that
purpose. One of the big reasons we have been resistant to formally
supporting that is fear of the poor performance guc.c would have
with lots of variables. But we already have quite a lot of them:

regression=# select count(*) from pg_settings;
count
-------
353
(1 row)

and more are getting added all the time. I think this patch series
could likely be justified just in terms of positive effect on core
performance, never mind user-added GUCs.

0001 and 0002 below are concerned with converting guc.c to store its
data in a dedicated memory context (GUCMemoryContext) instead of using
raw malloc(). This is not directly a performance improvement, and
I'm prepared to drop the idea if there's a lot of pushback, but I
think it would be a good thing to do. The only hard reason for using
malloc() there was the lack of ability to avoid throwing elog(ERROR)
on out-of-memory in palloc(). But mcxt.c grew that ability years ago.
Switching to a dedicated context would greatly improve visibility and
accountability of GUC-related allocations. Also, the 0003 patch will
switch guc.c to relying on a palloc-based hashtable, and it seems a
bit silly to have part of the data structure in palloc and part in
malloc. However 0002 is a bit invasive, in that it forces code
changes in GUC check callbacks, if they want to reallocate the new
value or create an "extra" data structure. My feeling is that not
enough external modules use those facilities for this to pose a big
problem. However, the ones that are subject to it will have a
non-fun time tracking down why their code is crashing. (The recent
context-header changes mean that you get a very obscure failure when
trying to pfree() a malloc'd chunk -- for me, that typically ends
in an assertion failure in generation.c. Can we make that less
confusing?)

0003 replaces guc.c's bsearch-a-sorted-array lookup infrastructure
with a dynahash hash table. (I also looked at simplehash, but it
has no support for not elog'ing on OOM, and it increases the size
of guc.o by 10KB or so.) This fixes the worse-than-O(N^2) time
needed to create N new GUCs, as in

do $$
begin
for i in 1..10000 loop
perform set_config('foo.bar' || i::text, i::text, false);
end loop;
end $$;

On my machine, this takes about 4700 ms in HEAD, versus 23 ms
with this patch. However, the places that were linearly scanning
the array now need to use hash_seq_search, so some other things
like transaction shutdown (AtEOXact_GUC) get slower.

To address that, 0004 adds some auxiliary lists that link together
just the variables that are interesting for specific purposes.
This is helpful even without considering the possibility of a
lot of user-added GUCs: in a typical session, for example, not
many of those 353 GUCs have non-default values, and even fewer
get modified in any one transaction (typically, anyway).

As an example of the speedup from 0004, these DO loops:

create or replace function func_with_set(int) returns int
strict immutable language plpgsql as
$$ begin return $1; end $$
set enable_seqscan = false;

do $$
begin
for i in 1..100000 loop
perform func_with_set(i);
end loop;
end $$;

do $$
begin
for i in 1..100000 loop
begin
perform func_with_set(i);
exception when others then raise;
end;
end loop;
end $$;

take about 260 and 320 ms respectively for me, in HEAD with
just the stock set of variables. But after creating 10000
GUCs with the previous DO loop, they're up to about 3200 ms.
0004 brings that back down to being indistinguishable from the
speed with few GUCs.

So I think this is good cleanup in its own right, plus it
removes one major objection to considering user-defined GUCs
as a supported feature.

regards, tom lane

[1] /messages/by-id/CAFj8pRD053CY_N4=6SvPe7ke6xPbh=K50LUAOwjC3jm8Me9Obg@mail.gmail.com

--
Regards
Junwang Zhao

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Junwang Zhao (#4)

Re: Modernizing our GUC infrastructure

Junwang Zhao <zhjwpku@gmail.com> writes:

/*
- * Create table with 20% slack
+ * Create hash table with 20% slack
*/
size_vars = num_vars + num_vars / 4;

Should we change 20% to 25%, I thought that might be
a typo.

No ... 20% of the allocated space is spare.

regards, tom lane

Junwang Zhao

zhjwpku@gmail.com

over 3 years ago

In reply to: Tom Lane (#5)

Re: Modernizing our GUC infrastructure

ah, yes, that makes sense ;)

On Tue, Sep 6, 2022 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Junwang Zhao <zhjwpku@gmail.com> writes:

/*
- * Create table with 20% slack
+ * Create hash table with 20% slack
*/
size_vars = num_vars + num_vars / 4;

Should we change 20% to 25%, I thought that might be
a typo.

No ... 20% of the allocated space is spare.

regards, tom lane

--
Regards
Junwang Zhao

Pavel Stehule

pavel.stehule@gmail.com

over 3 years ago

In reply to: Tom Lane (#1)

Re: Modernizing our GUC infrastructure

út 6. 9. 2022 v 0:28 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Attached is a patch series that attempts to modernize our GUC
infrastructure, in particular removing the performance bottlenecks
it has when there are lots of GUC variables. I wrote this because
I am starting to question the schema-variables patch [1] --- that's
getting to be quite a large patch and I grow less and less sure
that it's solving a problem our users want solved. I think what
people actually want is better support of the existing mechanism
for ad-hoc session variables, namely abusing custom GUCs for that
purpose. One of the big reasons we have been resistant to formally
supporting that is fear of the poor performance guc.c would have
with lots of variables. But we already have quite a lot of them:

The bad performance is not the main reason for implementing session
variables (and in almost all cases the performance of GUC is not a problem,
because it is not a bottleneck, and in some terrible cases, I can save the
GUC to a variable). There are other differences:

1. Session variables can be persistent - so the usage of session variables
can be checked by static analyze like plpgsql_check

2. Session variables supports not atomic data types - so the work with row
types or arrays is much more comfortable and faster, because there is no
conversion string <-> binary

3. Session variables allows to set access rights

4. Session variables are nullable and allowed to specify default values.

I don't think so users have ten thousand GUC and the huge count of GUC is
the main performance problem. The source of the performance problem is
storing the value only as string.

Regards

Pavel

Pavel Stehule

pavel.stehule@gmail.com

over 3 years ago

In reply to: Pavel Stehule (#7)

Re: Modernizing our GUC infrastructure

út 6. 9. 2022 v 6:32 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

Hi

út 6. 9. 2022 v 0:28 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Attached is a patch series that attempts to modernize our GUC
infrastructure, in particular removing the performance bottlenecks
it has when there are lots of GUC variables. I wrote this because
I am starting to question the schema-variables patch [1] --- that's
getting to be quite a large patch and I grow less and less sure
that it's solving a problem our users want solved. I think what
people actually want is better support of the existing mechanism
for ad-hoc session variables, namely abusing custom GUCs for that
purpose. One of the big reasons we have been resistant to formally
supporting that is fear of the poor performance guc.c would have
with lots of variables. But we already have quite a lot of them:

The bad performance is not the main reason for implementing session
variables (and in almost all cases the performance of GUC is not a problem,
because it is not a bottleneck, and in some terrible cases, I can save the
GUC to a variable). There are other differences:

1. Session variables can be persistent - so the usage of session variables
can be checked by static analyze like plpgsql_check

more precious - metadata of session variables are persistent

Show quoted text

2. Session variables supports not atomic data types - so the work with row
types or arrays is much more comfortable and faster, because there is no
conversion string <-> binary

3. Session variables allows to set access rights

4. Session variables are nullable and allowed to specify default values.

I don't think so users have ten thousand GUC and the huge count of GUC is
the main performance problem. The source of the performance problem is
storing the value only as string.

Regards

Pavel

Julien Rouhaud

rjuju123@gmail.com

over 3 years ago

In reply to: Pavel Stehule (#7)

Re: Modernizing our GUC infrastructure

Hi,

On Tue, Sep 06, 2022 at 06:32:21AM +0200, Pavel Stehule wrote:

Hi

Ãºt 6. 9. 2022 v 0:28 odesÃlatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Attached is a patch series that attempts to modernize our GUC
infrastructure, in particular removing the performance bottlenecks
it has when there are lots of GUC variables. I wrote this because
I am starting to question the schema-variables patch [1] --- that's
getting to be quite a large patch and I grow less and less sure
that it's solving a problem our users want solved. I think what
people actually want is better support of the existing mechanism
for ad-hoc session variables, namely abusing custom GUCs for that
purpose. One of the big reasons we have been resistant to formally
supporting that is fear of the poor performance guc.c would have
with lots of variables. But we already have quite a lot of them:

The bad performance is not the main reason for implementing session
variables (and in almost all cases the performance of GUC is not a problem,
because it is not a bottleneck, and in some terrible cases, I can save the
GUC to a variable). There are other differences:

1. Session variables metadata can be persistent - so the usage of session
variables can be checked by static analyze like plpgsql_check

2. Session variables supports not atomic data types - so the work with row
types or arrays is much more comfortable and faster, because there is no
conversion string <-> binary

3. Session variables allows to set access rights

4. Session variables are nullable and allowed to specify default values.

I don't think so users have ten thousand GUC and the huge count of GUC is
the main performance problem. The source of the performance problem is
storing the value only as string.

I think we can also mention those differences with the proposed schema
variables:

- schema variables have normal SQL integration, having to use current_setting()
isn't ideal (on top of only supporting text) and doesn't really play nice
with pg_stat_statements for instance

- schema variables implement stability in a single SQL statement (not in
plpgsql), while current_setting always report the latest set value. This one
may or may not be wanted, and maybe the discrepancy with procedural languages
would be too problematic, but it's still something proposed

#10

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Pavel Stehule (#7)

Re: Modernizing our GUC infrastructure

Pavel Stehule <pavel.stehule@gmail.com> writes:

The bad performance is not the main reason for implementing session
variables (and in almost all cases the performance of GUC is not a problem,
because it is not a bottleneck, and in some terrible cases, I can save the
GUC to a variable). There are other differences:

Well, yeah, the schema-variables patch offers a bunch of other features.
What I'm not sure about is whether there's actually much field demand
for those. I think if we fix guc.c's performance issues and add some
simple features on top of that, like the ability to declare bool, int,
float data types not just string for a user-defined GUC, we'd have
exactly what a lot of people want --- not least because it'd be
upwards-compatible with what they are already doing.

However, that's probably a debate to have on the other thread not here.
This patch doesn't foreclose pushing forward with the schema-variables
patch, if people want that.

regards, tom lane

#11

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Pavel Stehule (#8)

Re: Modernizing our GUC infrastructure

Pavel Stehule <pavel.stehule@gmail.com> writes:

Ãºt 6. 9. 2022 v 6:32 odesÃlatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

1. Session variables can be persistent - so the usage of session variables
can be checked by static analyze like plpgsql_check

more precious - metadata of session variables are persistent

Right ... so the question is, is that a feature or a bug?

I think there's a good analogy here to temporary tables. The SQL
spec says that temp-table schemas are persistent and database-wide,
but what we actually have is that they are session-local. People
occasionally propose that we implement the SQL semantics for that,
but in the last twenty-plus years no one has bothered to write a
committable patch to support it ... much less remove the existing
behavior in favor of that, which I'm pretty sure no one would think
is a good idea.

So, is it actually a good idea to have persistent metadata for
session variables? I'd say that the issue is at best debatable,
and at worst proven wrong by a couple of decades of experience.
In what way are session variables less mutable than temp tables?

Still, this discussion would be better placed on the other thread.

regards, tom lane

#12

Pavel Stehule

pavel.stehule@gmail.com

over 3 years ago

In reply to: Tom Lane (#11)

Re: Modernizing our GUC infrastructure

út 6. 9. 2022 v 7:42 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

út 6. 9. 2022 v 6:32 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

1. Session variables can be persistent - so the usage of session

variables

can be checked by static analyze like plpgsql_check

more precious - metadata of session variables are persistent

Right ... so the question is, is that a feature or a bug?

I think there's a good analogy here to temporary tables. The SQL
spec says that temp-table schemas are persistent and database-wide,
but what we actually have is that they are session-local. People
occasionally propose that we implement the SQL semantics for that,
but in the last twenty-plus years no one has bothered to write a
committable patch to support it ... much less remove the existing
behavior in favor of that, which I'm pretty sure no one would think
is a good idea.

So, is it actually a good idea to have persistent metadata for
session variables? I'd say that the issue is at best debatable,
and at worst proven wrong by a couple of decades of experience.
In what way are session variables less mutable than temp tables?

The access pattern is very different. The session variable is like the temp
table with exactly one row. It reduces a lot of overheads with storage (for
reading, for writing).

For example, the minimum size of an empty temp table is 8KB. You can store
all "like" session values to one temp table, but then there will be brutal
overhead with reading.

Still, this discussion would be better placed on the other thread.

sure - faster GUC is great - there are a lot of applications that overuse
GUC, because there are no other solutions now. But I don't think so it is
good solution when somebody need some like global variables in procedural
code. And the design of session variables is more wide.

Regards

Pavel

Show quoted text

regards, tom lane

#13

Robert Haas

robertmhaas@gmail.com

over 3 years ago

In reply to: Tom Lane (#11)

Re: Modernizing our GUC infrastructure

On Tue, Sep 6, 2022 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think there's a good analogy here to temporary tables. The SQL
spec says that temp-table schemas are persistent and database-wide,
but what we actually have is that they are session-local. People
occasionally propose that we implement the SQL semantics for that,
but in the last twenty-plus years no one has bothered to write a
committable patch to support it ... much less remove the existing
behavior in favor of that, which I'm pretty sure no one would think
is a good idea.

Well, I've thought about doing this a few times, but it's a real pain
in the neck, primarily because we store metadata that needs to be
per-instantiation in the catalog rows: relfrozenxid, relminmxid, and
the relation statistics. So I'm not sure "no one has bothered" is
quite the right way to characterize it. "no one has been able to
adequately untangle the mess" might be more accurate.

So, is it actually a good idea to have persistent metadata for
session variables? I'd say that the issue is at best debatable,
and at worst proven wrong by a couple of decades of experience.
In what way are session variables less mutable than temp tables?

I haven't looked at that patch at all, but I would assume that
variables would have SQL types, and that we would never add GUCs with
SQL types, which seems like a pretty major semantic difference.

--
Robert Haas
EDB: http://www.enterprisedb.com

#14

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Robert Haas (#13)

Re: Modernizing our GUC infrastructure

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Sep 6, 2022 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think there's a good analogy here to temporary tables. The SQL
spec says that temp-table schemas are persistent and database-wide,
but what we actually have is that they are session-local.

Well, I've thought about doing this a few times, but it's a real pain
in the neck, primarily because we store metadata that needs to be
per-instantiation in the catalog rows: relfrozenxid, relminmxid, and
the relation statistics. So I'm not sure "no one has bothered" is
quite the right way to characterize it. "no one has been able to
adequately untangle the mess" might be more accurate.

I could agree on "no one has thought it was worth the work". It could
be made to happen if we were sufficiently motivated, but we aren't.
I believe a big chunk of the reason is that the SQL semantics are not
obviously better than what we have. And some of the advantages they
do have, like less catalog thrashing, wouldn't apply in the session
variable case.

I haven't looked at that patch at all, but I would assume that
variables would have SQL types, and that we would never add GUCs with
SQL types, which seems like a pretty major semantic difference.

Yeah, I do not think we'd want to extend GUCs beyond the existing
bool/int/float/string cases, since they have to be readable under
non-transactional circumstances. Having said that, that covers
an awful lot of practical territory. Schema variables of
arbitrary SQL types sound cool, sure, but how many real use cases
are there that can't be met with the GUC types?

I think a large part of the reason the schema-variables patch
has gone sideways for so many years is that it's an ambitious
overdesign.

regards, tom lane

#15

Pavel Stehule

pavel.stehule@gmail.com

over 3 years ago

In reply to: Tom Lane (#14)

Re: Modernizing our GUC infrastructure

I think a large part of the reason the schema-variables patch
has gone sideways for so many years is that it's an ambitious
overdesign.

Last two weeks this patch is shorter and shorter. I removed a large part
related to check of type consistency, because I can do this check more
easily - and other work is done by dependencies.

Big thanks to Julien - it does a lot of work and he shows me a lot of
issues and possibilities on how to fix it. With Julien work this patch
moved forward. Years before it was just a prototype.

This patch is not too complex - important part is session_variable.c with
1500 lines , and it is almost simple code - store value to hashtab, and
cleaning hash tab on sinval or on transaction end or abort + debug routine.

[pavel@localhost commands]$ cloc session_variable.c
1 text file.
1 unique file.
0 files ignored.

github.com/AlDanial/cloc v 1.90 T=0.02 s (50.0 files/s, 77011.1 lines/s)
-------------------------------------------------------------------------------
Language files blank comment
code
-------------------------------------------------------------------------------
C 1 257 463
820
-------------------------------------------------------------------------------

In other files there are +/- mechanical code

Show quoted text

regards, tom lane

#16

Robert Haas

robertmhaas@gmail.com

over 3 years ago

In reply to: Tom Lane (#14)

Re: Modernizing our GUC infrastructure

On Tue, Sep 6, 2022 at 10:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I haven't looked at that patch at all, but I would assume that
variables would have SQL types, and that we would never add GUCs with
SQL types, which seems like a pretty major semantic difference.

Yeah, I do not think we'd want to extend GUCs beyond the existing
bool/int/float/string cases, since they have to be readable under
non-transactional circumstances. Having said that, that covers
an awful lot of practical territory. Schema variables of
arbitrary SQL types sound cool, sure, but how many real use cases
are there that can't be met with the GUC types?

Well, if you use an undefined custom GUC, you're just going to get a
string data type, I believe, which is pretty well equivalent to not
having any type checking at all. You could extend that in some way to
allow users to create dummy GUCs of any type supported by the
mechanism, but I think that's mostly stacking one hack on top of
another. I believe there's good evidence that users want variables
based on SQL data types, whereas I can't see any reason why users
would variables based on GUC data types. It is of course true that the
GUC data types cover the cases people are mostly likely to want, but
that's just because it covers the most generally useful data types. If
you can want to pass an integer between one part of your application
and another, why can't you want to pass a numeric or a bytea? I think
you can, and I think people do.

This is not really an endorsement of the SQL variables patch, which I
haven't studied and which for all I know may have lots of problems,
either as to design or as to implementation. But I think it's a little
crazy to pretend that the ability to store strings - or even values of
any GUC type - into a fictional GUC is an adequate substitute for SQL
variables. Honestly, the fact that you can do that in the first place
seems more like an undesirable wart necessitated by the way loadable
modules interact with the GUC system than a feature -- but even if it
were a feature, it's not the same feature.

--
Robert Haas
EDB: http://www.enterprisedb.com

#17

Tom Lane

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Tom Lane (#1)

4 attachment(s)

Re: Modernizing our GUC infrastructure

I wrote:

Attached is a patch series that attempts to modernize our GUC
infrastructure, in particular removing the performance bottlenecks
it has when there are lots of GUC variables.

Rebased over 0a20ff54f.

regards, tom lane

#18

Tom Lane

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Tom Lane (#17)

4 attachment(s)

Re: Modernizing our GUC infrastructure

I wrote:

Attached is a patch series that attempts to modernize our GUC
infrastructure, in particular removing the performance bottlenecks
it has when there are lots of GUC variables.

Rebased over 0a20ff54f.

Here's a v3 rebased up to HEAD. The only real change is that I added
a couple of "Assert(GetMemoryChunkContext(ptr) == GUCMemoryContext)"
checks in hopes of improving detection of not-updated code that is
still using malloc/free where it should be using guc_malloc/guc_free.
This is per the nearby discussion of whether the mcxt.c infrastructure
could recognize that [1]/messages/by-id/2910981.1665080361@sss.pgh.pa.us. I experimented a bit with leaving out parts
of the 0002 patch to simulate such mistakes, and at least on a Linux
box that seems to produce fairly intelligible errors now. In the case
of free'ing a palloc'd pointer, what you get is a message from glibc
followed by abort(), so their error detection is pretty solid too.

I'm feeling pretty good about this patchset now. Does anyone want
to review it further?

regards, tom lane

[1]: /messages/by-id/2910981.1665080361@sss.pgh.pa.us

Participants

Thread Outline

Modernizing our GUC infrastructure

Attachments:

Attachments:

Attachments: