plpgsql versus domains

Started by Tom Laneover 11 years ago16 messageshackers

tgl@sss.pgh.pa.us

over 11 years ago

At the behest of Salesforce, I've been looking into improving plpgsql's
handling of variables of domain types, which is currently pretty awful.
It's really slow, because any assignment to a domain variable goes through
the slow double-I/O-conversion path in exec_cast_value, even if no actual
work is needed. And I noticed that the domain's constraints get looked up
only once per function per session; for example this script gets
unexpected results:

create domain di as int;

create function foo1(int) returns di as $$
declare d di;
begin
d := $1;
return d;
end
$$ language plpgsql immutable;

select foo1(0); -- call once to set up cache

alter domain di add constraint pos check (value > 0);

select 0::di; -- fails, as expected

select foo1(0); -- should fail, does not

\c -

select foo1(0); -- now it fails

The reason for this is that domain_in looks up the domain's constraints
and caches them under the calling FmgrInfo struct. That behavior was
designed to ensure we'd do just one constraint-lookup per query, which
I think is reasonable. But plpgsql sets up an FmgrInfo in the variable's
PLpgSQL_type record, which persists for the whole session unless the
function's parse tree is flushed for some reason. So the constraints
only get looked up once.

The rough sketch I have in mind for fixing this is:

1. Arrange to cache the constraints for domain types in typcache.c,
so as to reduce the number of trips to the actual catalogs. I think
typcache.c will have to flush these cache items upon seeing any sinval
change in pg_constraint, but that's still cheaper than searching
pg_constraint for every query.

2. In plpgsql, explicitly model a domain type as being its base type
plus some constraints --- that is, stop depending on domain_in() to
enforce the constraints, and do it ourselves. This would mean that
the FmgrInfo in a PLpgSQL_type struct would reference the base type's
input function rather than domain_in, so we'd not have to be afraid
to cache that. The constraints would be represented as a separate list,
which we'd arrange to fetch from the typcache once per transaction.
(I think checking for new constraints once per transaction is sufficient
for reasonable situations, though I suppose that could be argued about.)
The advantage of this approach is first that we could avoid an I/O
conversion when the incoming value to be assigned matches the domain's
base type, which is the typical case; and second that a domain without
any CHECK constraints would become essentially free, which is also a
common case.

3. We could still have domains.c responsible for most of the details;
the domain_check() API may be good enough as-is, though it seems to lack
any simple way to force re-lookup of the domain constraints once per xact.

Thoughts, better ideas?

Given the lack of field complaints, I don't feel a need to try to
back-patch a fix for this, but I do want to see it fixed for 9.5.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Pavel Stehule

pavel.stehule@gmail.com

over 11 years ago

In reply to: Tom Lane (#1)

Re: plpgsql versus domains

2015-02-26 18:31 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:

At the behest of Salesforce, I've been looking into improving plpgsql's
handling of variables of domain types, which is currently pretty awful.
It's really slow, because any assignment to a domain variable goes through
the slow double-I/O-conversion path in exec_cast_value, even if no actual
work is needed. And I noticed that the domain's constraints get looked up
only once per function per session; for example this script gets
unexpected results:

create domain di as int;

create function foo1(int) returns di as $$
declare d di;
begin
d := $1;
return d;
end
$$ language plpgsql immutable;

select foo1(0); -- call once to set up cache

alter domain di add constraint pos check (value > 0);

select 0::di; -- fails, as expected

select foo1(0); -- should fail, does not

\c -

select foo1(0); -- now it fails

The reason for this is that domain_in looks up the domain's constraints
and caches them under the calling FmgrInfo struct. That behavior was
designed to ensure we'd do just one constraint-lookup per query, which
I think is reasonable. But plpgsql sets up an FmgrInfo in the variable's
PLpgSQL_type record, which persists for the whole session unless the
function's parse tree is flushed for some reason. So the constraints
only get looked up once.

The rough sketch I have in mind for fixing this is:

1. Arrange to cache the constraints for domain types in typcache.c,
so as to reduce the number of trips to the actual catalogs. I think
typcache.c will have to flush these cache items upon seeing any sinval
change in pg_constraint, but that's still cheaper than searching
pg_constraint for every query.

2. In plpgsql, explicitly model a domain type as being its base type
plus some constraints --- that is, stop depending on domain_in() to
enforce the constraints, and do it ourselves. This would mean that
the FmgrInfo in a PLpgSQL_type struct would reference the base type's
input function rather than domain_in, so we'd not have to be afraid
to cache that. The constraints would be represented as a separate list,
which we'd arrange to fetch from the typcache once per transaction.
(I think checking for new constraints once per transaction is sufficient
for reasonable situations, though I suppose that could be argued about.)
The advantage of this approach is first that we could avoid an I/O
conversion when the incoming value to be assigned matches the domain's
base type, which is the typical case; and second that a domain without
any CHECK constraints would become essentially free, which is also a
common case.

I like this variant

There can be some good optimization with scalar types: text, varchars,
numbers and reduce IO cast.

3. We could still have domains.c responsible for most of the details;
the domain_check() API may be good enough as-is, though it seems to lack
any simple way to force re-lookup of the domain constraints once per xact.

Thoughts, better ideas?

Given the lack of field complaints, I don't feel a need to try to
back-patch a fix for this, but I do want to see it fixed for 9.5.

Regards

Pavel

Show quoted text

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andres Freund

andres@anarazel.de

over 11 years ago

In reply to: Tom Lane (#1)

Re: plpgsql versus domains

Hi,

On 2015-02-26 12:31:46 -0500, Tom Lane wrote:

It's really slow, because any assignment to a domain variable goes through
the slow double-I/O-conversion path in exec_cast_value, even if no actual
work is needed.

Hm, that's surely not nice for some types. Doing syscache lookups every
time can't help either.

And I noticed that the domain's constraints get looked up
only once per function per session; for example this script gets
unexpected results:
The reason for this is that domain_in looks up the domain's constraints
and caches them under the calling FmgrInfo struct.

That's probably far from the only such case we have :(

1. Arrange to cache the constraints for domain types in typcache.c,
so as to reduce the number of trips to the actual catalogs. I think
typcache.c will have to flush these cache items upon seeing any sinval
change in pg_constraint, but that's still cheaper than searching
pg_constraint for every query.

That sounds sane.

2. In plpgsql, explicitly model a domain type as being its base type
plus some constraints --- that is, stop depending on domain_in() to
enforce the constraints, and do it ourselves. This would mean that
the FmgrInfo in a PLpgSQL_type struct would reference the base type's
input function rather than domain_in, so we'd not have to be afraid
to cache that. The constraints would be represented as a separate list,
which we'd arrange to fetch from the typcache once per transaction.

Hm. A bit sad to open code that in plpgsql instead of making it
available more generally.

3. We could still have domains.c responsible for most of the details;
the domain_check() API may be good enough as-is, though it seems to lack
any simple way to force re-lookup of the domain constraints once per xact.

Thoughts, better ideas?

You're probably going to kill me because of the increased complexity,
but how about making typecache.c smarter, more in the vein of
relcache.c, and store the relevant info in there? And then, to avoid
repeated lookups, store a reference to that in fcinfo? The lifetime of
objects wouldn't be entirely trivial, but it doesn't sound impossible.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Andres Freund (#3)

Re: plpgsql versus domains

Andres Freund <andres@2ndquadrant.com> writes:

On 2015-02-26 12:31:46 -0500, Tom Lane wrote:

2. In plpgsql, explicitly model a domain type as being its base type
plus some constraints --- that is, stop depending on domain_in() to
enforce the constraints, and do it ourselves. This would mean that
the FmgrInfo in a PLpgSQL_type struct would reference the base type's
input function rather than domain_in, so we'd not have to be afraid
to cache that. The constraints would be represented as a separate list,
which we'd arrange to fetch from the typcache once per transaction.

Hm. A bit sad to open code that in plpgsql instead of making it
available more generally.

The actual checking wouldn't be open-coded, it would come from
domain_check() (or possibly a modified version of that). It is true
that plpgsql would know more about domains than it does today, but
I'm not sure I see another use-case for this type of logic.

To some extent this is a workaround for the fact that plpgsql does type
conversions the way it does (ie, by I/O rather than by using the parser's
cast mechanisms). We've talked about changing that, but people seem to
be afraid of the compatibility consequences, and I'm not planning to fight
two compatibility battles concurrently ;-)

You're probably going to kill me because of the increased complexity,
but how about making typecache.c smarter, more in the vein of
relcache.c, and store the relevant info in there?

I thought that's what I was proposing in point #1.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andres Freund

andres@anarazel.de

over 11 years ago

In reply to: Tom Lane (#4)

Re: plpgsql versus domains

On 2015-02-26 13:53:18 -0500, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

Hm. A bit sad to open code that in plpgsql instead of making it
available more generally.

The actual checking wouldn't be open-coded, it would come from
domain_check() (or possibly a modified version of that). It is true
that plpgsql would know more about domains than it does today, but

It'd still teach plpgsql more about types than it knows right now. But
on a second thought that ship has sailed long ago - and the amount of
knowledge seems relatively small. There's much more stuff about
composites there already.

and I'm not planning to fight two compatibility battles concurrently ;-)

Heh ;)

You're probably going to kill me because of the increased complexity,
but how about making typecache.c smarter, more in the vein of
relcache.c, and store the relevant info in there?

I thought that's what I was proposing in point #1.

Sure, but my second point was to avoid duplicating that information into
->fcinfo and such and instead reference the typecache entry from
there. So that if the typecache entry is being rebuilt (a new mechanism
I'm afraid), whatever uses ->fcinfo gets the updated
functions/definition.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Pavel Stehule

pavel.stehule@gmail.com

over 11 years ago

In reply to: Tom Lane (#4)

Re: plpgsql versus domains

2015-02-26 19:53 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:

Andres Freund <andres@2ndquadrant.com> writes:

On 2015-02-26 12:31:46 -0500, Tom Lane wrote:

2. In plpgsql, explicitly model a domain type as being its base type
plus some constraints --- that is, stop depending on domain_in() to
enforce the constraints, and do it ourselves. This would mean that
the FmgrInfo in a PLpgSQL_type struct would reference the base type's
input function rather than domain_in, so we'd not have to be afraid
to cache that. The constraints would be represented as a separate list,
which we'd arrange to fetch from the typcache once per transaction.

Hm. A bit sad to open code that in plpgsql instead of making it
available more generally.

The actual checking wouldn't be open-coded, it would come from
domain_check() (or possibly a modified version of that). It is true
that plpgsql would know more about domains than it does today, but
I'm not sure I see another use-case for this type of logic.

To some extent this is a workaround for the fact that plpgsql does type
conversions the way it does (ie, by I/O rather than by using the parser's
cast mechanisms). We've talked about changing that, but people seem to
be afraid of the compatibility consequences, and I'm not planning to fight
two compatibility battles concurrently ;-)

I understand, but in this case, a compatibility can be enforced simply - we
can support "a only know cast" mode and "IO cast" mode.

IO cast is simple for lot of people, but there is a lot of performance
issues and few errors related to this topic. But this is offtopic now.

But, what can be related - for plpgsql_check is necessary some simple check
of a assigning - that should to return error or warning

This part of plpgsql_check is too complex now.

Regards

Pavel

Show quoted text

You're probably going to kill me because of the increased complexity,
but how about making typecache.c smarter, more in the vein of
relcache.c, and store the relevant info in there?

I thought that's what I was proposing in point #1.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Andres Freund (#5)

Re: plpgsql versus domains

Andres Freund <andres@2ndquadrant.com> writes:

On 2015-02-26 13:53:18 -0500, Tom Lane wrote:

I thought that's what I was proposing in point #1.

Sure, but my second point was to avoid duplicating that information into
->fcinfo and such and instead reference the typecache entry from
there. So that if the typecache entry is being rebuilt (a new mechanism
I'm afraid), whatever uses ->fcinfo gets the updated
functions/definition.

The trick there would be that if we don't want to copy then we'd need a
reference counting mechanism, and FmgrInfos aren't used in a way that
would allow that to work easily. (Whatever the typcache is holding onto
clearly must be long-lived, but you do want an obsoleted one to go away
once there are no more FmgrInfos referencing it.)

I was just thinking though that we could possibly solve that if we went
ahead and invented the memory context reset callback mechanism that
I suggested in another thread a week or two back. Then you could imagine
that when a domain-checking function caches a reference to a "domain
constraints info" object in its FmgrInfo, it could increment a refcount
on the info object, and register a callback on the context containing the
FmgrInfo to release that refcount.

A different approach would be to try to use the ResourceOwner mechanism
to track these info objects. But I think that does not work nicely for
plpgsql; at least not unless it creates a ResourceOwner for every
function, which seems kinda messy.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

over 11 years ago

In reply to: Tom Lane (#4)

Re: plpgsql versus domains

On Thu, Feb 26, 2015 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

To some extent this is a workaround for the fact that plpgsql does type
conversions the way it does (ie, by I/O rather than by using the parser's
cast mechanisms). We've talked about changing that, but people seem to
be afraid of the compatibility consequences, and I'm not planning to fight
two compatibility battles concurrently ;-)

A sensible plan, but since we're here:

- I do agree that changing the way PL/pgsql does those type
conversions is scary. I can't predict what will break, and there's no
getting around the scariness of that.

- On the other hand, I do think it would be good to change it, because
this sucks:

rhaas=# do $$ declare x int; begin x := (3.0::numeric)/(1.0::numeric); end $$;
ERROR: invalid input syntax for integer: "3.0000000000000000"
CONTEXT: PL/pgSQL function inline_code_block line 1 at assignment

I didn't realize that this issue had even been discussed before...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Robert Haas (#8)

Re: plpgsql versus domains

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Feb 26, 2015 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

To some extent this is a workaround for the fact that plpgsql does type
conversions the way it does (ie, by I/O rather than by using the parser's
cast mechanisms). We've talked about changing that, but people seem to
be afraid of the compatibility consequences, and I'm not planning to fight
two compatibility battles concurrently ;-)

A sensible plan, but since we're here:

- I do agree that changing the way PL/pgsql does those type
conversions is scary. I can't predict what will break, and there's no
getting around the scariness of that.

- On the other hand, I do think it would be good to change it, because
this sucks:

rhaas=# do $$ declare x int; begin x := (3.0::numeric)/(1.0::numeric); end $$;
ERROR: invalid input syntax for integer: "3.0000000000000000"
CONTEXT: PL/pgSQL function inline_code_block line 1 at assignment

If we did want to open that can of worms, my proposal would be to make
plpgsql type conversions work like so:

* If there is a cast pathway known to the core SQL parser, use that
mechanism.

* Otherwise, attempt to convert via I/O as we do today.

This seems to minimize the risk of breaking things, although there would
probably be corner cases that work differently (for instance I bet boolean
to text might turn out differently). In the very long run we could perhaps
deprecate and remove the second phase.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Pavel Stehule

pavel.stehule@gmail.com

over 11 years ago

In reply to: Tom Lane (#9)

Re: plpgsql versus domains

2015-02-27 21:57 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Feb 26, 2015 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

To some extent this is a workaround for the fact that plpgsql does type
conversions the way it does (ie, by I/O rather than by using the

parser's

cast mechanisms). We've talked about changing that, but people seem to
be afraid of the compatibility consequences, and I'm not planning to

fight

two compatibility battles concurrently ;-)

A sensible plan, but since we're here:

- I do agree that changing the way PL/pgsql does those type
conversions is scary. I can't predict what will break, and there's no
getting around the scariness of that.

- On the other hand, I do think it would be good to change it, because
this sucks:

rhaas=# do $$ declare x int; begin x := (3.0::numeric)/(1.0::numeric);

end $$;

ERROR: invalid input syntax for integer: "3.0000000000000000"
CONTEXT: PL/pgSQL function inline_code_block line 1 at assignment

If we did want to open that can of worms, my proposal would be to make
plpgsql type conversions work like so:

* If there is a cast pathway known to the core SQL parser, use that
mechanism.

* Otherwise, attempt to convert via I/O as we do today.

This seems to minimize the risk of breaking things, although there would
probably be corner cases that work differently (for instance I bet boolean
to text might turn out differently). In the very long run we could perhaps
deprecate and remove the second phase.

There can be similar solution like plpgsql/sql identifiers priority
configuration. Some levels - and what you are proposing should be default.

It works perfectly - and from my view and what I know from my neighborhood,
there are no issues.

Show quoted text

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Andres Freund (#5)

Re: plpgsql versus domains

Andres Freund <andres@2ndquadrant.com> writes:

On 2015-02-26 13:53:18 -0500, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

You're probably going to kill me because of the increased complexity,
but how about making typecache.c smarter, more in the vein of
relcache.c, and store the relevant info in there?

I thought that's what I was proposing in point #1.

Sure, but my second point was to avoid duplicating that information into
->fcinfo and such and instead reference the typecache entry from
there. So that if the typecache entry is being rebuilt (a new mechanism
I'm afraid), whatever uses ->fcinfo gets the updated
functions/definition.

Here's a draft patch for that. This fixes the delayed-domain-constraint
problem pretty thoroughly, in that any attempt to check a domain's
constraints will use fully up-to-date info.

This is the first attempt at weaponizing the memory context reset/delete
feature, and I'm fairly happy with it, except for one thing: I had to
#include utils/memnodes.h into typcache.h in order to preserve the
intended property that the callback structs could be included directly
into structs using them. Now, that's not awful in itself, because
typcache.h isn't used everywhere; but if this feature gets popular we'll
find memnodes.h being included pretty much everywhere, which is not so
great. I'm thinking about moving struct MemoryContextCallback and the
extern for MemoryContextRegisterResetCallback into palloc.h to avoid this.
Maybe that's too far in the other direction. Thoughts? Compromise ideas?

Also, instrumenting the code showed that TypeCacheConstrCallback gets
called quite a lot during the standard regression tests, which is why
I went out of my way to make it quick. Almost all of those cache flushes
are from non-domain-related updates on pg_type or pg_constraint, so
they're not really necessary from a logical perspective, and they're
surely going to hurt performance for heavy users of domains. I think to
fix this we'd have to split pg_constraint into two catalogs, one for table
constraints and one for domain constraints; which would be a good thing
anyway from a normal-form-theory perspective. And we'd have to get rid of
pg_type.typnotnull and instead store domain NOT NULL constraints in this
hypothetical domain constraint catalog. I don't plan to do anything about
that myself right now, because I'm not sure that production databases
would have the kind of traffic on pg_type and pg_constraint that the
regression tests exhibit. But at some point we might have to fix it.

regards, tom lane

#12

Andrew Gierth

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Tom Lane (#11)

Re: plpgsql versus domains

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> This is the first attempt at weaponizing the memory context
Tom> reset/delete feature, and I'm fairly happy with it, except for one
Tom> thing: I had to #include utils/memnodes.h into typcache.h in order
Tom> to preserve the intended property that the callback structs could
Tom> be included directly into structs using them. Now, that's not
Tom> awful in itself, because typcache.h isn't used everywhere; but if
Tom> this feature gets popular we'll find memnodes.h being included
Tom> pretty much everywhere, which is not so great. I'm thinking about
Tom> moving struct MemoryContextCallback and the extern for
Tom> MemoryContextRegisterResetCallback into palloc.h to avoid this.
Tom> Maybe that's too far in the other direction. Thoughts?
Tom> Compromise ideas?

This was pretty much my first thought on looking at the callback
patch. It may seem logical to put the reset callback stuff in
memutils/memnodes alongside context creation and reset and so on, but in
fact the places that are going to want to use callbacks are primarily
_not_ the places that are doing their own context management - since
those could do their own cleanup - but rather places that are allocating
things in contexts supplied by others. So palloc.h is the place for it.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Andrew Gierth (#12)

Re: plpgsql versus domains

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> This is the first attempt at weaponizing the memory context
Tom> reset/delete feature, and I'm fairly happy with it, except for one
Tom> thing: I had to #include utils/memnodes.h into typcache.h in order
Tom> to preserve the intended property that the callback structs could
Tom> be included directly into structs using them. Now, that's not
Tom> awful in itself, because typcache.h isn't used everywhere; but if
Tom> this feature gets popular we'll find memnodes.h being included
Tom> pretty much everywhere, which is not so great. I'm thinking about
Tom> moving struct MemoryContextCallback and the extern for
Tom> MemoryContextRegisterResetCallback into palloc.h to avoid this.
Tom> Maybe that's too far in the other direction. Thoughts?
Tom> Compromise ideas?

This was pretty much my first thought on looking at the callback
patch. It may seem logical to put the reset callback stuff in
memutils/memnodes alongside context creation and reset and so on, but in
fact the places that are going to want to use callbacks are primarily
_not_ the places that are doing their own context management - since
those could do their own cleanup - but rather places that are allocating
things in contexts supplied by others. So palloc.h is the place for it.

That argument sounds good to me ;-). Moved.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Jim Nasby

Jim.Nasby@BlueTreble.com

about 11 years ago

In reply to: Tom Lane (#11)

Re: plpgsql versus domains

On 2/28/15 11:26 PM, Tom Lane wrote:

Also, instrumenting the code showed that TypeCacheConstrCallback gets
called quite a lot during the standard regression tests, which is why
I went out of my way to make it quick. Almost all of those cache flushes
are from non-domain-related updates on pg_type or pg_constraint, so
they're not really necessary from a logical perspective, and they're
surely going to hurt performance for heavy users of domains. I think to
fix this we'd have to split pg_constraint into two catalogs, one for table
constraints and one for domain constraints; which would be a good thing
anyway from a normal-form-theory perspective. And we'd have to get rid of
pg_type.typnotnull and instead store domain NOT NULL constraints in this
hypothetical domain constraint catalog. I don't plan to do anything about
that myself right now, because I'm not sure that production databases
would have the kind of traffic on pg_type and pg_constraint that the
regression tests exhibit. But at some point we might have to fix it.

FWIW, my experience running a low-downtime website and supporting DDL
during normal operations (ie: no maintenance windows) is that by far the
biggest concern is acquiring locks. Once you have the locks, taking an
extra second for the actual DDL isn't that big a deal (and I suspect
you'd need to do a LOT of DDL to add up to that).

Likewise, after piling up waiting for a DDL lock to release, I really
doubt the extra sinval workload is going to matter much. If you're
pushing the hardware that hard I doubt you'd be able to do online DDL
for a slew of other reasons.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Tom Lane (#9)

Re: plpgsql versus domains

I wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Feb 26, 2015 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

To some extent this is a workaround for the fact that plpgsql does type
conversions the way it does (ie, by I/O rather than by using the parser's
cast mechanisms). We've talked about changing that, but people seem to
be afraid of the compatibility consequences, and I'm not planning to fight
two compatibility battles concurrently ;-)

A sensible plan, but since we're here:

- I do agree that changing the way PL/pgsql does those type
conversions is scary. I can't predict what will break, and there's no
getting around the scariness of that.

- On the other hand, I do think it would be good to change it, because
this sucks:

rhaas=# do $$ declare x int; begin x := (3.0::numeric)/(1.0::numeric); end $$;
ERROR: invalid input syntax for integer: "3.0000000000000000"
CONTEXT: PL/pgSQL function inline_code_block line 1 at assignment

If we did want to open that can of worms, my proposal would be to make
plpgsql type conversions work like so:
* If there is a cast pathway known to the core SQL parser, use that
mechanism.
* Otherwise, attempt to convert via I/O as we do today.
This seems to minimize the risk of breaking things, although there would
probably be corner cases that work differently (for instance I bet boolean
to text might turn out differently). In the very long run we could perhaps
deprecate and remove the second phase.

Since people didn't seem to be running away screaming, here is a patch to
do that. I've gone through the list of existing casts, and as far as I
can tell the only change in behavior in cases that would have worked
before is the aforementioned boolean->string case. Before, if you made
plpgsql convert a bool to text, you got 't' or 'f', eg

regression=# do $$declare t text; begin t := true; raise notice 't = %', t; end $$;
NOTICE: t = t
DO

Now you get 'true' or 'false', because that's what the cast does, eg

regression=# do $$declare t text; begin t := true; raise notice 't = %', t; end $$;
NOTICE: t = true
DO

This seems to me to be a narrow enough behavioral change that we could
tolerate it in a major release. (Of course, maybe somebody out there
thinks that failures like the one Robert exhibits are a feature, in
which case they'd have a rather longer list of compatibility issues.)

The performance gain is fairly nifty. I tested int->bigint coercions
like this:

do $$
declare b bigint;
begin
for i in 1 .. 1000000 loop
b := i;
end loop;
end$$;

On my machine, in non-cassert builds, this takes around 750 ms in 9.4,
610 ms in HEAD (so we already did something good, I'm not quite sure
what), and 420 ms with this patch. Another example is a no-op domain
conversion:

create domain di int;

do $$
declare b di;
begin
for i in 1 .. 1000000 loop
b := i;
end loop;
end$$;

This takes about 760 ms in 9.4, 660 ms in HEAD, 380 ms with patch.

Comments?

regards, tom lane

#16

Pavel Stehule

pavel.stehule@gmail.com

about 11 years ago

In reply to: Tom Lane (#15)

Re: plpgsql versus domains

2015-03-03 20:32 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:

I wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Feb 26, 2015 at 1:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

To some extent this is a workaround for the fact that plpgsql does type
conversions the way it does (ie, by I/O rather than by using the

parser's

cast mechanisms). We've talked about changing that, but people seem to
be afraid of the compatibility consequences, and I'm not planning to

fight

two compatibility battles concurrently ;-)

A sensible plan, but since we're here:

- I do agree that changing the way PL/pgsql does those type
conversions is scary. I can't predict what will break, and there's no
getting around the scariness of that.

- On the other hand, I do think it would be good to change it, because
this sucks:

rhaas=# do $$ declare x int; begin x := (3.0::numeric)/(1.0::numeric);

end $$;

ERROR: invalid input syntax for integer: "3.0000000000000000"
CONTEXT: PL/pgSQL function inline_code_block line 1 at assignment

If we did want to open that can of worms, my proposal would be to make
plpgsql type conversions work like so:
* If there is a cast pathway known to the core SQL parser, use that
mechanism.
* Otherwise, attempt to convert via I/O as we do today.
This seems to minimize the risk of breaking things, although there would
probably be corner cases that work differently (for instance I bet

boolean

to text might turn out differently). In the very long run we could

perhaps

deprecate and remove the second phase.

Since people didn't seem to be running away screaming, here is a patch to
do that. I've gone through the list of existing casts, and as far as I
can tell the only change in behavior in cases that would have worked
before is the aforementioned boolean->string case. Before, if you made
plpgsql convert a bool to text, you got 't' or 'f', eg

regression=# do $$declare t text; begin t := true; raise notice 't = %',
t; end $$;
NOTICE: t = t
DO

Now you get 'true' or 'false', because that's what the cast does, eg

regression=# do $$declare t text; begin t := true; raise notice 't = %',
t; end $$;
NOTICE: t = true
DO

This seems to me to be a narrow enough behavioral change that we could
tolerate it in a major release. (Of course, maybe somebody out there
thinks that failures like the one Robert exhibits are a feature, in
which case they'd have a rather longer list of compatibility issues.)

The performance gain is fairly nifty. I tested int->bigint coercions
like this:

do $$
declare b bigint;
begin
for i in 1 .. 1000000 loop
b := i;
end loop;
end$$;

On my machine, in non-cassert builds, this takes around 750 ms in 9.4,
610 ms in HEAD (so we already did something good, I'm not quite sure
what), and 420 ms with this patch. Another example is a no-op domain
conversion:

create domain di int;

do $$
declare b di;
begin
for i in 1 .. 1000000 loop
b := i;
end loop;
end$$;

This takes about 760 ms in 9.4, 660 ms in HEAD, 380 ms with patch.

Comments?

it is perfect,

thank you very much for this

Regards

Pavel Stehule

Show quoted text

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

plpgsql versus domains

Attachments:

Attachments: