Domains and arrays and composites, oh my

Started by Tom Lanealmost 9 years ago11 messageshackers

tgl@sss.pgh.pa.us

almost 9 years ago

I started to look into allowing domains over composite types, which is
another never-implemented case that there's no very good reason not to
allow. Well, other than the argument that the SQL standard only allows
domains over "predefined" (built-in) types ... but we blew past that
restriction ages ago.

Any thought that there might be some fundamental problem with that was
soon dispelled when I noticed that we allow domains over arrays of
composite types. Ahem.

They even work, mostly. I wrote a few test cases and couldn't find
anything that failed except for attempts to insert or update multiple
subfields of the same base column; that's because process_matched_tle()
fails to look through CoerceToDomain nodes. But that turns out to be a
bug even for the simpler case of domains over arrays of scalars, which
is certainly supported. That is, given

create domain domainint4arr int4[];
create table domarrtest (testint4arr domainint4arr);

this ought to work:

insert into domarrtest (testint4arr[1], testint4arr[3]) values (11,22);

It would work with a plain-array target column, but with the domain
it fails with

ERROR: multiple assignments to same column "testint4arr"

Hence, the attached proposed patch, which fixes that oversight and
adds some relevant test cases. (I've not yet looked to see if any
documentation changes would be appropriate.)

I think this is a bug fix and should be back-patched. Any objections?

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

almost 9 years ago

In reply to: Tom Lane (#1)

Re: Domains and arrays and composites, oh my

I wrote:

I started to look into allowing domains over composite types, which is
another never-implemented case that there's no very good reason not to
allow. Well, other than the argument that the SQL standard only allows
domains over "predefined" (built-in) types ... but we blew past that
restriction ages ago.

Attached is a draft patch that allows domains over composite types.
I think it's probably complete on its own terms, but there are some
questions around behavior of functions returning domain-over-composite
that could use discussion, and some of the PLs need some follow-on work.

The core principle here was to not modify execution-time behavior by
adding domain checks to existing code paths. Rather, I wanted the
parser to insert CoerceToDomain nodes wherever checks would be needed.
Row-returning node types such as RowExpr and FieldStore only return
regular composite types, as before; to generate a value of a composite
domain, we construct a value of the base type and then CoerceToDomain.
(This is pretty analogous to what happens for domains over arrays.)
Whole-row Vars can only have regular composite types as vartype,
TupleDescs can only have regular composite types as tdtypeid, composite
Datums only have regular composite type OIDs in datum_typeid. (The last
would be expected anyway, since the physical representation of a domain
value is never different from that of its base type.)

Doing it that way led to a nicely small patch, only about 160 net new
lines of code. However, not touching the executor meant not touching
the behavior of functions that return domains, even if the domain is
domain-over-composite. In code terms this means that
get_call_result_type() and sibling functions will return TYPEFUNC_SCALAR
not TYPEFUNC_COMPOSITE for such a result type. The difficulty with
changing that is that if these functions look through the domain, then
the calling code (in, usually, a PL) will simply build and return a result
of the underlying composite type, failing to apply any domain constraints.
Trying to get out-of-core PLs on board with a change in those requirements
seems like a risky proposition.

Concretely, consider

create type complex as (r float8, i float8);
create domain dcomplex as complex;

You can make a SQL-language function to return complex in either of two
ways:

create function fc() returns complex language sql
as 'select 1.0::float8, 2.0::float8';

create function fc() returns complex language sql
as 'select row(1.0::float8, 2.0::float8)::complex';

As the patch stands, though, only the second way works for domains over
composite:

regression=# create function fdc() returns dcomplex language sql
as 'select 1.0::float8, 2.0::float8';
ERROR: return type mismatch in function declared to return dcomplex
DETAIL: Final statement must return exactly one column.
CONTEXT: SQL function "fdc"
regression=# create function fdc() returns dcomplex language sql
as 'select row(1.0::float8, 2.0)::dcomplex';
CREATE FUNCTION

Now, maybe that's fine. SQL-language functions have never been very
willing to insert implicit casts to get to the function result type,
and certainly the only way that the first definition could be legal
is if there were an implicit up-cast to the domain type. It might be
OK to just leave it like this, though some documentation about it
would be a good idea.

plpgsql functions seem generally okay as far as composite domain return
types go, because they don't have anything corresponding to the row
return convention of SQL functions. And plpgsql's greater willingness
to do implicit coercions reduces the notational burden, too. But
there's some work yet to be done to get plpgsql to realize that
composite domain local variables have substructure. For example,
this works:

declare x complex;
...
x.r := 1;

but it fails if x is dcomplex. But ISTM that that would be better
handled as a followon feature patch. I suspect that the other PLs may
have similar issues where it'd be nice to allow domain-over-composite
to act like a plain composite for specific purposes; but I've not looked.

Another issue related to function result types is that the parser
considers a function-in-FROM returning a composite domain to be
producing a scalar result not a rowtype. Thus you get this for a
function returning complex:

regression=# select * from fc();
r | i
---+---
1 | 2
(1 row)

but this for a function returning dcomplex:

regression=# select * from fdc();
fdc
-------
(1,2)
(1 row)

I think that that could be changed with only local changes in parse
analysis, but do we want to change it? Arguably, making fdc() act the
same as fc() here would amount to implicitly downcasting the domain to
its base type. But doing so here is optional, not necessary in order to
make the statement sane at all, and it's arguable that we shouldn't do
that if the user didn't tell us to. A user who does want that to happen
can downcast explicitly:

regression=# select * from cast(fdc() as complex);
r | i
---+---
1 | 2
(1 row)

(For arcane syntactic reasons you can't abbreviate CAST with :: here.)
Another point is that if you do want the domain value as a domain
value, and not smashed to its base type, it would be hard to get at
if the parser acts this way --- "foo.*" would end up producing the base
rowtype, or if it didn't, we'd have some issues with the previously
noted rule about whole-row Vars never having domain types.

So there's a case to be made that this behavior is fine as-is, but
certainly you could also argue that it's a POLA violation.

Digression: one reason I'm hesitant to introduce inessential reductions
of domains to base types is that I'm looking ahead to arrays over
domains, which will provide a workaround for the people who complain
that they wish 2-D arrays would work type-wise like arrays of 1-D array
objects. If you "create domain inta as int[]" then inta[] would act
like an array of array objects, mostly solving the problem I think.
But it solves the problem only because we don't consider that a domain
is indistinguishable from its base type. It's hard to be sure without
having done the work yet, but I think there will be cases where being
over-eager to treat a domain as its base type might break the behavior
we want for that case. So I don't want to create a precedent for that
here.

Thoughts?

Obviously this is work for v11; I'll go stick it in the next commitfest.

regards, tom lane

David G. Johnston

david.g.johnston@gmail.com

almost 9 years ago

In reply to: Tom Lane (#2)

Re: Domains and arrays and composites, oh my

On Thursday, July 13, 2017, Tom Lane <tgl@sss.pgh.pa.us> wrote:

regression=# select * from fdc();
fdc
-------
(1,2)
(1 row)

Select (fdc).* from fdc(); is considerably more intuitive that the cast.
Does that give the expected multi-column result?

David J.

Tom Lane

tgl@sss.pgh.pa.us

almost 9 years ago

In reply to: David G. Johnston (#3)

Re: Domains and arrays and composites, oh my

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Thursday, July 13, 2017, Tom Lane <tgl@sss.pgh.pa.us> wrote:

regression=# select * from fdc();
fdc
-------
(1,2)
(1 row)

Select (fdc).* from fdc(); is considerably more intuitive that the cast.
Does that give the expected multi-column result?

Yeah, it does, although I'm not sure how intuitive it is that the
parentheses are significant ...

regression=# select * from fdc();
fdc
-------
(1,2)
(1 row)

regression=# select fdc from fdc();
fdc
-------
(1,2)
(1 row)

regression=# select fdc.* from fdc();
fdc
-------
(1,2)
(1 row)

regression=# select (fdc).* from fdc();
r | i
---+---
1 | 2
(1 row)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Tom Lane (#4)

Re: Domains and arrays and composites, oh my

On Thu, Jul 13, 2017 at 3:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, it does, although I'm not sure how intuitive it is that the
parentheses are significant ...

regression=# select fdc.* from fdc();
fdc
-------
(1,2)
(1 row)

regression=# select (fdc).* from fdc();
r | i
---+---
1 | 2
(1 row)

Not intuitive at all.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andrew Dunstan

andrew@dunslane.net

over 8 years ago

In reply to: Tom Lane (#2)

Re: Domains and arrays and composites, oh my

On 07/13/2017 03:19 PM, Tom Lane wrote:

I wrote:

I started to look into allowing domains over composite types, which is
another never-implemented case that there's no very good reason not to
allow. Well, other than the argument that the SQL standard only allows
domains over "predefined" (built-in) types ... but we blew past that
restriction ages ago.

Attached is a draft patch that allows domains over composite types.
I think it's probably complete on its own terms, but there are some
questions around behavior of functions returning domain-over-composite
that could use discussion, and some of the PLs need some follow-on work.

The core principle here was to not modify execution-time behavior by
adding domain checks to existing code paths. Rather, I wanted the
parser to insert CoerceToDomain nodes wherever checks would be needed.
Row-returning node types such as RowExpr and FieldStore only return
regular composite types, as before; to generate a value of a composite
domain, we construct a value of the base type and then CoerceToDomain.
(This is pretty analogous to what happens for domains over arrays.)
Whole-row Vars can only have regular composite types as vartype,
TupleDescs can only have regular composite types as tdtypeid, composite
Datums only have regular composite type OIDs in datum_typeid. (The last
would be expected anyway, since the physical representation of a domain
value is never different from that of its base type.)

Doing it that way led to a nicely small patch, only about 160 net new
lines of code. However, not touching the executor meant not touching
the behavior of functions that return domains, even if the domain is
domain-over-composite. In code terms this means that
get_call_result_type() and sibling functions will return TYPEFUNC_SCALAR
not TYPEFUNC_COMPOSITE for such a result type. The difficulty with
changing that is that if these functions look through the domain, then
the calling code (in, usually, a PL) will simply build and return a result
of the underlying composite type, failing to apply any domain constraints.
Trying to get out-of-core PLs on board with a change in those requirements
seems like a risky proposition.

Concretely, consider

create type complex as (r float8, i float8);
create domain dcomplex as complex;

You can make a SQL-language function to return complex in either of two
ways:

create function fc() returns complex language sql
as 'select 1.0::float8, 2.0::float8';

create function fc() returns complex language sql
as 'select row(1.0::float8, 2.0::float8)::complex';

As the patch stands, though, only the second way works for domains over
composite:

regression=# create function fdc() returns dcomplex language sql
as 'select 1.0::float8, 2.0::float8';
ERROR: return type mismatch in function declared to return dcomplex
DETAIL: Final statement must return exactly one column.
CONTEXT: SQL function "fdc"
regression=# create function fdc() returns dcomplex language sql
as 'select row(1.0::float8, 2.0)::dcomplex';
CREATE FUNCTION

Now, maybe that's fine. SQL-language functions have never been very
willing to insert implicit casts to get to the function result type,
and certainly the only way that the first definition could be legal
is if there were an implicit up-cast to the domain type. It might be
OK to just leave it like this, though some documentation about it
would be a good idea.

plpgsql functions seem generally okay as far as composite domain return
types go, because they don't have anything corresponding to the row
return convention of SQL functions. And plpgsql's greater willingness
to do implicit coercions reduces the notational burden, too. But
there's some work yet to be done to get plpgsql to realize that
composite domain local variables have substructure. For example,
this works:

declare x complex;
...
x.r := 1;

but it fails if x is dcomplex. But ISTM that that would be better
handled as a followon feature patch. I suspect that the other PLs may
have similar issues where it'd be nice to allow domain-over-composite
to act like a plain composite for specific purposes; but I've not looked.

Another issue related to function result types is that the parser
considers a function-in-FROM returning a composite domain to be
producing a scalar result not a rowtype. Thus you get this for a
function returning complex:

regression=# select * from fc();
r | i
---+---
1 | 2
(1 row)

but this for a function returning dcomplex:

regression=# select * from fdc();
fdc
-------
(1,2)
(1 row)

I think that that could be changed with only local changes in parse
analysis, but do we want to change it? Arguably, making fdc() act the
same as fc() here would amount to implicitly downcasting the domain to
its base type. But doing so here is optional, not necessary in order to
make the statement sane at all, and it's arguable that we shouldn't do
that if the user didn't tell us to. A user who does want that to happen
can downcast explicitly:

regression=# select * from cast(fdc() as complex);
r | i
---+---
1 | 2
(1 row)

(For arcane syntactic reasons you can't abbreviate CAST with :: here.)
Another point is that if you do want the domain value as a domain
value, and not smashed to its base type, it would be hard to get at
if the parser acts this way --- "foo.*" would end up producing the base
rowtype, or if it didn't, we'd have some issues with the previously
noted rule about whole-row Vars never having domain types.

So there's a case to be made that this behavior is fine as-is, but
certainly you could also argue that it's a POLA violation.

Digression: one reason I'm hesitant to introduce inessential reductions
of domains to base types is that I'm looking ahead to arrays over
domains, which will provide a workaround for the people who complain
that they wish 2-D arrays would work type-wise like arrays of 1-D array
objects. If you "create domain inta as int[]" then inta[] would act
like an array of array objects, mostly solving the problem I think.
But it solves the problem only because we don't consider that a domain
is indistinguishable from its base type. It's hard to be sure without
having done the work yet, but I think there will be cases where being
over-eager to treat a domain as its base type might break the behavior
we want for that case. So I don't want to create a precedent for that
here.

Thoughts?

This is a pretty nice patch, and very small indeed all things
considered. From a code point of view I have no criticism, although
maybe we need to be a bit more emphatic in the header file comments
about the unwisdom of using get_expr_result_tupdesc().

I do think that treating a function returning a domain-over-composite
differently from one returning a base composite is a POLA. We'd be very
hard put to explain the reasons for it to an end user.

I also think we shouldn't commit this until we have accompanying patches
for the core PLs, at least for plpgsql but I bet there are things that
should be fixed for the others too.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Andrew Dunstan (#6)

Re: Domains and arrays and composites, oh my

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 07/13/2017 03:19 PM, Tom Lane wrote:

Attached is a draft patch that allows domains over composite types.
I think it's probably complete on its own terms, but there are some
questions around behavior of functions returning domain-over-composite
that could use discussion, and some of the PLs need some follow-on work.

This is a pretty nice patch, and very small indeed all things
considered. From a code point of view I have no criticism, although
maybe we need to be a bit more emphatic in the header file comments
about the unwisdom of using get_expr_result_tupdesc().

Thanks for reviewing!

I do think that treating a function returning a domain-over-composite
differently from one returning a base composite is a POLA. We'd be very
hard put to explain the reasons for it to an end user.

Do you have any thoughts about how we ought to resolve that?

I also think we shouldn't commit this until we have accompanying patches
for the core PLs, at least for plpgsql but I bet there are things that
should be fixed for the others too.

For my own part, I think it would be reasonable to commit the core patch
once we've resolved the question of what to do with the case of
function-in-FROM returning domain over composite. That's core parser
behavior so it should be part of the same patch. I think addressing
each PL separately in followon patches would be fine and would help to
avoid the giant-unreviewable-patch syndrome. It is important to get
all the related work done in one release cycle, but since we're just
starting v11 I'm not too worried about that.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andrew Dunstan

andrew@dunslane.net

over 8 years ago

In reply to: Tom Lane (#7)

Re: Domains and arrays and composites, oh my

On 09/28/2017 01:02 PM, Tom Lane wrote:

I do think that treating a function returning a domain-over-composite
differently from one returning a base composite is a POLA. We'd be very
hard put to explain the reasons for it to an end user.

Do you have any thoughts about how we ought to resolve that?

Not offhand. Maybe we need to revisit the decision not to modify the
executor at all. Obviously that would make the patch a good deal more
invasive ;-( One thought I had was that we could invent a new return
type of TYPEFUNC_DOMAIN_COMPOSITE so there would be less danger of a PL
just treating it as an unconstrained base type as it might do if it saw
TYPEFUNC_COMPOSITE.

Maybe I'm wrong, but I have a strong suspicion that of we leave it like
this now we'll regret it in the future.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Andrew Dunstan (#8)

Re: Domains and arrays and composites, oh my

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 09/28/2017 01:02 PM, Tom Lane wrote:

I do think that treating a function returning a domain-over-composite
differently from one returning a base composite is a POLA. We'd be very
hard put to explain the reasons for it to an end user.

Do you have any thoughts about how we ought to resolve that?

Not offhand. Maybe we need to revisit the decision not to modify the
executor at all.

I think it's more of a parse analysis change: the issue is whether to
smash a function's result type to base when determining whether it emits
columns. Maybe we could just do that in that context, and otherwise leave
domains alone.

One thought I had was that we could invent a new return
type of TYPEFUNC_DOMAIN_COMPOSITE so there would be less danger of a PL
just treating it as an unconstrained base type as it might do if it saw
TYPEFUNC_COMPOSITE.

Hmm. That would be a way of forcing the issue, no doubt ...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Tom Lane (#9)

Re: Domains and arrays and composites, oh my

I wrote:

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 09/28/2017 01:02 PM, Tom Lane wrote:

I do think that treating a function returning a domain-over-composite
differently from one returning a base composite is a POLA. We'd be very
hard put to explain the reasons for it to an end user.

Do you have any thoughts about how we ought to resolve that?

Not offhand. Maybe we need to revisit the decision not to modify the
executor at all.

I think it's more of a parse analysis change: the issue is whether to
smash a function's result type to base when determining whether it emits
columns. Maybe we could just do that in that context, and otherwise leave
domains alone.

After fooling with that for awhile, I concluded that the only reasonable
path forward is to go ahead and modify the behavior of
get_expr_result_type and sibling routines. While this fixes the parser
behavior to be pretty much what I think we want, it means that we've got
holes to fill in a lot of other places. Most of them will manifest as
unexpected "domaintypename is not a composite type" errors, but there
are definitely places where the net effect is to silently fail to enforce
domain constraints against a constructed row value :-(. In the attached
still-WIP patch, I think that I've got most of the core code fixed, but
there are at least these holes remaining to fill:

* json_populate_record and sibling routines won't enforce domain
constraints; depending on how they're called, you might or might not
get a "not a composite type" error. This is because they use two
different methods for getting the target type OID depending on whether
the input prototype record is NULL. Maybe that was a bad idea.
(I'm disinclined to try to fix this code right now since there are
pending bug fixes nearby; better to wait till that dust settles.)

* Ditto for hstore's populate_record, which is pretty much same logic.

* plpgsql mostly seems to work, but not quite 100%: RETURN QUERY will
fail to enforce domain constraints if the return type is domain over
composite. It also still needs feature extension to handle d-over-c
variables more fully (e.g. allow field assignment).

* I haven't looked at the other PLs much; I believe they will mostly
fail safe with "not a composite type" errors, but I wouldn't swear
that all code paths will.

It seems like this is probably the way forward, but I'm slightly
discouraged by the fact that the patch footprint is getting bigger
and there are paths where we can get domain-enforcement omissions
rather than something more benign. Still, we had lots of
domain-enforcement omissions in the early days of the existing
domain feature, if memory serves. Maybe we should just accept
that working through that will be a process.

One thought I had was that we could invent a new return
type of TYPEFUNC_DOMAIN_COMPOSITE so there would be less danger of a PL
just treating it as an unconstrained base type as it might do if it saw
TYPEFUNC_COMPOSITE.

Hmm. That would be a way of forcing the issue, no doubt ...

I did that, but it turns out not to help much; turns out a lot of the
broken code is doing stuff on the basis of type_is_rowtype(), which
this patch allows to return true for domains over composite. Maybe
we should undo that and invent a separate type_is_rowtype_or_domain()
function to be used only by repaired code, but that seems pretty ugly :-(

Anyway, PFA an updated patch that also fixes some conflicts with the
already-committed arrays-of-domains patch.

regards, tom lane

#11

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Tom Lane (#10)

Re: Domains and arrays and composites, oh my

I wrote:

Anyway, PFA an updated patch that also fixes some conflicts with the
already-committed arrays-of-domains patch.

I realized that the pending patch for jsonb_build_object doesn't
actually have any conflict with what I needed to touch here, so
I went ahead and fixed the JSON functions that needed fixing,
along with hstore's populate_record. I ended up rewriting the
argument-metadata-collection portions of populate_record_worker
and populate_recordset_worker rather heavily, because I didn't
like them at all: aside from not working for domains over composite,
they were pretty inefficient (redoing a lot of work on each call
for no good reason) and they were randomly different from each
other, resulting in json{b}_populate_recordset rejecting some cases
that worked in json{b}_populate_record.

I've also updated the documentation.

I think that this patch version is done so far as the core code
and contrib are concerned. The PLs need varying amounts of work,
but as I said earlier, I think it would be better to tackle those
in separate patches instead of continuing to enlarge the footprint
of the core patch. So, barring objection, I'd like to go ahead
and commit this.

regards, tom lane

Domains and arrays and composites, oh my

Attachments:

Attachments:

Attachments:

Attachments: