pg_restore casts check constraints differently

Started by Joshua Maalmost 10 years ago10 messages
#1Joshua Ma
josh@benchling.com

This might not be a common case, but we're using pg_dump in a testing
environment to check migrations - 1) we initialize the db from HEAD,
pg_dump it, 2) we initialize the db from migration_base.sql, apply
migrations, pg_dump it, and 3) compare the two dumps to verify that our
migrations are correct wrt schema.

However, we're seeing pg_restore transforming our check constraints with
different casting.

# \d arrayed_library
CONSTRAINT arrayed_library_step_check CHECK (((step)::text = ANY
((ARRAY['ADD_RESERVED_SEQUENCES'::character varying,
'ANALYZE_DESIGN_WARNINGS'::character varying, 'COMPLETE_ORDER'::character
varying, 'DEFINE_VARIANTS'::character varying,
'LABEL_TRANSLATION'::character varying])::text[])))

$ dropdb db && createdb db
$ pg_dump db --schema-only --no-owner > migration_base.sql
# migration_base.sql has the same CONSTRAINT as above
$ psql db -q -f migration_base.sql

# \d arrayed_library
CONSTRAINT arrayed_library_step_check CHECK (((step)::text = ANY
(ARRAY[('ADD_RESERVED_SEQUENCES'::character varying)::text,
('ANALYZE_DESIGN_WARNINGS'::character varying)::text,
('COMPLETE_ORDER'::character varying)::text, ('DEFINE_VARIANTS'::character
varying)::text, ('LABEL_TRANSLATION'::character varying)::text])))

Note that the restored constraint has ARRAY('a'::text, 'b'::text, ...)
while the original had (ARRAY['a', 'b', ...])::text[]

Is there any way to have postgres NOT do the extra conversions?

--
- Josh

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua Ma (#1)
Re: pg_restore casts check constraints differently

Joshua Ma <josh@benchling.com> writes:

This might not be a common case, but we're using pg_dump in a testing
environment to check migrations - 1) we initialize the db from HEAD,
pg_dump it, 2) we initialize the db from migration_base.sql, apply
migrations, pg_dump it, and 3) compare the two dumps to verify that our
migrations are correct wrt schema.

However, we're seeing pg_restore transforming our check constraints with
different casting.

It's not really different. What you're seeing is pg_dump (or actually
ruleutils.c) choosing to dump some implicit casts explicitly to ensure
that the expression is parsed the same way next time. It might be
overly conservative to do so, but we've found that erring in this
direction tends to avoid breakage when the result is loaded into another
server version; it's a bit like the intentional overparenthesization.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#2)
Re: pg_restore casts check constraints differently

On Tue, Mar 29, 2016 at 2:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joshua Ma <josh@benchling.com> writes:

This might not be a common case, but we're using pg_dump in a testing
environment to check migrations - 1) we initialize the db from HEAD,
pg_dump it, 2) we initialize the db from migration_base.sql, apply
migrations, pg_dump it, and 3) compare the two dumps to verify that our
migrations are correct wrt schema.

However, we're seeing pg_restore transforming our check constraints with
different casting.

It's not really different. What you're seeing is pg_dump (or actually
ruleutils.c) choosing to dump some implicit casts explicitly to ensure
that the expression is parsed the same way next time. It might be
overly conservative to do so, but we've found that erring in this
direction tends to avoid breakage when the result is loaded into another
server version; it's a bit like the intentional overparenthesization.

​Why don't we just use ruleutils.c to generate \d results so that what we
end up showing is canonical?

David J.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#3)
Re: pg_restore casts check constraints differently

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Tue, Mar 29, 2016 at 2:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's not really different. What you're seeing is pg_dump (or actually
ruleutils.c) choosing to dump some implicit casts explicitly to ensure
that the expression is parsed the same way next time.

​Why don't we just use ruleutils.c to generate \d results so that what we
end up showing is canonical?

We do. AFAIK, what psql's \d shows in these cases is the same as what
pg_dump will print. Joshua's complaint is that it isn't necessarily
identical to what was input.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#4)
Re: pg_restore casts check constraints differently

On Tue, Mar 29, 2016 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Tue, Mar 29, 2016 at 2:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's not really different. What you're seeing is pg_dump (or actually
ruleutils.c) choosing to dump some implicit casts explicitly to ensure
that the expression is parsed the same way next time.

​Why don't we just use ruleutils.c to generate \d results so that what we
end up showing is canonical?

We do. AFAIK, what psql's \d shows in these cases is the same as what
pg_dump will print. Joshua's complaint is that it isn't necessarily
identical to what was input.

​Then I must be lacking info here because given that the two constraints
shown using \d are equivalent if we were to output a canonical form there
could only be one valid representation that could be output.

Looking at it in this manner Joshua's goal is achieved even if we don't
output exactly what was input - because at least regardless of the input
form the attempt to compare direct HEAD and migration result​ would be the
same result.

I guess my "so that" clause is overly optimistic - we'd likely need to
expend more effort to actually derive a canonical version of a given
arbitrary constraint and our current implementation is allowed to simplify
without deriving a canonical form: in this case failing to consistently
choose whether to cast the array elements and leave the array type itself
implied versus leaving the array elements in their natural form and casting
the final array to the necessary type. And, at the same time, ideally
recognizing that the built-in types "character varying" and "text" are
compatible and thus ('value'::varchar)::text should be simplified to
'value'::text.

David J.

#6Amit Langote
amitlangote09@gmail.com
In reply to: Tom Lane (#2)
Re: pg_restore casts check constraints differently

On Wed, Mar 30, 2016 at 6:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joshua Ma <josh@benchling.com> writes:

This might not be a common case, but we're using pg_dump in a testing
environment to check migrations - 1) we initialize the db from HEAD,
pg_dump it, 2) we initialize the db from migration_base.sql, apply
migrations, pg_dump it, and 3) compare the two dumps to verify that our
migrations are correct wrt schema.

However, we're seeing pg_restore transforming our check constraints with
different casting.

It's not really different. What you're seeing is pg_dump (or actually
ruleutils.c) choosing to dump some implicit casts explicitly to ensure
that the expression is parsed the same way next time. It might be
overly conservative to do so, but we've found that erring in this
direction tends to avoid breakage when the result is loaded into another
server version; it's a bit like the intentional overparenthesization.

Saw a post on pgsql-bugs awhile back that looked related:

/messages/by-id/011001d17b05$4e70c000$eb524000$@commoninf.com

In their case, the restored expression in different shape caused some
problems elsewhere. An example:

$ createdb srcdb
$ psql srcdb
psql (9.6devel)
Type "help" for help.

srcdb=# CREATE TABLE p (a varchar, CHECK (a IN ('a', 'b', 'c')));
CREATE TABLE

srcdb=# ^D\q

$ createdb destdb
$ pg_dump srcdb | psql destdb
$ psql destdb
psql (9.6devel)
Type "help" for help.

destdb=# \d
List of relations
Schema | Name | Type | Owner
--------+------+-------+-------
public | p | table | amit
(1 row)

destdb=# CREATE TABLE c (LIKE p);
CREATE TABLE

destdb=# ALTER TABLE c ADD CONSTRAINT p_a_check CHECK (a IN ('a', 'b', 'c'));
ALTER TABLE

destdb=# \d c
Table "public.c"
Column | Type | Modifiers
--------+-------------------+-----------
a | character varying |
Check constraints:
"p_a_check" CHECK (a::text = ANY (ARRAY['a'::character varying,
'b'::character varying, 'c'::character varying]::text[]))

destdb=# INSERT INTO c VALUES ('a'), ('b'), ('c');
INSERT 0 3

destdb=# ALTER TABLE c INHERIT p;
ERROR: child table "c" has different definition for check constraint
"p_a_check"

Hmm, how to go about to get it to match what p_a_check looks on p? Maybe:

destdb=# CREATE TABLE c (LIKE p INCLUDING CONSTRAINTS);

destdb=# \d c
Table "public.c"
Column | Type | Modifiers
--------+-------------------+-----------
a | character varying |
Check constraints:
"p_a_check" CHECK (a::text = ANY (ARRAY['a'::character
varying::text, 'b'::character varying::text, 'c'::character
varying::text]))

Thanks,
Amit

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#6)
Re: pg_restore casts check constraints differently

Amit Langote <amitlangote09@gmail.com> writes:

destdb=# ALTER TABLE c ADD CONSTRAINT p_a_check CHECK (a IN ('a', 'b', 'c'));
destdb=# \d c
...
Check constraints:
"p_a_check" CHECK (a::text = ANY (ARRAY['a'::character varying,
'b'::character varying, 'c'::character varying]::text[]))

Hm. It seems like the parser is doing something weird with IN there.
If you just do a simple comparison the constant ends up as TEXT to start
with:

regression=# CREATE TABLE pp (a varchar, CHECK (a = 'a'));
regression=# \d pp
...
Check constraints:
"pp_a_check" CHECK (a::text = 'a'::text)

Or for that matter

regression=# CREATE TABLE p (a varchar, CHECK (a = any(array['a', 'b', 'c'])));
regression=# \d p
...
Check constraints:
"p_a_check" CHECK (a::text = ANY (ARRAY['a'::text, 'b'::text, 'c'::text]))

I wonder why you don't get an array of text constants in the IN case.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
2 attachment(s)
Re: [GENERAL] pg_restore casts check constraints differently

I wrote:

Amit Langote <amitlangote09@gmail.com> writes:

destdb=# ALTER TABLE c ADD CONSTRAINT p_a_check CHECK (a IN ('a', 'b', 'c'));
destdb=# \d c
...
Check constraints:
"p_a_check" CHECK (a::text = ANY (ARRAY['a'::character varying,
'b'::character varying, 'c'::character varying]::text[]))

Hm. It seems like the parser is doing something weird with IN there.
I wonder why you don't get an array of text constants in the IN case.

I poked into this and determined that it happens because transformAExprIn
identifies the array type to use without considering whether an additional
coercion will have to happen before the array elements can be compared to
the lefthand input.

I tried to fix that in a principled fashion, by resolving the actual
comparison operator and using its righthand input type as the array
element type (see first patch attached). That fixes this case all right,
but it also makes several existing regression test cases fail, for
example:

***************
*** 381,392 ****
     FROM pg_class
     WHERE oid::regclass IN ('a_star', 'c_star')
     ORDER BY 1;
!  relname | has_toast_table 
! ---------+-----------------
!  a_star  | t
!  c_star  | t
! (2 rows)
! 
  --UPDATE b_star*
  --   SET a = text 'gazpacho'
  --   WHERE aa > 4;
--- 381,389 ----
     FROM pg_class
     WHERE oid::regclass IN ('a_star', 'c_star')
     ORDER BY 1;
! ERROR:  invalid input syntax for type oid: "a_star"
! LINE 3:    WHERE oid::regclass IN ('a_star', 'c_star')
!                                    ^
  --UPDATE b_star*
  --   SET a = text 'gazpacho'
  --   WHERE aa > 4;

The problem is that regclass, like varchar, has no comparison operators
of its own, relying on OID's operators. So this patch causes us to choose
OID not regclass as the type of the unknown literals, which in this case
seems like a loss of useful behavior.

I'm tempted to just improve the situation for varchar with a complete
kluge, ie the second patch attached. Thoughts?

regards, tom lane

Attachments:

improve-IN-type-resolution-1.patchtext/x-diff; charset=us-ascii; name=improve-IN-type-resolution-1.patchDownload
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index ebda55d..8d75c88 100644
*** a/src/backend/parser/parse_expr.c
--- b/src/backend/parser/parse_expr.c
***************
*** 15,20 ****
--- 15,22 ----
  
  #include "postgres.h"
  
+ #include "access/htup_details.h"
+ #include "catalog/pg_operator.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "miscadmin.h"
***************
*** 35,40 ****
--- 37,43 ----
  #include "parser/parse_agg.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
+ #include "utils/syscache.h"
  #include "utils/xml.h"
  
  
*************** transformAExprIn(ParseState *pstate, A_E
*** 1148,1166 ****
  	 */
  	if (list_length(rnonvars) > 1)
  	{
- 		List	   *allexprs;
  		Oid			scalar_type;
  		Oid			array_type;
  
  		/*
! 		 * Try to select a common type for the array elements.  Note that
! 		 * since the LHS' type is first in the list, it will be preferred when
! 		 * there is doubt (eg, when all the RHS items are unknown literals).
! 		 *
! 		 * Note: use list_concat here not lcons, to avoid damaging rnonvars.
  		 */
! 		allexprs = list_concat(list_make1(lexpr), rnonvars);
! 		scalar_type = select_common_type(pstate, allexprs, NULL, NULL);
  
  		/*
  		 * Do we have an array type to use?  Aside from the case where there
--- 1151,1225 ----
  	 */
  	if (list_length(rnonvars) > 1)
  	{
  		Oid			scalar_type;
  		Oid			array_type;
+ 		Node	   *which_expr;
  
  		/*
! 		 * Try to select a common type for the array elements.  In the general
! 		 * case we should let select_common_type() do this.  But if the RHS is
! 		 * all unknown literals, select_common_type() will return TEXT which
! 		 * we do not want here: we want to let oper() see the unknown type and
! 		 * resolve it.  So, if it said TEXT but the input is really UNKNOWN,
! 		 * go back to UNKNOWN.
  		 */
! 		scalar_type = select_common_type(pstate, rnonvars, NULL, &which_expr);
! 
! 		if (scalar_type == TEXTOID && exprType(which_expr) == UNKNOWNOID)
! 			scalar_type = UNKNOWNOID;
! 
! 		/*
! 		 * If we identified a common type, check to see if that's actually
! 		 * what the operator wants, and if not change to what it does want.
! 		 * This takes care of resolving UNKNOWN as some appropriate type.
! 		 * Also, if the operator would require a coercion, we might as well do
! 		 * the coercion on the individual array elements rather than having to
! 		 * apply an array-level coercion.
! 		 */
! 		if (OidIsValid(scalar_type))
! 		{
! 			/*
! 			 * This is a cut-down version of make_scalar_array_op()'s logic.
! 			 * We need not bother to duplicate error checks it will make.
! 			 */
! 			Oid			ltypeId,
! 						rtypeId;
! 			Operator	tup;
! 			Form_pg_operator opform;
! 			Oid			actual_arg_types[2];
! 			Oid			declared_arg_types[2];
! 
! 			ltypeId = exprType(lexpr);
! 			rtypeId = scalar_type;
! 
! 			/* Resolve the operator */
! 			tup = oper(pstate, a->name, ltypeId, rtypeId, false, a->location);
! 			opform = (Form_pg_operator) GETSTRUCT(tup);
! 
! 			actual_arg_types[0] = ltypeId;
! 			actual_arg_types[1] = rtypeId;
! 			declared_arg_types[0] = opform->oprleft;
! 			declared_arg_types[1] = opform->oprright;
! 
! 			/*
! 			 * enforce consistency with polymorphic argument and return types,
! 			 * possibly adjusting return type or declared_arg_types
! 			 */
! 			(void) enforce_generic_type_consistency(actual_arg_types,
! 													declared_arg_types,
! 													2,
! 													opform->oprresult,
! 													false);
! 
! 			/*
! 			 * If operator is polymorphic, assume scalar_type is OK as is,
! 			 * otherwise absorb the correct input type.
! 			 */
! 			if (!IsPolymorphicType(declared_arg_types[1]))
! 				scalar_type = declared_arg_types[1];
! 
! 			ReleaseSysCache(tup);
! 		}
  
  		/*
  		 * Do we have an array type to use?  Aside from the case where there
improve-IN-type-resolution-2.patchtext/x-diff; charset=us-ascii; name=improve-IN-type-resolution-2.patchDownload
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index ebda55d..7f314f5 100644
*** a/src/backend/parser/parse_expr.c
--- b/src/backend/parser/parse_expr.c
*************** transformAExprIn(ParseState *pstate, A_E
*** 1163,1168 ****
--- 1163,1181 ----
  		scalar_type = select_common_type(pstate, allexprs, NULL, NULL);
  
  		/*
+ 		 * If the common type is varchar, use text, to skip the inevitable
+ 		 * coercion later (since varchar depends on text's operators).  This
+ 		 * is pretty ugly, but it's hard to see how to do it in a more
+ 		 * principled way; there are seemingly-equivalent cases where we do
+ 		 * NOT want to substitute the comparison operator's actual input type.
+ 		 * For example, replacing regclass by oid would break several useful
+ 		 * cases in the regression tests.  Varchar is common enough that it
+ 		 * seems worth a kluge here.
+ 		 */
+ 		if (scalar_type == VARCHAROID)
+ 			scalar_type = TEXTOID;
+ 
+ 		/*
  		 * Do we have an array type to use?  Aside from the case where there
  		 * isn't one, we don't risk using ScalarArrayOpExpr when the common
  		 * type is RECORD, because the RowExpr comparison logic below can cope
#9Amit Langote
amitlangote09@gmail.com
In reply to: Tom Lane (#8)
Re: [GENERAL] pg_restore casts check constraints differently

On Thu, Mar 31, 2016 at 1:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Amit Langote <amitlangote09@gmail.com> writes:

destdb=# ALTER TABLE c ADD CONSTRAINT p_a_check CHECK (a IN ('a', 'b', 'c'));
destdb=# \d c
...
Check constraints:
"p_a_check" CHECK (a::text = ANY (ARRAY['a'::character varying,
'b'::character varying, 'c'::character varying]::text[]))

Hm. It seems like the parser is doing something weird with IN there.
I wonder why you don't get an array of text constants in the IN case.

I poked into this and determined that it happens because transformAExprIn
identifies the array type to use without considering whether an additional
coercion will have to happen before the array elements can be compared to
the lefthand input.

I tried to fix that in a principled fashion, by resolving the actual
comparison operator and using its righthand input type as the array
element type (see first patch attached). That fixes this case all right,
but it also makes several existing regression test cases fail, for
example:

***************
*** 381,392 ****
FROM pg_class
WHERE oid::regclass IN ('a_star', 'c_star')
ORDER BY 1;
!  relname | has_toast_table
! ---------+-----------------
!  a_star  | t
!  c_star  | t
! (2 rows)
!
--UPDATE b_star*
--   SET a = text 'gazpacho'
--   WHERE aa > 4;
--- 381,389 ----
FROM pg_class
WHERE oid::regclass IN ('a_star', 'c_star')
ORDER BY 1;
! ERROR:  invalid input syntax for type oid: "a_star"
! LINE 3:    WHERE oid::regclass IN ('a_star', 'c_star')
!                                    ^
--UPDATE b_star*
--   SET a = text 'gazpacho'
--   WHERE aa > 4;

The problem is that regclass, like varchar, has no comparison operators
of its own, relying on OID's operators. So this patch causes us to choose
OID not regclass as the type of the unknown literals, which in this case
seems like a loss of useful behavior.

Agreed; no need to break that.

I'm tempted to just improve the situation for varchar with a complete
kluge, ie the second patch attached. Thoughts?

Fixes for me.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Victor Pontis
victor@benchling.com
In reply to: Amit Langote (#9)
Re: [GENERAL] pg_restore casts check constraints differently

Hey, I work with Josh Ma and we were troubleshooting this problem together.

We ended up creating a workaround by taking the dumps from different DBs,
initializing new DBs based on those dumps, and then dumping these new DBs.
This work around worked since the dumps of databases that were initialized
via a psql script outputted the text array constraint in the same way.

So there are definitely ways to workaround this inconsistent representation
for our use case.

Thanks again for the help!

--
Victor Pontis
Benchling
Engineer
858-761-5232

On Wed, Mar 30, 2016 at 9:51 AM, Amit Langote <amitlangote09@gmail.com>
wrote:

Show quoted text

On Thu, Mar 31, 2016 at 1:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Amit Langote <amitlangote09@gmail.com> writes:

destdb=# ALTER TABLE c ADD CONSTRAINT p_a_check CHECK (a IN ('a', 'b',

'c'));

destdb=# \d c
...
Check constraints:
"p_a_check" CHECK (a::text = ANY (ARRAY['a'::character varying,
'b'::character varying, 'c'::character varying]::text[]))

Hm. It seems like the parser is doing something weird with IN there.
I wonder why you don't get an array of text constants in the IN case.

I poked into this and determined that it happens because transformAExprIn
identifies the array type to use without considering whether an

additional

coercion will have to happen before the array elements can be compared to
the lefthand input.

I tried to fix that in a principled fashion, by resolving the actual
comparison operator and using its righthand input type as the array
element type (see first patch attached). That fixes this case all right,
but it also makes several existing regression test cases fail, for
example:

***************
*** 381,392 ****
FROM pg_class
WHERE oid::regclass IN ('a_star', 'c_star')
ORDER BY 1;
!  relname | has_toast_table
! ---------+-----------------
!  a_star  | t
!  c_star  | t
! (2 rows)
!
--UPDATE b_star*
--   SET a = text 'gazpacho'
--   WHERE aa > 4;
--- 381,389 ----
FROM pg_class
WHERE oid::regclass IN ('a_star', 'c_star')
ORDER BY 1;
! ERROR:  invalid input syntax for type oid: "a_star"
! LINE 3:    WHERE oid::regclass IN ('a_star', 'c_star')
!                                    ^
--UPDATE b_star*
--   SET a = text 'gazpacho'
--   WHERE aa > 4;

The problem is that regclass, like varchar, has no comparison operators
of its own, relying on OID's operators. So this patch causes us to

choose

OID not regclass as the type of the unknown literals, which in this case
seems like a loss of useful behavior.

Agreed; no need to break that.

I'm tempted to just improve the situation for varchar with a complete
kluge, ie the second patch attached. Thoughts?

Fixes for me.

Thanks,
Amit