[PROPOSAL] : Use of ORDER BY clause in insert.sql

Started by Nishant Sharmaabout 3 years ago7 messages
#1Nishant Sharma
nishant.sharma@enterprisedb.com
1 attachment(s)

Hi,

We would like to share a proposal of a patch, where we have added order by
clause in two select statements in src/test/regress/sql/insert.sql file and
respective changes in src/test/regress/expected/insert.out output file.

This would help in generating output in consistent sequence, as sometimes
we have observed change in sequence in output.

Please find the patch attached <Proposal_OrderBy_insert.sql.out.patch>

Regards,
Nishant Sharma
EDB: http://www.enterprisedb.com

Attachments:

Proposal_OrderBy_insert.sql.out.patchapplication/octet-stream; name=Proposal_OrderBy_insert.sql.out.patchDownload
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index dd4354f..5f09654 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -287,17 +287,17 @@ insert into part_default_p2 values ('de', 35);
 insert into list_parted values ('ab', 21);
 insert into list_parted values ('xx', 1);
 insert into list_parted values ('yy', 2);
-select tableoid::regclass, * from list_parted;
+select tableoid::regclass, * from list_parted order by 1, 3;
       tableoid      | a  | b  
 --------------------+----+----
  part_cc_dd         | cC |  1
+ part_null          |    |  0
  part_ee_ff1        | ff |  1
  part_ee_ff2        | ff | 11
  part_xx_yy_p1      | xx |  1
  part_xx_yy_defpart | yy |  2
- part_null          |    |  0
- part_default_p1    | cd | 25
  part_default_p1    | ab | 21
+ part_default_p1    | cd | 25
  part_default_p2    | de | 35
 (9 rows)
 
@@ -361,21 +361,21 @@ DETAIL:  Partition key of the failing row contains (b) = (0).
 -- ok
 insert into list_parted values ('EE', 1);
 insert into part_ee_ff values ('EE', 10);
-select tableoid::regclass, * from list_parted;
+select tableoid::regclass, * from list_parted order by 1, 3;
       tableoid      | a  | b  
 --------------------+----+----
  part_aa_bb         | aA |   
  part_cc_dd         | cC |  1
- part_ee_ff1        | ff |  1
+ part_null          |    |  0
+ part_null          |    |  1
  part_ee_ff1        | EE |  1
- part_ee_ff2        | ff | 11
+ part_ee_ff1        | ff |  1
  part_ee_ff2        | EE | 10
+ part_ee_ff2        | ff | 11
  part_xx_yy_p1      | xx |  1
  part_xx_yy_defpart | yy |  2
- part_null          |    |  0
- part_null          |    |  1
- part_default_p1    | cd | 25
  part_default_p1    | ab | 21
+ part_default_p1    | cd | 25
  part_default_p2    | de | 35
 (13 rows)
 
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index bdcffd0..bb65135 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -190,7 +190,7 @@ insert into part_default_p2 values ('de', 35);
 insert into list_parted values ('ab', 21);
 insert into list_parted values ('xx', 1);
 insert into list_parted values ('yy', 2);
-select tableoid::regclass, * from list_parted;
+select tableoid::regclass, * from list_parted order by 1, 3;
 
 -- Check tuple routing for partitioned tables
 
@@ -228,7 +228,7 @@ insert into part_ee_ff values ('EE', 0);
 -- ok
 insert into list_parted values ('EE', 1);
 insert into part_ee_ff values ('EE', 10);
-select tableoid::regclass, * from list_parted;
+select tableoid::regclass, * from list_parted order by 1, 3;
 
 -- some more tests to exercise tuple-routing with multi-level partitioning
 create table part_gg partition of list_parted for values in ('gg') partition by range (b);
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nishant Sharma (#1)
Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

Nishant Sharma <nishant.sharma@enterprisedb.com> writes:

We would like to share a proposal of a patch, where we have added order by
clause in two select statements in src/test/regress/sql/insert.sql file and
respective changes in src/test/regress/expected/insert.out output file.

This would help in generating output in consistent sequence, as sometimes
we have observed change in sequence in output.

Please be specific about the circumstances in which the output is
unstable for you. With zero information to go on, it seems about as
likely that this change is masking a bug as that it's a good idea.

regards, tom lane

#3Amul Sul
sulamul@gmail.com
In reply to: Tom Lane (#2)
Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

On Thu, Oct 27, 2022 at 6:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Nishant Sharma <nishant.sharma@enterprisedb.com> writes:

We would like to share a proposal of a patch, where we have added order by
clause in two select statements in src/test/regress/sql/insert.sql file and
respective changes in src/test/regress/expected/insert.out output file.

This would help in generating output in consistent sequence, as sometimes
we have observed change in sequence in output.

Please be specific about the circumstances in which the output is
unstable for you. With zero information to go on, it seems about as
likely that this change is masking a bug as that it's a good idea.

At the first glance, I thought the patch is pretty much obvious, and
we usually add an ORDER BY clause to ensure stable output. If we
are too sure that the output usually comes in the same order then the
ORDER BY clause that exists in other tests seems useless. I am a bit
confused & what could be a possible bug?

I have tested on my Centos and the Mac OS, insert.sql test is giving
stable output, I didn't find failure in the subsequent runs too but I
am not sure if that is enough evidence to skip the ORDER BY clause.

Regards,
Amul

#4David Rowley
dgrowleyml@gmail.com
In reply to: Amul Sul (#3)
Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

On Fri, 28 Oct 2022 at 16:51, Amul Sul <sulamul@gmail.com> wrote:

On Thu, Oct 27, 2022 at 6:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Please be specific about the circumstances in which the output is
unstable for you. With zero information to go on, it seems about as
likely that this change is masking a bug as that it's a good idea.

At the first glance, I thought the patch is pretty much obvious, and
we usually add an ORDER BY clause to ensure stable output.

Unfortunately, you'll need to do better than that. We're not in the
business of accepting patches with zero justification for why they're
required. If you're not willing to do the analysis on why the order
changes sometimes, why should we accept your patch?

If you can't find the problem then you should modify insert.sql to
EXPLAIN the problem query to see if the plan has changed between the
passing and failing run. The only thing that comes to mind about why
this test might produce rows in a different order would be if a
parallel Append was sorting the subpaths by cost (See
create_append_path's call to list_sort) and the costs were for some
reason coming out differently sometimes. It's hard to imagine why this
query would be parallelised though. If you show us the EXPLAIN from a
passing and failing run, it might help us see the problem.

If we
are too sure that the output usually comes in the same order then the
ORDER BY clause that exists in other tests seems useless. I am a bit
confused & what could be a possible bug?

You can't claim that if this test shouldn't get an ORDER BY that all
tests shouldn't have an ORDER BY. That's just crazy. What if the test
is doing something like testing sort?!

David

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#4)
Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

David Rowley <dgrowleyml@gmail.com> writes:

On Fri, 28 Oct 2022 at 16:51, Amul Sul <sulamul@gmail.com> wrote:

If we
are too sure that the output usually comes in the same order then the
ORDER BY clause that exists in other tests seems useless. I am a bit
confused & what could be a possible bug?

You can't claim that if this test shouldn't get an ORDER BY that all
tests shouldn't have an ORDER BY. That's just crazy. What if the test
is doing something like testing sort?!

The general policy is that we'll add ORDER BY when a test is demonstrated
to have unstable output order for identifiable environmental reasons
(e.g. locale dependency) or timing reasons (e.g. background autovacuum
sometimes changing statistics). But the key word there is "identifiable".
Without some evidence as to what's causing this, it remains possible
that it's a code bug not the fault of the test case.

regress.sgml explains the policy further:

You might wonder why we don't order all the regression test queries explicitly
to get rid of this issue once and for all. The reason is that that would
make the regression tests less useful, not more, since they'd tend
to exercise query plan types that produce ordered results to the
exclusion of those that don't.

regards, tom lane

#6Amul Sul
sulamul@gmail.com
In reply to: David Rowley (#4)
Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

On Fri, Oct 28, 2022 at 10:28 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 28 Oct 2022 at 16:51, Amul Sul <sulamul@gmail.com> wrote:

On Thu, Oct 27, 2022 at 6:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Please be specific about the circumstances in which the output is
unstable for you. With zero information to go on, it seems about as
likely that this change is masking a bug as that it's a good idea.

At the first glance, I thought the patch is pretty much obvious, and
we usually add an ORDER BY clause to ensure stable output.

Unfortunately, you'll need to do better than that. We're not in the
business of accepting patches with zero justification for why they're
required. If you're not willing to do the analysis on why the order
changes sometimes, why should we accept your patch?

Unfortunately the test is not failing at me. Otherwise, I would have
done that analysis. When I saw the patch for the first time, somehow,
I didn't think anything spurious due to my misconception that we
usually add the ORDER BY clause for the select queries just to be
sure.

If you can't find the problem then you should modify insert.sql to
EXPLAIN the problem query to see if the plan has changed between the
passing and failing run. The only thing that comes to mind about why
this test might produce rows in a different order would be if a
parallel Append was sorting the subpaths by cost (See
create_append_path's call to list_sort) and the costs were for some
reason coming out differently sometimes. It's hard to imagine why this
query would be parallelised though. If you show us the EXPLAIN from a
passing and failing run, it might help us see the problem.

Understood.

If we
are too sure that the output usually comes in the same order then the
ORDER BY clause that exists in other tests seems useless. I am a bit
confused & what could be a possible bug?

You can't claim that if this test shouldn't get an ORDER BY that all
tests shouldn't have an ORDER BY. That's just crazy. What if the test
is doing something like testing sort?!

That I can understand that the sorted output doesn't need further
sorting. I am just referring to the simple SELECT queries that do not
have any sorting.

Thanks & Regards,
Amul

#7Amul Sul
sulamul@gmail.com
In reply to: Tom Lane (#5)
Re: [PROPOSAL] : Use of ORDER BY clause in insert.sql

On Fri, Oct 28, 2022 at 10:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

On Fri, 28 Oct 2022 at 16:51, Amul Sul <sulamul@gmail.com> wrote:

If we
are too sure that the output usually comes in the same order then the
ORDER BY clause that exists in other tests seems useless. I am a bit
confused & what could be a possible bug?

You can't claim that if this test shouldn't get an ORDER BY that all
tests shouldn't have an ORDER BY. That's just crazy. What if the test
is doing something like testing sort?!

The general policy is that we'll add ORDER BY when a test is demonstrated
to have unstable output order for identifiable environmental reasons
(e.g. locale dependency) or timing reasons (e.g. background autovacuum
sometimes changing statistics). But the key word there is "identifiable".
Without some evidence as to what's causing this, it remains possible
that it's a code bug not the fault of the test case.

regress.sgml explains the policy further:

You might wonder why we don't order all the regression test queries explicitly
to get rid of this issue once and for all. The reason is that that would
make the regression tests less useful, not more, since they'd tend
to exercise query plan types that produce ordered results to the
exclusion of those that don't.

Understood. Thanks for the clarification.

Regards,
Amul