master check fails on Windows Server 2008

Started by Marina Polyakovaalmost 8 years ago10 messages
#1Marina Polyakova
m.polyakova@postgrespro.ru
3 attachment(s)

Hello, hackers! I got a permanent failure of master (commit
2a41507dab0f293ff241fe8ae326065998668af8) check on Windows Server 2008.
Regression output and diffs as well as config.pl are attached.

I used the following commands:
build.bat > build.txt
vcregress.bat check > check.txt

Binary search has shown that this failure begins with commit
bed9ef5a16239d91d97a1fa2efd9309c3cbbc4b2 (Rework the stats_ext test). On
the previous commit (70ec3f1f8f0b753c38a1a582280a02930d7cac5f) the check
passes.
I'm trying to figure out what went wrong, and any suspicions are
welcome.

About the system: Windows Server 2008 R2 Standard, Service Pack 1,
64-bit.
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

2a41507_regression.diffstext/x-diff; name=2a41507_regression.diffsDownload
*** C:/Users/buildfarm/mpolyakova/postgrespro_master/src/test/regress/expected/stats_ext.out	Fri Feb 16 12:56:00 2018
--- C:/Users/buildfarm/mpolyakova/postgrespro_master/src/test/regress/results/stats_ext.out	Fri Feb 16 13:09:47 2018
***************
*** 312,322 ****
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
           QUERY PLAN          
! -----------------------------
!  HashAggregate
     Group Key: b, c, d
     ->  Seq Scan on ndistinct
! (3 rows)
  
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
--- 312,324 ----
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
              QUERY PLAN             
! -----------------------------------
!  GroupAggregate
     Group Key: b, c, d
+    ->  Sort
+          Sort Key: b, c, d
           ->  Seq Scan on ndistinct
! (5 rows)
  
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;

======================================================================

2a41507_regression.outtext/plain; name=2a41507_regression.outDownload
config.pltext/plain; name=config.plDownload
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marina Polyakova (#1)
2 attachment(s)
Re: master check fails on Windows Server 2008

Marina Polyakova <m.polyakova@postgrespro.ru> writes:

Hello, hackers! I got a permanent failure of master (commit
2a41507dab0f293ff241fe8ae326065998668af8) check on Windows Server 2008.
Regression output and diffs as well as config.pl are attached.

Weird. AFAICS the cost estimates for those two plans should be quite
different, so this isn't just a matter of the estimates maybe being
a bit platform-dependent. (And that test has been there nearly a
year without causing reported problems.)

To dig into it a bit more, I tweaked the test case to show the costs
for both plans, and got an output diff as attached. Could you try
the same experiment on your Windows box? In order to force the choice
in the other direction, you'd need to temporarily disable enable_sort,
not enable_hashagg as I did here, but the principle is the same.

regards, tom lane

Attachments:

hack-plan-output.patchtext/x-diff; charset=us-ascii; name=hack-plan-output.patchDownload
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 46acaad..1082660 100644
*** a/src/test/regress/sql/stats_ext.sql
--- b/src/test/regress/sql/stats_ext.sql
*************** EXPLAIN (COSTS off)
*** 177,184 ****
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, b, c, d;
  
! EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
  
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
--- 177,188 ----
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, b, c, d;
  
! EXPLAIN --(COSTS off)
!  SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
! set enable_hashagg = 0;
! EXPLAIN --(COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
+ reset enable_hashagg;
  
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
regression.diffstext/x-diff; charset=us-ascii; name=regression.diffsDownload
*** /home/postgres/pgsql/src/test/regress/expected/stats_ext.out	Mon Feb 12 14:53:46 2018
--- /home/postgres/pgsql/src/test/regress/results/stats_ext.out	Fri Feb 16 11:23:11 2018
***************
*** 309,323 ****
           ->  Seq Scan on ndistinct
  (5 rows)
  
! EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
!          QUERY PLAN          
! -----------------------------
!  HashAggregate
     Group Key: b, c, d
!    ->  Seq Scan on ndistinct
  (3 rows)
  
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
           QUERY PLAN          
--- 309,336 ----
           ->  Seq Scan on ndistinct
  (5 rows)
  
! EXPLAIN --(COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
!                               QUERY PLAN                              
! ----------------------------------------------------------------------
!  HashAggregate  (cost=291.00..307.32 rows=1632 width=20)
     Group Key: b, c, d
!    ->  Seq Scan on ndistinct  (cost=0.00..191.00 rows=10000 width=12)
  (3 rows)
  
+ set enable_hashagg = 0;
+ EXPLAIN --(COSTS off)
+  SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
+                                  QUERY PLAN                                 
+ ----------------------------------------------------------------------------
+  GroupAggregate  (cost=1026.89..1168.21 rows=1632 width=20)
+    Group Key: b, c, d
+    ->  Sort  (cost=1026.89..1051.89 rows=10000 width=12)
+          Sort Key: b, c, d
+          ->  Seq Scan on ndistinct  (cost=0.00..191.00 rows=10000 width=12)
+ (5 rows)
+ 
+ reset enable_hashagg;
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
           QUERY PLAN          

======================================================================

#3Marina Polyakova
m.polyakova@postgrespro.ru
In reply to: Tom Lane (#2)
2 attachment(s)
Re: master check fails on Windows Server 2008

On 16-02-2018 19:31, Tom Lane wrote:

Marina Polyakova <m.polyakova@postgrespro.ru> writes:

Hello, hackers! I got a permanent failure of master (commit
2a41507dab0f293ff241fe8ae326065998668af8) check on Windows Server
2008.
Regression output and diffs as well as config.pl are attached.

Weird. AFAICS the cost estimates for those two plans should be quite
different, so this isn't just a matter of the estimates maybe being
a bit platform-dependent. (And that test has been there nearly a
year without causing reported problems.)

To dig into it a bit more, I tweaked the test case to show the costs
for both plans, and got an output diff as attached. Could you try
the same experiment on your Windows box? In order to force the choice
in the other direction, you'd need to temporarily disable enable_sort,
not enable_hashagg as I did here, but the principle is the same.

Thank you very much! Your test showed that hash aggregation was not even
added to the possible paths (see windows_regression.diffs attached).
Exploring this, I found that not allowing float8 to pass by value in
config.pl was crucial for the size of the hash table used in this query
(see diff.patch attached):

From postmaster.log on Windows:

2018-02-17 20:50:48.596 MSK [1592] pg_regress/stats_ext STATEMENT:
EXPLAIN
SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
2018-02-17 20:50:48.596 MSK [1592] pg_regress/stats_ext LOG: rewritten
parse tree:
...
2018-02-17 20:50:48.596 MSK [1592] pg_regress/stats_ext STATEMENT:
EXPLAIN
SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
# 20 = INT8OID => pg_type.typbyval = FLOAT8PASSBYVAL:
get_agg_clause_costs_walker aggtranstype 20 get_typbyval(aggtranstype) 0
get_agg_clause_costs_walker avgwidth 8 sizeof(void *) 8
costs->transitionSpace 24
# add AGG_SORTED path:
add_paths_to_grouping_rel 1 create_agg_path (aggstrategy 1)
estimate_hashagg_tablesize 1 hashentrysize 32
# add transitionSpace = 24:
estimate_hashagg_tablesize 2 hashentrysize 56
estimate_hashagg_tablesize 3 hashentrysize 96
estimate_hashagg_tablesize dNumGroups 1632.000000
# 156672 = 96 * 1632 > 131072:
add_paths_to_grouping_rel hashaggtablesize 156672 work_mem 128 work_mem
* 1024L 131072 grouped_rel->pathlist == NIL 0
2018-02-17 20:50:48.596 MSK [1592] pg_regress/stats_ext LOG: plan:
...

From postmaster.log on my computer (allow float8 to pass by value):

2018-02-17 20:45:57.651 MSK [3012] pg_regress/stats_ext STATEMENT:
EXPLAIN
SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
2018-02-17 20:45:57.651 MSK [3012] pg_regress/stats_ext LOG: rewritten
parse tree:
...
2018-02-17 20:45:57.651 MSK [3012] pg_regress/stats_ext STATEMENT:
EXPLAIN
SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
# 20 = INT8OID => pg_type.typbyval = FLOAT8PASSBYVAL:
get_agg_clause_costs_walker aggtranstype 20 get_typbyval(aggtranstype) 1
# add AGG_SORTED path:
add_paths_to_grouping_rel 1 create_agg_path (aggstrategy 1)
estimate_hashagg_tablesize 1 hashentrysize 32
# add transitionSpace = 0:
estimate_hashagg_tablesize 2 hashentrysize 32
estimate_hashagg_tablesize 3 hashentrysize 72
estimate_hashagg_tablesize dNumGroups 1632.000000
# 117504 = 72 * 1632 < 131072:
add_paths_to_grouping_rel hashaggtablesize 117504 work_mem 128 work_mem
* 1024L 131072 grouped_rel->pathlist == NIL 0
# add AGG_HASHED path:
add_paths_to_grouping_rel 2 create_agg_path (aggstrategy 2)
2018-02-17 20:45:57.651 MSK [3012] pg_regress/stats_ext LOG: plan:
...

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

diff.patchtext/x-diff; name=diff.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd14..d9788fd 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3583,11 +3583,24 @@ estimate_hashagg_tablesize(Path *path, const AggClauseCosts *agg_costs,
 	hashentrysize = MAXALIGN(path->pathtarget->width) +
 		MAXALIGN(SizeofMinimalTupleHeader);
 
+	fprintf(stderr,
+			"estimate_hashagg_tablesize 1 hashentrysize %ld\n",
+			hashentrysize);
+
 	/* plus space for pass-by-ref transition values... */
 	hashentrysize += agg_costs->transitionSpace;
+
+	fprintf(stderr,
+			"estimate_hashagg_tablesize 2 hashentrysize %ld\n",
+			hashentrysize);
+
 	/* plus the per-hash-entry overhead */
 	hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
 
+	fprintf(stderr,
+			"estimate_hashagg_tablesize 3 hashentrysize %ld\nestimate_hashagg_tablesize dNumGroups %f\n",
+			hashentrysize, dNumGroups);
+
 	/*
 	 * Note that this disregards the effect of fill-factor and growth policy
 	 * of the hash-table. That's probably ok, given default the default
@@ -6043,6 +6056,9 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 					 * We have aggregation, possibly with plain GROUP BY. Make
 					 * an AggPath.
 					 */
+					fprintf(stderr,
+							"add_paths_to_grouping_rel 1 create_agg_path (aggstrategy %d)\n",
+							parse->groupClause ? AGG_SORTED : AGG_PLAIN);
 					add_path(grouped_rel, (Path *)
 							 create_agg_path(root,
 											 grouped_rel,
@@ -6213,6 +6229,11 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 														  agg_costs,
 														  dNumGroups);
 
+			fprintf(stderr,
+					"add_paths_to_grouping_rel hashaggtablesize %ld work_mem %d work_mem * 1024L %ld grouped_rel->pathlist == NIL %d\n",
+					hashaggtablesize, work_mem, work_mem * 1024L,
+					grouped_rel->pathlist == NIL);
+
 			/*
 			 * Provided that the estimated size of the hashtable does not
 			 * exceed work_mem, we'll generate a HashAgg Path, although if we
@@ -6226,6 +6247,9 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 				 * We just need an Agg over the cheapest-total input path,
 				 * since input order won't matter.
 				 */
+				fprintf(stderr,
+						"add_paths_to_grouping_rel 2 create_agg_path (aggstrategy %d)\n",
+						AGG_HASHED);
 				add_path(grouped_rel, (Path *)
 						 create_agg_path(root, grouped_rel,
 										 cheapest_path,
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce..e45927e 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -637,6 +637,10 @@ get_agg_clause_costs_walker(Node *node, get_agg_clause_costs_context *context)
 			costs->finalCost += argcosts.per_tuple;
 		}
 
+		fprintf(stderr,
+				"get_agg_clause_costs_walker aggtranstype %d get_typbyval(aggtranstype) %d\n",
+				aggtranstype, get_typbyval(aggtranstype));
+
 		/*
 		 * If the transition type is pass-by-value then it doesn't add
 		 * anything to the required size of the hashtable.  If it is
@@ -683,6 +687,9 @@ get_agg_clause_costs_walker(Node *node, get_agg_clause_costs_context *context)
 
 			avgwidth = MAXALIGN(avgwidth);
 			costs->transitionSpace += avgwidth + 2 * sizeof(void *);
+			fprintf(stderr,
+					"get_agg_clause_costs_walker avgwidth %d sizeof(void *) %ld costs->transitionSpace %ld\n",
+					avgwidth, sizeof(void *), costs->transitionSpace);
 		}
 		else if (aggtranstype == INTERNALOID)
 		{
diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile
index deef08d..498bf4d 100644
--- a/src/test/regress/GNUmakefile
+++ b/src/test/regress/GNUmakefile
@@ -124,7 +124,7 @@ tablespace-setup:
 ## Run tests
 ##
 
-REGRESS_OPTS = --dlpath=. --max-concurrent-tests=20 $(EXTRA_REGRESS_OPTS)
+REGRESS_OPTS = --dlpath=. --max-concurrent-tests=20 --debug $(EXTRA_REGRESS_OPTS)
 
 check: all tablespace-setup
 	$(pg_regress_check) $(REGRESS_OPTS) --schedule=$(srcdir)/parallel_schedule $(MAXCONNOPT) $(EXTRA_TESTS)
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index ad9434f..ad44bf5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -8,39 +8,39 @@
 # run tablespace by itself, and first, because it forces a checkpoint;
 # we'd prefer not to have checkpoints later in the tests because that
 # interferes with crash-recovery testing.
-test: tablespace
+# test: tablespace
 
 # ----------
 # The first group of parallel tests
 # ----------
-test: boolean char name varchar text int2 int4 int8 oid float4 float8 bit numeric txid uuid enum money rangetypes pg_lsn regproc
+# test: boolean char name varchar text int2 int4 int8 oid float4 float8 bit numeric txid uuid enum money rangetypes pg_lsn regproc
 
 # Depends on things setup during char, varchar and text
-test: strings
+# test: strings
 # Depends on int2, int4, int8, float4, float8
-test: numerology
+# test: numerology
 
 # ----------
 # The second group of parallel tests
 # ----------
-test: point lseg line box path polygon circle date time timetz timestamp timestamptz interval abstime reltime tinterval inet macaddr macaddr8 tstypes
+# test: point lseg line box path polygon circle date time timetz timestamp timestamptz interval abstime reltime tinterval inet macaddr macaddr8 tstypes
 
 # ----------
 # Another group of parallel tests
 # geometry depends on point, lseg, box, path, polygon and circle
 # horology depends on interval, timetz, timestamp, timestamptz, reltime and abstime
 # ----------
-test: geometry horology regex oidjoins type_sanity opr_sanity misc_sanity comments expressions
+# test: geometry horology regex oidjoins type_sanity opr_sanity misc_sanity comments expressions
 
 # ----------
 # These four each depend on the previous one
 # ----------
-test: insert
-test: insert_conflict
-test: create_function_1
-test: create_type
-test: create_table
-test: create_function_2
+# test: insert
+# test: insert_conflict
+# test: create_function_1
+# test: create_type
+# test: create_table
+# test: create_function_2
 
 # ----------
 # Load huge amounts of data
@@ -48,78 +48,79 @@ test: create_function_2
 # execute two copy tests parallel, to check that copy itself
 # is concurrent safe.
 # ----------
-test: copy copyselect copydml
+# test: copy copyselect copydml
 
 # ----------
 # More groups of parallel tests
 # ----------
-test: create_misc create_operator create_procedure
+# test: create_misc create_operator create_procedure
 # These depend on the above two
-test: create_index create_view
+# test: create_index create_view
 
 # ----------
 # Another group of parallel tests
 # ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+# test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
 
 # ----------
 # sanity_check does a vacuum, affecting the sort order of SELECT *
 # results. So it should not run parallel to other tests.
 # ----------
-test: sanity_check
+# test: sanity_check
 
 # ----------
 # Believe it or not, select creates a table, subsequent
 # tests need.
 # ----------
-test: errors
-test: select
-ignore: random
+# test: errors
+# test: select
+# ignore: random
 
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
+# test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update namespace prepared_xacts delete
 
 # ----------
 # Another group of parallel tests
 # ----------
-test: brin gin gist spgist privileges init_privs security_label collate matview lock replica_identity rowsecurity object_address tablesample groupingsets drop_operator password
+# test: brin gin gist spgist privileges init_privs security_label collate matview lock replica_identity rowsecurity object_address tablesample groupingsets drop_operator password
 
 # ----------
 # Another group of parallel tests
 # ----------
-test: alter_generic alter_operator misc psql async dbsize misc_functions sysviews tsrf tidscan stats_ext
+# test: alter_generic alter_operator misc psql async dbsize misc_functions sysviews tsrf tidscan stats_ext
+test: stats_ext
 
 # rules cannot run concurrently with any test that creates a view
-test: rules psql_crosstab amutils
+# test: rules psql_crosstab amutils
 
 # run by itself so it can run parallel workers
-test: select_parallel
-test: write_parallel
+# test: select_parallel
+# test: write_parallel
 
 # no relation related tests can be put in this group
-test: publication subscription
+# test: publication subscription
 
 # ----------
 # Another group of parallel tests
 # ----------
-test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json jsonb json_encoding indirect_toast equivclass
+# test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json jsonb json_encoding indirect_toast equivclass
 
 # ----------
 # Another group of parallel tests
 # NB: temp.sql does a reconnect which transiently uses 2 connections,
 # so keep this parallel group to at most 19 tests
 # ----------
-test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid conversion truncate alter_table sequence polymorphism rowtypes returning largeobject with xml
+# test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid conversion truncate alter_table sequence polymorphism rowtypes returning largeobject with xml
 
 # ----------
 # Another group of parallel tests
 # ----------
-test: identity partition_join partition_prune reloptions hash_part indexing
+# test: identity partition_join partition_prune reloptions hash_part indexing
 
 # event triggers cannot run concurrently with any test that runs DDL
-test: event_trigger
+# test: event_trigger
 
 # run stats by itself because its delay may be insufficient under heavy load
-test: stats
+# test: stats
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 46acaad..e9642da 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -177,8 +177,13 @@ EXPLAIN (COSTS off)
 EXPLAIN (COSTS off)
  SELECT COUNT(*) FROM ndistinct GROUP BY a, b, c, d;
 
-EXPLAIN (COSTS off)
+EXPLAIN --(COSTS off)
+ SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
+
+set enable_sort = 0;
+EXPLAIN --(COSTS off)
  SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
+reset enable_sort;
 
 EXPLAIN (COSTS off)
  SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 314f2c3..b8548bc 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -126,7 +126,8 @@ sub check
 		"--max-concurrent-tests=20",
 		"--encoding=SQL_ASCII",
 		"--no-locale",
-		"--temp-instance=./tmp_check");
+		"--temp-instance=./tmp_check",
+		"--debug");
 	push(@args, $maxconn)     if $maxconn;
 	push(@args, $temp_config) if $temp_config;
 	system(@args);
windows_regression.diffstext/x-diff; name=windows_regression.diffsDownload
*** C:/Users/buildfarm/mpolyakova/postgrespro_master/src/test/regress/expected/stats_ext.out	Fri Feb 16 12:56:00 2018
--- C:/Users/buildfarm/mpolyakova/postgrespro_master/src/test/regress/results/stats_ext.out	Sat Feb 17 20:50:48 2018
***************
*** 309,323 ****
           ->  Seq Scan on ndistinct
  (5 rows)
  
! EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
           QUERY PLAN          
! -----------------------------
!  HashAggregate
     Group Key: b, c, d
!    ->  Seq Scan on ndistinct
! (3 rows)
  
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
           QUERY PLAN          
--- 309,338 ----
           ->  Seq Scan on ndistinct
  (5 rows)
  
! EXPLAIN --(COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
                                   QUERY PLAN                                 
! ----------------------------------------------------------------------------
!  GroupAggregate  (cost=1026.89..1168.21 rows=1632 width=20)
     Group Key: b, c, d
!    ->  Sort  (cost=1026.89..1051.89 rows=10000 width=12)
!          Sort Key: b, c, d
!          ->  Seq Scan on ndistinct  (cost=0.00..191.00 rows=10000 width=12)
! (5 rows)
! 
! set enable_sort = 0;
! EXPLAIN --(COSTS off)
!  SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
!                                  QUERY PLAN                                 
! ----------------------------------------------------------------------------
!  GroupAggregate  (cost=10000001026.89..10000001168.21 rows=1632 width=20)
!    Group Key: b, c, d
!    ->  Sort  (cost=10000001026.89..10000001051.89 rows=10000 width=12)
!          Sort Key: b, c, d
!          ->  Seq Scan on ndistinct  (cost=0.00..191.00 rows=10000 width=12)
! (5 rows)
  
+ reset enable_sort;
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
           QUERY PLAN          

======================================================================

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marina Polyakova (#3)
Re: master check fails on Windows Server 2008

Marina Polyakova <m.polyakova@postgrespro.ru> writes:

On 16-02-2018 19:31, Tom Lane wrote:

Weird. AFAICS the cost estimates for those two plans should be quite
different, so this isn't just a matter of the estimates maybe being
a bit platform-dependent. (And that test has been there nearly a
year without causing reported problems.)

Thank you very much! Your test showed that hash aggregation was not even
added to the possible paths (see windows_regression.diffs attached).
Exploring this, I found that not allowing float8 to pass by value in
config.pl was crucial for the size of the hash table used in this query
(see diff.patch attached):

Ah-hah. I can reproduce the described failure if I configure with
--disable-float8-byval on an otherwise 64-bit machine. It appears that
the minimum work_mem setting that will allow this query to use a hashagg
plan on such a configuration is about 155kB (which squares with the
results you show). On the other hand, in a normal 64-bit configuration,
work_mem settings of 160kB or more cause other failures (due to other
plans switching to hashagg), and on a 32-bit machine I see such failures
with work_mem of 150kB or more. So there's basically no setting of
work_mem that would make all these tests pass everywhere.

I see several things we could do about this:

1. Nothing; just say "sorry, we don't promise that the regression tests
pass with no plan differences on nonstandard configurations". Given that
--disable-float8-byval has hardly any real-world use, there is not a lot
of downside to that.

2. Decide that --disable-float8-byval, and for that matter
--disable-float4-byval, have no real-world use at all and take them out.
There was some point in those options back when we cared about binary
compatibility with version-zero C functions, but now I'm not sure why
anyone would use them.

3. Drop that one test case from stats_ext.sql; I'm not sure how much
additional test value it's actually bringing.

4. Try to tweak the stats_ext.sql test conditions in some more refined
way to get the test to pass everywhere. This'd be a lot of work with
no guarantee of success, so I'm not too excited about it.

5. Something else?

regards, tom lane

In reply to: Tom Lane (#4)
Re: master check fails on Windows Server 2008

On Mon, Feb 19, 2018 at 4:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I see several things we could do about this:

1. Nothing; just say "sorry, we don't promise that the regression tests
pass with no plan differences on nonstandard configurations". Given that
--disable-float8-byval has hardly any real-world use, there is not a lot
of downside to that.

That would make sense to me. It will also be necessary to formalize
what "nonstandard configuration" actually means if we go this way, of
course.

I believe that this is already true with "dynamic_shared_memory_type
== DSM_IMPL_NONE", so that's a second entry for the "nonstandard
configuration" list.

--
Peter Geoghegan

#6Marina Polyakova
m.polyakova@postgrespro.ru
In reply to: Tom Lane (#4)
Re: master check fails on Windows Server 2008

On 20-02-2018 3:37, Tom Lane wrote:

Ah-hah. I can reproduce the described failure if I configure with
--disable-float8-byval on an otherwise 64-bit machine. It appears that
the minimum work_mem setting that will allow this query to use a
hashagg
plan on such a configuration is about 155kB (which squares with the
results you show). On the other hand, in a normal 64-bit
configuration,
work_mem settings of 160kB or more cause other failures (due to other
plans switching to hashagg), and on a 32-bit machine I see such
failures
with work_mem of 150kB or more. So there's basically no setting of
work_mem that would make all these tests pass everywhere.

I see several things we could do about this:

1. Nothing; just say "sorry, we don't promise that the regression tests
pass with no plan differences on nonstandard configurations". Given
that
--disable-float8-byval has hardly any real-world use, there is not a
lot
of downside to that.

2. Decide that --disable-float8-byval, and for that matter
--disable-float4-byval, have no real-world use at all and take them
out.
There was some point in those options back when we cared about binary
compatibility with version-zero C functions, but now I'm not sure why
anyone would use them.

3. Drop that one test case from stats_ext.sql; I'm not sure how much
additional test value it's actually bringing.

4. Try to tweak the stats_ext.sql test conditions in some more refined
way to get the test to pass everywhere. This'd be a lot of work with
no guarantee of success, so I'm not too excited about it.

Thank you for your explanations! I'll try to do something in this
direction..

5. Something else?

regards, tom lane

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marina Polyakova (#6)
Re: master check fails on Windows Server 2008

Marina Polyakova <m.polyakova@postgrespro.ru> writes:

On 20-02-2018 3:37, Tom Lane wrote:

4. Try to tweak the stats_ext.sql test conditions in some more refined
way to get the test to pass everywhere. This'd be a lot of work with
no guarantee of success, so I'm not too excited about it.

Thank you for your explanations! I'll try to do something in this
direction..

OK. The least painful fix might be to establish a different work_mem
setting just for that one query.

However, if you're intent on putting work into continued support of
--disable-float8-byval, I would *strongly* suggest setting up a buildfarm
member that runs that way, because otherwise we're pretty much guaranteed
to break it again. I continue to wonder if it's not better to just remove
the option and thereby simplify our lives. What's the actual value of
having it anymore?

regards, tom lane

#8Marina Polyakova
m.polyakova@postgrespro.ru
In reply to: Tom Lane (#7)
Re: master check fails on Windows Server 2008

On 20-02-2018 21:23, Tom Lane wrote:

Marina Polyakova <m.polyakova@postgrespro.ru> writes:

On 20-02-2018 3:37, Tom Lane wrote:

4. Try to tweak the stats_ext.sql test conditions in some more
refined
way to get the test to pass everywhere. This'd be a lot of work with
no guarantee of success, so I'm not too excited about it.

Thank you for your explanations! I'll try to do something in this
direction..

OK. The least painful fix might be to establish a different work_mem
setting just for that one query.

However, if you're intent on putting work into continued support of
--disable-float8-byval, I would *strongly* suggest setting up a
buildfarm
member that runs that way, because otherwise we're pretty much
guaranteed
to break it again.

Oh, thank you again!

I continue to wonder if it's not better to just remove
the option and thereby simplify our lives. What's the actual value of
having it anymore?

I agree with you, but I have too little experience to vote for removing
this option.

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marina Polyakova (#8)
Re: master check fails on Windows Server 2008

Marina Polyakova <m.polyakova@postgrespro.ru> writes:

On 20-02-2018 21:23, Tom Lane wrote:

I continue to wonder if it's not better to just remove
the option and thereby simplify our lives. What's the actual value of
having it anymore?

I agree with you, but I have too little experience to vote for removing
this option.

I've started a separate thread to propose removal of the option, at
/messages/by-id/10862.1519228208@sss.pgh.pa.us

regards, tom lane

#10Marina Polyakova
m.polyakova@postgrespro.ru
In reply to: Tom Lane (#9)
Re: master check fails on Windows Server 2008

On 21-02-2018 18:51, Tom Lane wrote:

Marina Polyakova <m.polyakova@postgrespro.ru> writes:

On 20-02-2018 21:23, Tom Lane wrote:

I continue to wonder if it's not better to just remove
the option and thereby simplify our lives. What's the actual value
of
having it anymore?

I agree with you, but I have too little experience to vote for
removing
this option.

I've started a separate thread to propose removal of the option, at
/messages/by-id/10862.1519228208@sss.pgh.pa.us

Thank you!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company