create index updates nrows statistics

Started by ZEUGSWETTER Andreas IZ5over 26 years ago9 messages
#1ZEUGSWETTER Andreas IZ5
Andreas.Zeugswetter@telecom.at

a create index updates the statistics in pg_class,
this leads to substantial performance degradation compared to
6.4.2.

If you want to see what I mean simply run the performance test in
our test subdirectory.

I think the create index statement should not update this statistic.
(at least not in the newly created empty table case)
This behavior would then be in sync with the create table behavior.

Andreas

#2ZEUGSWETTER Andreas IZ5
Andreas.Zeugswetter@telecom.at
In reply to: ZEUGSWETTER Andreas IZ5 (#1)
1 attachment(s)
Re: create index updates nrows statistics

a create index updates the statistics in pg_class,
this leads to substantial performance degradation compared to
6.4.2.

If you want to see what I mean simply run the performance test in
our test subdirectory.

I think the create index statement should not update this statistic.
(at least not in the newly created empty table case)
This behavior would then be in sync with the create table behavior.

To fix this I urgently suggest the following patch:
<<index.patch>>
regression passes and is a little faster :-)
performance test: without patch 7min now 22 sec

Andreas

Attachments:

index.patchapplication/octet-stream; name=index.patchDownload
*** ./src/backend/catalog/index.c.orig	Tue May 25 12:16:15 1999
--- ./src/backend/catalog/index.c	Tue May 25 19:44:59 1999
***************
*** 1323,1338 ****
  	 */
  	relpages = RelationGetNumberOfBlocks(whichRel);
  
- 	/*
- 	 * We shouldn't have to do this, but we do...  Modify the reldesc in
- 	 * place with the new values so that the cache contains the latest
- 	 * copy.
- 	 */
- 
- 	whichRel->rd_rel->relhasindex = hasindex;
- 	whichRel->rd_rel->relpages = relpages;
- 	whichRel->rd_rel->reltuples = reltuples;
- 
  	for (i = 0; i < Natts_pg_class; i++)
  	{
  		nulls[i] = heap_attisnull(tuple, i + 1) ? 'n' : ' ';
--- 1323,1328 ----
***************
*** 1344,1351 ****
  	 * If reltuples wasn't supplied take an educated guess.
  	 */
  	if (reltuples == 0)
! 		reltuples = relpages * NTUPLES_PER_PAGE(whichRel->rd_rel->relnatts);
  
  	if (IsBootstrapProcessingMode())
  	{
  
--- 1334,1366 ----
  	 * If reltuples wasn't supplied take an educated guess.
  	 */
  	if (reltuples == 0)
! 		if (relpages == 0)
! 		{
! 		/*
! 		 * Empty relation, leave bogus default in place, see comment in heap.c
! 		 */
! 			reltuples = 1000; /* bogus estimates */
! 			relpages = 10;
! 		}
! 		else if (whichRel->rd_rel->relkind == RELKIND_INDEX && relpages == 2)
! 		/*
! 		 * Empty index ?, leave bogus default in place
! 		 */
! 			reltuples = 1000; /* bogus estimates */
! 		else
! 			reltuples = relpages * NTUPLES_PER_PAGE(whichRel->rd_rel->relnatts);
  
+ 	/*
+ 	 * We shouldn't have to do this, but we do...  Modify the reldesc in
+ 	 * place with the new values so that the cache contains the latest
+ 	 * copy.
+ 	 */
+ 
+ 	whichRel->rd_rel->relhasindex = hasindex;
+ 	whichRel->rd_rel->relpages = relpages;
+ 	whichRel->rd_rel->reltuples = reltuples;
+ 
+ 
  	if (IsBootstrapProcessingMode())
  	{
  
***************
*** 1522,1527 ****
--- 1537,1543 ----
  	{
  		reltuples++;
  
+ #ifndef OMIT_PARTIAL_INDEX
  		/*
  		 * If oldPred != NULL, this is an EXTEND INDEX command, so skip
  		 * this tuple if it was already in the existing partial index
***************
*** 1528,1534 ****
  		 */
  		if (oldPred != NULL)
  		{
- #ifndef OMIT_PARTIAL_INDEX
  			/* SetSlotContents(slot, heapTuple); */
  			slot->val = heapTuple;
  			if (ExecQual((List *) oldPred, econtext) == true)
--- 1544,1549 ----
***************
*** 1536,1542 ****
  				indtuples++;
  				continue;
  			}
- #endif	 /* OMIT_PARTIAL_INDEX */
  		}
  
  		/*
--- 1551,1556 ----
***************
*** 1545,1557 ****
  		 */
  		if (predicate != NULL)
  		{
- #ifndef OMIT_PARTIAL_INDEX
  			/* SetSlotContents(slot, heapTuple); */
  			slot->val = heapTuple;
  			if (ExecQual((List *) predicate, econtext) == false)
  				continue;
- #endif	 /* OMIT_PARTIAL_INDEX */
  		}
  
  		indtuples++;
  
--- 1559,1570 ----
  		 */
  		if (predicate != NULL)
  		{
  			/* SetSlotContents(slot, heapTuple); */
  			slot->val = heapTuple;
  			if (ExecQual((List *) predicate, econtext) == false)
  				continue;
  		}
+ #endif	 /* OMIT_PARTIAL_INDEX */
  
  		indtuples++;
  
***************
*** 1584,1595 ****
  
  	heap_endscan(scan);
  
  	if (predicate != NULL || oldPred != NULL)
  	{
- #ifndef OMIT_PARTIAL_INDEX
  		ExecDestroyTupleTable(tupleTable, false);
- #endif	 /* OMIT_PARTIAL_INDEX */
  	}
  
  	pfree(nullv);
  	pfree(datum);
--- 1597,1608 ----
  
  	heap_endscan(scan);
  
+ #ifndef OMIT_PARTIAL_INDEX
  	if (predicate != NULL || oldPred != NULL)
  	{
  		ExecDestroyTupleTable(tupleTable, false);
  	}
+ #endif	 /* OMIT_PARTIAL_INDEX */
  
  	pfree(nullv);
  	pfree(datum);
*** ./src/backend/catalog/heap.c.orig	Tue May 25 12:16:15 1999
--- ./src/backend/catalog/heap.c	Tue May 25 19:01:56 1999
***************
*** 676,686 ****
  	 * enough to discourage the optimizer from using nested-loop plans.
  	 * With this hack, nested-loop plans will be preferred only after
  	 * the table has been proven to be small by VACUUM or CREATE INDEX.
! 	 * (NOTE: if user does CREATE TABLE, then CREATE INDEX, then loads
! 	 * the table, he still loses until he vacuums, because CREATE INDEX
! 	 * will set reltuples to zero.  Can't win 'em all.  Maintaining the
! 	 * stats on-the-fly would solve the problem, but the overhead of that
! 	 * would likely cost more than it'd save.)
  	 * ----------------
  	 */
  	new_rel_reltup->relpages = 10; /* bogus estimates */
--- 676,685 ----
  	 * enough to discourage the optimizer from using nested-loop plans.
  	 * With this hack, nested-loop plans will be preferred only after
  	 * the table has been proven to be small by VACUUM or CREATE INDEX.
! 	 * (NOTE: CREATE INDEX also sets these bogus estimates if the
! 	 * relation has 0 rows and pages. See index.c.) 
! 	 * Maintaining the stats on-the-fly would solve the problem, 
! 	 * but the overhead of that would likely cost more than it'd save.
  	 * ----------------
  	 */
  	new_rel_reltup->relpages = 10; /* bogus estimates */
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: ZEUGSWETTER Andreas IZ5 (#2)
Re: [HACKERS] create index updates nrows statistics

ZEUGSWETTER Andreas IZ5 <Andreas.Zeugswetter@telecom.at> writes:

a create index updates the statistics in pg_class,
this leads to substantial performance degradation compared to
6.4.2.

Create index did that in 6.4.2 as well --- how could it be making
performance worse?

I think the create index statement should not update this statistic.
(at least not in the newly created empty table case)
This behavior would then be in sync with the create table behavior.

Hmm, skip the update if size is found to be 0 you mean? Might be
reasonable ... it would eliminate the problem that
CREATE TABLE
CREATE INDEX
COPY ...
results in horrible plans compared to doing it in the "right" order.

regards, tom lane

#4ZEUGSWETTER Andreas IZ5
Andreas.Zeugswetter@telecom.at
In reply to: Tom Lane (#3)
AW: [HACKERS] create index updates nrows statistics

a create index updates the statistics in pg_class,
this leads to substantial performance degradation compared to
6.4.2.

Create index did that in 6.4.2 as well --- how could it be making
performance worse?

I am not sure why, but in 6.4.2 a create table, create index, insert,
select * from tab where indexedcol=5 did actually use the index path,
even if table reltuples and relpages was 0.
It currently uses a seq scan, which is exactly what we wanted to avoid
in the newly created table case, but do want on an actually small table.

Please apply the patch I previously sent.

Andreas

#5Noname
jwieck@debis.com
In reply to: ZEUGSWETTER Andreas IZ5 (#4)
Re: AW: [HACKERS] create index updates nrows statistics

a create index updates the statistics in pg_class,
this leads to substantial performance degradation compared to
6.4.2.

Create index did that in 6.4.2 as well --- how could it be making
performance worse?

I am not sure why, but in 6.4.2 a create table, create index, insert,
select * from tab where indexedcol=5 did actually use the index path,
even if table reltuples and relpages was 0.
It currently uses a seq scan, which is exactly what we wanted to avoid
in the newly created table case, but do want on an actually small table.

Please apply the patch I previously sent.

From memory not verified:

Doesn't CREATE INDEX update pg_statistics? I think it does so
the faked statistics only cause different joins to happen as
long as there is no index created immediately after CREATE
TABLE (HASHJOIN vs. NESTLOOP).

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #

#6ZEUGSWETTER Andreas IZ5
Andreas.Zeugswetter@telecom.at
In reply to: Noname (#5)
AW: AW: [HACKERS] create index updates nrows statistics

Jan wrote:

From memory not verified:

Doesn't CREATE INDEX update pg_statistics?

No.

I think it does so
the faked statistics only cause different joins to happen as
long as there is no index created immediately after CREATE
TABLE (HASHJOIN vs. NESTLOOP).

No, create index on a newly created table does:
1. set reltuples and relpages of the table to 0
2. set relpages=2 and a calculated reltuples of 2048 or below
on the index depending on how many multy columns

This leads to a rather strange state where reltuples of table <
reltuples of index. It forces seq scans on update and select
of single table. (see E. Mergl's update problem)

Andreas

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: ZEUGSWETTER Andreas IZ5 (#6)
Re: AW: [HACKERS] create index updates nrows statistics

ZEUGSWETTER Andreas IZ5 <Andreas.Zeugswetter@telecom.at> writes:

a create index updates the statistics in pg_class,
this leads to substantial performance degradation compared to
6.4.2.

Create index did that in 6.4.2 as well --- how could it be making
performance worse?

I am not sure why, but in 6.4.2 a create table, create index, insert,
select * from tab where indexedcol=5 did actually use the index path,
even if table reltuples and relpages was 0.

Hmm, you're right. Using 6.4.2:

play=> create table foobar (f1 int4);
CREATE
play=> explain select * from foobar where f1 = 4;
NOTICE: QUERY PLAN:

Seq Scan on foobar (cost=0.00 size=0 width=4)

play=> create index foobar_f1 on foobar(f1);
CREATE
play=> explain select * from foobar where f1 = 4;
NOTICE: QUERY PLAN:

Index Scan using foobar_f1 on foobar (cost=0.00 size=0 width=4)

whereas in 6.5 you still get a sequential scan because it estimates the
cost of the index scan at 1.0 not 0.0. I think I'm to blame for this
behavior change: I remember twiddling costsize.c to provide more
realistic numbers for an index scan, and in particular to ensure that
an index scan would be considered more expensive than a sequential scan
unless it was able to eliminate a useful number of rows. But when
the estimated relation size is zero (or very small) the selectivity
benefit can't make up even a mere 1.0 cost bias.

I believe 6.5 is operating as it should --- 6.4 was producing inferior
plans for small tables. But it is clearly a Bad Thing to allow the 6.5
optimizer to believe that a relation is empty when it isn't. I concur
with your suggestion to hack up CREATE INDEX so that creating an index
before you load the table isn't quite such a losing proposition.

Please apply the patch I previously sent.

Will do.

regards, tom lane

#8Bruce Momjian
maillist@candle.pha.pa.us
In reply to: ZEUGSWETTER Andreas IZ5 (#2)
Re: [HACKERS] Re: create index updates nrows statistics

I had this patch, but was unsure if it was save for 6.5.

Looks like Tom has decided. Good.

a create index updates the statistics in pg_class,
this leads to substantial performance degradation compared to
6.4.2.

If you want to see what I mean simply run the performance test in
our test subdirectory.

I think the create index statement should not update this statistic.
(at least not in the newly created empty table case)
This behavior would then be in sync with the create table behavior.

To fix this I urgently suggest the following patch:
<<index.patch>>
regression passes and is a little faster :-)
performance test: without patch 7min now 22 sec

Andreas

[Attachment, skipping...]

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#9ZEUGSWETTER Andreas IZ5
Andreas.Zeugswetter@telecom.at
In reply to: Bruce Momjian (#8)
Re: [HACKERS] create index updates nrows statistics

I believe 6.5 is operating as it should --- 6.4 was producing inferior
plans for small tables.

Yes, absolutely.

But it is clearly a Bad Thing to allow the 6.5
optimizer to believe that a relation is empty when it isn't. I concur
with your suggestion to hack up CREATE INDEX so that creating an index
before you load the table isn't quite such a losing proposition.

Please apply the patch I previously sent.

Will do.

I think this will save us a lot of complaints. Thanx

Andreas