remove flatfiles.c

Started by Alvaro Herrera over 16 years ago; 69 messages
#1 Alvaro Herrera
alvherre@commandprompt.com
1 attachment(s)

This patch removes flatfiles.c for good.

It doesn't change the keeping of locks in dbcommands.c and user.c,
because at least some of them are still required.

Regarding sync commits that previously happened and now won't, I think the
only case worth worrying about is the one in vacuum.c. Do we need a
ForceSyncCommit() in there? I'm not sure if vacuum itself already
forces sync commit.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachments:

remove-flatfiles.patch (text/x-diff; charset=us-ascii)
Index: src/backend/access/transam/twophase_rmgr.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/twophase_rmgr.c,v
retrieving revision 1.8
diff -c -p -r1.8 twophase_rmgr.c
*** src/backend/access/transam/twophase_rmgr.c	1 Jan 2009 17:23:36 -0000	1.8
--- src/backend/access/transam/twophase_rmgr.c	31 Aug 2009 21:52:17 -0000
***************
*** 18,24 ****
  #include "commands/async.h"
  #include "pgstat.h"
  #include "storage/lock.h"
- #include "utils/flatfiles.h"
  #include "utils/inval.h"
  
  
--- 18,23 ----
*************** const TwoPhaseCallback twophase_recover_
*** 27,33 ****
  	NULL,						/* END ID */
  	lock_twophase_recover,		/* Lock */
  	NULL,						/* Inval */
- 	NULL,						/* flat file update */
  	NULL,						/* notify/listen */
  	NULL						/* pgstat */
  };
--- 26,31 ----
*************** const TwoPhaseCallback twophase_postcomm
*** 37,43 ****
  	NULL,						/* END ID */
  	lock_twophase_postcommit,	/* Lock */
  	inval_twophase_postcommit,	/* Inval */
- 	flatfile_twophase_postcommit,		/* flat file update */
  	notify_twophase_postcommit, /* notify/listen */
  	pgstat_twophase_postcommit	/* pgstat */
  };
--- 35,40 ----
*************** const TwoPhaseCallback twophase_postabor
*** 47,53 ****
  	NULL,						/* END ID */
  	lock_twophase_postabort,	/* Lock */
  	NULL,						/* Inval */
- 	NULL,						/* flat file update */
  	NULL,						/* notify/listen */
  	pgstat_twophase_postabort	/* pgstat */
  };
--- 44,49 ----
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.274
diff -c -p -r1.274 xact.c
*** src/backend/access/transam/xact.c	11 Jun 2009 14:48:54 -0000	1.274
--- src/backend/access/transam/xact.c	31 Aug 2009 21:50:41 -0000
***************
*** 43,49 ****
  #include "storage/sinvaladt.h"
  #include "storage/smgr.h"
  #include "utils/combocid.h"
- #include "utils/flatfiles.h"
  #include "utils/guc.h"
  #include "utils/inval.h"
  #include "utils/memutils.h"
--- 43,48 ----
*************** CommitTransaction(void)
*** 1608,1619 ****
  	/* NOTIFY commit must come before lower-level cleanup */
  	AtCommit_Notify();
  
- 	/*
- 	 * Update flat files if we changed pg_database, pg_authid or
- 	 * pg_auth_members.  This should be the last step before commit.
- 	 */
- 	AtEOXact_UpdateFlatFiles(true);
- 
  	/* Prevent cancel/die interrupt while cleaning up */
  	HOLD_INTERRUPTS();
  
--- 1607,1612 ----
*************** PrepareTransaction(void)
*** 1797,1803 ****
  	/* close large objects before lower-level cleanup */
  	AtEOXact_LargeObject(true);
  
! 	/* NOTIFY and flatfiles will be handled below */
  
  	/*
  	 * Don't allow PREPARE TRANSACTION if we've accessed a temporary table in
--- 1790,1796 ----
  	/* close large objects before lower-level cleanup */
  	AtEOXact_LargeObject(true);
  
! 	/* NOTIFY will be handled below */
  
  	/*
  	 * Don't allow PREPARE TRANSACTION if we've accessed a temporary table in
*************** PrepareTransaction(void)
*** 1860,1866 ****
  	StartPrepare(gxact);
  
  	AtPrepare_Notify();
- 	AtPrepare_UpdateFlatFiles();
  	AtPrepare_Inval();
  	AtPrepare_Locks();
  	AtPrepare_PgStat();
--- 1853,1858 ----
*************** PrepareTransaction(void)
*** 1909,1915 ****
  	/* Clean up the snapshot manager */
  	AtEarlyCommit_Snapshot();
  
! 	/* notify and flatfiles don't need a postprepare call */
  
  	PostPrepare_PgStat();
  
--- 1901,1907 ----
  	/* Clean up the snapshot manager */
  	AtEarlyCommit_Snapshot();
  
! 	/* notify doesn't need a postprepare call */
  
  	PostPrepare_PgStat();
  
*************** AbortTransaction(void)
*** 2036,2042 ****
  	AtAbort_Portals();
  	AtEOXact_LargeObject(false);	/* 'false' means it's abort */
  	AtAbort_Notify();
- 	AtEOXact_UpdateFlatFiles(false);
  
  	/*
  	 * Advertise the fact that we aborted in pg_clog (assuming that we got as
--- 2028,2033 ----
*************** CommitSubTransaction(void)
*** 3764,3771 ****
  	AtEOSubXact_LargeObject(true, s->subTransactionId,
  							s->parent->subTransactionId);
  	AtSubCommit_Notify();
- 	AtEOSubXact_UpdateFlatFiles(true, s->subTransactionId,
- 								s->parent->subTransactionId);
  
  	CallSubXactCallbacks(SUBXACT_EVENT_COMMIT_SUB, s->subTransactionId,
  						 s->parent->subTransactionId);
--- 3755,3760 ----
*************** AbortSubTransaction(void)
*** 3885,3892 ****
  		AtEOSubXact_LargeObject(false, s->subTransactionId,
  								s->parent->subTransactionId);
  		AtSubAbort_Notify();
- 		AtEOSubXact_UpdateFlatFiles(false, s->subTransactionId,
- 									s->parent->subTransactionId);
  
  		/* Advertise the fact that we aborted in pg_clog. */
  		(void) RecordTransactionAbort(true);
--- 3874,3879 ----
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.350
diff -c -p -r1.350 xlog.c
*** src/backend/access/transam/xlog.c	31 Aug 2009 02:23:22 -0000	1.350
--- src/backend/access/transam/xlog.c	31 Aug 2009 21:56:46 -0000
***************
*** 49,55 ****
  #include "storage/smgr.h"
  #include "storage/spin.h"
  #include "utils/builtins.h"
- #include "utils/flatfiles.h"
  #include "utils/guc.h"
  #include "utils/ps_status.h"
  #include "pg_trace.h"
--- 49,54 ----
*************** StartupProcessMain(void)
*** 8077,8084 ****
  
  	StartupXLOG();
  
- 	BuildFlatFiles(false);
- 
  	/*
  	 * Exit normally. Exit code 0 tells postmaster that we completed recovery
  	 * successfully.
--- 8076,8081 ----
Index: src/backend/commands/dbcommands.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/commands/dbcommands.c,v
retrieving revision 1.225
diff -c -p -r1.225 dbcommands.c
*** src/backend/commands/dbcommands.c	11 Jun 2009 14:48:55 -0000	1.225
--- src/backend/commands/dbcommands.c	31 Aug 2009 22:37:27 -0000
***************
*** 49,55 ****
  #include "storage/smgr.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
- #include "utils/flatfiles.h"
  #include "utils/fmgroids.h"
  #include "utils/guc.h"
  #include "utils/lsyscache.h"
--- 49,54 ----
*************** createdb(const CreatedbStmt *stmt)
*** 691,709 ****
  		RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
  
  		/*
! 		 * Close pg_database, but keep lock till commit (this is important to
! 		 * prevent any risk of deadlock failure while updating flat file)
  		 */
  		heap_close(pg_database_rel, NoLock);
  
  		/*
! 		 * Set flag to update flat database file at commit.  Note: this also
! 		 * forces synchronous commit, which minimizes the window between
  		 * creation of the database files and commital of the transaction. If
  		 * we crash before committing, we'll have a DB that's taking up disk
  		 * space but is not in pg_database, which is not good.
  		 */
! 		database_file_update_needed();
  	}
  	PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
  								PointerGetDatum(&fparms));
--- 690,706 ----
  		RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
  
  		/*
! 		 * Close pg_database, but keep lock till commit.
  		 */
  		heap_close(pg_database_rel, NoLock);
  
  		/*
! 		 * Force synchronous commit, thus minimizing the window between
  		 * creation of the database files and commital of the transaction. If
  		 * we crash before committing, we'll have a DB that's taking up disk
  		 * space but is not in pg_database, which is not good.
  		 */
! 		ForceSyncCommit();
  	}
  	PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
  								PointerGetDatum(&fparms));
*************** dropdb(const char *dbname, bool missing_
*** 862,880 ****
  	remove_dbtablespaces(db_id);
  
  	/*
! 	 * Close pg_database, but keep lock till commit (this is important to
! 	 * prevent any risk of deadlock failure while updating flat file)
  	 */
  	heap_close(pgdbrel, NoLock);
  
  	/*
! 	 * Set flag to update flat database file at commit.  Note: this also
! 	 * forces synchronous commit, which minimizes the window between removal
  	 * of the database files and commital of the transaction. If we crash
  	 * before committing, we'll have a DB that's gone on disk but still there
  	 * according to pg_database, which is not good.
  	 */
! 	database_file_update_needed();
  }
  
  
--- 859,875 ----
  	remove_dbtablespaces(db_id);
  
  	/*
! 	 * Close pg_database, but keep lock till commit.
  	 */
  	heap_close(pgdbrel, NoLock);
  
  	/*
! 	 * Force synchronous commit, thus minimizing the window between removal
  	 * of the database files and commital of the transaction. If we crash
  	 * before committing, we'll have a DB that's gone on disk but still there
  	 * according to pg_database, which is not good.
  	 */
! 	ForceSyncCommit();
  }
  
  
*************** RenameDatabase(const char *oldname, cons
*** 957,971 ****
  	CatalogUpdateIndexes(rel, newtup);
  
  	/*
! 	 * Close pg_database, but keep lock till commit (this is important to
! 	 * prevent any risk of deadlock failure while updating flat file)
  	 */
  	heap_close(rel, NoLock);
- 
- 	/*
- 	 * Set flag to update flat database file at commit.
- 	 */
- 	database_file_update_needed();
  }
  
  
--- 952,960 ----
  	CatalogUpdateIndexes(rel, newtup);
  
  	/*
! 	 * Close pg_database, but keep lock till commit.
  	 */
  	heap_close(rel, NoLock);
  }
  
  
*************** movedb(const char *dbname, const char *t
*** 1212,1228 ****
  		RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
  
  		/*
! 		 * Set flag to update flat database file at commit.  Note: this also
! 		 * forces synchronous commit, which minimizes the window between
  		 * copying the database files and commital of the transaction. If we
  		 * crash before committing, we'll leave an orphaned set of files on
  		 * disk, which is not fatal but not good either.
  		 */
! 		database_file_update_needed();
  
  		/*
! 		 * Close pg_database, but keep lock till commit (this is important to
! 		 * prevent any risk of deadlock failure while updating flat file)
  		 */
  		heap_close(pgdbrel, NoLock);
  	}
--- 1201,1215 ----
  		RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
  
  		/*
! 		 * Force synchronous commit, thus minimizing the window between
  		 * copying the database files and commital of the transaction. If we
  		 * crash before committing, we'll leave an orphaned set of files on
  		 * disk, which is not fatal but not good either.
  		 */
! 		ForceSyncCommit();
  
  		/*
! 		 * Close pg_database, but keep lock till commit.
  		 */
  		heap_close(pgdbrel, NoLock);
  	}
*************** AlterDatabase(AlterDatabaseStmt *stmt, b
*** 1401,1411 ****
  
  	/* Close pg_database, but keep lock till commit */
  	heap_close(rel, NoLock);
- 
- 	/*
- 	 * We don't bother updating the flat file since the existing options for
- 	 * ALTER DATABASE don't affect it.
- 	 */
  }
  
  
--- 1388,1393 ----
*************** AlterDatabaseSet(AlterDatabaseSetStmt *s
*** 1494,1504 ****
  
  	/* Close pg_database, but keep lock till commit */
  	heap_close(rel, NoLock);
- 
- 	/*
- 	 * We don't bother updating the flat file since ALTER DATABASE SET doesn't
- 	 * affect it.
- 	 */
  }
  
  
--- 1476,1481 ----
*************** AlterDatabaseOwner(const char *dbname, O
*** 1608,1618 ****
  
  	/* Close pg_database, but keep lock till commit */
  	heap_close(rel, NoLock);
- 
- 	/*
- 	 * We don't bother updating the flat file since ALTER DATABASE OWNER
- 	 * doesn't affect it.
- 	 */
  }
  
  
--- 1585,1590 ----
Index: src/backend/commands/user.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/commands/user.c,v
retrieving revision 1.187
diff -c -p -r1.187 user.c
*** src/backend/commands/user.c	11 Jun 2009 14:48:56 -0000	1.187
--- src/backend/commands/user.c	31 Aug 2009 22:40:04 -0000
***************
*** 26,32 ****
  #include "storage/lmgr.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
- #include "utils/flatfiles.h"
  #include "utils/fmgroids.h"
  #include "utils/guc.h"
  #include "utils/lsyscache.h"
--- 26,31 ----
*************** CreateRole(CreateRoleStmt *stmt)
*** 385,399 ****
  				GetUserId(), false);
  
  	/*
! 	 * Close pg_authid, but keep lock till commit (this is important to
! 	 * prevent any risk of deadlock failure while updating flat file)
  	 */
  	heap_close(pg_authid_rel, NoLock);
- 
- 	/*
- 	 * Set flag to update flat auth file at commit.
- 	 */
- 	auth_file_update_needed();
  }
  
  
--- 384,392 ----
  				GetUserId(), false);
  
  	/*
! 	 * Close pg_authid, but keep lock till commit.
  	 */
  	heap_close(pg_authid_rel, NoLock);
  }
  
  
*************** AlterRole(AlterRoleStmt *stmt)
*** 710,724 ****
  					false);
  
  	/*
! 	 * Close pg_authid, but keep lock till commit (this is important to
! 	 * prevent any risk of deadlock failure while updating flat file)
  	 */
  	heap_close(pg_authid_rel, NoLock);
- 
- 	/*
- 	 * Set flag to update flat auth file at commit.
- 	 */
- 	auth_file_update_needed();
  }
  
  
--- 703,711 ----
  					false);
  
  	/*
! 	 * Close pg_authid, but keep lock till commit.
  	 */
  	heap_close(pg_authid_rel, NoLock);
  }
  
  
*************** AlterRoleSet(AlterRoleSetStmt *stmt)
*** 808,814 ****
  	CatalogUpdateIndexes(rel, newtuple);
  
  	ReleaseSysCache(oldtuple);
- 	/* needn't keep lock since we won't be updating the flat file */
  	heap_close(rel, RowExclusiveLock);
  }
  
--- 795,800 ----
*************** DropRole(DropRoleStmt *stmt)
*** 970,985 ****
  	}
  
  	/*
! 	 * Now we can clean up; but keep locks until commit (to avoid possible
! 	 * deadlock failure while updating flat file)
  	 */
  	heap_close(pg_auth_members_rel, NoLock);
  	heap_close(pg_authid_rel, NoLock);
- 
- 	/*
- 	 * Set flag to update flat auth file at commit.
- 	 */
- 	auth_file_update_needed();
  }
  
  /*
--- 956,965 ----
  	}
  
  	/*
! 	 * Now we can clean up; but keep locks until commit.
  	 */
  	heap_close(pg_auth_members_rel, NoLock);
  	heap_close(pg_authid_rel, NoLock);
  }
  
  /*
*************** RenameRole(const char *oldname, const ch
*** 1092,1106 ****
  	ReleaseSysCache(oldtuple);
  
  	/*
! 	 * Close pg_authid, but keep lock till commit (this is important to
! 	 * prevent any risk of deadlock failure while updating flat file)
  	 */
  	heap_close(rel, NoLock);
- 
- 	/*
- 	 * Set flag to update flat auth file at commit.
- 	 */
- 	auth_file_update_needed();
  }
  
  /*
--- 1072,1080 ----
  	ReleaseSysCache(oldtuple);
  
  	/*
! 	 * Close pg_authid, but keep lock till commit.
  	 */
  	heap_close(rel, NoLock);
  }
  
  /*
*************** GrantRole(GrantRoleStmt *stmt)
*** 1157,1171 ****
  	}
  
  	/*
! 	 * Close pg_authid, but keep lock till commit (this is important to
! 	 * prevent any risk of deadlock failure while updating flat file)
  	 */
  	heap_close(pg_authid_rel, NoLock);
- 
- 	/*
- 	 * Set flag to update flat auth file at commit.
- 	 */
- 	auth_file_update_needed();
  }
  
  /*
--- 1131,1139 ----
  	}
  
  	/*
! 	 * Close pg_authid, but keep lock till commit.
  	 */
  	heap_close(pg_authid_rel, NoLock);
  }
  
  /*
*************** AddRoleMems(const char *rolename, Oid ro
*** 1385,1392 ****
  	}
  
  	/*
! 	 * Close pg_authmem, but keep lock till commit (this is important to
! 	 * prevent any risk of deadlock failure while updating flat file)
  	 */
  	heap_close(pg_authmem_rel, NoLock);
  }
--- 1353,1359 ----
  	}
  
  	/*
! 	 * Close pg_authmem, but keep lock till commit.
  	 */
  	heap_close(pg_authmem_rel, NoLock);
  }
*************** DelRoleMems(const char *rolename, Oid ro
*** 1498,1505 ****
  	}
  
  	/*
! 	 * Close pg_authmem, but keep lock till commit (this is important to
! 	 * prevent any risk of deadlock failure while updating flat file)
  	 */
  	heap_close(pg_authmem_rel, NoLock);
  }
--- 1465,1471 ----
  	}
  
  	/*
! 	 * Close pg_authmem, but keep lock till commit.
  	 */
  	heap_close(pg_authmem_rel, NoLock);
  }
Index: src/backend/commands/vacuum.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.391
diff -c -p -r1.391 vacuum.c
*** src/backend/commands/vacuum.c	31 Aug 2009 02:23:22 -0000	1.391
--- src/backend/commands/vacuum.c	31 Aug 2009 21:56:21 -0000
***************
*** 46,52 ****
  #include "storage/procarray.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
- #include "utils/flatfiles.h"
  #include "utils/fmgroids.h"
  #include "utils/inval.h"
  #include "utils/lsyscache.h"
--- 46,51 ----
*************** vac_update_datfrozenxid(void)
*** 893,907 ****
  	heap_close(relation, RowExclusiveLock);
  
  	/*
! 	 * If we were able to advance datfrozenxid, mark the flat-file copy of
! 	 * pg_database for update at commit, and see if we can truncate pg_clog.
! 	 * Also force update if the shared XID-wrap-limit info is stale.
  	 */
  	if (dirty || !TransactionIdLimitIsValid())
- 	{
- 		database_file_update_needed();
  		vac_truncate_clog(newFrozenXid);
- 	}
  }
  
  
--- 892,902 ----
  	heap_close(relation, RowExclusiveLock);
  
  	/*
! 	 * If we were able to advance datfrozenxid, see if we can truncate pg_clog.
! 	 * Also do it if the shared XID-wrap-limit info is stale.
  	 */
  	if (dirty || !TransactionIdLimitIsValid())
  		vac_truncate_clog(newFrozenXid);
  }
  
  
Index: src/backend/libpq/hba.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/libpq/hba.c,v
retrieving revision 1.189
diff -c -p -r1.189 hba.c
*** src/backend/libpq/hba.c	29 Aug 2009 19:26:51 -0000	1.189
--- src/backend/libpq/hba.c	31 Aug 2009 22:42:33 -0000
*************** load_hba(void)
*** 1275,1325 ****
  }
  
  /*
-  * Read and parse one line from the flat pg_database file.
-  *
-  * Returns TRUE on success, FALSE if EOF; bad data causes elog(FATAL).
-  *
-  * Output parameters:
-  *	dbname: gets database name (must be of size NAMEDATALEN bytes)
-  *	dboid: gets database OID
-  *	dbtablespace: gets database's default tablespace's OID
-  *	dbfrozenxid: gets database's frozen XID
-  *
-  * This is not much related to the other functions in hba.c, but we put it
-  * here because it uses the next_token() infrastructure.
-  */
- bool
- read_pg_database_line(FILE *fp, char *dbname, Oid *dboid,
- 					  Oid *dbtablespace, TransactionId *dbfrozenxid)
- {
- 	char		buf[MAX_TOKEN];
- 
- 	if (feof(fp))
- 		return false;
- 	if (!next_token(fp, buf, sizeof(buf)))
- 		return false;
- 	if (strlen(buf) >= NAMEDATALEN)
- 		elog(FATAL, "bad data in flat pg_database file");
- 	strcpy(dbname, buf);
- 	next_token(fp, buf, sizeof(buf));
- 	if (!isdigit((unsigned char) buf[0]))
- 		elog(FATAL, "bad data in flat pg_database file");
- 	*dboid = atooid(buf);
- 	next_token(fp, buf, sizeof(buf));
- 	if (!isdigit((unsigned char) buf[0]))
- 		elog(FATAL, "bad data in flat pg_database file");
- 	*dbtablespace = atooid(buf);
- 	next_token(fp, buf, sizeof(buf));
- 	if (!isdigit((unsigned char) buf[0]))
- 		elog(FATAL, "bad data in flat pg_database file");
- 	*dbfrozenxid = atoxid(buf);
- 	/* expect EOL next */
- 	if (next_token(fp, buf, sizeof(buf)))
- 		elog(FATAL, "bad data in flat pg_database file");
- 	return true;
- }
- 
- /*
   *	Process one line from the ident config file.
   *
   *	Take the line and compare it to the needed map, pg_role and ident_user.
--- 1275,1280 ----
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.571
diff -c -p -r1.571 postgres.c
*** src/backend/tcop/postgres.c	29 Aug 2009 19:26:51 -0000	1.571
--- src/backend/tcop/postgres.c	31 Aug 2009 22:00:17 -0000
***************
*** 66,72 ****
  #include "tcop/pquery.h"
  #include "tcop/tcopprot.h"
  #include "tcop/utility.h"
- #include "utils/flatfiles.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/ps_status.h"
--- 66,71 ----
*************** PostgresMain(int argc, char *argv[], con
*** 3283,3294 ****
  		 */
  		StartupXLOG();
  		on_shmem_exit(ShutdownXLOG, 0);
- 
- 		/*
- 		 * We have to build the flat file for pg_database, but not for the
- 		 * user and group tables, since we won't try to do authentication.
- 		 */
- 		BuildFlatFiles(true);
  	}
  
  	/*
--- 3282,3287 ----
Index: src/backend/utils/init/Makefile
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/utils/init/Makefile,v
retrieving revision 1.22
diff -c -p -r1.22 Makefile
*** src/backend/utils/init/Makefile	19 Feb 2008 10:30:08 -0000	1.22
--- src/backend/utils/init/Makefile	31 Aug 2009 22:06:37 -0000
*************** subdir = src/backend/utils/init
*** 12,17 ****
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = flatfiles.o globals.o miscinit.o postinit.o
  
  include $(top_srcdir)/src/backend/common.mk
--- 12,17 ----
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = globals.o miscinit.o postinit.o
  
  include $(top_srcdir)/src/backend/common.mk
Index: src/bin/initdb/initdb.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/bin/initdb/initdb.c,v
retrieving revision 1.172
diff -c -p -r1.172 initdb.c
*** src/bin/initdb/initdb.c	11 Jun 2009 14:49:07 -0000	1.172
--- src/bin/initdb/initdb.c	31 Aug 2009 22:43:11 -0000
*************** setup_auth(void)
*** 1405,1424 ****
  	const char **line;
  	static const char *pg_authid_setup[] = {
  		/*
- 		 * Create triggers to ensure manual updates to shared catalogs will be
- 		 * reflected into their "flat file" copies.
- 		 */
- 		"CREATE TRIGGER pg_sync_pg_database "
- 		"  AFTER INSERT OR UPDATE OR DELETE ON pg_database "
- 		"  FOR EACH STATEMENT EXECUTE PROCEDURE flatfile_update_trigger();\n",
- 		"CREATE TRIGGER pg_sync_pg_authid "
- 		"  AFTER INSERT OR UPDATE OR DELETE ON pg_authid "
- 		"  FOR EACH STATEMENT EXECUTE PROCEDURE flatfile_update_trigger();\n",
- 		"CREATE TRIGGER pg_sync_pg_auth_members "
- 		"  AFTER INSERT OR UPDATE OR DELETE ON pg_auth_members "
- 		"  FOR EACH STATEMENT EXECUTE PROCEDURE flatfile_update_trigger();\n",
- 
- 		/*
  		 * The authid table shouldn't be readable except through views, to
  		 * ensure passwords are not publicly visible.
  		 */
--- 1405,1410 ----
Index: src/include/access/twophase_rmgr.h
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/include/access/twophase_rmgr.h,v
retrieving revision 1.7
diff -c -p -r1.7 twophase_rmgr.h
*** src/include/access/twophase_rmgr.h	1 Jan 2009 17:23:56 -0000	1.7
--- src/include/access/twophase_rmgr.h	31 Aug 2009 21:52:03 -0000
*************** typedef uint8 TwoPhaseRmgrId;
*** 24,32 ****
  #define TWOPHASE_RM_END_ID			0
  #define TWOPHASE_RM_LOCK_ID			1
  #define TWOPHASE_RM_INVAL_ID		2
! #define TWOPHASE_RM_FLATFILES_ID	3
! #define TWOPHASE_RM_NOTIFY_ID		4
! #define TWOPHASE_RM_PGSTAT_ID		5
  #define TWOPHASE_RM_MAX_ID			TWOPHASE_RM_PGSTAT_ID
  
  extern const TwoPhaseCallback twophase_recover_callbacks[];
--- 24,31 ----
  #define TWOPHASE_RM_END_ID			0
  #define TWOPHASE_RM_LOCK_ID			1
  #define TWOPHASE_RM_INVAL_ID		2
! #define TWOPHASE_RM_NOTIFY_ID		3
! #define TWOPHASE_RM_PGSTAT_ID		4
  #define TWOPHASE_RM_MAX_ID			TWOPHASE_RM_PGSTAT_ID
  
  extern const TwoPhaseCallback twophase_recover_callbacks[];
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.549
diff -c -p -r1.549 pg_proc.h
*** src/include/catalog/pg_proc.h	4 Aug 2009 04:04:12 -0000	1.549
--- src/include/catalog/pg_proc.h	31 Aug 2009 22:05:20 -0000
*************** DESCR("matches LIKE expression, case-ins
*** 2182,2190 ****
  DATA(insert OID = 1661 (  bpcharicnlike		PGNSP PGUID 12 1 0 0 f f f t f i 2 0 16 "1042 25" _null_ _null_ _null_ _null_ texticnlike _null_ _null_ _null_ ));
  DESCR("does not match LIKE expression, case-insensitive");
  
- DATA(insert OID = 1689 (  flatfile_update_trigger  PGNSP PGUID 12 1 0 0 f f f t f v 0 0 2279 "" _null_ _null_ _null_ _null_ flatfile_update_trigger _null_ _null_ _null_ ));
- DESCR("update flat-file copy of a shared catalog");
- 
  /* Oracle Compatibility Related Functions - By Edmund Mergl <E.Mergl@bawue.de> */
  DATA(insert OID =  868 (  strpos	   PGNSP PGUID 12 1 0 0 f f f t f i 2 0 23 "25 25" _null_ _null_ _null_ _null_ textpos _null_ _null_ _null_ ));
  DESCR("find position of substring");
--- 2182,2187 ----
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: remove flatfiles.c

Alvaro Herrera <alvherre@commandprompt.com> writes:

This patch removes flatfiles.c for good.

Aw, you beat me to it.

Regarding sync commits that previously happened and now won't, I think the
only case worth worrying about is the one in vacuum.c. Do we need a
ForceSyncCommit() in there? I'm not sure if vacuum itself already
forces sync commit.

Hmm, I had been assuming we wouldn't need that anymore.

regards, tom lane

#3 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#2)
Re: remove flatfiles.c

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Regarding sync commits that previously happened and now won't, I think the
only case worth worrying about is the one in vacuum.c. Do we need a
ForceSyncCommit() in there? I'm not sure if vacuum itself already
forces sync commit.

Hmm, I had been assuming we wouldn't need that anymore.

The comment in user.c and dbcommands.c says

/*
* Force synchronous commit, thus minimizing the window between
* creation of the database files and commital of the transaction. If
* we crash before committing, we'll have a DB that's taking up disk
* space but is not in pg_database, which is not good.
*/
ForceSyncCommit();

so I think those ones are still necessary. There's another call in
RenameDatabase() which I don't think needs a sync commit (because it
won't change the dir name), and one in vacuum.c:

/*
! * If we were able to advance datfrozenxid, mark the flat-file copy of
! * pg_database for update at commit, and see if we can truncate pg_clog.
! * Also force update if the shared XID-wrap-limit info is stale.
*/
if (dirty || !TransactionIdLimitIsValid())
- {
- database_file_update_needed();
vac_truncate_clog(newFrozenXid);
- }
}

AFAICT this doesn't need a sync commit. (Right now, VACUUM FULL forces
one, but lazy vacuum doesn't).

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#3)
Re: remove flatfiles.c

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

Hmm, I had been assuming we wouldn't need that anymore.

The comment in user.c and dbcommands.c says [...]
so I think those ones are still necessary.

Yeah, after a look through the code I think you can trust the associated
comments: if it says it needs sync commit, put in ForceSyncCommit, else
we don't need it.

regards, tom lane

#5 Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#1)
Re: remove flatfiles.c

On Mon, 2009-08-31 at 18:53 -0400, Alvaro Herrera wrote:

Regarding sync commits that previously happened and now won't, I think the
only case worth worrying about is the one in vacuum.c. Do we need a
ForceSyncCommit() in there? I'm not sure if vacuum itself already
forces sync commit.

VACUUM FULL requires ForceSyncCommit().

Not sure why removing them elsewhere is important? Getting robustness
wrong is a big, bad thing and this opens us to future error. We already
tuned VACUUM so it does very little if it has no work to do, why would
one extra I/O improve things so much? If it ain't broke...

VACUUM does so many things that I'd rather have it all safely on disk.
I'd feel happier with the rule "VACUUM always sync commits", so we all
remember it and can rely upon it to be the same from release to release.

--
Simon Riggs www.2ndQuadrant.com

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#5)
Re: remove flatfiles.c

Simon Riggs <simon@2ndQuadrant.com> writes:

VACUUM does so many things that I'd rather have it all safely on disk.
I'd feel happier with the rule "VACUUM always sync commits", so we all
remember it and can rely upon it to be the same from release to release.

Non-FULL vacuum has *never* done a sync commit, except in the unusual
corner case that it moves the database's datfrozenxid, which is a corner
case that didn't even exist until fairly recently. I think the argument
that we should have it force sync for no reason whatsoever is silly.
We get beat up on a regular basis about "spikes" in response time;
why would you want to have vacuum creating one when it doesn't need to?

As for the FULL case, the sync commit is to try to protect a horribly
unsafe kluge that should go away entirely (if vacuum full itself doesn't
go away entirely). That's hardly something I want to institutionalize
either.

regards, tom lane

#7 Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#6)
Re: remove flatfiles.c

On Tue, 2009-09-01 at 09:58 -0400, Tom Lane wrote:

We get beat up on a regular basis about "spikes" in response time;
why would you want to have vacuum creating one when it doesn't need
to?

If one I/O on a background utility can cause such a spike, we are in
serious shitake. I would be more comfortable if the various important
things VACUUM does were protected by sync commit. I see no reason to
optimise away one I/O just because we might theoretically do so. Any
mistake in the theory and we are exposed. Why take the risk? We do many
things to check and secure our data, why not this one? If this was
suggested separately as an optimisation you'd laugh and say why
bother?

--
Simon Riggs www.2ndQuadrant.com

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#7)
Re: remove flatfiles.c

Simon Riggs <simon@2ndQuadrant.com> writes:

On Tue, 2009-09-01 at 09:58 -0400, Tom Lane wrote:

We get beat up on a regular basis about "spikes" in response time;
why would you want to have vacuum creating one when it doesn't need
to?

If one I/O on a background utility can cause such a spike, we are in
serious shitake. I would be more comfortable if the various important
things VACUUM does were protected by sync commit. I see no reason to
optimise away one I/O just because we might theoretically do so. Any
mistake in the theory and we are exposed. Why take the risk?

*WHAT* risk? Most vacuums do not do a sync commit, and never have.

regards, tom lane

#9 Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#6)
Re: remove flatfiles.c

On Tue, Sep 1, 2009 at 2:58 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

We get beat up on a regular basis about "spikes" in response time;
why would you want to have vacuum creating one when it doesn't need to?

Isn't this sync commit just going to do the same thing that the wal
writer is going to do in at most 200ms anyways?

As for the FULL case, the sync commit is to try to protect a horribly
unsafe kluge that should go away entirely (if vacuum full itself doesn't
go away entirely).

I'm all for throwing away VACUUM FULL btw. I was thinking of proposing
that we replace it with something like CLUSTER which just rewrites the
tuples in the order it finds them.

The use cases where VACUUM FULL wins currently are where storing two
copies of the table and its indexes concurrently just isn't practical.
Also perhaps tables where there are too many large indexes to make
rebuilding them all in one maintenance window practical.

I don't see any way to address these problems without something as
complex as xvac and moved_in/moved_off and without the index bloat
problems. I think we could improve the i/o access patterns we have
currently which make vacuum full so slow, but the fundamental problems
would remain.

So the question is whether those use cases are worth keeping our
existing vacuum full for or whether we could do without it and just
recommend partitioning for people with tables large enough to make
table rewrites impractical.

--
greg
http://mit.edu/~gsstark/resume.pdf

#10 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Greg Stark (#9)
Re: remove flatfiles.c

Greg Stark wrote:

The use cases where VACUUM FULL wins currently are where storing two
copies of the table and its indexes concurrently just isn't practical.

Yeah, but then do you really need to use VACUUM FULL? If that's really
a problem then there ain't that many dead tuples around.

Also perhaps tables where there are too many large indexes to make
rebuilding them all in one maintenance window practical.

If that's the concern maybe we oughta do something about concurrently
re-creating those indexes somehow. Plain REINDEX doesn't work of
course, but maybe we can do some trick with creating a new index and
dropping the original one afterwards.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#11 Greg Stark
gsstark@mit.edu
In reply to: Alvaro Herrera (#10)
Re: remove flatfiles.c

On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:

The use cases where VACUUM FULL wins currently are where storing two
copies of the table and its indexes concurrently just isn't practical.

Yeah, but then do you really need to use VACUUM FULL?  If that's really
a problem then there ain't that many dead tuples around.

That's what I want to believe. But picture if you have, say, a
1-terabyte table which is 50% dead tuples and you don't have a spare
terabyte to rewrite the whole table.

Also perhaps tables where there are too many large indexes to make
rebuilding them all in one maintenance window practical.

If that's the concern maybe we oughta do something about concurrently
re-creating those indexes somehow.  Plain REINDEX doesn't work of
course, but maybe we can do some trick with creating a new index and
dropping the original one afterwards.

Well that doesn't really work if you want to rewrite the table.
CLUSTER has to rebuild all the indexes when it's done.

I think the solution for both of these is actually partitioning. The
bottom line is that having a single table which contains very large
amounts of data is awkward to maintain.

--
greg
http://mit.edu/~gsstark/resume.pdf

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#11)
Re: remove flatfiles.c

Greg Stark <gsstark@mit.edu> writes:

On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:

The use cases where VACUUM FULL wins currently are where storing two
copies of the table and its indexes concurrently just isn't practical.

Yeah, but then do you really need to use VACUUM FULL?  If that's really
a problem then there ain't that many dead tuples around.

That's what I want to believe. But picture if you have, say, a
1-terabyte table which is 50% dead tuples and you don't have a spare
terabyte to rewrite the whole table.

But trying to VACUUM FULL that table is going to be horridly painful
too, and you'll still have bloated indexes afterwards. You might as
well just live with the 50% waste, especially since if you did a
full-table update once you'll probably do it again sometime.

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

regards, tom lane

#13 Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Greg Stark (#11)
Re: remove flatfiles.c

Greg Stark wrote:

That's what I want to believe. But picture if you have, say, a
1-terabyte table which is 50% dead tuples and you don't have a spare
terabyte to rewrite the whole table.

Could one hypothetically do
update bigtable set pk = pk where ctid in (select ctid from bigtable order by ctid desc limit 100);
vacuum;
and repeat until max(ctid) is small enough?

Sure, it'll take longer than vacuum full; but at first glance
it seems lightweight enough to do even on a live, heavily accessed
table.

IIRC I tried something like this once, and it worked to some extent,
but after a few loops didn't shrink the table as much as I had expected.

#14 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Ron Mayer (#13)
Re: remove flatfiles.c

Ron Mayer wrote:

Greg Stark wrote:

That's what I want to believe. But picture if you have, say, a
1-terabyte table which is 50% dead tuples and you don't have a spare
terabyte to rewrite the whole table.

Could one hypothetically do
update bigtable set pk = pk where ctid in (select ctid from bigtable order by ctid desc limit 100);
vacuum;
and repeat until max(ctid) is small enough?

I remember Hannu Krosing said they used something like that to shrink
really bloated tables. Maybe we should try to explicitly support a
mechanism that worked in that fashion. I think I tried it at some point
and found that the problem with it was that ctid was too limited in what
it was able to do.

The neat thing is that now that we have the visibility fork, each vacuum
needn't scan the whole table each time.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#15 Rod Taylor
rod.taylor@gmail.com
In reply to: Greg Stark (#11)
Re: remove flatfiles.c

On Tue, Sep 1, 2009 at 19:34, Greg Stark <gsstark@mit.edu> wrote:

On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:

The use cases where VACUUM FULL wins currently are where storing two
copies of the table and its indexes concurrently just isn't practical.

Yeah, but then do you really need to use VACUUM FULL? If that's really
a problem then there ain't that many dead tuples around.

That's what I want to believe. But picture if you have, say, a
1-terabyte table which is 50% dead tuples and you don't have a spare
terabyte to rewrite the whole table.

It would be interesting if there was something between VACUUM FULL and
CLUSTER which could, say, work on a single 1GB segment at a time in a manner
similar to cluster.

You would still end up with index bloat like vacuum full, though perhaps not
as bad, but shuffling around the tuples should be faster.

The idea here is that the files can be truncated individually. Two 500MB
files is pretty much the same as a single 1GB file on disk.

Of course, I'm hand waving and don't have the technical expertise to figure
out if it can be done easily within PostgreSQL.

#16 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#12)
Re: remove flatfiles.c

On Tue, Sep 1, 2009 at 7:42 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Greg Stark <gsstark@mit.edu> writes:

On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:

The use cases where VACUUM FULL wins currently are where storing two
copies of the table and its indexes concurrently just isn't practical.

Yeah, but then do you really need to use VACUUM FULL?  If that's really
a problem then there ain't that many dead tuples around.

That's what I want to believe. But picture if you have, say, a
1-terabyte table which is 50% dead tuples and you don't have a spare
terabyte to rewrite the whole table.

But trying to VACUUM FULL that table is going to be horridly painful
too, and you'll still have bloated indexes afterwards.  You might as
well just live with the 50% waste, especially since if you did a
full-table update once you'll probably do it again sometime.

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

What if your large table doesn't have an index? Then there's no way to cluster.

I'm a bit skeptical about partitioning as a solution, too. The
planner is just not clever enough with partitioned tables, yet.

...Robert

#17 Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#14)
Re: remove flatfiles.c

On Tue, Sep 1, 2009 at 9:29 PM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:

Ron Mayer wrote:

Greg Stark wrote:

That's what I want to believe. But picture if you have, say, a
1-terabyte table which is 50% dead tuples and you don't have a spare
terabyte to rewrite the whole table.

Could one hypothetically do
   update bigtable set pk = pk where ctid in (select ctid from bigtable order by ctid desc limit 100);
   vacuum;
and repeat until max(ctid) is small enough?

I remember Hannu Krosing said they used something like that to shrink
really bloated tables.  Maybe we should try to explicitly support a
mechanism that worked in that fashion.  I think I tried it at some point
and found that the problem with it was that ctid was too limited in what
it was able to do.

I think a way to incrementally shrink large tables would be enormously
beneficial. Maybe vacuum could try to do a bit of that each time it
runs.

...Robert

#18 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#16)
Re: remove flatfiles.c

Robert Haas wrote:

On Tue, Sep 1, 2009 at 7:42 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

But trying to VACUUM FULL that table is going to be horridly painful
too, and you'll still have bloated indexes afterwards.  You might as
well just live with the 50% waste, especially since if you did a
full-table update once you'll probably do it again sometime.

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

What if your large table doesn't have an index? Then there's no way to cluster.

But there's nothing saying we cannot provide a version of CLUSTER that
does not follow any index and just copies the live tuples.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#19 Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#18)
Re: remove flatfiles.c

On Tue, Sep 1, 2009 at 10:58 PM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:

Robert Haas wrote:

On Tue, Sep 1, 2009 at 7:42 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

But trying to VACUUM FULL that table is going to be horridly painful
too, and you'll still have bloated indexes afterwards.  You might as
well just live with the 50% waste, especially since if you did a
full-table update once you'll probably do it again sometime.

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

What if your large table doesn't have an index?  Then there's no way to cluster.

But there's nothing saying we cannot provide a version of CLUSTER that
does not follow any index and just copies the live tuples.

Agreed.

...Robert

#20 Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Robert Haas (#16)
Re: remove flatfiles.c

On Tue, Sep 1, 2009 at 9:55 PM, Robert Haas<robertmhaas@gmail.com> wrote:

I'm a bit skeptical about partitioning as a solution, too.  The
planner is just not clever enough with partitioned tables, yet.

Analyzing and vacuuming a *very* big table, and even scanning a huge
index, is no joke either...
And yes, the planner is not very clever about partitioning, and
certainly that is something we need to fix, not something we have to
live with... not that that will be easy, but hey! we have very
brilliant people here (you being one of them)

--
Sincerely,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157

#21 Greg Stark
gsstark@mit.edu
In reply to: Jaime Casanova (#20)
Re: remove flatfiles.c

On Wed, Sep 2, 2009 at 6:30 AM, Jaime
Casanova<jcasanov@systemguards.com.ec> wrote:

On Tue, Sep 1, 2009 at 9:55 PM, Robert Haas<robertmhaas@gmail.com> wrote:

I'm a bit skeptical about partitioning as a solution, too.  The
planner is just not clever enough with partitioned tables, yet.

Yeah, we need to fix that :)

I think we're already reaching the point where the pains of dealing
with partitioned tables are usually less than the pains of dealing
with VACUUM FULL.

Analyzing and vacuuming a *very* big table, and even scanning a huge
index, is no joke either...

Hm, not sure I see this. The sample size for Analyze is not dependent
on the size of the table. Only on the stats_target. And vacuum with
the VM is now going to be dependent only on the number of updates to
the table, not on the size of the table.

The problem use cases we have today are only when you really do have
enough dead space to clean up that you want to compact the file -- but
not so much that it's worth rewriting the whole table using CLUSTER or
ALTER TABLE.

Perhaps we should go one version with an enable_legacy_full_vacuum
which defaults to off. That would at least let us hear about use cases
where people are unhappy with a replacement.

I did start a while ago on a replacement which used the existing
rewrite mechanism to do the equivalent of cluster without changing the
ordering. I forget where I left that but I could go back and look at
it. I'll be busy for the next few weeks though so it won't be right
away.

--
greg
http://mit.edu/~gsstark/resume.pdf

#22 Josh Berkus
josh@agliodbs.com
In reply to: Greg Stark (#21)
Re: remove flatfiles.c

All,

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

Basically, for:
a) people who don't understand CLUSTER (easily fixed, simply create a
VACUUM FULL command which just does CLUSTER on the primary key)

b) people who are completely out of space on disk and are trying to
shrink the database to free up space.

For (b), I think it's OK to just tell those people that they need to
move the database files or find something else to delete. Most of the
time, they have to do that *anyway* in order for VACUUM FULL to work,
since the transaction log is on the same disk. We just need a little
more documentation, is all.

The problem use cases we have today are only when you really do have
enough dead space to clean up that you want to compact the file -- but
not so much that it's worth rewriting the whole table using CLUSTER or
ALTER TABLE.

I haven't seen this use-case in the field. I'm not sure that it
actually exists. Anyone run across a case where this made sense?

Recently I actually had a client dump and reload their database rather
than running VACUUM FULL; a reload took 4 hours but VACUUM FULL took
more than 18.

Perhaps we should go one version with an enable_legacy_full_vacuum
which defaults to off. That would at least let us hear about use cases
where people are unhappy with a replacement.

I think we do need to do this, just because people won't have changed
their admin scripts. But the goal should be to dump VACUUM FULL
entirely by 8.6 if we *don't* get serious use-cases.

I did start a while ago on a replacement which used the existing
rewrite mechanism to do the equivalent of cluster without changing the
ordering. I forget where I left that but I could go back and look at
it. I'll be busy for the next few weeks though so it won't be right
away.

This would be very helpful.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#23 Joshua D. Drake
jd@commandprompt.com
In reply to: Josh Berkus (#22)
Re: remove flatfiles.c

On Wed, 2009-09-02 at 10:41 -0700, Josh Berkus wrote:

All,

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

Basically, for:
a) people who don't understand CLUSTER (easily fixed, simply create a
VACUUM FULL command which just does CLUSTER on the primary key)

b) people who are completely out of space on disk and are trying to
shrink the database to free up space.

For (b), I think it's OK to just tell those people that they need to
move the database files or find something else to delete. Most of the
time, they have to do that *anyway* in order for VACUUM FULL to work,
since the transaction log is on the same disk. We just need a little
more documentation, is all.

Right.

The problem use cases we have today are only when you really do have
enough dead space to clean up that you want to compact the file -- but
not so much that it's worth rewriting the whole table using CLUSTER or
ALTER TABLE.

I haven't seen this use-case in the field. I'm not sure that it
actually exists. Anyone run across a case where this made sense?

No.

Recently I actually had a client dump and reload their database rather
than running VACUUM FULL; a reload took 4 hours but VACUUM FULL took
more than 18.

Exactly.

Perhaps we should go one version with an enable_legacy_full_vacuum
which defaults to off. That would at least let us hear about use cases
where people are unhappy with a replacement.

I think we do need to do this, just because people won't have changed
their admin scripts. But the goal should be to dump VACUUM FULL
entirely by 8.6 if we *don't* get serious use-cases.

Agreed, but I think we shouldn't even put it in the postgresql.conf by
default. Just document that it exists. Settings for the sake of settings
(even ones that may have a corner case) seem to confuse users.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering

#24 Greg Stark
gsstark@mit.edu
In reply to: Josh Berkus (#22)
Re: remove flatfiles.c

On Wed, Sep 2, 2009 at 6:41 PM, Josh Berkus<josh@agliodbs.com> wrote:

All,

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

Basically, for:
a) people who don't understand CLUSTER (easily fixed, simply create a
VACUUM FULL command which just does CLUSTER on the primary key)

I don't think we want to cluster on the primary key. I think we just
want to rewrite the table keeping the same physical ordering.

The problem use cases we have today are only when you really do have
enough dead space to clean up that you want to compact the file -- but
not so much that it's worth rewriting the whole table using CLUSTER or
ALTER TABLE.

I haven't seen this use-case in the field.  I'm not sure that it
actually exists.  Anyone run across a case where this made sense?

Well I've certainly seen people whose disks are more than 50% full.
They tend to be the same people who want to compact their tables. I
can't say whether any of them had a single table with associated
indexes that were taking up more than 50% but it's not uncommon to
have a single table that dominates your database.

Recently I actually had a client dump and reload their database rather
than running VACUUM FULL; a reload took 4 hours but VACUUM FULL took
more than 18.

Perhaps we should go one version with an enable_legacy_full_vacuum
which defaults to off. That would at least let us hear about use cases
where people are unhappy with a replacement.

I think we do need to do this, just because people won't have changed
their admin scripts.  But the goal should be to dump VACUUM FULL
entirely by 8.6 if we *don't* get serious use-cases.

We could deal with the admin scripts by making VACUUM FULL do the new
behaviour. But I actually don't really like that. I would prefer to
break VACUUM FULL since anyone doing it routinely is probably
mistaken. We could name the command something which is more
descriptive like VACUUM REWRITE or VACUUM REBUILD or something like
that.

--
greg
http://mit.edu/~gsstark/resume.pdf

#25 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Greg Stark (#24)
Re: remove flatfiles.c

Greg Stark <gsstark@mit.edu> wrote:

I don't think we want to cluster on the primary key. I think we just
want to rewrite the table keeping the same physical ordering.

Well if that's what you want to do, couldn't you do something like this?

Lock the table.
Drop all indexes.
Pass the heap with two pointers, one to the first available empty
space and one to the first non-dead row past that, and move rows
inside the existing file.
Rebuild the indexes.
Release the lock.

-Kevin

#26 Josh Berkus
josh@agliodbs.com
In reply to: Greg Stark (#24)
Re: remove flatfiles.c

Greg,

I don't think we want to cluster on the primary key. I think we just
want to rewrite the table keeping the same physical ordering.

Agreed.

Well I've certainly seen people whose disks are more than 50% full.
They tend to be the same people who want to compact their tables. I
can't say whether any of them had a single table with associated
indexes that were taking up more than 50% but it's not uncommon to
have a single table that dominates your database.

Those people would also need for the tables involved to be fairly small,
or to be able to afford a lot of downtime. VACUUM FULL on a 100GB table
with current commodity servers can take upwards of 8 hours. I really
think the set of people who have more available downtime than disk
space is a vanishingly small group.

However, I'll do a survey. Why not?

We could deal with the admin scripts by making VACUUM FULL do the new
behaviour. But I actually don't really like that. I wold prefer to
break VACUUM FULL since anyone doing it routinely is probably
mistaken. We could name the command something which is more
descriptive like VACUUM REWRITE or VACUUM REBUILD or something like
that.

Agreed. I like VACUUM REWRITE, as it makes it fairly clear what's going on.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#27 Joshua D. Drake
jd@commandprompt.com
In reply to: Josh Berkus (#26)
Re: remove flatfiles.c

On Wed, 2009-09-02 at 11:01 -0700, Josh Berkus wrote:

Greg,

I don't think we want to cluster on the primary key. I think we just
want to rewrite the table keeping the same physical ordering.

Agreed.

Are we sure about that? I would argue that the majority of users out
there (think Django and other Web*Frameworks) are all searching
primarily by primary key + other anyway.

We could always offer the ability to vacuum full (cluster) on index foo
but default to the primary key.

Joshua D. Drake
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering

#28 Robert Haas
robertmhaas@gmail.com
In reply to: Greg Stark (#24)
Re: remove flatfiles.c

On Wed, Sep 2, 2009 at 1:52 PM, Greg Stark<gsstark@mit.edu> wrote:

We could deal with the admin scripts by making VACUUM FULL do the new
behaviour. But I actually don't really like that. I wold prefer to
break VACUUM FULL since anyone doing it routinely is probably
mistaken.

So I have a script that goes and finds bloated tables and runs VACUUM
FULL on them in the middle of the night if the bloat passes a certain
threshold. The tables are small enough and the number of users is low
enough that this doesn't cause any problems for me. I'm OK if the
name of the command changes, but I'd like there to be a command that I
can pass a table name to and get my table debloated without having to
make any follow-on decisions (such as picking an index to cluster by).

...Robert

#29 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#24)
Re: remove flatfiles.c

Greg Stark <gsstark@mit.edu> writes:

On Wed, Sep 2, 2009 at 6:41 PM, Josh Berkus<josh@agliodbs.com> wrote:

Perhaps we should go one version with an enable_legacy_full_vacuum
which defaults to off. That would at least let us hear about use cases
where people are unhappy with a replacement.

I think we do need to do this, just because people won't have changed
their admin scripts.  But the goal should be to dump VACUUM FULL
entirely by 8.6 if we *don't* get serious use-cases.

We could deal with the admin scripts by making VACUUM FULL do the new
behaviour. But I actually don't really like that. I would prefer to
break VACUUM FULL since anyone doing it routinely is probably
mistaken. We could name the command something which is more
descriptive like VACUUM REWRITE or VACUUM REBUILD or something like
that.

What's wrong with just ignoring the FULL option? It's a reserved
word anyway because of FULL OUTER JOINs, so there's no syntactic
benefit to be had from eliminating it from the VACUUM syntax.

regards, tom lane

#30 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#28)
Re: remove flatfiles.c

Robert Haas <robertmhaas@gmail.com> writes:

So I have a script that goes and finds bloated tables and runs VACUUM
FULL on them in the middle of the night if the bloat passes a certain
threshold. The tables are small enough and the number of users is low
enough that this doesn't cause any problems for me. I'm OK if the
name of the command changes, but I'd like there to be a command that I
can pass a table name to and get my table debloated without having to
make any follow-on decisions (such as picking an index to cluster by).

I think we *should* have a command that works like CLUSTER except it just
seqscans the source table without depending on any particular index.
Whether it's called VACUUM FULL or something else is a detail.

regards, tom lane

#31Greg Stark
gsstark@mit.edu
In reply to: Kevin Grittner (#25)
Re: remove flatfiles.c

On Wed, Sep 2, 2009 at 6:57 PM, Kevin
Grittner<Kevin.Grittner@wicourts.gov> wrote:

Greg Stark <gsstark@mit.edu> wrote:

I don't think we want to cluster on the primary key. I think we just
want to rewrite the table keeping the same physical ordering.

Well if that's what you want to do, couldn't you do something like this?

Lock the table.
Drop all indexes.
Pass the heap with two pointers, one to the first available empty
space and one to the first non-dead row past that, and move rows within
the existing file.
Rebuild the indexes.
Release the lock.

Well dropping the indexes and moving tuples are both "hard" if you
care about crash-safety and transactional integrity.

The way we rewrite tables now is:

Lock the table.
Create a new filenode.
Scan the old table and copy each record into the new filenode, keeping
update chains intact.
Rebuild all indexes for the table (using a similar strategy with new
relfilenodes).
Commit the transaction.

If the transaction aborts at any point you still have the old pg_class
record which points to the old relfilenode and all the old indexes are
still valid.

We have all the pieces we need to do this, it's just a matter of
putting them together with a command to call them.
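
(CLUSTER already drives exactly that sequence today, e.g.

-- illustrative table and index names
CLUSTER mytable USING mytable_pkey;

so the new command is mostly a matter of filling the new relfilenode
from a seqscan instead of an indexscan.)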

A big part of what makes VACUUM FULL annoying is the complexity of moving
tuples in place. VACUUM FULL has to mark the old tuples and the new
copies with its xid. It can't truncate the relation until it commits
that xid.

Actually I wonder how much performance improvement would come on
normal DML just from not having to check xvac in the visibility
checks. It's probably not much but...

--
greg
http://mit.edu/~gsstark/resume.pdf

#32Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#29)
Re: remove flatfiles.c

On Wed, Sep 2, 2009 at 2:31 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Greg Stark <gsstark@mit.edu> writes:

On Wed, Sep 2, 2009 at 6:41 PM, Josh Berkus<josh@agliodbs.com> wrote:

Perhaps we should go one version with a enable_legacy_full_vacuum
which defaults to off. That would at least let us hear about use cases
where people are unhappy with a replacement.

I think we do need to do this, just because people won't have changed
their admin scripts.  But the goal should be to dump VACUUM FULL
entirely by 8.6 if we *don't* get serious use-cases.

We could deal with the admin scripts by making VACUUM FULL do the new
behaviour. But I actually don't really like that. I wold prefer to
break VACUUM FULL since anyone doing it routinely is probably
mistaken. We could name the command something which is more
descriptive like VACUUM REWRITE or VACUUM REBUILD or something like
that.

What's wrong with just ignoring the FULL option?  It's a reserved
word anyway because of FULL OUTER JOINs, so there's no syntactic
benefit to be had from eliminating it from the VACUUM syntax.

Silent behavior changes are usually a bad idea.

...Robert

#33Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#30)
Re: remove flatfiles.c

On Wed, Sep 2, 2009 at 2:54 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

So I have a script that goes and finds bloated tables and runs VACUUM
FULL on them in the middle of the night if the bloat passes a certain
threshold.  The tables are small enough and the number of users is low
enough that this doesn't cause any problems for me.  I'm OK if the
name of the command changes, but I'd like there to be a command that I
can pass a table name to and get my table debloated without having to
make any follow-on decisions (such as picking an index to cluster by).

I think we *should* have a command that works like CLUSTER except it just
seqscans the source table without depending on any particular index.
Whether it's called VACUUM FULL or something else is a detail.

Yeah. We could do this by extending the syntax for cluster (e.g.
CLUSTER [VERBOSE] tablename [USING indexname | WITH NO INDEX]), but
I'm not sure whether that has any real advantage over just using the
existing command name.
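
For example (hypothetical syntax, illustrative names):

CLUSTER VERBOSE mytable USING mytable_pkey;   -- today's behaviour
CLUSTER VERBOSE mytable WITH NO INDEX;        -- proposed seqscan rewrite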

I confess to being a little fuzzy on the details of how this
implementation (seq-scanning the source table for live tuples) is
different/better from the current VACUUM FULL implementation. Can
someone fill me in?

...Robert

#34Greg Stark
gsstark@mit.edu
In reply to: Robert Haas (#33)
Re: remove flatfiles.c

On Wed, Sep 2, 2009 at 8:10 PM, Robert Haas<robertmhaas@gmail.com> wrote:

I confess to being a little fuzzy on the details of how this
implementation (seq-scanning the source table for live tuples) is
different/better from the current VACUUM FULL implementation.  Can
someone fill me in?

VACUUM FULL is a *lot* more complex.

It scans pages *backwards* from the end (which does wonderful things
on rotating media). Marks each live tuple it finds as "moved off",
finds a new place for it (using the free space map I think?). Inserts
the tuple on the new page, marks it "moved in", and updates the
indexes.

Then it commits the transaction but keeps the lock. Then it has to
vacuum the references to the old tuples at the end of the table out of
all the indexes. I think it has to commit that too before it can finally
truncate the table.

The backwards scan is awful for rotating media. The reading from the
end and writing to the beginning is bad too, though hopefully the
cache can help that.

A lot of the complexity comes in from other parts of the system that
have to be aware of tuples that have been "moved off" or "moved in".
They have to be able to check whether the vacuum committed or not.

That reminds me there was another proposal to do an "online" vacuum
full, similar to our concurrent index builds: do no-op updates to tuples
at the end of the table, hopefully finding space for them earlier in
the table. Wait until those transactions are no longer visible to
anyone else and then truncate. (Actually I think you could just not do
anything and let regular lazy vacuum do the truncate). That might be a
good practical alternative for sites where copying their entire table
isn't practical.

--
greg
http://mit.edu/~gsstark/resume.pdf

#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#34)
Re: remove flatfiles.c

Greg Stark <gsstark@mit.edu> writes:

The backwards scan is awful for rotating media. The reading from the
end and writing to the beginning is bad too, though hopefully the
cache can help that.

Yeah. And all that pales in comparison to what happens in the indexes.
You have to insert index entries (retail) for each moved-in tuple,
then after doing the intermediate commit you run around and remove
the index entries for the moved-off tuples. Lots of nonsequential
access to insert the entries. The cleanup isn't so bad --- it's
comparable to what regular lazy VACUUM has to do --- but that's just
one step in a very expensive process.

regards, tom lane

#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#34)
Re: remove flatfiles.c

Greg Stark <gsstark@mit.edu> writes:

It scans pages *backwards* from the end (which does wonderful things
on rotating media). Marks each live tuple it finds as "moved off",
finds a new place for it (using the free space map I think?).

BTW, VACUUM FULL doesn't use the free space map --- that code predates
the FSM by a lot. It builds its own map of free space during its
initial lazy-VACUUM-equivalent scan that just removes dead tuples.
While I don't think this hurts performance any, I have seen reports of
VACUUM FULL failing outright on large tables because it runs out of
memory for this map. So that's still another way in which it's not
actually all that useful for huge tables.

regards, tom lane

#37Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#36)
Re: remove flatfiles.c

Tom Lane escribió:

Greg Stark <gsstark@mit.edu> writes:

It scans pages *backwards* from the end (which does wonderful things
on rotating media). Marks each live tuple it finds as "moved off",
finds a new place for it (using the free space map I think?).

BTW, VACUUM FULL doesn't use the free space map --- that code predates
the FSM by a lot. It builds its own map of free space during its
initial lazy-VACUUM-equivalent scan that just removes dead tuples.

Another weird consequence of this is that it bails out if it finds a
tuple larger than it can fit in one of the earlier pages; if there's
dead space to be compacted before that, it's not compacted.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#38Robert Haas
robertmhaas@gmail.com
In reply to: Greg Stark (#34)
Re: remove flatfiles.c

On Wed, Sep 2, 2009 at 3:30 PM, Greg Stark<gsstark@mit.edu> wrote:

On Wed, Sep 2, 2009 at 8:10 PM, Robert Haas<robertmhaas@gmail.com> wrote:

I confess to being a little fuzzy on the details of how this
implementation (seq-scanning the source table for live tuples) is
different/better from the current VACUUM FULL implementation.  Can
someone fill me in?

VACUUM FULL is a *lot* more complex.

It scans pages *backwards* from the end (which does wonderful things
on rotating media). Marks each live tuple it finds as "moved off",
finds a new place for it (using the free space map I think?). Inserts
the tuple on the new page, marks it "moved in", and updates the
indexes.

Then it commits the transaction but keeps the lock. Then it has to
vacuum the references to the old tuples at the end of the table out of
all the indexes. I think it has to commit that too before it can finally
truncate the table.

The backwards scan is awful for rotating media. The reading from the
end and writing to the beginning is bad too, though hopefully the
cache can help that.

A lot of the complexity comes in from other parts of the system that
have to be aware of tuples that have been "moved off" or "moved in".
They have to be able to check whether the vacuum committed or not.

Ugh.

That reminds me there was another proposal to do an "online" vacuum
full, similar to our concurrent index builds: do no-op updates to tuples
at the end of the table, hopefully finding space for them earlier in
the table. Wait until those transactions are no longer visible to
anyone else and then truncate. (Actually I think you could just not do
anything and let regular lazy vacuum do the truncate). That might be a
good practical alternative for sites where copying their entire table
isn't practical.

I don't have a strong opinion about whether it's better to take an
exclusive lock on everything and reorganize freely, or whether it's
better to try to use MVCC semantics to reduce the locking impact (at a
cost of difficulty in the presence of long-running transactions). But
I think it would be really nice to have an incremental way to reduce
table bloat. Locking a table (or even better, part of a table) for a
couple of seconds once an hour for several days or weeks figures to be
practical in some (many?) environments where locking a table for
minutes or hours is not.

...Robert

#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#37)
Re: remove flatfiles.c

Alvaro Herrera <alvherre@commandprompt.com> writes:

Another weird consequence of this is that it bails out if it finds a
tuple larger than it can fit in one of the earlier pages; if there's
dead space to be compacted before that, it's not compacted.

I don't find a lot wrong with that. The code defines its purpose as
being to shorten the table file length. Once it hits a page that
can't be emptied, it cannot shorten the file any further, so why
shouldn't it stop?

regards, tom lane

#40Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#39)
Re: remove flatfiles.c

Tom Lane escribió:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Another weird consequence of this is that it bails out if it finds a
tuple larger than it can fit in one of the earlier pages; if there's
dead space to be compacted before that, it's not compacted.

I don't find a lot wrong with that. The code defines its purpose as
being to shorten the table file length. Once it hits a page that
can't be emptied, it cannot shorten the file any further, so why
shouldn't it stop?

All that work, and it wasn't capable of defragging the other pages? At
the very least it could register them in the FSM.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#40)
Re: remove flatfiles.c

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane escribió:

I don't find a lot wrong with that. The code defines its purpose as
being to shorten the table file length. Once it hits a page that
can't be emptied, it cannot shorten the file any further, so why
shouldn't it stop?

All that work, and it wasn't capable of defragging the other pages? At
the very least it could register them in the FSM.

You mean like vac_update_fsm() ?

regards, tom lane

#42Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#41)
Re: remove flatfiles.c

Tom Lane escribió:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane escribió:

I don't find a lot wrong with that. The code defines its purpose as
being to shorten the table file length. Once it hits a page that
can't be emptied, it cannot shorten the file any further, so why
shouldn't it stop?

All that work, and it wasn't capable of defragging the other pages? At
the very least it could register them in the FSM.

You mean like vac_update_fsm() ?

Huh :-)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#43Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Robert Haas (#17)
Re: remove flatfiles.c

Robert Haas wrote:

On Tue, Sep 1, 2009 at 9:29 PM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:

Ron Mayer wrote:

Greg Stark wrote:

That's what I want to believe. But picture if you have, say a
1-terabyte table which is 50% dead tuples and you don't have a spare
terabyte to rewrite the whole table.

Could one hypothetically do
update bigtable set pk = pk where ctid in (select ctid from bigtable order by ctid desc limit 100);
vacuum;
and repeat until max(ctid) is small enough?

I remember Hannu Krosing said they used something like that to shrink
really bloated tables. Maybe we should try to explicitely support a
mechanism that worked in that fashion. I think I tried it at some point
and found that the problem with it was that ctid was too limited in what
it was able to do.

I think a way to incrementally shrink large tables would be enormously
beneficial. Maybe vacuum could try to do a bit of that each time it
runs.

Yet when I try it now, I'm having trouble making it work.
Would you expect the ctid to be going down in the psql session
shown below? I wonder why it isn't.

regression=# create table shrink_test as select * from tenk1;
SELECT
regression=# delete from shrink_test where (unique2 % 2) = 0;
DELETE 5000
regression=# create index "shrink_test(unique1)" on shrink_test(unique1);
CREATE INDEX
regression=# select max(ctid) from shrink_test;
max
----------
(333,10)
(1 row)

regression=# update shrink_test set unique1=unique1 where ctid in (select ctid from shrink_test order by ctid desc limit 100);
UPDATE 100
regression=# vacuum shrink_test;
VACUUM
regression=# select max(ctid) from shrink_test;
max
----------
(333,21)
(1 row)

regression=# update shrink_test set unique1=unique1 where ctid in (select ctid from shrink_test order by ctid desc limit 100);
UPDATE 100
regression=# vacuum shrink_test;
VACUUM
regression=# select max(ctid) from shrink_test;
max
----------
(333,27)
(1 row)

regression=# update shrink_test set unique1=unique1 where ctid in (select ctid from shrink_test order by ctid desc limit 100);
UPDATE 100
regression=# vacuum shrink_test;
VACUUM
regression=# select max(ctid) from shrink_test;
max
----------
(333,33)
(1 row)

#44Greg Stark
gsstark@mit.edu
In reply to: Ron Mayer (#43)
Re: remove flatfiles.c

On Wed, Sep 2, 2009 at 11:55 PM, Ron Mayer<rm_pg@cheapcomplexdevices.com> wrote:

Yet when I try it now, I'm having trouble making it work.
Would you expect the ctid to be going down in the psql session
shown below?  I wonder why it isn't.

Even before HOT we preferentially tried to put updated tuples on the
same page they were on before. On pre-8.3 if you did these updates
*without* the vacuum they would eventually be forced to find a new
page and hopefully would find one earlier in the table.

On 8.4 HOT will (hopefully) prevent even that from working. Unless you
have a long-running transaction in the background it will clean up the
old tuples in the chain on the page each time the page fills up.
You've deleted half the tuples on the page so the updates will always
fit in that space.

Eventually you'll hit the maximum number of tuples allowed on the page,
dead or alive. But the vacuums are defeating that too. A special
purpose command could work around all of this.

--
greg
http://mit.edu/~gsstark/resume.pdf

#45Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#35)
Re: remove flatfiles.c

On Wed, Sep 2, 2009 at 8:45 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Greg Stark <gsstark@mit.edu> writes:

The backwards scan is awful for rotating media. The reading from the
end and writing to the beginning is bad too, though hopefully the
cache can help that.

Yeah.  And all that pales in comparison to what happens in the indexes.
You have to insert index entries (retail) for each moved-in tuple,

Hm, that could be addressed by buffering index inserts in backend
local memory. That's something Heikki proposed a long time ago
primarily for improving bulk data loading. Basically it would be a
retail version of the bulk loader that we saw at the 10th anniversary
where you merge a sorted list into the index.

You would still have to flush the buffer at transaction commit but
even if it only buffered a few dozen tuples if they're in the same
region of the index it would be a win. In this case it could probably
buffer hundreds and merge them all into the index en masse.

--
greg
http://mit.edu/~gsstark/resume.pdf

#46daveg
daveg@sonic.net
In reply to: Tom Lane (#12)
Re: remove flatfiles.c

On Tue, Sep 01, 2009 at 07:42:56PM -0400, Tom Lane wrote:

Greg Stark <gsstark@mit.edu> writes:

On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:

The use cases where VACUUM FULL wins currently are where storing two
copies of the table and its indexes concurrently just isn't practical.

Yeah, but then do you really need to use VACUUM FULL?  If that's really
a problem then there ain't that many dead tuples around.

That's what I want to believe. But picture if you have, say a
1-terabyte table which is 50% dead tuples and you don't have a spare
terabyte to rewrite the whole table.

But trying to VACUUM FULL that table is going to be horridly painful
too, and you'll still have bloated indexes afterwards. You might as
well just live with the 50% waste, especially since if you did a
full-table update once you'll probably do it again sometime.

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

I have a client who uses temp tables heavily, hundreds of thousands of creates
and drops per day. They also have long running queries. The only thing that
keeps catalog bloat somewhat in check is vacuum full on bloated catalogs
a few times a day. Without that, pg_class, pg_attribute, etc. quickly
balloon to thousands of pages.

-dg

--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

#47Andrew Dunstan
andrew@dunslane.net
In reply to: daveg (#46)
Re: remove flatfiles.c

daveg wrote:

On Tue, Sep 01, 2009 at 07:42:56PM -0400, Tom Lane wrote:

Greg Stark <gsstark@mit.edu> writes:

On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:

The use cases where VACUUM FULL wins currently are where storing two
copies of the table and its indexes concurrently just isn't practical.

Yeah, but then do you really need to use VACUUM FULL? If that's really
a problem then there ain't that many dead tuples around.

That's what I want to believe. But picture if you have, say a
1-terabyte table which is 50% dead tuples and you don't have a spare
terabyte to rewrite the whole table.

But trying to VACUUM FULL that table is going to be horridly painful
too, and you'll still have bloated indexes afterwards. You might as
well just live with the 50% waste, especially since if you did a
full-table update once you'll probably do it again sometime.

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

I have a client who uses temp tables heavily, hundreds of thousands of creates
and drops per day. They also have long running queries. The only thing that
keeps catalog bloat somewhat in check is vacuum full on bloated catalogs
a few times a day. Without that, pg_class, pg_attribute, etc. quickly
balloon to thousands of pages.

That's a rate of more than one create and drop per second. How does your
client handle the fact that VACUUM FULL will exclusively lock those
catalog tables? Without knowing more, it looks like a bit of a design issue.

cheers

andrew

#48daveg
daveg@sonic.net
In reply to: Andrew Dunstan (#47)
Re: remove flatfiles.c

On Thu, Sep 03, 2009 at 07:57:25PM -0400, Andrew Dunstan wrote:

daveg wrote:

On Tue, Sep 01, 2009 at 07:42:56PM -0400, Tom Lane wrote:

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

I have a client who uses temp tables heavily, hundreds of thousands of
creates and drops per day. They also have long running queries. The only
thing that keeps catalog bloat somewhat in check is vacuum full on
bloated catalogs a few times a day. Without that, pg_class, pg_attribute,
etc. quickly balloon to thousands of pages.

That's a rate of more than one create and drop per second. How does your
client handle the fact that VACUUM FULL will exclusively lock those
catalog tables? Without knowing more, it looks like a bit of a design issue.

I'd say it is several per second.

They wait for the catalog locks sometimes. This is not an interactive
application so that is somewhat acceptable. It also occasionally causes
deadlocks, which is less agreeable.

There are various reasons for the heavy use of temps, mainly having to do
with loading external feeds or reusing intermediate query results in a series
of queries.

It would be great if there was a way to have temp tables that
did not get cataloged, e.g. local cache only.

-dg

--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

#49Josh Berkus
josh@agliodbs.com
In reply to: daveg (#48)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

All,

I have a client who uses temp tables heavily, hundreds of thousands of
creates and drops per day. They also have long running queries. The only
thing that keeps catalog bloat somewhat in check is vacuum full on
bloated catalogs a few times a day.

Actually, this is a good point ... if we dropped VACUUM FULL, we'd need
to also be able to call CLUSTER (or VACUUM REWRITE) on the system catalogs.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#49)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Josh Berkus <josh@agliodbs.com> writes:

I have a client who uses temp tables heavily, hundreds of thousands of
creates and drops per day. They also have long running queries. The only
thing that keeps catalog bloat somewhat in check is vacuum full on
bloated catalogs a few times a day.

Actually, this is a good point ... if we dropped VACUUM FULL, we'd need
to also be able to call CLUSTER (or VACUUM REWRITE) on the system catalogs.

I don't think I believe the claim above that vacuum full is actually
necessary. Reasonably aggressive regular vacuuming ought to do it.
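
By "reasonably aggressive" I mean something on the order of the below --
the numbers are purely illustrative, and touching a catalog's reloptions
requires allow_system_table_mods (or just lower the equivalent global
settings in postgresql.conf):

-- illustrative values only
ALTER TABLE pg_class SET (autovacuum_vacuum_scale_factor = 0.02,
                          autovacuum_vacuum_threshold = 250);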

We used to have a bug that caused row deletions during backend shutdown
to not get reported to the stats collector; which had the effect that
dead catalog entries for temp tables didn't get counted, and so autovac
didn't hit the catalogs often enough, and so you'd get bloat in exactly
this scenario. I suspect the claim that manual vacuum full is necessary
is based on obsolete experience from before that bug got stomped.
It's hardly an ideal solution anyway given what an exclusive lock on
pg_class will do to the rest of the system --- and a cluster-like
cleanup won't be any better about that.

regards, tom lane

#51Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#50)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

On Fri, Sep 4, 2009 at 2:48 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Josh Berkus <josh@agliodbs.com> writes:

I have a client who uses temp tables heavily, hundreds of thousands of
creates and drops per day. They also have long running queries. The only
thing that keeps catalog bloat somewhat in check is vacuum full on
bloated catalogs a few times a day.

Actually, this is a good point ... if we dropped VACUUM FULL, we'd need
to also be able to call CLUSTER (or VACUUM REWRITE) on the system catalogs.

I don't think I believe the claim above that vacuum full is actually
necessary.  Reasonably aggressive regular vacuuming ought to do it.

We used to have a bug that caused row deletions during backend shutdown
to not get reported to the stats collector; which had the effect that
dead catalog entries for temp tables didn't get counted, and so autovac
didn't hit the catalogs often enough, and so you'd get bloat in exactly
this scenario.  I suspect the claim that manual vacuum full is necessary
is based on obsolete experience from before that bug got stomped.
It's hardly an ideal solution anyway given what an exclusive lock on
pg_class will do to the rest of the system --- and a cluster-like
cleanup won't be any better about that.

I'm confused. Are you saying that pg_class will never get bloated, so
we don't need a way to debloat it? I realize that with HOT, bloat is
much less of a problem than it used to be, but surely it's not
altogether impossible...

...Robert

#52Boszormenyi Zoltan
zb@cybertec.at
In reply to: Tom Lane (#50)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Tom Lane írta:

Josh Berkus <josh@agliodbs.com> writes:

I have a client who uses temp tables heavily, hundreds of thousands of
creates and drops per day. They also have long running queries. The only
thing that keeps catalog bloat somewhat in check is vacuum full on
bloated catalogs a few times a day.

Actually, this is a good point ... if we dropped VACUUM FULL, we'd need
to also be able to call CLUSTER (or VACUUM REWRITE) on the system catalogs.

I don't think I believe the claim above that vacuum full is actually
necessary. Reasonably aggressive regular vacuuming ought to do it.

How about setting a non-100% fillfactor on catalog tables?
Maybe by default? That would also avoid most of the bloat, wouldn't it?
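
I.e. the ordinary mechanism -- the number is illustrative, and on a real
catalog it would have to be applied at initdb time or under
allow_system_table_mods:

-- leave 30% of each heap page free for updates
ALTER TABLE pg_class SET (fillfactor = 70);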

We used to have a bug that caused row deletions during backend shutdown
to not get reported to the stats collector; which had the effect that
dead catalog entries for temp tables didn't get counted, and so autovac
didn't hit the catalogs often enough, and so you'd get bloat in exactly
this scenario. I suspect the claim that manual vacuum full is necessary
is based on obsolete experience from before that bug got stomped.
It's hardly an ideal solution anyway given what an exclusive lock on
pg_class will do to the rest of the system --- and a cluster-like
cleanup won't be any better about that.

regards, tom lane

--
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/

#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#51)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Robert Haas <robertmhaas@gmail.com> writes:

I'm confused. Are you saying that pg_class will never get bloated, so
we don't need a way to debloat it? I realize that with HOT, bloat is
much less of a problem than it used to be, but surely it's not
altogether impossible...

Well, it's certainly *possible*, I'm just questioning the assertion that
it's necessarily a common situation.

regards, tom lane

#54Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#53)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

On Fri, 2009-09-04 at 15:10 -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'm confused. Are you saying that pg_class will never get bloated, so
we don't need a way to debloat it? I realize that with HOT, bloat is
much less of a problem than it used to be, but surely it's not
altogether impossible...

Well, it's certainly *possible*, I'm just questioning the assertion that
it's necessarily a common situation.

Depends on your definition of common. It is very easy for someone to
blow away their vacuum settings in such a way that it will become
bloated pretty quick.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering

#55Alvaro Herrera
alvherre@commandprompt.com
In reply to: Joshua D. Drake (#54)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Joshua D. Drake escribió:

On Fri, 2009-09-04 at 15:10 -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'm confused. Are you saying that pg_class will never get bloated, so
we don't need a way to debloat it? I realize that with HOT, bloat is
much less of a problem than it used to be, but surely it's not
altogether impossible...

Well, it's certainly *possible*, I'm just questioning the assertion that
it's necessarily a common situation.

Depends on your definition of common. It is very easy for someone to
blow away their vacuum settings in such a way that it will become
bloated pretty quick.

No problem, just CLUSTER that table same as today.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#56Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#55)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Alvaro Herrera <alvherre@commandprompt.com> writes:

No problem, just CLUSTER that table same as today.

Uh, no, that was Josh's point: you can't CLUSTER pg_class, because you
can't change its relfilenode. If you do, backends won't know where to
read pg_class to find out its relfilenode.

I was wondering whether maintenance operations like "vacuum rewrite"
could get away with filling a new table file and then moving it into
place with rename(2), which is guaranteed atomic (at least on sane
filesystems). The idea doesn't work right off because (1) you need
to atomically install the updated indexes too, and (2) the table
might span more than one segment file. But maybe we could think of
something.

regards, tom lane

#57Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#56)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

I wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

No problem, just CLUSTER that table same as today.

Uh, no, that was Josh's point: you can't CLUSTER pg_class, because you
can't change its relfilenode. If you do, backends won't know where to
read pg_class to find out its relfilenode.

I was wondering whether maintenance operations like "vacuum rewrite"
could get away with filling a new table file and then moving it into
place with rename(2), which is guaranteed atomic (at least on sane
filesystems). The idea doesn't work right off because (1) you need
to atomically install the updated indexes too, and (2) the table
might span more than one segment file. But maybe we could think of
something.

Hmm ... reading that over again, it seems like there is a pretty
obvious solution. The sticking point --- not only for pg_class,
but for shared catalogs such as pg_database --- is the lack of a
way to track relfilenode if it ever changes. What if we keep
the relfilenode of these critical tables someplace else? For
instance, we could have a "map" file in each database holding
the relfilenode of pg_class, and one in $PGDATA/global holding
the relfilenodes of the shared catalogs and indexes. It'd be
possible to update a map file atomically via the rename(2) trick.
Then we teach relcache or some similar place to believe the map
files over the contents of pg_class.

This looks sort of like a reversion to flat files, but it has a couple
of saving graces:
1. The set of entries is fixed and small, so there's no performance
issue looming with database growth.
2. We could not synchronize updates with transaction commit, which was
always the real Achilles' heel of the flat files. But I think we don't
need to, if we restrict the set of operations that can change the
relfilenodes of critical tables to "maintenance" operations that only
rewrite the table contents and don't make any logical changes in the
contents. Given that restriction, transaction commit isn't actually
important; it will be the act of moving the updated map file into place
that effectively is commit for these operations. If you crash after
that, your change is still effective.

Thoughts?

regards, tom lane

#58Dimitri Fontaine
dfontaine@hi-media.com
In reply to: Tom Lane (#56)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Hi,

Tom Lane <tgl@sss.pgh.pa.us> writes:

Alvaro Herrera <alvherre@commandprompt.com> writes:

No problem, just CLUSTER that table same as today.

Uh, no, that was Josh's point: you can't CLUSTER pg_class, because you
can't change its relfilenode. If you do, backends won't know where to
read pg_class to find out its relfilenode.

Why can't MVCC apply here? You'd have two versions of the pg_class entry
that has just been CLUSTERed, and you keep the old relfilenode around
too. MVCC applies, and you teach vacuum to clean out the old file when
cleaning out the no-longer-visible tuple.

Or you take necessary locks to protect the operation.

No doubt an answer so obvious that I'll regret having asked will come
back in reply to this mail :)

Regards.
--
dim

#59Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dimitri Fontaine (#58)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Dimitri Fontaine escribió:

Hi,

Tom Lane <tgl@sss.pgh.pa.us> writes:

Alvaro Herrera <alvherre@commandprompt.com> writes:

No problem, just CLUSTER that table same as today.

Uh, no, that was Josh's point: you can't CLUSTER pg_class, because you
can't change its relfilenode. If you do, backends won't know where to
read pg_class to find out its relfilenode.

Why can't MVCC apply here? You'd have two versions of the pg_class entry
that has just been CLUSTERed, and you keep the old relfilenode around
too. MVCC applies, and you teach vacuum to clean out the old file when
cleaning out the no-longer-visible tuple.

It just doesn't work. pg_class (and various other rels) are special
because they are needed to bootstrap the catalog system. See
RelationCacheInitializePhase3. It wouldn't be possible to figure out
what pg_class's relfilenode is until you have read it from pg_class, which
is precisely what we're trying to do.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#60Dimitri Fontaine
dfontaine@hi-media.com
In reply to: Alvaro Herrera (#59)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Alvaro Herrera <alvherre@commandprompt.com> writes:

Dimitri Fontaine escribió:

Why can't MVCC apply here? You'd have two versions of the pg_class entry
that has just been CLUSTERed, and you keep the old relfilenode around
too. MVCC applies, and you teach vacuum to clean out the old file when
cleaning out the no-longer-visible tuple.

It just doesn't work. pg_class (and various other rels) are special
because they are needed to bootstrap the catalog system. See
RelationCacheInitializePhase3. It wouldn't be possible to figure out
what pg_class's relfilenode is until you have read it from pg_class, which
is precisely what we're trying to do.

Well, at bootstrap time I guess no one is able to disturb the system by
issuing a concurrent CLUSTER pg_class; call. Once started, do those rels
still need to have special behavior?

I guess I'm being dense, will now let people in the know find a solution...
--
dim

#61Alvaro Herrera
alvherre@commandprompt.com
In reply to: Dimitri Fontaine (#60)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Dimitri Fontaine escribió:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Dimitri Fontaine escribió:

Why can't MVCC apply here? You'd have two versions of the pg_class entry
that has just been CLUSTERed, and you keep the old relfilenode around
too. MVCC applies, and you teach vacuum to clean out the old file when
cleaning out the no-longer-visible tuple.

It just doesn't work. pg_class (and various other rels) are special
because they are needed to bootstrap the catalog system. See
RelationCacheInitializePhase3. It wouldn't be possible to figure out
what pg_class's relfilenode is until you have read it from pg_class, which
is precisely what we're trying to do.

Well, at bootstrap time I guess no one is able to disturb the system by
issuing a concurrent CLUSTER pg_class; call. Once started, do those rels
still need to have special behavior?

The relcache needs to be bootstrapped more than once, not just at
initdb's bootstrap. (I guess you could try a breakpoint in formrdesc)

I guess I'm being dense, will now let people in the know find a solution...

Yeah, well, Tom just posted a possible solution :-)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dimitri Fontaine (#60)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Dimitri Fontaine <dfontaine@hi-media.com> writes:

Well, at bootstrap time I guess no one is able to disturb the system by
issuing a concurrent CLUSTER pg_class; call. Once started, do those rels
still need to have special behavior?

It doesn't matter, if you fail to get past bootstrap because you
couldn't find pg_class. The existing design for this is absolutely
dependent on the fact that pg_class has a fixed relfilenode = fixed
physical file name. MVCC has nothing to do with it.

See my followon message for a sketch of a possible solution. Basically
it's pushing the fixed file name over to another place ...

regards, tom lane

#63Dimitri Fontaine
dfontaine@hi-media.com
In reply to: Alvaro Herrera (#61)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Alvaro Herrera <alvherre@commandprompt.com> writes:

The relcache needs to be bootstrapped more than once, not just at
initdb's bootstrap. (I guess you could try a breakpoint in formrdesc)

Ok so in RelationCacheInitializePhase3 we have formrdesc calls:

formrdesc("pg_class", false,
          true, Natts_pg_class, Desc_pg_class);

That will do this:

/*
 * initialize relation id from info in att array (my, this is ugly)
 */
RelationGetRelid(relation) = relation->rd_att->attrs[0]->attrelid;
relation->rd_rel->relfilenode = RelationGetRelid(relation);

And this uses data from pg_attribute.h which looks like this:

#define Schema_pg_class \
{ 1259, {"relname"}, 19, -1, 0, NAMEDATALEN, 1, 0, -1, -1, false, 'p', 'c', true, false, false, true, 0, { 0 } }, \

So in the source I'm reading, pg_class's relfilenode is 1259; the system
knows how to get there, then reads the real values instead of the
bootstrap-mode dummy ones.

Except that the bootstrap mode is used at initdb, at system startup and
anytime there's a cache miss?

So to be able to CLUSTER, we have to find a way for a cache miss to get
the file named 1259 or the new value, whatever it is after CLUSTER has
changed it. I guess symlinks are not allowed?

Yeah, well, Tom just posted a possible solution :-)

Flat files are back? Really?

Regards,
--
dim

#64Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#57)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

On Fri, Sep 4, 2009 at 4:01 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

I wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

No problem, just CLUSTER that table same as today.

Uh, no, that was Josh's point: you can't CLUSTER pg_class, because you
can't change its relfilenode.  If you do, backends won't know where to
read pg_class to find out its relfilenode.

I was wondering whether maintenance operations like "vacuum rewrite"
could get away with filling a new table file and then moving it into
place with rename(2), which is guaranteed atomic (at least on sane
filesystems).  The idea doesn't work right off because (1) you need
to atomically install the updated indexes too, and (2) the table
might span more than one segment file.  But maybe we could think of
something.

Hmm ... reading that over again, it seems like there is a pretty
obvious solution.  The sticking point --- not only for pg_class,
but for shared catalogs such as pg_database --- is the lack of a
way to track relfilenode if it ever changes.  What if we keep
the relfilenode of these critical tables someplace else?  For
instance, we could have a "map" file in each database holding
the relfilenode of pg_class, and one in $PGDATA/global holding
the relfilenodes of the shared catalogs and indexes.  It'd be
possible to update a map file atomically via the rename(2) trick.
Then we teach relcache or some similar place to believe the map
files over the contents of pg_class.

This looks sort of like a reversion to flat files, but it has a couple
of saving graces:
1. The set of entries is fixed and small, so there's no performance
issue looming with database growth.
2. We could not synchronize updates with transaction commit, which was
always the real Achilles' heel of the flat files.  But I think we don't
need to, if we restrict the set of operations that can change the
relfilenodes of critical tables to "maintenance" operations that only
rewrite the table contents and don't make any logical changes in the
contents.  Given that restriction, transaction commit isn't actually
important; it will be the act of moving the updated map file into place
that effectively is commit for these operations.  If you crash after
that, your change is still effective.

This doesn't seem totally horrible. But, before you go do it, do we
have a clearly-defined plan for the rest of the project? Because we
only need this if we're absolutely confident that rewriting the table
in place is just not an option worth keeping around. It's unclear to
me that everyone is convinced of that, and even if they are, it's
unclear to me what we plan to implement instead.

...Robert

#65Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#64)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Sep 4, 2009 at 4:01 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Hmm ... reading that over again, it seems like there is a pretty
obvious solution.

This doesn't seem totally horrible. But, before you go do it, do we
have a clearly-defined plan for the rest of the project?

Rest of what project? Removing vacuum full isn't a necessary component
of that. It would enable doing CLUSTER on pg_class, and it would
eliminate the crock of REINDEX having to reindex shared indexes
in-place. It could probably be justified even without any changes in
our approach to vacuum.

... only need this if we're absolutely confident that rewriting the table
in place is just not an option worth keeping around. It's unclear to
me that everyone is convinced of that, and even if they are, it's
unclear to me what we plan to implement instead.

I thought we were pretty well agreed that a seqscan variant of
CLUSTER would be worth doing. Whether we take the next step by
eliminating vacuum full is a different question, but the shape of
the substitute seems perfectly clear.

regards, tom lane

#66Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#65)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

On Fri, Sep 4, 2009 at 9:37 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Sep 4, 2009 at 4:01 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Hmm ... reading that over again, it seems like there is a pretty
obvious solution.

This doesn't seem totally horrible.  But, before you go do it, do we
have a clearly-defined plan for the rest of the project?

Rest of what project?  Removing vacuum full isn't a necessary component
of that.  It would enable doing CLUSTER on pg_class, and it would
eliminate the crock of REINDEX having to reindex shared indexes
in-place.  It could probably be justified even without any changes in
our approach to vacuum.

OK, I'm sold.

... only need this if we're absolutely confident that rewriting the table
in place is just not an option worth keeping around.  It's unclear to
me that everyone is convinced of that, and even if they are, it's
unclear to me what we plan to implement instead.

I thought we were pretty well agreed that a seqscan variant of
CLUSTER would be worth doing.  Whether we take the next step by
eliminating vacuum full is a different question, but the shape of
the substitute seems perfectly clear.

Well, there were some other ideas discussed, but perhaps that's the
only one that had a clear consensus.

...Robert

#67Andrew McNamara
andrewm@object-craft.com.au
In reply to: Tom Lane (#12)
Re: remove flatfiles.c

That's what I want to believe. But picture if you have, say a
1-terabyte table which is 50% dead tuples and you don't have a spare
terabyte to rewrite the whole table.

But trying to VACUUM FULL that table is going to be horridly painful
too, and you'll still have bloated indexes afterwards. You might as
well just live with the 50% waste, especially since if you did a
full-table update once you'll probably do it again sometime.

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

This was almost exactly the scenario I faced recently. A production
database unexpectedly filled up its partition. On investigation, we found
a developer had added a component to the application that updated every
row in one table each day, exhausting the free space map. Over time,
most of the tables in the system had grown to contain 50-70% dead tuples.

The owner of the system was understandably reluctant to dump and restore
the system, and there wasn't enough space left on the system to rewrite
any of the large tables. In the end, I dropped a table (the one owned
by the offending developer... 8-), and this gave me just enough space
to VACUUM FULL one table at a time.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

#68Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#57)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Back in September I wrote:

... The sticking point --- not only for pg_class,
but for shared catalogs such as pg_database --- is the lack of a
way to track relfilenode if it ever changes. What if we keep
the relfilenode of these critical tables someplace else? For
instance, we could have a "map" file in each database holding
the relfilenode of pg_class, and one in $PGDATA/global holding
the relfilenodes of the shared catalogs and indexes. It'd be
possible to update a map file atomically via the rename(2) trick.
Then we teach relcache or some similar place to believe the map
files over the contents of pg_class.

Thinking about this some more, I can see one small disadvantage:
for the relations that we use the map file for, pg_class.relfilenode
would not be trustworthy. This would not affect most of the system
internals (which will be looking at the relcache's copy, which would
be kept valid by the relcache code). But it would affect user queries,
such as for example attempts to use contrib/oid2name to identify a
file on-disk. The main case where pg_class.relfilenode would be
likely to be out-of-sync is for shared catalogs. We could keep it
up to date in most cases for local catalogs, but there's no hope
of reaching into other databases' pg_class when a shared catalog
is relocated.

What I'd suggest doing about this is:

(1) Store zero in pg_class.relfilenode for those catalogs for which
the map is used. This at least makes it obvious that the value
you're looking at isn't valid.

(2) Provide a SQL function to extract the real relfilenode of any
specified pg_class entry. We'd have to modify oid2name and
pg_dump to know to use the function instead of looking at the
column.
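
A sketch of what I have in mind (the function name is hypothetical):

SELECT pg_relation_filenode('pg_class'::regclass);

which would return the current on-disk relfilenode regardless of
whether the pg_class column is zeroed.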

There might be some other client-side code that would be broken
until it got taught about the function, but hopefully not much.

Thoughts?

regards, tom lane

#69Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#68)
Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

On Sun, Jan 31, 2010 at 3:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Back in September I wrote:

... The sticking point --- not only for pg_class,
but for shared catalogs such as pg_database --- is the lack of a
way to track relfilenode if it ever changes.  What if we keep
the relfilenode of these critical tables someplace else?  For
instance, we could have a "map" file in each database holding
the relfilenode of pg_class, and one in $PGDATA/global holding
the relfilenodes of the shared catalogs and indexes.  It'd be
possible to update a map file atomically via the rename(2) trick.
Then we teach relcache or some similar place to believe the map
files over the contents of pg_class.

Thinking about this some more, I can see one small disadvantage:
for the relations that we use the map file for, pg_class.relfilenode
would not be trustworthy.  This would not affect most of the system
internals (which will be looking at the relcache's copy, which would
be kept valid by the relcache code).  But it would affect user queries,
such as for example attempts to use contrib/oid2name to identify a
file on-disk.  The main case where pg_class.relfilenode would be
likely to be out-of-sync is for shared catalogs.  We could keep it
up to date in most cases for local catalogs, but there's no hope
of reaching into other databases' pg_class when a shared catalog
is relocated.

What I'd suggest doing about this is:

(1) Store zero in pg_class.relfilenode for those catalogs for which
the map is used.  This at least makes it obvious that the value
you're looking at isn't valid.

(2) Provide a SQL function to extract the real relfilenode of any
specified pg_class entry.  We'd have to modify oid2name and
pg_dump to know to use the function instead of looking at the
column.

There might be some other client-side code that would be broken
until it got taught about the function, but hopefully not much.

Thoughts?

Seems reasonable to me (assuming there's no way to avoid changing the
relfilenode, which I assume is the case but don't actually know the
code well enough to say with certainty).

...Robert