BUG #6041: Unlogged table was created bad in slave node

Started by Emanuel over 14 years ago · 23 messages
#1 Emanuel
postgres.arg@gmail.com

The following bug has been logged online:

Bug reference: 6041
Logged by: Emanuel
Email address: postgres.arg@gmail.com
PostgreSQL version: 9.1 beta
Operating system: Ubuntu
Description: Unlogged table was created bad in slave node
Details:

MASTER=6666
SLAVE SYNC=6667
SLAVE ASYNC=6668

root@dell-desktop:/usr/local/pgsql# bin/psql -Upostgres -p6666
psql (9.1beta1)
Type "help" for help.

postgres=# CREATE UNLOGGED TABLE voidy AS SELECT i, random() FROM
generate_series(1,1000) i(i);
SELECT 1000
postgres=# \q
root@dell-desktop:/usr/local/pgsql# bin/psql -Upostgres -p6667
psql (9.1beta1)
Type "help" for help.

postgres=# \d voidy
Unlogged table "public.voidy"
 Column |       Type       | Modifiers
--------+------------------+-----------
 i      | integer          |
 random | double precision |

postgres=# select * from voidy ;
ERROR: could not open file "base/12071/17034": No existe el archivo o
directorio
postgres=# \q
root@dell-desktop:/usr/local/pgsql# bin/psql -Upostgres -p6668
psql (9.1beta1)
Type "help" for help.

postgres=# \d voidy
Unlogged table "public.voidy"
 Column |       Type       | Modifiers
--------+------------------+-----------
 i      | integer          |
 random | double precision |

postgres=# select * from voidy ;
ERROR: could not open file "base/12071/17034": No existe el archivo o
directorio

#2 Euler Taveira de Oliveira
euler@timbira.com
In reply to: Emanuel (#1)
Re: BUG #6041: Unlogged table was created bad in slave node

On 26-05-2011 08:37, Emanuel wrote:

Description: Unlogged table was created bad in slave node

Unlogged table contents are not available to slave nodes [1].

postgres=# select * from voidy ;
ERROR: could not open file "base/12071/17034": No existe el archivo o
directorio

I think we should emit the real cause in those cases, if possible (not too
much overhead). The message would be "Unlogged table content is not available
in standby server".

[1]: http://www.postgresql.org/docs/9.1/static/sql-createtable.html

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

#3 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Euler Taveira de Oliveira (#2)
Re: BUG #6041: Unlogged table was created bad in slave node

Excerpts from Euler Taveira de Oliveira's message of Thu May 26 12:00:05 -0400 2011:

On 26-05-2011 08:37, Emanuel wrote:

Description: Unlogged table was created bad in slave node

Unlogged table contents are not available to slave nodes [1].

postgres=# select * from voidy ;
ERROR: could not open file "base/12071/17034": No existe el archivo o
directorio

I think we should emit the real cause in those cases, if possible (not too
much overhead). The message would be "Unlogged table content is not available
in standby server".

I guess what it should do is create an empty file in the slave.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#3)
Re: BUG #6041: Unlogged table was created bad in slave node

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Euler Taveira de Oliveira's message of Thu May 26 12:00:05 -0400 2011:

I think we should emit the real cause in those cases, if possible (not too
much overhead). The message would be "Unlogged table content is not available
in standby server".

I guess what it should do is create an empty file in the slave.

Probably it should, because won't the table malfunction after the slave
is promoted to master, if there's no file at all there? Or will the
process of coming live create an empty file even if there was none?

But Euler is pointing out a different issue, which is usability. If the
slave just acts like the table is present but empty, we are likely to
get bug reports about that too. An error telling you you aren't allowed
to access such a table on slaves would be more user-friendly, if we can
do it without too much pain.

regards, tom lane

#5 Emanuel Calvo
postgres.arg@gmail.com
In reply to: Euler Taveira de Oliveira (#2)
Re: BUG #6041: Unlogged table was created bad in slave node

2011/5/26 Euler Taveira de Oliveira <euler@timbira.com>:

On 26-05-2011 08:37, Emanuel wrote:

Description:        Unlogged table was created bad in slave node

Unlogged table contents are not available to slave nodes [1].

I know. But the error raised isn't very nice for common users.
IMHO it could be an empty file with a WARNING-level message,
to notify administrators to keep an eye on these tables.

postgres=# select * from voidy ;
ERROR:  could not open file "base/12071/17034": No existe el archivo o
directorio

I think we should emit the real cause in those cases, if possible (not too
much overhead). The message would be "Unlogged table content is not
available in standby server".

I think that if we detect the table is unlogged && the file doesn't
exist, we can raise this message.
But the problem comes when the standby is promoted to master; I didn't
test what happens in that case (if the file doesn't exist, I think it
won't be possible to insert data directly, or the DBA may be forced to
drop the table from the catalog).

--
              Emanuel Calvo
              Helpame.com

#6 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
1 attachment(s)
Re: BUG #6041: Unlogged table was created bad in slave node

On Thu, May 26, 2011 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Euler Taveira de Oliveira's message of Thu May 26 12:00:05 -0400 2011:

I think we should emit the real cause in those cases, if possible (not too
much overhead). The message would be "Unlogged table content is not available
in standby server".

I guess what it should do is create an empty file in the slave.

Probably it should, because won't the table malfunction after the slave
is promoted to master, if there's no file at all there?  Or will the
process of coming live create an empty file even if there was none?

Coming live creates an empty file.

But Euler is pointing out a different issue, which is usability.  If the
slave just acts like the table is present but empty, we are likely to
get bug reports about that too.  An error telling you you aren't allowed
to access such a table on slaves would be more user-friendly, if we can
do it without too much pain.

I looked into this a bit. A few observations:

(1) This problem is actually not confined to unlogged tables;
temporary tables have the same issue. For example, if you create a
temporary table on the master and then, on the slave, do SELECT * FROM
pg_temp_3.hi_mom (or whatever the name of the temp schema where the
temp table is) you get the same error. In fact I suspect if you took
a base backup that included the temporary relation and matched the
backend ID you could even manage to read out the old contents (modulo
any fun and exciting XID wraparound issues). But the problem is of
course more noticeable for unlogged tables since they're not hidden
away in a special funny schema.

(2) It's somewhat difficult to get a clean fix for this error because
it's coming from deep down in the bowels of the system, inside
mdnblocks(). At that point, we haven't got a Relation available, just
an SMgrRelation, so the information that this relation is unlogged is
not really available, at least not without some sort of gross
modularity violation like calling smgrexists() on the init fork. It
doesn't particularly seem like a good idea to conditionalize the
behavior of mdnblocks() on relpersistence anyway; that's a pretty
gross modularity violation all by itself.

(3) mdnblocks() gets called twice during the process of doing a simple
select on a relation - once from the planner, via estimate_rel_size,
and again from the executor, via initscan. A fairly obvious fix for
this problem is to skip the call to estimate_rel_size() if we're
dealing with a temporary or unlogged relation and are in recovery; and
make heap_beginscan_internal() throw a suitable error under similar
circumstances (or do we want to throw the error at plan time? tossing
it out from all the way down in estimate_rel_size() seems a bit
wacky). Patch taking this approach attached.

(4) It strikes me that it might be possible to address this problem a
bit more cleanly by allowing mdnblocks() and smgrnblocks() and
RelationGetNumberOfBlocksInFork() to take a boolean argument
indicating whether or not an error should be thrown if the underlying
physical file happens not to exist. When no error is to be signaled,
we simply return 0 when the main fork doesn't exist, rather than
throwing an error. We could then make estimate_rel_size() use this
variant, eliminating the need for an explicit test in
get_relation_info(). I'm sort of inclined to go this route even
though it would require touching a bit more code. We've already
whacked around RelationGetNumberOfBlocks() a bit in this release (the
function that formerly had that name is now
RelationGetNumberOfBlocksInFork(), and RelationGetNumberOfBlocks() is
now a macro that calls that function w/MAIN_FORKNUM) so it doesn't
seem like a bad time to get any other API changes that might be useful
out of our system.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

reject-unlogged-during-recovery.patch (application/octet-stream)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 346d6b9..e442076 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1202,6 +1202,14 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
 	HeapScanDesc scan;
 
 	/*
+	 * We are unable to scan temporary or unlogged relations during recovery.
+	 */
+	if (!RelationNeedsWAL(relation) && RecoveryInProgress())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot access temporary or unlogged relations during recovery")));
+
+	/*
 	 * increment relation ref count while scanning relation
 	 *
 	 * This is just to make really sure the relcache entry won't go away while
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e71090f..437d74b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 #include <unistd.h>
 
 #include "access/clog.h"
+#include "access/xlog.h"
 #include "access/multixact.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index fd8ea45..39659ec 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -105,8 +105,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	 * case the size will be computed later in set_append_rel_pathlist, and we
 	 * must leave it zero for now to avoid bollixing the total_table_pages
 	 * calculation.
+	 *
+	 * We also skip this step for temporary or unlogged relations, which are
+	 * inaccessible during recovery.  The exact size doesn't really matter, as
+	 * the query is guaranteed to fail at execution time anyway.
 	 */
-	if (!inhparent)
+	if (!inhparent && (RelationNeedsWAL(relation) || !RecoveryInProgress()))
 		estimate_rel_size(relation, rel->attr_widths - rel->min_attr,
 						  &rel->pages, &rel->tuples);
 
#7 Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#6)
Re: BUG #6041: Unlogged table was created bad in slave node

On Wed, Jun 1, 2011 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Thoughts?

Anyone? Anyone? Bueller?

If we don't want to gum this with the above-mentioned cruft, the other
obvious alternative here is to do nothing, and live with the
non-beauty of the resulting error message.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#7)
Re: BUG #6041: Unlogged table was created bad in slave node

Excerpts from Robert Haas's message of Fri Jun 03 12:44:45 -0400 2011:

On Wed, Jun 1, 2011 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:

(4) It strikes me that it might be possible to address this problem a
bit more cleanly by allowing mdnblocks() and smgrnblocks() and
RelationGetNumberOfBlocksInFork() to take a boolean argument
indicating whether or not an error should be thrown if the underlying
physical file happens not to exist.  When no error is to be signaled,
we simply return 0 when the main fork doesn't exist, rather than
throwing an error.

If we don't want to gum this with the above-mentioned cruft, the other
obvious alternative here is to do nothing, and live with the
non-beauty of the resulting error message.

Option 4 seems reasonable to me ... can you get rid of the dupe
smgrnblocks call simultaneously?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#9 Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#8)
1 attachment(s)
Re: BUG #6041: Unlogged table was created bad in slave node

On Fri, Jun 3, 2011 at 1:01 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Robert Haas's message of Fri Jun 03 12:44:45 -0400 2011:

On Wed, Jun 1, 2011 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:

(4) It strikes me that it might be possible to address this problem a
bit more cleanly by allowing mdnblocks() and smgrnblocks() and
RelationGetNumberOfBlocksInFork() to take a boolean argument
indicating whether or not an error should be thrown if the underlying
physical file happens not to exist.  When no error is to be signaled,
we simply return 0 when the main fork doesn't exist, rather than
throwing an error.

If we don't want to gum this with the above-mentioned cruft, the other
obvious alternative here is to do nothing, and live with the
non-beauty of the resulting error message.

Option 4 seems reasonable to me ... can you get rid of the dupe
smgrnblocks call simultaneously?

What dup smgrnblocks call?

Patch along these lines attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

reject-unlogged-during-recovery-v2.patch (application/octet-stream)
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index fe991cf..6dcfe7c 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -185,7 +185,7 @@ _hash_getinitbuf(Relation rel, BlockNumber blkno)
 Buffer
 _hash_getnewbuf(Relation rel, BlockNumber blkno, ForkNumber forkNum)
 {
-	BlockNumber nblocks = RelationGetNumberOfBlocksInFork(rel, forkNum);
+	BlockNumber nblocks = RelationGetNumberOfBlocksInFork(rel, forkNum, false);
 	Buffer		buf;
 
 	if (blkno == P_NEW)
@@ -340,7 +340,7 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 	uint32		i;
 
 	/* safety check */
-	if (RelationGetNumberOfBlocksInFork(rel, forkNum) != 0)
+	if (RelationGetNumberOfBlocksInFork(rel, forkNum, false) != 0)
 		elog(ERROR, "cannot initialize non-empty hash index \"%s\"",
 			 RelationGetRelationName(rel));
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 01a492e..8045dda 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1201,6 +1201,12 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
 {
 	HeapScanDesc scan;
 
+	/* Temporary/unlogged relations are unavailable during recovery. */
+	if (!RelationNeedsWAL(relation) && RecoveryInProgress())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot access temporary or unlogged tables during recovery")));
+
 	/*
 	 * increment relation ref count while scanning relation
 	 *
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 58bab7d..3b69600 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -366,7 +366,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 	else
 		newnblocks = truncBlock;
 
-	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM) <= newnblocks)
+	if (smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM, false) <= newnblocks)
 	{
 		/* nothing to do, the file was already smaller than requested size */
 		return;
@@ -409,7 +409,8 @@ vm_readbuf(Relation rel, BlockNumber blkno, bool extend)
 	{
 		if (smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
 			rel->rd_smgr->smgr_vm_nblocks = smgrnblocks(rel->rd_smgr,
-													  VISIBILITYMAP_FORKNUM);
+													  VISIBILITYMAP_FORKNUM,
+														false);
 		else
 			rel->rd_smgr->smgr_vm_nblocks = 0;
 	}
@@ -472,7 +473,7 @@ vm_extend(Relation rel, BlockNumber vm_nblocks)
 		!smgrexists(rel->rd_smgr, VISIBILITYMAP_FORKNUM))
 		smgrcreate(rel->rd_smgr, VISIBILITYMAP_FORKNUM, false);
 
-	vm_nblocks_now = smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM);
+	vm_nblocks_now = smgrnblocks(rel->rd_smgr, VISIBILITYMAP_FORKNUM, false);
 
 	while (vm_nblocks_now < vm_nblocks)
 	{
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 27c37d6..e09af6f 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -280,6 +280,12 @@ index_beginscan_internal(Relation indexRelation,
 	IndexScanDesc scan;
 	FmgrInfo   *procedure;
 
+	/* Temporary/unlogged relations are unavailable during recovery. */
+	if (!RelationNeedsWAL(indexRelation) && RecoveryInProgress())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot access temporary or unlogged tables during recovery")));
+
 	RELATION_CHECKS;
 	GET_REL_PROCEDURE(ambeginscan);
 
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index cbb61bb..b6309e6 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -288,7 +288,7 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	 */
 	smgrcreate(smgr, forknum, true);
 
-	lastblock = smgrnblocks(smgr, forknum);
+	lastblock = smgrnblocks(smgr, forknum, false);
 
 	if (blkno < lastblock)
 	{
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 60b66ec..5bb766f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7945,7 +7945,7 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
 	 */
 	use_wal = XLogIsNeeded() && relpersistence == RELPERSISTENCE_PERMANENT;
 
-	nblocks = smgrnblocks(src, forkNum);
+	nblocks = smgrnblocks(src, forkNum, false);
 
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index fd8ea45..2bf05c2 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -374,8 +374,16 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
 		case RELKIND_RELATION:
 		case RELKIND_INDEX:
 		case RELKIND_TOASTVALUE:
-			/* it has storage, ok to call the smgr */
-			curpages = RelationGetNumberOfBlocks(rel);
+			/*
+			 * These types of relation normally have storage - although it's
+			 * possible that, during recovery, an unlogged or temporary
+			 * relation which appears in the catalogs may have no file on
+			 * disk.  Since it's not appropriate to fail here, we just let
+			 * the smgr return 0 in such cases; the executor will throw an
+			 * error at run-time.
+			 */
+			curpages = RelationGetNumberOfBlocksInFork(rel, MAIN_FORKNUM,
+													   true);
 
 			/*
 			 * HACK: if the relation has never yet been vacuumed, use a
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f96685d..49f7ae8 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -306,7 +306,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 	/* Substitute proper block number if caller asked for P_NEW */
 	if (isExtend)
-		blockNum = smgrnblocks(smgr, forkNum);
+		blockNum = smgrnblocks(smgr, forkNum, false);
 
 	if (isLocalBuf)
 	{
@@ -1914,12 +1914,13 @@ FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
  *		Determines the current number of pages in the relation.
  */
 BlockNumber
-RelationGetNumberOfBlocksInFork(Relation relation, ForkNumber forkNum)
+RelationGetNumberOfBlocksInFork(Relation relation, ForkNumber forkNum,
+								bool missing_ok)
 {
 	/* Open it at the smgr level if not already done */
 	RelationOpenSmgr(relation);
 
-	return smgrnblocks(relation->rd_smgr, forkNum);
+	return smgrnblocks(relation->rd_smgr, forkNum, missing_ok);
 }
 
 /* ---------------------------------------------------------------------
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 1a5a874..452446d 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -298,7 +298,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	else
 	{
 		new_nfsmblocks = fsm_logical_to_physical(first_removed_address);
-		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM) <= new_nfsmblocks)
+		if (smgrnblocks(rel->rd_smgr, FSM_FORKNUM, false) <= new_nfsmblocks)
 			return;				/* nothing to do; the FSM was already smaller */
 	}
 
@@ -518,7 +518,7 @@ fsm_readbuf(Relation rel, FSMAddress addr, bool extend)
 	{
 		if (smgrexists(rel->rd_smgr, FSM_FORKNUM))
 			rel->rd_smgr->smgr_fsm_nblocks = smgrnblocks(rel->rd_smgr,
-														 FSM_FORKNUM);
+														 FSM_FORKNUM, false);
 		else
 			rel->rd_smgr->smgr_fsm_nblocks = 0;
 	}
@@ -583,7 +583,7 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
 		!smgrexists(rel->rd_smgr, FSM_FORKNUM))
 		smgrcreate(rel->rd_smgr, FSM_FORKNUM, false);
 
-	fsm_nblocks_now = smgrnblocks(rel->rd_smgr, FSM_FORKNUM);
+	fsm_nblocks_now = smgrnblocks(rel->rd_smgr, FSM_FORKNUM, false);
 
 	while (fsm_nblocks_now < fsm_nblocks)
 	{
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 5034a1d..66238eb 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -744,12 +744,17 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
  *		are present in the chain.
  */
 BlockNumber
-mdnblocks(SMgrRelation reln, ForkNumber forknum)
+mdnblocks(SMgrRelation reln, ForkNumber forknum, bool missing_ok)
 {
-	MdfdVec    *v = mdopen(reln, forknum, EXTENSION_FAIL);
+	MdfdVec    *v;
 	BlockNumber nblocks;
 	BlockNumber segno = 0;
 
+	v = mdopen(reln, forknum,
+			   missing_ok ? EXTENSION_RETURN_NULL : EXTENSION_FAIL);
+	if (!v)
+		return 0;
+
 	/*
 	 * Skip through any segments that aren't the last one, to avoid redundant
 	 * seeks on them.  We have previously verified that these segments are
@@ -815,7 +820,7 @@ mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
 	 * NOTE: mdnblocks makes sure we have opened all active segments, so that
 	 * truncation loop will get them all!
 	 */
-	curnblk = mdnblocks(reln, forknum);
+	curnblk = mdnblocks(reln, forknum, false);
 	if (nblocks > curnblk)
 	{
 		/* Bogus request ... but no complaint if InRecovery */
@@ -906,7 +911,7 @@ mdimmedsync(SMgrRelation reln, ForkNumber forknum)
 	 * NOTE: mdnblocks makes sure we have opened all active segments, so that
 	 * fsync loop will get them all!
 	 */
-	mdnblocks(reln, forknum);
+	mdnblocks(reln, forknum, false);
 
 	v = mdopen(reln, forknum, EXTENSION_FAIL);
 
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index a6610bf..a07fd86 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -55,7 +55,8 @@ typedef struct f_smgr
 										  BlockNumber blocknum, char *buffer);
 	void		(*smgr_write) (SMgrRelation reln, ForkNumber forknum,
 						 BlockNumber blocknum, char *buffer, bool skipFsync);
-	BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
+	BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum,
+											bool missing_ok);
 	void		(*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
 											  BlockNumber nblocks);
 	void		(*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
@@ -434,9 +435,10 @@ smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
  *					 supplied relation.
  */
 BlockNumber
-smgrnblocks(SMgrRelation reln, ForkNumber forknum)
+smgrnblocks(SMgrRelation reln, ForkNumber forknum, bool missing_ok)
 {
-	return (*(smgrsw[reln->smgr_which].smgr_nblocks)) (reln, forknum);
+	return (*(smgrsw[reln->smgr_which].smgr_nblocks)) (reln, forknum,
+													   missing_ok);
 }
 
 /*
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index b8fc87e..edd1674 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -178,7 +178,7 @@ extern void PrintBufferLeakWarning(Buffer buffer);
 extern void CheckPointBuffers(int flags);
 extern BlockNumber BufferGetBlockNumber(Buffer buffer);
 extern BlockNumber RelationGetNumberOfBlocksInFork(Relation relation,
-								ForkNumber forkNum);
+								ForkNumber forkNum, bool missing_ok);
 extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
 extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
@@ -186,7 +186,7 @@ extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
 extern void DropDatabaseBuffers(Oid dbid);
 
 #define RelationGetNumberOfBlocks(reln) \
-	RelationGetNumberOfBlocksInFork(reln, MAIN_FORKNUM)
+	RelationGetNumberOfBlocksInFork(reln, MAIN_FORKNUM, true)
 
 #ifdef NOT_USED
 extern void PrintPinnedBufs(void);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 13f7239..6878a53 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -90,7 +90,8 @@ extern void smgrread(SMgrRelation reln, ForkNumber forknum,
 		 BlockNumber blocknum, char *buffer);
 extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
 		  BlockNumber blocknum, char *buffer, bool skipFsync);
-extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
+extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum,
+			bool fail_if_missing);
 extern void smgrtruncate(SMgrRelation reln, ForkNumber forknum,
 			 BlockNumber nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
@@ -115,7 +116,8 @@ extern void mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	   char *buffer);
 extern void mdwrite(SMgrRelation reln, ForkNumber forknum,
 		BlockNumber blocknum, char *buffer, bool skipFsync);
-extern BlockNumber mdnblocks(SMgrRelation reln, ForkNumber forknum);
+extern BlockNumber mdnblocks(SMgrRelation reln, ForkNumber forknum,
+		  bool fail_if_missing);
 extern void mdtruncate(SMgrRelation reln, ForkNumber forknum,
 		   BlockNumber nblocks);
 extern void mdimmedsync(SMgrRelation reln, ForkNumber forknum);
#10 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Emanuel (#1)
Re: [BUGS] BUG #6041: Unlogged table was created bad in slave node

Excerpts from Robert Haas's message of Tue Jun 07 00:07:06 -0400 2011:

Did you intentionally fail to copy the list?

No, I noticed after I sent it ...

On Tue, Jun 7, 2011 at 12:03 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Robert Haas's message of Mon Jun 06 22:29:02 -0400 2011:

On Fri, Jun 3, 2011 at 1:01 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Robert Haas's message of Fri Jun 03 12:44:45 -0400 2011:

On Wed, Jun 1, 2011 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:

(4) It strikes me that it might be possible to address this problem a
bit more cleanly by allowing mdnblocks() and smgrnblocks() and
RelationGetNumberOfBlocksInFork() to take a boolean argument
indicating whether or not an error should be thrown if the underlying
physical file happens not to exist.  When no error is to be signaled,
we simply return 0 when the main fork doesn't exist, rather than
throwing an error.

If we don't want to gum this with the above-mentioned cruft, the other
obvious alternative here is to do nothing, and live with the
non-beauty of the resulting error message.

Option 4 seems reasonable to me ... can you get rid of the dupe
smgrnblocks call simultaneously?

What dup smgrnblocks call?

Err, weren't you saying in option (3) that mdnblocks was being called
twice during query planning?  If I'm talking nonsense feel free to
ignore me.

Oh. It is, but there's no way to avoid that...

Patch along these lines attached.

The declaration vs. definition of these functions seem contradictory --
is the third arg "missing_ok" or "fail_if_missing"?

Woops. I started with it as fail_if_missing and then decided it
should be missing_ok. I can fix that...

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#11 Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#6)
Re: BUG #6041: Unlogged table was created bad in slave node

On Wed, Jun 1, 2011 at 7:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 26, 2011 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Euler Taveira de Oliveira's message of jue may 26 12:00:05 -0400 2011:

I think we should emit the real cause in those cases, if possible (not too
much overhead). The message would be "Unlogged table content is not available
in standby server".

I guess what it should do is create an empty file in the slave.

Probably it should, because won't the table malfunction after the slave
is promoted to master, if there's no file at all there?  Or will the
process of coming live create an empty file even if there was none?

Coming live creates an empty file.

But Euler is pointing out a different issue, which is usability.  If the
slave just acts like the table is present but empty, we are likely to
get bug reports about that too.  An error telling you you aren't allowed
to access such a table on slaves would be more user-friendly, if we can
do it without too much pain.

I looked into this a bit.  A few observations:

(1) This problem is actually not confined to unlogged tables;
temporary tables have the same issue.  For example, if you create a
temporary table on the master and then, on the slave, do SELECT * FROM
pg_temp_3.hi_mom (or whatever the name of the temp schema where the
temp table is) you get the same error.  In fact I suspect if you took
a base backup that included the temporary relation and matched the
backend ID you could even manage to read out the old contents (modulo
any fun and exciting XID wraparound issues).  But the problem is of
course more noticeable for unlogged tables since they're not hidden
away in a special funny schema.

Seems like you're trying to fix the problem directly, which as you
say, has problems.

At some point we resolve from a word mentioned in the FROM clause to a
relfilenode.

Surely somewhere there we can notice it's unlogged before we end up
down in the guts of smgr?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#12 Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#11)
Re: BUG #6041: Unlogged table was created bad in slave node

On Tue, Jun 7, 2011 at 2:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Seems like you're trying to fix the problem directly, which as you
say, has problems.

At some point we resolve from a word mentioned in the FROM clause to a
relfilenode.

Surely somewhere there we can notice it's unlogged before we end up
down in the guts of smgr?

Probably. I guess the question is whether we want this to fail in (a)
the parser, (b) the planner, or (c) the executor. I'm not quite sure
what is best, but if I had to guess I would have picked (c) in
preference to (b) in preference to (a), and you seem to be proposing
(a). Having parserOpenTable() or transformSelectStmt() or some such
place barf doesn't feel right - it's not the job of those functions to
decide whether the query can actually be executed at the moment, just
whether it's well-formed. Failing in the planner seems better, but it
seems like a crude hack to have estimate_rel_size() be the place that
bails out just because that's where we happen to call smgrnblocks().
Also, it seems like we oughtn't error out if someone wants to, say,
PREPARE a query while running in HS mode and then wait until after the
server is promoted to EXECUTE it, which won't work if we fail in the
planner. So that led me to the approach I took in the patch I posted
last night: tweak estimate_rel_size() just enough so it doesn't fail,
and then fail for real inside the executor.

This is not to say I'm not open to other ideas, but just to give a
brief history of how I ended up where I am.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#12)
Re: BUG #6041: Unlogged table was created bad in slave node

Excerpts from Robert Haas's message of mar jun 07 08:16:01 -0400 2011:

On Tue, Jun 7, 2011 at 2:57 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Seems like you're trying to fix the problem directly, which as you
say, has problems.

At some point we resolve from a word mentioned in the FROM clause to a
relfilenode.

Surely somewhere there we can notice it's unlogged before we end up
down in the guts of smgr?

Probably. I guess the question is whether we want this to fail in (a)
the parser, (b) the planner, or (c) the executor. I'm not quite sure
what is best, but if I had to guess I would have picked (c) in
preference to (b) in preference to (a), and you seem to be proposing
(a). Having parserOpenTable() or transformSelectStmt() or some such
place barf doesn't feel right - it's not the job of those functions to
decide whether the query can actually be executed at the moment, just
whether it's well-formed.

Really? I thought it was the job of the parse analysis phase to figure
out if table and column names are valid or not, and such. If that's the
case, wouldn't it make sense to disallow usage of a table that doesn't
"exist" in a certain sense?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#13)
Re: BUG #6041: Unlogged table was created bad in slave node

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Robert Haas's message of mar jun 07 08:16:01 -0400 2011:

Probably. I guess the question is whether we want this to fail in (a)
the parser, (b) the planner, or (c) the executor.

Really? I thought it was the job of the parse analysis phase to figure
out if table and column names are valid or not, and such. If that's the
case, wouldn't it make sense to disallow usage of a table that doesn't
"exist" in a certain sense?

If you hope ever to support the proposed UNLOGGED-to-LOGGED or vice
versa table state changes, you don't want to be testing this in the
parser. It has to be done at plan or execute time.

regards, tom lane

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)
Re: BUG #6041: Unlogged table was created bad in slave node

Robert Haas <robertmhaas@gmail.com> writes:

Patch along these lines attached.

Frankly, I find this quite ugly, and much prefer the general approach of
your previous patch in <BANLkTim433vF5HWjbJ0FSWm_-xA8DDaGNg@mail.gmail.com>.

However, I don't like where you put the execution-time test there. I'd
put it in ExecOpenScanRelation instead, so that it covers both seqscan
and indexscan accesses.

regards, tom lane

#16 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#9)
Re: BUG #6041: Unlogged table was created bad in slave node

On Tuesday, June 07, 2011 04:29:02 Robert Haas wrote:

On Fri, Jun 3, 2011 at 1:01 PM, Alvaro Herrera

<alvherre@commandprompt.com> wrote:

Excerpts from Robert Haas's message of vie jun 03 12:44:45 -0400 2011:

On Wed, Jun 1, 2011 at 2:28 PM, Robert Haas <robertmhaas@gmail.com>

wrote:

(4) It strikes me that it might be possible to address this problem a
bit more cleanly by allowing mdnblocks() and smgrnblocks() and
RelationGetNumberOfBlocksInFork() to take a boolean argument
indicating whether or not an error should be thrown if the underlying
physical file happens not to exist. When no error is to be signaled,
we simply return 0 when the main fork doesn't exist, rather than
throwing an error.

If we don't want to gum this with the above-mentioned cruft, the other
obvious alternative here is to do nothing, and live with the
non-beauty of the resulting error message.

Option 4 seems reasonable to me ... can you get rid of the dupe
smgrnblocks call simultaneously?

What dup smgrnblocks call?

Patch along these lines attached.
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index b8fc87e..edd1674 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -186,7 +186,7 @@ extern void DropRelFileNodeBuffers(RelFileNodeBackend
rnode,

extern void DropDatabaseBuffers(Oid dbid);

#define RelationGetNumberOfBlocks(reln) \

-	RelationGetNumberOfBlocksInFork(reln, MAIN_FORKNUM)
+	RelationGetNumberOfBlocksInFork(reln, MAIN_FORKNUM, true)

#ifdef NOT_USED
extern void PrintPinnedBufs(void);

That hunk seems a bit dangerous given that RelationGetNumberOfBlocks is
used in the executor. Maybe all the call sites are actually safe, but it
seems too fragile to me.
In my opinion RelationGetNumberOfBlocks should grow missing_ok as well.

Andres

#17 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#15)
Re: BUG #6041: Unlogged table was created bad in slave node

On Tue, Jun 7, 2011 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Patch along these lines attached.

Frankly, I find this quite ugly, and much prefer the general approach of
your previous patch in <BANLkTim433vF5HWjbJ0FSWm_-xA8DDaGNg@mail.gmail.com>.

However, I don't like where you put the execution-time test there.  I'd
put it in ExecOpenScanRelation instead, so that it covers both seqscan
and indexscan accesses.

Ah, OK. I was wondering if there was a better place. I'll do it that
way, then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18 Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#17)
1 attachment(s)
Re: BUG #6041: Unlogged table was created bad in slave node

On Tue, Jun 7, 2011 at 2:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jun 7, 2011 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Patch along these lines attached.

Frankly, I find this quite ugly, and much prefer the general approach of
your previous patch in <BANLkTim433vF5HWjbJ0FSWm_-xA8DDaGNg@mail.gmail.com>.

However, I don't like where you put the execution-time test there.  I'd
put it in ExecOpenScanRelation instead, so that it covers both seqscan
and indexscan accesses.

Ah, OK.  I was wondering if there was a better place.  I'll do it that
way, then.

I found a few other holes in my previous patch as well. I think this
plugs them all, but it's hard to be sure there aren't any other calls
to RelationGetNumberOfBlocks() that could bomb out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

reject-unlogged-during-recovery-v3.patch (application/octet-stream)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e71090f..437d74b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -24,6 +24,7 @@
 #include <unistd.h>
 
 #include "access/clog.h"
+#include "access/xlog.h"
 #include "access/multixact.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 073ef8d..03d7f18 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -802,6 +802,7 @@ ExecOpenScanRelation(EState *estate, Index scanrelid)
 {
 	Oid			reloid;
 	LOCKMODE	lockmode;
+	Relation	rel;
 
 	/*
 	 * Determine the lock type we need.  First, scan to see if target relation
@@ -829,7 +830,15 @@ ExecOpenScanRelation(EState *estate, Index scanrelid)
 
 	/* OK, open the relation and acquire lock as needed */
 	reloid = getrelid(scanrelid, estate->es_range_table);
-	return heap_open(reloid, lockmode);
+	rel = heap_open(reloid, lockmode);
+
+	/* We can't scan temporary or unlogged relations during recovery. */
+	if (!RelationNeedsWAL(rel) && RecoveryInProgress())
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot access temporary or unlogged relations during recovery")));
+
+	return rel;
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index fd8ea45..2555d8f 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -106,7 +106,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	 * must leave it zero for now to avoid bollixing the total_table_pages
 	 * calculation.
 	 */
-	if (!inhparent)
+	if (!inhparent && (RelationNeedsWAL(relation) || !RecoveryInProgress()))
 		estimate_rel_size(relation, rel->attr_widths - rel->min_attr,
 						  &rel->pages, &rel->tuples);
 
@@ -321,7 +321,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			 */
 			if (info->indpred == NIL)
 			{
-				info->pages = RelationGetNumberOfBlocks(indexRelation);
+				if (RelationNeedsWAL(indexRelation) || !RecoveryInProgress())
+					info->pages = RelationGetNumberOfBlocks(indexRelation);
+				else
+					info->pages = 0;
 				info->tuples = rel->tuples;
 			}
 			else
@@ -374,8 +377,15 @@ estimate_rel_size(Relation rel, int32 *attr_widths,
 		case RELKIND_RELATION:
 		case RELKIND_INDEX:
 		case RELKIND_TOASTVALUE:
-			/* it has storage, ok to call the smgr */
-			curpages = RelationGetNumberOfBlocks(rel);
+			/*
+			 * It has storage; we can call the smgr.  But not for temporary
+			 * or unlogged relations during recovery, when such relations
+			 * are not available.
+			 */
+			if (RelationNeedsWAL(rel) || !RecoveryInProgress())
+				curpages = RelationGetNumberOfBlocks(rel);
+			else
+				curpages = 0;
 
 			/*
 			 * HACK: if the relation has never yet been vacuumed, use a
#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#18)
Re: BUG #6041: Unlogged table was created bad in slave node

Robert Haas <robertmhaas@gmail.com> writes:

I found a few other holes in my previous patch as well. I think this
plugs them all, but it's hard to be sure there aren't any other calls
to RelationGetNumberOfBlocks() that could bomb out.

[ squint... ] Do we need those additional tests in plancat.c? I
haven't paid attention to whether we support unlogged indexes on logged
tables, but if we do, protecting the RelationGetNumberOfBlocks() call is
the least of your worries. You ought to be fixing things so the planner
won't consider the index valid at all (cf. the indisvalid test at line
165). Similarly, the change in estimate_rel_size seems to be at an
awfully low level, akin to locking the barn door after the horses are
out. What code path are you thinking will reach there on an unlogged
table?

It might be that it'd be best just to have both the planner and executor
throwing errors on unlogged tables, rather than rejiggering pieces of
the planner to sort-of not fail on an unlogged table.

regards, tom lane

#20 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#19)
Re: BUG #6041: Unlogged table was created bad in slave node

On Tue, Jun 7, 2011 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I found a few other holes in my previous patch as well.  I think this
plugs them all, but it's hard to be sure there aren't any other calls
to RelationGetNumberOfBlocks() that could bomb out.

[ squint... ]  Do we need those additional tests in plancat.c?  I
haven't paid attention to whether we support unlogged indexes on logged
tables, but if we do, protecting the RelationGetNumberOfBlocks() call is
the least of your worries.  You ought to be fixing things so the planner
won't consider the index valid at all (cf. the indisvalid test at line
165).

Right now, RelationNeedsWAL() is always the same for a table and for
an index belonging to that table. That is, indexes on temporary
tables are temporary; indexes on unlogged tables are unlogged; indexes
on permanent tables are permanent. But I agree that's something we'll
have to deal with if and when someone implements unlogged indexes on
logged tables. (Though frankly I hope someone will come up with a
better name for that; else it's going to be worse than
constraint_exclusion vs. exclusion constraints.)

Similarly, the change in estimate_rel_size seems to be at an
awfully low level, akin to locking the barn door after the horses are
out.  What code path are you thinking will reach there on an unlogged
table?

Well, it gets there; I found this out empirically.
get_relation_info() calls it in two different places. Actually, I see
now that the v3 patch has a few leftovers: the test in
estimate_relation_size() makes the first of the two checks in
get_relaton_info() redundant -- but the second hunk in
get_relation_info() is needed, because there it calls
RelationGetNumberOfBlocks() directly. This is why I thought it might
be better to provide a version of RelationGetNumberOfBlocks() that
doesn't fail if the file is missing, instead of trying to plug these
holes one by one.

It might be that it'd be best just to have both the planner and executor
throwing errors on unlogged tables, rather than rejiggering pieces of
the planner to sort-of not fail on an unlogged table.

Mmm, that's not a bad thought either. Although I think if we can be
certain that the planner will error out, the executor checks aren't
necessary. It would disallow preparing a statement and then executing
it after promotion, but that doesn't seem terribly important. Any
idea where to put the check?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#20)
Re: BUG #6041: Unlogged table was created bad in slave node

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jun 7, 2011 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It might be that it'd be best just to have both the planner and executor
throwing errors on unlogged tables, rather than rejiggering pieces of
the planner to sort-of not fail on an unlogged table.

Mmm, that's not a bad thought either. Although I think if we can be
certain that the planner will error out, the executor checks aren't
necessary. It would disallow preparing a statement and then executing
it after promotion, but that doesn't seem terribly important. Any
idea where to put the check?

Well, I'd recommend keeping the test in ExecOpenScanRelation, since it's
cheap insurance against the situation changing since the plan was made.
But for the planner, why not just put the same kind of test in
get_relation_info, just after it does heap_open?

regards, tom lane

#22 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#21)
Re: BUG #6041: Unlogged table was created bad in slave node

On Tue, Jun 7, 2011 at 5:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jun 7, 2011 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It might be that it'd be best just to have both the planner and executor
throwing errors on unlogged tables, rather than rejiggering pieces of
the planner to sort-of not fail on an unlogged table.

Mmm, that's not a bad thought either.  Although I think if we can be
certain that the planner will error out, the executor checks aren't
necessary.  It would disallow preparing a statement and then executing
it after promotion, but that doesn't seem terribly important.  Any
idea where to put the check?

Well, I'd recommend keeping the test in ExecOpenScanRelation, since it's
cheap insurance against the situation changing since the plan was made.

Well, the system can't very well go back into recovery once it's
emerged. I guess it's possible that a table could be switched from
LOGGED to UNLOGGED during recovery though, in some hypothetical future
release. No one's proposed that feature yet, but since there IS a
proposal to go the other way it doesn't seem unlikely we may see the
other direction eventually too. I can't get too excited about
blocking this in multiple places just for the sake of preserving a
nicer error message in the face of a possible future feature
development, though.

But for the planner, why not just put the same kind of test in
get_relation_info, just after it does heap_open?

OK, let me try that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#23 Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#22)
Re: BUG #6041: Unlogged table was created bad in slave node

On Tue, Jun 7, 2011 at 6:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:

But for the planner, why not just put the same kind of test in
get_relation_info, just after it does heap_open?

OK, let me try that.

Seems to work beautifully, so committed that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company