Unlogged tables cannot be truncated twice

Started by Greg Sabino Mullanealmost 15 years ago16 messagesbugs
Jump to latest
#1Greg Sabino Mullane
greg@turnstep.com

Wow, this one took a bit to narrow down. Here's the failing case:

# create unlogged table foo (a text);
CREATE TABLE
# begin;
BEGIN
#* truncate table foo;
TRUNCATE TABLE
#* truncate table foo;
ERROR: could not create file "base/19131/19183_init": File exists

Very reproducible. The column types matter: if the only column
is an INT, for example, the problem does not occur.

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Greg Sabino Mullane (#1)
Re: Unlogged tables cannot be truncated twice

Excerpts from Greg Sabino Mullane's message of lun may 30 12:00:43 -0400 2011:

Wow, this one took a bit to narrow down. Here's the failing case:

# create unlogged table foo (a text);
CREATE TABLE
# begin;
BEGIN
#* truncate table foo;
TRUNCATE TABLE
#* truncate table foo;
ERROR: could not create file "base/19131/19183_init": File exists

Very reproducible. The column types matter: if the only column
is an INT, for example, the problem does not occur.

So 19183 is the toast table OID?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#3Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#2)
Re: Unlogged tables cannot be truncated twice

On Monday, May 30, 2011 11:18:20 PM Alvaro Herrera wrote:

Excerpts from Greg Sabino Mullane's message of lun may 30 12:00:43 -0400

2011:

Wow, this one took a bit to narrow down. Here's the failing case:

# create unlogged table foo (a text);
CREATE TABLE
# begin;
BEGIN
#* truncate table foo;
TRUNCATE TABLE
#* truncate table foo;
ERROR: could not create file "base/19131/19183_init": File exists

Very reproducible. The column types matter: if the only column
is an INT, for example, the problem does not occur.

So 19183 is the toast table OID?

Nope. Its any index.

You can provoke it with:
begin;create unlogged table foo (a int primary key);truncate foo;rollback;
or
begin;create unlogged table foo (a text);truncate foo;rollback;

The problem is this tidbit from tablecmds.c's ExecuteTruncate:

/*
* Normally, we need a transaction-safe truncation here. However, if
* the table was either created in the current (sub)transaction or has
* a new relfilenode in the current (sub)transaction, then we can just
* truncate it in-place, because a rollback would cause the whole
* table or the current physical file to be thrown away anyway.
*/
if (rel->rd_createSubid == mySubid ||
rel->rd_newRelfilenodeSubid == mySubid)
{
/* Immediate, non-rollbackable truncation is OK */
heap_truncate_one_rel(rel);
}

in combination with index.c's index_build:

/*
* If this is an unlogged index, we need to write out an init fork for it.
*/
if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
{
RegProcedure ambuildempty = indexRelation->rd_am->ambuildempty;

RelationOpenSmgr(indexRelation);
smgrcreate(indexRelation->rd_smgr, INIT_FORKNUM, false);
OidFunctionCall1(ambuildempty, PointerGetDatum(indexRelation));
}

Andres

#4Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Andres Freund (#3)
Re: Unlogged tables cannot be truncated twice

2011/5/31 Andres Freund <andres@anarazel.de>:

On Monday, May 30, 2011 11:18:20 PM Alvaro Herrera wrote:

Excerpts from Greg Sabino Mullane's message of lun may 30 12:00:43 -0400

2011:

Wow, this one took a bit to narrow down. Here's the failing case:

# create unlogged table foo (a text);
CREATE TABLE
# begin;
BEGIN
#* truncate table foo;
TRUNCATE TABLE
#* truncate table foo;
ERROR:  could not create file "base/19131/19183_init": File exists

Very reproducible. The column types matter: if the only column
is an INT, for example, the problem does not occur.

So 19183 is the toast table OID?

Nope. Its any index.

You can provoke it with:
begin;create unlogged table foo (a int primary key);truncate foo;rollback;
or
begin;create unlogged table foo (a text);truncate foo;rollback;

The problem is this tidbit from tablecmds.c's ExecuteTruncate:

               /*
                * Normally, we need a transaction-safe truncation here.  However, if
                * the table was either created in the current (sub)transaction or has
                * a new relfilenode in the current (sub)transaction, then we can just
                * truncate it in-place, because a rollback would cause the whole
                * table or the current physical file to be thrown away anyway.
                */
               if (rel->rd_createSubid == mySubid ||
                       rel->rd_newRelfilenodeSubid == mySubid)
               {
                       /* Immediate, non-rollbackable truncation is OK */
                       heap_truncate_one_rel(rel);
               }

in combination with index.c's index_build:

       /*
        * If this is an unlogged index, we need to write out an init fork for it.
        */
       if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
       {
               RegProcedure ambuildempty = indexRelation->rd_am->ambuildempty;

               RelationOpenSmgr(indexRelation);
               smgrcreate(indexRelation->rd_smgr, INIT_FORKNUM, false);
               OidFunctionCall1(ambuildempty, PointerGetDatum(indexRelation));
       }

I remove my own explanations as we conclude on the same thing.
Attached is the fix by adding a (!reindex) in the index.c if().

It looks enough from early testing.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Attachments:

fix_reindex_unlog.patchtext/x-patch; charset=US-ASCII; name=fix_reindex_unlog.patchDownload+1-1
#5Andres Freund
andres@anarazel.de
In reply to: Cédric Villemain (#4)
Re: Unlogged tables cannot be truncated twice

On Tuesday, May 31, 2011 01:56:05 AM Cédric Villemain wrote:

I remove my own explanations as we conclude on the same thing.
Attached is the fix by adding a (!reindex) in the index.c if().

Thats imo wrong because it will break a plain REINDEX?

I think one possible correct fix would be the attached:

Andres

Attachments:

0001-Fix-file-existing-errors-uppon-recreation-of-an-unlo.patchtext/x-patch; charset=UTF-8; name=0001-Fix-file-existing-errors-uppon-recreation-of-an-unlo.patchDownload+6-1
#6Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#5)
Re: Unlogged tables cannot be truncated twice

On Tuesday, May 31, 2011 02:14:00 AM Andres Freund wrote:

On Tuesday, May 31, 2011 01:56:05 AM Cédric Villemain wrote:

I remove my own explanations as we conclude on the same thing.
Attached is the fix by adding a (!reindex) in the index.c if().

Thats imo wrong because it will break a plain REINDEX?

I think one possible correct fix would be the attached:

My version was wrong as well because it did not observe RelationTruncate's
nblocks argument. That function is used to "shorten" the relation in vacuum.
So dropping the init fork there is not a good idea.

So I think it is the simpler version of simply checking the existance of the
fork before creating is ok.

Andres

Attachments:

0001-Fix-file-existing-errors-uppon-recreation-of-an-unlo_v2.patchtext/x-patch; charset=UTF-8; name=0001-Fix-file-existing-errors-uppon-recreation-of-an-unlo_v2.patchDownload+2-4
#7Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#6)
Re: Unlogged tables cannot be truncated twice

On Tuesday, May 31, 2011 02:35:58 AM Andres Freund wrote:

On Tuesday, May 31, 2011 02:14:00 AM Andres Freund wrote:

On Tuesday, May 31, 2011 01:56:05 AM Cédric Villemain wrote:

I remove my own explanations as we conclude on the same thing.
Attached is the fix by adding a (!reindex) in the index.c if().

Thats imo wrong because it will break a plain REINDEX?

I think one possible correct fix would be the attached:

My version was wrong as well because it did not observe RelationTruncate's
nblocks argument. That function is used to "shorten" the relation in
vacuum. So dropping the init fork there is not a good idea.

So I think it is the simpler version of simply checking the existance of
the fork before creating is ok.

Gna. gnargl. Coffe. Bed. ;)

There was an accidental hunk I added while removing some whitespace. That
would not have been good on a real commit.

Argh.

Attachments:

0001-Fix-file-existing-errors-uppon-recreation-of-an-unlo_v3.patchtext/x-patch; charset=UTF-8; name=0001-Fix-file-existing-errors-uppon-recreation-of-an-unlo_v3.patchDownload+2-2
0001-Fix-file-existing-errors-uppon-recreation-of-an-unlo_v3.patchtext/x-patch; charset=UTF-8; name=0001-Fix-file-existing-errors-uppon-recreation-of-an-unlo_v3.patchDownload+2-2
#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#7)
Re: Unlogged tables cannot be truncated twice

Excerpts from Andres Freund's message of lun may 30 20:47:49 -0400 2011:

On Tuesday, May 31, 2011 02:35:58 AM Andres Freund wrote:

On Tuesday, May 31, 2011 02:14:00 AM Andres Freund wrote:

On Tuesday, May 31, 2011 01:56:05 AM Cédric Villemain wrote:

I remove my own explanations as we conclude on the same thing.
Attached is the fix by adding a (!reindex) in the index.c if().

Thats imo wrong because it will break a plain REINDEX?

I think one possible correct fix would be the attached:

My version was wrong as well because it did not observe RelationTruncate's
nblocks argument. That function is used to "shorten" the relation in
vacuum. So dropping the init fork there is not a good idea.

So I think it is the simpler version of simply checking the existance of
the fork before creating is ok.

Hmm, I wonder if what we should be doing here is observe isreindex in
index_build to avoid creating the init fork. Doing smgr access at that
level seems wrong.

Gna. gnargl. Coffe. Bed. ;)

There was an accidental hunk I added while removing some whitespace. That
would not have been good on a real commit.

Argh.

Hah :-)

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#9Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#8)
Re: Unlogged tables cannot be truncated twice

On Tuesday, May 31, 2011 03:27:22 Alvaro Herrera wrote:

Excerpts from Andres Freund's message of lun may 30 20:47:49 -0400 2011:

On Tuesday, May 31, 2011 02:35:58 AM Andres Freund wrote:

On Tuesday, May 31, 2011 02:14:00 AM Andres Freund wrote:

On Tuesday, May 31, 2011 01:56:05 AM Cédric Villemain wrote:

I remove my own explanations as we conclude on the same thing.
Attached is the fix by adding a (!reindex) in the index.c if().

Thats imo wrong because it will break a plain REINDEX?

I think one possible correct fix would be the attached:

My version was wrong as well because it did not observe
RelationTruncate's nblocks argument. That function is used to
"shorten" the relation in vacuum. So dropping the init fork there is
not a good idea.

So I think it is the simpler version of simply checking the existance
of the fork before creating is ok.

Hmm, I wonder if what we should be doing here is observe isreindex in
index_build to avoid creating the init fork. Doing smgr access at that
level seems wrong.

isreindex doesn't contain the necessary informormation as its set doing a
REINDEX even though a new relfilenode is created and thus the fork needs to be
created.

It doesn't seem terribly clean do do the !smgrexists(), I aggree with you
there. On the other hand we are calling smgrcreate() two lines down anyway. I
personally don't realy like the placement of that piece of code very much.
Doing it in index_build seems to be the wrong place. I don't think there
really is a good place for it right now.

Andres

#10Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Andres Freund (#9)
Re: Unlogged tables cannot be truncated twice

2011/5/31 Andres Freund <andres@anarazel.de>:

On Tuesday, May 31, 2011 03:27:22 Alvaro Herrera wrote:

Excerpts from Andres Freund's message of lun may 30 20:47:49 -0400 2011:

On Tuesday, May 31, 2011 02:35:58 AM Andres Freund wrote:

On Tuesday, May 31, 2011 02:14:00 AM Andres Freund wrote:

On Tuesday, May 31, 2011 01:56:05 AM Cédric Villemain wrote:

I remove my own explanations as we conclude on the same thing.
Attached is the fix by adding a (!reindex)  in the index.c if().

Thats imo wrong because it will break a plain REINDEX?

I think one possible correct fix would be the attached:

My version was wrong as well because it  did not observe
RelationTruncate's nblocks argument. That function is used to
"shorten" the relation in vacuum. So dropping the init fork there is
not a good idea.

So I think it is the simpler version of simply checking the existance
of the fork before creating is ok.

Hmm, I wonder if what we should be doing here is observe isreindex in
index_build to avoid creating the init fork.  Doing smgr access at that
level seems wrong.

isreindex doesn't contain the necessary informormation as its set doing a
REINDEX even though a new relfilenode is created and thus the fork needs to be
created.

It doesn't seem terribly clean do do the !smgrexists(), I aggree with you
there. On the other hand we are calling smgrcreate() two lines down anyway. I
personally don't realy like the placement of that piece of code very much.
Doing it in index_build seems to be the wrong place. I don't think there
really is a good place for it right now.

Robert, I wonder if all the logic with INIT_FORK is only to simulate
truncate on server restart ?
I failled to understand why not simply use truncate around the server
start (when it is in recovery).

The current large amount of code changed just to be able to flush
unlogged files on server start looks crazy, if the only point is a
possible performance impact on server restart, can't we find
something more elegant ?

While here I also wonder why GiST unlogged are not supported ?!

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#11Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#10)
Re: Unlogged tables cannot be truncated twice

On Tue, May 31, 2011 at 7:04 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

Robert, I wonder if all the logic with INIT_FORK is only to simulate
truncate on server restart ?

That is correct.

I failled to understand why not simply use truncate around the server
start (when it is in recovery).

We need to do the equivalent of a truncate but without relying in any
way on the system catalogs. The startup process is not bound to a
database, and starting a separate worker process for each database
would be both very complicated and very expensive.

Also note that while truncate is very simple for a table (just zero
out the file) it's not so simple for an index (a zero-length file is
an invalid index, not an empty one).

While here I also wonder why GiST unlogged are not supported ?!

Because they use LSNs internally to guarantee proper synchronization,
which presumes that WAL records are being omitted. Temporary GIST
indexes were broken too, for the same reason, but Heikki fixed that
using GetXLogRecPtrForTemp(). We could engineer a similar solution
for unlogged GIST indexes using a shared counter that is saved in
pg_control across clean shutdowns, but I think that's a 9.2 project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#9)
Re: Unlogged tables cannot be truncated twice

2011/5/31 Andres Freund <andres@anarazel.de>:

On Tuesday, May 31, 2011 03:27:22 Alvaro Herrera wrote:

Excerpts from Andres Freund's message of lun may 30 20:47:49 -0400 2011:

On Tuesday, May 31, 2011 02:35:58 AM Andres Freund wrote:

On Tuesday, May 31, 2011 02:14:00 AM Andres Freund wrote:

On Tuesday, May 31, 2011 01:56:05 AM Cédric Villemain wrote:

I remove my own explanations as we conclude on the same thing.
Attached is the fix by adding a (!reindex)  in the index.c if().

Thats imo wrong because it will break a plain REINDEX?

I think one possible correct fix would be the attached:

My version was wrong as well because it  did not observe
RelationTruncate's nblocks argument. That function is used to
"shorten" the relation in vacuum. So dropping the init fork there is
not a good idea.

So I think it is the simpler version of simply checking the existance
of the fork before creating is ok.

Hmm, I wonder if what we should be doing here is observe isreindex in
index_build to avoid creating the init fork.  Doing smgr access at that
level seems wrong.

isreindex doesn't contain the necessary informormation as its set doing a
REINDEX even though a new relfilenode is created and thus the fork needs to be
created.

It doesn't seem terribly clean do do the !smgrexists(), I aggree with you
there. On the other hand we are calling smgrcreate() two lines down anyway. I
personally don't realy like the placement of that piece of code very much.
Doing it in index_build seems to be the wrong place. I don't think there
really is a good place for it right now.

I'm open to suggestions on how to rearrange this, but I think for
right now the approach you proposed upthread (add a smgrexists() test)
is probably the simplest way to fix this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#11)
Re: Unlogged tables cannot be truncated twice

2011/6/1 Robert Haas <robertmhaas@gmail.com>:

On Tue, May 31, 2011 at 7:04 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

Robert, I wonder if all the logic with INIT_FORK is only to simulate
truncate on server restart ?

That is correct.

I failled to understand why not simply use truncate around the server
start (when it is in recovery).

We need to do the equivalent of a truncate but without relying in any
way on the system catalogs.  The startup process is not bound to a
database, and starting a separate worker process for each database
would be both very complicated and very expensive.

Also note that while truncate is very simple for a table (just zero
out the file) it's not so simple for an index (a zero-length file is
an invalid index, not an empty one).

While here I also wonder why GiST unlogged are not supported ?!

Because they use LSNs internally to guarantee proper synchronization,
which presumes that WAL records are being omitted.  Temporary GIST
indexes were broken too, for the same reason, but Heikki fixed that
using GetXLogRecPtrForTemp().  We could engineer a similar solution
for unlogged GIST indexes using a shared counter that is saved in
pg_control across clean shutdowns, but I think that's a 9.2 project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

All right, things under a new light!
Thank you.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#14Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#12)
Re: Unlogged tables cannot be truncated twice

On Wed, Jun 1, 2011 at 1:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:

2011/5/31 Andres Freund <andres@anarazel.de>:

On Tuesday, May 31, 2011 03:27:22 Alvaro Herrera wrote:

Excerpts from Andres Freund's message of lun may 30 20:47:49 -0400 2011:

On Tuesday, May 31, 2011 02:35:58 AM Andres Freund wrote:

On Tuesday, May 31, 2011 02:14:00 AM Andres Freund wrote:

On Tuesday, May 31, 2011 01:56:05 AM Cédric Villemain wrote:

I remove my own explanations as we conclude on the same thing.
Attached is the fix by adding a (!reindex)  in the index.c if().

Thats imo wrong because it will break a plain REINDEX?

I think one possible correct fix would be the attached:

My version was wrong as well because it  did not observe
RelationTruncate's nblocks argument. That function is used to
"shorten" the relation in vacuum. So dropping the init fork there is
not a good idea.

So I think it is the simpler version of simply checking the existance
of the fork before creating is ok.

Hmm, I wonder if what we should be doing here is observe isreindex in
index_build to avoid creating the init fork.  Doing smgr access at that
level seems wrong.

isreindex doesn't contain the necessary informormation as its set doing a
REINDEX even though a new relfilenode is created and thus the fork needs to be
created.

It doesn't seem terribly clean do do the !smgrexists(), I aggree with you
there. On the other hand we are calling smgrcreate() two lines down anyway. I
personally don't realy like the placement of that piece of code very much.
Doing it in index_build seems to be the wrong place. I don't think there
really is a good place for it right now.

I'm open to suggestions on how to rearrange this, but I think for
right now the approach you proposed upthread (add a smgrexists() test)
is probably the simplest way to fix this.

Done. Your patch tested for FSM_FORKNUM instead of INIT_FORKNUM,
which seemed wrong, so I changed it. I also added comments.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#14)
Re: Unlogged tables cannot be truncated twice

On Thursday, June 02, 2011 07:31:33 PM Robert Haas wrote:

On Wed, Jun 1, 2011 at 1:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:

2011/5/31 Andres Freund <andres@anarazel.de>:

On Tuesday, May 31, 2011 03:27:22 Alvaro Herrera wrote:

Excerpts from Andres Freund's message of lun may 30 20:47:49 -0400 2011:

On Tuesday, May 31, 2011 02:35:58 AM Andres Freund wrote:

On Tuesday, May 31, 2011 02:14:00 AM Andres Freund wrote:

On Tuesday, May 31, 2011 01:56:05 AM Cédric Villemain wrote:

I remove my own explanations as we conclude on the same thing.
Attached is the fix by adding a (!reindex) in the index.c
if().

Thats imo wrong because it will break a plain REINDEX?

I think one possible correct fix would be the attached:

My version was wrong as well because it did not observe
RelationTruncate's nblocks argument. That function is used to
"shorten" the relation in vacuum. So dropping the init fork there
is not a good idea.

So I think it is the simpler version of simply checking the
existance of the fork before creating is ok.

Hmm, I wonder if what we should be doing here is observe isreindex in
index_build to avoid creating the init fork. Doing smgr access at that
level seems wrong.

isreindex doesn't contain the necessary informormation as its set doing
a REINDEX even though a new relfilenode is created and thus the fork
needs to be created.

It doesn't seem terribly clean do do the !smgrexists(), I aggree with
you there. On the other hand we are calling smgrcreate() two lines down
anyway. I personally don't realy like the placement of that piece of
code very much. Doing it in index_build seems to be the wrong place. I
don't think there really is a good place for it right now.

I'm open to suggestions on how to rearrange this, but I think for
right now the approach you proposed upthread (add a smgrexists() test)
is probably the simplest way to fix this.

Done. Your patch tested for FSM_FORKNUM instead of INIT_FORKNUM,
which seemed wrong, so I changed it. I also added comments.

Wow. I don't think I ever made so many stupid mistakes when doing a two line
change. I abviously wasn't really awake that evening. As evidenced excessively
in that thread ;)

Thanks,

Andres

#16Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#15)
Re: Unlogged tables cannot be truncated twice

On Thu, Jun 2, 2011 at 2:52 PM, Andres Freund <andres@anarazel.de> wrote:

Wow. I don't think I ever made so many stupid mistakes when doing a two line
change. I abviously wasn't really awake that evening. As evidenced excessively
in that thread ;)

I think I once found two bugs in a one-line patch...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company