drop tablespace error: invalid argument

Started by Jan Otto over 16 years ago, 21 messages
#1Jan Otto
asche@me.com

hello hackers,

i have problems dropping an existing empty tablespace. here is a
reduced example:

AscheMobil:~ asche$ cat test2.sql
CREATE TABLESPACE testspace LOCATION '/opt/postgresql/data2';
CREATE SCHEMA testschema;
CREATE TABLE testschema.foobar (id int) TABLESPACE testspace;
DROP SCHEMA testschema CASCADE;
DROP TABLESPACE testspace;

AscheMobil:~ asche$ /opt/postgresql/bin/psql asche <test2.sql
CREATE TABLESPACE
CREATE SCHEMA
CREATE TABLE
NOTICE: drop cascades to table testschema.foobar
DROP SCHEMA
ERROR: could not read directory "pg_tblspc/16464": Invalid argument
STATEMENT: DROP TABLESPACE testspace;
ERROR: could not read directory "pg_tblspc/16464": Invalid argument

AscheMobil:~ asche$ ls -l /opt/postgresql/data/pg_tblspc/
total 8
lrwx------ 1 asche staff 21 Aug 16 13:08 16464 -> /opt/postgresql/data2

AscheMobil:~ asche$ ls -l /opt/postgresql/data2/
total 8
-rw------- 1 asche staff 4 Aug 16 13:08 PG_VERSION

AscheMobil:~ asche$ id
uid=501(asche) gid=20(staff) groups=20(staff),204(_developer),100(_lpoperator),98(_lpadmin),81(_appserveradm),80(admin),79(_appserverusr),61(localaccounts),12(everyone),402(com.apple.sharepoint.group.1),401(com.apple.access_screensharing)

If I don't create the table testschema.foobar, I can drop the tablespace
without problems. There is another effect I wonder about: when I
execute 'DROP TABLESPACE testspace;' twice at the end of the script,
the second DROP statement drops the tablespace correctly.

AscheMobil:~ asche$ echo 'DROP TABLESPACE testspace;'>>test2.sql
AscheMobil:~ asche$ cat test2.sql
CREATE TABLESPACE testspace LOCATION '/opt/postgresql/data2';
CREATE SCHEMA testschema;
CREATE TABLE testschema.foobar (id int) TABLESPACE testspace;
DROP SCHEMA testschema CASCADE;
DROP TABLESPACE testspace;
DROP TABLESPACE testspace;

AscheMobil:~ asche$ /opt/postgresql/bin/psql asche < test2.sql
CREATE TABLESPACE
CREATE SCHEMA
CREATE TABLE
NOTICE: drop cascades to table testschema.foobar
DROP SCHEMA
ERROR: could not read directory "pg_tblspc/16469": Invalid argument
STATEMENT: DROP TABLESPACE testspace;
ERROR: could not read directory "pg_tblspc/16469": Invalid argument
DROP TABLESPACE

AscheMobil:~ asche$ ls -l /opt/postgresql/data2/
AscheMobil:~ asche$ ls -l /opt/postgresql/data/pg_tblspc/

AscheMobil:~ asche$ /opt/postgresql/bin/psql asche -c 'select version()'
version
----------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.4.0 on i386-apple-darwin10.0.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646), 64-bit
(1 row)

this is the original postgresql-8.4.0 source package from http://www.postgresql.org/ftp/source/v8.4.0/
compiled with:
./configure --enable-debug --with-openssl --with-perl --with-python --with-tcl --with-libxml --with-libxslt --with-zlib --prefix=/opt/postgresql

it would be nice if somebody can take a look at this.

regards, jan otto

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Otto (#1)
Re: drop tablespace error: invalid argument

Jan Otto <asche@me.com> writes:

ERROR: could not read directory "pg_tblspc/16464": Invalid argument
STATEMENT: DROP TABLESPACE testspace;

Hmm ... can't reproduce this here, not even on OSX. From the version
number I suspect you are using unreleased Snow Leopard. I'd venture
it's a newly-introduced kernel bug and you need to talk to Apple about
it.

regards, tom lane

#3Jan Otto
asche@me.com
In reply to: Tom Lane (#2)
Re: drop tablespace error: invalid argument

ERROR: could not read directory "pg_tblspc/16464": Invalid argument
STATEMENT: DROP TABLESPACE testspace;

Hmm ... can't reproduce this here, not even on OSX. From the version
number I suspect you are using unreleased Snow Leopard. I'd venture
it's a newly-introduced kernel bug and you need to talk to Apple about
it.

Thank you Tom. I will file a bugreport at Apple.

regards, jan otto

#4Jan Otto
asche@me.com
In reply to: Tom Lane (#2)
Re: drop tablespace error: invalid argument

On Aug 16, 2009, at 8:25 PM, Tom Lane wrote:

Jan Otto <asche@me.com> writes:

ERROR: could not read directory "pg_tblspc/16464": Invalid argument
STATEMENT: DROP TABLESPACE testspace;

Hmm ... can't reproduce this here, not even on OSX. From the version
number I suspect you are using unreleased Snow Leopard. I'd venture
it's a newly-introduced kernel bug and you need to talk to Apple about
it.

regards, tom lane

I dug around a bit in the PostgreSQL source code to build a
self-contained test case for Apple and found that Apple's
implementation of readdir() is buggy: readdir() fails under some
circumstances. So I have built a patch against current pgsql HEAD to
work around the issue. If the bug in readdir() makes it into the final
release of Snow Leopard, we have a solution.

This patch basically frees dirdesc and rereads the tablespace location
whenever a subdirectory has been deleted from the tablespace. This is
the place where Snow Leopard fails to read the next entry with readdir().

regards, jan otto

diff -c -r1.61 tablespace.c
*** pgsql/src/backend/commands/tablespace.c     22 Jan 2009 20:16:02 -0000      1.61
--- pgsql/src/backend/commands/tablespace.c     17 Aug 2009 22:36:01 -0000
***************
*** 611,616 ****
--- 611,623 ----
                                          errmsg("could not remove directory \"%s\": %m",
                                                         subfile)));
+               /*
+                * The following two lines work around a bug in Mac OS X Snow Leopard (Build 10A432)
+                * readdir() implementation. We free dirdesc and reread location from start.
+                */
+               FreeDir(dirdesc);
+               dirdesc = AllocateDir(location);
+
                 pfree(subfile);
         }
#5Jan Otto
asche@me.com
In reply to: Jan Otto (#4)
Re: drop tablespace error: invalid argument

Jan Otto <asche@me.com> writes:

ERROR: could not read directory "pg_tblspc/16464": Invalid argument
STATEMENT: DROP TABLESPACE testspace;

I have digged a bit around in the source code of postgresql to build a
self contained test-case for Apple and found that the implementation
of Apples readdir() is buggy. readdir() fails under some circumstances.
So i have build a patch against current pgsql's HEAD to work around
the issue. If the bug in readdir() goes into the final release snow
leopard we have a solution.

This patch basically frees dirdesc and rereads the tablespace location
in case a subdirectory was deleted from the tablespace. this is the
place where snow leopard fails to read the next entry with readdir().

The bug in readdir() is present in the final Snow Leopard release too.
Anybody with Snow Leopard installed can check this simply by running the
regression tests (make check): the tablespace regression test fails.

The patch I sent in works around the issue. If it is not acceptable to
reread the tablespace directory after every delete, I can rewrite the
workaround. Would it be preferable to write all entries of the directory
into an array and then loop through that array, instead of looping with
ReadDir()?

regards, jan otto

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Otto (#5)
Re: drop tablespace error: invalid argument

Jan Otto <asche@me.com> writes:

The bug in readdir() appeared in the final snow leopard too. Anybody
with Snow Leopard installed can check this, with simply doing the
regression tests (make check). The tablespace regression test is
failing.

The patch i sent in works around the issue. if it is not acceptable to
reread the tablespace-directory after every delete i can rewrite the
workaround. Probably it is preferred that we write all entries of the
directory into an array and looping through that array after that
instead of looping with ReadDir()?

I'm not really eager to put in a workaround for such a basic OS bug,
especially not when the odds are good that it'll be fixed in 10.6.1.
Let's wait a little bit for Apple to get their act together.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: drop tablespace error: invalid argument

I wrote:

Jan Otto <asche@me.com> writes:

The bug in readdir() appeared in the final snow leopard too. Anybody
with Snow Leopard installed can check this, with simply doing the
regression tests (make check). The tablespace regression test is
failing.

The patch i sent in works around the issue. if it is not acceptable to
reread the tablespace-directory after every delete i can rewrite the
workaround. Probably it is preferred that we write all entries of the
directory into an array and looping through that array after that
instead of looping with ReadDir()?

I'm not really eager to put in a workaround for such a basic OS bug,
especially not when the odds are good that it'll be fixed in 10.6.1.
Let's wait a little bit for Apple to get their act together.

Well, 10.6.1 is out and it's still got the readdir() bug :-(.

It's likely that there'll be a 10.6.2 before very long, but I wonder if
we should go ahead with some sort of hack; at least as a temporary fix
in CVS HEAD so that we can get more useful buildfarm reports from Snow
Leopard machines.

Comments?

regards, tom lane

#8David E. Wheeler
david@kineticode.com
In reply to: Tom Lane (#7)
Re: drop tablespace error: invalid argument

On Sep 11, 2009, at 12:42 PM, Tom Lane wrote:

Well, 10.6.1 is out and it's still got the readdir() bug :-(.

Has someone filed a bug report about this with Apple?

https://bugreport.apple.com/cgi-bin/WebObjects/RadarWeb.woa

Best,

David

#9Robert Creager
robert@logicalchaos.org
In reply to: David E. Wheeler (#8)
Re: drop tablespace error: invalid argument

On Sep 11, 2009, at 2:35 PM, David E. Wheeler wrote:

On Sep 11, 2009, at 12:42 PM, Tom Lane wrote:

Well, 10.6.1 is out and it's still got the readdir() bug :-(.

Has someone filed a bug report about this with Apple?

https://bugreport.apple.com/cgi-bin/WebObjects/RadarWeb.woa

If no one has (yet), I'll be happy to. I just submitted one for an
AirPort problem... I guess I'll whip up an example program and just
submit it anyway... Anyone already written one?

Later,
Rob

#10Robert Creager
robert@logicalchaos.org
In reply to: David E. Wheeler (#8)
Re: drop tablespace error: invalid argument

On Sep 11, 2009, at 2:35 PM, David E. Wheeler wrote:

On Sep 11, 2009, at 12:42 PM, Tom Lane wrote:

Well, 10.6.1 is out and it's still got the readdir() bug :-(.

Has someone filed a bug report about this with Apple?

https://bugreport.apple.com/cgi-bin/WebObjects/RadarWeb.woa

Look at the history of this thread, and it's already submitted:

http://www.nabble.com/drop-tablespace-error:-invalid-argument-td24992634.html

Later,
Rob

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Otto (#4)
Re: drop tablespace error: invalid argument

Jan Otto <asche@me.com> writes:

This patch basically frees dirdesc and rereads the tablespace location
in case a subdirectory was deleted from the tablespace. this is the
place where snow leopard fails to read the next entry with readdir().

I've applied this patch in HEAD only for the moment. I hope that
Apple will have fixed their bug before the next set of PG back-branch
updates come out --- if not, we'll probably have to back-patch.

regards, tom lane

#12Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#11)
Re: drop tablespace error: invalid argument

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

I've applied this patch in HEAD only for the moment. I hope that
Apple will have fixed their bug before the next set of PG back-branch
updates come out --- if not, we'll probably have to back-patch.

and on the flip side, I was hoping to see a new 8.4.2 soon due to the
recent commits I've seen against that branch... :/

Thanks,

Stephen

#13Jan Otto
asche@me.com
In reply to: David E. Wheeler (#8)
Re: drop tablespace error: invalid argument

Well, 10.6.1 is out and it's still got the readdir() bug :-(.

Has someone filed a bug report about this with Apple?

Yes, I have filed a bug report and will keep this list informed when
there is any news.

regards, jan otto

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Otto (#3)
Re: drop tablespace error: invalid argument

Jan Otto <asche@me.com> writes:

ERROR: could not read directory "pg_tblspc/16464": Invalid argument
STATEMENT: DROP TABLESPACE testspace;

Hmm ... can't reproduce this here, not even on OSX. From the version
number I suspect you are using unreleased Snow Leopard. I'd venture
it's a newly-introduced kernel bug and you need to talk to Apple about
it.

Thank you Tom. I will file a bugreport at Apple.

Hey Jan, did you get any response to that bug report? Somebody else
dug up a document suggesting that this might be intentional on Apple's
part:
http://archives.postgresql.org/pgsql-bugs/2009-11/msg00040.php

If he's right, we have a nontrivial problem here :-(

regards, tom lane

#15Jan Otto
asche@me.com
In reply to: Tom Lane (#14)
Re: drop tablespace error: invalid argument

ERROR: could not read directory "pg_tblspc/16464": Invalid argument
STATEMENT: DROP TABLESPACE testspace;

Hmm ... can't reproduce this here, not even on OSX. From the version
number I suspect you are using unreleased Snow Leopard. I'd venture
it's a newly-introduced kernel bug and you need to talk to Apple about
it.

Thank you Tom. I will file a bugreport at Apple.

Hey Jan, did you get any response to that bug report? Somebody else
dug up a document suggesting that this might be intentional on Apple's
part:
http://archives.postgresql.org/pgsql-bugs/2009-11/msg00040.php

If he's right, we have a nontrivial problem here :-(

No, this is not intentional. I got late feedback (22 Oct 2009) from
Apple that my reported bug was marked as a duplicate.

Quoting Apple:
"After further investigation it has been determined that this is a
known issue, which is currently being investigated by engineering.
This issue has been filed in our bug database under the original Bug
ID# 6795764."

regards, jan otto

#16Jan Otto
asche@me.com
In reply to: Tom Lane (#14)
Re: drop tablespace error: invalid argument

Hey Jan, did you get any response to that bug report? Somebody else
dug up a document suggesting that this might be intentional on Apple's
part:
http://archives.postgresql.org/pgsql-bugs/2009-11/msg00040.php

I was not subscribed to the pgsql-bugs list. I have read this message now
and see he is referring to an article that was last modified on 22 April
2008 and was written for the first Mac OS X (10.0)! This article is very
old and was maybe modified during changes to Apple's knowledge base URLs.

A quick check on Mac OS X 10.4 and 10.5 confirmed that the behaviour/bug
described in this article is not present there. Probably it was in
10.0.x... I have no older version of Mac OS X available here to check.

regards, jan otto

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Otto (#16)
Re: drop tablespace error: invalid argument

Jan Otto <asche@me.com> writes:

a quick check on mac os x 10.4 und 10.5 confirmed that this behaviour/
bug is not present like described in this article. probably it was in
10.0.x... i have no older version of mac os x available here to check.

Yeah, I thought we'd probably have heard about it before now if OSX
had acted like that all along.

My inclination is to continue assuming that the EINVAL is a new bug
introduced in Snow Leopard. I sure hope they fix it in 10.6.2 though.
If they don't, we may have to think about a workaround, messy as that
will apparently be.

regards, tom lane

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#17)
Re: drop tablespace error: invalid argument

I wrote:

My inclination is to continue assuming that the EINVAL is a new bug
introduced in Snow Leopard. I sure hope they fix it in 10.6.2 though.
If they don't, we may have to think about a workaround, messy as that
will apparently be.

10.6.2 is out, and it appears to fix the bug --- if I remove the hack
in tablespace.c, we still pass regression tests.

Someone else please confirm? If so I'll revert that patch.

regards, tom lane

#19Jan Otto
asche@me.com
In reply to: Tom Lane (#18)
Re: drop tablespace error: invalid argument

My inclination is to continue assuming that the EINVAL is a new bug
introduced in Snow Leopard. I sure hope they fix it in 10.6.2 though.
If they don't, we may have to think about a workaround, messy as that
will apparently be.

10.6.2 is out, and it appears to fix the bug --- if I remove the hack
in tablespace.c, we still pass regression tests.

Someone else please confirm? If so I'll revert that patch.

Yes, I can confirm that this bug is fixed in Mac OS X 10.6.2. I have checked it twice:
with the workaround removed from tablespace.c, and with my self-written test case from
September.

regards, jan otto

#20Stephen Tyler
stephen@stephen-tyler.com
In reply to: Jan Otto (#19)
Re: drop tablespace error: invalid argument

On Tue, Nov 10, 2009 at 8:57 PM, Jan Otto <asche@me.com> wrote:

Someone else please confirm? If so I'll revert that patch.

Yes i can confirm that this bug is fixed in Mac OS X 10.6.2. I have checked
it twice. With removed workaround in tablespace.c and with my self written
testcase from september.

I can confirm that I am no longer able to trigger "ERROR: could not read
directory "pg_xlog": Invalid argument" in Mac OS X 10.6.2 with
"checkpoint_segments = 128".

I can also report that under 10.6.1, changing "checkpoint_segments = 128" to
"checkpoint_segments = 64" made the pg_xlog errors disappear almost
entirely. I could still easily trigger them with "VACUUM FULL", but could
not trigger them on demand with regular db operations.

Stephen

PS: I am observing some kind of disk lock-up on my machine that I can't
explain (and is present on both 10.6.1 and 10.6.2). Huge operations (like
"VACUUM FULL on a 50GB table") appear to run in brief spikes of activity
interspersed with 30 second pauses when the disk appears to be both inactive
and somewhat unresponsive and CPU is idle. Perhaps fsync() is misbehaving
(I have an SSD Raid 0 array). Anyway I am mentioning this as a caution that
although I can detect no readdir() errors on Mac OS X 10.6.2, perhaps all is
not OK on my system.

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Tyler (#20)
Re: drop tablespace error: invalid argument

Stephen Tyler <stephen@stephen-tyler.com> writes:

On Tue, Nov 10, 2009 at 8:57 PM, Jan Otto <asche@me.com> wrote:

Someone else please confirm? If so I'll revert that patch.

Yes i can confirm that this bug is fixed in Mac OS X 10.6.2. I have checked
it twice. With removed workaround in tablespace.c and with my self written
testcase from september.

I can confirm that I am no longer able to trigger "ERROR: could not read
directory "pg_xlog": Invalid argument" in Mac OS X 10.6.2 with
"checkpoint_segments = 128".

OK, I've reverted the hack in tablespace.c. This is good, I was not
looking forward to providing our own implementation of readdir() :-(

PS: I am observing some kind of disk lock-up on my machine that I can't
explain (and is present on both 10.6.1 and 10.6.2). Huge operations (like
"VACUUM FULL on a 50GB table") appear to run in brief spikes of activity
interspersed with 30 second pauses when the disk appears to be both inactive
and somewhat unresponsive and CPU is idle. Perhaps fsync() is misbehaving
(I have an SSD Raid 0 array).

Maybe ktrace and/or dtrace would shed a bit of light on what's happening
there.

regards, tom lane