Why does PostgreSQL ftruncate before unlink?

Started by Jon Nelson about 12 years ago, 8 messages, pgsql-general
#1 Jon Nelson
jnelson+pgsql@jamponi.net

When dropping lots of tables, I noticed postgresql taking longer than
I would have expected.

strace seems to report that the largest contributor is the ftruncate
and not the unlink. I'm curious what the logic is behind using
ftruncate before unlink.

I'm using an ext4 filesystem.

--
Jon

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Scott Marlowe
scott.marlowe@gmail.com
In reply to: Jon Nelson (#1)
Re: Why does PostgreSQL ftruncate before unlink?

On Fri, Feb 21, 2014 at 4:14 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:

> When dropping lots of tables, I noticed postgresql taking longer than
> I would have expected.
>
> strace seems to report that the largest contributor is the ftruncate
> and not the unlink. I'm curious what the logic is behind using
> ftruncate before unlink.
>
> I'm using an ext4 filesystem.

I'm guessing that this is so that it can be rolled back. Unlink is
likely issued at commit.

#3 Jeff Janes
jeff.janes@gmail.com
In reply to: Scott Marlowe (#2)
Re: Why does PostgreSQL ftruncate before unlink?

On Sunday, February 23, 2014, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> On Fri, Feb 21, 2014 at 4:14 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
>
>> When dropping lots of tables, I noticed postgresql taking longer than
>> I would have expected.
>>
>> strace seems to report that the largest contributor is the ftruncate
>> and not the unlink. I'm curious what the logic is behind using
>> ftruncate before unlink.
>>
>> I'm using an ext4 filesystem.
>
> I'm guessing that this is so that it can be rolled back. Unlink is
> likely issued at commit.

I would hope that ftruncate is issued at commit as well. That doesn't
sound undoable.

Cheers,

Jeff

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#3)
Re: Why does PostgreSQL ftruncate before unlink?

Jeff Janes <jeff.janes@gmail.com> writes:

> On Sunday, February 23, 2014, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>
>> I'm guessing that this is so that it can be rolled back. Unlink is
>> likely issued at commit.
>
> I would hope that ftruncate is issued at commit as well. That doesn't
> sound undoable.

It's more subtle than that. I'm too lazy to look at the comments in md.c
right now, but basically the reason for not doing an instant unlink is
to ensure that if a relation is truncated and then re-extended, open file
pointers held by other backends will still be valid. The ftruncate is
done to ensure that allocated disk space goes away as soon as that's safe
(i.e., at commit of the truncation); but immediate unlink would require
forcing more cross-backend synchronization than we want to have.

If memory serves, the inode should get removed during the next checkpoint.

regards, tom lane

#5 Jon Nelson
jnelson+pgsql@jamponi.net
In reply to: Tom Lane (#4)
Re: Why does PostgreSQL ftruncate before unlink?

On Sun, Feb 23, 2014 at 9:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jeff Janes <jeff.janes@gmail.com> writes:
>
>> On Sunday, February 23, 2014, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>>
>>> I'm guessing that this is so that it can be rolled back. Unlink is
>>> likely issued at commit.
>>
>> I would hope that ftruncate is issued at commit as well. That doesn't
>> sound undoable.
>
> It's more subtle than that. I'm too lazy to look at the comments in md.c
> right now, but basically the reason for not doing an instant unlink is
> to ensure that if a relation is truncated and then re-extended, open file
> pointers held by other backends will still be valid. The ftruncate is
> done to ensure that allocated disk space goes away as soon as that's safe
> (i.e., at commit of the truncation); but immediate unlink would require
> forcing more cross-backend synchronization than we want to have.
>
> If memory serves, the inode should get removed during the next checkpoint.

I was moments away from commenting to say that I had traced the flow
of the code to md.c and found the comments there quite illuminating. I
wonder if there is a different way to solve the underlying issue
without relying on ftruncate (which seems to be somewhat expensive).

--
Jon

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jon Nelson (#5)
Re: Why does PostgreSQL ftruncate before unlink?

Jon Nelson <jnelson+pgsql@jamponi.net> writes:

> On Sun, Feb 23, 2014 at 9:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> If memory serves, the inode should get removed during the next checkpoint.
>
> I was moments away from commenting to say that I had traced the flow
> of the code to md.c and found the comments there quite illuminating. I
> wonder if there is a different way to solve the underlying issue
> without relying on ftruncate (which seems to be somewhat expensive).

Hm. The code is designed the way it is on the assumption that ftruncate
doesn't do anything that unlink wouldn't have to do anyway. If it really
is significantly slower on popular filesystems, maybe we need to revisit
that.

regards, tom lane

#7 Jon Nelson
jnelson+pgsql@jamponi.net
In reply to: Tom Lane (#6)
Re: Why does PostgreSQL ftruncate before unlink?

On Sun, Feb 23, 2014 at 10:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jon Nelson <jnelson+pgsql@jamponi.net> writes:
>
>> On Sun, Feb 23, 2014 at 9:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>>> If memory serves, the inode should get removed during the next checkpoint.
>>
>> I was moments away from commenting to say that I had traced the flow
>> of the code to md.c and found the comments there quite illuminating. I
>> wonder if there is a different way to solve the underlying issue
>> without relying on ftruncate (which seems to be somewhat expensive).
>
> Hm. The code is designed the way it is on the assumption that ftruncate
> doesn't do anything that unlink wouldn't have to do anyway. If it really
> is significantly slower on popular filesystems, maybe we need to revisit
> that.

Here is an example.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.95    3.207681        4182       767           ftruncate
  0.05    0.001579           1      2428      2301 unlink

--
Jon

#8 Francisco Olarte
folarte@peoplecall.com
In reply to: Jon Nelson (#7)
Re: Why does PostgreSQL ftruncate before unlink?

On Mon, Feb 24, 2014 at 6:38 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:

> Here is an example.
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  99.95    3.207681        4182       767           ftruncate
>   0.05    0.001579           1      2428      2301 unlink

Are these times for unlink after ftruncate? Because (in Linux, which is
the one I use on the desktops and am familiar with) unlinks of big
files are slow too, so to have a more meaningful comparison you would
need to time ftruncate+unlink and plain unlink of the same files. IIRC
they take nearly equal time.

Francisco Olarte.
