Postmaster holding unlinked files for pg_largeobject table
Hello Hackers,
We're experiencing some strange behavior with one of our customers' DBs (8.4).
We've noticed that free disk space went down heavily on a system, and after a short analysis we determined that the reason was that postmaster was holding lots of unlinked files open. A sample of lsof output was something like this:
postmaste 15484 postgres 57u REG 253,0 1073741824 41125093 /srv/pgsql/data/base/352483309/2613.2 (deleted)
postmaste 15484 postgres 58u REG 253,0 1073741824 41125094 /srv/pgsql/data/base/352483309/2613.3 (deleted)
postmaste 15484 postgres 59u REG 253,0 1073741824 41125095 /srv/pgsql/data/base/352483309/2613.4 (deleted)
There were about 450 such (or similar) files, all of them having /2613 in the filename. Since 2613 is the regclass of pg_largeobject, and we are indeed working with quite a few large objects in that DB, we suspect this is where our problem lies.
Restarting PostgreSQL obviously helps the issue and the disk space occupied by those unlinked files (about 63GB actually) is reclaimed.
So what happens on that host is that we drop/restore a fresh version of the DB from the production host, followed by a migration script which among other things loads around 16GB of data files as large objects. This happens nightly.
But if we run the whole drop/restore and migration manually, the problem doesn't manifest itself right after the migration finishes successfully.
Our next thought was that it must be the dropdb part of the nightly script that removes the pg_largeobject files (though we still don't know what makes postmaster keep them open), but dropping the DB doesn't trigger the problem either.
I'm currently running VACUUM on pg_largeobject in the problematic DB, to see if it triggers the problem, but so far it hasn't.
At this point it would be nice to hear your thoughts. What could cause such behavior?
--
Regards,
Alex
[ For future reference, -general is the appropriate list. Moving
discussion there. ]
On Sat, 2011-06-04 at 00:45 +0300, Alexander Shulgin wrote:
We've noticed that free disk space went down heavily on a system, and
after a short analysis we determined that the reason was that postmaster
was holding lots of unlinked files open. A sample of lsof output was
something like this:
...
Restarting PostgreSQL obviously helps the issue and the disk space
occupied by those unlinked files (about 63GB actually) is reclaimed.
Normally postgres closes unlinked files during a checkpoint. How long
is it between checkpoints on this system? Is it possible that you noticed
before PostgreSQL performed an automatic checkpoint?
Also, you can do a manual checkpoint with the CHECKPOINT command.
Regards,
Jeff Davis
Excerpts from Alexander Shulgin's message of vie jun 03 17:45:28 -0400 2011:
There were about 450 such (or similar) files, all of them having /2613 in the filename. Since 2613 is the regclass of pg_largeobject, and we are indeed working with quite a few large objects in that DB, we suspect this is where our problem lies.
Restarting PostgreSQL obviously helps the issue and the disk space occupied by those unlinked files (about 63GB actually) is reclaimed.
So what happens on that host is that we drop/restore a fresh version of the DB from the production host, followed by a migration script which among other things loads around 16GB of data files as large objects. This happens nightly.
What surprises me is that the open references remain after a database
drop. Surely this means that no backends keep open file descriptors to
any table in that database, because there are no connections.
I also requested Alexander to run a checkpoint and see if that made the
FDs go away (on the theory that bgwriter could be the culprit) -- no dice.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
What surprises me is that the open references remain after a database
drop. Surely this means that no backends keep open file descriptors to
any table in that database, because there are no connections.
bgwriter ...
regards, tom lane
Excerpts from Tom Lane's message of sáb jun 04 12:49:05 -0400 2011:
Alvaro Herrera <alvherre@commandprompt.com> writes:
What surprises me is that the open references remain after a database
drop. Surely this means that no backends keep open file descriptors to
any table in that database, because there are no connections.
bgwriter ...
Actually you were both wrong, hah. It's not bgwriter, and this doesn't
belong on pgsql-general. It's a backend. However, as we mentioned
initially, the database to which this file belongs is dropped.
What we found out after more careful investigation is that the file is
kept open by a backend connected to a different database. I have a
suspicion that what happened here is that this backend was forced to
flush out a page from shared buffers to read some other page; and it was
forced to do a fsync of this file. And then it forgets to close the
file descriptor.
Actually, there are 11 processes holding open file descriptors to the
table, each to a slightly different subset of the many segments of the
table. (There's also one holding a FD to the deleted
pg_largeobject_loid_pn_index -- remember, this is a dropped database).
All those backends belong to Zabbix, the monitoring platform, and are
connected to a different database.
I think what we have here is a new bug.
(This is running 8.4.8, by the way.)
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> wrote:
What we found out after more careful investigation is that the
file is kept open by a backend connected to a different database.
I have a suspicion that what happened here is that this backend
was forced to flush out a page from shared buffers to read some
other page; and it was forced to do a fsync of this file. And
then it forgets to close the file descriptor.
This sounds vaguely similar to what I found with WAL files being
held open for days after they were deleted by read-only backends:
http://archives.postgresql.org/message-id/15412.1259630304@sss.pgh.pa.us
I mention it only because there might be one place to fix both....
-Kevin
Excerpts from Kevin Grittner's message of lun jun 06 11:58:51 -0400 2011:
Alvaro Herrera <alvherre@commandprompt.com> wrote:
What we found out after more careful investigation is that the
file is kept open by a backend connected to a different database.
I have a suspicion that what happened here is that this backend
was forced to flush out a page from shared buffers to read some
other page; and it was forced to do a fsync of this file. And
then it forgets to close the file descriptor.
This sounds vaguely similar to what I found with WAL files being
held open for days after they were deleted by read-only backends:
http://archives.postgresql.org/message-id/15412.1259630304@sss.pgh.pa.us
I mention it only because there might be one place to fix both....
Hmm interesting. I don't think the placement suggested by Tom would be
useful, because the Zabbix backends are particularly busy all the time,
so they wouldn't run ProcessCatchupEvent at all.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
Excerpts from Kevin Grittner's message of lun jun 06 11:58:51 -0400 2011:
Alvaro Herrera <alvherre@commandprompt.com> wrote:
What we found out after more careful investigation is that the
file is kept open by a backend connected to a different database.
I have a suspicion that what happened here is that this backend
was forced to flush out a page from shared buffers to read some
other page; and it was forced to do a fsync of this file. And
then it forgets to close the file descriptor.
It doesn't "forget" to close the descriptor; it intentionally keeps it
for possible further use.
This sounds vaguely similar to what I found with WAL files being
held open for days after they were deleted by read-only backends:
http://archives.postgresql.org/message-id/15412.1259630304@sss.pgh.pa.us
I mention it only because there might be one place to fix both....
Hmm interesting. I don't think the placement suggested by Tom would be
useful, because the Zabbix backends are particularly busy all the time,
so they wouldn't run ProcessCatchupEvent at all.
Yeah, I wasn't that thrilled with the suggestion either. But we can't
just have backends constantly closing every open FD they hold, or
performance will suffer. I don't see any very good place to do this...
regards, tom lane
Excerpts from Tom Lane's message of lun jun 06 12:10:24 -0400 2011:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Hmm interesting. I don't think the placement suggested by Tom would be
useful, because the Zabbix backends are particularly busy all the time,
so they wouldn't run ProcessCatchupEvent at all.
Yeah, I wasn't that thrilled with the suggestion either. But we can't
just have backends constantly closing every open FD they hold, or
performance will suffer. I don't see any very good place to do this...
How about doing something on an sinval message for pg_database?
That doesn't solve the WAL problem Kevin found, of course ...
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> wrote:
That doesn't solve the WAL problem Kevin found, of course ...
I wouldn't worry about that too much -- the WAL issue is
self-limiting and not likely to ever cause a failure. The biggest
risk is that every now and then some new individual will notice it
and waste a bit of time investigating. The LO issue, on the other
hand, could easily eat enough disk to cause a failure.
-Kevin
Alvaro Herrera <alvherre@commandprompt.com> writes:
Excerpts from Tom Lane's message of lun jun 06 12:10:24 -0400 2011:
Yeah, I wasn't that thrilled with the suggestion either. But we can't
just have backends constantly closing every open FD they hold, or
performance will suffer. I don't see any very good place to do this...
How about doing something on an sinval message for pg_database?
That doesn't solve the WAL problem Kevin found, of course ...
Hmm ... that would help for the specific scenario of dropped databases,
but we've also heard complaints about narrower cases such as a single
dropped table.
A bigger issue is that I don't believe it's very practical to scan the
FD array looking for files associated with a particular database (or
table). They aren't labeled that way, and parsing the file path to
find out the info seems pretty grotty.
On reflection I think this behavior is probably limited to the case
where we've done what we used to call a "blind write" of a block that
is unrelated to our database or tables. For normal SQL-driven accesses,
there's a relcache entry, and flushing of that entry will lead to
closure of associated files. I wonder whether we should go back to
forcibly closing the FD after a blind write. This would suck if a
backend had to do many dirty-buffer flushes for the same relation,
but hopefully the bgwriter is doing most of those. We'd want to make
sure such forced closure *doesn't* occur in the bgwriter. (If memory
serves, it has a checkpoint-driven closure mechanism instead.)
regards, tom lane
On Mon, Jun 6, 2011 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Excerpts from Tom Lane's message of lun jun 06 12:10:24 -0400 2011:
Yeah, I wasn't that thrilled with the suggestion either. But we can't
just have backends constantly closing every open FD they hold, or
performance will suffer. I don't see any very good place to do this...
How about doing something on an sinval message for pg_database?
That doesn't solve the WAL problem Kevin found, of course ...
Hmm ... that would help for the specific scenario of dropped databases,
but we've also heard complaints about narrower cases such as a single
dropped table.
A bigger issue is that I don't believe it's very practical to scan the
FD array looking for files associated with a particular database (or
table). They aren't labeled that way, and parsing the file path to
find out the info seems pretty grotty.
On reflection I think this behavior is probably limited to the case
where we've done what we used to call a "blind write" of a block that
is unrelated to our database or tables. For normal SQL-driven accesses,
there's a relcache entry, and flushing of that entry will lead to
closure of associated files. I wonder whether we should go back to
forcibly closing the FD after a blind write. This would suck if a
backend had to do many dirty-buffer flushes for the same relation,
but hopefully the bgwriter is doing most of those. We'd want to make
sure such forced closure *doesn't* occur in the bgwriter. (If memory
serves, it has a checkpoint-driven closure mechanism instead.)
Instead of closing them immediately, how about flagging the FD and
closing all the flagged FDs at the end of each query, or something
like that?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Jun 6, 2011 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
On reflection I think this behavior is probably limited to the case
where we've done what we used to call a "blind write" of a block that
is unrelated to our database or tables. For normal SQL-driven accesses,
there's a relcache entry, and flushing of that entry will lead to
closure of associated files. I wonder whether we should go back to
forcibly closing the FD after a blind write. This would suck if a
backend had to do many dirty-buffer flushes for the same relation,
but hopefully the bgwriter is doing most of those. We'd want to make
sure such forced closure *doesn't* occur in the bgwriter. (If memory
serves, it has a checkpoint-driven closure mechanism instead.)
Instead of closing them immediately, how about flagging the FD and
closing all the flagged FDs at the end of each query, or something
like that?
Hmm, there's already a mechanism for closing "temp" FDs at the end of a
query ... maybe blind writes could use temp-like FDs?
regards, tom lane
Excerpts from Tom Lane's message of lun jun 06 12:49:46 -0400 2011:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Jun 6, 2011 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
On reflection I think this behavior is probably limited to the case
where we've done what we used to call a "blind write" of a block that
is unrelated to our database or tables. For normal SQL-driven accesses,
there's a relcache entry, and flushing of that entry will lead to
closure of associated files. I wonder whether we should go back to
forcibly closing the FD after a blind write. This would suck if a
backend had to do many dirty-buffer flushes for the same relation,
but hopefully the bgwriter is doing most of those. We'd want to make
sure such forced closure *doesn't* occur in the bgwriter. (If memory
serves, it has a checkpoint-driven closure mechanism instead.)
Instead of closing them immediately, how about flagging the FD and
closing all the flagged FDs at the end of each query, or something
like that?
Hmm, there's already a mechanism for closing "temp" FDs at the end of a
query ... maybe blind writes could use temp-like FDs?
OK, I'll have a look at how blind writes work this afternoon and propose
something.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Tom Lane's message of lun jun 06 12:49:46 -0400 2011:
Robert Haas <robertmhaas@gmail.com> writes:
Instead of closing them immediately, how about flagging the FD and
closing all the flagged FDs at the end of each query, or something
like that?
Hmm, there's already a mechanism for closing "temp" FDs at the end of a
query ... maybe blind writes could use temp-like FDs?
I don't think it can be made to work exactly like that. If I understand
correctly, the code involved here is the FlushBuffer() call that happens
during BufferAlloc(), and what we have at that point is a SMgrRelation;
we're several levels removed from actually being able to set the
FD_XACT_TEMPORARY flag which is what I think you're thinking of.
What I think would make some sense is to keep a list of SMgrRelations
that we opened during FlushBuffer that are foreign to the current
database, in the current ResourceOwner. That way, when the resowner is
destroyed, we would be able to smgrclose() them. This would only be
done when called by a backend, not bgwriter.
(I'm not really sure about requiring the buffer to belong to a relation
in another database, given the report that this is also a problem with
dropped tables. However, it certainly makes no sense to try to remember
*all* buffers.)
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
Excerpts from Tom Lane's message of lun jun 06 12:49:46 -0400 2011:
Hmm, there's already a mechanism for closing "temp" FDs at the end of a
query ... maybe blind writes could use temp-like FDs?
I don't think it can be made to work exactly like that. If I understand
correctly, the code involved here is the FlushBuffer() call that happens
during BufferAlloc(), and what we have at that point is a SMgrRelation;
we're several levels removed from actually being able to set the
FD_XACT_TEMPORARY flag which is what I think you're thinking of.
It's not *that* many levels: in fact, I think md.c is the only level
that would just have to pass it through without doing anything useful.
I think that working from there is a saner and more efficient approach
than what you're sketching.
If you want a concrete design sketch, consider this:
1. Add a flag to the SMgrRelation struct that has the semantics of "all
files opened through this SMgrRelation should be marked as transient,
causing them to be automatically closed at end of xact".
2. *Any* normal smgropen() call would reset this flag (since it suggests
that we are accessing the relation because of SQL activity). In the
single case where FlushBuffer() is called with reln == NULL, it would
set the flag after doing its local smgropen().
3. Then, modify md.c to pass the flag down to fd.c whenever opening an
FD file. fd.c sets a bit in the resulting VFD.
4. Extend CleanupTempFiles to close the kernel FD (but not release the
VFD) when a VFD has the bit set.
I'm fairly sure that CleanupTempFiles is never called in the bgwriter,
so we don't even need any special hack to prevent the flag from becoming
set in the bgwriter.
regards, tom lane
Excerpts from Tom Lane's message of mar jun 07 12:26:34 -0400 2011:
It's not *that* many levels: in fact, I think md.c is the only level
that would just have to pass it through without doing anything useful.
I think that working from there is a saner and more efficient approach
than what you're sketching.
If you want a concrete design sketch, consider this:
Okay, here's a patch implementing this idea. It seems to work quite
well, and it solves the problem in a limited testing scenario -- I
haven't yet tested on the customer machines.
This customer is running on 8.4 so I started from there; should I
backpatch this to 8.2, or not at all? (I have all the branches ready
anyway.)
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachments:
smgr-transient-files.patch (application/octet-stream)
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
***************
*** 1834,1840 **** BufferGetTag(Buffer buffer, RelFileNode *rnode, ForkNumber *forknum,
* written.)
*
* If the caller has an smgr reference for the buffer's relation, pass it
! * as the second parameter. If not, pass NULL.
*/
static void
FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
--- 1834,1843 ----
* written.)
*
* If the caller has an smgr reference for the buffer's relation, pass it
! * as the second parameter. If not, pass NULL. In the latter case, the
! * relation will be marked as "transient" so that the corresponding
! * kernel-level file descriptors are closed when the current transaction ends,
! * if any.
*/
static void
FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
***************
*** 1856,1864 **** FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
errcontext.previous = error_context_stack;
error_context_stack = &errcontext;
! /* Find smgr relation for buffer */
if (reln == NULL)
reln = smgropen(buf->tag.rnode, InvalidBackendId);
TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,
buf->tag.blockNum,
--- 1859,1870 ----
errcontext.previous = error_context_stack;
error_context_stack = &errcontext;
! /* Find smgr relation for buffer, and mark it as transient */
if (reln == NULL)
+ {
reln = smgropen(buf->tag.rnode, InvalidBackendId);
+ smgrsettransient(reln);
+ }
TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,
buf->tag.blockNum,
*** a/src/backend/storage/file/fd.c
--- b/src/backend/storage/file/fd.c
***************
*** 125,136 **** static int max_safe_fds = 32; /* default if not changed */
/* these are the assigned bits in fdstate below: */
#define FD_TEMPORARY (1 << 0) /* T = delete when closed */
#define FD_XACT_TEMPORARY (1 << 1) /* T = delete at eoXact */
/*
! * Flag to tell whether it's worth scanning VfdCache looking for temp files to
! * close
*/
static bool have_xact_temporary_files = false;
typedef struct vfd
{
--- 125,140 ----
/* these are the assigned bits in fdstate below: */
#define FD_TEMPORARY (1 << 0) /* T = delete when closed */
#define FD_XACT_TEMPORARY (1 << 1) /* T = delete at eoXact */
+ #define FD_XACT_TRANSIENT (1 << 2) /* T = close (not delete) at eoXact,
+ * but keep VFD */
/*
! * Global status flags
*/
+ /* are there FD_XACT_TEMPORARY files? */
static bool have_xact_temporary_files = false;
+ /* are there FD_XACT_TRANSIENT files? */
+ static bool have_xact_transient_files = false;
typedef struct vfd
{
***************
*** 1027,1032 **** OpenTemporaryFileInTablespace(Oid tblspcOid, bool rejectError)
--- 1031,1075 ----
}
/*
+ * Set the transient flag on a file
+ *
+ * This flag tells CleanupTempFiles to close the kernel-level file descriptor
+ * (but not the VFD itself) at end of transaction.
+ */
+ void
+ FileSetTransient(File file)
+ {
+ Vfd *vfdP;
+
+ Assert(FileIsValid(file));
+
+ vfdP = &VfdCache[file];
+ vfdP->fdstate |= FD_XACT_TRANSIENT;
+
+ have_xact_transient_files = true;
+ }
+
+ /*
+ * Close a file at the kernel level, but keep the VFD open
+ */
+ static void
+ FileKernelClose(File file)
+ {
+ Vfd *vfdP;
+
+ Assert(FileIsValid(file));
+
+ vfdP = &VfdCache[file];
+
+ if (!FileIsNotOpen(file))
+ {
+ if (close(vfdP->fd))
+ elog(ERROR, "could not close file \"%s\": %m", vfdP->fileName);
+ vfdP->fd = VFD_CLOSED;
+ }
+ }
+
+ /*
* close a file when done with it
*/
void
***************
*** 1819,1833 **** CleanupTempFiles(bool isProcExit)
* Careful here: at proc_exit we need extra cleanup, not just
* xact_temporary files.
*/
! if (isProcExit || have_xact_temporary_files)
{
Assert(FileIsNotOpen(0)); /* Make sure ring not corrupted */
for (i = 1; i < SizeVfdCache; i++)
{
unsigned short fdstate = VfdCache[i].fdstate;
! if ((fdstate & FD_TEMPORARY) && VfdCache[i].fileName != NULL)
{
/*
* If we're in the process of exiting a backend process, close
* all temporary files. Otherwise, only close temporary files
--- 1862,1878 ----
* Careful here: at proc_exit we need extra cleanup, not just
* xact_temporary files.
*/
! if (isProcExit || have_xact_temporary_files || have_xact_transient_files)
{
Assert(FileIsNotOpen(0)); /* Make sure ring not corrupted */
for (i = 1; i < SizeVfdCache; i++)
{
unsigned short fdstate = VfdCache[i].fdstate;
! if (VfdCache[i].fileName != NULL)
{
+ if (fdstate & FD_TEMPORARY)
+ {
/*
* If we're in the process of exiting a backend process, close
* all temporary files. Otherwise, only close temporary files
***************
*** 1844,1853 **** CleanupTempFiles(bool isProcExit)
--- 1889,1902 ----
VfdCache[i].fileName);
FileClose(i);
}
+ }
+ else if (fdstate & FD_XACT_TRANSIENT)
+ FileKernelClose(i);
}
}
have_xact_temporary_files = false;
+ have_xact_transient_files = false;
}
/* Clean up "allocated" stdio files and dirs. */
*** a/src/backend/storage/smgr/md.c
--- b/src/backend/storage/smgr/md.c
***************
*** 288,293 **** mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
--- 288,296 ----
pfree(path);
+ if (reln->smgr_transient)
+ FileSetTransient(fd);
+
reln->md_fd[forkNum] = _fdvec_alloc();
reln->md_fd[forkNum]->mdfd_vfd = fd;
***************
*** 542,547 **** mdopen(SMgrRelation reln, ForkNumber forknum, ExtensionBehavior behavior)
--- 545,553 ----
pfree(path);
+ if (reln->smgr_transient)
+ FileSetTransient(fd);
+
reln->md_fd[forknum] = mdfd = _fdvec_alloc();
mdfd->mdfd_vfd = fd;
***************
*** 1556,1561 **** _mdfd_openseg(SMgrRelation reln, ForkNumber forknum, BlockNumber segno,
--- 1562,1570 ----
if (fd < 0)
return NULL;
+ if (reln->smgr_transient)
+ FileSetTransient(fd);
+
/* allocate an mdfdvec entry for it */
v = _fdvec_alloc();
*** a/src/backend/storage/smgr/smgr.c
--- b/src/backend/storage/smgr/smgr.c
***************
*** 165,170 **** smgropen(RelFileNode rnode, BackendId backend)
--- 165,171 ----
reln->smgr_targblock = InvalidBlockNumber;
reln->smgr_fsm_nblocks = InvalidBlockNumber;
reln->smgr_vm_nblocks = InvalidBlockNumber;
+ reln->smgr_transient = false;
reln->smgr_which = 0; /* we only have md.c at present */
/* mark it not open */
***************
*** 176,181 **** smgropen(RelFileNode rnode, BackendId backend)
--- 177,195 ----
}
/*
+ * smgrsettransient -- mark an SMgrRelation object as transaction-bound
+ *
+ * The main effect of this is that all opened files are marked to be
+ * kernel-level closed (but not necessarily VFD-closed) when the current
+ * transaction ends.
+ */
+ void
+ smgrsettransient(SMgrRelation reln)
+ {
+ reln->smgr_transient = true;
+ }
+
+ /*
* smgrsetowner() -- Establish a long-lived reference to an SMgrRelation object
*
* There can be only one owner at a time; this is sufficient since currently
*** a/src/include/storage/fd.h
--- b/src/include/storage/fd.h
***************
*** 61,66 **** extern int max_files_per_process;
--- 61,67 ----
/* Operations on virtual Files --- equivalent to Unix kernel file ops */
extern File PathNameOpenFile(FileName fileName, int fileFlags, int fileMode);
extern File OpenTemporaryFile(bool interXact);
+ extern void FileSetTransient(File file);
extern void FileClose(File file);
extern int FilePrefetch(File file, off_t offset, int amount);
extern int FileRead(File file, char *buffer, int amount);
*** a/src/include/storage/smgr.h
--- b/src/include/storage/smgr.h
***************
*** 62,67 **** typedef struct SMgrRelationData
--- 62,68 ----
* submodules. Do not touch them from elsewhere.
*/
int smgr_which; /* storage manager selector */
+ bool smgr_transient; /* T if files are to be closed at EOXact */
/* for md.c; NULL for forks that are not open */
struct _MdfdVec *md_fd[MAX_FORKNUM + 1];
***************
*** 74,79 **** typedef SMgrRelationData *SMgrRelation;
--- 75,81 ----
extern void smgrinit(void);
extern SMgrRelation smgropen(RelFileNode rnode, BackendId backend);
+ extern void smgrsettransient(SMgrRelation reln);
extern bool smgrexists(SMgrRelation reln, ForkNumber forknum);
extern void smgrsetowner(SMgrRelation *owner, SMgrRelation reln);
extern void smgrclose(SMgrRelation reln);
Alvaro Herrera <alvherre@commandprompt.com> writes:
Okay, here's a patch implementing this idea. It seems to work quite
well, and it solves the problem in a limited testing scenario -- I
haven't yet tested on the customer machines.
This seems mostly sane, except I think you have not considered the
issue of when to clear the smgr_transient flag on an existing
SMgrRelation: if it starts getting used for "normal" accesses after
having by chance been used for a blind write, we don't want the
transient marking to persist. That's why I suggested having smgropen
always clear it.
Likewise, I think the FD_XACT_TRANSIENT flag on a VFD needs to go away
at some point, probably once it's actually been closed at EOXACT, though
there's doubtless more than one way to handle that.
This customer is running on 8.4 so I started from there; should I
backpatch this to 8.2, or not at all?
I'm not excited about back-patching it...
regards, tom lane
Excerpts from Tom Lane's message of mié jun 08 14:28:02 -0400 2011:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Okay, here's a patch implementing this idea. It seems to work quite
well, and it solves the problem in a limited testing scenario -- I
haven't yet tested on the customer machines.
This seems mostly sane, except I think you have not considered the
issue of when to clear the smgr_transient flag on an existing
SMgrRelation: if it starts getting used for "normal" accesses after
having by chance been used for a blind write, we don't want the
transient marking to persist. That's why I suggested having smgropen
always clear it.
Likewise, I think the FD_XACT_TRANSIENT flag on a VFD needs to go away
at some point, probably once it's actually been closed at EOXACT, though
there's doubtless more than one way to handle that.
Aha, I see -- makes sense. Here's an updated patch.
This customer is running on 8.4 so I started from there; should I
backpatch this to 8.2, or not at all?
I'm not excited about back-patching it...
Bummer.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachments:
smgr-transient-files-2.patch (application/octet-stream)
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
***************
*** 1834,1840 **** BufferGetTag(Buffer buffer, RelFileNode *rnode, ForkNumber *forknum,
* written.)
*
* If the caller has an smgr reference for the buffer's relation, pass it
! * as the second parameter. If not, pass NULL.
*/
static void
FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
--- 1834,1843 ----
* written.)
*
* If the caller has an smgr reference for the buffer's relation, pass it
! * as the second parameter. If not, pass NULL. In the latter case, the
! * relation will be marked as "transient" so that the corresponding
! * kernel-level file descriptors are closed when the current transaction ends,
! * if any.
*/
static void
FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
***************
*** 1856,1864 **** FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
errcontext.previous = error_context_stack;
error_context_stack = &errcontext;
! /* Find smgr relation for buffer */
if (reln == NULL)
reln = smgropen(buf->tag.rnode, InvalidBackendId);
TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,
buf->tag.blockNum,
--- 1859,1870 ----
errcontext.previous = error_context_stack;
error_context_stack = &errcontext;
! /* Find smgr relation for buffer, and mark it as transient */
if (reln == NULL)
+ {
reln = smgropen(buf->tag.rnode, InvalidBackendId);
+ smgrsettransient(reln);
+ }
TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,
buf->tag.blockNum,
*** a/src/backend/storage/file/fd.c
--- b/src/backend/storage/file/fd.c
***************
*** 125,136 **** static int max_safe_fds = 32; /* default if not changed */
/* these are the assigned bits in fdstate below: */
#define FD_TEMPORARY (1 << 0) /* T = delete when closed */
#define FD_XACT_TEMPORARY (1 << 1) /* T = delete at eoXact */
! /*
! * Flag to tell whether it's worth scanning VfdCache looking for temp files to
! * close
! */
! static bool have_xact_temporary_files = false;
typedef struct vfd
{
--- 125,135 ----
/* these are the assigned bits in fdstate below: */
#define FD_TEMPORARY (1 << 0) /* T = delete when closed */
#define FD_XACT_TEMPORARY (1 << 1) /* T = delete at eoXact */
+ #define FD_XACT_TRANSIENT (1 << 2) /* T = close (not delete) at eoXact,
+ * but keep VFD */
! /* Flag to tell whether there are files to close/delete at end of transaction */
! static bool have_pending_fd_cleanup = false;
typedef struct vfd
{
***************
*** 953,959 **** OpenTemporaryFile(bool interXact)
VfdCache[file].resowner = CurrentResourceOwner;
/* ensure cleanup happens at eoxact */
! have_xact_temporary_files = true;
}
return file;
--- 952,958 ----
VfdCache[file].resowner = CurrentResourceOwner;
/* ensure cleanup happens at eoxact */
! have_pending_fd_cleanup = true;
}
return file;
***************
*** 1027,1032 **** OpenTemporaryFileInTablespace(Oid tblspcOid, bool rejectError)
--- 1026,1070 ----
}
/*
+ * Set the transient flag on a file
+ *
+ * This flag tells CleanupTempFiles to close the kernel-level file descriptor
+ * (but not the VFD itself) at end of transaction.
+ */
+ void
+ FileSetTransient(File file)
+ {
+ Vfd *vfdP;
+
+ Assert(FileIsValid(file));
+
+ vfdP = &VfdCache[file];
+ vfdP->fdstate |= FD_XACT_TRANSIENT;
+
+ have_pending_fd_cleanup = true;
+ }
+
+ /*
+ * Close a file at the kernel level, but keep the VFD open
+ */
+ static void
+ FileKernelClose(File file)
+ {
+ Vfd *vfdP;
+
+ Assert(FileIsValid(file));
+
+ vfdP = &VfdCache[file];
+
+ if (!FileIsNotOpen(file))
+ {
+ if (close(vfdP->fd))
+ elog(ERROR, "could not close file \"%s\": %m", vfdP->fileName);
+ vfdP->fd = VFD_CLOSED;
+ }
+ }
+
+ /*
* close a file when done with it
*/
void
***************
*** 1778,1785 **** AtEOSubXact_Files(bool isCommit, SubTransactionId mySubid,
* particularly care which). All still-open per-transaction temporary file
* VFDs are closed, which also causes the underlying files to be deleted
* (although they should've been closed already by the ResourceOwner
! * cleanup). Furthermore, all "allocated" stdio files are closed. We also
! * forget any transaction-local temp tablespace list.
*/
void
AtEOXact_Files(void)
--- 1816,1824 ----
* particularly care which). All still-open per-transaction temporary file
* VFDs are closed, which also causes the underlying files to be deleted
* (although they should've been closed already by the ResourceOwner
! * cleanup). Transient files have their kernel file descriptors closed.
! * Furthermore, all "allocated" stdio files are closed. We also forget any
! * transaction-local temp tablespace list.
*/
void
AtEOXact_Files(void)
***************
*** 1802,1808 **** AtProcExit_Files(int code, Datum arg)
}
/*
! * Close temporary files and delete their underlying files.
*
* isProcExit: if true, this is being called as the backend process is
* exiting. If that's the case, we should remove all temporary files; if
--- 1841,1850 ----
}
/*
! * General cleanup routine for fd.c.
! *
! * Temporary files are closed, and their underlying files deleted.
! * Transient files are closed.
*
* isProcExit: if true, this is being called as the backend process is
* exiting. If that's the case, we should remove all temporary files; if
***************
*** 1819,1853 **** CleanupTempFiles(bool isProcExit)
* Careful here: at proc_exit we need extra cleanup, not just
* xact_temporary files.
*/
! if (isProcExit || have_xact_temporary_files)
{
Assert(FileIsNotOpen(0)); /* Make sure ring not corrupted */
for (i = 1; i < SizeVfdCache; i++)
{
unsigned short fdstate = VfdCache[i].fdstate;
! if ((fdstate & FD_TEMPORARY) && VfdCache[i].fileName != NULL)
{
! /*
! * If we're in the process of exiting a backend process, close
! * all temporary files. Otherwise, only close temporary files
! * local to the current transaction. They should be closed by
! * the ResourceOwner mechanism already, so this is just a
! * debugging cross-check.
! */
! if (isProcExit)
! FileClose(i);
! else if (fdstate & FD_XACT_TEMPORARY)
{
! elog(WARNING,
! "temporary file %s not closed at end-of-transaction",
! VfdCache[i].fileName);
! FileClose(i);
}
}
}
! have_xact_temporary_files = false;
}
/* Clean up "allocated" stdio files and dirs. */
--- 1861,1909 ----
* Careful here: at proc_exit we need extra cleanup, not just
* xact_temporary files.
*/
! if (isProcExit || have_pending_fd_cleanup)
{
Assert(FileIsNotOpen(0)); /* Make sure ring not corrupted */
for (i = 1; i < SizeVfdCache; i++)
{
unsigned short fdstate = VfdCache[i].fdstate;
! if (VfdCache[i].fileName != NULL)
{
! if (fdstate & FD_TEMPORARY)
! {
! /*
! * If we're in the process of exiting a backend process, close
! * all temporary files. Otherwise, only close temporary files
! * local to the current transaction. They should be closed by
! * the ResourceOwner mechanism already, so this is just a
! * debugging cross-check.
! */
! if (isProcExit)
! FileClose(i);
! else if (fdstate & FD_XACT_TEMPORARY)
! {
! elog(WARNING,
! "temporary file %s not closed at end-of-transaction",
! VfdCache[i].fileName);
! FileClose(i);
! }
! }
! else if (fdstate & FD_XACT_TRANSIENT)
{
! /*
! * Close the kernel file descriptor, but also remove the
! * flag from the VFD. This is to ensure that if the VFD is
! * reused in the future for non-transient access, we don't
! * close it inappropriately then.
! */
! FileKernelClose(i);
! VfdCache[i].fdstate &= ~FD_XACT_TRANSIENT;
}
}
}
! have_pending_fd_cleanup = false;
}
/* Clean up "allocated" stdio files and dirs. */
*** a/src/backend/storage/smgr/md.c
--- b/src/backend/storage/smgr/md.c
***************
*** 288,293 **** mdcreate(SMgrRelation reln, ForkNumber forkNum, bool isRedo)
--- 288,296 ----
pfree(path);
+ if (reln->smgr_transient)
+ FileSetTransient(fd);
+
reln->md_fd[forkNum] = _fdvec_alloc();
reln->md_fd[forkNum]->mdfd_vfd = fd;
***************
*** 542,547 **** mdopen(SMgrRelation reln, ForkNumber forknum, ExtensionBehavior behavior)
--- 545,553 ----
pfree(path);
+ if (reln->smgr_transient)
+ FileSetTransient(fd);
+
reln->md_fd[forknum] = mdfd = _fdvec_alloc();
mdfd->mdfd_vfd = fd;
***************
*** 1556,1561 **** _mdfd_openseg(SMgrRelation reln, ForkNumber forknum, BlockNumber segno,
--- 1562,1570 ----
if (fd < 0)
return NULL;
+ if (reln->smgr_transient)
+ FileSetTransient(fd);
+
/* allocate an mdfdvec entry for it */
v = _fdvec_alloc();
*** a/src/backend/storage/smgr/smgr.c
--- b/src/backend/storage/smgr/smgr.c
***************
*** 165,181 **** smgropen(RelFileNode rnode, BackendId backend)
--- 165,198 ----
reln->smgr_targblock = InvalidBlockNumber;
reln->smgr_fsm_nblocks = InvalidBlockNumber;
reln->smgr_vm_nblocks = InvalidBlockNumber;
+ reln->smgr_transient = false;
reln->smgr_which = 0; /* we only have md.c at present */
/* mark it not open */
for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
reln->md_fd[forknum] = NULL;
}
+ else
+ /* if it was transient before, it no longer is */
+ reln->smgr_transient = false;
return reln;
}
/*
+ * smgrsettransient() -- mark an SMgrRelation object as transaction-bound
+ *
+ * The main effect of this is that all opened files are marked to be
+ * kernel-level closed (but not necessarily VFD-closed) when the current
+ * transaction ends.
+ */
+ void
+ smgrsettransient(SMgrRelation reln)
+ {
+ reln->smgr_transient = true;
+ }
+
+ /*
* smgrsetowner() -- Establish a long-lived reference to an SMgrRelation object
*
* There can be only one owner at a time; this is sufficient since currently
*** a/src/include/storage/fd.h
--- b/src/include/storage/fd.h
***************
*** 61,66 **** extern int max_files_per_process;
--- 61,67 ----
/* Operations on virtual Files --- equivalent to Unix kernel file ops */
extern File PathNameOpenFile(FileName fileName, int fileFlags, int fileMode);
extern File OpenTemporaryFile(bool interXact);
+ extern void FileSetTransient(File file);
extern void FileClose(File file);
extern int FilePrefetch(File file, off_t offset, int amount);
extern int FileRead(File file, char *buffer, int amount);
*** a/src/include/storage/smgr.h
--- b/src/include/storage/smgr.h
***************
*** 62,67 **** typedef struct SMgrRelationData
--- 62,68 ----
* submodules. Do not touch them from elsewhere.
*/
int smgr_which; /* storage manager selector */
+ bool smgr_transient; /* T if files are to be closed at EOXact */
/* for md.c; NULL for forks that are not open */
struct _MdfdVec *md_fd[MAX_FORKNUM + 1];
***************
*** 74,79 **** typedef SMgrRelationData *SMgrRelation;
--- 75,81 ----
extern void smgrinit(void);
extern SMgrRelation smgropen(RelFileNode rnode, BackendId backend);
+ extern void smgrsettransient(SMgrRelation reln);
extern bool smgrexists(SMgrRelation reln, ForkNumber forknum);
extern void smgrsetowner(SMgrRelation *owner, SMgrRelation reln);
extern void smgrclose(SMgrRelation reln);
Alvaro Herrera <alvherre@commandprompt.com> writes:
Excerpts from Tom Lane's message of mié jun 08 14:28:02 -0400 2011:
Alvaro Herrera <alvherre@commandprompt.com> writes:
This customer is running on 8.4 so I started from there; should I
backpatch this to 8.2, or not at all?
I'm not excited about back-patching it...
Bummer.
Well, of course mine is only one opinion; anybody else feel this *is*
worth risking a back-patch for?
My thought is that it needs some beta testing. Perhaps it'd be sane to
push it into beta2 now, and then back-patch sometime after 9.1 final,
if no problems pop up.
regards, tom lane
On Thu, Jun 9, 2011 at 2:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Excerpts from Tom Lane's message of mié jun 08 14:28:02 -0400 2011:
Alvaro Herrera <alvherre@commandprompt.com> writes:
This customer is running on 8.4 so I started from there; should I
backpatch this to 8.2, or not at all?
I'm not excited about back-patching it...
Bummer.
Well, of course mine is only one opinion; anybody else feel this *is*
worth risking a back-patch for?
My thought is that it needs some beta testing. Perhaps it'd be sane to
push it into beta2 now, and then back-patch sometime after 9.1 final,
if no problems pop up.
I think it'd be sensible to back-patch it. I'm not sure whether now
or later. It's a bug fix that is biting real users in the field, so
it seems like we ought to do something about it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Jun 9, 2011 at 2:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
My thought is that it needs some beta testing. Perhaps it'd be sane to
push it into beta2 now, and then back-patch sometime after 9.1 final,
if no problems pop up.
I think it'd be sensible to back-patch it. I'm not sure whether now
or later. It's a bug fix that is biting real users in the field, so
it seems like we ought to do something about it.
I don't deny it's a bug fix; I'm just dubious about the risk-reward
ratio. As to risk: the patch isn't trivial (notice Alvaro didn't get it
right the first time). As to reward: it's been like that since forever,
so if the problem were really serious, we'd have identified it before.
Letting it age a bit during beta would do a lot to damp down the risk
side of the equation.
regards, tom lane
On Thu, Jun 9, 2011 at 3:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Jun 9, 2011 at 2:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
My thought is that it needs some beta testing. Perhaps it'd be sane to
push it into beta2 now, and then back-patch sometime after 9.1 final,
if no problems pop up.
I think it'd be sensible to back-patch it. I'm not sure whether now
or later. It's a bug fix that is biting real users in the field, so
it seems like we ought to do something about it.
I don't deny it's a bug fix; I'm just dubious about the risk-reward
ratio. As to risk: the patch isn't trivial (notice Alvaro didn't get it
right the first time). As to reward: it's been like that since forever,
so if the problem were really serious, we'd have identified it before.
Letting it age a bit during beta would do a lot to damp down the risk
side of the equation.
OK by me.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Tom Lane's message of jue jun 09 14:45:31 -0400 2011:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Excerpts from Tom Lane's message of mié jun 08 14:28:02 -0400 2011:
Alvaro Herrera <alvherre@commandprompt.com> writes:
This customer is running on 8.4 so I started from there; should I
backpatch this to 8.2, or not at all?
I'm not excited about back-patching it...
Bummer.
Well, of course mine is only one opinion; anybody else feel this *is*
worth risking a back-patch for?
My thought is that it needs some beta testing. Perhaps it'd be sane to
push it into beta2 now, and then back-patch sometime after 9.1 final,
if no problems pop up.
FWIW I was about to push it but noticed that regression tests fail with
this:
TRAP: FailedAssertion("!(!ReindexIsProcessingIndex(((indexRelation)->rd_id)))", File: "/pgsql/source/HEAD/src/backend/access/index/indexam.c", Line: 283)
I just make distclean'd -- still there. I'm gonna revert my patch and
retry.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Alvaro Herrera's message of jue jun 09 16:02:14 -0400 2011:
Excerpts from Tom Lane's message of jue jun 09 14:45:31 -0400 2011:
My thought is that it needs some beta testing. Perhaps it'd be sane to
push it into beta2 now, and then back-patch sometime after 9.1 final,
if no problems pop up.
FWIW I was about to push it but noticed that regression tests fail with
this:
TRAP: FailedAssertion("!(!ReindexIsProcessingIndex(((indexRelation)->rd_id)))", File: "/pgsql/source/HEAD/src/backend/access/index/indexam.c", Line: 283)
I just make distclean'd -- still there. I'm gonna revert my patch and
retry.
That was pretty weird. I had rm'd the build tree and rebuilt, failure
still there. I then did a git reset FETCH_HEAD, recompiled, and the
problem was gone. git reset to my revision, failed. Then git clean
-dfx, nuked the build tree, redid the whole thing from scratch, no
problem. I guess there was a mismatch on the files that we build in the
source tree.
I have pushed it now.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
That was pretty weird. I had rm'd the build tree and rebuilt, failure
still there. I then did a git reset FETCH_HEAD, recompiled, and the
problem was gone. git reset to my revision, failed. Then git clean
-dfx, nuked the build tree, redid the whole thing from scratch, no
problem. I guess there was a mismatch on the files that we build in the
source tree.
git bug? I'd expect exactly that failure if you had the test changes
but not the source fixes from commit dccfb72892acabd25568539ec882cc44c57c25bd.
regards, tom lane
Excerpts from Alvaro Herrera's message of jue jun 09 16:34:13 -0400 2011:
I have pushed it now.
... and it caused a failure on the buildfarm, so I panicked and reverted
it. I think the patch below fixes it. Let me know if you think I
should push the whole thing again.
*** a/src/backend/storage/file/fd.c
--- b/src/backend/storage/file/fd.c
***************
*** 1045,1070 **** FileSetTransient(File file)
}
/*
- * Close a file at the kernel level, but keep the VFD open
- */
- static void
- FileKernelClose(File file)
- {
- Vfd *vfdP;
-
- Assert(FileIsValid(file));
-
- vfdP = &VfdCache[file];
-
- if (!FileIsNotOpen(file))
- {
- if (close(vfdP->fd))
- elog(ERROR, "could not close file \"%s\": %m", vfdP->fileName);
- vfdP->fd = VFD_CLOSED;
- }
- }
-
- /*
* close a file when done with it
*/
void
--- 1045,1050 ----
***************
*** 1892,1903 **** CleanupTempFiles(bool isProcExit)
else if (fdstate & FD_XACT_TRANSIENT)
{
/*
! * Close the kernel file descriptor, but also remove the
! * flag from the VFD. This is to ensure that if the VFD is
! * reused in the future for non-transient access, we don't
! * close it inappropriately then.
*/
! FileKernelClose(i);
VfdCache[i].fdstate &= ~FD_XACT_TRANSIENT;
}
}
--- 1872,1884 ----
else if (fdstate & FD_XACT_TRANSIENT)
{
/*
! * Close the FD, and remove the entry from the LRU ring,
! * but also remove the flag from the VFD. This is to
! * ensure that if the VFD is reused in the future for
! * non-transient access, we don't close it inappropriately
! * then.
*/
! LruDelete(i);
VfdCache[i].fdstate &= ~FD_XACT_TRANSIENT;
}
}
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
Excerpts from Alvaro Herrera's message of jue jun 09 16:34:13 -0400 2011:
I have pushed it now.
... and it caused a failure on the buildfarm, so I panicked and reverted
it. I think the patch below fixes it.
Actually, I think what you want is what closeAllVfds does, which
suggests that you need a FileIsNotOpen test too.
Let me know if you think I should push the whole thing again.
Yesterday I would have said okay, but now we're too close to the beta2
wrap deadline. Please hold off until beta2 is out.
regards, tom lane
Excerpts from Tom Lane's message of jue jun 09 17:10:10 -0400 2011:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Excerpts from Alvaro Herrera's message of jue jun 09 16:34:13 -0400 2011:
I have pushed it now.
... and it caused a failure on the buildfarm, so I panicked and reverted
it. I think the patch below fixes it.
Actually, I think what you want is what closeAllVfds does, which
suggests that you need a FileIsNotOpen test too.
Yeah, I noticed that my proposed change doesn't solve the problem.
Let me know if you think I should push the whole thing again.
Yesterday I would have said okay, but now we're too close to the beta2
wrap deadline. Please hold off until beta2 is out.
Will do.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support