pg_stat_statements temporary file

Started by Magnus Haganderover 13 years ago18 messages
#1Magnus Hagander
magnus@hagander.net

If pg_stat_statements is set to store it's data across restarts, it
stores it in global/pg_stat_statements.stat. This causes some
interesting things to happen in combination with a base backup -
namely, if you take a base backup *after* you have restarted th
emaster, the slave will get a snapshot of whatever was in the
temporary file at the time of the restart. This is quite unpredictable
- particularly in relation to a slave where it gets a snapshot from
the last restart, not from the base backup, after which it diverges.
AFAICT, it also has the property that if the server crashes, it will
reload the latest snapshot - not reset to 0 or anything like that.

Finally, if the server were to crash *while* the file is being
written, it will get a corrupt file (I haven't tested this part, but
it's rather obvious from the code). I'm pretty sure this could lead to
a situation where the database wouldn't restart.

Fixing the last part is easy - we need to write the file to a
temporary file and then rename() it into place, like we do with the
stats collectors file.

Fixing the first one, I can think of a few things:

1) unlink() the file after we've read it.
2) forcibly exclude the file from base backups taken with
pg_basebackup. We'd still have the problem when it comes to backups
taken manually.
3) avoid loading the file on a standby (that wouldn't fix the similar
problem on the master of course)

And perhaps some other solution I haven't thought of?

If we want to go with option 2, we have another problem - it's in the
global directory. But the name is dependent on what's in a contrib
module. Meaning we'd have to teach core postgresql about a contrib
modules filename, which pretty much breaks the abstraction layer.
Perhaps a better choice here would be to create another directory
under the data directory that is always excluded from base backup, and
store it there? That would also work for third party modules that core
can never learn about...

In general, should a contrib module really store data in the global/
directory? Seems pretty ugly to me...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#2Peter Geoghegan
peter@2ndquadrant.com
In reply to: Magnus Hagander (#1)
Re: pg_stat_statements temporary file

On 24 May 2012 11:43, Magnus Hagander <magnus@hagander.net> wrote:

In general, should a contrib module really store data in the global/
directory? Seems pretty ugly to me...

I think the case could be made for moving pg_stat_statements into
core, as an optionally enabled view, like pg_stat_user_functions,
since pg_stat_statements is now rather a lot more useful than it used
to be. That would solve that problem, as well as putting
pg_stat_statements into the hands of the largest possible number of
people, which would be a positive development, in my humble and fairly
predictable opinion.

However, pg_stat_statements will not prevent the database from
starting if the file is corrupt. It makes some basic attempts to
detect that within pgss_shmem_startup(), and will simply log the
problem and unlink the file in the event of detecting corruption.
Otherwise, I suppose you might get garbage values in
pg_stat_statements, which, while rather annoying and possibly
unacceptable, is hardly the end of the world.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#3Magnus Hagander
magnus@hagander.net
In reply to: Peter Geoghegan (#2)
Re: pg_stat_statements temporary file

On Thu, May 24, 2012 at 1:36 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 24 May 2012 11:43, Magnus Hagander <magnus@hagander.net> wrote:

In general, should a contrib module really store data in the global/
directory? Seems pretty ugly to me...

I think the case could be made for moving pg_stat_statements into
core, as an optionally enabled view, like pg_stat_user_functions,
since pg_stat_statements is now rather a lot more useful than it used
to be. That would solve that problem, as well as putting
pg_stat_statements into the hands of the largest possible number of
people, which would be a positive development, in my humble and fairly
predictable opinion.

Well, it would solve the problem for this specific case - but there
will always be yet another extension. Actually, it would only solve
the *ugliness*, and not the actual problem.

(That's not to say tha tI don't agree that moving it into core would
be a good idea, but that's not happening for 9.2 - and the problem
exists in 9.1 as well)

However, pg_stat_statements will not prevent the database from
starting if the file is corrupt. It makes some basic attempts to
detect that within pgss_shmem_startup(), and will simply log the
problem and unlink the file in the event of detecting corruption.
Otherwise, I suppose you might get garbage values in
pg_stat_statements, which, while rather annoying and possibly
unacceptable, is hardly the end of the world.

Ok. I was worried it might crash on loading the data when it was
corrupt - say a size field that ended up specifying gigabytes that it
then tries to allocate, or something like that.

What actually happens if it tries to repalloc() something huge? palloc
will throw an elog(ERROR), and since this happens during postmaster
startup, are you sure it won't prevent the server from starting?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#4Peter Geoghegan
peter@2ndquadrant.com
In reply to: Magnus Hagander (#3)
Re: pg_stat_statements temporary file

On 24 May 2012 12:42, Magnus Hagander <magnus@hagander.net> wrote:

What actually happens if it tries to repalloc() something huge? palloc
will throw an elog(ERROR), and since this happens during postmaster
startup, are you sure it won't prevent the server from starting?

Oh, yes, missed that.

/* Previous incarnation might have had a larger query_size */
if (temp.query_len >= buffer_size)
{
buffer = (char *) repalloc(buffer, temp.query_len + 1);
buffer_size = temp.query_len + 1;
}

Here, "temp" receives its value from an fread().

This could probably be coded to be defensive against such things, but
a better fix would be preferred. I have to wonder how much of a
problem corruption is likely to be though, given that we only save to
disk in a corresponding pgss_shmem_shutdown() call, which actually has
more protections against corruption. The window for the saved file to
be corrupt seems rather small, though I accept that a better window
would be zero.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#5Magnus Hagander
magnus@hagander.net
In reply to: Peter Geoghegan (#4)
Re: pg_stat_statements temporary file

On Thu, May 24, 2012 at 2:16 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 24 May 2012 12:42, Magnus Hagander <magnus@hagander.net> wrote:

What actually happens if it tries to repalloc() something huge? palloc
will throw an elog(ERROR), and since this happens during postmaster
startup, are you sure it won't prevent the server from starting?

Oh, yes, missed that.

               /* Previous incarnation might have had a larger query_size */
               if (temp.query_len >= buffer_size)
               {
                       buffer = (char *) repalloc(buffer, temp.query_len + 1);
                       buffer_size = temp.query_len + 1;
               }

Here, "temp" receives its value from an fread().

This could probably be coded to be defensive against such things, but
a better fix would be preferred. I have to wonder how much of a
problem corruption is likely to be though, given that we only save to
disk in a corresponding pgss_shmem_shutdown() call, which actually has
more protections against corruption. The window for the saved file to
be corrupt seems rather small, though I accept that a better window
would be zero.

Right. But writing to a temp file and rename()ing it into place is trivial.

It's really the other issues raised that are bigger ;)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#6Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#5)
1 attachment(s)
Re: pg_stat_statements temporary file

On Thu, May 24, 2012 at 2:19 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, May 24, 2012 at 2:16 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 24 May 2012 12:42, Magnus Hagander <magnus@hagander.net> wrote:

What actually happens if it tries to repalloc() something huge? palloc
will throw an elog(ERROR), and since this happens during postmaster
startup, are you sure it won't prevent the server from starting?

Oh, yes, missed that.

               /* Previous incarnation might have had a larger query_size */
               if (temp.query_len >= buffer_size)
               {
                       buffer = (char *) repalloc(buffer, temp.query_len + 1);
                       buffer_size = temp.query_len + 1;
               }

Here, "temp" receives its value from an fread().

This could probably be coded to be defensive against such things, but
a better fix would be preferred. I have to wonder how much of a
problem corruption is likely to be though, given that we only save to
disk in a corresponding pgss_shmem_shutdown() call, which actually has
more protections against corruption. The window for the saved file to
be corrupt seems rather small, though I accept that a better window
would be zero.

Right. But writing to a temp file and rename()ing it into place is trivial.

It's really the other issues raised that are bigger ;)

Here's a patch that does the two easy fixes:
1) writes the file to a temp file and rename()s it over the main file
as it writes down. This removes the (small) risk of corruption because
of a crash during write

2) unlinks the file after reading it. this makes sure it's not
included in online backups.

I still think we should consider the placement of this file to not be
in the global/ directory, but this is a quick (back-patchable) fix...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachments:

pg_stat_statements_file.patchapplication/octet-stream; name=pg_stat_statements_file.patchDownload
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 8347c8e..1ebda8b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -511,6 +511,13 @@ pgss_shmem_startup(void)
 
 	pfree(buffer);
 	FreeFile(file);
+
+	/*
+	 * Remove the file so it's not included in backups/replication
+	 * slaves, etc. A new file will be written on next shutdown.
+	 */
+	unlink(PGSS_DUMP_FILE);
+
 	return;
 
 error:
@@ -552,7 +559,7 @@ pgss_shmem_shutdown(int code, Datum arg)
 	if (!pgss_save)
 		return;
 
-	file = AllocateFile(PGSS_DUMP_FILE, PG_BINARY_W);
+	file = AllocateFile(PGSS_DUMP_FILE ".tmp", PG_BINARY_W);
 	if (file == NULL)
 		goto error;
 
@@ -578,6 +585,17 @@ pgss_shmem_shutdown(int code, Datum arg)
 		goto error;
 	}
 
+	/*
+	 * Rename file into place, so we atomically replace the old one.
+	 */
+	if (rename(PGSS_DUMP_FILE ".tmp", PGSS_DUMP_FILE) != 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not rename pg_stat_statement file \"%s.tmp\": %m",
+						PGSS_DUMP_FILE)));
+	}
+
 	return;
 
 error:
@@ -587,7 +605,7 @@ error:
 					PGSS_DUMP_FILE)));
 	if (file)
 		FreeFile(file);
-	unlink(PGSS_DUMP_FILE);
+	unlink(PGSS_DUMP_FILE ".tmp");
 }
 
 /*
#7Peter Geoghegan
peter@2ndquadrant.com
In reply to: Magnus Hagander (#6)
Re: pg_stat_statements temporary file

On 25 May 2012 14:13, Magnus Hagander <magnus@hagander.net> wrote:

Here's a patch that does the two easy fixes:
1) writes the file to a temp file and rename()s it over the main file
as it writes down. This removes the (small) risk of corruption because
of a crash during write

2) unlinks the file after reading it. this makes sure it's not
included in online backups.

Seems reasonable. It might be better to consistently concatenate the
string literals PGSS_DUMP_FILE and ".tmp" statically. Also, I'd have
updated the string in the errmsg callsite after the "error" tag too,
to refer to the tmp file rather than the file proper. Forgive the
pedantry, but I should mention that I believe that it is project
policy to not use squiggly parenthesis following an if expression when
that is unnecessary due to there only being a single statement.

I still think we should consider the placement of this file to not be
in the global/ directory, but this is a quick (back-patchable) fix...

Where do you suggest the file be written to?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#8Andres Freund
andres@2ndquadrant.com
In reply to: Peter Geoghegan (#7)
Re: pg_stat_statements temporary file

On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote:

I still think we should consider the placement of this file to not be
in the global/ directory, but this is a quick (back-patchable) fix...

Where do you suggest the file be written to?

One could argue stats_temp_directory would be the correct place.

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#7)
Re: pg_stat_statements temporary file

Peter Geoghegan <peter@2ndquadrant.com> writes:

On 25 May 2012 14:13, Magnus Hagander <magnus@hagander.net> wrote:

I still think we should consider the placement of this file to not be
in the global/ directory, but this is a quick (back-patchable) fix...

Where do you suggest the file be written to?

Given that pgstats keeps its permanent file in global/, I think the
argument that pg_stat_statements should not do likewise is pretty thin.

regards, tom lane

#10Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#9)
Re: pg_stat_statements temporary file

On Fri, May 25, 2012 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Geoghegan <peter@2ndquadrant.com> writes:

On 25 May 2012 14:13, Magnus Hagander <magnus@hagander.net> wrote:

I still think we should consider the placement of this file to not be
in the global/ directory, but this is a quick (back-patchable) fix...

Where do you suggest the file be written to?

Given that pgstats keeps its permanent file in global/, I think the
argument that pg_stat_statements should not do likewise is pretty thin.

Fair enough. As long as the file is unlinked after reading (per my
patch), it doesn't cause issues on a standby anymore, so it's a lot
less important, I guess. It's mostly "namespace invasion" at this
time...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#10)
Re: pg_stat_statements temporary file

Magnus Hagander <magnus@hagander.net> writes:

On Fri, May 25, 2012 at 4:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Given that pgstats keeps its permanent file in global/, I think the
argument that pg_stat_statements should not do likewise is pretty thin.

Fair enough. As long as the file is unlinked after reading (per my
patch), it doesn't cause issues on a standby anymore, so it's a lot
less important, I guess. It's mostly "namespace invasion" at this
time...

Well, I could support moving both of those stats files someplace else,
but it seems neatnik-ism more than something we have a definable need
for.

regards, tom lane

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#8)
Re: pg_stat_statements temporary file

Andres Freund <andres@2ndquadrant.com> writes:

On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote:

Where do you suggest the file be written to?

One could argue stats_temp_directory would be the correct place.

No, that would be exactly the *wrong* place, because that directory can
be on a RAM disk. We need to put this somewhere where it'll survive
a shutdown.

One could imagine creating a PGDATA subdirectory just for permanent (not
temp) stats files, but right at the moment that seems like overkill.
If we accumulate a few more similar files, I'd start to think it was
worth doing.

regards, tom lane

#13Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#12)
Re: pg_stat_statements temporary file

On 5/25/12 8:19 AM, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote:

Where do you suggest the file be written to?

One could argue stats_temp_directory would be the correct place.

No, that would be exactly the *wrong* place, because that directory can
be on a RAM disk. We need to put this somewhere where it'll survive
a shutdown.

Mind you, I can imagine a busy system wanting to keep PSS on a ram disk
as well. But users should be able to make that decision separately from
the stats file.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#14Magnus Hagander
magnus@hagander.net
In reply to: Josh Berkus (#13)
Re: pg_stat_statements temporary file

On Fri, May 25, 2012 at 6:49 PM, Josh Berkus <josh@agliodbs.com> wrote:

On 5/25/12 8:19 AM, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote:

Where do you suggest the file be written to?

One could argue stats_temp_directory would be the correct place.

No, that would be exactly the *wrong* place, because that directory can
be on a RAM disk.  We need to put this somewhere where it'll survive
a shutdown.

Mind you, I can imagine a busy system wanting to keep PSS on a ram disk
as well. But users should be able to make that decision separately from
the stats file.

Why would they want that? PSS only writes the tempfile on shutdown and
reads it on startup. Unlike pgstats which reads and writes it all the
time.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#15Josh Berkus
josh@agliodbs.com
In reply to: Magnus Hagander (#14)
Re: pg_stat_statements temporary file

Why would they want that? PSS only writes the tempfile on shutdown and
reads it on startup. Unlike pgstats which reads and writes it all the
time.

Ah, ok! Didn't know that.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#16Magnus Hagander
magnus@hagander.net
In reply to: Peter Geoghegan (#7)
Re: pg_stat_statements temporary file

On Friday, May 25, 2012, Peter Geoghegan wrote:

On 25 May 2012 14:13, Magnus Hagander <magnus@hagander.net <javascript:;>>
wrote:

Here's a patch that does the two easy fixes:
1) writes the file to a temp file and rename()s it over the main file
as it writes down. This removes the (small) risk of corruption because
of a crash during write

2) unlinks the file after reading it. this makes sure it's not
included in online backups.

Seems reasonable. It might be better to consistently concatenate the
string literals PGSS_DUMP_FILE and ".tmp" statically. Also, I'd have
updated the string in the errmsg callsite after the "error" tag too,
to refer to the tmp file rather than the file proper. Forgive the

Agreed on the first one, and oops-forgot on the second one.

pedantry, but I should mention that I believe that it is project
policy to not use squiggly parenthesis following an if expression when
that is unnecessary due to there only being a single statement.

Good point too - I had some other code there as well during testing, and
didn't clean it up all the way. Thanks for pointing it out!

Will apply with those fixes.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#17Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#12)
Re: pg_stat_statements temporary file

On Friday, May 25, 2012, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com <javascript:;>> writes:

On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote:

Where do you suggest the file be written to?

One could argue stats_temp_directory would be the correct place.

No, that would be exactly the *wrong* place, because that directory can
be on a RAM disk. We need to put this somewhere where it'll survive
a shutdown.

One could imagine creating a PGDATA subdirectory just for permanent (not
temp) stats files, but right at the moment that seems like overkill.
If we accumulate a few more similar files, I'd start to think it was
worth doing.

That's pretty much what I was thinking. But yeah, at the time it's probably
overkill - the main use today would be for better isolation of non-core
extensions, but I'm not sure there are enough of those that want to write
files in the data directory to care about..

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#18Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#12)
Re: pg_stat_statements temporary file

On Friday, May 25, 2012 05:19:28 PM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote:

Where do you suggest the file be written to?

One could argue stats_temp_directory would be the correct place.

No, that would be exactly the *wrong* place, because that directory can
be on a RAM disk. We need to put this somewhere where it'll survive
a shutdown.

I had assumed it would do the writeout regularly to survive a database crash.
As it does not do that my argument is clearly bogus, sorry for that.

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services