Patch to log usage of temporary files

Started by Bill Moranover 19 years ago46 messageshackers
Jump to latest
#1Bill Moran
wmoran@collaborativefusion.com

Thanks to Simon Riggs and Bruce for input that helped me put this together.

--
Bill Moran
Collaborative Fusion Inc.

Attachments:

trace_log_files.difftext/x-diff; charset=iso-8859-1; name=trace_log_files.diffDownload+32-20
#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bill Moran (#1)
Re: Patch to log usage of temporary files

Bill Moran wrote:

Thanks to Simon Riggs and Bruce for input that helped me put this together.

Please change things to save the stat() syscall when the feature is not
in use.

Nitpick: also note our brace placement convention (though this would be
fixed by pgindent, but still).

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Bill Moran (#1)
Re: Patch to log usage of temporary files

Bill Moran wrote:

+     	if (stat(vfdP->fileName, &filestats) == 0) {
+ 			if (trace_temp_files)

Shouldn't these tests be the other way around?

cheers

andrew

#4Bill Moran
wmoran@collaborativefusion.com
In reply to: Alvaro Herrera (#2)
Re: Patch to log usage of temporary files

In response to Alvaro Herrera <alvherre@commandprompt.com>:

Bill Moran wrote:

Thanks to Simon Riggs and Bruce for input that helped me put this together.

Please change things to save the stat() syscall when the feature is not
in use.

Do you have a suggestion on how to do that and still have the PG_TRACE1()
work? That was specifically requested by Simon Riggs.

I'm not at all familiar with how the PG_TRACE probes work, so I'd be
interested to hear suggestions on how to wrap that in an if. If I remove
the PG_TRACE, it becomes a simple matter to skip the stat() if the feature
is disabled.

Nitpick: also note our brace placement convention (though this would be
fixed by pgindent, but still).

Sorry, I thought I _was_ following the convention. Must have missed
something. Is there a written style guide somewhere? Might drive things
home a little better than just looking at other folks code.

--
Bill Moran
Collaborative Fusion Inc.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bill Moran (#4)
Re: Patch to log usage of temporary files

Bill Moran <wmoran@collaborativefusion.com> writes:

In response to Alvaro Herrera <alvherre@commandprompt.com>:

Please change things to save the stat() syscall when the feature is not
in use.

Do you have a suggestion on how to do that and still have the PG_TRACE1()
work? That was specifically requested by Simon Riggs.

Well, we are NOT paying a stat() call on every single file close,
whether Simon wants it or not. PG_TRACE1 doesn't even do anything
on non-Solaris platforms, for pete's sake.

Perhaps it would be reasonable to define trace_temp_files as the minimum
file size to log; then you could do something like

if (trace_temp_files > 0)
{
if (stat(vfdP->fileName, &filestats) < 0)
elog(LOG, ...);
else
{
if (filestats.st_size / BLCKSZ >= trace_temp_files)
ereport(LOG, ...);
PG_TRACE1(temp__file__cleanup, filestats.st_size);
}
}

Note that elog(ERROR) is quite inappropriate here.

regards, tom lane

#6Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#5)
Re: [HACKERS] Patch to log usage of temporary files

On Tue, 2007-01-02 at 18:20 -0500, Tom Lane wrote:

Bill Moran <wmoran@collaborativefusion.com> writes:

In response to Alvaro Herrera <alvherre@commandprompt.com>:

Please change things to save the stat() syscall when the feature is not
in use.

Do you have a suggestion on how to do that and still have the PG_TRACE1()
work? That was specifically requested by Simon Riggs.

Well, we are NOT paying a stat() call on every single file close,
whether Simon wants it or not.

Simon doesn't/wouldn't want the stat() call on each file close.

If you put the PG_TRACE macro outside of the if test, yet prior to the
file close, you can pass the filename through like this

PG_TRACE1(temp__file__cleanup, vfdP->fileName);

That way DTrace can make its own call to find out filesize, if it would
like to... and we don't need to stat() before each temp file close.
That's much more flexible and useful, as well as better performance.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#7Bill Moran
wmoran@collaborativefusion.com
In reply to: Simon Riggs (#6)
Re: [HACKERS] Patch to log usage of temporary files

In response to "Simon Riggs" <simon@2ndquadrant.com>:

On Tue, 2007-01-02 at 18:20 -0500, Tom Lane wrote:

Bill Moran <wmoran@collaborativefusion.com> writes:

In response to Alvaro Herrera <alvherre@commandprompt.com>:

Please change things to save the stat() syscall when the feature is not
in use.

Do you have a suggestion on how to do that and still have the PG_TRACE1()
work? That was specifically requested by Simon Riggs.

Well, we are NOT paying a stat() call on every single file close,
whether Simon wants it or not.

Simon doesn't/wouldn't want the stat() call on each file close.

If you put the PG_TRACE macro outside of the if test, yet prior to the
file close, you can pass the filename through like this

PG_TRACE1(temp__file__cleanup, vfdP->fileName);

That way DTrace can make its own call to find out filesize, if it would
like to... and we don't need to stat() before each temp file close.
That's much more flexible and useful, as well as better performance.

OK, I think I've managed to adjust this patch so that everyone is happy ;)
and it's better as well.

* PG_TRACE will work whether the GUC var is enabled or not, it sends the
fileName, as suggested by Simon
* stat() call is not made if trace_temp_files is disabled
* trace_temp_files is now an int: -1 disables, 0 and up equate to "log if
the file is this size or larger"
* Cleaned things up a bit -- should be more in line with PostgreSQL
coding standards
* failed stat is reported as LOG instead of ERROR

Done a bit of testing here, and everything seems to be in order.

--
Bill Moran
Collaborative Fusion Inc.

Attachments:

trace_log_files.difftext/x-diff; charset=iso-8859-1; name=trace_log_files.diffDownload+40-8
#8Andrew Dunstan
andrew@dunslane.net
In reply to: Bill Moran (#7)
Re: [HACKERS] Patch to log usage of temporary files

Bill Moran wrote:

+ if (trace_temp_files != -1)

Might be more robust to say

if (trace_temp_files >= 0)

cheers

andrew

#9Bill Moran
wmoran@collaborativefusion.com
In reply to: Andrew Dunstan (#8)
Re: [HACKERS] Patch to log usage of temporary files

Andrew Dunstan <andrew@dunslane.net> wrote:

Bill Moran wrote:

+ if (trace_temp_files != -1)

Might be more robust to say

if (trace_temp_files >= 0)

Because it would allow for the easy addition of more negative numbers
with magic value?

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Bill Moran (#9)
Re: [HACKERS] Patch to log usage of temporary files

Bill Moran wrote:

Andrew Dunstan <andrew@dunslane.net> wrote:

Bill Moran wrote:

+ if (trace_temp_files != -1)

Might be more robust to say

if (trace_temp_files >= 0)

Because it would allow for the easy addition of more negative numbers
with magic value?

because ISTM any negative number here should mean no action is to be
taken. Otherwise how else is it different from 0?

cheers

andrew

#11Bill Moran
wmoran@collaborativefusion.com
In reply to: Andrew Dunstan (#10)
Re: [HACKERS] Patch to log usage of temporary files

In response to "Andrew Dunstan" <andrew@dunslane.net>:

Bill Moran wrote:

Andrew Dunstan <andrew@dunslane.net> wrote:

Bill Moran wrote:

+ if (trace_temp_files != -1)

Might be more robust to say

if (trace_temp_files >= 0)

Because it would allow for the easy addition of more negative numbers
with magic value?

because ISTM any negative number here should mean no action is to be
taken. Otherwise how else is it different from 0?

??

I specified in the GUC config that minimum allowable value is -1.

/usr/local/etc/rc.d/postgresql start
FATAL: -5 is outside the valid range for parameter "trace_temp_files" (-1 .. 2147483647)

set trace_temp_files to -8;
ERROR: -8 is outside the valid range for parameter "trace_temp_files" (-1 .. 2147483647)

Perhaps there's another reason to use the >= 0 check, but handling invalid
values with POLA doesn't seem to be a good one.

--
Bill Moran
Collaborative Fusion Inc.

#12Andrew Dunstan
andrew@dunslane.net
In reply to: Bill Moran (#11)
Re: [HACKERS] Patch to log usage of temporary files

Bill Moran wrote:

In response to "Andrew Dunstan" <andrew@dunslane.net>:

Bill Moran wrote:

Andrew Dunstan <andrew@dunslane.net> wrote:

Bill Moran wrote:

+ if (trace_temp_files != -1)

Might be more robust to say

if (trace_temp_files >= 0)

Because it would allow for the easy addition of more negative numbers
with magic value?

because ISTM any negative number here should mean no action is to be
taken. Otherwise how else is it different from 0?

??

I specified in the GUC config that minimum allowable value is -1.

OK, missed that, sorry. Please return to normal viewing ...

cheers

andrew

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bill Moran (#11)
Re: [HACKERS] Patch to log usage of temporary files

Bill Moran <wmoran@collaborativefusion.com> writes:

Andrew Dunstan <andrew@dunslane.net> wrote:

Might be more robust to say
if (trace_temp_files >= 0)

I specified in the GUC config that minimum allowable value is -1.

I'd still tend to go with Andrew's suggestion because it makes this
particular bit of code self-defending against bad values. Yes, it's
reasonably safe given that bit of coding way over yonder in guc.c,
but there's no particularly good reason why this code has to depend
on that to avoid doing something stupid. And it's easier to understand
too --- you don't have to go looking in guc.c to convince yourself it's
safe.

regards, tom lane

#14Bill Moran
wmoran@collaborativefusion.com
In reply to: Tom Lane (#13)
Re: [HACKERS] Patch to log usage of temporary files

In response to Tom Lane <tgl@sss.pgh.pa.us>:

Bill Moran <wmoran@collaborativefusion.com> writes:

Andrew Dunstan <andrew@dunslane.net> wrote:

Might be more robust to say
if (trace_temp_files >= 0)

I specified in the GUC config that minimum allowable value is -1.

I'd still tend to go with Andrew's suggestion because it makes this
particular bit of code self-defending against bad values. Yes, it's
reasonably safe given that bit of coding way over yonder in guc.c,
but there's no particularly good reason why this code has to depend
on that to avoid doing something stupid. And it's easier to understand
too --- you don't have to go looking in guc.c to convince yourself it's
safe.

Ahh ... well, I've probably already argued about it more than it's worth.
The patch is easy enough to adjust, find attached.

--
Bill Moran
Collaborative Fusion Inc.

Attachments:

trace_log_files.difftext/x-diff; charset=iso-8859-1; name=trace_log_files.diffDownload+40-8
#15Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bill Moran (#7)
Re: [HACKERS] Patch to log usage of temporary files

On Jan 3, 2007, at 4:20 PM, Bill Moran wrote:

* trace_temp_files is now an int: -1 disables, 0 and up equate to
"log if
the file is this size or larger"

Another thought is to allow ignoring files over a certain size. The
reason is that if you end up creating 10MB of temp files, you can
probably avoid that by adjusting work_mem. But if you just created
10GB, you probably have no choice in the matter.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#16Bruce Momjian
bruce@momjian.us
In reply to: Bill Moran (#14)
Re: [PATCHES] Patch to log usage of temporary files

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Bill Moran wrote:

In response to Tom Lane <tgl@sss.pgh.pa.us>:

Bill Moran <wmoran@collaborativefusion.com> writes:

Andrew Dunstan <andrew@dunslane.net> wrote:

Might be more robust to say
if (trace_temp_files >= 0)

I specified in the GUC config that minimum allowable value is -1.

I'd still tend to go with Andrew's suggestion because it makes this
particular bit of code self-defending against bad values. Yes, it's
reasonably safe given that bit of coding way over yonder in guc.c,
but there's no particularly good reason why this code has to depend
on that to avoid doing something stupid. And it's easier to understand
too --- you don't have to go looking in guc.c to convince yourself it's
safe.

Ahh ... well, I've probably already argued about it more than it's worth.
The patch is easy enough to adjust, find attached.

--
Bill Moran
Collaborative Fusion Inc.

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#17Bruce Momjian
bruce@momjian.us
In reply to: Bill Moran (#14)
Re: [HACKERS] Patch to log usage of temporary files

Bill Moran wrote:

In response to Tom Lane <tgl@sss.pgh.pa.us>:

Bill Moran <wmoran@collaborativefusion.com> writes:

Andrew Dunstan <andrew@dunslane.net> wrote:

Might be more robust to say
if (trace_temp_files >= 0)

I specified in the GUC config that minimum allowable value is -1.

I'd still tend to go with Andrew's suggestion because it makes this
particular bit of code self-defending against bad values. Yes, it's
reasonably safe given that bit of coding way over yonder in guc.c,
but there's no particularly good reason why this code has to depend
on that to avoid doing something stupid. And it's easier to understand
too --- you don't have to go looking in guc.c to convince yourself it's
safe.

Ahh ... well, I've probably already argued about it more than it's worth.
The patch is easy enough to adjust, find attached.

I have applied a modified version of your patch. I renamed the
parameter to 'log_temp_files', for consistency, added documentation, and
improved the wording, particularly mentioning that the logging happens
at file deletion time.

Thanks.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/rtmp/difftext/x-diffDownload+49-3
#18Bill Moran
wmoran@collaborativefusion.com
In reply to: Bruce Momjian (#17)
Re: [HACKERS] Patch to log usage of temporary files

In response to Bruce Momjian <bruce@momjian.us>:

I have applied a modified version of your patch. I renamed the
parameter to 'log_temp_files', for consistency, added documentation, and
improved the wording, particularly mentioning that the logging happens
at file deletion time.

Thanks.

--
Bill Moran
Collaborative Fusion Inc.

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#17)
Re: [HACKERS] Patch to log usage of temporary files

Bruce Momjian <bruce@momjian.us> writes:

+         A value of zero logs all temporary files, and positive
+         values log only files whose size is equal or greater than
+         the specified number of bytes.

Surely the measurement unit should be kbytes or disk blocks. And why
aren't you using that GUC UNITS infrastructure Peter put in?

/* reset flag so that die() interrupt won't cause problems */
vfdP->fdstate &= ~FD_TEMPORARY;
+ 		PG_TRACE1(temp__file__cleanup, vfdP->fileName);
+ 		if (log_temp_files >= 0)
+ 		{
+ 			if (stat(vfdP->fileName, &filestats) == 0)

The TRACE is in the wrong place no? I thought it was going to be after
the stat() operation so it could pass the file size.

Also, I dunno much about DTrace, but I had the idea that you can't
simply throw a PG_TRACE macro into the source and think you are done
--- isn't there a file of probe declarations to add to?  Not to mention
the documentation of what probes exist.

regards, tom lane

#20Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#19)
Re: [HACKERS] Patch to log usage of temporary files

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

+         A value of zero logs all temporary files, and positive
+         values log only files whose size is equal or greater than
+         the specified number of bytes.

Surely the measurement unit should be kbytes or disk blocks. And why
aren't you using that GUC UNITS infrastructure Peter put in?

Agreed. I have applied the following patch to make it kilobytes, and
documented it. I didn't put '-1kB' in the postgresql.conf file because
the -1 value is special. (ideas?)

/* reset flag so that die() interrupt won't cause problems */
vfdP->fdstate &= ~FD_TEMPORARY;
+ 		PG_TRACE1(temp__file__cleanup, vfdP->fileName);
+ 		if (log_temp_files >= 0)
+ 		{
+ 			if (stat(vfdP->fileName, &filestats) == 0)

The TRACE is in the wrong place no? I thought it was going to be after
the stat() operation so it could pass the file size.

Also, I dunno much about DTrace, but I had the idea that you can't
simply throw a PG_TRACE macro into the source and think you are done
--- isn't there a file of probe declarations to add to?  Not to mention
the documentation of what probes exist.

I didn't like the macro in that area anyway. It seems too adhock to
just throw it in when we have so few places monitored now. Removed.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/rtmp/difftext/x-diffDownload+8-8
#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#20)
#22Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#21)
#23Bill Moran
wmoran@collaborativefusion.com
In reply to: Tom Lane (#21)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bill Moran (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bill Moran (#23)
#26Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#24)
#27Benny Amorsen
benny+usenet@amorsen.dk
In reply to: Tom Lane (#24)
#28Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#20)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#28)
#30Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#28)
#31Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#29)
#32Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#30)
#33Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#29)
#34Guillaume Smet
guillaume.smet@gmail.com
In reply to: Bill Moran (#1)
#35Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#32)
#36Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#35)
#37Bruce Momjian
bruce@momjian.us
In reply to: Guillaume Smet (#34)
#38Guillaume Smet
guillaume.smet@gmail.com
In reply to: Bruce Momjian (#37)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Guillaume Smet (#38)
#40Bruce Momjian
bruce@momjian.us
In reply to: Guillaume Smet (#38)
#41Guillaume Smet
guillaume.smet@gmail.com
In reply to: Bruce Momjian (#40)
#42Guillaume Smet
guillaume.smet@gmail.com
In reply to: Tom Lane (#39)
#43Tom Lane
tgl@sss.pgh.pa.us
In reply to: Guillaume Smet (#42)
#44Bruce Momjian
bruce@momjian.us
In reply to: Guillaume Smet (#41)
#45Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#43)
#46Bill Moran
wmoran@collaborativefusion.com
In reply to: Guillaume Smet (#41)