patch review : Add ability to constrain backend temporary file space

Started by Cédric Villemain · almost 15 years ago · 45 messages · pgsql-hackers
#1 Cédric Villemain
cedric.villemain.debian@gmail.com

Hello

here is a partial review of your patch, better than keeping it
sleeping in the commitfest queue I hope.

Submission review
================

* The patch is not in context diff format.
* The patch applies, but contains some extra whitespace.
* Documentation is here but not explicit about 'temp tables';
maybe worth adding that this won't limit temporary table size?
* There is no test provided. One would be expected, to check that the
feature works.

Code review
=========

* in fd.c, I think that "temporary_files_size -=
(double)vfdP->fileSize;" should be done later in the function, once we
have successfully unlinked the file, not before.

* I am not sure whether it is better to add a fileSize field like you
did, or to use RelationGetNumberOfBlocks() when the file is about to be
truncated or unlinked; that way, the seekPos should be enough to
increase the global counter.

* temporary_files_size: I think it is better to count a number of pages,
à la PostgreSQL, than a kilobyte size.

* max_temp_files_size: I'd prefer an approach like the shared_buffers
GUC: you can use pages, or KB, MB, ...
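The first fd.c comment above amounts to reordering the decrement and the unlink. A minimal standalone sketch of that ordering (temporary_files_size and delete_temp_file here are simplified stand-ins for the fd.c code, not the actual patch):

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Simplified stand-in for the per-backend space counter in fd.c */
static double temporary_files_size = 0;

/* Decrement the counter only after unlink() has succeeded; if unlink()
 * fails, the file still exists and still occupies disk space. */
static int
delete_temp_file(const char *path, double file_size)
{
    if (unlink(path) != 0)
        return -1;                      /* failed: leave the counter untouched */
    temporary_files_size -= file_size;  /* safe: the file really is gone */
    return 0;
}
```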

Simple Feature test
==============

either explain buffers is wrong or the patch is wrong:
cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Sort  (cost=10260.02..10495.82 rows=94320 width=4) (actual time=364.373..518.940 rows=100000 loops=1)
   Sort Key: generate_series
   Sort Method: external merge  Disk: 1352kB
   Buffers: local hit=393, temp read=249 written=249
   ->  Seq Scan on foo  (cost=0.00..1336.20 rows=94320 width=4) (actual time=0.025..138.754 rows=100000 loops=1)
         Buffers: local hit=393
 Total runtime: 642.874 ms
(7 rows)

cedric=# set max_temp_files_size to 1900;
SET
cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
ERROR: aborting due to exceeding max temp files size
STATEMENT: explain (analyze,buffers) select * from foo order by 1 desc ;
ERROR: aborting due to exceeding max temp files size

Do you have some testing method I can apply to track that, without
explain (analyze, buffers), before going to low-level monitoring?

Architecture review
==============

max_temp_files_size is used for the global space used per backend.
Based on how work_mem works, I'd expect something like "work_disk" to
limit per file, and maybe a backend_work_disk (and yes, maybe a
backend_work_mem?!) per backend.
So I propose renaming the current GUC to something like backend_work_disk.

The patch is not large and is easy to read.
I like the idea and it sounds useful.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#2 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Cédric Villemain (#1)
Re: patch review : Add ability to constrain backend temporary file space

On 01/06/11 09:24, Cédric Villemain wrote:

Hello

here is a partial review of your patch, better than keeping it
sleeping in the commitfest queue I hope.

Submission review
================

* The patch is not in context diff format.
* The patch applies, but contains some extra whitespace.
* Documentation is here but not explicit about 'temp tables';
maybe worth adding that this won't limit temporary table size?
* There is no test provided. One would be expected, to check that the
feature works.

Code review
=========

* in fd.c, I think that "temporary_files_size -=
(double)vfdP->fileSize;" should be done later in the function, once we
have successfully unlinked the file, not before.

* I am not sure whether it is better to add a fileSize field like you
did, or to use RelationGetNumberOfBlocks() when the file is about to be
truncated or unlinked; that way, the seekPos should be enough to
increase the global counter.

* temporary_files_size: I think it is better to count a number of pages,
à la PostgreSQL, than a kilobyte size.

* max_temp_files_size: I'd prefer an approach like the shared_buffers
GUC: you can use pages, or KB, MB, ...

Simple Feature test
==============

either explain buffers is wrong or the patch is wrong:
cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Sort (cost=10260.02..10495.82 rows=94320 width=4) (actual
time=364.373..518.940 rows=100000 loops=1)
Sort Key: generate_series
Sort Method: external merge Disk: 1352kB
Buffers: local hit=393, temp read=249 written=249
-> Seq Scan on foo (cost=0.00..1336.20 rows=94320 width=4)
(actual time=0.025..138.754 rows=100000 loops=1)
Buffers: local hit=393
Total runtime: 642.874 ms
(7 rows)

cedric=# set max_temp_files_size to 1900;
SET
cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
ERROR: aborting due to exceeding max temp files size
STATEMENT: explain (analyze,buffers) select * from foo order by 1 desc ;
ERROR: aborting due to exceeding max temp files size

Do you have some testing method I can apply to track that, without
explain (analyze, buffers), before going to low-level monitoring?

Architecture review
==============

max_temp_files_size is used for the global space used per backend.
Based on how work_mem works, I'd expect something like "work_disk" to
limit per file, and maybe a backend_work_disk (and yes, maybe a
backend_work_mem?!) per backend.
So I propose renaming the current GUC to something like backend_work_disk.

The patch is not large and is easy to read.
I like the idea and it sounds useful.

Hi Cédric,

Thanks for reviewing!

The diff format is odd - I'm sure I told git to do a context diff,
however running 'file' on the v2 patch results in it saying 'HTML
document'... hmm, I'll check more carefully for the next one I do.

Yeah, I agree about explicitly mentioning how it does not constrain
temp tables.

Hmm, test - good idea - I'll look into it.

Re the code comments - I agree with most of them. However, with respect
to the GUC units, I copied the setup for work_mem as that seemed the
most related. Also, I used bytes for the internal variable in fd.c to
limit the number of arithmetic operations while comparing.

For testing I set 'log_temp_files = 0' in postgresql.conf and the
numbers seemed to agree exactly - I didn't think to use EXPLAIN +
buffers... not sure why they would disagree. Have a go with
log_temp_files set and see what you find!
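For reference, the setting being described goes in postgresql.conf; 0 means every temporary file is logged, together with its size, when it is deleted (a larger value logs only files of at least that many kilobytes):

```
log_temp_files = 0
```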

I like your suggestion for the name. Given that 'work_mem' is per
backend, I'm leaning towards the idea of 'work_disk' being sufficiently
descriptive.

Thanks again, and I'll come up with a v3 patch.

Mark

#3 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#2)
Re: patch review : Add ability to constrain backend temporary file space

On 01/06/11 12:27, Mark Kirkwood wrote:

Re the code comments - I agree with most of them. However, with respect
to the GUC units, I copied the setup for work_mem as that seemed the
most related.

Also - forgot to mention - I *thought* you could specify the temp files
size GUC as KB, MB, GB or plain old pages, but I'll check this, in case
I've busted it somehow.

Cheers

Mark

#4 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#3)
Re: patch review : Add ability to constrain backend temporary file space

On 01/06/11 12:32, Mark Kirkwood wrote:

On 01/06/11 12:27, Mark Kirkwood wrote:

Re the code comments - I agree with most of them. However, with
respect to the GUC units, I copied the setup for work_mem as that
seemed the most related.

Also - forgot to mention - I *thought* you could specify the temp
files size GUC as KB, MB, GB or plain old pages, but I'll check this,
in case I've busted it somehow.

Doh, ignore me here :-( You are correct - I've specified GUC_UNIT_KB
(copying work_mem) - it is easy enough to allow pages like shared_buffers.

Cheers

Mark

#5 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Cédric Villemain (#1)
Re: patch review : Add ability to constrain backend temporary file space

On 01/06/11 09:24, Cédric Villemain wrote:

Submission review
================

* The patch is not in context diff format.
* The patch applies, but contains some extra whitespace.
* Documentation is here but not explicit about 'temp tables';
maybe worth adding that this won't limit temporary table size?
* There is no test provided. One would be expected, to check that the
feature works.

I've created a new patch (attached) with 'git diff -c', so hopefully the
format is OK now. I've added a paragraph about how temporary *table*
storage is not constrained, only temp work files. I made the point that
the *query* that creates a temp table will have its work files
constrained too. I've added a test as well.

The major visible change is that the guc has been renamed to
'work_disk', to gel better with the idea that it is the disk spill for
'work_mem'.

Code review
=========

* in fd.c, I think that "temporary_files_size -=
(double)vfdP->fileSize;" should be done later in the function, once we
have successfully unlinked the file, not before.

Agreed, I've moved it.

* I am not sure whether it is better to add a fileSize field like you
did, or to use RelationGetNumberOfBlocks() when the file is about to be
truncated or unlinked; that way, the seekPos should be enough to
increase the global counter.

The temp files are not relations, so I'd have to call stat() I guess. Now
truncate/unlink can happen quite a lot (e.g. hash with many files) and I
wanted to avoid adding too many library calls to this code for
performance reasons, so on balance I'm thinking it is going to be more
efficient to remember the size in the Vfd.

* temporary_files_size: I think it is better to count a number of pages,
à la PostgreSQL, than a kilobyte size.

The temp file related stuff is handled in bytes or kbytes in fd.c and
other code (e.g. log_temp_files), so it seems more natural to stay in
kbytes. Also, work_disk is really only a spillover from work_mem (in
kbytes), so it seems logical to match its units.

* max_temp_files_size: I'd prefer an approach like the shared_buffers
GUC: you can use pages, or KB, MB, ...

We can use KB, MB, GB - just not pages, again like work_mem. These files
are not relations, so I'm not sure pages is an entirely appropriate unit
for them.

Simple Feature test
==============

either explain buffers is wrong or the patch is wrong:
cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Sort (cost=10260.02..10495.82 rows=94320 width=4) (actual
time=364.373..518.940 rows=100000 loops=1)
Sort Key: generate_series
Sort Method: external merge Disk: 1352kB
Buffers: local hit=393, temp read=249 written=249
-> Seq Scan on foo (cost=0.00..1336.20 rows=94320 width=4)
(actual time=0.025..138.754 rows=100000 loops=1)
Buffers: local hit=393
Total runtime: 642.874 ms
(7 rows)

cedric=# set max_temp_files_size to 1900;
SET
cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
ERROR: aborting due to exceeding max temp files size
STATEMENT: explain (analyze,buffers) select * from foo order by 1 desc ;
ERROR: aborting due to exceeding max temp files size

Do you have some testing method I can apply to track that, without
explain (analyze, buffers), before going to low-level monitoring?

We're looking at this...

Architecture review
==============

max_temp_files_size is used for the global space used per backend.
Based on how work_mem works, I'd expect something like "work_disk" to
limit per file, and maybe a backend_work_disk (and yes, maybe a
backend_work_mem?!) per backend.
So I propose renaming the current GUC to something like backend_work_disk.

Done - 'work_disk' it is to match 'work_mem'.

The patch is not large and is easy to read.
I like the idea and it sounds useful.

Great! Cheers

Mark

Attachments:

temp-files-v3.patch.gz (application/x-gzip)

#6 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#5)
Re: Re: patch review : Add ability to constrain backend temporary file space

On 02/06/11 11:35, Mark Kirkwood wrote:

On 01/06/11 09:24, Cédric Villemain wrote:

Simple Feature test
==============

either explain buffers is wrong or the patch is wrong:
cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------

Sort (cost=10260.02..10495.82 rows=94320 width=4) (actual
time=364.373..518.940 rows=100000 loops=1)
Sort Key: generate_series
Sort Method: external merge Disk: 1352kB
Buffers: local hit=393, temp read=249 written=249
-> Seq Scan on foo (cost=0.00..1336.20 rows=94320 width=4)
(actual time=0.025..138.754 rows=100000 loops=1)
Buffers: local hit=393
Total runtime: 642.874 ms
(7 rows)

cedric=# set max_temp_files_size to 1900;
SET
cedric=# explain (analyze,buffers) select * from foo order by 1 desc ;
ERROR: aborting due to exceeding max temp files size
STATEMENT: explain (analyze,buffers) select * from foo order by 1
desc ;
ERROR: aborting due to exceeding max temp files size

Do you have some testing method I can apply to track that, without
explain (analyze, buffers), before going to low-level monitoring?

We're looking at this...

Arg - I was being dense. FileWrite can write *anywhere* in the file, so
I need to be smart about whether to add to the total file size or not.
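The fix Mark describes can be sketched as follows. The Vfd struct here is reduced to the two fields that matter, and account_write is an illustrative name, not the actual patch code: only the part of a write that extends past the current end of file should add to the counter.

```c
#include <assert.h>

/* Reduced stand-in for fd.c's Vfd entry */
typedef struct
{
    long seekPos;   /* current write position in the file */
    long fileSize;  /* highest byte ever written, i.e. the file's size */
} Vfd;

static double temporary_files_size = 0;  /* per-backend total, simplified */

/* Account for a write of 'amount' bytes at vfdP->seekPos: only the part
 * that extends the file adds to the total; rewriting bytes that already
 * exist inside the file must not be counted again. */
static void
account_write(Vfd *vfdP, long amount)
{
    long newPos = vfdP->seekPos + amount;

    if (newPos > vfdP->fileSize)
    {
        temporary_files_size += (double) (newPos - vfdP->fileSize);
        vfdP->fileSize = newPos;
    }
    vfdP->seekPos = newPos;
}
```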

I'll update the Commitfest page as soon as it sends me my password reset
mail...

Cheers

Mark

Attachments:

temp-files-v4.patch.gz (application/x-gzip)

#7 Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Mark Kirkwood (#5)
Re: Re: patch review : Add ability to constrain backend temporary file space

On Wed, Jun 1, 2011 at 6:35 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:

On 01/06/11 09:24, Cédric Villemain wrote:

Submission review
================

* The patch is not in context diff format.
* The patch applies, but contains some extra whitespace.
* Documentation is here but not explicit about 'temp tables';
maybe worth adding that this won't limit temporary table size?
* There is no test provided. One would be expected, to check that the
feature works.

I've created a new patch (attached)

Hi Mark,

A few comments:

- why can only superusers set this? If this is a per-backend setting,
I don't see the problem in allowing normal users to restrict their own
queries.

- why are the calculations done as double?
+               if (temporary_files_size / 1024.0 > (double)work_disk)

- the patch adds this to serial_schedule but no test has been added...

diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index bb654f9..325cb3d 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -127,3 +127,4 @@ test: largeobject
 test: with
 test: xml
 test: stats
+test: resource

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL

#8 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Jaime Casanova (#7)
Re: Re: patch review : Add ability to constrain backend temporary file space

On 02/06/11 18:34, Jaime Casanova wrote:

On Wed, Jun 1, 2011 at 6:35 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:

I've created a new patch (attached)

Hi Mark,

A few comments:

- why can only superusers set this? If this is a per-backend setting,
I don't see the problem in allowing normal users to restrict their own
queries.

Yeah, that's a good point; I was thinking that, as a resource control
parameter, it might be good to prevent casual users from increasing their
limit. However, the most likely use case would be ad-hoc query tools that
don't have the ability to do SET anyway. Hmm - what do you think?

- why are the calculations done as double?
+               if (temporary_files_size / 1024.0 > (double)work_disk)

I originally coded this with the idea that the GUC would (or could) be a
double - to allow for seriously big limits in data warehousing
situations etc. But subsequent discussion led to that being discarded.
However, work_disk can go to INT_MAX * 1024 bytes, so I need to make sure
that whatever datatype I use can handle that - double seemed sufficient.
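Mark's datatype concern can be checked in isolation. The sketch below is illustrative, not patch code: it shows that the kB-to-bytes product must be widened past 32 bits, and that double represents the maximum value exactly (it is well below 2^53).

```c
#include <limits.h>
#include <stdint.h>

/* The byte limit implied by a work_disk setting given in kB. */
static int64_t
work_disk_limit_bytes(int work_disk_kb)
{
    /* widen before multiplying, or the product overflows a 32-bit int */
    return (int64_t) work_disk_kb * 1024;
}
```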

- the patch adds this to serial_schedule but no test has been added...

diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index bb654f9..325cb3d 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -127,3 +127,4 @@ test: largeobject
test: with
test: xml
test: stats
+test: resource

Bugger - I think I forgot to 'git add' 'em before doing the diff.

I can sense a v5 coming on.

#9 Robert Haas
robertmhaas@gmail.com
In reply to: Mark Kirkwood (#5)
Re: Re: patch review : Add ability to constrain backend temporary file space

On Wed, Jun 1, 2011 at 7:35 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:

Done - 'work_disk' it is to match 'work_mem'.

I guess I'm bikeshedding here, but I'm not sure I really buy this
parallel. work_mem is primarily a query planner parameter; it says,
if you're going to need more memory than this, then you have to
execute the plan some other way. This new parameter is not a query
planner paramater AIUI - its job is to KILL things if they exceed the
limit. In that sense it's more like statement_timeout. I can imagine
us wanting more parameters like this too. Kill the query if it...

...takes too long (statement_timeout)
...uses too much temporary file space (the current patch)
...uses too much CPU time
...uses too much RAM
...generates too much disk I/O
...has too high an estimated cost
...others?

So I'm not sure work_disk is a great name. Actually, work_mem is
already not a great name even for what it is, but at any rate I think
this is something different.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)
Re: Re: patch review : Add ability to constrain backend temporary file space

Robert Haas <robertmhaas@gmail.com> writes:

So I'm not sure work_disk is a great name.

I agree. Maybe something along the lines of "temp_file_limit"?

Also, once you free yourself from the analogy to work_mem, you could
adopt some more natural unit than KB. I'd think MB would be a practical
unit size, and would avoid (at least for the near term) the need to make
the parameter a float.

regards, tom lane

#11 Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#9)
Re: Re: patch review : Add ability to constrain backend temporary file space

2011/6/2 Robert Haas <robertmhaas@gmail.com>:

On Wed, Jun 1, 2011 at 7:35 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:

Done - 'work_disk' it is to match 'work_mem'.

I guess I'm bikeshedding here, but I'm not sure I really buy this
parallel.  work_mem is primarily a query planner parameter; it says,
if you're going to need more memory than this, then you have to
execute the plan some other way.  This new parameter is not a query
planner parameter AIUI - its job is to KILL things if they exceed the
limit.  In that sense it's more like statement_timeout.  I can imagine
us wanting more parameters like this too.  Kill the query if it...

...takes too long (statement_timeout)
...uses too much temporary file space (the current patch)
...uses too much CPU time
...uses too much RAM
...generates too much disk I/O
...has too high an estimated cost
...others?

you're mixing limits for the 'executor' with limits for the 'planner':
'uses too much CPU time' vs 'has too high an estimated cost'.

(backend)_work_(disk|mem) also looks good for the 'has too high an
estimated cost' series: a limiter at the planner level should allow the
planner to change its strategy, I think... But probably not something
to consider too much right now.

So I'm not sure work_disk is a great name.  Actually, work_mem is
already not a great name even for what it is, but at any rate I think
this is something different.

I am not especially attached to a name; the idea was not to use work_disk
but backend_work_disk. I agree with you anyway, and the suggestion from
Tom is fine for me (temp_file_limit).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#12 Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#11)
Re: Re: patch review : Add ability to constrain backend temporary file space

On Thu, Jun 2, 2011 at 10:58 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

I am not specially attached to a name, idea was not to use work_disk
but backend_work_disk. I agree with you anyway, and suggestion from
Tom is fine for me (temp_file_limit).

Yeah, I like that too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#10)
Re: patch review : Add ability to constrain backend temporary file space

On Thu, Jun 2, 2011 at 7:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, once you free yourself from the analogy to work_mem, you could
adopt some more natural unit than KB.  I'd think MB would be a practical
unit size, and would avoid (at least for the near term) the need to make
the parameter a float.

As long as users can specify any unit when they input the parameter, it
doesn't really matter what unit the variable is stored in. I'm not
sure the GUC infrastructure can currently handle megabytes as the
native units for a GUC though.

--
greg

#14 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Tom Lane (#10)
Re: Re: patch review : Add ability to constrain backend temporary file space

On 03/06/11 02:36, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

So I'm not sure work_disk is a great name.

I agree. Maybe something along the lines of "temp_file_limit"?

Also, once you free yourself from the analogy to work_mem, you could
adopt some more natural unit than KB. I'd think MB would be a practical
unit size, and would avoid (at least for the near term) the need to make
the parameter a float.

I agree, and I like your name suggestion (and also agree with Robert
that 'work_mem' was not such a great name).

I'd be quite happy to use MB (checks guc.h), but it looks like we don't
have a GUC_UNIT_MB; not sure if adding it would be an issue (suggestions
for a suitable mask welcome, if people think it is a good idea).

Cheers

Mark

#15 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Jaime Casanova (#7)
Re: Re: patch review : Add ability to constrain backend temporary file space

On 02/06/11 18:34, Jaime Casanova wrote:

- the patch adds this to serial_schedule but no test has been added...

diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index bb654f9..325cb3d 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -127,3 +127,4 @@ test: largeobject
test: with
test: xml
test: stats
+test: resource

Corrected v4 patch with the test files, for completeness. Note that
discussion has moved on and there will be a v5 :-)

Attachments:

temp-files-v4.1.patch.gz (application/x-gzip)

#16 Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Mark Kirkwood (#5)
Re: patch review : Add ability to constrain backend temporary file space

2011/6/2 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:

On 01/06/11 09:24, Cédric Villemain wrote:

* I am not sure whether it is better to add a fileSize field like you
did, or to use RelationGetNumberOfBlocks() when the file is about to be
truncated or unlinked; that way, the seekPos should be enough to
increase the global counter.

The temp files are not relations, so I'd have to call stat() I guess. Now
truncate/unlink can happen quite a lot (e.g. hash with many files) and I
wanted to avoid adding too many library calls to this code for performance
reasons, so on balance I'm thinking it is going to be more efficient to
remember the size in the Vfd.

I am not sure temporary relations are truncated. I have not checked
right now, but IIRC we don't need to truncate them. And I believe it
would defeat the log_temp_files feature, because the logging is done on
FileClose() only.

If we are to add a field in struct vfd to keep the file size, then some
optimization may happen for logging too....

#17 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Cédric Villemain (#16)
Re: patch review : Add ability to constrain backend temporary file space

On 03/06/11 12:33, Cédric Villemain wrote:

2011/6/2 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:

On 01/06/11 09:24, Cédric Villemain wrote:

* I am not sure whether it is better to add a fileSize field like you
did, or to use RelationGetNumberOfBlocks() when the file is about to be
truncated or unlinked; that way, the seekPos should be enough to
increase the global counter.

The temp files are not relations, so I'd have to call stat() I guess. Now
truncate/unlink can happen quite a lot (e.g. hash with many files) and I
wanted to avoid adding too many library calls to this code for performance
reasons, so on balance I'm thinking it is going to be more efficient to
remember the size in the Vfd.

I am not sure temporary relations are truncated. I have not checked
right now, but IIRC we don't need to truncate them. And I believe it
would defeat the log_temp_files feature, because the logging is done on
FileClose() only.

If we are to add a field in struct vfd to keep the file size, then some
optimization may happen for logging too....

We pretty much need to store the file size, I think, as we need to work
out whether FileWrite is re-writing inside the file or extending it. So
maybe we could avoid a stat on unlink too (mind you, it is a nice
validity check that the ongoing size accounting is correct). I was/am
keen to avoid changing too much here (heh - maybe I'm being too timid...).

Cheers

Mark

#18 Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Mark Kirkwood (#15)
Re: Re: patch review : Add ability to constrain backend temporary file space

2011/6/3 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:

On 02/06/11 18:34, Jaime Casanova wrote:

- the patch adds this to serial_schedule but no test has been added...

diff --git a/src/test/regress/serial_schedule
b/src/test/regress/serial_schedule
index bb654f9..325cb3d 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -127,3 +127,4 @@ test: largeobject
 test: with
 test: xml
 test: stats
+test: resource

Corrected v4 patch with the test files, for completeness. Note that
discussion has moved on and there will be a v5 :-)

Mark, can you submit your updated patch?

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#19 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Cédric Villemain (#18)
Re: Re: patch review : Add ability to constrain backend temporary file space

On 15/06/11 02:52, Cédric Villemain wrote:

2011/6/3 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:

Corrected v4 patch with the test files, for completeness. Note that
discussion has moved on and there will be a v5 :-)

Mark, can you submit your updated patch?

Thanks for the reminder! Here is v5. The changes are:

- the GUC is called temp_file_limit, which seems like the best choice to
date :-)
- removed the code to do with truncating files, as after testing I agree
with you that temp work files don't seem to get truncated.

I have not done anything about the business of units - so we are in KB
still - there is no MB unit available in the code as yet - I'm not sure
we need one at this point, as most folk who use this feature will find
4096GB a big enough *per backend* limit. Obviously it might be a good
investment to plan to have MB and GB as possible GUC units too. Maybe
this could be a separate piece of work, since any *other* resource
limiters we add might find it convenient to have these available?

Cheers

Mark

Attachments:

temp-files-v5.patch.gz (application/x-gzip)

#20 Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Mark Kirkwood (#19)
Re: Re: patch review : Add ability to constrain backend temporary file space

2011/6/15 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:

On 15/06/11 02:52, Cédric Villemain wrote:

2011/6/3 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:

Corrected v4 patch with the test files, for completeness. Note that
discussion has moved on and there will be a v5 :-)

Mark, can you submit your updated patch ?

Thanks for the reminder! Here is v5. The changes are:

I have issues applying it.
Please can you remove the trailing whitespace?
Also, you can generate a nicer patch like this:

copy git-external-diff from postgres/src/tools to /usr/lib/git-core/
chmod +x it
export GIT_EXTERNAL_DIFF=git-external-diff
git format-patch --ext-diff origin

my log:
$ git apply temp-files-v5.patch
temp-files-v5.patch:22: trailing whitespace.
defaults to unlimited (<literal>-1</>). Values larger than zero
temp-files-v5.patch:23: trailing whitespace.
constraint the temporary file space usage to be that number of
temp-files-v5.patch:28: trailing whitespace.
the total space used by all the files produced by one backend is
temp-files-v5.patch:35: trailing whitespace.
constrain disk space used for temporary table storage. However if
temp-files-v5.patch:105: trailing whitespace.
/*
error: patch failed: src/backend/storage/file/fd.c:132
error: src/backend/storage/file/fd.c: patch does not apply
error: src/test/regress/expected/resource.out: No such file or directory
error: src/test/regress/sql/resource.sql: No such file or directory

- the GUC is called temp_file_limit, which seems like the best choice to date
:-)
- removed the code to do with truncating files, as after testing I agree with
you that temp work files don't seem to get truncated.

I have not done anything about the business of units - so we are in KB still
- there is no MB unit available in the code as yet - I'm not sure we need one
at this point, as most folk who use this feature will find 4096GB a big
enough *per backend* limit. Obviously it might be a good investment to plan
to have MB and GB as possible GUC units too. Maybe this could be a separate
piece of work, since any *other* resource limiters we add might find it
convenient to have these available?

Cheers

Mark

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#21 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Cédric Villemain (#20)
#22 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#21)
#23 Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Mark Kirkwood (#22)
#24 Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Cédric Villemain (#23)
#25 Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#24)
#26 Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#25)
#27 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Cédric Villemain (#26)
#28 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#27)
#29 Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Mark Kirkwood (#28)
#30 Josh Berkus
josh@agliodbs.com
In reply to: Cédric Villemain (#29)
#31 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Josh Berkus (#30)
#32 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Tatsuo Ishii (#31)
#33 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Mark Kirkwood (#32)
#34 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Tatsuo Ishii (#33)
#35 Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Mark Kirkwood (#34)
#36 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Cédric Villemain (#35)
#37 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Cédric Villemain (#35)
#38 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Kirkwood (#28)
#39 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#37)
#40 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Kirkwood (#28)
#41 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#39)
#42 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#41)
#43 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#42)
#44 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Tom Lane (#40)
#45 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#44)