Correct the documentation for work_mem
Hi,
I recently noticed the following in the work_mem [1] documentation:
“Note that for a complex query, several sort or hash operations might be running in parallel;”
The use of “parallel” here is misleading as this has nothing to do with parallel query, but
rather several operations in a plan running simultaneously.
The use of parallel in this doc predates parallel query support, which explains the usage.
I suggest a small doc fix:
“Note that for a complex query, several sort or hash operations might be running simultaneously;”
This should also be backpatched to the docs of all supported versions.
Thoughts?
Regards,
Sami Imseih
Amazon Web Services (AWS)
1. https://www.postgresql.org/docs/current/runtime-config-resource.html
On 21.04.23 16:28, Imseih (AWS), Sami wrote:
I recently noticed the following in the work_mem [1] documentation:
“Note that for a complex query, several sort or hash operations might be
running in parallel;”
The use of “parallel” here is misleading as this has nothing to do with
parallel query, but rather several operations in a plan running simultaneously.
The use of parallel in this doc predates parallel query support, which
explains the usage.
I suggest a small doc fix:
“Note that for a complex query, several sort or hash operations might be
running simultaneously;”
Here is a discussion of these terms:
https://takuti.me/note/parallel-vs-concurrent/
I think "concurrently" is the correct word here.
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 21.04.23 16:28, Imseih (AWS), Sami wrote:
I suggest a small doc fix:
“Note that for a complex query, several sort or hash operations might be
running simultaneously;”
Here is a discussion of these terms:
https://takuti.me/note/parallel-vs-concurrent/
I think "concurrently" is the correct word here.
Probably, but it'd do little to remove the confusion Sami is on about,
especially since the next sentence uses "concurrently" to describe the
other case. I think we need a more thorough rewording, perhaps like
- Note that for a complex query, several sort or hash operations might be
- running in parallel; each operation will generally be allowed
+ Note that a complex query may include several sort or hash
+ operations; each such operation will generally be allowed
to use as much memory as this value specifies before it starts
to write data into temporary files. Also, several running
sessions could be doing such operations concurrently.
I also find this wording a bit further down to be poor:
Hash-based operations are generally more sensitive to memory
availability than equivalent sort-based operations. The
memory available for hash tables is computed by multiplying
<varname>work_mem</varname> by
<varname>hash_mem_multiplier</varname>. This makes it
I think "available" is not le mot juste, and it's also unclear from
this whether we're speaking of the per-hash-table limit or some
(nonexistent) overall limit. How about
- memory available for hash tables is computed by multiplying
+ memory limit for a hash table is computed by multiplying
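The per-hash-table limit described above can be sketched as a one-line computation. This is only an illustration of the documented formula (work_mem multiplied by hash_mem_multiplier); the function name and the multiplier value of 2.0 are assumptions made for the example, and the server's real accounting may round or clamp differently.

```python
def hash_table_memory_limit_kb(work_mem_kb: int, hash_mem_multiplier: float) -> int:
    """Hypothetical helper: limit for ONE hash table, in kilobytes.

    Mirrors the documented rule: work_mem * hash_mem_multiplier.
    It is a per-hash-table limit, not an overall cap for the query.
    """
    return int(work_mem_kb * hash_mem_multiplier)


# work_mem of 4MB (the documented default), expressed in kB.
work_mem_kb = 4 * 1024

# The multiplier here is an illustrative value, not a claim about any release.
limit_kb = hash_table_memory_limit_kb(work_mem_kb, 2.0)
print(f"per-hash-table limit: {limit_kb} kB")
```

Each hash table in a plan gets its own limit of this size, which is why "memory limit for a hash table" reads more accurately than "memory available for hash tables".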
regards, tom lane
On Fri, Apr 21, 2023 at 10:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 21.04.23 16:28, Imseih (AWS), Sami wrote:
I suggest a small doc fix:
“Note that for a complex query, several sort or hash operations might be
running simultaneously;”
Here is a discussion of these terms:
https://takuti.me/note/parallel-vs-concurrent/
I think "concurrently" is the correct word here.
Probably, but it'd do little to remove the confusion Sami is on about,
+1.
When discussing this internally, Sami's proposal was in fact to use
the word 'concurrently'. But given that when it comes to computers and
programming, it's common for someone to not understand the intricate
difference between the two terms, we thought it's best to not use any
of those, and instead use a word not usually associated with
programming and algorithms.
Aside: Another pair of words I see regularly used interchangeably,
when in fact they mean different things: precise vs. accurate.
especially since the next sentence uses "concurrently" to describe the
other case. I think we need a more thorough rewording, perhaps like
- Note that for a complex query, several sort or hash operations might be
- running in parallel; each operation will generally be allowed
+ Note that a complex query may include several sort or hash
+ operations; each such operation will generally be allowed
This wording doesn't seem to bring out the fact that there could be
more than one work_mem consumer running (in-progress) at the same
time. The reader could mistake it to mean that hashes and sorts in a
complex query may happen one after the other.
+ Note that a complex query may include several sort and hash operations, and
+ more than one of these operations may be in progress simultaneously at any
+ given time; each such operation will generally be allowed
I believe the phrase "several sort _and_ hash" better describes the
possible composition of a complex query, than does "several sort _or_
hash".
I also find this wording a bit further down to be poor:
Hash-based operations are generally more sensitive to memory
availability than equivalent sort-based operations. The
memory available for hash tables is computed by multiplying
<varname>work_mem</varname> by
<varname>hash_mem_multiplier</varname>. This makes it
I think "available" is not le mot juste, and it's also unclear from
this whether we're speaking of the per-hash-table limit or some
(nonexistent) overall limit. How about
- memory available for hash tables is computed by multiplying
+ memory limit for a hash table is computed by multiplying
+1
Best regards,
Gurjeet https://Gurje.et
Postgres Contributors Team, http://aws.amazon.com
especially since the next sentence uses "concurrently" to describe the
other case. I think we need a more thorough rewording, perhaps like
- Note that for a complex query, several sort or hash operations might be
- running in parallel; each operation will generally be allowed
+ Note that a complex query may include several sort or hash
+ operations; each such operation will generally be allowed
This wording doesn't seem to bring out the fact that there could be
more than one work_mem consumer running (in-progress) at the same
time.
Do you mean, more than one work_mem consumer running at the same
time for a given query? If so, that is precisely the point we need to convey
in the docs.
i.e. if I have 2 sorts in a query that can use up to 4MB each, at some point
in the query execution, I can have 8MB of memory allocated.
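As a rough sketch of that arithmetic: the helper below multiplies the number of operations in progress at once by work_mem, and optionally by the number of sessions doing the same thing. The function name and the session count are hypothetical, and the result is only an upper-bound estimate under the assumption that every operation uses its full allowance.

```python
def worst_case_work_mem_kb(concurrent_ops: int, work_mem_kb: int, sessions: int = 1) -> int:
    """Hypothetical estimate of peak work_mem-governed memory, in kB.

    concurrent_ops: sort/hash operations in progress at the same time
                    within one query execution.
    sessions:       sessions running such a query at the same time.
    Assumes every operation uses its full work_mem allowance.
    """
    return concurrent_ops * work_mem_kb * sessions


work_mem_kb = 4 * 1024  # 4MB

# Two sorts in one query, each allowed up to 4MB.
one_query_kb = worst_case_work_mem_kb(2, work_mem_kb)
print(one_query_kb // 1024, "MB for one query")

# The same query in ten sessions at once (session count is illustrative).
ten_sessions_kb = worst_case_work_mem_kb(2, work_mem_kb, sessions=10)
print(ten_sessions_kb // 1024, "MB across ten sessions")
```

The first figure is the per-query case discussed here; the second corresponds to the separate sentence in the docs about several running sessions.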
Regards,
Sami Imseih
Amazon Web Services (AWS)
Based on the feedback, here is a v1 of the suggested doc changes.
I modified Gurjeet's suggestion slightly to make it clear that a specific
query execution could have operations simultaneously using up to
work_mem.
I also added the small hash table memory limit clarification.
Regards,
Sami Imseih
Amazon Web Services (AWS)
Attachments:
v1-0001-Fix-documentation-for-work_mem.patch (application/octet-stream)
From 2fbbe428c25d7d12ad7d818ef5d00fe7c8085433 Mon Sep 17 00:00:00 2001
From: EC2 Default User <ec2-user@ip-172-31-26-221.ec2.internal>
Date: Mon, 24 Apr 2023 16:04:45 +0000
Subject: [PATCH 1/1] Fix documentation for work_mem
A couple of small documentation fixes to clear
up terminology used in the work_mem documentation.
The removal of the usage of "parallel"
as it does not refer to parallel query in the context
of work_mem. Also, a clarification on the memory used
by hash tables.
Discussion: https://www.postgresql.org/message-id/flat/66590882-F48C-4A25-83E3-73792CF8C51F%40amazon.com
---
doc/src/sgml/config.sgml | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 091a79d4f3..bafda1c53a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1897,8 +1897,9 @@ include_dir 'conf.d'
(such as a sort or hash table) before writing to temporary disk files.
If this value is specified without units, it is taken as kilobytes.
The default value is four megabytes (<literal>4MB</literal>).
- Note that for a complex query, several sort or hash operations might be
- running in parallel; each operation will generally be allowed
+ Note that a complex query may include several sort and hash operations,
+ and more than one of these operations may be in progress simultaneously
+ for a given query execution; each such operation will generally be allowed
to use as much memory as this value specifies before it starts
to write data into temporary files. Also, several running
sessions could be doing such operations concurrently.
@@ -1914,7 +1915,7 @@ include_dir 'conf.d'
<para>
Hash-based operations are generally more sensitive to memory
availability than equivalent sort-based operations. The
- memory available for hash tables is computed by multiplying
+ memory limit for hash tables is computed by multiplying
<varname>work_mem</varname> by
<varname>hash_mem_multiplier</varname>. This makes it
possible for hash-based operations to use an amount of memory
--
2.39.2
On Tue, 25 Apr 2023 at 04:20, Imseih (AWS), Sami <simseih@amazon.com> wrote:
Based on the feedback, here is a v1 of the suggested doc changes.
I modified Gurjeet's suggestion slightly to make it clear that a specific
query execution could have operations simultaneously using up to
work_mem.
- Note that for a complex query, several sort or hash operations might be
- running in parallel; each operation will generally be allowed
+ Note that a complex query may include several sort and hash operations,
+ and more than one of these operations may be in progress simultaneously
+ for a given query execution; each such operation will generally be allowed
to use as much memory as this value specifies before it starts
to write data into temporary files. Also, several running
sessions could be doing such operations concurrently.
I'm wondering about adding "and more than one of these operations may
be in progress simultaneously". Are you talking about concurrent
sessions running other queries which are using work_mem too? If so,
isn't that already covered by the final sentence in the quoted text
above? if not, what is running simultaneously?
I think Tom's suggestion looks fine. I'd maybe change "sort or hash"
to "sort and hash" per the suggestion from Gurjeet above.
David
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed
Hello,
I've reviewed and built the documentation for the updated patch. As it stands right now I think the documentation for this section is quite clear.
I'm wondering about adding "and more than one of these operations may
be in progress simultaneously". Are you talking about concurrent
sessions running other queries which are using work_mem too?
This appears to be referring to the "sort and hash" operations mentioned prior.
If so,
isn't that already covered by the final sentence in the quoted text
above? if not, what is running simultaneously?
I believe the last sentence is referring to another session that is running its own sort and hash operations. So the first section you mention is describing how sort and hash operations can be in execution at the same time for a query, while the second refers to how sessions may overlap in their execution of sort and hash operations if I am understanding this correctly.
I also agree that changing "sort or hash" to "sort and hash" is a better description.
Tristen
Hi,
Sorry for the delay in response and thanks for the feedback!
I've reviewed and built the documentation for the updated patch. As it stands right now I think the documentation for this section is quite clear.
Sorry, I am not understanding. What is clear? The current documentation -or- the proposed documentation in the patch?
I'm wondering about adding "and more than one of these operations may
be in progress simultaneously". Are you talking about concurrent
sessions running other queries which are using work_mem too?
This appears to be referring to the "sort and hash" operations mentioned prior.
Correct, this is not referring to multiple sessions, but a given execution could
have multiple operations that are each using up to work_mem simultaneously.
I also agree that changing "sort or hash" to "sort and hash" is a better description.
That is addressed in the last revision of the patch.
- Note that for a complex query, several sort or hash operations might be
- running in parallel; each operation will generally be allowed
+ Note that a complex query may include several sort and hash operations,
Regards,
Sami
On Fri, Apr 21, 2023 at 01:15:01PM -0400, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 21.04.23 16:28, Imseih (AWS), Sami wrote:
I suggest a small doc fix:
“Note that for a complex query, several sort or hash operations might be
running simultaneously;”
Here is a discussion of these terms:
https://takuti.me/note/parallel-vs-concurrent/
I think "concurrently" is the correct word here.
Probably, but it'd do little to remove the confusion Sami is on about,
especially since the next sentence uses "concurrently" to describe the
other case. I think we need a more thorough rewording, perhaps like
- Note that for a complex query, several sort or hash operations might be
- running in parallel; each operation will generally be allowed
+ Note that a complex query may include several sort or hash
+ operations; each such operation will generally be allowed
to use as much memory as this value specifies before it starts
to write data into temporary files. Also, several running
sessions could be doing such operations concurrently.
I also find this wording a bit further down to be poor:
Hash-based operations are generally more sensitive to memory
availability than equivalent sort-based operations. The
memory available for hash tables is computed by multiplying
<varname>work_mem</varname> by
<varname>hash_mem_multiplier</varname>. This makes it
I think "available" is not le mot juste, and it's also unclear from
this whether we're speaking of the per-hash-table limit or some
(nonexistent) overall limit. How about
- memory available for hash tables is computed by multiplying
+ memory limit for a hash table is computed by multiplying
Adjusted patch attached.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
Attachments:
workmem.diff (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bc1b215db..45d1bb4b7b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1829,9 +1829,10 @@ include_dir 'conf.d'
(such as a sort or hash table) before writing to temporary disk files.
If this value is specified without units, it is taken as kilobytes.
The default value is four megabytes (<literal>4MB</literal>).
- Note that for a complex query, several sort or hash operations might be
- running in parallel; each operation will generally be allowed
- to use as much memory as this value specifies before it starts
+ Note that a complex query might perform several sort or hash
+ operations at the same time, with each operation generally being
+ allowed to use as much memory as this value specifies before
+ it starts
to write data into temporary files. Also, several running
sessions could be doing such operations concurrently.
Therefore, the total memory used could be many times the value
@@ -1845,7 +1846,7 @@ include_dir 'conf.d'
<para>
Hash-based operations are generally more sensitive to memory
availability than equivalent sort-based operations. The
- memory available for hash tables is computed by multiplying
+ memory limit for a hash table is computed by multiplying
<varname>work_mem</varname> by
<varname>hash_mem_multiplier</varname>. This makes it
possible for hash-based operations to use an amount of memory
On Fri, 8 Sept 2023 at 15:24, Bruce Momjian <bruce@momjian.us> wrote:
Adjusted patch attached.
This looks mostly fine to me modulo "sort or hash". I do see many
instances of "and/or" in the docs. Maybe that would work better.
David
This looks mostly fine to me modulo "sort or hash". I do see many
instances of "and/or" in the docs. Maybe that would work better.
"sort or hash operations at the same time" is clear explanation IMO.
This latest version of the patch looks good to me.
Regards,
Sami
On Sat, 9 Sept 2023 at 14:25, Imseih (AWS), Sami <simseih@amazon.com> wrote:
This looks mostly fine to me modulo "sort or hash". I do see many
instances of "and/or" in the docs. Maybe that would work better.
"sort or hash operations at the same time" is clear explanation IMO.
Just for anyone else following along that haven't seen the patch. The
full text in question is:
+ Note that a complex query might perform several sort or hash
+ operations at the same time, with each operation generally being
It's certainly not a show-stopper. I do believe the patch makes some
improvements. The reason I'd prefer to see either "and" or "and/or"
in place of "or" is because the text is trying to imply that many of
these operations can run at the same time. I'm struggling to
understand why, given that there could be many sorts and many hashes
going on at once that we'd claim it could only be one *or* the other.
If we have 12 sorts and 4 hashes then that's not "several sort or hash
operations", it's "several sort and hash operations". Of course, it
could just be sorts or just hashes, so "and/or" works fine for that.
David
On Mon, Sep 11, 2023 at 10:02:55PM +1200, David Rowley wrote:
On Sat, 9 Sept 2023 at 14:25, Imseih (AWS), Sami <simseih@amazon.com> wrote:
This looks mostly fine to me modulo "sort or hash". I do see many
instances of "and/or" in the docs. Maybe that would work better.
"sort or hash operations at the same time" is clear explanation IMO.
Just for anyone else following along that haven't seen the patch. The
full text in question is:
+ Note that a complex query might perform several sort or hash
+ operations at the same time, with each operation generally being
It's certainly not a show-stopper. I do believe the patch makes some
improvements. The reason I'd prefer to see either "and" or "and/or"
in place of "or" is because the text is trying to imply that many of
these operations can run at the same time. I'm struggling to
understand why, given that there could be many sorts and many hashes
going on at once that we'd claim it could only be one *or* the other.
If we have 12 sorts and 4 hashes then that's not "several sort or hash
operations", it's "several sort and hash operations". Of course, it
could just be sorts or just hashes, so "and/or" works fine for that.
Yes, I see your point and went with "and", updated patch attached.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
Attachments:
workmem.diff (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bc1b215db..8ed7ae57c2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1829,9 +1829,10 @@ include_dir 'conf.d'
(such as a sort or hash table) before writing to temporary disk files.
If this value is specified without units, it is taken as kilobytes.
The default value is four megabytes (<literal>4MB</literal>).
- Note that for a complex query, several sort or hash operations might be
- running in parallel; each operation will generally be allowed
- to use as much memory as this value specifies before it starts
+ Note that a complex query might perform several sort and hash
+ operations at the same time, with each operation generally being
+ allowed to use as much memory as this value specifies before
+ it starts
to write data into temporary files. Also, several running
sessions could be doing such operations concurrently.
Therefore, the total memory used could be many times the value
@@ -1845,7 +1846,7 @@ include_dir 'conf.d'
<para>
Hash-based operations are generally more sensitive to memory
availability than equivalent sort-based operations. The
- memory available for hash tables is computed by multiplying
+ memory limit for a hash table is computed by multiplying
<varname>work_mem</varname> by
<varname>hash_mem_multiplier</varname>. This makes it
possible for hash-based operations to use an amount of memory
On Tue, 12 Sept 2023 at 03:03, Bruce Momjian <bruce@momjian.us> wrote:
On Mon, Sep 11, 2023 at 10:02:55PM +1200, David Rowley wrote:
It's certainly not a show-stopper. I do believe the patch makes some
improvements. The reason I'd prefer to see either "and" or "and/or"
in place of "or" is because the text is trying to imply that many of
these operations can run at the same time. I'm struggling to
understand why, given that there could be many sorts and many hashes
going on at once that we'd claim it could only be one *or* the other.
If we have 12 sorts and 4 hashes then that's not "several sort or hash
operations", it's "several sort and hash operations". Of course, it
could just be sorts or just hashes, so "and/or" works fine for that.
Yes, I see your point and went with "and", updated patch attached.
Looks good to me.
David
On Wed, Sep 27, 2023 at 02:05:44AM +1300, David Rowley wrote:
On Tue, 12 Sept 2023 at 03:03, Bruce Momjian <bruce@momjian.us> wrote:
On Mon, Sep 11, 2023 at 10:02:55PM +1200, David Rowley wrote:
It's certainly not a show-stopper. I do believe the patch makes some
improvements. The reason I'd prefer to see either "and" or "and/or"
in place of "or" is because the text is trying to imply that many of
these operations can run at the same time. I'm struggling to
understand why, given that there could be many sorts and many hashes
going on at once that we'd claim it could only be one *or* the other.
If we have 12 sorts and 4 hashes then that's not "several sort or hash
operations", it's "several sort and hash operations". Of course, it
could just be sorts or just hashes, so "and/or" works fine for that.
Yes, I see your point and went with "and", updated patch attached.
Looks good to me.
Patch applied back to Postgres 11.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.