Atomics for heap_parallelscan_nextpage()

Started by David Rowleyalmost 9 years ago32 messageshackers
Jump to latest
#1David Rowley
dgrowleyml@gmail.com

Hi,

A while back I did some benchmarking on a big 4 socket machine to
explore a bit around the outer limits of parallel aggregates. I
discovered along the way that, given enough workers, and a simple
enough task, that seq-scan workers were held up waiting for the lock
to be released in heap_parallelscan_nextpage().

I've since done a little work in this area to try to improve things. I
ended up posting about it yesterday in [1]/messages/by-id/CAKJS1f-XhfQ2-=85wgYo5b3WtEs=ys=2Rsq=NuvnmaV4ZsM1XQ@mail.gmail.com. My original patch used
batching to solve the issue; instead of allocating 1 block at a time,
the batching patch allocated a range of 10 blocks for the worker to
process. However, the implementation still needed a bit of work around
reporting sync-scan locations.

Andres mentioned in [2]/messages/by-id/20170505023646.3uhnmf2hbwtm63lc@alap3.anarazel.de that it might be worth exploring using atomics
to do the same job. So I went ahead and did that, and came up with the
attached, which is a slight variation on what he mentioned in the
thread.

To keep things a bit more simple, and streamline, I ended up pulling
out the logic for setting the startblock into another function, which
we only call once before the first call to
heap_parallelscan_nextpage(). I also ended up changing phs_cblock and
replacing it with a counter that always starts at zero. The actual
block is calculated based on that + the startblock modulo nblocks.
This makes things a good bit more simple for detecting when we've
allocated all the blocks to the workers, and also works nicely when
wrapping back to the start of a relation when we started somewhere in
the middle due to piggybacking with a synchronous scan.

Performance:

With parallel_workers=71, it looks something like:

Query 1: 881 GB, ~6 billion row TPC-H lineitem table.

tpch=# select count(*) from lineitem;
count
------------
5999989709
(1 row)

-- Master
Time: 123421.283 ms (02:03.421)
Time: 118895.846 ms (01:58.896)
Time: 118632.546 ms (01:58.633)

-- Atomics patch
Time: 74038.813 ms (01:14.039)
Time: 73166.200 ms (01:13.166)
Time: 72492.338 ms (01:12.492)

-- Batching Patch: Batching 10 pages at a time in heap_parallelscan_nextpage()
Time: 76364.215 ms (01:16.364)
Time: 75808.900 ms (01:15.809)
Time: 74927.756 ms (01:14.928)

Query 2: Single int column table with 2 billion rows.

tpch=# select count(*) from a;
count
------------
2000000000
(1 row)

-- Master
Time: 5853.918 ms (00:05.854)
Time: 5925.633 ms (00:05.926)
Time: 5859.223 ms (00:05.859)

-- Atomics patch
Time: 5825.745 ms (00:05.826)
Time: 5849.139 ms (00:05.849)
Time: 5815.818 ms (00:05.816)

-- Batching Patch: Batching 10 pages at a time in heap_parallelscan_nextpage()
Time: 5789.237 ms (00:05.789)
Time: 5837.395 ms (00:05.837)
Time: 5821.492 ms (00:05.821)

I've also attached a text file with the perf report for the lineitem
query. You'll notice that the heap_parallelscan_nextpage() is very
visible in master, but not on each of the two patches.

With the 2nd query, heap_parallelscan_nextpage is fairly insignificant
on master's profile, it's only showing up as 0.48%. Likely this must
be due to more tuples being read from the page, and more aggregation
work getting done before the next page is needed. I'm uncertain why I
previously saw a speed up in this case in [1]/messages/by-id/CAKJS1f-XhfQ2-=85wgYo5b3WtEs=ys=2Rsq=NuvnmaV4ZsM1XQ@mail.gmail.com.

I've also noticed that both the atomics patch and unpatched master do
something that looks a bit weird with synchronous seq-scans. If the
parallel seq-scan piggybacked on another scan, then subsequent
parallel scans will start at the same non-zero block location, even
when no other concurrent scans exist. I'd have expected this should go
back to block 0 again, but maybe I'm just failing to understand the
reason for reporting the startblock to ss_report_location() at the end
of the scan.

I'll now add this to the first commitfest of pg11. I just wanted to
note that I've done this, so that it's less likely someone else goes
and repeats the same work.

[1]: /messages/by-id/CAKJS1f-XhfQ2-=85wgYo5b3WtEs=ys=2Rsq=NuvnmaV4ZsM1XQ@mail.gmail.com

[2]: /messages/by-id/20170505023646.3uhnmf2hbwtm63lc@alap3.anarazel.de

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

parallel_next_page_perf.txttext/plain; charset=UTF-8; name=parallel_next_page_perf.txtDownload
parallel_nextpage_atomics.patchapplication/octet-stream; name=parallel_nextpage_atomics.patchDownload+62-44
#2Amit Kapila
amit.kapila16@gmail.com
In reply to: David Rowley (#1)
Re: Atomics for heap_parallelscan_nextpage()

On Sat, May 6, 2017 at 7:27 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

I've also noticed that both the atomics patch and unpatched master do
something that looks a bit weird with synchronous seq-scans. If the
parallel seq-scan piggybacked on another scan, then subsequent
parallel scans will start at the same non-zero block location, even
when no other concurrent scans exist.

That is what we expect. The same is true for non-parallel scans as
well, refer below code for non-parallel scans.

heapgettup_pagemode()
{
..

finished = (page == scan->rs_startblock) ||
(scan->rs_numblocks != InvalidBlockNumber ? --scan->rs_numblocks == 0 : false);

/*
* Report our new scan position for synchronization purposes. We
* don't do that when moving backwards, however. That would just
* mess up any other forward-moving scanners.
*
* Note: we do this before checking for end of scan so that the
* final state of the position hint is back at the start of the
* rel. That's not strictly necessary, but otherwise when you run
* the same query multiple times the starting position would shift
* a little bit backwards on every invocation, which is confusing.
* We don't guarantee any specific ordering in general, though.
*/
if (scan->rs_syncscan)
ss_report_location(scan->rs_rd, page);
..
}

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: David Rowley (#1)
Re: Atomics for heap_parallelscan_nextpage()

On 05/06/2017 04:57 PM, David Rowley wrote:

Andres mentioned in [2] that it might be worth exploring using atomics
to do the same job. So I went ahead and did that, and came up with the
attached, which is a slight variation on what he mentioned in the
thread.

To keep things a bit more simple, and streamline, I ended up pulling
out the logic for setting the startblock into another function, which
we only call once before the first call to
heap_parallelscan_nextpage(). I also ended up changing phs_cblock and
replacing it with a counter that always starts at zero. The actual
block is calculated based on that + the startblock modulo nblocks.
This makes things a good bit more simple for detecting when we've
allocated all the blocks to the workers, and also works nicely when
wrapping back to the start of a relation when we started somewhere in
the middle due to piggybacking with a synchronous scan.

Looks reasonable. I edited the comments and the variable names a bit, to
my liking, and committed. Thanks!

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#3)
Re: Atomics for heap_parallelscan_nextpage()

On 08/16/2017 04:20 PM, Heikki Linnakangas wrote:

On 05/06/2017 04:57 PM, David Rowley wrote:

Andres mentioned in [2] that it might be worth exploring using
atomics to do the same job. So I went ahead and did that, and came
up with the attached, which is a slight variation on what he
mentioned in the thread.

To keep things a bit more simple, and streamline, I ended up
pulling out the logic for setting the startblock into another
function, which we only call once before the first call to
heap_parallelscan_nextpage(). I also ended up changing phs_cblock
and replacing it with a counter that always starts at zero. The
actual block is calculated based on that + the startblock modulo
nblocks. This makes things a good bit more simple for detecting
when we've allocated all the blocks to the workers, and also works
nicely when wrapping back to the start of a relation when we
started somewhere in the middle due to piggybacking with a
synchronous scan.

Looks reasonable. I edited the comments and the variable names a bit,
to my liking, and committed. Thanks!

A couple of 32-bit x86 buildfarm members don't seem to be happy with
this. I'll investigate, but if anyone has a clue, I'm all ears...

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#4)
Re: Atomics for heap_parallelscan_nextpage()

Heikki Linnakangas <hlinnaka@iki.fi> writes:

A couple of 32-bit x86 buildfarm members don't seem to be happy with
this. I'll investigate, but if anyone has a clue, I'm all ears...

dromedary's issue seems to be alignment:

TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: "../../../../src/include/port/atomics.h", Line: 452)
2017-08-16 11:11:38.558 EDT [75693:3] LOG: server process (PID 76277) was terminated by signal 6: Abort trap
2017-08-16 11:11:38.558 EDT [75693:4] DETAIL: Failed process was running: select count(*) from a_star;

Not sure if this is your bug or if it's exposing a pre-existing
deficiency in the atomics code, viz, failure to ensure that
pg_atomic_uint64 is actually a 64-bit-aligned type. Andres?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#5)
Re: Atomics for heap_parallelscan_nextpage()

On 2017-08-16 11:16:58 -0400, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

A couple of 32-bit x86 buildfarm members don't seem to be happy with
this. I'll investigate, but if anyone has a clue, I'm all ears...

dromedary's issue seems to be alignment:

TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: "../../../../src/include/port/atomics.h", Line: 452)
2017-08-16 11:11:38.558 EDT [75693:3] LOG: server process (PID 76277) was terminated by signal 6: Abort trap
2017-08-16 11:11:38.558 EDT [75693:4] DETAIL: Failed process was running: select count(*) from a_star;

Not sure if this is your bug or if it's exposing a pre-existing
deficiency in the atomics code, viz, failure to ensure that
pg_atomic_uint64 is actually a 64-bit-aligned type. Andres?

I suspect it's the former. Suspect that the shared memory that holds
the "parallel desc" isn't properly aligned:

void
ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
...
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
heap_parallelscan_initialize(pscan,
node->ss.ss_currentRelation,
estate->es_snapshot);

/*
...
* We allocate backwards from the end of the segment, so that the TOC entries
* can grow forward from the start of the segment.
*/
extern void *
shm_toc_allocate(shm_toc *toc, Size nbytes)
{
volatile shm_toc *vtoc = toc;
Size total_bytes;
Size allocated_bytes;
Size nentry;
Size toc_bytes;

/* Make sure request is well-aligned. */
nbytes = BUFFERALIGN(nbytes);
...
return ((char *) toc) + (total_bytes - allocated_bytes - nbytes);
}

/*
* Initialize a region of shared memory with a table of contents.
*/
shm_toc *
shm_toc_create(uint64 magic, void *address, Size nbytes)
{
shm_toc *toc = (shm_toc *) address;

Assert(nbytes > offsetof(shm_toc, toc_entry));
toc->toc_magic = magic;
SpinLockInit(&toc->toc_mutex);
toc->toc_total_bytes = nbytes;
toc->toc_allocated_bytes = 0;
toc->toc_nentry = 0;

return toc;
}

Afaict shm_create/shm_toc_allocate don't actually guarantee that the end
of the toc's memory is suitably aligned. But I didn't yet have any
coffee, so ...

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Andres Freund (#6)
Re: Atomics for heap_parallelscan_nextpage()

On Wed, Aug 16, 2017 at 9:55 AM, Andres Freund <andres@anarazel.de> wrote:

On 2017-08-16 11:16:58 -0400, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

A couple of 32-bit x86 buildfarm members don't seem to be happy with
this. I'll investigate, but if anyone has a clue, I'm all ears...

dromedary's issue seems to be alignment:

TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: "../../../../src/include/port/atomics.h", Line: 452)
2017-08-16 11:11:38.558 EDT [75693:3] LOG: server process (PID 76277) was terminated by signal 6: Abort trap
2017-08-16 11:11:38.558 EDT [75693:4] DETAIL: Failed process was running: select count(*) from a_star;

Not sure if this is your bug or if it's exposing a pre-existing
deficiency in the atomics code, viz, failure to ensure that
pg_atomic_uint64 is actually a 64-bit-aligned type. Andres?

I suspect it's the former. Suspect that the shared memory that holds
the "parallel desc" isn't properly aligned:

Clang has an -fsanitize=alignment option that may be useful here.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#6)
Re: Atomics for heap_parallelscan_nextpage()

Robert, Tom,

On 2017-08-16 09:55:15 -0700, Andres Freund wrote:

Not sure if this is your bug or if it's exposing a pre-existing
deficiency in the atomics code, viz, failure to ensure that
pg_atomic_uint64 is actually a 64-bit-aligned type. Andres?

I suspect it's the former. Suspect that the shared memory that holds
the "parallel desc" isn't properly aligned:

Or, well, a mixture of both, because it seems like a deficiency in the
shm_toc, code, rather than the atomics code.

Afaict shm_create/shm_toc_allocate don't actually guarantee that the end
of the toc's memory is suitably aligned. But I didn't yet have any
coffee, so ...

Robert, I'm not quite sure what the intended behaviour of shm_toc is wrt
alignment. I see that individual chunks are BUFFERALIGNed (both during
estimation, and allocation). But I don't see how the size of the entire
toc is aligned, which seems a requirement, given we allocate from the
end.
Seems like we'd just have to add alignment of the total size to
shm_toc_estimate()?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#7)
Re: Atomics for heap_parallelscan_nextpage()

On 2017-08-16 09:57:35 -0700, Peter Geoghegan wrote:

On Wed, Aug 16, 2017 at 9:55 AM, Andres Freund <andres@anarazel.de> wrote:

On 2017-08-16 11:16:58 -0400, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

A couple of 32-bit x86 buildfarm members don't seem to be happy with
this. I'll investigate, but if anyone has a clue, I'm all ears...

dromedary's issue seems to be alignment:

TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: "../../../../src/include/port/atomics.h", Line: 452)
2017-08-16 11:11:38.558 EDT [75693:3] LOG: server process (PID 76277) was terminated by signal 6: Abort trap
2017-08-16 11:11:38.558 EDT [75693:4] DETAIL: Failed process was running: select count(*) from a_star;

Not sure if this is your bug or if it's exposing a pre-existing
deficiency in the atomics code, viz, failure to ensure that
pg_atomic_uint64 is actually a 64-bit-aligned type. Andres?

I suspect it's the former. Suspect that the shared memory that holds
the "parallel desc" isn't properly aligned:

Clang has an -fsanitize=alignment option that may be useful here.

Given that we're crashing in an assert that does nothing but check for
alignment, before the memory is actually used in a "dangerous" manner, I
don't quite see how?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#8)
Re: Atomics for heap_parallelscan_nextpage()

On 08/16/2017 08:10 PM, Andres Freund wrote:

Afaict shm_create/shm_toc_allocate don't actually guarantee that the end
of the toc's memory is suitably aligned. But I didn't yet have any
coffee, so ...

Robert, I'm not quite sure what the intended behaviour of shm_toc is wrt
alignment. I see that individual chunks are BUFFERALIGNed (both during
estimation, and allocation). But I don't see how the size of the entire
toc is aligned, which seems a requirement, given we allocate from the
end.
Seems like we'd just have to add alignment of the total size to
shm_toc_estimate()?

Yeah, that's the gist of it.

The attached patch seems to fix this. I'm not too familiar with this DSM
stuff, but this seems right to me. Unless someone has a better idea
soon, I'll commit this to make the buildfarm happy.

- Heikki

Attachments:

fix-shm_toc-alignment-1.patchtext/x-patch; name=fix-shm_toc-alignment-1.patchDownload+6-4
#11Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#10)
Re: Atomics for heap_parallelscan_nextpage()

On Wed, Aug 16, 2017 at 1:19 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Yeah, that's the gist of it.

The attached patch seems to fix this. I'm not too familiar with this DSM
stuff, but this seems right to me. Unless someone has a better idea soon,
I'll commit this to make the buildfarm happy.

Mmm, clever. Yeah, that looks reasonable at first glance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#10)
Re: Atomics for heap_parallelscan_nextpage()

Hi,

On 2017-08-16 20:19:29 +0300, Heikki Linnakangas wrote:

diff --git a/src/backend/storage/ipc/shm_toc.c b/src/backend/storage/ipc/shm_toc.c
index 9f259441f0..121d5a1ec9 100644
--- a/src/backend/storage/ipc/shm_toc.c
+++ b/src/backend/storage/ipc/shm_toc.c
@@ -44,7 +44,7 @@ shm_toc_create(uint64 magic, void *address, Size nbytes)
Assert(nbytes > offsetof(shm_toc, toc_entry));
toc->toc_magic = magic;
SpinLockInit(&toc->toc_mutex);
-	toc->toc_total_bytes = nbytes;
+	toc->toc_total_bytes = BUFFERALIGN_DOWN(nbytes);
toc->toc_allocated_bytes = 0;
toc->toc_nentry = 0;

Don't think we require BUFFERALIGN - MAXALIGN ought to be
sufficient. The use of BUFFERALIGN presumably is to space out things
into different cachelines, but that doesn't really seem to be important
with this. Then we can just avoid defining the new macro...

@@ -252,7 +252,8 @@ shm_toc_lookup(shm_toc *toc, uint64 key, bool noError)
Size
shm_toc_estimate(shm_toc_estimator *e)
{
-	return add_size(offsetof(shm_toc, toc_entry),
-					add_size(mul_size(e->number_of_keys, sizeof(shm_toc_entry)),
-							 e->space_for_chunks));
+	return BUFFERALIGN(
+		add_size(offsetof(shm_toc, toc_entry),
+				 add_size(mul_size(e->number_of_keys, sizeof(shm_toc_entry)),
+						  e->space_for_chunks)));
}

I think splitting this into separate statements would be better. I think
we also should rather *ALLIGN the size without e->space_for_chunks. The
latter is already aligned properly, and the important bit is that we
have enough space for e->space_for_chunks afterwards, and only padding
is in th eway of that...

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#10)
Re: Atomics for heap_parallelscan_nextpage()

Heikki Linnakangas <hlinnaka@iki.fi> writes:

On 08/16/2017 08:10 PM, Andres Freund wrote:

Seems like we'd just have to add alignment of the total size to
shm_toc_estimate()?

Yeah, that's the gist of it.

I can confirm that on dromedary, that regression test case is attempting
to create a TOC with a not-well-aligned size: 93268 = 0x16c54 bytes.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#13)
Re: Atomics for heap_parallelscan_nextpage()

I wrote:

I can confirm that on dromedary, that regression test case is attempting
to create a TOC with a not-well-aligned size: 93268 = 0x16c54 bytes.

... although, on closer look, it still seems like we have a fundamental
bit of schizophrenia here, because on this machine

$ grep ALIGN pg_config.h
#define ALIGNOF_DOUBLE 4
#define ALIGNOF_INT 4
#define ALIGNOF_LONG 4
#define ALIGNOF_LONG_LONG_INT 4
#define ALIGNOF_SHORT 2
#define MAXIMUM_ALIGNOF 4

Basically, therefore, ISTM that it is not a good thing that the atomics
code thinks it can rely on 8-byte-aligned data when the entire rest of
the system believes that 4-byte alignment is enough for anything.

I was wondering why the shm_toc code was using BUFFERALIGN and not
MAXALIGN, and I now suspect that the answer is "it's an entirely
undocumented kluge to make the atomics code not crash on 32-bit
machines, so long as nobody puts a pg_atomic_uint64 anywhere except
in a shm_toc".

I'm not sure that that's good enough, and I'm damn sure that it
shouldn't be undocumented.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#12)
Re: Atomics for heap_parallelscan_nextpage()

Andres Freund <andres@anarazel.de> writes:

Don't think we require BUFFERALIGN - MAXALIGN ought to be
sufficient.

Uh, see my other message just now.

The use of BUFFERALIGN presumably is to space out things
into different cachelines, but that doesn't really seem to be important
with this. Then we can just avoid defining the new macro...

I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a
different reason: if the caller has specified the exact amount of space it
needs, having shm_toc_create discard some could lead to an unexpected
failure. I wonder whether maybe shm_toc_create should just error out if
the number it's handed isn't aligned already.

+	return BUFFERALIGN(
+		add_size(offsetof(shm_toc, toc_entry),
+				 add_size(mul_size(e->number_of_keys, sizeof(shm_toc_entry)),
+						  e->space_for_chunks)));

I think splitting this into separate statements would be better.

+1, it was too complicated already.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#14)
Re: Atomics for heap_parallelscan_nextpage()

On 2017-08-16 13:40:09 -0400, Tom Lane wrote:

I wrote:

I can confirm that on dromedary, that regression test case is attempting
to create a TOC with a not-well-aligned size: 93268 = 0x16c54 bytes.

... although, on closer look, it still seems like we have a fundamental
bit of schizophrenia here, because on this machine

$ grep ALIGN pg_config.h
#define ALIGNOF_DOUBLE 4
#define ALIGNOF_INT 4
#define ALIGNOF_LONG 4
#define ALIGNOF_LONG_LONG_INT 4
#define ALIGNOF_SHORT 2
#define MAXIMUM_ALIGNOF 4

Basically, therefore, ISTM that it is not a good thing that the atomics
code thinks it can rely on 8-byte-aligned data when the entire rest of
the system believes that 4-byte alignment is enough for anything.

That's a hardware requirement, we can't do much about it. Several
[micro-]architectures don't support unaligned atomic 8 byte writes.

I was wondering why the shm_toc code was using BUFFERALIGN and not
MAXALIGN, and I now suspect that the answer is "it's an entirely
undocumented kluge to make the atomics code not crash on 32-bit
machines, so long as nobody puts a pg_atomic_uint64 anywhere except in
a shm_toc".

I don't think there were any atomics in affected code until earlier
today... And given it didn't work for shm_toc anyway, I'm not quite
following.

I'm not sure that that's good enough, and I'm damn sure that it
shouldn't be undocumented.

8 byte alignment would be good enough, so BUFFERALIGN ought to be
sufficient. But it'd be nicer to have a separate more descriptive knob.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#14)
Re: Atomics for heap_parallelscan_nextpage()

On Wed, Aug 16, 2017 at 1:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I was wondering why the shm_toc code was using BUFFERALIGN and not
MAXALIGN, and I now suspect that the answer is "it's an entirely
undocumented kluge to make the atomics code not crash on 32-bit
machines, so long as nobody puts a pg_atomic_uint64 anywhere except
in a shm_toc".

Well, shm_toc considerably predates 64-bit atomics, so I think the
causality cannot run in that direction. shm_toc.c first appeared in
the tree in January of 2014. src/include/port/atomics didn't show up
until September of that year, and 64-bit atomics weren't actually
usable in practice until e8fdbd58fe564a29977f4331cd26f9697d76fc40 in
April of 2017.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#15)
Re: Atomics for heap_parallelscan_nextpage()

On 2017-08-16 13:44:28 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Don't think we require BUFFERALIGN - MAXALIGN ought to be
sufficient.

Uh, see my other message just now.

Yup, you're right.

The use of BUFFERALIGN presumably is to space out things
into different cachelines, but that doesn't really seem to be important
with this. Then we can just avoid defining the new macro...

I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a
different reason: if the caller has specified the exact amount of space it
needs, having shm_toc_create discard some could lead to an unexpected
failure.

Well, that's why shm_toc_estimate() increases the size appropriately.

I wonder whether maybe shm_toc_create should just error out if the
number it's handed isn't aligned already.

I think that's going to be harder atm, because it's not the size shm_toc
computes, it's what the caller to shm_toc_estimate_chunk() provides.
And that size is already guaranteed to be upsized by BUFFERALIGN in
shm_toc_estimate_chunk(). It's just that the base-offset from where the
allocations start isn't aligned.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#15)
Re: Atomics for heap_parallelscan_nextpage()

On Wed, Aug 16, 2017 at 1:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a
different reason: if the caller has specified the exact amount of space it
needs, having shm_toc_create discard some could lead to an unexpected
failure.

Well, that's why Heikki also patched shm_toc_estimate. With the
patch, if the size being used in shm_toc_create comes from
shm_toc_estimate, it will always be aligned and nothing bad will
happen. If the user invents another size out of whole cloth, then
they might get a few bytes less than they expect, but that's what you
get for not using shm_toc_estimate().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#17)
Re: Atomics for heap_parallelscan_nextpage()

On August 16, 2017 10:47:23 AM PDT, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 16, 2017 at 1:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I was wondering why the shm_toc code was using BUFFERALIGN and not
MAXALIGN, and I now suspect that the answer is "it's an entirely
undocumented kluge to make the atomics code not crash on 32-bit
machines, so long as nobody puts a pg_atomic_uint64 anywhere except
in a shm_toc".

Well, shm_toc considerably predates 64-bit atomics, so I think the
causality cannot run in that direction. shm_toc.c first appeared in
the tree in January of 2014. src/include/port/atomics didn't show up
until September of that year, and 64-bit atomics weren't actually
usable in practice until e8fdbd58fe564a29977f4331cd26f9697d76fc40 in
April of 2017.

Well, not for core code. I certainly know about production code using it, because crusty platforms are considered irrelevant...

Independent of that, a comment explaining what the BUFFERALIGN is intending would be good.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#19)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#16)
#23Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#21)
#24Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#22)
#25Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#3)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#23)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#25)
#28Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#27)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#28)
#30Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#30)
#32David Rowley
dgrowleyml@gmail.com
In reply to: Heikki Linnakangas (#3)