Shared Memory: How to use SYSV rather than MMAP ?

Started by REIX, Tony | over 7 years ago | 39 messages | hackers
#1 REIX, Tony
tony.reix@atos.net

Hi,

On AIX, MMAP gives us only 4K pages, whereas SYSV can give us 64K pages, so we'd like to experiment with SYSV rather than MMAP and measure the impact on performance.

Looking at src/include/storage/dsm_impl.h, it seemed to me that replacing the line:

#define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE DSM_IMPL_POSIX
by the line:
#define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE DSM_IMPL_SYSV

was the right thing to do. Plus some changes like:

export LDR_CNTRL=SHMPSIZE=64K

ldedit -btextpsize=64k -bdatapsize=64k -bstackpsize=64k ...../postgres

However, when looking at the details with the procmap tool, it is unclear whether that worked.

Maybe I was misled by the variables:

HAVE_SHM_OPEN, USE_DSM_POSIX, USE_DSM_SYSV, USE_DSM_MMAP

which are all defined.

So, what should I do in order to use SYSV rather than MMAP for shared memory?

(PostgreSQL v11.1)

Thanks/Regards,

Tony Reix

tony.reix@atos.net

ATOS / Bull SAS
ATOS Expert
IBM Coop Architect & Technical Leader
Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net

#2 Thomas Munro
thomas.munro@gmail.com
In reply to: REIX, Tony (#1)
Re: Shared Memory: How to use SYSV rather than MMAP ?

On Tue, Nov 20, 2018 at 11:11 PM REIX, Tony <tony.reix@atos.net> wrote:

On AIX, since with MMAP we have only 4K pages though we can have 64K pages with SYSV, we'd like to experiment with SYSV rather than MMAP and measure the impact to the performance.

Looking at file: src/include/storage/dsm_impl.h , it seemed to me that replacing the line:

#define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE DSM_IMPL_POSIX
by the line:
#define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE DSM_IMPL_SYSV

Hi Tony,

SHOW dynamic_shared_memory_type to see which one it's actually using,
and set it in postgresql.conf to change it.

However, when looking at details by means of procmap tool, it is unclear if that worked or not.

These segments are short-lived ones used for parallel query. I
haven't used AIX recently but I suspect procmap -X will show them as
different types and show the page size, but you'd have to check that
while it's actually running a parallel query. For example, a large
parallel hash join that runs for a while would do it, and in theory
you might be able to see a small performance improvement for larger
page sizes due to better TLB cache hit ratios.

--
Thomas Munro
http://www.enterprisedb.com
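
As a rough illustration of the page-size argument above, the following C sketch counts how many pages, and therefore how many distinct translations the TLB has to cover, are needed for a segment at 4K versus 64K. The helper name and the 1 GB segment size in the usage note are assumptions made for the example, not values from this thread.

```c
#include <stdint.h>

/*
 * Number of pages needed to map `bytes` bytes using pages of
 * `page_size` bytes, rounding up. Hypothetical helper, for
 * illustration only.
 */
static uint64_t
pages_needed(uint64_t bytes, uint64_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```

For a 1 GB segment, pages_needed(1ULL << 30, 4096) is 262144 while pages_needed(1ULL << 30, 65536) is 16384, i.e. 16 times fewer translations for the same memory.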

#3 Robert Haas
robertmhaas@gmail.com
In reply to: REIX, Tony (#1)
Re: Shared Memory: How to use SYSV rather than MMAP ?

On Tue, Nov 20, 2018 at 5:11 AM REIX, Tony <tony.reix@atos.net> wrote:

On AIX, since with MMAP we have only 4K pages though we can have 64K pages with SYSV, we'd like to experiment with SYSV rather than MMAP and measure the impact to the performance.

Are you trying to move the main shared memory segment or the dynamic
shared memory segments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4 REIX, Tony
tony.reix@atos.net
In reply to: Robert Haas (#3)
RE: Shared Memory: How to use SYSV rather than MMAP ?

Hi Robert,

We are trying to understand why pgbench is slower on AIX than on Linux/Power with the same hardware and disks.

So far, we have no idea what the root cause may be or what should be changed.

Changing dynamic_shared_memory_type = sysv seems to help.

Maybe changing the main shared memory segment could also improve performance. However, how can one change this?

Regards,

Tony Reix

#5 Robert Haas
robertmhaas@gmail.com
In reply to: REIX, Tony (#4)
Re: Shared Memory: How to use SYSV rather than MMAP ?

On Tue, Nov 20, 2018 at 8:36 AM REIX, Tony <tony.reix@atos.net> wrote:

We are trying to understand why pgbench on AIX is slower compared to Linux/Power on the same HW/Disks.

So, we have yet no idea about what may be the root cause and what should be changed.

So, changing: dynamic_shared_memory_type = sysv seems to help.

And maybe changing the main shared memory segment could also improve the performance. However, how one can change this?

There's no configuration setting for the main shared memory segment,
but removing #define USE_ANONYMOUS_SHMEM from sysv_shmem.c would
probably do the trick.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
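
A minimal sketch of the kind of compile-time switch described above, assuming a POSIX system. The symbol name is borrowed from the thread, but the function create_shared_region() and its body are hypothetical and much simpler than the real sysv_shmem.c: with the symbol defined the region comes from an anonymous shared mmap(), and with it removed the region comes from shmget()/shmat().

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>

/*
 * Stand-in for the symbol mentioned in the thread; delete this line
 * to take the System V path instead (illustration only).
 */
#define USE_ANONYMOUS_SHMEM

/*
 * Create a shared region of `size` bytes that processes created later
 * by fork() will inherit. Returns NULL on failure.
 */
static void *
create_shared_region(size_t size)
{
#ifdef USE_ANONYMOUS_SHMEM
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    return (p == MAP_FAILED) ? NULL : p;
#else
    int   id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    void *p;

    if (id < 0)
        return NULL;
    p = shmat(id, NULL, 0);
    /* Simplification: mark the segment for removal right away. */
    shmctl(id, IPC_RMID, NULL);
    return (p == (void *) -1) ? NULL : p;
#endif
}
```

Either way the caller just gets a pointer; the difference that matters in this thread is which page sizes the operating system is willing to use behind each kind of mapping.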

#6 REIX, Tony
tony.reix@atos.net
In reply to: Robert Haas (#5)
RE: Shared Memory: How to use SYSV rather than MMAP ?

Hi Robert,

YES! Reading this file, your suggestion should work. Thanks!

I've rebuilt and run the basic tests. We'll relaunch our tests ASAP.

Regards,

Tony Reix

#7 Thomas Munro
thomas.munro@gmail.com
In reply to: REIX, Tony (#6)
Re: Shared Memory: How to use SYSV rather than MMAP ?

On Wed, Nov 21, 2018 at 4:37 AM REIX, Tony <tony.reix@atos.net> wrote:

YES ! Reading this file, your suggestion should work ! Thx !

I've rebuilt and run the basic tests. We'll relaunch our tests asap.

I would be surprised if that makes a difference:
anonymous-mmap-then-fork and SysV shm are just two different ways to
exchange mappings between processes, but I'd expect the virtual memory
object itself to be basically the same, in terms of constraints that
might affect page size at least.

If you were talking about mmap backed by a file (which is what you get
for temporary parallel query segments if you tell it to use
dynamic_shared_memory_type = mmap), that might be a different matter,
because then the block size of the file system backing it might come
into the picture and limit the kernel's options. For example, that is
why (with default settings) Parallel Hash can't use large pages on
Linux (because Linux's POSIX shm_open() really just opens files on
/dev/shm, which has a 4k block size), but can use them on FreeBSD
(because its shm_open() isn't bound to page sizes, it can and
sometimes decides to use large pages).

--
Thomas Munro
http://www.enterprisedb.com
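
To make "anonymous-mmap-then-fork" concrete, here is a small sketch, assuming a POSIX system that supports MAP_ANONYMOUS. The function name is hypothetical. The parent creates one shared anonymous mapping, forks, and then observes a value written by the child, which shows that both processes see the same memory object.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Map one shared anonymous int, fork, let the child store `value`
 * into it, and return what the parent reads back (-1 on error).
 */
static int
roundtrip_through_child(int value)
{
    int   status;
    int   seen;
    pid_t pid;
    int  *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid = fork();
    if (pid < 0)
    {
        munmap(shared, sizeof(int));
        return -1;
    }
    if (pid == 0)
    {
        *shared = value;        /* child writes through the inherited mapping */
        _exit(0);
    }

    /* Parent: wait for the child, then read what it wrote. */
    if (waitpid(pid, &status, 0) < 0)
    {
        munmap(shared, sizeof(int));
        return -1;
    }
    seen = *shared;
    munmap(shared, sizeof(int));
    return seen;
}
```

With SysV shm the child could instead attach by segment id with shmat(); the sharing semantics seen by the program are the same, which is the point being made above.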

#8 Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#7)
Re: Shared Memory: How to use SYSV rather than MMAP ?

Hi,

On 2018-11-21 09:00:58 +1300, Thomas Munro wrote:

On Wed, Nov 21, 2018 at 4:37 AM REIX, Tony <tony.reix@atos.net> wrote:

YES ! Reading this file, your suggestion should work ! Thx !

I've rebuilt and run the basic tests. We'll relaunch our tests asap.

I would be surprised if that makes a difference:
anonymous-mmap-then-fork and SysV shm are just two different ways to
exchange mappings between processes, but I'd expect the virtual memory
object itself to be basically the same, in terms of constraints that
might affect page size at least.

I don't think that's true on many systems, FWIW. On linux there's
certainly different behaviour, and e.g. the way to get hugepages for
anon-mmap and SysV shmem aren't the same. [1] strongly suggests that
that's not the case on FreeBSD either (with sysv shmem being
better). I'd attached a patch to implement a GUC to allow users to
choose the shmem implementation back then [2].

[1]: http://archives.postgresql.org/message-id/2AE143D2-87D3-4AD1-AC78-CE2258230C05%40FreeBSD.org
[2]: http://archives.postgresql.org/message-id/20140422121921.GD4449%40awork2.anarazel.de

Greetings,

Andres Freund

#9 Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#8)
Re: Shared Memory: How to use SYSV rather than MMAP ?

On Wed, Nov 21, 2018 at 9:07 AM Andres Freund <andres@anarazel.de> wrote:

On 2018-11-21 09:00:58 +1300, Thomas Munro wrote:

On Wed, Nov 21, 2018 at 4:37 AM REIX, Tony <tony.reix@atos.net> wrote:

YES ! Reading this file, your suggestion should work ! Thx !

I've rebuilt and run the basic tests. We'll relaunch our tests asap.

I would be surprised if that makes a difference:
anonymous-mmap-then-fork and SysV shm are just two different ways to
exchange mappings between processes, but I'd expect the virtual memory
object itself to be basically the same, in terms of constraints that
might affect page size at least.

I don't think that's true on many systems, FWIW. On linux there's
certainly different behaviour, and e.g. the way to get hugepages for
anon-mmap and SysV shmem aren't the same.

Right, when asking for them explicitly the API is different (SHM_HUGETLB
flag to shmget(), MAP_HUGETLB flag to mmap()). Actually I was
expecting AIX to be more like FreeBSD and Solaris, where you don't do
that, the OS just decides what page size to give you, but after some
quality time with google I now see that it's more like Linux in the
SysV case... there is an explicit flag:

https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.performance/large_pages_shared_mem_segs.htm

You also need some special privileges:

https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.performance/large_page_ovw.htm

As for the main shared buffers area using anon-mmap, I wonder if it
would automagically use large pages if you have the privileges and set
the LDR_CNTRL environment variable (or the equivalent XCOFF header for
the binary):

https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.performance/set_env_variable_lpages.htm

[1] strongly suggests that
that's not the case on FreeBSD either (with sysv shmem being
better). I'd attached a patch to implement a GUC to allow users to
choose the shmem implementation back then [2].

Surprising. I'd like to know if that's still true. SysV shm is not
nice, and if there is anything accidentally better about its
performance, I'd love to know what. That report (slightly) predates
this work (maybe causally linked), which fixed various VM scale
problems hit by PostgreSQL:
http://www.kib.kiev.ua/kib/pgsql_perf_v2.0.pdf

--
Thomas Munro
http://www.enterprisedb.com
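
A hedged sketch of the "explicit flag" approach on the mmap() side, assuming Linux semantics for MAP_HUGETLB; the function name map_try_huge() is hypothetical. It first asks for huge pages explicitly and, if the request is refused (or the flag does not exist on the platform), falls back to an ordinary anonymous shared mapping, similar in spirit to a "try" setting.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map `size` bytes of anonymous shared memory, preferring explicit
 * huge pages when the platform defines MAP_HUGETLB. `*used_huge` is
 * set to 1 if the huge-page request succeeded, else 0. Returns NULL
 * on failure. Real code would typically round `size` up to the huge
 * page size first; this sketch leaves that to the caller.
 */
static void *
map_try_huge(size_t size, int *used_huge)
{
    void *p = MAP_FAILED;

    *used_huge = 0;
#ifdef MAP_HUGETLB
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED)
        *used_huge = 1;
#endif
    if (p == MAP_FAILED)
        p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}
```

On a machine with no huge pages reserved, the first mmap() is expected to fail and the fallback path is taken, so the call still returns usable memory.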

#10 REIX, Tony
tony.reix@atos.net
In reply to: Thomas Munro (#9)
RE: Shared Memory: How to use SYSV rather than MMAP ?

Hi Thomas, Andres,

I still have to reread and study this thread in depth in order to understand all of this information. However, we have already obtained a very good pgbench performance improvement on AIX 7.2 / Power9 with that change: about +38% in the best case. See below for the details.

This +38% improvement was measured against PostgreSQL v11.1 built with XLC -O2 plus Power9 tuning, some inlining changes for AIX, and some fixes for issues between XLC and PostgreSQL #ifdef code. Maybe GCC gives better results; we'll know later.

Once we are done with this performance analysis campaign, I'll submit patches.

Meanwhile, if anyone has ideas about where choices made for PostgreSQL on Linux may affect performance on AIX, I'm very interested!

Regards,

Tony

Changes in the PostgreSQL 11.1 sources for SysV large page (64K) support:

* Main shared memory segment in sysv_shmem.c:

removal of #define USE_ANONYMOUS_SHMEM

* Dynamic shared memory implementations in src/include/storage/dsm_impl.h:

#define USE_DSM_POSIX
// #define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE DSM_IMPL_POSIX
#define DEFAULT_DYNAMIC_SHARED_MEMORY_TYPE DSM_IMPL_SYSV
#endif

* Changes to the PostgreSQL 11.1 XCOFF binary with ldedit, plus the runtime environment:

* ldedit -btextpsize=64K -bdatapsize=64K -bstackpsize=64K /opt/freeware/bin/postgres_64
* Env variable LDR_CNTRL=SHMPSIZE=64K

#11 REIX, Tony
tony.reix@atos.net
In reply to: REIX, Tony (#10)
RE: Shared Memory: How to use SYSV rather than MMAP ?

Hi Andres, Thomas,

Here is a patch that enables System V shared memory on AIX, with 64K or larger pages, instead of MMAP shared memory, which is slower on AIX.

We have tested this code with 64K pages and pgbench on AIX 7.2 TL2 / Power9, and it gave an improvement of up to +37%.

We'll test this code with Large Pages (SHM_LGPAGE | SHM_PIN | S_IRUSR | S_IWUSR flags of shmget()) ASAP.

However, I first wanted to get your comments on this change so that I can improve it for acceptance.

Thanks/Regards,

Tony Reix

Attachments:

postgresql-11.1-AIX-SysV-SharedMemory-LargePagesV2.patch (text/x-patch, +24 -2)

#12 Thomas Munro
thomas.munro@gmail.com
In reply to: REIX, Tony (#11)
Re: Shared Memory: How to use SYSV rather than MMAP ?

On Sat, Nov 24, 2018 at 4:54 AM REIX, Tony <tony.reix@atos.net> wrote:

Here is a patch for enabling SystemV Shared Memory on AIX, for 64K or bigger page size, rather than using MMAP shared memory, which is slower on AIX.

We have tested this code with 64K pages and pgbench, on AIX 7.2 TL2 Power 9, and it provided a maximum of +37% improvement.

You also mentioned changing from XLC to GCC. Did you test the various
changes in isolation? XLC->GCC, mmap->shmget, with/without
SHM_LGPAGE. 37% is a bigger performance change than I expected from
large pages, since reports from other architectures are single-digit
percentage increases with pgbench -S.

If just changing to GCC gives you a big speed-up, it could of course
just be different/better code generation (though that'd be a bit sad
for XLC), but I also wonder if the different atomics support in our
tree could be implicated.

We'll test this code with Large Pages (SHM_LGPAGE | SHM_PIN | S_IRUSR | S_IWUSR flags of shmget() ) ASAP.

However, I wanted first to get your comments about this change in order to improve it for acceptance.

I think we should respect the huge_pages GUC, as we do on Linux and
Windows (since there are downsides to using large pages, maybe not
everyone would want that). It could even be useful to allow different
page sizes to be requested by GUC (I see that DB2 has an option to use
16GB pages -- yikes). It also seems like a good idea to have a
shared_memory_type GUC as Andres proposed (see his link), instead of
using a compile time option. I guess it was made a compile time
option because nobody could imagine wanting to go back to SysV shm!
(I'm still kinda surprised that MAP_ANONYMOUS memory can't be coaxed
into large pages by environment variables or loader controls, since
apparently other things like data segments etc apparently can, though
I can't find any text that says that's the case and I have no AIX
system).

--
Thomas Munro
http://www.enterprisedb.com
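
One way to picture "respecting the huge_pages GUC" for SysV shared memory is to derive the shmget() flag word from the setting. The sketch below is only an assumption about how that could look: the enum and the function name are hypothetical, and the SHM_LGPAGE and SHM_PIN flags named earlier in this thread are used only when the platform headers define them (they are absent on Linux, for example).

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/stat.h>

/* Hypothetical mirror of a huge_pages setting: off, try, on. */
typedef enum
{
    HP_OFF,
    HP_TRY,
    HP_ON
} HugePagesSetting;

/*
 * Build the shmget() flags for a new segment. Large-page flags are
 * added only when requested and only when the headers provide them.
 */
static int
sysv_flags_for(HugePagesSetting hp)
{
    int flags = IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR;

#if defined(SHM_LGPAGE) && defined(SHM_PIN)
    if (hp != HP_OFF)
        flags |= SHM_LGPAGE | SHM_PIN;
#else
    (void) hp;                  /* no large-page flags on this platform */
#endif
    return flags;
}
```

With "try", a caller would presumably retry shmget() with sysv_flags_for(HP_OFF) if the first attempt fails, whereas "on" would report the failure instead.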

#13 REIX, Tony
tony.reix@atos.net
In reply to: Thomas Munro (#12)
RE: Shared Memory: How to use SYSV rather than MMAP ?

Hi Thomas,

About reliability: I've compiled and tested with both GCC and XLC on 2 machines in order to check that my patches are OK (no impact on the PostgreSQL tests, with either GCC or XLC).

We do not yet have a performance comparison between GCC and XLC: although we experimented with both, we moved from v11beta1 to beta4, then to 11.0, and now to 11.1. We'll do it ASAP.

About performance: we have compared MMAP (4KB) and SysV (64KB) shared memory in depth, for both the dynamic and the main shared memory segments, with exactly the SAME HW + SW environment, using XLC -O2 + tune=pwr9.

We have not yet experimented with Large Pages (16MB); however, the flags added to the 3rd parameter of shmget() are said to have no impact on performance unless Large Pages are really used.

The same goes for Huge Pages (16GB). We'll study this later.

So, the +37% improvement (the maximum value seen; +29% on average) is the result of a single change: MMAP 4K to SysV 64K.

(This improvement is due to 2 things: mmap on AIX has performance drawbacks compared with SysV shared memory, and 64K vs 4K pages.)

That's for 64-bit only, on AIX 7.2 only. For 32-bit, we have not taken any measurements.

We'll have to discuss your last paragraph in more depth: how to handle this in the PostgreSQL code, not only for AIX.

Regards,

Tony Reix

#14 REIX, Tony
tony.reix@atos.net
In reply to: Thomas Munro (#12)
RE: Shared Memory: How to use SYSV rather than MMAP ?

Hi Thomas,

You said:

I think we should respect the huge_pages GUC, as we do on Linux and
Windows (since there are downsides to using large pages, maybe not
everyone would want that). It could even be useful to allow different
page sizes to be requested by GUC (I see that DB2 has an option to use
16GB pages -- yikes). It also seems like a good idea to have a
shared_memory_type GUC as Andres proposed (see his link), instead of
using a compile time option. I guess it was made a compile time
option because nobody could imagine wanting to go back to SysV shm!
(I'm still kinda surprised that MAP_ANONYMOUS memory can't be coaxed
into large pages by environment variables or loader controls, since
apparently other things like data segments etc apparently can, though
I can't find any text that says that's the case and I have no AIX
system).

I guess that you are talking about the CPP & C symbols:
#ifndef MAP_HUGETLB
HUGE_PAGES_ON
HUGE_PAGES_TRY
in addition to huge_pages = .... in the postgresql.conf file.

For now, these Huge Pages symbols are used only for MMAP.
About SysV shared memory: as far as I know, the shmget() options for AIX and Linux are different.
Moreover, AIX also provides Large Pages (16MB).

About Andres's proposal: I've read his link. However, the patch he proposed:
0001-Add-shared_memory_type-GUC.patch
is no longer available (Attachment not found).

I confirm that I got the SysV shared memory by means of a "compile time option".

About "still kinda surprised that MAP_ANONYMOUS memory can't be coaxed
into large pages by environment variables or loader controls": I confirm that,
on AIX, only 4K pages are available for mmap().

I do agree that options in the postgresql.conf file would be the best solution,
since the code for both SysV shared memory and MMAP shared memory seems to be always present.

Regards,

Tony

#15 REIX, Tony
tony.reix@atos.net
In reply to: REIX, Tony (#14)
RE: Shared Memory: How to use SYSV rather than MMAP ?

Hi Thomas,

Here is the patch we are now using on AIX to enable SysV shm, which greatly improves performance on AIX.

It is a compile-time change.

It seems to me that you'd like this to become a shared_memory_type GUC. Correct? However, I do not know how to do that.

Even as-is, this patch would greatly improve the performance of PostgreSQL v11.1 in the field on AIX machines. So, we'd like this change to be available for AIX ASAP.

What are the next steps to get this patch accepted? Or what are your suggestions for improving it?

Thanks/Regards,

Tony Reix

Attachments:

postgresql-11.1-AIX-SysV-SharedMemory-LargePagesV2.patch (text/x-patch, +24 -2)

#16 Thomas Munro
thomas.munro@gmail.com
In reply to: REIX, Tony (#15)
Re: Shared Memory: How to use SYSV rather than MMAP ?

On Wed, Dec 19, 2018 at 4:17 AM REIX, Tony <tony.reix@atos.net> wrote:

Here is the patch we are using now on AIX for enabling SysV shm for AIX, which improves greatly the performance on AIX.

It is compile time.

It seems to me that you'd like this to become a shared_memory_type GUC. Correct? However, I do not know how to do.

Even as-is, this patch would greatly improve the performance of PostgreSQL v11.1 in the field on AIX machines. So, we'd like this change to be available for AIX asap.

What are the next steps to get this patch accepted? or What are your suggestions for improving it?

Hi Tony,

Since it's not fixing a bug, we wouldn't back-patch that into existing
releases. But I agree that we should do something like this for
PostgreSQL 12, and I think we should make it user configurable.

Here is a quick rebase of Andres's shared_memory_type patch for
master, so that you can put shared_memory_type=sysv in postgresql.conf
to get the old pre-9.3 behaviour (this may also be useful for other
operating systems). Here also is a "blind" patch that makes it
respect huge_pages=try/on on AIX (or at least, I think it does; I
don't have an AIX to try it, it almost certainly needs some
adjustments). Thoughts?

--
Thomas Munro
http://www.enterprisedb.com
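
As a sketch of what "user configurable" means compared with a compile-time symbol, the snippet below maps a shared_memory_type string to an implementation choice at run time. This is not the attached patch: the enum, the function name, and the "mmap" spelling of the non-SysV value are assumptions for illustration; only shared_memory_type=sysv is taken from the message above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical set of main shared memory implementations. */
typedef enum
{
    SHMEM_TYPE_MMAP,
    SHMEM_TYPE_SYSV,
    SHMEM_TYPE_INVALID
} ShmemType;

/*
 * Translate a configuration value such as "sysv" into an
 * implementation choice; unknown or missing values are rejected.
 */
static ShmemType
parse_shared_memory_type(const char *value)
{
    if (value == NULL)
        return SHMEM_TYPE_INVALID;
    if (strcmp(value, "mmap") == 0)
        return SHMEM_TYPE_MMAP;
    if (strcmp(value, "sysv") == 0)
        return SHMEM_TYPE_SYSV;
    return SHMEM_TYPE_INVALID;
}
```

The selection then happens once at server start, so both code paths have to be compiled in, whereas a compile-time symbol keeps only one of them.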

Attachments:

0001-Add-shared_memory_type-GUC.patch (application/x-patch, +91 -27)
0002-Add-huge-page-support-for-AIX.patch (application/octet-stream, +52 -8)

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#16)
Re: Shared Memory: How to use SYSV rather than MMAP ?

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Since it's not fixing a bug, we wouldn't back-patch that into existing
releases. But I agree that we should do something like this for
PostgreSQL 12, and I think we should make it user configurable.

I'm -1 on making this user configurable via a GUC; that adds documentation
and compatibility burdens that we don't need, for something of no value
to 99.99% of users. The fact that the default would need to be
platform-dependent just makes that tradeoff even worse. I think the other
0.01% who need to change the default (and are bright enough to be doing
the right thing for the right reasons) could certainly handle something
like a pg_config_manual.h control symbol --- see USE_PPC_LWARX_MUTEX_HINT
for a precedent that I think applies well here. So I'd favor just doing
it that way.
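
For comparison with the GUC approach, a compile-time control symbol of the kind suggested above might look roughly like the following sketch. USE_SYSV_MAIN_SHMEM and main_shmem_method() are invented names used only for illustration; no such symbol is claimed to exist in pg_config_manual.h.

```c
/*
 * Hypothetical control symbol in the style of pg_config_manual.h.
 * Uncomment the definition (or build with -DUSE_SYSV_MAIN_SHMEM) to
 * select System V shared memory for the main segment.
 */
/* #define USE_SYSV_MAIN_SHMEM */

/* Report which method this build was compiled to use. */
static const char *
main_shmem_method(void)
{
#ifdef USE_SYSV_MAIN_SHMEM
    return "sysv";
#else
    return "mmap";
#endif
}
```

The trade-off is that the choice is frozen into the binary: changing it means rebuilding, which is exactly what keeps it out of reach of casual misconfiguration.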

regards, tom lane

#18Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#17)
Re: Shared Memory: How to use SYSV rather than MMAP ?

On Wed, Dec 26, 2018 at 11:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Since it's not fixing a bug, we wouldn't back-patch that into existing
releases. But I agree that we should do something like this for
PostgreSQL 12, and I think we should make it user configurable.

I'm -1 on making this user configurable via a GUC; that adds documentation
and compatibility burdens that we don't need, for something of no value
to 99.99% of users. The fact that the default would need to be
platform-dependent just makes that tradeoff even worse. I think the other
0.01% who need to change the default (and are bright enough to be doing
the right thing for the right reasons) could certainly handle something
like a pg_config_manual.h control symbol --- see USE_PPC_LWARX_MUTEX_HINT
for a precedent that I think applies well here. So I'd favor just doing
it that way.

I disagree. I think there is a growing body of evidence that
b0fc0df9364d2d2d17c0162cf3b8b59f6cb09f67 killed performance on many
types of non-Linux systems. This is the first report I recall about
AIX, but there have been previous complaints about some BSD variants.

When I was working on developing that commit, I went and tried to find
out all of the different ways of getting some shared memory from
various operating systems and compared them. Anonymous shared memory
allocated via mmap() was the hands-down winner in almost every
respect: supported on many systems, no annoying operating system
limits, automatic deallocation when the last process exits. It had
the disadvantage that it didn't have an equivalent of nattch, which
meant that we had to keep a small System V segment around just for
that purpose, but otherwise it looked really good.
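
The two mechanisms described above can be sketched in C as follows. This is a minimal illustration using the standard mmap(), shmget() and shmctl() calls, not the code from the commit in question; the helper names are made up for the example, and error handling is reduced to return values.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>

/*
 * Anonymous shared mapping: child processes inherit it across fork(),
 * and it is freed automatically when the last process unmaps it or exits.
 * Returns NULL on failure.
 */
static void *
alloc_anon_shared(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    return (p == MAP_FAILED) ? NULL : p;
}

/* Create a small System V segment; returns its id, or -1 on failure. */
static int
create_sysv_segment(size_t size)
{
    return shmget(IPC_PRIVATE, size, IPC_CREAT | IPC_EXCL | 0600);
}

/*
 * Number of processes currently attached to a System V segment (nattch).
 * An anonymous mmap() region offers no equivalent query.
 * Returns -1 on failure.
 */
static long
sysv_attach_count(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) < 0)
        return -1;
    return (long) ds.shm_nattch;
}
```

A caller would place the bulk of its shared state in the region returned by alloc_anon_shared() and keep only a tiny System V segment whose attach count reveals whether other processes are still attached.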

However, I only considered the situation from a functional point of
view. I never considered the possibility that the method used to
obtain shared memory from the operating system would affect the
performance of that shared memory. To my surprise, however, it does,
and on multiple operating systems from various parts of the UNIX
family tree. If I'd known that at the time, that commit probably would
not have gone into the tree in the form that it did. I suspect that
there would have been a loud clamor for configurability, and I think I
would have agreed.

You may be right that this is of no value to a high percentage of our
users, but I think that's only because a high percentage of our users
run Linux or Windows, which happen not to be affected. I'm rather
proud, though, of PostgreSQL's long history of trying to be
cross-platform. Even if operating systems like AIX or BSD are a small
percentage of the overall user base, I think it's totally fair to add
a GUC which would likely be helpful to a large percentage of those people,
and I think the GUC proposed here likely falls into that category.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#18)
Re: Shared Memory: How to use SYSV rather than MMAP ?

On December 26, 2018 6:48:31 PM GMT+01:00, Robert Haas <robertmhaas@gmail.com> wrote:

I disagree. I think there is a growing body of evidence that
b0fc0df9364d2d2d17c0162cf3b8b59f6cb09f67 killed performance on many
types of non-Linux systems. This is the first report I recall about
AIX, but there have been previous complaints about some BSD variants.

Exactly. I think we should have added this a few years ago.

Andres

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#18)
Re: Shared Memory: How to use SYSV rather than MMAP ?

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Dec 26, 2018 at 11:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm -1 on making this user configurable via a GUC; that adds documentation
and compatibility burdens that we don't need, for something of no value
to 99.99% of users.

...
You may be right that this is of no value to a high percentage of our
users, but I think that's only because a high percentage of our users
run Linux or Windows, which happen not to be affected. I'm rather
proud, though, of PostgreSQL's long history of trying to be
cross-platform. Even if operating systems like AIX or BSD are a small
percentage of the overall user base, I think it's totally fair to add
a GUC which would likely be helpful to a large percentage of those people,
and I think the GUC proposed here likely falls into that category.

You misread what I said. I don't say that we shouldn't fix this;
what I'm saying is we should not do so via a user-configurable knob.
We should be able to auto-configure this and just handle it internally.
I have zero faith in the idea that users would set the knob correctly.

regards, tom lane

#21Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#18)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#20)
#23Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#21)
#24Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#19)
#25Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#23)
#26REIX, Tony
tony.reix@atos.net
In reply to: Thomas Munro (#16)
#27Thomas Munro
thomas.munro@gmail.com
In reply to: REIX, Tony (#26)
#28REIX, Tony
tony.reix@atos.net
In reply to: Thomas Munro (#27)
#29Thomas Munro
thomas.munro@gmail.com
In reply to: REIX, Tony (#28)
#30Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#29)
#31Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#30)
#32REIX, Tony
tony.reix@atos.net
In reply to: REIX, Tony (#1)
#33Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Munro (#30)
#34Thomas Munro
thomas.munro@gmail.com
In reply to: Alvaro Herrera (#33)
#35Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Munro (#34)
#36REIX, Tony
tony.reix@atos.net
In reply to: Alvaro Herrera (#35)
#37Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: REIX, Tony (#36)
#38REIX, Tony
tony.reix@atos.net
In reply to: Alvaro Herrera (#37)
#39REIX, Tony
tony.reix@atos.net
In reply to: Thomas Munro (#34)