Supporting huge pages on Windows

Started by Tsunakawa, Takayukiover 9 years ago70 messageshackers
Jump to latest
#1Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com

Hello,

The attached patch implements huge_pages on Windows. I'll add this to the CommitFest.

The performance improvement was about 2% with the following select-only pgbench. The scaling factor is 200, where the database size is roughly 3GB. I ran the benchmark on my Windows 10 PC with 6 CPU cores and 16GB of RAM.

  pgbench -c18 -j6 -T60 -S bench

Before running pgbench, I used pg_prewarm to cache all pgbench tables and indexes (excluding the history table) in the 4GB shared buffers. The averages of running pgbench three times are:

huge_pages=off: 70412 tps
huge_pages=on : 72100 tps

The purpose of pg_ctl.c modification is to retain "Lock pages in memory" Windows user right in postgres. That user right is necessary for the large-page support. The current pg_ctl removes all privileges when spawning postgres, which is overkill. The system administrator should be able to assign appropriate privileges to the PostgreSQL service account.

Credit: This patch is based on Thomas Munro's one.

Regards
Takayuki Tsunakawa

Attachments:

win_large_page.patchapplication/octet-stream; name=win_large_page.patchDownload+116-17
#2Robert Haas
robertmhaas@gmail.com
In reply to: Tsunakawa, Takayuki (#1)
Re: Supporting huge pages on Windows

On Sun, Sep 25, 2016 at 10:45 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

Credit: This patch is based on Thomas Munro's one.

How are they different?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Robert Haas (#2)
Re: Supporting huge pages on Windows

From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
On Sun, Sep 25, 2016 at 10:45 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

Credit: This patch is based on Thomas Munro's one.

How are they different?

As Thomas mentioned, his patch (only win32_shmem.c) might not have been able to compile (though I didn't try.) And it didn't have error processing or documentation. I added error handling, documentation, comments, and a little bit of structural change. The possibly biggest change, though it's only one-liner in pg_ctl.c, is additionally required. I failed to include it in the first patch. The attached patch includes that.

Regards
Takayuki Tsunakawa

Attachments:

win_large_page_v2.patchapplication/octet-stream; name=win_large_page_v2.patchDownload+118-24
#4Thomas Munro
thomas.munro@gmail.com
In reply to: Tsunakawa, Takayuki (#3)
Re: Supporting huge pages on Windows

On Wed, Sep 28, 2016 at 1:26 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
On Sun, Sep 25, 2016 at 10:45 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

Credit: This patch is based on Thomas Munro's one.

How are they different?

As Thomas mentioned, his patch (only win32_shmem.c) might not have been able to compile (though I didn't try.) And it didn't have error processing or documentation. I added error handling, documentation, comments, and a little bit of structural change. The possibly biggest change, though it's only one-liner in pg_ctl.c, is additionally required. I failed to include it in the first patch. The attached patch includes that.

Right, my patch[1]/messages/by-id/CAEepm=075-bgHi_VDt4SCAmt+o_+1XaRap2zh7XwfZvT294oHA@mail.gmail.com was untested sketch code. I had heard complaints
about poor performance with large shared_buffers on Windows, and then
I bumped into some recommendations to turn large pages on for a couple
of other RDBMSs if using large buffer pool. So I wrote that patch
based on the documentation to try some time in the future if I ever
got trapped in a room with Windows, but I posted it just in case the
topic would interest other hackers. Thanks for picking it up!

huge_pages=off: 70412 tps
huge_pages=on : 72100 tps

Hmm. I guess it could be noise or random code rearrangement effects.
I saw your recent post[2]/messages/by-id/0A3221C70F24FB45833433255569204D1F5EE995@G01JPEXMBYT05 proposing to remove the sentence about the
512MB effective limit and I wondered why you didn't go to larger sizes
with a larger database and more run time. But I will let others with
more benchmarking experience comment on the best approach to
investigate Windows shared_buffers performance.

[1]: /messages/by-id/CAEepm=075-bgHi_VDt4SCAmt+o_+1XaRap2zh7XwfZvT294oHA@mail.gmail.com
[2]: /messages/by-id/0A3221C70F24FB45833433255569204D1F5EE995@G01JPEXMBYT05

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Thomas Munro (#4)
Re: Supporting huge pages on Windows

From: Thomas Munro [mailto:thomas.munro@enterprisedb.com]

huge_pages=off: 70412 tps
huge_pages=on : 72100 tps

Hmm. I guess it could be noise or random code rearrangement effects.

I'm not the difference was a random noise, because running multiple set of three runs of pgbench (huge_pages = on, off, on, off, on...) produced similar results. But I expected a bit greater improvement, say, +10%. There may be better benchmark model where the large page stands out, but I think pgbench is not so bad because its random data access would cause TLB cache misses.

I saw your recent post[2] proposing to remove the sentence about the 512MB
effective limit and I wondered why you didn't go to larger sizes with a
larger database and more run time. But I will let others with more
benchmarking experience comment on the best approach to investigate Windows
shared_buffers performance.

Yes, I could have gone to 8GB of shared_buffers because my PC has 16GB of RAM, but I felt the number of variations was sufficient. Anyway, positive comments on benchmarking would be appreciated.

Regards
Takayuki Tsunakawa

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Tsunakawa, Takayuki (#5)
Re: Supporting huge pages on Windows

On Wed, Sep 28, 2016 at 7:32 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

From: Thomas Munro [mailto:thomas.munro@enterprisedb.com]

huge_pages=off: 70412 tps
huge_pages=on : 72100 tps

Hmm. I guess it could be noise or random code rearrangement effects.

I'm not the difference was a random noise, because running multiple set of three runs of pgbench (huge_pages = on, off, on, off, on...) produced similar results. But I expected a bit greater improvement, say, +10%. There may be better benchmark model where the large page stands out, but I think pgbench is not so bad because its random data access would cause TLB cache misses.

Your ~2.4% number is similar to what was reported for Linux with 4GB
shared_buffers:

/messages/by-id/20130913234125.GC13697@roobarb.crazydogs.org

Later in that thread there was a report of a dramatic ~15% increase in
"best result" TPS, but that was with 60GB of shared_buffers on a
machine with 256GB of RAM:

/messages/by-id/20131024060313.GA21888@toroid.org

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#6)
Re: Supporting huge pages on Windows

On 2016-10-11 09:57:48 +1300, Thomas Munro wrote:

Later in that thread there was a report of a dramatic ~15% increase in
"best result" TPS, but that was with 60GB of shared_buffers on a
machine with 256GB of RAM:

/messages/by-id/20131024060313.GA21888@toroid.org

FWIW, I've seen 2-3x increases with ~60GB of s_b.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Andres Freund (#7)
Re: Supporting huge pages on Windows

From: Thomas Munro [mailto:thomas.munro@enterprisedb.com]

Your ~2.4% number is similar to what was reported for Linux with 4GB
shared_buffers:

/messages/by-id/20130913234125.GC13697@roobarb
.crazydogs.org

I'm relieved to know that a similar figure was gained on Linux. Thanks for the info.

Later in that thread there was a report of a dramatic ~15% increase in "best
result" TPS, but that was with 60GB of shared_buffers on a machine with
256GB of RAM:

/messages/by-id/20131024060313.GA21888@toroid.
org

From: Andres Freund [mailto:andres@anarazel.de]

FWIW, I've seen 2-3x increases with ~60GB of s_b.

Wow, nice figures. It's unfortunate that I don't have such a big machine available at hand.

Regards
Takayuki Tsunakawa

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Tsunakawa, Takayuki (#8)
Re: Supporting huge pages on Windows

On Tue, Oct 11, 2016 at 1:36 PM, Tsunakawa, Takayuki <
tsunakawa.takay@jp.fujitsu.com> wrote:

Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia

#10Magnus Hagander
magnus@hagander.net
In reply to: Tsunakawa, Takayuki (#3)
Re: Supporting huge pages on Windows

On Wed, Sep 28, 2016 at 2:26 AM, Tsunakawa, Takayuki <
tsunakawa.takay@jp.fujitsu.com> wrote:

From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
On Sun, Sep 25, 2016 at 10:45 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

Credit: This patch is based on Thomas Munro's one.

How are they different?

As Thomas mentioned, his patch (only win32_shmem.c) might not have been
able to compile (though I didn't try.) And it didn't have error processing
or documentation. I added error handling, documentation, comments, and a
little bit of structural change. The possibly biggest change, though it's
only one-liner in pg_ctl.c, is additionally required. I failed to include
it in the first patch. The attached patch includes that.

For the pg_ctl changes, we're going from removing all privilieges from the
token, to removing none. Are there any other privileges that we should be
worried about? I think you may be correct in that it's overkill to do it,
but I think we need some more specifics to decide that.

Also, what happens with privileges that were granted to the groups that
were removed? Are they retained or lost?

Should we perhaps consider having pg_ctl instead *disable* all the
privileges (rather than removing them), using AdjustTokenPrivileges? As a
middle ground?

Taking a look at the actual code here, and a few small nitpicks:

+ errdetail("Failed system call was %s,
error code %lu", "LookupPrivilegeValue", GetLastError())));

this seems to be a new pattern of code -- for other similar cases it just
writes LookupPrivilegeValue inside the patch itself. I'm guessing the idea
was for translatability, but I think it's better we stick to the existing
pattern.

When AdjustTokenPrivileges() returns, you explicitly check for
ERROR_NOT_ALL_ASSIGNED, which is good. But we should probably also
explicitly check for ERROR_SUCCESS for that case. Right now that's the only
two possible options that can be returned, but in a future version other
result codes could be added and we'd miss them. Basically, "there should be
an else branch".

Is there a reason the error messages for AdjustTokenPrivileges() returning
false and ERROR_NOT_ALL_ASSIGNED is different?

There are three repeated blocks of
+ if (huge_pages == HUGE_PAGES_ON || huge_pages == HUGE_PAGES_TRY)

It threw me off in the initial reading, until I realized the upper levels
of them can change the value of huge_pages.

I don't think changing the global variable huge_pages like that is a very
good idea. Wouldn't something like the attached be cleaner (not tested)? At
least I find that one easier to read.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Attachments:

win_large_page3.patchtext/x-patch; charset=US-ASCII; name=win_large_page3.patchDownload+97-10
#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Magnus Hagander (#10)
Re: Supporting huge pages on Windows

On Sat, Dec 31, 2016 at 7:04 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Wed, Sep 28, 2016 at 2:26 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
On Sun, Sep 25, 2016 at 10:45 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

Credit: This patch is based on Thomas Munro's one.

How are they different?

As Thomas mentioned, his patch (only win32_shmem.c) might not have been
able to compile (though I didn't try.) And it didn't have error processing
or documentation. I added error handling, documentation, comments, and a
little bit of structural change. The possibly biggest change, though it's
only one-liner in pg_ctl.c, is additionally required. I failed to include
it in the first patch. The attached patch includes that.

For the pg_ctl changes, we're going from removing all privilieges from the
token, to removing none. Are there any other privileges that we should be
worried about? I think you may be correct in that it's overkill to do it,
but I think we need some more specifics to decide that.

Also, what happens with privileges that were granted to the groups that were
removed? Are they retained or lost?

Should we perhaps consider having pg_ctl instead *disable* all the
privileges (rather than removing them), using AdjustTokenPrivileges? As a
middle ground?

Sounds like a good idea.

Taking a look at the actual code here, and a few small nitpicks:

+ errdetail("Failed system call was %s, error
code %lu", "LookupPrivilegeValue", GetLastError())));

this seems to be a new pattern of code -- for other similar cases it just
writes LookupPrivilegeValue inside the patch itself. I'm guessing the idea
was for translatability, but I think it's better we stick to the existing
pattern.

When AdjustTokenPrivileges() returns, you explicitly check for
ERROR_NOT_ALL_ASSIGNED, which is good. But we should probably also
explicitly check for ERROR_SUCCESS for that case. Right now that's the only
two possible options that can be returned, but in a future version other
result codes could be added and we'd miss them. Basically, "there should be
an else branch".

Is there a reason the error messages for AdjustTokenPrivileges() returning
false and ERROR_NOT_ALL_ASSIGNED is different?

There are three repeated blocks of
+ if (huge_pages == HUGE_PAGES_ON || huge_pages == HUGE_PAGES_TRY)

It threw me off in the initial reading, until I realized the upper levels of
them can change the value of huge_pages.

I don't think changing the global variable huge_pages like that is a very
good idea. Wouldn't something like the attached be cleaner (not tested)? At
least I find that one easier to read.

Your version of the patch looks better than the previous one. Don't
you need to consider MEM_LARGE_PAGES in VirtualAllocEx call (refer
pgwin32_ReserveSharedMemoryRegion)? At least that is what is
mentioned in MSDN [1]https://msdn.microsoft.com/en-us/library/windows/desktop/aa366720(v=vs.85).aspx. Another point worth considering is that
currently for VirtualAllocEx() we use PAGE_READWRITE as flProtect
value, shouldn't it be same as used in CreateFileMapping() by patch.

[1]: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366720(v=vs.85).aspx

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Amit Kapila (#11)
Re: Supporting huge pages on Windows

From: pgsql-hackers-owner@postgresql.org

[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Magnus Hagander
For the pg_ctl changes, we're going from removing all privilieges from the
token, to removing none. Are there any other privileges that we should be
worried about? I think you may be correct in that it's overkill to do it,
but I think we need some more specifics to decide that.

This page lists the privileges. Is there anyhing you are concerned about?

https://msdn.microsoft.com/ja-jp/library/windows/desktop/bb530716(v=vs.85).aspx

Also, what happens with privileges that were granted to the groups that
were removed? Are they retained or lost?

Are you referring to the privileges of Administrators and PowerUsers that pg_ctl removes? They are lost. The Windows user account who actually runs PostgreSQL needs SeLockMemory privilege.

Should we perhaps consider having pg_ctl instead *disable* all the
privileges (rather than removing them), using AdjustTokenPrivileges? As
a middle ground?

Sorry, I may misunderstand what you are suggesting, but AdjustTokenPrivilege() cannot enable the privilege which is not assigned to the user. Anyway, I think it's the user's responsibility (and freedom) to assign desired privileges, and pg_ctl's disabling all privileges is overkill.

+ errdetail("Failed system call was %s,
error code %lu", "LookupPrivilegeValue", GetLastError())));

this seems to be a new pattern of code -- for other similar cases it just
writes LookupPrivilegeValue inside the patch itself. I'm guessing the idea
was for translatability, but I think it's better we stick to the existing
pattern.

OK, modified.

When AdjustTokenPrivileges() returns, you explicitly check for
ERROR_NOT_ALL_ASSIGNED, which is good. But we should probably also
explicitly check for ERROR_SUCCESS for that case. Right now that's the only
two possible options that can be returned, but in a future version other
result codes could be added and we'd miss them. Basically, "there should
be an else branch".

OK, modified.

Is there a reason the error messages for AdjustTokenPrivileges() returning
false and ERROR_NOT_ALL_ASSIGNED is different?

As mentioned in the following page, the error cause is clearly defined. So, I thought it'd be better to give a specific hint message to help users troubleshoot the error.

https://msdn.microsoft.com/ja-jp/library/windows/desktop/aa375202(v=vs.85).aspx

ERROR_NOT_ALL_ASSIGNED
The token does not have one or more of the privileges specified in the NewState parameter. The function may succeed with this error value even if no privileges were adjusted. The PreviousState parameter indicates the privileges that were adjusted.

There are three repeated blocks of
+ if (huge_pages == HUGE_PAGES_ON || huge_pages == HUGE_PAGES_TRY)

It threw me off in the initial reading, until I realized the upper levels
of them can change the value of huge_pages.

OK, I like your code.

I don't think changing the global variable huge_pages like that is a very
good idea.

Yes, actually, I was afraid that it might be controversial to change the GUC value. But I thought it may be better for "SHOW huge_pages" to reflect whether the huge_pages feature is effective. Otherwise, users don't know about that. For example, wal_buffers behaves similarly; if it's set to -1 (default), "SHOW wal_buffers" displays the actual wal buffer size, not -1. What do you think? Surely, the Linux code for huge_pages doesn't do that. I'm OK with either.

From: Amit Kapila [mailto:amit.kapila16@gmail.com]

Your version of the patch looks better than the previous one. Don't you
need to consider MEM_LARGE_PAGES in VirtualAllocEx call (refer
pgwin32_ReserveSharedMemoryRegion)? At least that is what is mentioned
in MSDN [1]. Another point worth considering is that currently for
VirtualAllocEx() we use PAGE_READWRITE as flProtect value, shouldn't it
be same as used in CreateFileMapping() by patch.

[1] -
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366720(v=vs
.85).aspx

No, it's not necessary. Please see PGSharedMemoryReAttach(), where VirtualFree() is called to free the reserved address space and then call MapViewOfFile() to allocate the already created shared memory to that area.

Regards
Takayuki Tsunakawa

Attachments:

win_large_pages_v4.patchapplication/octet-stream; name=win_large_pages_v4.patchDownload+103-16
#13Thomas Munro
thomas.munro@gmail.com
In reply to: Tsunakawa, Takayuki (#12)
Re: Supporting huge pages on Windows

On Thu, Jan 5, 2017 at 4:12 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

[win_large_pages_v4.patch]

Just a small suggestion about the wording in this patch:

+        This feature uses the large-page support on Windows. To use
the large-page
+        support, you need to assign Lock page in memory user right to
the Windows
+        user account which runs <productname>PostgreSQL</productname>.

In the Microsoft documentation I've seen, the privilege's name is
always written as "Lock Pages in Memory" (note: "Pages" plural, and
with initial capital letters). It's quite hard to parse the sentence
otherwise! How about this?

Huge pages are known as large pages on Windows. To use them,
you need to
assign the user right Lock Pages in Memory to the Windows user account
that runs <productname>PostgreSQL</productname>.

+ ereport(elevel,
+ (errmsg("could not enable Lock pages in memory user right: error
code %lu", GetLastError()),
+ errdetail("Failed system call was OpenProcessToken.")));

Same comment about capitalisation of the privilege name in this and
other error messages.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Thomas Munro (#13)
Re: Supporting huge pages on Windows

From: Thomas Munro [mailto:thomas.munro@enterprisedb.com]

In the Microsoft documentation I've seen, the privilege's name is always
written as "Lock Pages in Memory" (note: "Pages" plural, and with initial
capital letters). It's quite hard to parse the sentence otherwise! How
about this?

Huge pages are known as large pages on Windows. To use them, you
need to
assign the user right Lock Pages in Memory to the Windows user
account
that runs <productname>PostgreSQL</productname>.

+ ereport(elevel,
+ (errmsg("could not enable Lock pages in memory user right: error
code %lu", GetLastError()),
+ errdetail("Failed system call was OpenProcessToken.")));

Same comment about capitalisation of the privilege name in this and other
error messages.

Thanks, your sentences are surely better. Fixed.

Regards
Takayuki Tsunakawa

Attachments:

win_large_pages_v5.patchapplication/octet-stream; name=win_large_pages_v5.patchDownload+103-16
#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Tsunakawa, Takayuki (#12)
Re: Supporting huge pages on Windows

On Thu, Jan 5, 2017 at 8:42 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

Your version of the patch looks better than the previous one. Don't you
need to consider MEM_LARGE_PAGES in VirtualAllocEx call (refer
pgwin32_ReserveSharedMemoryRegion)? At least that is what is mentioned
in MSDN [1]. Another point worth considering is that currently for
VirtualAllocEx() we use PAGE_READWRITE as flProtect value, shouldn't it
be same as used in CreateFileMapping() by patch.

[1] -
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366720(v=vs
.85).aspx

No, it's not necessary. Please see PGSharedMemoryReAttach(), where VirtualFree() is called to free the reserved address space and then call MapViewOfFile() to allocate the already created shared memory to that area.

PGSharedMemoryReAttach is called after the startup of new process
whereas pgwin32_ReserveSharedMemoryRegion is called before the new
process could actually start. Basically,
pgwin32_ReserveSharedMemoryRegion is used to reserve shared memory for
each backend, so calling VirtualAlloc there should follow spec for
huge pages. If you have some reason for not using, then it is not
clear from your reply, can you try to explain in detail.

I have tried to test v4 version of the patch and it is always failing
in below error after call to AdjustTokenPrivileges:

+ if (GetLastError() != ERROR_SUCCESS)
+ {
+ if (GetLastError() == ERROR_NOT_ALL_ASSIGNED)
+ ereport(elevel,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("could not enable Lock pages in memory user right"),
+ errhint("Assign Lock pages in memory user right to the Windows user
account which runs PostgreSQL.")));

I have ensured that user used to run PostgreSQL has "Lock pages in
memory" privilege/rights. I have followed msdn tips [1]https://msdn.microsoft.com/en-us/library/windows/desktop/ff961911(v=vs.85).aspx to do that
(restarted the m/c after assigning privilege). I am using Win7. Can
you share the steps you have followed to test and your windows m/c
details?

+ if (!LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME, &luid))
+ {
+ CloseHandle(hToken);
+ ereport(elevel,
+ (errmsg("could not enable Lock pages in memory user right: error
code %lu", GetLastError()),
+ errdetail("Failed system call was LookupPrivilegeValue.")));
+ return FALSE;
+ }

The order of closing handle and ereport is different here than other
places in the same function. If there is no specific reason for doing
so, then keep the order consistent.

[1]: https://msdn.microsoft.com/en-us/library/windows/desktop/ff961911(v=vs.85).aspx

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Amit Kapila (#15)
Re: Supporting huge pages on Windows

Hello, Amit, Magnus,

I'm sorry for my late reply. Yesterday was a national holiday in Japan.

From: pgsql-hackers-owner@postgresql.org

[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Amit Kapila
PGSharedMemoryReAttach is called after the startup of new process whereas
pgwin32_ReserveSharedMemoryRegion is called before the new process could
actually start. Basically, pgwin32_ReserveSharedMemoryRegion is used to
reserve shared memory for each backend, so calling VirtualAlloc there should
follow spec for huge pages. If you have some reason for not using, then
it is not clear from your reply, can you try to explain in detail.

OK. The processing takes place in the following order:

1. postmaster calls CreateProcess() to start a child postgres in a suspended state.
2. postmaster calls VirtualAlloc(MEM_RESERVE) in pgwin32_ReserveSharedMemoryRegion() to reserve the virtual address space in the child to secure space for the shared memory. This call just affects the virtual address space and does not allocate physical memory. So the large page is still irrelevant.
3. postmaster resumes execution of the child postgres.
4. The child postgres calls VirtualFree(MEM_RESERVE) in PGSharedMemoryReAttach() to release the reserved virtual address space. Here, the effect of VirtualAlloc() is invalidated anyway.
5. The child process calls MapViewOfFile() in PGSharedMemoryReAttach() to map the shared memory at the same address. Hereafter, the large page option specified in CreateFileMapping() call is relevant.

+ if (!LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME, &luid)) {
+ CloseHandle(hToken); ereport(elevel, (errmsg("could not enable Lock
+ pages in memory user right: error
code %lu", GetLastError()),
+ errdetail("Failed system call was LookupPrivilegeValue."))); return
+ FALSE; }

The order of closing handle and ereport is different here than other places
in the same function. If there is no specific reason for doing so, then
keep the order consistent.

You are right, I modified the patch to call CloseHandle() after ereport() so that ereport() CloseHandle() wouldn't change the error code for ereport(). That's the order used in postmaster.c.

I have tried to test v4 version of the patch and it is always failing in
below error after call to AdjustTokenPrivileges:

+ if (GetLastError() != ERROR_SUCCESS)
+ {
+ if (GetLastError() == ERROR_NOT_ALL_ASSIGNED) ereport(elevel,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("could not enable Lock pages in memory user right"),
+ errhint("Assign Lock pages in memory user right to the Windows user
account which runs PostgreSQL.")));

I have ensured that user used to run PostgreSQL has "Lock pages in memory"
privilege/rights. I have followed msdn tips [1] to do that (restarted the
m/c after assigning privilege). I am using Win7. Can you share the steps
you have followed to test and your windows m/c details?

[1] -
https://msdn.microsoft.com/en-us/library/windows/desktop/ff961911(v=vs
.85).aspx

I succeeded by following the same procedure using secpol.msc on Win10, running 64-bit PostgreSQL. I started PostgreSQL as a Windows service because it's the normal way, with the service account being a regular Windows user account(i.e. not LocalSystem).

But... I failed to start PostgreSQL by running "pg_ctl start" from a command prompt, receiving the same error message as you. On the other hand, I could start PostgreSQL on a command prompt with administrative privileges (right-click on "Command prompt" from the Start menu and select "Run as an administrator" in the menu. It seems that Windows removes many privileges, including "Lock Pages in Memory", when starting the normal command prompt. As its evidence, you can use the attached priv.c to see what privileges are assigned and and enabled/disabled. Build it like "cl priv.c" and just run priv.exe on each command prompt. Those runs show different privileges.

Should I need to do something, e.g. explain in the document that the user should use the command prompt with administrative privileges when he uses huge_pages?

Regards
Takayuki Tsunakawa

Attachments:

win_large_pages_v6.patchapplication/octet-stream; name=win_large_pages_v6.patchDownload+103-16
priv.ctext/plain; name=priv.cDownload
#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Tsunakawa, Takayuki (#16)
Re: Supporting huge pages on Windows

On Tue, Jan 10, 2017 at 8:49 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

Hello, Amit, Magnus,

I'm sorry for my late reply. Yesterday was a national holiday in Japan.

From: pgsql-hackers-owner@postgresql.org

[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Amit Kapila
PGSharedMemoryReAttach is called after the startup of new process whereas
pgwin32_ReserveSharedMemoryRegion is called before the new process could
actually start. Basically, pgwin32_ReserveSharedMemoryRegion is used to
reserve shared memory for each backend, so calling VirtualAlloc there should
follow spec for huge pages. If you have some reason for not using, then
it is not clear from your reply, can you try to explain in detail.

OK. The processing takes place in the following order:

1. postmaster calls CreateProcess() to start a child postgres in a suspended state.
2. postmaster calls VirtualAlloc(MEM_RESERVE) in pgwin32_ReserveSharedMemoryRegion() to reserve the virtual address space in the child to secure space for the shared memory. This call just affects the virtual address space and does not allocate physical memory. So the large page is still irrelevant.

Okay, today again reading VirtualAlloc specs, I could see that
MEM_LARGE_PAGES is not not required to reserve the memory region. It
is only required during allocation.

I succeeded by following the same procedure using secpol.msc on Win10, running 64-bit PostgreSQL. I started PostgreSQL as a Windows service because it's the normal way, with the service account being a regular Windows user account(i.e. not LocalSystem).

But... I failed to start PostgreSQL by running "pg_ctl start" from a command prompt, receiving the same error message as you. On the other hand, I could start PostgreSQL on a command prompt with administrative privileges (right-click on "Command prompt" from the Start menu and select "Run as an administrator" in the menu.

Hmm. It doesn't work even on a command prompt with administrative
privileges. It gives below error:

waiting for server to start....2017-01-17 11:20:13.780 IST [4788] FATAL: could
not create shared memory segment: error code 1450
2017-01-17 11:20:13.780 IST [4788] DETAIL: Failed system call was CreateFileMap
ping(size=148897792, name=Global/PostgreSQL:E:/WorkSpace/PostgreSQL/master/Data)
.
2017-01-17 11:20:13.780 IST [4788] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

Now, error code 1450 can occur due to insufficient system resources,
so I have tried by increasing the size of shared memory (higher value
of shared_buffers) without your patch and it works. This indicates
some problem with the patch.

It seems that Windows removes many privileges, including "Lock Pages in Memory", when starting the normal command prompt. As its evidence, you can use the attached priv.c to see what privileges are assigned and and enabled/disabled. Build it like "cl priv.c" and just run priv.exe on each command prompt. Those runs show different privileges.

This is bad.

Should I need to do something, e.g. explain in the document that the user should use the command prompt with administrative privileges when he uses huge_pages?

I think it is better to document in some way if we decide to go-ahead
with the patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Amit Kapila (#17)
Re: Supporting huge pages on Windows

From: Amit Kapila [mailto:amit.kapila16@gmail.com]

Hmm. It doesn't work even on a command prompt with administrative
privileges. It gives below error:

waiting for server to start....2017-01-17 11:20:13.780 IST [4788] FATAL:
could not create shared memory segment: error code 1450
2017-01-17 11:20:13.780 IST [4788] DETAIL: Failed system call was
CreateFileMap ping(size=148897792,
name=Global/PostgreSQL:E:/WorkSpace/PostgreSQL/master/Data)
.
2017-01-17 11:20:13.780 IST [4788] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

Now, error code 1450 can occur due to insufficient system resources, so
I have tried by increasing the size of shared memory (higher value of
shared_buffers) without your patch and it works. This indicates some
problem with the patch.

Hmm, the large-page requires contiguous memory for each page, so this error could occur on a long-running system where the memory is heavily fragmented. For example, please see the following page and check the memory with RAMMap program referred there.

http://blog.dbi-services.com/large-pages-and-memory_target-on-windows/

BTW, is your OS or PostgreSQL 32-bit?

It seems that Windows removes many privileges, including "Lock Pages

in Memory", when starting the normal command prompt. As its evidence, you
can use the attached priv.c to see what privileges are assigned and and
enabled/disabled. Build it like "cl priv.c" and just run priv.exe on each
command prompt. Those runs show different privileges.

This is bad.

Should I need to do something, e.g. explain in the document that the user

should use the command prompt with administrative privileges when he uses
huge_pages?

I think it is better to document in some way if we decide to go-ahead with
the patch.

Sure, I added these sentences.

+        To start the database server on the command prompt as a standalone process,
+        not as a Windows service, run the command prompt as an administrator or
+        disable the User Access Control (UAC). When the UAC is enabled, the normal
+        command prompt revokes the user right Lock Pages in Memory.

Regards
Takayuki Tsunakawa

Attachments:

win_large_pages_v7.patchapplication/octet-stream; name=win_large_pages_v7.patchDownload+108-16
#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Tsunakawa, Takayuki (#18)
Re: Supporting huge pages on Windows

On Mon, Jan 30, 2017 at 7:16 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

From: Amit Kapila [mailto:amit.kapila16@gmail.com]

Hmm. It doesn't work even on a command prompt with administrative
privileges. It gives below error:

waiting for server to start....2017-01-17 11:20:13.780 IST [4788] FATAL:
could not create shared memory segment: error code 1450
2017-01-17 11:20:13.780 IST [4788] DETAIL: Failed system call was
CreateFileMap ping(size=148897792,
name=Global/PostgreSQL:E:/WorkSpace/PostgreSQL/master/Data)
.
2017-01-17 11:20:13.780 IST [4788] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.

Now, error code 1450 can occur due to insufficient system resources, so
I have tried by increasing the size of shared memory (higher value of
shared_buffers) without your patch and it works. This indicates some
problem with the patch.

Hmm, the large-page requires contiguous memory for each page, so this error could occur on a long-running system where the memory is heavily fragmented. For example, please see the following page and check the memory with RAMMap program referred there.

I don't have RAMMap and it might take some time to investigate what is
going on, but I think in such a case even if it works we should keep
the default value of huge_pages as off on Windows. I request somebody
else having access to Windows m/c to test this patch and if it works
then we can move forward.

http://blog.dbi-services.com/large-pages-and-memory_target-on-windows/

BTW, is your OS or PostgreSQL 32-bit?

both 64-bit.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Magnus Hagander (#10)
Re: Supporting huge pages on Windows

Magnus Hagander wrote:

Taking a look at the actual code here, and a few small nitpicks:

+ errdetail("Failed system call was %s,
error code %lu", "LookupPrivilegeValue", GetLastError())));

this seems to be a new pattern of code -- for other similar cases it just
writes LookupPrivilegeValue inside the patch itself. I'm guessing the idea
was for translatability, but I think it's better we stick to the existing
pattern.

There are two reasons for doing things this way. One is that you reduce
the chances of a translator making a mistake with the function name (say
just a typo, or in egregious cases they may even translate the function
name). The other is that if you have many of these messages, you only
translate the generic part once instead of having the same message
a handful of times, exactly identical but for the function name.

So please do apply that kind of pattern wherever possible. We already
have the proposed error message, twice. No need for two more
occurrences of it.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Michael Paquier
michael@paquier.xyz
In reply to: Tsunakawa, Takayuki (#18)
#22Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Amit Kapila (#19)
#23Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Tsunakawa, Takayuki (#22)
#24Thomas Munro
thomas.munro@gmail.com
In reply to: Tsunakawa, Takayuki (#22)
#25Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Thomas Munro (#24)
#26David Steele
david@pgmasters.net
In reply to: Tsunakawa, Takayuki (#25)
#27Amit Kapila
amit.kapila16@gmail.com
In reply to: David Steele (#26)
#28David Steele
david@pgmasters.net
In reply to: Amit Kapila (#27)
#29Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: David Steele (#28)
#30David Steele
david@pgmasters.net
In reply to: Ashutosh Sharma (#29)
#31Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: David Steele (#30)
#32Amit Kapila
amit.kapila16@gmail.com
In reply to: Tsunakawa, Takayuki (#25)
#33Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Amit Kapila (#32)
#34Amit Kapila
amit.kapila16@gmail.com
In reply to: Tsunakawa, Takayuki (#33)
#35Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Amit Kapila (#34)
#36Michael Paquier
michael@paquier.xyz
In reply to: Tsunakawa, Takayuki (#35)
#37Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Michael Paquier (#36)
#38Andres Freund
andres@anarazel.de
In reply to: Tsunakawa, Takayuki (#37)
#39Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Andres Freund (#38)
#40Andres Freund
andres@anarazel.de
In reply to: Tsunakawa, Takayuki (#12)
#41Craig Ringer
craig@2ndquadrant.com
In reply to: Andres Freund (#40)
#42Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Craig Ringer (#41)
#43Craig Ringer
craig@2ndquadrant.com
In reply to: Tsunakawa, Takayuki (#42)
#44Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Craig Ringer (#43)
#45Andres Freund
andres@anarazel.de
In reply to: Tsunakawa, Takayuki (#44)
#46Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Andres Freund (#45)
#47Magnus Hagander
magnus@hagander.net
In reply to: Tsunakawa, Takayuki (#46)
#48Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#47)
#49Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Magnus Hagander (#47)
#50Thomas Munro
thomas.munro@gmail.com
In reply to: Tsunakawa, Takayuki (#49)
#51Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#50)
#52Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Thomas Munro (#51)
#53Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Tsunakawa, Takayuki (#52)
#54Magnus Hagander
magnus@hagander.net
In reply to: Tsunakawa, Takayuki (#52)
#55Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Ashutosh Sharma (#53)
#56Thomas Munro
thomas.munro@gmail.com
In reply to: Magnus Hagander (#54)
#57Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Thomas Munro (#56)
#58Michael Paquier
michael@paquier.xyz
In reply to: Tsunakawa, Takayuki (#57)
#59Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#58)
#60Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#59)
#61Michael Paquier
michael@paquier.xyz
In reply to: Magnus Hagander (#60)
#62Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#61)
#63Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#62)
#64Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#63)
#65Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#60)
#66Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#65)
#67Thomas Munro
thomas.munro@gmail.com
In reply to: Magnus Hagander (#62)
#68Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#66)
#69Magnus Hagander
magnus@hagander.net
In reply to: Thomas Munro (#67)
#70Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#68)