Large writable variables

Started by Andres Freundabout 7 years ago51 messages
#1Andres Freund
andres@anarazel.de

Hi,

while briefly thinking about the per-process overhead of postgres, I
looked at the section sizes of a modern postgres. In particular which
memory areas are *not* shared between processes.

If you look at the section sizes that are mapped read-write:

$ size --format=sysv src/backend/postgres
src/backend/postgres :
section size addr
.rodata 1593617 5730304 (read-only, for reference)
.data.rel.ro 134944 8039904 (read only after start)
.data 53888 8178912 (read-write, initialized)
.bss 510416 8232800 (read-write, uninitialized)
Total 52417197

So we have 500kb of not-initialized memory mapped into every
process. That's, uh, not nothing.

top unitialized allocations:
$ nm -t d --size-sort -r -S src/backend/postgres|grep '\b[bB]\b'|head
0000000008251872 0000000000131144 b LagTracker
0000000008585088 0000000000131104 b hist_entries
0000000008435040 0000000000085280 b DCHCache
0000000008391168 0000000000043840 b NUMCache
0000000008560224 0000000000023440 b tzdefrules_s
0000000008536704 0000000000023440 b gmtmem.7009
0000000008716192 0000000000016384 b hist_start
0000000008238336 0000000000008192 b PqRecvBuffer
0000000008734208 0000000000005136 B BackendWritebackContext
0000000008386368 0000000000003200 b held_lwlocks

So we have a two variables sized 130kb. Yikes.

struct {
XLogRecPtr last_lsn; /* 0 8 */
WalTimeSample buffer[8192]; /* 8 131072 */
/* --- cacheline 2048 boundary (131072 bytes) was 8 bytes ago --- */
int write_head; /* 131080 4 */
int read_heads[3]; /* 131084 12 */
WalTimeSample last_read[3]; /* 131096 48 */
/* --- cacheline 2049 boundary (131136 bytes) was 8 bytes ago --- */

/* size: 131144, cachelines: 2050, members: 5 */
/* last cacheline: 8 bytes */
};

that's not actually used very often, nor in all processes... Thomas?

hist_entries is pg_lzcompress; DCHCache,NUMCache are formatting.c.

top itialized allocations:
$ nm -t d --size-sort -r -S src/backend/postgres|grep '\b[dD]\b'|head
0000000008086944 0000000000087904 D fmgr_builtins
0000000008201120 0000000000017280 d ConfigureNamesInt
0000000008218400 0000000000013680 d ConfigureNamesBool
0000000008189248 0000000000008512 d ConfigureNamesString
0000000008077344 0000000000007040 D ScanKeywords
0000000008184928 0000000000004320 d ConfigureNamesEnum
0000000008197760 0000000000003360 d ConfigureNamesReal
0000000008062976 0000000000002304 d DCH_keywords
0000000008069952 0000000000002016 D pg_wchar_table
0000000008075552 0000000000001776 d encoding_match_list

fmgr_builtins isn't readonly even though declared a const - I assume
because it's full of addresses that will be mapped differently from
execution to execution.

ConfigureNames* isn't marked as const, because we update them:
/* Rather than requiring vartype to be filled in by hand, do this: */
conf->gen.vartype = PGC_BOOL;

I'm unclear as to why ScanKeywords, DCH_keywords aren't in a readonly
section.

Greetings,

Andres Freund

#2Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
Re: Large writable variables

Hi,

On 2018-10-15 13:07:54 -0700, Andres Freund wrote:

top itialized allocations:
$ nm -t d --size-sort -r -S src/backend/postgres|grep '\b[dD]\b'|head
0000000008086944 0000000000087904 D fmgr_builtins
0000000008201120 0000000000017280 d ConfigureNamesInt
0000000008218400 0000000000013680 d ConfigureNamesBool
0000000008189248 0000000000008512 d ConfigureNamesString
0000000008077344 0000000000007040 D ScanKeywords
0000000008184928 0000000000004320 d ConfigureNamesEnum
0000000008197760 0000000000003360 d ConfigureNamesReal
0000000008062976 0000000000002304 d DCH_keywords
0000000008069952 0000000000002016 D pg_wchar_table
0000000008075552 0000000000001776 d encoding_match_list

fmgr_builtins isn't readonly even though declared a const - I assume
because it's full of addresses that will be mapped differently from
execution to execution.

ConfigureNames* isn't marked as const, because we update them:
/* Rather than requiring vartype to be filled in by hand, do this: */
conf->gen.vartype = PGC_BOOL;

I'm unclear as to why ScanKeywords, DCH_keywords aren't in a readonly
section.

It's because they contain pointers to strings, which are affected by
relocations (and position independent executables force everything to be
relocatable). They do go into .data.rel.ro however:

$ objdump -t ~/build/postgres/dev-optimize/vpath/src/backend/postgres|grep -E '\b(ScanKeywords|fmgr_builtins|DCH_keywords|pg_wchar_table)\b'
00000000007b0800 l O .data.rel.ro 0000000000000900 DCH_keywords
00000000007b4020 g O .data.rel.ro 0000000000001b80 ScanKeywords
00000000007b65a0 g O .data.rel.ro 0000000000015760 fmgr_builtins
00000000007b2340 g O .data.rel.ro 00000000000007e0 pg_wchar_table

as long as we're using forking rather than EXEC_BACKEND, that's
perfectly fine. I don't quite know how windows handles any of this, so
I can't say whether it's a problem there.

Greetings,

Andres Freund

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#1)
Re: Large writable variables

Andres Freund <andres@anarazel.de> writes:

So we have 500kb of not-initialized memory mapped into every
process. That's, uh, not nothing.

Bleah.

0000000008585088 0000000000131104 b hist_entries
0000000008716192 0000000000016384 b hist_start

I'm unsure what fraction of processes would have use for these.

0000000008435040 0000000000085280 b DCHCache
0000000008391168 0000000000043840 b NUMCache

We could surely allocate these on first use.

0000000008560224 0000000000023440 b tzdefrules_s
0000000008536704 0000000000023440 b gmtmem.7009

I think that tzdefrules_s is not used in common cases (though I could be
wrong about that), so we could win by alloc-on-first-use. The same might
be true for gmtmem, but there's a sticking point: there is no provision
for failure there, so I'm unsure how we avoid crashing on OOM.

0000000008238336 0000000000008192 b PqRecvBuffer
0000000008734208 0000000000005136 B BackendWritebackContext
0000000008386368 0000000000003200 b held_lwlocks

These are below my personal threshold of pain.

fmgr_builtins isn't readonly even though declared a const - I assume
because it's full of addresses that will be mapped differently from
execution to execution.

Check.

I'm unclear as to why ScanKeywords, DCH_keywords aren't in a readonly
section.

I think it's the same problem: pointers can't be truly const because
they have to be changed if one relocates the executable.

We could possibly fix these by changing the data structure so that
what's in a ScanKeywords entry is an offset into some giant string
constant somewhere. No idea how that would affect performance, but
I do notice that we could reduce the sizeof(ScanKeyword), which can't
hurt.

regards, tom lane

#4Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: Large writable variables

Hi,

On 2018-10-15 16:36:26 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

So we have 500kb of not-initialized memory mapped into every
process. That's, uh, not nothing.

Bleah.

Yea...

0000000008585088 0000000000131104 b hist_entries
0000000008716192 0000000000016384 b hist_start

I'm unsure what fraction of processes would have use for these.

Yea, I'm not sure either. Although I suspect that given the cost of
compression having an "allocate on first use" check should be quite
doable.

0000000008435040 0000000000085280 b DCHCache
0000000008391168 0000000000043840 b NUMCache

We could surely allocate these on first use.

yep.

0000000008560224 0000000000023440 b tzdefrules_s
0000000008536704 0000000000023440 b gmtmem.7009

I think that tzdefrules_s is not used in common cases (though I could be
wrong about that), so we could win by alloc-on-first-use. The same might
be true for gmtmem, but there's a sticking point: there is no provision
for failure there, so I'm unsure how we avoid crashing on OOM.

I guess we could return false / NULL to the caller. Not perfect, but
there's not that many callers. We could wrap them in wrappers that throw
errors...

0000000008238336 0000000000008192 b PqRecvBuffer
0000000008734208 0000000000005136 B BackendWritebackContext
0000000008386368 0000000000003200 b held_lwlocks

These are below my personal threshold of pain.

Yea, agreed. PqRecvBuffer and held_lwlocks are common enough that it
makes sense to pre-allocate them anyway. I guess you could argue that
BackendWritebackContext should be dynamically allocated.

I'm unclear as to why ScanKeywords, DCH_keywords aren't in a readonly
section.

I think it's the same problem: pointers can't be truly const because
they have to be changed if one relocates the executable.

Right. It's, as I noticed when looking via objdupm, in .data.rel.ro, so
I think it's not that bad.

We could possibly fix these by changing the data structure so that
what's in a ScanKeywords entry is an offset into some giant string
constant somewhere. No idea how that would affect performance, but
I do notice that we could reduce the sizeof(ScanKeyword), which can't
hurt.

Yea, that might even help performancewise. Alternatively we could change
ScanKeyword to store the keyword name inline, but that'd be a measurable
size increase...

Greetings,

Andres Freund

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: Large writable variables

Andres Freund <andres@anarazel.de> writes:

On 2018-10-15 16:36:26 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

0000000008585088 0000000000131104 b hist_entries
0000000008716192 0000000000016384 b hist_start

I'm unsure what fraction of processes would have use for these.

Yea, I'm not sure either. Although I suspect that given the cost of
compression having an "allocate on first use" check should be quite
doable.

I don't think the extra check would be a problem, but if we end up
needing the space in most processes, we're not really buying anything.
It'd need some investigation.

We could possibly fix these by changing the data structure so that
what's in a ScanKeywords entry is an offset into some giant string
constant somewhere. No idea how that would affect performance, but
I do notice that we could reduce the sizeof(ScanKeyword), which can't
hurt.

Yea, that might even help performancewise. Alternatively we could change
ScanKeyword to store the keyword name inline, but that'd be a measurable
size increase...

Yeah. It also seems like doing it this way would improve locality of
access: the pieces of the giant string would presumably be in the same
order as the ScanKeywords entries, whereas with the current setup,
who knows where the compiler has put 'em or in what order.

We'd need some tooling to generate the constants that way, though;
I can't see how to make it directly from kwlist.h.

regards, tom lane

#6Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#5)
Re: Large writable variables

Hi,

On 2018-10-15 16:54:53 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-10-15 16:36:26 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

0000000008585088 0000000000131104 b hist_entries
0000000008716192 0000000000016384 b hist_start

I'm unsure what fraction of processes would have use for these.

Yea, I'm not sure either. Although I suspect that given the cost of
compression having an "allocate on first use" check should be quite
doable.

I don't think the extra check would be a problem, but if we end up
needing the space in most processes, we're not really buying anything.
It'd need some investigation.

I don't think it's particularly uncommon to have processes that don't
use any toasted datums. I'm not sure however how to put numbers on
that? After all, it'll be drastically workload dependant.

We could possibly fix these by changing the data structure so that
what's in a ScanKeywords entry is an offset into some giant string
constant somewhere. No idea how that would affect performance, but
I do notice that we could reduce the sizeof(ScanKeyword), which can't
hurt.

Yea, that might even help performancewise. Alternatively we could change
ScanKeyword to store the keyword name inline, but that'd be a measurable
size increase...

Yeah. It also seems like doing it this way would improve locality of
access: the pieces of the giant string would presumably be in the same
order as the ScanKeywords entries, whereas with the current setup,
who knows where the compiler has put 'em or in what order.

I assume you're talking about the offset approach. Performancewise I
assume that my suggestion of inlining the names into the struct would be
faster. Are there many realistic cases where performance matters enough
to warrant the size increase?

We'd need some tooling to generate the constants that way, though;
I can't see how to make it directly from kwlist.h.

I guess we could create a struct with all the strings as members. But
that seems fairly nasty.

Greetings,

Andres Freund

#7Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Andres Freund (#1)
1 attachment(s)
Re: Large writable variables

On Tue, Oct 16, 2018 at 9:07 AM Andres Freund <andres@anarazel.de> wrote:

$ nm -t d --size-sort -r -S src/backend/postgres|grep '\b[bB]\b'|head
0000000008251872 0000000000131144 b LagTracker

...

So we have a two variables sized 130kb. Yikes.

...

that's not actually used very often, nor in all processes... Thomas?

Yeah, here's a patch to move it in to the heap.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Move-the-replication-lag-tracker-into-heap-memory.patchapplication/octet-stream; name=0001-Move-the-replication-lag-tracker-into-heap-memory.patchDownload
From f3fb64e67c1e86e11dfafc8a712e9750f650f60b Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Tue, 16 Oct 2018 10:08:57 +1300
Subject: [PATCH] Move the replication lag tracker into heap memory.

Andres Freund complained about the 128KB of .bss occupied by LagTracker.
It's only needed in the walsender process, so allocate it in heap
memory there.

Author: Thomas Munro
Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya%40alap3.anarazel.de
---
 src/backend/replication/walsender.c | 56 +++++++++++++++--------------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 370429d746c..2683385ca6e 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -211,7 +211,7 @@ typedef struct
 #define LAG_TRACKER_BUFFER_SIZE 8192
 
 /* A mechanism for tracking replication lag. */
-static struct
+typedef struct
 {
 	XLogRecPtr	last_lsn;
 	WalTimeSample buffer[LAG_TRACKER_BUFFER_SIZE];
@@ -220,6 +220,8 @@ static struct
 	WalTimeSample last_read[NUM_SYNC_REP_WAIT_MODE];
 }			LagTracker;
 
+static LagTracker *lag_tracker;
+
 /* Signal handlers */
 static void WalSndLastCycleHandler(SIGNAL_ARGS);
 
@@ -282,7 +284,7 @@ InitWalSender(void)
 	SendPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE);
 
 	/* Initialize empty timestamp buffer for lag tracking. */
-	memset(&LagTracker, 0, sizeof(LagTracker));
+	lag_tracker = MemoryContextAllocZero(TopMemoryContext, sizeof(LagTracker));
 }
 
 /*
@@ -3439,9 +3441,9 @@ LagTrackerWrite(XLogRecPtr lsn, TimestampTz local_flush_time)
 	 * If the lsn hasn't advanced since last time, then do nothing.  This way
 	 * we only record a new sample when new WAL has been written.
 	 */
-	if (LagTracker.last_lsn == lsn)
+	if (lag_tracker->last_lsn == lsn)
 		return;
-	LagTracker.last_lsn = lsn;
+	lag_tracker->last_lsn = lsn;
 
 	/*
 	 * If advancing the write head of the circular buffer would crash into any
@@ -3449,11 +3451,11 @@ LagTrackerWrite(XLogRecPtr lsn, TimestampTz local_flush_time)
 	 * slowest reader (presumably apply) is the one that controls the release
 	 * of space.
 	 */
-	new_write_head = (LagTracker.write_head + 1) % LAG_TRACKER_BUFFER_SIZE;
+	new_write_head = (lag_tracker->write_head + 1) % LAG_TRACKER_BUFFER_SIZE;
 	buffer_full = false;
 	for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; ++i)
 	{
-		if (new_write_head == LagTracker.read_heads[i])
+		if (new_write_head == lag_tracker->read_heads[i])
 			buffer_full = true;
 	}
 
@@ -3464,17 +3466,17 @@ LagTrackerWrite(XLogRecPtr lsn, TimestampTz local_flush_time)
 	 */
 	if (buffer_full)
 	{
-		new_write_head = LagTracker.write_head;
-		if (LagTracker.write_head > 0)
-			LagTracker.write_head--;
+		new_write_head = lag_tracker->write_head;
+		if (lag_tracker->write_head > 0)
+			lag_tracker->write_head--;
 		else
-			LagTracker.write_head = LAG_TRACKER_BUFFER_SIZE - 1;
+			lag_tracker->write_head = LAG_TRACKER_BUFFER_SIZE - 1;
 	}
 
 	/* Store a sample at the current write head position. */
-	LagTracker.buffer[LagTracker.write_head].lsn = lsn;
-	LagTracker.buffer[LagTracker.write_head].time = local_flush_time;
-	LagTracker.write_head = new_write_head;
+	lag_tracker->buffer[lag_tracker->write_head].lsn = lsn;
+	lag_tracker->buffer[lag_tracker->write_head].time = local_flush_time;
+	lag_tracker->write_head = new_write_head;
 }
 
 /*
@@ -3496,14 +3498,14 @@ LagTrackerRead(int head, XLogRecPtr lsn, TimestampTz now)
 	TimestampTz time = 0;
 
 	/* Read all unread samples up to this LSN or end of buffer. */
-	while (LagTracker.read_heads[head] != LagTracker.write_head &&
-		   LagTracker.buffer[LagTracker.read_heads[head]].lsn <= lsn)
+	while (lag_tracker->read_heads[head] != lag_tracker->write_head &&
+		   lag_tracker->buffer[lag_tracker->read_heads[head]].lsn <= lsn)
 	{
-		time = LagTracker.buffer[LagTracker.read_heads[head]].time;
-		LagTracker.last_read[head] =
-			LagTracker.buffer[LagTracker.read_heads[head]];
-		LagTracker.read_heads[head] =
-			(LagTracker.read_heads[head] + 1) % LAG_TRACKER_BUFFER_SIZE;
+		time = lag_tracker->buffer[lag_tracker->read_heads[head]].time;
+		lag_tracker->last_read[head] =
+			lag_tracker->buffer[lag_tracker->read_heads[head]];
+		lag_tracker->read_heads[head] =
+			(lag_tracker->read_heads[head] + 1) % LAG_TRACKER_BUFFER_SIZE;
 	}
 
 	/*
@@ -3513,8 +3515,8 @@ LagTrackerRead(int head, XLogRecPtr lsn, TimestampTz now)
 	 * interpolation at the beginning of the next burst of WAL after a period
 	 * of idleness.
 	 */
-	if (LagTracker.read_heads[head] == LagTracker.write_head)
-		LagTracker.last_read[head].time = 0;
+	if (lag_tracker->read_heads[head] == lag_tracker->write_head)
+		lag_tracker->last_read[head].time = 0;
 
 	if (time > now)
 	{
@@ -3532,17 +3534,17 @@ LagTrackerRead(int head, XLogRecPtr lsn, TimestampTz now)
 		 * eventually start moving again and cross one of our samples before
 		 * we can show the lag increasing.
 		 */
-		if (LagTracker.read_heads[head] == LagTracker.write_head)
+		if (lag_tracker->read_heads[head] == lag_tracker->write_head)
 		{
 			/* There are no future samples, so we can't interpolate. */
 			return -1;
 		}
-		else if (LagTracker.last_read[head].time != 0)
+		else if (lag_tracker->last_read[head].time != 0)
 		{
 			/* We can interpolate between last_read and the next sample. */
 			double		fraction;
-			WalTimeSample prev = LagTracker.last_read[head];
-			WalTimeSample next = LagTracker.buffer[LagTracker.read_heads[head]];
+			WalTimeSample prev = lag_tracker->last_read[head];
+			WalTimeSample next = lag_tracker->buffer[lag_tracker->read_heads[head]];
 
 			if (lsn < prev.lsn)
 			{
@@ -3579,7 +3581,7 @@ LagTrackerRead(int head, XLogRecPtr lsn, TimestampTz now)
 			 * standby reaches the future sample the best we can do is report
 			 * the hypothetical lag if that sample were to be replayed now.
 			 */
-			time = LagTracker.buffer[LagTracker.read_heads[head]].time;
+			time = lag_tracker->buffer[lag_tracker->read_heads[head]].time;
 		}
 	}
 
-- 
2.17.1 (Apple Git-112)

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: Large writable variables

Andres Freund <andres@anarazel.de> writes:

On 2018-10-15 16:54:53 -0400, Tom Lane wrote:

Yeah. It also seems like doing it this way would improve locality of
access: the pieces of the giant string would presumably be in the same
order as the ScanKeywords entries, whereas with the current setup,
who knows where the compiler has put 'em or in what order.

I assume you're talking about the offset approach. Performancewise I
assume that my suggestion of inlining the names into the struct would be
faster. Are there many realistic cases where performance matters enough
to warrant the size increase?

Doubt it, because there'd be an awful lot of wasted space due to the need
to set the struct size large enough for the longest keyword. (Plus it
would likely not come out to be a power-of-2 size, slowing array
indexing.) If you want this to be cache-friendly, I'd think the smaller
the better.

regards, tom lane

#9Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#7)
Re: Large writable variables

On 2018-10-16 10:16:34 +1300, Thomas Munro wrote:

On Tue, Oct 16, 2018 at 9:07 AM Andres Freund <andres@anarazel.de> wrote:

$ nm -t d --size-sort -r -S src/backend/postgres|grep '\b[bB]\b'|head
0000000008251872 0000000000131144 b LagTracker

...

So we have a two variables sized 130kb. Yikes.

...

that's not actually used very often, nor in all processes... Thomas?

Yeah, here's a patch to move it in to the heap.

--
Thomas Munro
http://www.enterprisedb.com

From f3fb64e67c1e86e11dfafc8a712e9750f650f60b Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Tue, 16 Oct 2018 10:08:57 +1300
Subject: [PATCH] Move the replication lag tracker into heap memory.

Andres Freund complained about the 128KB of .bss occupied by LagTracker.
It's only needed in the walsender process, so allocate it in heap
memory there.

Cool, let's do that.

Greetings,

Andres Freund

#10Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Andres Freund (#9)
Re: Large writable variables

On Tue, Oct 16, 2018 at 10:57 AM Andres Freund <andres@anarazel.de> wrote:

Subject: [PATCH] Move the replication lag tracker into heap memory.

Andres Freund complained about the 128KB of .bss occupied by LagTracker.
It's only needed in the walsender process, so allocate it in heap
memory there.

Cool, let's do that.

Pushed.

--
Thomas Munro
http://www.enterprisedb.com

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
1 attachment(s)
Re: Large writable variables

I wrote:

Andres Freund <andres@anarazel.de> writes:

0000000008560224 0000000000023440 b tzdefrules_s
0000000008536704 0000000000023440 b gmtmem.7009

I think that tzdefrules_s is not used in common cases (though I could be
wrong about that), so we could win by alloc-on-first-use. The same might
be true for gmtmem, but there's a sticking point: there is no provision
for failure there, so I'm unsure how we avoid crashing on OOM.

So my intuition was exactly backwards on this.

1. tzdefrules_s is filled in the postmaster, and if it's not filled there
it'd be filled by every child process, so there's zero point in converting
it to malloc-on-the-fly style. This is because tzparse() always wants
it filled. That's actually a tad annoying, because it looks to me like
with many timezone settings the data will not get used, but I'm hesitant
to mess with the IANA code's logic enough to improve that. Maybe I'll try
submitting an upstream patch and see what they think of it first.

2. gmtmem, on the other hand, is only used if somebody calls pg_gmtime,
which already has a perfectly good error-report convention, cf gmtime(3).
We have exactly one caller, which was not bothering to test for error :-(
and would dump core on failure. And that caller isn't reached in most
processes, at least not in the regression tests. So this side of it is
easy to improve.

Hence I propose the attached patch for point 2, which I'd want to
backpatch to keep our copies of the IANA code in sync. Point 1
maybe can be addressed in future.

regards, tom lane

Attachments:

allocate-gmtmem-only-when-needed.patchtext/x-diff; charset=us-ascii; name=allocate-gmtmem-only-when-needed.patchDownload
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index ba15a29..449164a 100644
*** a/src/backend/utils/adt/timestamp.c
--- b/src/backend/utils/adt/timestamp.c
*************** GetEpochTime(struct pg_tm *tm)
*** 2008,2013 ****
--- 2008,2016 ----
  
  	t0 = pg_gmtime(&epoch);
  
+ 	if (t0 == NULL)
+ 		elog(ERROR, "could not convert epoch to timestamp: %m");
+ 
  	tm->tm_year = t0->tm_year;
  	tm->tm_mon = t0->tm_mon;
  	tm->tm_mday = t0->tm_mday;
diff --git a/src/timezone/localtime.c b/src/timezone/localtime.c
index 31b06b0..04a1013 100644
*** a/src/timezone/localtime.c
--- b/src/timezone/localtime.c
*************** gmtsub(pg_time_t const *timep, int32 off
*** 1328,1340 ****
  	struct pg_tm *result;
  
  	/* GMT timezone state data is kept here */
! 	static struct state gmtmem;
! 	static bool gmt_is_set = false;
! #define gmtptr		(&gmtmem)
  
! 	if (!gmt_is_set)
  	{
! 		gmt_is_set = true;
  		gmtload(gmtptr);
  	}
  	result = timesub(timep, offset, gmtptr, tmp);
--- 1328,1341 ----
  	struct pg_tm *result;
  
  	/* GMT timezone state data is kept here */
! 	static struct state *gmtptr = NULL;
  
! 	if (gmtptr == NULL)
  	{
! 		/* Allocate on first use. */
! 		gmtptr = (struct state *) malloc(sizeof(struct state));
! 		if (gmtptr == NULL)
! 			return NULL;		/* errno should be set by malloc */
  		gmtload(gmtptr);
  	}
  	result = timesub(timep, offset, gmtptr, tmp);
#12Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#11)
Re: Large writable variables

Hi,

I've a patch making .data variables constant, fwiw. Will post in a
bit. Just so we don't duplicate work too much.

On 2018-10-15 19:41:49 -0400, Tom Lane wrote:

I wrote:

Andres Freund <andres@anarazel.de> writes:

0000000008560224 0000000000023440 b tzdefrules_s
0000000008536704 0000000000023440 b gmtmem.7009

I think that tzdefrules_s is not used in common cases (though I could be
wrong about that), so we could win by alloc-on-first-use. The same might
be true for gmtmem, but there's a sticking point: there is no provision
for failure there, so I'm unsure how we avoid crashing on OOM.

So my intuition was exactly backwards on this.

1. tzdefrules_s is filled in the postmaster, and if it's not filled there
it'd be filled by every child process, so there's zero point in converting
it to malloc-on-the-fly style. This is because tzparse() always wants
it filled. That's actually a tad annoying, because it looks to me like
with many timezone settings the data will not get used, but I'm hesitant
to mess with the IANA code's logic enough to improve that. Maybe I'll try
submitting an upstream patch and see what they think of it first.

Ok, that makes sense.

2. gmtmem, on the other hand, is only used if somebody calls pg_gmtime,
which already has a perfectly good error-report convention, cf gmtime(3).
We have exactly one caller, which was not bothering to test for error :-(
and would dump core on failure. And that caller isn't reached in most
processes, at least not in the regression tests. So this side of it is
easy to improve.

Hence I propose the attached patch for point 2, which I'd want to
backpatch to keep our copies of the IANA code in sync.

That makes sense, too.

Greetings,

Andres Freund

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#1)
1 attachment(s)
Re: Large writable variables

Andres Freund <andres@anarazel.de> writes:

top unitialized allocations:
0000000008435040 0000000000085280 b DCHCache
0000000008391168 0000000000043840 b NUMCache

Here's a patch to improve that situation.

regards, tom lane

Attachments:

alloc-formatting-cache-only-as-needed.patchtext/x-diff; charset=us-ascii; name=alloc-formatting-cache-only-as-needed.patchDownload
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index b3ff133..43a0307 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -93,6 +93,7 @@
 #include "utils/float.h"
 #include "utils/formatting.h"
 #include "utils/int8.h"
+#include "utils/memutils.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
 
@@ -385,12 +386,12 @@ typedef struct
 } NUMCacheEntry;
 
 /* global cache for date/time format pictures */
-static DCHCacheEntry DCHCache[DCH_CACHE_ENTRIES];
+static DCHCacheEntry *DCHCache[DCH_CACHE_ENTRIES];
 static int	n_DCHCache = 0;		/* current number of entries */
 static int	DCHCounter = 0;		/* aging-event counter */
 
 /* global cache for number format pictures */
-static NUMCacheEntry NUMCache[NUM_CACHE_ENTRIES];
+static NUMCacheEntry *NUMCache[NUM_CACHE_ENTRIES];
 static int	n_NUMCache = 0;		/* current number of entries */
 static int	NUMCounter = 0;		/* aging-event counter */
 
@@ -3368,13 +3369,13 @@ DCH_cache_getnew(const char *str)
 {
 	DCHCacheEntry *ent;
 
-	/* counter overflow check - paranoia? */
+	/* handle counter overflow by resetting all ages */
 	if (DCHCounter >= (INT_MAX - DCH_CACHE_ENTRIES))
 	{
 		DCHCounter = 0;
 
-		for (ent = DCHCache; ent < (DCHCache + DCH_CACHE_ENTRIES); ent++)
-			ent->age = (++DCHCounter);
+		for (int i = 0; i < n_DCHCache; i++)
+			DCHCache[i]->age = (++DCHCounter);
 	}
 
 	/*
@@ -3382,15 +3383,16 @@ DCH_cache_getnew(const char *str)
 	 */
 	if (n_DCHCache >= DCH_CACHE_ENTRIES)
 	{
-		DCHCacheEntry *old = DCHCache + 0;
+		DCHCacheEntry *old = DCHCache[0];
 
 #ifdef DEBUG_TO_FROM_CHAR
 		elog(DEBUG_elog_output, "cache is full (%d)", n_DCHCache);
 #endif
 		if (old->valid)
 		{
-			for (ent = DCHCache + 1; ent < (DCHCache + DCH_CACHE_ENTRIES); ent++)
+			for (int i = 1; i < DCH_CACHE_ENTRIES; i++)
 			{
+				ent = DCHCache[i];
 				if (!ent->valid)
 				{
 					old = ent;
@@ -3414,7 +3416,9 @@ DCH_cache_getnew(const char *str)
 #ifdef DEBUG_TO_FROM_CHAR
 		elog(DEBUG_elog_output, "NEW (%d)", n_DCHCache);
 #endif
-		ent = DCHCache + n_DCHCache;
+		Assert(DCHCache[n_DCHCache] == NULL);
+		DCHCache[n_DCHCache] = ent = (DCHCacheEntry *)
+			MemoryContextAllocZero(TopMemoryContext, sizeof(DCHCacheEntry));
 		ent->valid = false;
 		StrNCpy(ent->str, str, DCH_CACHE_SIZE + 1);
 		ent->age = (++DCHCounter);
@@ -3428,20 +3432,19 @@ DCH_cache_getnew(const char *str)
 static DCHCacheEntry *
 DCH_cache_search(const char *str)
 {
-	int			i;
-	DCHCacheEntry *ent;
-
-	/* counter overflow check - paranoia? */
+	/* handle counter overflow by resetting all ages */
 	if (DCHCounter >= (INT_MAX - DCH_CACHE_ENTRIES))
 	{
 		DCHCounter = 0;
 
-		for (ent = DCHCache; ent < (DCHCache + DCH_CACHE_ENTRIES); ent++)
-			ent->age = (++DCHCounter);
+		for (int i = 0; i < n_DCHCache; i++)
+			DCHCache[i]->age = (++DCHCounter);
 	}
 
-	for (i = 0, ent = DCHCache; i < n_DCHCache; i++, ent++)
+	for (int i = 0; i < n_DCHCache; i++)
 	{
+		DCHCacheEntry *ent = DCHCache[i];
+
 		if (ent->valid && strcmp(ent->str, str) == 0)
 		{
 			ent->age = (++DCHCounter);
@@ -4047,13 +4050,13 @@ NUM_cache_getnew(const char *str)
 {
 	NUMCacheEntry *ent;
 
-	/* counter overflow check - paranoia? */
+	/* handle counter overflow by resetting all ages */
 	if (NUMCounter >= (INT_MAX - NUM_CACHE_ENTRIES))
 	{
 		NUMCounter = 0;
 
-		for (ent = NUMCache; ent < (NUMCache + NUM_CACHE_ENTRIES); ent++)
-			ent->age = (++NUMCounter);
+		for (int i = 0; i < n_NUMCache; i++)
+			NUMCache[i]->age = (++NUMCounter);
 	}
 
 	/*
@@ -4061,15 +4064,16 @@ NUM_cache_getnew(const char *str)
 	 */
 	if (n_NUMCache >= NUM_CACHE_ENTRIES)
 	{
-		NUMCacheEntry *old = NUMCache + 0;
+		NUMCacheEntry *old = NUMCache[0];
 
 #ifdef DEBUG_TO_FROM_CHAR
 		elog(DEBUG_elog_output, "Cache is full (%d)", n_NUMCache);
 #endif
 		if (old->valid)
 		{
-			for (ent = NUMCache + 1; ent < (NUMCache + NUM_CACHE_ENTRIES); ent++)
+			for (int i = 1; i < NUM_CACHE_ENTRIES; i++)
 			{
+				ent = NUMCache[i];
 				if (!ent->valid)
 				{
 					old = ent;
@@ -4093,7 +4097,9 @@ NUM_cache_getnew(const char *str)
 #ifdef DEBUG_TO_FROM_CHAR
 		elog(DEBUG_elog_output, "NEW (%d)", n_NUMCache);
 #endif
-		ent = NUMCache + n_NUMCache;
+		Assert(NUMCache[n_NUMCache] == NULL);
+		NUMCache[n_NUMCache] = ent = (NUMCacheEntry *)
+			MemoryContextAllocZero(TopMemoryContext, sizeof(NUMCacheEntry));
 		ent->valid = false;
 		StrNCpy(ent->str, str, NUM_CACHE_SIZE + 1);
 		ent->age = (++NUMCounter);
@@ -4107,20 +4113,19 @@ NUM_cache_getnew(const char *str)
 static NUMCacheEntry *
 NUM_cache_search(const char *str)
 {
-	int			i;
-	NUMCacheEntry *ent;
-
-	/* counter overflow check - paranoia? */
+	/* handle counter overflow by resetting all ages */
 	if (NUMCounter >= (INT_MAX - NUM_CACHE_ENTRIES))
 	{
 		NUMCounter = 0;
 
-		for (ent = NUMCache; ent < (NUMCache + NUM_CACHE_ENTRIES); ent++)
-			ent->age = (++NUMCounter);
+		for (int i = 0; i < n_NUMCache; i++)
+			NUMCache[i]->age = (++NUMCounter);
 	}
 
-	for (i = 0, ent = NUMCache; i < n_NUMCache; i++, ent++)
+	for (int i = 0; i < n_NUMCache; i++)
 	{
+		NUMCacheEntry *ent = NUMCache[i];
+
 		if (ent->valid && strcmp(ent->str, str) == 0)
 		{
 			ent->age = (++NUMCounter);
#14Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#12)
1 attachment(s)
Re: Large writable variables

Hi,

On 2018-10-15 16:45:03 -0700, Andres Freund wrote:

I've a patch making .data variables constant, fwiw. Will post in a
bit. Just so we don't duplicate work too much.

I pushed a few very obvious ones.

An additional fix, for the system attributes in heap.c, is attached. But
that's a bit more invasive, so I wanted to get some input.

I think all of the changes are fairly obvious and the new const
attributes just document what already had to be the case. But to
correctly propagate the const through typedef-the-pointer-away typedefs,
I had to use the underlying type (e.g. use const NameData *n2 instad of
Name n2). To me that's the correct fix, but then I hate these typedefs
with a passion anyway. An alternative fix is to add additional
typedefs like ConstName, but I don't find that particularly
advantageous.

Comments?

Arguably we should use consts more aggresively in signatures anyway, but
I think that's a separate effort.

Greetings,

Andres Freund

Attachments:

0001-Correct-constness-of-system-attributes-in-heap.c-pre.patchtext/x-diff; charset=us-asciiDownload
From 6db74bb80e13415467b1be6c340b0a06713f891b Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 15 Oct 2018 21:05:05 -0700
Subject: [PATCH] Correct constness of system attributes in heap.c &
 prerequisites.

This allows the compiler / linker to mark affected pages as read-only.

There's a fair number of pre-requisite changes, so the const can
properly be propagated. Most of these were already required for
correctness anyway.  Arguably we could be more aggressive in using
consts in related code, but..

Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de
---
 src/backend/catalog/heap.c             | 22 +++++++++++-----------
 src/backend/catalog/index.c            |  2 +-
 src/backend/executor/spi.c             |  4 ++--
 src/backend/optimizer/util/plancat.c   |  2 +-
 src/backend/parser/parse_relation.c    |  8 ++++----
 src/backend/parser/parse_utilcmd.c     |  2 +-
 src/backend/utils/adt/expandedrecord.c | 13 +++++++------
 src/backend/utils/adt/name.c           |  2 +-
 src/include/catalog/heap.h             |  4 ++--
 src/include/parser/parse_relation.h    |  2 +-
 src/include/utils/builtins.h           |  2 +-
 11 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4ad58689fc3..ea69aa7f12a 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -144,7 +144,7 @@ static List *insert_ordered_unique_oid(List *list, Oid datum);
  * fixed-size portion of the structure anyway.
  */
 
-static FormData_pg_attribute a1 = {
+static const FormData_pg_attribute a1 = {
 	.attname = {"ctid"},
 	.atttypid = TIDOID,
 	.attlen = sizeof(ItemPointerData),
@@ -158,7 +158,7 @@ static FormData_pg_attribute a1 = {
 	.attislocal = true,
 };
 
-static FormData_pg_attribute a2 = {
+static const FormData_pg_attribute a2 = {
 	.attname = {"oid"},
 	.atttypid = OIDOID,
 	.attlen = sizeof(Oid),
@@ -172,7 +172,7 @@ static FormData_pg_attribute a2 = {
 	.attislocal = true,
 };
 
-static FormData_pg_attribute a3 = {
+static const FormData_pg_attribute a3 = {
 	.attname = {"xmin"},
 	.atttypid = XIDOID,
 	.attlen = sizeof(TransactionId),
@@ -186,7 +186,7 @@ static FormData_pg_attribute a3 = {
 	.attislocal = true,
 };
 
-static FormData_pg_attribute a4 = {
+static const FormData_pg_attribute a4 = {
 	.attname = {"cmin"},
 	.atttypid = CIDOID,
 	.attlen = sizeof(CommandId),
@@ -200,7 +200,7 @@ static FormData_pg_attribute a4 = {
 	.attislocal = true,
 };
 
-static FormData_pg_attribute a5 = {
+static const FormData_pg_attribute a5 = {
 	.attname = {"xmax"},
 	.atttypid = XIDOID,
 	.attlen = sizeof(TransactionId),
@@ -214,7 +214,7 @@ static FormData_pg_attribute a5 = {
 	.attislocal = true,
 };
 
-static FormData_pg_attribute a6 = {
+static const FormData_pg_attribute a6 = {
 	.attname = {"cmax"},
 	.atttypid = CIDOID,
 	.attlen = sizeof(CommandId),
@@ -234,7 +234,7 @@ static FormData_pg_attribute a6 = {
  * table of a particular class/type. In any case table is still the word
  * used in SQL.
  */
-static FormData_pg_attribute a7 = {
+static const FormData_pg_attribute a7 = {
 	.attname = {"tableoid"},
 	.atttypid = OIDOID,
 	.attlen = sizeof(Oid),
@@ -248,14 +248,14 @@ static FormData_pg_attribute a7 = {
 	.attislocal = true,
 };
 
-static const Form_pg_attribute SysAtt[] = {&a1, &a2, &a3, &a4, &a5, &a6, &a7};
+static const FormData_pg_attribute * SysAtt[] = {&a1, &a2, &a3, &a4, &a5, &a6, &a7};
 
 /*
  * This function returns a Form_pg_attribute pointer for a system attribute.
  * Note that we elog if the presented attno is invalid, which would only
  * happen if there's a problem upstream.
  */
-Form_pg_attribute
+const FormData_pg_attribute *
 SystemAttributeDefinition(AttrNumber attno, bool relhasoids)
 {
 	if (attno >= 0 || attno < -(int) lengthof(SysAtt))
@@ -269,14 +269,14 @@ SystemAttributeDefinition(AttrNumber attno, bool relhasoids)
  * If the given name is a system attribute name, return a Form_pg_attribute
  * pointer for a prototype definition.  If not, return NULL.
  */
-Form_pg_attribute
+const FormData_pg_attribute *
 SystemAttributeByName(const char *attname, bool relhasoids)
 {
 	int			j;
 
 	for (j = 0; j < (int) lengthof(SysAtt); j++)
 	{
-		Form_pg_attribute att = SysAtt[j];
+		const FormData_pg_attribute *att = SysAtt[j];
 
 		if (relhasoids || att->attnum != ObjectIdAttributeNumber)
 		{
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 339bb35f9bc..f1ef4c319a0 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -352,7 +352,7 @@ ConstructTupleDescriptor(Relation heapRelation,
 		if (atnum != 0)
 		{
 			/* Simple index column */
-			Form_pg_attribute from;
+			const FormData_pg_attribute *from;
 
 			if (atnum < 0)
 			{
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fb36e762f28..19212738568 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -899,7 +899,7 @@ int
 SPI_fnumber(TupleDesc tupdesc, const char *fname)
 {
 	int			res;
-	Form_pg_attribute sysatt;
+	const FormData_pg_attribute *sysatt;
 
 	for (res = 0; res < tupdesc->natts; res++)
 	{
@@ -921,7 +921,7 @@ SPI_fnumber(TupleDesc tupdesc, const char *fname)
 char *
 SPI_fname(TupleDesc tupdesc, int fnumber)
 {
-	Form_pg_attribute att;
+	const FormData_pg_attribute *att;
 
 	SPI_result = 0;
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8369e3ad62d..46de00460d9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1692,7 +1692,7 @@ build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
 		if (indexkey != 0)
 		{
 			/* simple column */
-			Form_pg_attribute att_tup;
+			const FormData_pg_attribute *att_tup;
 
 			if (indexkey < 0)
 				att_tup = SystemAttributeDefinition(indexkey,
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index f115ed863af..ac374d1d894 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -3135,7 +3135,7 @@ attnameAttNum(Relation rd, const char *attname, bool sysColOK)
 static int
 specialAttNum(const char *attname)
 {
-	Form_pg_attribute sysatt;
+	const FormData_pg_attribute *sysatt;
 
 	sysatt = SystemAttributeByName(attname,
 								   true /* "oid" will be accepted */ );
@@ -3152,12 +3152,12 @@ specialAttNum(const char *attname)
  *	heap_open()'ed.  Use the cache version get_atttype()
  *	for access to non-opened relations.
  */
-Name
+const NameData*
 attnumAttName(Relation rd, int attid)
 {
 	if (attid <= 0)
 	{
-		Form_pg_attribute sysatt;
+		const FormData_pg_attribute *sysatt;
 
 		sysatt = SystemAttributeDefinition(attid, rd->rd_rel->relhasoids);
 		return &sysatt->attname;
@@ -3179,7 +3179,7 @@ attnumTypeId(Relation rd, int attid)
 {
 	if (attid <= 0)
 	{
-		Form_pg_attribute sysatt;
+		const FormData_pg_attribute *sysatt;
 
 		sysatt = SystemAttributeDefinition(attid, rd->rd_rel->relhasoids);
 		return sysatt->atttypid;
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 42bf9bfec62..d8387d43569 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -2065,7 +2065,7 @@ transformIndexConstraint(Constraint *constraint, CreateStmtContext *cxt)
 		for (i = 0; i < index_form->indnatts; i++)
 		{
 			int16		attnum = index_form->indkey.values[i];
-			Form_pg_attribute attform;
+			const FormData_pg_attribute *attform;
 			char	   *attname;
 			Oid			defopclass;
 
diff --git a/src/backend/utils/adt/expandedrecord.c b/src/backend/utils/adt/expandedrecord.c
index b1b6883c19f..9aa6f3aac51 100644
--- a/src/backend/utils/adt/expandedrecord.c
+++ b/src/backend/utils/adt/expandedrecord.c
@@ -1025,6 +1025,7 @@ expanded_record_lookup_field(ExpandedRecordHeader *erh, const char *fieldname,
 	TupleDesc	tupdesc;
 	int			fno;
 	Form_pg_attribute attr;
+	const FormData_pg_attribute *sysattr;
 
 	tupdesc = expanded_record_get_tupdesc(erh);
 
@@ -1044,13 +1045,13 @@ expanded_record_lookup_field(ExpandedRecordHeader *erh, const char *fieldname,
 	}
 
 	/* How about system attributes? */
-	attr = SystemAttributeByName(fieldname, tupdesc->tdhasoid);
-	if (attr != NULL)
+	sysattr = SystemAttributeByName(fieldname, tupdesc->tdhasoid);
+	if (sysattr != NULL)
 	{
-		finfo->fnumber = attr->attnum;
-		finfo->ftypeid = attr->atttypid;
-		finfo->ftypmod = attr->atttypmod;
-		finfo->fcollation = attr->attcollation;
+		finfo->fnumber = sysattr->attnum;
+		finfo->ftypeid = sysattr->atttypid;
+		finfo->ftypmod = sysattr->atttypmod;
+		finfo->fcollation = sysattr->attcollation;
 		return true;
 	}
 
diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index fe20448ac57..c266da2de1c 100644
--- a/src/backend/utils/adt/name.c
+++ b/src/backend/utils/adt/name.c
@@ -188,7 +188,7 @@ namege(PG_FUNCTION_ARGS)
 /* (see char.c for comparison/operation routines) */
 
 int
-namecpy(Name n1, Name n2)
+namecpy(Name n1, const NameData *n2)
 {
 	if (!n1 || !n2)
 		return -1;
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index b3e8fdd9c60..39f04b06eef 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -127,10 +127,10 @@ extern void RemoveAttrDefault(Oid relid, AttrNumber attnum,
 extern void RemoveAttrDefaultById(Oid attrdefId);
 extern void RemoveStatistics(Oid relid, AttrNumber attnum);
 
-extern Form_pg_attribute SystemAttributeDefinition(AttrNumber attno,
+extern const FormData_pg_attribute *SystemAttributeDefinition(AttrNumber attno,
 						  bool relhasoids);
 
-extern Form_pg_attribute SystemAttributeByName(const char *attname,
+extern const FormData_pg_attribute *SystemAttributeByName(const char *attname,
 					  bool relhasoids);
 
 extern void CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
diff --git a/src/include/parser/parse_relation.h b/src/include/parser/parse_relation.h
index 687a7d7b1b9..03160fd71da 100644
--- a/src/include/parser/parse_relation.h
+++ b/src/include/parser/parse_relation.h
@@ -125,7 +125,7 @@ extern void expandRTE(RangeTblEntry *rte, int rtindex, int sublevels_up,
 extern List *expandRelAttrs(ParseState *pstate, RangeTblEntry *rte,
 			   int rtindex, int sublevels_up, int location);
 extern int	attnameAttNum(Relation rd, const char *attname, bool sysColOK);
-extern Name attnumAttName(Relation rd, int attid);
+extern const NameData* attnumAttName(Relation rd, int attid);
 extern Oid	attnumTypeId(Relation rd, int attid);
 extern Oid	attnumCollationId(Relation rd, int attid);
 extern bool isQueryUsingTempRelation(Query *query);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 6f55699912c..61785a24335 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -37,7 +37,7 @@ extern unsigned hex_decode(const char *src, unsigned len, char *dst);
 extern int2vector *buildint2vector(const int16 *int2s, int n);
 
 /* name.c */
-extern int	namecpy(Name n1, Name n2);
+extern int	namecpy(Name n1, const NameData *n2);
 extern int	namestrcpy(Name name, const char *str);
 extern int	namestrcmp(Name name, const char *str);
 
-- 
2.18.0.rc2.dirty

#15Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
Re: Large writable variables

Hi,

After the last few changes (including a proposed one), we now have in
the .data segment (i.e. mutable initialized variables):

$ objdump -j .data -t ~/build/postgres/dev-optimize/vpath/src/backend/postgres|awk '{print $4, $5, $6}'|sort -r|lessGreetings,
.data 0000000000004380 ConfigureNamesInt
.data 0000000000003570 ConfigureNamesBool
.data 0000000000002140 ConfigureNamesString
.data 00000000000010e0 ConfigureNamesEnum
.data 0000000000000d20 ConfigureNamesReal

These are by far the largest chunk of the .data segment (like 47152
bytes out of 51168 bytes). It's not entirely trivial to fix. While we
can slap consts onto large parts of the functions taking a struct
config_generic*, there's a few places where *do* modify them.

ISTM, to actually fix, we'd have to split
struct config_generic
{
/* constant fields, must be set correctly in initial value: */
const char *name; /* name of variable - MUST BE FIRST */
GucContext context; /* context required to set the variable */
enum config_group group; /* to help organize variables by function */
const char *short_desc; /* short desc. of this variable's purpose */
const char *long_desc; /* long desc. of this variable's purpose */
int flags; /* flag bits, see guc.h */
/* variable fields, initialized at runtime: */
enum config_type vartype; /* type of variable (set only at startup) */
int status; /* status bits, see below */
GucSource source; /* source of the current actual value */
GucSource reset_source; /* source of the reset_value */
GucContext scontext; /* context that set the current value */
GucContext reset_scontext; /* context that set the reset value */
GucStack *stack; /* stacked prior values */
void *extra; /* "extra" pointer for current actual value */
char *sourcefile; /* file current setting is from (NULL if not
* set in config file) */
int sourceline; /* line in source file */
};

into two different arrays. Namely the already denoted 'constant fields'
and 'variable fields. But it's a bit more complicated than that: The
size doesn't come just from the base config_struct, but also from the
config_{bool,int,real,string,enum} arrays. Where we again have separate
'constant fields' and 'variable fields'.

So we'd have to refactor more heavily than just splitting the above
array into two.

I kind of wonder, if we shouldn't remove the separate ConfigureNamesInt*
int arrays and change things so we have one

struct config_def
{
const char *name; /* name of variable - MUST BE FIRST */
GucContext context; /* context required to set the variable */
enum config_group group; /* to help organize variables by function */
const char *short_desc; /* short desc. of this variable's purpose */
const char *long_desc; /* long desc. of this variable's purpose */
int flags; /* flag bits, see guc.h */
enum config_type vartype; /* type of variable (set only at startup) */

union
{
struct
{
bool *variable;
bool boot_val;
GucBoolCheckHook check_hook;
GucBoolAssignHook assign_hook;
GucShowHook show_hook;
} config_bool;

struct
{
int *variable;
int boot_val;
int min;
int max;
GucIntCheckHook check_hook;
GucIntAssignHook assign_hook;
GucShowHook show_hook;
} config_int

...
} pertype;
}

and then a corresponding struct config_val with a similar union.

Besides reducing modified memory, it'd also have the benefit of making
the grouping within the various arrays much more sensible, because
related fields could be grouped together.

.data 0000000000000420 intRelOpts
.data 00000000000001c0 realRelOpts
.data 0000000000000118 boolRelOpts
.data 00000000000000a8 stringRelOpts

these should be readonly, but the code doesn't make that easy. There's
pending refactorings, I don't know how they effect this.

.data 0000000000000068 SnapshotSelfData
.data 0000000000000068 SnapshotAnyData
.data 0000000000000068 SecondarySnapshotData
.data 0000000000000068 CurrentSnapshotData
.data 0000000000000068 CatalogSnapshotData

The obviously actually are modifyable.

.data 0000000000000028 spi_printtupDR
.data 0000000000000028 printsimpleDR
.data 0000000000000028 donothingDR
.data 0000000000000028 debugtupDR

These we could actually make constant, but CreateDestReceiver() as an
API makes that inconvenient. They also are pretty darn small... There's
a security benefit in making them constant and casting the constness
away - I think that might not be insane.

- Andres

#16Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#15)
1 attachment(s)
Re: Large writable variables

Hi,

On 2018-10-15 21:50:51 -0700, Andres Freund wrote:

.data 0000000000000028 spi_printtupDR
.data 0000000000000028 printsimpleDR
.data 0000000000000028 donothingDR
.data 0000000000000028 debugtupDR

These we could actually make constant, but CreateDestReceiver() as an
API makes that inconvenient. They also are pretty darn small... There's
a security benefit in making them constant and casting the constness
away - I think that might not be insane.

I.e. do something like the attached.

Greetings,

Andres Freund

Attachments:

const-dest.difftext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/tcop/dest.c b/src/backend/tcop/dest.c
index c95a4d519de..6960518eca1 100644
--- a/src/backend/tcop/dest.c
+++ b/src/backend/tcop/dest.c
@@ -67,28 +67,33 @@ donothingCleanup(DestReceiver *self)
  *		static DestReceiver structs for dest types needing no local state
  * ----------------
  */
-static DestReceiver donothingDR = {
+static const DestReceiver donothingDR = {
 	donothingReceive, donothingStartup, donothingCleanup, donothingCleanup,
 	DestNone
 };
 
-static DestReceiver debugtupDR = {
+static const DestReceiver debugtupDR = {
 	debugtup, debugStartup, donothingCleanup, donothingCleanup,
 	DestDebug
 };
 
-static DestReceiver printsimpleDR = {
+static const DestReceiver printsimpleDR = {
 	printsimple, printsimple_startup, donothingCleanup, donothingCleanup,
 	DestRemoteSimple
 };
 
-static DestReceiver spi_printtupDR = {
+static const DestReceiver spi_printtupDR = {
 	spi_printtup, spi_dest_startup, donothingCleanup, donothingCleanup,
 	DestSPI
 };
 
-/* Globally available receiver for DestNone */
-DestReceiver *None_Receiver = &donothingDR;
+/*
+ * Globally available receiver for DestNone.
+ *
+ * It's ok to cast the constness away as any modification of the none receiver
+ * would be a bug (which gets easier to catch this way).
+ */
+DestReceiver *None_Receiver = (DestReceiver *) &donothingDR;
 
 
 /* ----------------
@@ -108,6 +113,11 @@ BeginCommand(const char *commandTag, CommandDest dest)
 DestReceiver *
 CreateDestReceiver(CommandDest dest)
 {
+	/*
+	 * It's ok to cast the constness away as any modification of the none receiver
+	 * would be a bug (which gets easier to catch this way).
+	 */
+
 	switch (dest)
 	{
 		case DestRemote:
@@ -115,16 +125,16 @@ CreateDestReceiver(CommandDest dest)
 			return printtup_create_DR(dest);
 
 		case DestRemoteSimple:
-			return &printsimpleDR;
+			return (DestReceiver *) &printsimpleDR;
 
 		case DestNone:
-			return &donothingDR;
+			return (DestReceiver *) &donothingDR;
 
 		case DestDebug:
-			return &debugtupDR;
+			return (DestReceiver *) &debugtupDR;
 
 		case DestSPI:
-			return &spi_printtupDR;
+			return (DestReceiver *) &spi_printtupDR;
 
 		case DestTuplestore:
 			return CreateTuplestoreDestReceiver();
@@ -146,7 +156,7 @@ CreateDestReceiver(CommandDest dest)
 	}
 
 	/* should never get here */
-	return &donothingDR;
+	pg_unreachable();
 }
 
 /* ----------------
#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#15)
Re: Large writable variables

Andres Freund <andres@anarazel.de> writes:

[ let's rearrange the GUC structs ]

I find it hard to believe that the API breaks you'll cause for extensions
are a good trade for a few kB reduction in the size of the .data segment.

regards, tom lane

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#13)
1 attachment(s)
Re: Large writable variables

I wrote:

Andres Freund <andres@anarazel.de> writes:

top unitialized allocations:
0000000008435040 0000000000085280 b DCHCache
0000000008391168 0000000000043840 b NUMCache

Here's a patch to improve that situation.

Hm, looking at that more closely, there's a problem with the idea of
allocating the cache slots one-at-a-time. Currently,
sizeof(DCHCacheEntry) and sizeof(NUMCacheEntry) are each just a bit more
than a power of 2, which would cause palloc to waste nearly 50% of the
allocation it makes for them.

We could forget the one-at-a-time idea and just allocate the whole
array on first use, but I feel like that's probably not a good answer.
I'm inclined to instead reduce DCH_CACHE_SIZE and NUM_CACHE_SIZE enough
to make the structs just less than powers of 2 instead of just more ---
AFAICS, both of those numbers were picked out of the air, and so there's
no reason not to pick a different number out of the air.

Also, I noticed that the biggest part of those structs are arrays of
FormatNode, which has been designed with complete lack of thought about
size or padding issues. We can very easily cut it in half on 64-bit
machines.

The attached patch does both of those things. It's independent of
the other one but is important to make that one efficient.

regards, tom lane

Attachments:

formatting-shave-bits.patchtext/x-diff; charset=us-ascii; name=formatting-shave-bits.patchDownload
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index b3ff133..9c2649e 100644
*** a/src/backend/utils/adt/formatting.c
--- b/src/backend/utils/adt/formatting.c
***************
*** 124,133 ****
   */
  typedef struct
  {
! 	char	   *name;			/* suffix string		*/
  	int			len,			/* suffix length		*/
  				id,				/* used in node->suffix */
! 				type;			/* prefix / postfix			*/
  } KeySuffix;
  
  /* ----------
--- 124,133 ----
   */
  typedef struct
  {
! 	const char *name;			/* suffix string		*/
  	int			len,			/* suffix length		*/
  				id,				/* used in node->suffix */
! 				type;			/* prefix / postfix		*/
  } KeySuffix;
  
  /* ----------
*************** typedef struct
*** 155,164 ****
  
  typedef struct
  {
! 	int			type;			/* NODE_TYPE_XXX, see below */
! 	const KeyWord *key;			/* if type is ACTION */
  	char		character[MAX_MULTIBYTE_CHAR_LEN + 1];	/* if type is CHAR */
! 	int			suffix;			/* keyword prefix/suffix code, if any */
  } FormatNode;
  
  #define NODE_TYPE_END		1
--- 155,164 ----
  
  typedef struct
  {
! 	uint8		type;			/* NODE_TYPE_XXX, see below */
  	char		character[MAX_MULTIBYTE_CHAR_LEN + 1];	/* if type is CHAR */
! 	uint8		suffix;			/* keyword prefix/suffix code, if any */
! 	const KeyWord *key;			/* if type is ACTION */
  } FormatNode;
  
  #define NODE_TYPE_END		1
*************** typedef struct
*** 358,370 ****
   * For simplicity, the cache entries are fixed-size, so they allow for the
   * worst case of a FormatNode for each byte in the picture string.
   *
   * The max number of entries in the caches is DCH_CACHE_ENTRIES
   * resp. NUM_CACHE_ENTRIES.
   * ----------
   */
! #define NUM_CACHE_SIZE		64
  #define NUM_CACHE_ENTRIES	20
! #define DCH_CACHE_SIZE		128
  #define DCH_CACHE_ENTRIES	20
  
  typedef struct
--- 358,374 ----
   * For simplicity, the cache entries are fixed-size, so they allow for the
   * worst case of a FormatNode for each byte in the picture string.
   *
+  * The CACHE_SIZE constants are chosen to make sizeof(DCHCacheEntry) and
+  * sizeof(NUMCacheEntry) be powers of 2, or just less than that, so that
+  * we don't waste too much space by palloc'ing them individually.
+  *
   * The max number of entries in the caches is DCH_CACHE_ENTRIES
   * resp. NUM_CACHE_ENTRIES.
   * ----------
   */
! #define NUM_CACHE_SIZE		56
  #define NUM_CACHE_ENTRIES	20
! #define DCH_CACHE_SIZE		119
  #define DCH_CACHE_ENTRIES	20
  
  typedef struct
*************** do { \
*** 496,502 ****
   *****************************************************************************/
  
  /* ----------
!  * Suffixes:
   * ----------
   */
  #define DCH_S_FM	0x01
--- 500,506 ----
   *****************************************************************************/
  
  /* ----------
!  * Suffixes (FormatNode.suffix is an OR of these codes)
   * ----------
   */
  #define DCH_S_FM	0x01
#19Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#17)
Re: Large writable variables

Hi,

On 2018-10-16 01:49:22 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

[ let's rearrange the GUC structs ]

I find it hard to believe that the API breaks you'll cause for extensions
are a good trade for a few kB reduction in the size of the .data segment.

I'm doubtful that it's worth it too. But more because it seems like a
fair bit of work. I don't think that many extensions use guc_tables.h -
shouldn't most use guc.h, which wouldn't be affected?

Greetings,

Andres Freund

#20Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#16)
Re: Large writable variables

On 2018-10-15 22:02:00 -0700, Andres Freund wrote:

Hi,

On 2018-10-15 21:50:51 -0700, Andres Freund wrote:

.data 0000000000000028 spi_printtupDR
.data 0000000000000028 printsimpleDR
.data 0000000000000028 donothingDR
.data 0000000000000028 debugtupDR

These we could actually make constant, but CreateDestReceiver() as an
API makes that inconvenient. They also are pretty darn small... There's
a security benefit in making them constant and casting the constness
away - I think that might not be insane.

I.e. do something like the attached.

This just reminded me that a couple times I wanted a cast that casts
away const, but otherwise makes sure the type stays the same. I don't
think there's a way to do that in C, but we can write one that verifies
the cast doesn't do something bad if gcc is used:

#if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P)
#define unconstify(cst, var) StaticAssertExpr(__builtin_types_compatible_p(__typeof(var), const cst), "wrong cast"), (cst) (var)
#else
#define unconstify(cst, var) ((cst) (var))
#endif

Does anybody besides me see value in adding a cleaned up version of
that?

Greetings,

Andres Freund

#21Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#18)
Re: Large writable variables

Hi,

On 2018-10-16 01:59:00 -0400, Tom Lane wrote:

I wrote:

Andres Freund <andres@anarazel.de> writes:

top unitialized allocations:
0000000008435040 0000000000085280 b DCHCache
0000000008391168 0000000000043840 b NUMCache

Here's a patch to improve that situation.

Hm, looking at that more closely, there's a problem with the idea of
allocating the cache slots one-at-a-time. Currently,
sizeof(DCHCacheEntry) and sizeof(NUMCacheEntry) are each just a bit more
than a power of 2, which would cause palloc to waste nearly 50% of the
allocation it makes for them.

Hm, that's a bit annoying...

We could forget the one-at-a-time idea and just allocate the whole
array on first use, but I feel like that's probably not a good answer.

I suspect it'd be fine, but obviously we can do better.

Also, I noticed that the biggest part of those structs are arrays of
FormatNode, which has been designed with complete lack of thought about
size or padding issues. We can very easily cut it in half on 64-bit
machines.

Heh, neat. I feel like we've paid very little attention to that in a
myriad of places :(

Greetings,

Andres Freund

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#21)
Re: Large writable variables

Andres Freund <andres@anarazel.de> writes:

On 2018-10-16 01:59:00 -0400, Tom Lane wrote:

Also, I noticed that the biggest part of those structs are arrays of
FormatNode, which has been designed with complete lack of thought about
size or padding issues. We can very easily cut it in half on 64-bit
machines.

Heh, neat. I feel like we've paid very little attention to that in a
myriad of places :(

Most of the time, we probably *shouldn't* pay attention to it; logical
field ordering is worth a good deal IMO. But in a case like this,
where there are large arrays of the things and it's not very painful
to avoid padding waste, it's worth the trouble.

regards, tom lane

#23Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#20)
Re: Large writable variables

On Tue, Oct 16, 2018 at 2:30 AM Andres Freund <andres@anarazel.de> wrote:

This just reminded me that a couple times I wanted a cast that casts
away const, but otherwise makes sure the type stays the same. I don't
think there's a way to do that in C, but we can write one that verifies
the cast doesn't do something bad if gcc is used:

#if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P)
#define unconstify(cst, var) StaticAssertExpr(__builtin_types_compatible_p(__typeof(var), const cst), "wrong cast"), (cst) (var)
#else
#define unconstify(cst, var) ((cst) (var))
#endif

Does anybody besides me see value in adding a cleaned up version of
that?

Under what circumstances would we consider this to be a legitimate thing to use?

I think if we add something this, we'd better accompany it with some
detailed and very clearly-written statements about when you're allowed
to use it. Otherwise, I predict that people will use it in cases
where it's not actually safe, and we'll end up with low-grade bugs.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#24Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#23)
Re: Large writable variables

On 2018-10-16 11:22:31 -0400, Robert Haas wrote:

On Tue, Oct 16, 2018 at 2:30 AM Andres Freund <andres@anarazel.de> wrote:

This just reminded me that a couple times I wanted a cast that casts
away const, but otherwise makes sure the type stays the same. I don't
think there's a way to do that in C, but we can write one that verifies
the cast doesn't do something bad if gcc is used:

#if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P)
#define unconstify(cst, var) StaticAssertExpr(__builtin_types_compatible_p(__typeof(var), const cst), "wrong cast"), (cst) (var)
#else
#define unconstify(cst, var) ((cst) (var))
#endif

Does anybody besides me see value in adding a cleaned up version of
that?

Under what circumstances would we consider this to be a legitimate thing to use?

When the variable actually *will not* be modified, but language or API
design reasons makes it unfeasiable to express that. Look e.g.
DestReceiver * CreateDestReceiver(CommandDest dest);
some of the returned receivers (e.g. donothingDR, printsimpleDR) are
statically allocated and *any* modification would be a bug. But other
return values will be modified, e.g. CreateIntoRelDestReceiver().

It's safe to cast constness away if the variable will not actually be
modified after. Which is e.g. the case above. But making the static
allocations const will a) save memory b) trigger sigbuses if you modify
them. So the casting constness away here *increases* robustness.

The problem is that just adding a cast like
case DestNone:
return (DestReceiver *) &donothingDR;
also hides errors. If you e.g. changed the type of donothingDR you'd
still not get an error.

So I was wishing for a form of a cast that only casts the const away,
but errors out if there's any other type difference. That's the above, I
think.

I think if we add something this, we'd better accompany it with some
detailed and very clearly-written statements about when you're allowed
to use it. Otherwise, I predict that people will use it in cases
where it's not actually safe, and we'll end up with low-grade bugs.

Well, right now people will (and have) just cast the const away like
above. So I don't really see it being more likely to cause problems than
we're doing now. But yea, it definitely should have a big red warning
label.

Greetings,

Andres Freund

#25Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#24)
Re: Large writable variables

On Tue, Oct 16, 2018 at 12:06 PM Andres Freund <andres@anarazel.de> wrote:

But yea, it definitely should have a big red warning
label.

That's all I'm asking. And the warning label shouldn't just say "use
with caution!" but should rather explain how to know whether you're
doing the right thing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#26Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#25)
Re: Large writable variables

On 2018-10-16 12:19:55 -0400, Robert Haas wrote:

On Tue, Oct 16, 2018 at 12:06 PM Andres Freund <andres@anarazel.de> wrote:

But yea, it definitely should have a big red warning
label.

That's all I'm asking. And the warning label shouldn't just say "use
with caution!" but should rather explain how to know whether you're
doing the right thing.

I'm thinking of something like:

/*
* Macro that allows to cast constness away from a variable, but doesn't
* allow changing the underlying type. Enforcement of the latter
* currently only works for gcc like compilers.
*
* Please note IT IS NOT SAFE to cast constness away if the variable will ever
* be modified (it would be undefined behaviour). Doing so anyway can cause
* compiler misoptimizations or runtime crashes (modifying readonly memory).
* It is only safe to use when the the variabble will not be modified, but API
* design or language restrictions prevent you from declaring that
* (e.g. because a function returns both const and non-const variables).
*/

Greetings,

Andres Freund

#27Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#26)
Re: Large writable variables

On Tue, Oct 16, 2018 at 12:26 PM Andres Freund <andres@anarazel.de> wrote:

/*
* Macro that allows to cast constness away from a variable, but doesn't
* allow changing the underlying type. Enforcement of the latter
* currently only works for gcc like compilers.
*
* Please note IT IS NOT SAFE to cast constness away if the variable will ever
* be modified (it would be undefined behaviour). Doing so anyway can cause
* compiler misoptimizations or runtime crashes (modifying readonly memory).
* It is only safe to use when the the variabble will not be modified, but API
* design or language restrictions prevent you from declaring that
* (e.g. because a function returns both const and non-const variables).
*/

"variabble" is a little too rich in "b"s.

In terms of a function that returns both const and non-const
variables, it seems a bit sketchy that the caller would know what the
function is doing in particular cases and make decisions based on it,
but maybe that's just how life is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#28Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#27)
Re: Large writable variables

Hi,

On 2018-10-16 12:41:44 -0400, Robert Haas wrote:

On Tue, Oct 16, 2018 at 12:26 PM Andres Freund <andres@anarazel.de> wrote:

/*
* Macro that allows to cast constness away from a variable, but doesn't
* allow changing the underlying type. Enforcement of the latter
* currently only works for gcc like compilers.
*
* Please note IT IS NOT SAFE to cast constness away if the variable will ever
* be modified (it would be undefined behaviour). Doing so anyway can cause
* compiler misoptimizations or runtime crashes (modifying readonly memory).
* It is only safe to use when the the variabble will not be modified, but API
* design or language restrictions prevent you from declaring that
* (e.g. because a function returns both const and non-const variables).
*/

"variabble" is a little too rich in "b"s.

variababble.

In terms of a function that returns both const and non-const
variables, it seems a bit sketchy that the caller would know what the
function is doing in particular cases and make decisions based on it,
but maybe that's just how life is.

I don't think it's necessary the callers doing so in most cases. E.g. in
the DestReceiver case, it'll be the choice of the testreceiver (say
intorel_receive modifying things for DestIntoRel), not the caller
choosing when to modify things. The caller / users of dest receivers
won't necessarily know.

I agree it's not pretty, but I don't quite see any really realistic
other approaches here.

Greetings,

Andres Freund

#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#28)
Re: Large writable variables

Andres Freund <andres@anarazel.de> writes:

On 2018-10-16 12:41:44 -0400, Robert Haas wrote:

In terms of a function that returns both const and non-const
variables, it seems a bit sketchy that the caller would know what the
function is doing in particular cases and make decisions based on it,
but maybe that's just how life is.

I don't think it's necessary the callers doing so in most cases. E.g. in
the DestReceiver case, it'll be the choice of the testreceiver (say
intorel_receive modifying things for DestIntoRel), not the caller
choosing when to modify things. The caller / users of dest receivers
won't necessarily know.

Yeah, I think the use-case is more like "this API specifies non-const
pointers, but some instances can return pointers to const objects,
while others return non-const objects". Changing the API to const
isn't better, it just moves where you have to cast away const.

regards, tom lane

#30Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#1)
Re: Large writable variables

On Mon, Oct 15, 2018 at 4:08 PM Andres Freund <andres@anarazel.de> wrote:

So we have 500kb of not-initialized memory mapped into every
process. That's, uh, not nothing.

Thinking about this a bit more, why is this bad? I mean, if the
memory is never touched, the OS does not really need to allocate or
zero any pages, or even make any page table entries. If somebody
actually accesses the data, then we'll take a page fault and have to
really allocate, but otherwise I would think we could have 50MB of
unused bss floating around and it wouldn't really matter, let alone
500kB.

What am I missing?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#31Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
Re: Large writable variables

Hi,

On 2018-10-15 13:07:54 -0700, Andres Freund wrote:

Hi,

while briefly thinking about the per-process overhead of postgres, I
looked at the section sizes of a modern postgres. In particular which
memory areas are *not* shared between processes.

If you look at the section sizes that are mapped read-write:

$ size --format=sysv src/backend/postgres
src/backend/postgres :
section size addr
.rodata 1593617 5730304 (read-only, for reference)
.data.rel.ro 134944 8039904 (read only after start)
.data 53888 8178912 (read-write, initialized)
.bss 510416 8232800 (read-write, uninitialized)
Total 52417197

We improve a fair bit over the last two days:

text data bss dec hex filename
8030289 191888 227024 8449201 80ecb1 src/backend/postgres
8029097 192880 510416 8732393 853ee9 src/backend/postgres.before

breakdown:
section size addr
.rodata 1594577 5730304 (read-only, for reference)
.data.rel.ro 136928 8037920 (read only after start)
.data 50912 8178912 (read-write, initialized)
.bss 227024 8229824 (read-write, uninitialized)

As to be expected .rodata and .data.rel.ro gained a bit in size (as they
contain constant data and we made more vars constant), whereas .data
shrunk a bit, and .bss shrunk drastically by moving things to be
dynamically allocated.

Nice progress, thanks everyone!

The top .bss entries (uninitialized modifyable vars) are now:
.bss 0000000000020020 hist_entries
.bss 0000000000005b90 tzdefrules_s
.bss 0000000000004000 hist_start
.bss 0000000000002000 PqRecvBuffer
.bss 0000000000001410 BackendWritebackContext
.bss 0000000000000c80 held_lwlocks
.bss 0000000000000b00 re_array
.bss 0000000000000800 buffer_lists
.bss 0000000000000600 syscache_callback_list
.bss 0000000000000560 reverse_dispatch_table
.bss 0000000000000400 tzdir.7228
.bss 0000000000000400 pkglib_path
.bss 0000000000000400 OutputFileName
.bss 0000000000000400 my_exec_path
.bss 0000000000000400 ExtraOptions
.bss 0000000000000398 errordata
.bss 0000000000000320 seq_scan_tables

The top .data entries (initialized modifyable vars) are now:
.data 0000000000004380 ConfigureNamesInt
.data 0000000000003570 ConfigureNamesBool
.data 0000000000002140 ConfigureNamesString
.data 00000000000010e0 ConfigureNamesEnum
.data 0000000000000d20 ConfigureNamesReal
.data 0000000000000420 intRelOpts
.data 00000000000001c0 realRelOpts
.data 0000000000000118 boolRelOpts
.data 00000000000000a8 stringRelOpts
.data 0000000000000068 SnapshotSelfData
.data 0000000000000068 SnapshotAnyData
.data 0000000000000068 SecondarySnapshotData
.data 0000000000000068 CurrentSnapshotData
.data 0000000000000068 CatalogSnapshotData

- Andres

#32Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#30)
Re: Large writable variables

Hi,

On 2018-10-16 15:13:43 -0400, Robert Haas wrote:

On Mon, Oct 15, 2018 at 4:08 PM Andres Freund <andres@anarazel.de> wrote:

So we have 500kb of not-initialized memory mapped into every
process. That's, uh, not nothing.

Thinking about this a bit more, why is this bad? I mean, if the
memory is never touched, the OS does not really need to allocate or
zero any pages, or even make any page table entries. If somebody
actually accesses the data, then we'll take a page fault and have to
really allocate, but otherwise I would think we could have 50MB of
unused bss floating around and it wouldn't really matter, let alone
500kB.

What am I missing?

For one the OS will actually reserve all that memory when you use memory
overcommit = 2 (which you should). There's no need to reserve memory for
data that's guaranteed to be shared. It also reduces the frequency of
page faults when looking at other global variables laid out nearby - we
could do something about that by giving the linker placement
instructions, but that's also work. The size of the pagetable also has
performance effects, although it'll often be dwarfed by the shared
buffers entry if not using huge pages. WRT .data entries, marking them
as const where appropriate both reduces memory usage, possibly allows
more compiler optimizations, and allows to catch bugs.

There's also the fact that not every OS has COW pagetables, or we use
processes in a way that don't allow that (say windows, where we don't
fork).

Greetings,

Andres Freund

#33Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#22)
1 attachment(s)
Re: Large writable variables

On 2018-10-16 10:16:33 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-10-16 01:59:00 -0400, Tom Lane wrote:

Also, I noticed that the biggest part of those structs are arrays of
FormatNode, which has been designed with complete lack of thought about
size or padding issues. We can very easily cut it in half on 64-bit
machines.

Heh, neat. I feel like we've paid very little attention to that in a
myriad of places :(

Most of the time, we probably *shouldn't* pay attention to it; logical
field ordering is worth a good deal IMO.

Sure. But there's plenty structs which we allocate a bunch off, that are
frequently accessed, where a lot of space is wasted to padding. I agree
that we don't need to contort many structs, but there's plenty where we
should. Often enough it's possible to reorder without making things
make meaningfully less sense.

But in a case like this,
where there are large arrays of the things and it's not very painful
to avoid padding waste, it's worth the trouble.

Attached is a patch that shrinks fmgr_builtins by 25%. That seems
worthwhile, it's pretty frequently accessed, making it more dense is
helpful. Unless somebody protests soon, I'm going to apply that...

Greetings,

Andres Freund

Attachments:

0001-Reorder-FmgrBuiltin-members-saving-25-in-size.patchtext/x-diff; charset=us-asciiDownload
From 575a51d94fd4bef6914ba55b33ba96504f64d574 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 16 Oct 2018 13:05:52 -0700
Subject: [PATCH] Reorder FmgrBuiltin members, saving 25% in size.

That's worth it, as fmgr_builtins is frequently accessed, and as
fmgr_builtins is one of the biggest constant variables in a backend.
---
 src/backend/utils/Gen_fmgrtab.pl | 2 +-
 src/include/utils/fmgrtab.h      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index fa30436895b..ca282913552 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -230,7 +230,7 @@ my $fmgr_count = 0;
 foreach my $s (sort { $a->{oid} <=> $b->{oid} } @fmgr)
 {
 	print $tfh
-	  "  { $s->{oid}, \"$s->{prosrc}\", $s->{nargs}, $bmap{$s->{strict}}, $bmap{$s->{retset}}, $s->{prosrc} }";
+	  "  { $s->{oid}, $s->{nargs}, $bmap{$s->{strict}}, $bmap{$s->{retset}}, \"$s->{prosrc}\", $s->{prosrc} }";
 
 	$fmgr_builtin_oid_index[ $s->{oid} ] = $fmgr_count++;
 
diff --git a/src/include/utils/fmgrtab.h b/src/include/utils/fmgrtab.h
index d8317eb3eac..8d89d66777b 100644
--- a/src/include/utils/fmgrtab.h
+++ b/src/include/utils/fmgrtab.h
@@ -25,10 +25,10 @@
 typedef struct
 {
 	Oid			foid;			/* OID of the function */
-	const char *funcName;		/* C name of the function */
 	short		nargs;			/* 0..FUNC_MAX_ARGS, or -1 if variable count */
 	bool		strict;			/* T if function is "strict" */
 	bool		retset;			/* T if function returns a set */
+	const char * const funcName;		/* C name of the function */
 	PGFunction	func;			/* pointer to compiled function */
 } FmgrBuiltin;
 
-- 
2.18.0.rc2.dirty

#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#33)
Re: Large writable variables

Andres Freund <andres@anarazel.de> writes:

Attached is a patch that shrinks fmgr_builtins by 25%. That seems
worthwhile, it's pretty frequently accessed, making it more dense is
helpful. Unless somebody protests soon, I'm going to apply that...

Hah. I'm pretty sure that struct *was* set up with an eye to padding ...
on 32-bit machines. This does make it shorter on 64-bit, but also
makes the size not a power of 2, which might add a few cycles to
array indexing calculations. Might be worth checking whether that's
going to be an issue anywhere.

What's the point of the extra const decoration on funcName? ISTM
the whole struct should be const, or not.

regards, tom lane

#35Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Tom Lane (#34)
Re: Large writable variables

On 17/10/2018 09:36, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Attached is a patch that shrinks fmgr_builtins by 25%. That seems
worthwhile, it's pretty frequently accessed, making it more dense is
helpful. Unless somebody protests soon, I'm going to apply that...

Hah. I'm pretty sure that struct *was* set up with an eye to padding ...
on 32-bit machines. This does make it shorter on 64-bit, but also
makes the size not a power of 2, which might add a few cycles to
array indexing calculations. Might be worth checking whether that's
going to be an issue anywhere.

What's the point of the extra const decoration on funcName? ISTM
the whole struct should be const, or not.

regards, tom lane

Would it be useful to add dummy variable(s) to bring it up to a power of 2?

Cheers,
Gavin

#36Andres Freund
andres@anarazel.de
In reply to: Gavin Flower (#35)
Re: Large writable variables

Hi,

On 2018-10-17 09:38:18 +1300, Gavin Flower wrote:

On 17/10/2018 09:36, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Attached is a patch that shrinks fmgr_builtins by 25%. That seems
worthwhile, it's pretty frequently accessed, making it more dense is
helpful. Unless somebody protests soon, I'm going to apply that...

Hah. I'm pretty sure that struct *was* set up with an eye to padding ...
on 32-bit machines. This does make it shorter on 64-bit, but also
makes the size not a power of 2, which might add a few cycles to
array indexing calculations. Might be worth checking whether that's
going to be an issue anywhere.

What's the point of the extra const decoration on funcName? ISTM
the whole struct should be const, or not.

Would it be useful to add dummy variable(s) to bring it up to a power of 2?

Err. Reread what we're talking about. The point was to reduce the size,
it's a power of two right now (32). We could of course also just do
nothing (re-add a dummy variable), which would, drumroll, do nothing.

Greetings,

Andres Freund

#37Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#34)
Re: Large writable variables

On 2018-10-16 16:36:12 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Attached is a patch that shrinks fmgr_builtins by 25%. That seems
worthwhile, it's pretty frequently accessed, making it more dense is
helpful. Unless somebody protests soon, I'm going to apply that...

Hah. I'm pretty sure that struct *was* set up with an eye to padding ...
on 32-bit machines.

Possible, the new layout should work just as well there, luckily.

This does make it shorter on 64-bit, but also
makes the size not a power of 2, which might add a few cycles to
array indexing calculations. Might be worth checking whether that's
going to be an issue anywhere.

I can't imagine that that outweight the cost of additional cache misses
on any platform where performance matters. On x86 I assume indexing
into an array with 24byte stride, will normally be just two leas (lea
eax, [eax + eax * 2]; lea eax, [ebx + eax * 8]; where eax initially is
the index, and ebx the array base). Indexing also plays less of a role
than in the past, because previously we did a binary search, but now we
normally look up the index via fmgr_builtin_oid_index.

What's the point of the extra const decoration on funcName? ISTM
the whole struct should be const, or not.

The whole array is const. There's cases where that allows the compiler
more freedom - but I guess it's really a bit redundant here.

Greetings,

Andres Freund

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#11)
1 attachment(s)
Re: Large writable variables

I wrote:

1. tzdefrules_s is filled in the postmaster, and if it's not filled there
it'd be filled by every child process, so there's zero point in converting
it to malloc-on-the-fly style. This is because tzparse() always wants
it filled. That's actually a tad annoying, because it looks to me like
with many timezone settings the data will not get used, but I'm hesitant
to mess with the IANA code's logic enough to improve that. Maybe I'll try
submitting an upstream patch and see what they think of it first.

I spent some time studying this, and realized that the reason that they
always try to load the TZDEFRULES zone data is that they want to absorb
whatever leap-seconds data it has. We don't want that and would be glad
to just drop said data, but I'm sure that's intentional on their end,
so they are not going to be interested in a patch here.

However ... if you assume that any leap-seconds data in that zone can
be ignored, then the only case where the data need be loaded is when
parsing a POSIX-style zone name that has a DST component but does not
specify a POSIX-style transition date rule. That's possible in our code
but I do not think it is a mainstream case; in particular, that seems
never to be reached when loading a real zone data file.

Hence, the attached patch postpones the TZDEFRULES load until we find
we actually need it, and ignores any leap-second data therein, and
incidentally makes the data space be malloc-on-demand rather than static.

This is actually less of a hack compared to the upstream code than what
we were doing before, so I feel pretty pleased with it.

regards, tom lane

Attachments:

better-tzdefrules-handling.patchtext/x-diff; charset=us-ascii; name=better-tzdefrules-handling.patchDownload
diff --git a/src/timezone/localtime.c b/src/timezone/localtime.c
index a2260e5..e3029d8 100644
*** a/src/timezone/localtime.c
--- b/src/timezone/localtime.c
*************** static const char gmt[] = "GMT";
*** 54,60 ****
   * PG: We cache the result of trying to load the TZDEFRULES zone here.
   * tzdefrules_loaded is 0 if not tried yet, +1 if good, -1 if failed.
   */
! static struct state tzdefrules_s;
  static int	tzdefrules_loaded = 0;
  
  /*
--- 54,60 ----
   * PG: We cache the result of trying to load the TZDEFRULES zone here.
   * tzdefrules_loaded is 0 if not tried yet, +1 if good, -1 if failed.
   */
! static struct state *tzdefrules_s = NULL;
  static int	tzdefrules_loaded = 0;
  
  /*
*************** tzparse(const char *name, struct state *
*** 908,927 ****
  	stdname = name;
  	if (lastditch)
  	{
! 		/*
! 		 * This is intentionally somewhat different from the IANA code.  We do
! 		 * not want to invoke tzload() in the lastditch case: we can't assume
! 		 * pg_open_tzfile() is sane yet, and we don't care about leap seconds
! 		 * anyway.
! 		 */
  		stdlen = strlen(name);	/* length of standard zone name */
  		name += stdlen;
- 		if (stdlen >= sizeof sp->chars)
- 			stdlen = (sizeof sp->chars) - 1;
- 		charcnt = stdlen + 1;
  		stdoffset = 0;
- 		sp->goback = sp->goahead = false;	/* simulate failed tzload() */
- 		load_ok = false;
  	}
  	else
  	{
--- 908,917 ----
  	stdname = name;
  	if (lastditch)
  	{
! 		/* Unlike IANA, don't assume name is exactly "GMT" */
  		stdlen = strlen(name);	/* length of standard zone name */
  		name += stdlen;
  		stdoffset = 0;
  	}
  	else
  	{
*************** tzparse(const char *name, struct state *
*** 945,971 ****
  		name = getoffset(name, &stdoffset);
  		if (name == NULL)
  			return false;
- 		charcnt = stdlen + 1;
- 		if (sizeof sp->chars < charcnt)
- 			return false;
- 
- 		/*
- 		 * This bit also differs from the IANA code, which doesn't make any
- 		 * attempt to avoid repetitive loadings of the TZDEFRULES zone.
- 		 */
- 		if (tzdefrules_loaded == 0)
- 		{
- 			if (tzload(TZDEFRULES, NULL, &tzdefrules_s, false) == 0)
- 				tzdefrules_loaded = 1;
- 			else
- 				tzdefrules_loaded = -1;
- 		}
- 		load_ok = (tzdefrules_loaded > 0);
- 		if (load_ok)
- 			memcpy(sp, &tzdefrules_s, sizeof(struct state));
  	}
! 	if (!load_ok)
! 		sp->leapcnt = 0;		/* so, we're off a little */
  	if (*name != '\0')
  	{
  		if (*name == '<')
--- 935,957 ----
  		name = getoffset(name, &stdoffset);
  		if (name == NULL)
  			return false;
  	}
! 	charcnt = stdlen + 1;
! 	if (sizeof sp->chars < charcnt)
! 		return false;
! 
! 	/*
! 	 * The IANA code always tries tzload(TZDEFRULES) here.  We do not want to
! 	 * do that; it would be bad news in the lastditch case, where we can't
! 	 * assume pg_open_tzfile() is sane yet.  Moreover, the only reason to do
! 	 * it unconditionally is to absorb the TZDEFRULES zone's leap second info,
! 	 * which we don't want to do anyway.  Without that, we only need to load
! 	 * TZDEFRULES if the zone name specifies DST but doesn't incorporate a
! 	 * POSIX-style transition date rule, which is not a common case.
! 	 */
! 	sp->goback = sp->goahead = false;	/* simulate failed tzload() */
! 	sp->leapcnt = 0;			/* intentionally assume no leap seconds */
! 
  	if (*name != '\0')
  	{
  		if (*name == '<')
*************** tzparse(const char *name, struct state *
*** 996,1003 ****
  		}
  		else
  			dstoffset = stdoffset - SECSPERHOUR;
! 		if (*name == '\0' && !load_ok)
! 			name = TZDEFRULESTRING;
  		if (*name == ',' || *name == ';')
  		{
  			struct rule start;
--- 982,1019 ----
  		}
  		else
  			dstoffset = stdoffset - SECSPERHOUR;
! 		if (*name == '\0')
! 		{
! 			/*
! 			 * The POSIX zone name does not provide a transition-date rule.
! 			 * Here we must load the TZDEFRULES zone, if possible, to serve as
! 			 * source data for the transition dates.  Unlike the IANA code, we
! 			 * try to cache the data so it's only loaded once.
! 			 */
! 			if (tzdefrules_loaded == 0)
! 			{
! 				/* Allocate on first use */
! 				if (tzdefrules_s == NULL)
! 					tzdefrules_s = (struct state *) malloc(sizeof(struct state));
! 				if (tzdefrules_s != NULL)
! 				{
! 					if (tzload(TZDEFRULES, NULL, tzdefrules_s, false) == 0)
! 						tzdefrules_loaded = 1;
! 					else
! 						tzdefrules_loaded = -1;
! 					/* In any case, we ignore leap-second data from the file */
! 					tzdefrules_s->leapcnt = 0;
! 				}
! 			}
! 			load_ok = (tzdefrules_loaded > 0);
! 			if (load_ok)
! 				memcpy(sp, tzdefrules_s, sizeof(struct state));
! 			else
! 			{
! 				/* If we can't load TZDEFRULES, fall back to hard-wired rule */
! 				name = TZDEFRULESTRING;
! 			}
! 		}
  		if (*name == ',' || *name == ';')
  		{
  			struct rule start;
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#32)
Re: Large writable variables

BTW, I looked around for .o files with large BSS numbers, and came across

$ size src/interfaces/ecpg/ecpglib/prepare.o
text data bss dec hex filename
4023 4 1048576 1052603 100fbb src/interfaces/ecpg/ecpglib/prepare.o

That megabyte is from a statically allocated statement cache array.
Seems a bit unfriendly to users of ecpglib, given that many apps
would never use the statement cache (AFAICT you have to explicitly
ask for auto-prepare mode to get to that code).

Doesn't look hard to fix though.

regards, tom lane

#40Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#39)
Re: Large writable variables

On 2018-10-16 19:48:42 -0400, Tom Lane wrote:

BTW, I looked around for .o files with large BSS numbers, and came across

$ size src/interfaces/ecpg/ecpglib/prepare.o
text data bss dec hex filename
4023 4 1048576 1052603 100fbb src/interfaces/ecpg/ecpglib/prepare.o

Btw, I think one can fairly argue that large .data vars have a
higher cost than .bss. That's not to say we shouldn't fix .bss ones ;)

I've this awful shell script to figure out the sizes:

for file in $(find . -type f -name '*.o'); do objdump -j .data -t $file|tail -n +5|grep -v '^$'|awk -v "file=$file" '{print $4, $5, $6, file}';done|sort -nr|less

.data 0000000000001180 datetktbl ./src/interfaces/ecpg/pgtypeslib/dt_common.o
.data 0000000000000c28 ibmkanji ./src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.o
.data 00000000000003f0 deltatktbl ./src/interfaces/ecpg/pgtypeslib/dt_common.o
.data 0000000000000100 sqlca_init ./src/interfaces/ecpg/ecpglib/misc.o
.data 0000000000000100 sqlca_init ./src/interfaces/ecpg/compatlib/informix.o
.data 0000000000000068 day_tab ./src/interfaces/ecpg/pgtypeslib/dt_common.o
.data 0000000000000030 ecpg_query ./src/interfaces/ecpg/preproc/preproc.o
.data 0000000000000030 ecpg_no_indicator ./src/interfaces/ecpg/preproc/preproc.o
.data 0000000000000030 descriptor_type.5770 ./src/interfaces/ecpg/preproc/descriptor.o
.data 000000000000002b ssl_nomem ./src/interfaces/libpq/fe-secure-openssl.o
.data 000000000000002a PLy_subtransaction_doc ./src/pl/plpython/plpy_subxactobject.o
.data 0000000000000023 PLy_cursor_doc ./src/pl/plpython/plpy_cursorobject.o
.data 000000000000001e PLy_result_doc ./src/pl/plpython/plpy_resultobject.o
.data 0000000000000018 PLy_plan_doc ./src/pl/plpython/plpy_planobject.o
.data 0000000000000010 base_yylloc ./src/interfaces/ecpg/preproc/preproc.o

Should also work for other sections.

Hm, there's a lot of ecpg in this :(

That megabyte is from a statically allocated statement cache array.
Seems a bit unfriendly to users of ecpglib, given that many apps
would never use the statement cache (AFAICT you have to explicitly
ask for auto-prepare mode to get to that code).

Doesn't look hard to fix though.

Thanks for tackling that.

Greetings,

Andres Freund

#41Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Andres Freund (#20)
1 attachment(s)
Re: Large writable variables

On 16/10/2018 08:30, Andres Freund wrote:

This just reminded me that a couple times I wanted a cast that casts
away const, but otherwise makes sure the type stays the same. I don't
think there's a way to do that in C, but we can write one that verifies
the cast doesn't do something bad if gcc is used:

#if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P)
#define unconstify(cst, var) StaticAssertExpr(__builtin_types_compatible_p(__typeof(var), const cst), "wrong cast"), (cst) (var)
#else
#define unconstify(cst, var) ((cst) (var))
#endif

Does anybody besides me see value in adding a cleaned up version of
that?

I've had the attached patch lying around. As you can see there, there
are a few places where it could be useful, but not too many.

The name CONST_CAST() is adapted from C++.

Your version with __builtin_types_compatible_p() doesn't work for
casting between char * and const char *, so it wouldn't be very useful.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Add-CONST_CAST-macro.patchtext/plain; charset=UTF-8; name=0001-Add-CONST_CAST-macro.patch; x-mac-creator=0; x-mac-type=0Download
From 31576b37687f44fc14bcf0990eb489466b74bf29 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Wed, 23 May 2018 09:10:51 -0400
Subject: [PATCH] Add CONST_CAST macro

---
 src/backend/utils/adt/json.c       | 4 ++--
 src/backend/utils/adt/misc.c       | 2 +-
 src/backend/utils/adt/varlena.c    | 4 ++--
 src/backend/utils/fmgr/dfmgr.c     | 4 ++--
 src/include/c.h                    | 6 ++++++
 src/interfaces/libpq/pqexpbuffer.c | 2 +-
 src/port/win32setlocale.c          | 2 +-
 7 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 6f0fe94d63..6190b01dd2 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -207,12 +207,12 @@ IsValidJsonNumber(const char *str, int len)
 	 */
 	if (*str == '-')
 	{
-		dummy_lex.input = (char *) str + 1;
+		dummy_lex.input = CONST_CAST(char *, str) + 1;
 		dummy_lex.input_length = len - 1;
 	}
 	else
 	{
-		dummy_lex.input = (char *) str;
+		dummy_lex.input = CONST_CAST(char *, str);
 		dummy_lex.input_length = len;
 	}
 
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 6ea3679835..179ba81880 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -423,7 +423,7 @@ pg_get_keywords(PG_FUNCTION_ARGS)
 		HeapTuple	tuple;
 
 		/* cast-away-const is ugly but alternatives aren't much better */
-		values[0] = (char *) ScanKeywords[funcctx->call_cntr].name;
+		values[0] = CONST_CAST(char *, ScanKeywords[funcctx->call_cntr].name);
 
 		switch (ScanKeywords[funcctx->call_cntr].category)
 		{
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index a5e812d026..a98ec2f8df 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -182,7 +182,7 @@ char *
 text_to_cstring(const text *t)
 {
 	/* must cast away the const, unfortunately */
-	text	   *tunpacked = pg_detoast_datum_packed((struct varlena *) t);
+	text	   *tunpacked = pg_detoast_datum_packed(CONST_CAST(text *, t));
 	int			len = VARSIZE_ANY_EXHDR(tunpacked);
 	char	   *result;
 
@@ -213,7 +213,7 @@ void
 text_to_cstring_buffer(const text *src, char *dst, size_t dst_len)
 {
 	/* must cast away the const, unfortunately */
-	text	   *srcunpacked = pg_detoast_datum_packed((struct varlena *) src);
+	text	   *srcunpacked = pg_detoast_datum_packed(CONST_CAST(text *, src));
 	size_t		src_len = VARSIZE_ANY_EXHDR(srcunpacked);
 
 	if (dst_len > 0)
diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index 4a5cc7cfc7..b2ff7aef35 100644
--- a/src/backend/utils/fmgr/dfmgr.c
+++ b/src/backend/utils/fmgr/dfmgr.c
@@ -126,7 +126,7 @@ load_external_function(const char *filename, const char *funcname,
 	 * should declare its second argument as "const char *", but older
 	 * platforms might not, so for the time being we just cast away const.
 	 */
-	retval = (PGFunction) dlsym(lib_handle, (char *) funcname);
+	retval = (PGFunction) dlsym(lib_handle, CONST_CAST(char *, funcname));
 
 	if (retval == NULL && signalNotFound)
 		ereport(ERROR,
@@ -175,7 +175,7 @@ PGFunction
 lookup_external_function(void *filehandle, const char *funcname)
 {
 	/* as above, cast away const for the time being */
-	return (PGFunction) dlsym(filehandle, (char *) funcname);
+	return (PGFunction) dlsym(filehandle, CONST_CAST(char *, funcname));
 }
 
 
diff --git a/src/include/c.h b/src/include/c.h
index 4a757bc8ea..6341b2512f 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -208,6 +208,12 @@
 #define unlikely(x) ((x) != 0)
 #endif
 
+/*
+ * Safer casting -- use this for casting away const or volatile.  It ensures
+ * that the source and target types are otherwise compatible.
+ */
+#define CONST_CAST(t, x) ((void)((t)0 == (x)), (t)(x))
+
 /*
  * CppAsString
  *		Convert the argument to a string, using the C preprocessor.
diff --git a/src/interfaces/libpq/pqexpbuffer.c b/src/interfaces/libpq/pqexpbuffer.c
index 43c36c3bff..4015b58a08 100644
--- a/src/interfaces/libpq/pqexpbuffer.c
+++ b/src/interfaces/libpq/pqexpbuffer.c
@@ -57,7 +57,7 @@ markPQExpBufferBroken(PQExpBuffer str)
 	 * to put oom_buffer in read-only storage, so that anyone who tries to
 	 * scribble on a broken PQExpBuffer will get a failure.
 	 */
-	str->data = (char *) oom_buffer;
+	str->data = CONST_CAST(char *, oom_buffer);
 	str->len = 0;
 	str->maxlen = 0;
 }
diff --git a/src/port/win32setlocale.c b/src/port/win32setlocale.c
index 0597c2afca..3514b5606a 100644
--- a/src/port/win32setlocale.c
+++ b/src/port/win32setlocale.c
@@ -183,7 +183,7 @@ pgwin32_setlocale(int category, const char *locale)
 	 * forbidden to modify, so casting away the "const" is innocuous.
 	 */
 	if (result)
-		result = (char *) map_locale(locale_map_result, result);
+		result = CONST_CAST(char *, map_locale(locale_map_result, result));
 
 	return result;
 }
-- 
2.19.1

#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#40)
Re: Large writable variables

Andres Freund <andres@anarazel.de> writes:

.data 0000000000001180 datetktbl ./src/interfaces/ecpg/pgtypeslib/dt_common.o
.data 0000000000000c28 ibmkanji ./src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.o
.data 00000000000003f0 deltatktbl ./src/interfaces/ecpg/pgtypeslib/dt_common.o

Hmm. I think these can just be marked const, will investigate.

The rest of those look to be mostly below the threshold of pain ...

regards, tom lane

#43Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#42)
Re: Large writable variables

I wrote:

Andres Freund <andres@anarazel.de> writes:

.data 0000000000001180 datetktbl ./src/interfaces/ecpg/pgtypeslib/dt_common.o
.data 0000000000000c28 ibmkanji ./src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.o
.data 00000000000003f0 deltatktbl ./src/interfaces/ecpg/pgtypeslib/dt_common.o

Hmm. I think these can just be marked const, will investigate.

I pushed fixes for these, but curiously, the ibmkanji change did not
make any difference to section sizes --- AFAICT, my toolchain already
figured out that it could treat that table as const. The dt_common.c
changes are a win though.

regards, tom lane

#44Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#41)
Re: Large writable variables

On 2018-10-17 21:06:13 +0200, Peter Eisentraut wrote:

On 16/10/2018 08:30, Andres Freund wrote:

This just reminded me that a couple times I wanted a cast that casts
away const, but otherwise makes sure the type stays the same. I don't
think there's a way to do that in C, but we can write one that verifies
the cast doesn't do something bad if gcc is used:

#if defined(HAVE__BUILTIN_TYPES_COMPATIBLE_P)
#define unconstify(cst, var) StaticAssertExpr(__builtin_types_compatible_p(__typeof(var), const cst), "wrong cast"), (cst) (var)
#else
#define unconstify(cst, var) ((cst) (var))
#endif

Does anybody besides me see value in adding a cleaned up version of
that?

I've had the attached patch lying around.

Heh.

As you can see there, there are a few places where it could be useful,
but not too many.

Yea.

The name CONST_CAST() is adapted from C++.

Your version with __builtin_types_compatible_p() doesn't work for
casting between char * and const char *, so it wouldn't be very useful.

Huh, why wouldn't it work for char *? Seems to work fine for me?

+/*
+ * Safer casting -- use this for casting away const or volatile.  It ensures
+ * that the source and target types are otherwise compatible.
+ */
+#define CONST_CAST(t, x) ((void)((t)0 == (x)), (t)(x))
+

Has the advantage that it probably works for globals, but OTOH, it only
works correctly for pointers, and it doesn't reliably trigger warnigns /
errors.

Greetings,

Andres Freund

#45Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Andres Freund (#44)
Re: Large writable variables

On 17/10/2018 22:02, Andres Freund wrote:

Your version with __builtin_types_compatible_p() doesn't work for
casting between char * and const char *, so it wouldn't be very useful.

Huh, why wouldn't it work for char *? Seems to work fine for me?

__builtin_types_compatible_p(const char *, char *) returns false (0) for me.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#46Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#45)
Re: Large writable variables

Hi,

On 2018-10-17 23:43:31 +0200, Peter Eisentraut wrote:

On 17/10/2018 22:02, Andres Freund wrote:

Your version with __builtin_types_compatible_p() doesn't work for
casting between char * and const char *, so it wouldn't be very useful.

Huh, why wouldn't it work for char *? Seems to work fine for me?

__builtin_types_compatible_p(const char *, char *) returns false (0) for me.

Right, that's why I added a const, inside the macro, to the type
specified in the unconstify argument. So typeof() yields a const char *,
and the return type is specified as char *, and adding a const in the
argument also yields a const char *.

I'd merged my version since, I don't feel particularly attached to it
though...

Greetings,

Andres Freund

#47Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Andres Freund (#46)
1 attachment(s)
Re: Large writable variables

On 17/10/2018 23:51, Andres Freund wrote:

__builtin_types_compatible_p(const char *, char *) returns false (0) for me.

Right, that's why I added a const, inside the macro, to the type
specified in the unconstify argument. So typeof() yields a const char *,
and the return type is specified as char *, and adding a const in the
argument also yields a const char *.

Yeah, that works. The C++-inspired version also allowed casting from
not-const to const, which we don't really need.

I'd perhaps change the signature

#define unconstify(underlying_type, var)

because the "var" doesn't actually have to be a variable.

Attached is my previous patch adapted to your macro.

I'm tempted to get rid of the stuff in dfmgr.c and just let the "older
platforms" get the warning.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v2-0001-Apply-unconstify-in-more-places.patchtext/plain; charset=UTF-8; name=v2-0001-Apply-unconstify-in-more-places.patch; x-mac-creator=0; x-mac-type=0Download
From 54693f7201aa4768509c5d6939961b50a9021bcf Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Thu, 18 Oct 2018 22:11:44 +0200
Subject: [PATCH v2] Apply unconstify() in more places

---
 src/backend/utils/adt/json.c    | 4 ++--
 src/backend/utils/adt/misc.c    | 2 +-
 src/backend/utils/adt/varlena.c | 4 ++--
 src/backend/utils/fmgr/dfmgr.c  | 4 ++--
 src/port/win32setlocale.c       | 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 6f0fe94d63..f47a498228 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -207,12 +207,12 @@ IsValidJsonNumber(const char *str, int len)
 	 */
 	if (*str == '-')
 	{
-		dummy_lex.input = (char *) str + 1;
+		dummy_lex.input = unconstify(char *, str) + 1;
 		dummy_lex.input_length = len - 1;
 	}
 	else
 	{
-		dummy_lex.input = (char *) str;
+		dummy_lex.input = unconstify(char *, str);
 		dummy_lex.input_length = len;
 	}
 
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 6ea3679835..309eb2935c 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -423,7 +423,7 @@ pg_get_keywords(PG_FUNCTION_ARGS)
 		HeapTuple	tuple;
 
 		/* cast-away-const is ugly but alternatives aren't much better */
-		values[0] = (char *) ScanKeywords[funcctx->call_cntr].name;
+		values[0] = unconstify(char *, ScanKeywords[funcctx->call_cntr].name);
 
 		switch (ScanKeywords[funcctx->call_cntr].category)
 		{
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index a5e812d026..0fd3b15748 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -182,7 +182,7 @@ char *
 text_to_cstring(const text *t)
 {
 	/* must cast away the const, unfortunately */
-	text	   *tunpacked = pg_detoast_datum_packed((struct varlena *) t);
+	text	   *tunpacked = pg_detoast_datum_packed(unconstify(text *, t));
 	int			len = VARSIZE_ANY_EXHDR(tunpacked);
 	char	   *result;
 
@@ -213,7 +213,7 @@ void
 text_to_cstring_buffer(const text *src, char *dst, size_t dst_len)
 {
 	/* must cast away the const, unfortunately */
-	text	   *srcunpacked = pg_detoast_datum_packed((struct varlena *) src);
+	text	   *srcunpacked = pg_detoast_datum_packed(unconstify(text *, src));
 	size_t		src_len = VARSIZE_ANY_EXHDR(srcunpacked);
 
 	if (dst_len > 0)
diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index 4a5cc7cfc7..d200b960d2 100644
--- a/src/backend/utils/fmgr/dfmgr.c
+++ b/src/backend/utils/fmgr/dfmgr.c
@@ -126,7 +126,7 @@ load_external_function(const char *filename, const char *funcname,
 	 * should declare its second argument as "const char *", but older
 	 * platforms might not, so for the time being we just cast away const.
 	 */
-	retval = (PGFunction) dlsym(lib_handle, (char *) funcname);
+	retval = (PGFunction) dlsym(lib_handle, unconstify(char *, funcname));
 
 	if (retval == NULL && signalNotFound)
 		ereport(ERROR,
@@ -175,7 +175,7 @@ PGFunction
 lookup_external_function(void *filehandle, const char *funcname)
 {
 	/* as above, cast away const for the time being */
-	return (PGFunction) dlsym(filehandle, (char *) funcname);
+	return (PGFunction) dlsym(filehandle, unconstify(char *, funcname));
 }
 
 
diff --git a/src/port/win32setlocale.c b/src/port/win32setlocale.c
index 0597c2afca..a8cf170dd1 100644
--- a/src/port/win32setlocale.c
+++ b/src/port/win32setlocale.c
@@ -183,7 +183,7 @@ pgwin32_setlocale(int category, const char *locale)
 	 * forbidden to modify, so casting away the "const" is innocuous.
 	 */
 	if (result)
-		result = (char *) map_locale(locale_map_result, result);
+		result = unconstify(char *, map_locale(locale_map_result, result));
 
 	return result;
 }

base-commit: e74dd00f53cd6dc1887f76b9672e5f6dcf0fd8a2
-- 
2.19.1

#48Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Peter Eisentraut (#47)
Re: Large writable variables

On 18/10/2018 22:17, Peter Eisentraut wrote:

On 17/10/2018 23:51, Andres Freund wrote:

__builtin_types_compatible_p(const char *, char *) returns false (0) for me.

Right, that's why I added a const, inside the macro, to the type
specified in the unconstify argument. So typeof() yields a const char *,
and the return type is specified as char *, and adding a const in the
argument also yields a const char *.

Yeah, that works. The C++-inspired version also allowed casting from
not-const to const, which we don't really need.

Attached is my previous patch adapted to your macro.

Oh, I forgot to mention, your version doesn't work for this code in
pqexpbuffer.c:

str->data = (char *) oom_buffer;

That's probably not a big deal though.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#49Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#48)
Re: Large writable variables

On 2018-10-18 22:20:29 +0200, Peter Eisentraut wrote:

On 18/10/2018 22:17, Peter Eisentraut wrote:

On 17/10/2018 23:51, Andres Freund wrote:

__builtin_types_compatible_p(const char *, char *) returns false (0) for me.

Right, that's why I added a const, inside the macro, to the type
specified in the unconstify argument. So typeof() yields a const char *,
and the return type is specified as char *, and adding a const in the
argument also yields a const char *.

Yeah, that works. The C++-inspired version also allowed casting from
not-const to const, which we don't really need.

Attached is my previous patch adapted to your macro.

Oh, I forgot to mention, your version doesn't work for this code in
pqexpbuffer.c:

str->data = (char *) oom_buffer;

That kind of seems correct ;). Can easily be done via unconstify(char *,
&oom_buffer[0]), if necessary.

That's probably not a big deal though.

Greetings,

Andres Freund

#50Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#47)
Re: Large writable variables

On 2018-10-18 22:17:55 +0200, Peter Eisentraut wrote:

I'd perhaps change the signature

#define unconstify(underlying_type, var)

because the "var" doesn't actually have to be a variable.

Hm, so expr, or what would you use?

Attached is my previous patch adapted to your macro.

I'm tempted to get rid of the stuff in dfmgr.c and just let the "older
platforms" get the warning.

Yea, that works for me.

Are you planning to apply this?

Greetings,

Andres Freund

#51Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Andres Freund (#50)
Re: Large writable variables

On 18/10/2018 21:31, Andres Freund wrote:

On 2018-10-18 22:17:55 +0200, Peter Eisentraut wrote:

I'd perhaps change the signature

#define unconstify(underlying_type, var)

because the "var" doesn't actually have to be a variable.

Hm, so expr, or what would you use?

done

Attached is my previous patch adapted to your macro.

I'm tempted to get rid of the stuff in dfmgr.c and just let the "older
platforms" get the warning.

and done

Yea, that works for me.

Are you planning to apply this?

and done

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services