Wanted: jsonb on-disk representation documentation

Started by Heikki Linnakangas over 11 years ago, 36 messages
#1 Heikki Linnakangas
hlinnakangas@vmware.com

I'm reading the new jsonb code, trying to understand the on-disk
representation. And I cannot make heads or tails of it.

My first entry point was jsonb.h. Jsonb struct is the on-disk
representation, so I looked at the comments above that. No help, the
comments are useless for getting an overview picture.

Help, anyone?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#1)
Re: Wanted: jsonb on-disk representation documentation

On May 6, 2014 9:30:15 PM CEST, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

I'm reading the new jsonb code, trying to understand the on-disk
representation. And I cannot make heads or tails of it.

My first entry point was jsonb.h. Jsonb struct is the on-disk
representation, so I looked at the comments above that. No help, the
comments are useless for getting an overview picture.

Help, anyone?

Enthusiastically seconded. I've asked for that about three times without much success. If it had been my decision, the patch wouldn't have been merged without that and other adjustments.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

#3 Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#2)
Re: Wanted: jsonb on-disk representation documentation

On Tue, May 6, 2014 at 09:48:04PM +0200, Andres Freund wrote:

On May 6, 2014 9:30:15 PM CEST, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

I'm reading the new jsonb code, trying to understand the on-disk
representation. And I cannot make heads or tails of it.

My first entry point was jsonb.h. Jsonb struct is the on-disk
representation, so I looked at the comments above that. No help, the
comments are useless for getting an overview picture.

Help, anyone?

Enthusiastically seconded. I've asked for that about three times without much success. If it had been my decision, the patch wouldn't have been merged without that and other adjustments.

I also would like to know what the index-everything hash ops does? Does
it index the keys, values, or both?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

#4 Peter Geoghegan
pg@heroku.com
In reply to: Bruce Momjian (#3)
Re: Wanted: jsonb on-disk representation documentation

On Tue, May 6, 2014 at 1:06 PM, Bruce Momjian <bruce@momjian.us> wrote:

I also would like to know what the index-everything hash ops does? Does
it index the keys, values, or both?

It indexes both, but it isn't possible to test existence (of a key)
with the hash GIN opclass.

--
Peter Geoghegan

#5 Peter Geoghegan
pg@heroku.com
In reply to: Andres Freund (#2)
Re: Wanted: jsonb on-disk representation documentation

On Tue, May 6, 2014 at 12:48 PM, Andres Freund <andres@anarazel.de> wrote:

Enthusiastically seconded. I've asked for that about three times without much success. If it had been my decision, the patch wouldn't have been merged without that and other adjustments.

I'm almost certain that the only feedback of yours that I didn't
incorporate was that I didn't change the name of JsonbValue, a
decision I stand by, and also that I didn't add ascii art to
illustrate the on-disk format. I can write a patch that adds the
latter soon.

--
Peter Geoghegan

#6 Oleg Bartunov
obartunov@gmail.com
In reply to: Peter Geoghegan (#4)
Re: Wanted: jsonb on-disk representation documentation

FYI,
http://obartunov.livejournal.com/178495.html

This is a hash-based GIN opclass for hstore with support for all operators.
It's a pity we had no time to do the same for jsonb, but we may include
it and a couple of other opclasses in contrib/jsonx.

Oleg

On Wed, May 7, 2014 at 12:18 AM, Peter Geoghegan <pg@heroku.com> wrote:

On Tue, May 6, 2014 at 1:06 PM, Bruce Momjian <bruce@momjian.us> wrote:

I also would like to know what the index-everything hash ops does? Does
it index the keys, values, or both?

It indexes both, but it isn't possible to test existence (of a key)
with the hash GIN opclass.

--
Peter Geoghegan

#7 Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#5)
Re: Wanted: jsonb on-disk representation documentation

On 2014-05-06 13:30:26 -0700, Peter Geoghegan wrote:

On Tue, May 6, 2014 at 12:48 PM, Andres Freund <andres@anarazel.de> wrote:

Enthusiastically seconded. I've asked for that about three times without much success. If it had been my decision, the patch wouldn't have been merged without that and other adjustments.

I'm almost certain that the only feedback of yours that I didn't
incorporate was that I didn't change the name of JsonbValue, a
decision I stand by, and also that I didn't add ascii art to
illustrate the on-disk format. I can write a patch that adds the
latter soon.

That might or might not be true. I don't really remember. Documentation
about the on-disk format is the one thing I am sure about that's not
done.

The reviews I did were really cursory reviews, nothing thorough. There
are large parts of the code (e.g. jsonb_gin.c) I didn't even look at, and
others I don't really understand. I also didn't have time to look at the
later versions. The code did improve, don't get me wrong. Otherwise I'd
have been very vocal about this when it was committed.
But the code is still pretty hard to read and understand, which IMO is
problematic for a feature touted as being absolutely critical for Postgres'
success. If others want a taste, take a peek at
findJsonbValueFromSuperHeader()'s code.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8 Peter Geoghegan
pg@heroku.com
In reply to: Andres Freund (#7)
Re: Wanted: jsonb on-disk representation documentation

On Tue, May 6, 2014 at 3:37 PM, Andres Freund <andres@anarazel.de> wrote:

That might or might not be true. I don't really remember. Documentation
about the on-disk format is the one thing I am sure about that's not
done.

I think it would be best to do that with reference to a concrete
example. As I said, I'll work on a patch.

The reviews I did were really cursory reviews, nothing thorough. There
are large parts of the code (e.g. jsonb_gin.c) I didn't even look at, and
others I don't really understand. I also didn't have time to look at the
later versions. The code did improve, don't get me wrong. Otherwise I'd
have been very vocal about this when it was committed.
But the code is still pretty hard to read and understand, which IMO is
problematic for a feature touted as being absolutely critical for Postgres'
success. If others want a taste, take a peek at
findJsonbValueFromSuperHeader()'s code.

I don't really know what to say to that. Lots of code in Postgres is
complicated, especially if you look at one particular function without
some wider context. Is your objection that the complexity is
incidental rather than essential? If so, how?

--
Peter Geoghegan

#9 Andres Freund
andres@2ndquadrant.com
In reply to: Peter Geoghegan (#8)
Re: Wanted: jsonb on-disk representation documentation

On 2014-05-06 15:45:39 -0700, Peter Geoghegan wrote:

I don't really know what to say to that. Lots of code in Postgres is
complicated, especially if you look at one particular function without
some wider context.
Is your objection that the complexity is incidental rather than
essential?

Yes.

If so, how?

If you think the following is a solution of essential complexity in
*new* code for navigating one level down a relatively simple *new*
datastructure - then we have a disconnect that's larger than I am
willing to argue about.
I can live with the argument that this code is what we have; but calling
this only having the "essential complexity" is absurd.

JsonbValue *
findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
							  uint32 *lowbound, JsonbValue *key)
{
	uint32		superheader = *(uint32 *) sheader;
	JEntry	   *array = (JEntry *) (sheader + sizeof(uint32));
	int			count = (superheader & JB_CMASK);
	JsonbValue *result = palloc(sizeof(JsonbValue));

	Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);

	if (flags & JB_FARRAY & superheader)
	{
		char	   *data = (char *) (array + (superheader & JB_CMASK));
		int			i;

		for (i = 0; i < count; i++)
		{
			JEntry	   *e = array + i;

			if (JBE_ISNULL(*e) && key->type == jbvNull)
			{
				result->type = jbvNull;
				result->estSize = sizeof(JEntry);
			}
			else if (JBE_ISSTRING(*e) && key->type == jbvString)
			{
				result->type = jbvString;
				result->val.string.val = data + JBE_OFF(*e);
				result->val.string.len = JBE_LEN(*e);
				result->estSize = sizeof(JEntry) + result->val.string.len;
			}
			else if (JBE_ISNUMERIC(*e) && key->type == jbvNumeric)
			{
				result->type = jbvNumeric;
				result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));

				result->estSize = 2 * sizeof(JEntry) +
					VARSIZE_ANY(result->val.numeric);
			}
			else if (JBE_ISBOOL(*e) && key->type == jbvBool)
			{
				result->type = jbvBool;
				result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
				result->estSize = sizeof(JEntry);
			}
			else
				continue;

			if (compareJsonbScalarValue(key, result) == 0)
				return result;
		}
	}
	else if (flags & JB_FOBJECT & superheader)
	{
		/* Since this is an object, account for *Pairs* of Jentrys */
		char	   *data = (char *) (array + (superheader & JB_CMASK) * 2);
		uint32		stopLow = lowbound ? *lowbound : 0,
					stopMiddle;

		/* Object key past by caller must be a string */
		Assert(key->type == jbvString);
...

I am not calling for a revert. I am just saying that it's imo below
project standards.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10 Peter Geoghegan
pg@heroku.com
In reply to: Andres Freund (#9)
Re: Wanted: jsonb on-disk representation documentation

On Tue, May 6, 2014 at 5:13 PM, Andres Freund <andres@2ndquadrant.com> wrote:

If you think the following is a solution of essential complexity in
*new* code for navigating one level down a relatively simple *new*
datastructure - then we have a disconnect that's larger than I am
willing to argue about

You omitted the 40 lines of comments above the function.

I can live with the argument that this code is what we have; but calling
this only having the "essential complexity" is absurd.

I did not say that it only had essential complexity; just that your
criticism was vague. What's wrong with this particular code,
precisely? What complexity is incidental? Why?

I think what you're missing here is that
findJsonbValueFromSuperHeader() is useful for testing "existence" -
that's mostly what it does (serve as a worker function for the 3
existence-type operators). It's also used once to do a binary search
for a key when testing containment, ahead of testing a corresponding
value in a pair (a pair within a rhs that we're testing for
containment within an lhs "this" value). Finally, it's also used once
with arrays when testing containment.

Why do I just match the key within findJsonbValueFromSuperHeader() for
objects, and not the key/value pair, you ask? Because that's not what
existence is, and it's easier and clearer to have containment of
nested objects and arrays handled by the higher-level containment
function (once we find the value from the pair, to pass back to it).
There are a number of things in tension here. A further factor is the
desire to avoid redundant code. Now, I guess you could make the case
that the handling of the JB_ARRAY and JB_OBJECT cases could be broken
out, but that isn't obviously true, since that creates redundancy for
the majority of callers that only care about existence.

If you're suggesting that the JB_ARRAY and JB_OBJECT cases within that
function are redundant, well, they're not; I'm iterating element-wise
for the former and pairwise for the latter. I'm also returning the
value for the former, and the element (which in a certain sense is
equivalent -- the equivalent of an object/pair "value") for the
latter. Note the user-visible definition of existence if you don't
know what I mean.
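
For readers without the source at hand, the element-wise versus pairwise distinction described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: ToyPair, toy_find_in_array() and toy_find_in_object() are hypothetical names, plain strcmp() ordering stands in for whatever key ordering jsonb really uses, and none of this is the actual jsonb code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a key/value pair; not the real JsonbPair. */
typedef struct
{
	const char *key;
	int			value;
} ToyPair;

/*
 * Array case: elements are kept in document order, so scan them one by one.
 * Returns the index of the first matching element, or -1 if there is none.
 */
static int
toy_find_in_array(const char *const *elems, size_t count, const char *key)
{
	size_t		i;

	for (i = 0; i < count; i++)
	{
		if (strcmp(elems[i], key) == 0)
			return (int) i;
	}
	return -1;
}

/*
 * Object case: pairs are assumed to be sorted by key (strcmp order here),
 * so binary-search the keys and hand back a pointer to the matching pair's
 * value, or NULL if the key is absent.
 */
static const int *
toy_find_in_object(const ToyPair *pairs, size_t count, const char *key)
{
	size_t		low = 0;
	size_t		high = count;

	while (low < high)
	{
		size_t		mid = low + (high - low) / 2;
		int			cmp = strcmp(pairs[mid].key, key);

		if (cmp == 0)
			return &pairs[mid].value;
		if (cmp < 0)
			low = mid + 1;
		else
			high = mid;
	}
	return NULL;
}
```

An existence test only needs to know whether either lookup found something; a containment test would go on to compare the value returned from the object case, which is why only the key is matched at this level.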

--
Peter Geoghegan

#11 Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Peter Geoghegan (#5)
1 attachment(s)
Re: Wanted: jsonb on-disk representation documentation

On 05/06/2014 11:30 PM, Peter Geoghegan wrote:

On Tue, May 6, 2014 at 12:48 PM, Andres Freund <andres@anarazel.de> wrote:

Enthusiastically seconded. I've asked for that about three times without much success. If it had been my decision, the patch wouldn't have been merged without that and other adjustments.

I'm almost certain that the only feedback of yours that I didn't
incorporate was that I didn't change the name of JsonbValue, a
decision I stand by, and also that I didn't add ascii art to
illustrate the on-disk format. I can write a patch that adds the
latter soon.

That would be great.

I found the serialization routine, convertJsonb() to be a bit of a mess.
It's maintaining a custom stack of levels, which can be handy if you
need to avoid recursion, but it's also relying on the native stack. And
I didn't understand the point of splitting it into the "walk" and "put"
functions; the division of labor between the two was far from clear
IMHO. I started refactoring that, and ended up with the attached.

One detail that I found scary is that the estSize field in JsonbValue is
not just any rough estimate. It's used in the allocation of the output
buffer for convertJsonb(), so it has to be large enough or you hit an
assertion or buffer overflow. I believe it was correct as it was, but
that kind of programming is always scary. I refactored the
convertJsonb() function to use a StringInfo buffer instead, and removed
estSize altogether.
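
The difference between a preallocated buffer sized from an estimate and a buffer that grows on demand can be sketched roughly as follows. This is a stand-alone illustration, not the patch itself: ToyBuf, toy_init(), toy_reserve() and toy_patch_u32() are hypothetical names, and malloc/realloc stand in for palloc and StringInfo.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; a stand-in for StringInfo. */
typedef struct
{
	char	   *data;
	size_t		len;			/* bytes in use */
	size_t		cap;			/* bytes allocated */
} ToyBuf;

static void
toy_init(ToyBuf *buf)
{
	buf->cap = 16;
	buf->len = 0;
	buf->data = malloc(buf->cap);
	if (buf->data == NULL)
		abort();
}

/*
 * Reserve 'n' bytes at the end of the buffer, growing it if needed.
 * Returns an offset rather than a pointer: a later reserve may realloc and
 * move the data, which would leave a saved pointer dangling.
 */
static size_t
toy_reserve(ToyBuf *buf, size_t n)
{
	size_t		offset = buf->len;

	if (buf->len + n > buf->cap)
	{
		while (buf->len + n > buf->cap)
			buf->cap *= 2;
		buf->data = realloc(buf->data, buf->cap);
		if (buf->data == NULL)
			abort();
	}
	buf->len += n;
	return offset;
}

/* Fill in a previously reserved 4-byte slot, e.g. a header known only later. */
static void
toy_patch_u32(ToyBuf *buf, size_t offset, uint32_t value)
{
	memcpy(buf->data + offset, &value, sizeof(value));
}
```

With this shape no caller has to guess the final size up front: a header slot is reserved first, the payload is appended, and the header is patched through its offset once the total length is known.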

This is still work-in-progress, but I thought I'd post this now to let
people know I'm working on it. For example, the StringInfo isn't
actually very well suited for this purpose, it might be better to have a
custom buffer that's enlarged when needed.

For my own sanity, I started writing some docs on the on-disk format.
See the comments in jsonb.h for my understanding of it. I moved around
the structs a bit in jsonb.h, to make the format clearer, but the actual
on-disk format is unchanged.
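
As a rough reading aid, here is how the container layout appears from the code quoted earlier in the thread and from the attached patch: a 32-bit header holding a count plus type flags, followed by an array of entry words (one per array element, two per object pair), followed by the data area, with each entry recording where its data ends. The constants and toy_* helpers below are assumptions made for illustration (the bit values in particular should be checked against jsonb.h), and alignment padding is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit assignments, for illustration only; check jsonb.h. */
#define TOY_CMASK		0x0FFFFFFFu	/* low bits: element or pair count */
#define TOY_FOBJECT		0x20000000u	/* container is an object */
#define TOY_FARRAY		0x40000000u	/* container is an array */
#define TOY_POSMASK		0x0FFFFFFFu	/* low bits of an entry: end offset */

/* Number of elements (array) or pairs (object) in a container header. */
static uint32_t
toy_count(uint32_t header)
{
	return header & TOY_CMASK;
}

/* Entry words after the header: an object stores a key and a value per pair. */
static uint32_t
toy_num_entries(uint32_t header)
{
	uint32_t	n = toy_count(header);

	return (header & TOY_FOBJECT) ? n * 2 : n;
}

/* Start offset of entry i in the data area = end offset of entry i - 1. */
static uint32_t
toy_entry_off(const uint32_t *entries, uint32_t i)
{
	return (i == 0) ? 0 : (entries[i - 1] & TOY_POSMASK);
}

/* Length of entry i = its end offset minus its start offset. */
static uint32_t
toy_entry_len(const uint32_t *entries, uint32_t i)
{
	return (entries[i] & TOY_POSMASK) - toy_entry_off(entries, i);
}
```

Storing cumulative end offsets means a zero-length value such as null or a boolean simply repeats the previous end offset, and the data for entry i can be located from entries i - 1 and i alone.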

- Heikki

Attachments:

jsonb-cleanup-1.patch (text/x-diff)
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index cf5d6f2..8413da7 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -65,7 +65,7 @@ jsonb_recv(PG_FUNCTION_ARGS)
 	if (version == 1)
 		str = pq_getmsgtext(buf, buf->len - buf->cursor, &nbytes);
 	else
-		elog(ERROR, "Unsupported jsonb version number %d", version);
+		elog(ERROR, "unsupported jsonb version number %d", version);
 
 	return jsonb_from_cstring(str, nbytes);
 }
@@ -249,7 +249,6 @@ jsonb_in_object_field_start(void *pstate, char *fname, bool isnull)
 	v.type = jbvString;
 	v.val.string.len = checkStringLen(strlen(fname));
 	v.val.string.val = pnstrdup(fname, v.val.string.len);
-	v.estSize = sizeof(JEntry) + v.val.string.len;
 
 	_state->res = pushJsonbValue(&_state->parseState, WJB_KEY, &v);
 }
@@ -290,8 +289,6 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 	JsonbInState *_state = (JsonbInState *) pstate;
 	JsonbValue	v;
 
-	v.estSize = sizeof(JEntry);
-
 	switch (tokentype)
 	{
 
@@ -300,7 +297,6 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 			v.type = jbvString;
 			v.val.string.len = checkStringLen(strlen(token));
 			v.val.string.val = pnstrdup(token, v.val.string.len);
-			v.estSize += v.val.string.len;
 			break;
 		case JSON_TOKEN_NUMBER:
 
@@ -312,7 +308,6 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 			v.type = jbvNumeric;
 			v.val.numeric = DatumGetNumeric(DirectFunctionCall3(numeric_in, CStringGetDatum(token), 0, -1));
 
-			v.estSize += VARSIZE_ANY(v.val.numeric) +sizeof(JEntry) /* alignment */ ;
 			break;
 		case JSON_TOKEN_TRUE:
 			v.type = jbvBool;
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 1caaa4a..49a1d4d 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -37,49 +37,16 @@
 #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), \
 							 JENTRY_POSMASK))
 
-/*
- * State used while converting an arbitrary JsonbValue into a Jsonb value
- * (4-byte varlena uncompressed representation of a Jsonb)
- *
- * ConvertLevel:  Bookkeeping around particular level when converting.
- */
-typedef struct convertLevel
-{
-	uint32		i;				/* Iterates once per element, or once per pair */
-	uint32	   *header;			/* Pointer to current container header */
-	JEntry	   *meta;			/* This level's metadata */
-	char	   *begin;			/* Pointer into convertState.buffer */
-} convertLevel;
-
-/*
- * convertState:  Overall bookkeeping state for conversion
- */
-typedef struct convertState
-{
-	/* Preallocated buffer in which to form varlena/Jsonb value */
-	Jsonb	   *buffer;
-	/* Pointer into buffer */
-	char	   *ptr;
-
-	/* State for  */
-	convertLevel *allState,		/* Overall state array */
-			   *contPtr;		/* Cur container pointer (in allState) */
-
-	/* Current size of buffer containing allState array */
-	Size		levelSz;
-
-} convertState;
-
 static int	compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
 static int	lexicalCompareJsonbStringValue(const void *a, const void *b);
-static Size convertJsonb(JsonbValue *val, Jsonb *buffer);
-static inline short addPaddingInt(convertState *cstate);
-static void walkJsonbValueConversion(JsonbValue *val, convertState *cstate,
-						 uint32 nestlevel);
-static void putJsonbValueConversion(convertState *cstate, JsonbValue *val,
-						uint32 flags, uint32 level);
-static void putScalarConversion(convertState *cstate, JsonbValue *scalarVal,
-					uint32 level, uint32 i);
+static Jsonb *convertJsonb(JsonbValue *val);
+static inline short addPaddingInt(StringInfo buffer);
+static void walkJsonbValueConversion(JsonbValue *val, StringInfo buffer, JEntry *header, int level);
+static void walkJsonbArrayConversion(JsonbValue *val, StringInfo buffer, JEntry *header, int level);
+static void walkJsonbObjectConversion(JsonbValue *val, StringInfo buffer, JEntry *header, int level);
+static void putScalarConversion(StringInfo buffer, JsonbValue *scalarVal,
+								JEntry *header);
+static int reserveStringInfo(StringInfo str, int datalen);
 static void iteratorFromContainerBuf(JsonbIterator *it, char *buffer);
 static bool formIterIsContainer(JsonbIterator **it, JsonbValue *val,
 					JEntry *ent, bool skipNested);
@@ -110,7 +77,6 @@ Jsonb *
 JsonbValueToJsonb(JsonbValue *val)
 {
 	Jsonb	   *out;
-	Size		sz;
 
 	if (IsAJsonbScalar(val))
 	{
@@ -127,17 +93,11 @@ JsonbValueToJsonb(JsonbValue *val)
 		pushJsonbValue(&pstate, WJB_ELEM, val);
 		res = pushJsonbValue(&pstate, WJB_END_ARRAY, NULL);
 
-		out = palloc(VARHDRSZ + res->estSize);
-		sz = convertJsonb(res, out);
-		Assert(sz <= res->estSize);
-		SET_VARSIZE(out, sz + VARHDRSZ);
+		out = convertJsonb(res);
 	}
 	else if (val->type == jbvObject || val->type == jbvArray)
 	{
-		out = palloc(VARHDRSZ + val->estSize);
-		sz = convertJsonb(val, out);
-		Assert(sz <= val->estSize);
-		SET_VARSIZE(out, VARHDRSZ + sz);
+		out = convertJsonb(val);
 	}
 	else
 	{
@@ -337,28 +297,22 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 			if (JBE_ISNULL(*e) && key->type == jbvNull)
 			{
 				result->type = jbvNull;
-				result->estSize = sizeof(JEntry);
 			}
 			else if (JBE_ISSTRING(*e) && key->type == jbvString)
 			{
 				result->type = jbvString;
 				result->val.string.val = data + JBE_OFF(*e);
 				result->val.string.len = JBE_LEN(*e);
-				result->estSize = sizeof(JEntry) + result->val.string.len;
 			}
 			else if (JBE_ISNUMERIC(*e) && key->type == jbvNumeric)
 			{
 				result->type = jbvNumeric;
 				result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));
-
-				result->estSize = 2 * sizeof(JEntry) +
-					VARSIZE_ANY(result->val.numeric);
 			}
 			else if (JBE_ISBOOL(*e) && key->type == jbvBool)
 			{
 				result->type = jbvBool;
 				result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
-				result->estSize = sizeof(JEntry);
 			}
 			else
 				continue;
@@ -395,7 +349,6 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 			candidate.type = jbvString;
 			candidate.val.string.val = data + JBE_OFF(*entry);
 			candidate.val.string.len = JBE_LEN(*entry);
-			candidate.estSize = sizeof(JEntry) + candidate.val.string.len;
 
 			difference = lengthCompareJsonbStringValue(&candidate, key, NULL);
 
@@ -410,28 +363,22 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 				if (JBE_ISNULL(*v))
 				{
 					result->type = jbvNull;
-					result->estSize = sizeof(JEntry);
 				}
 				else if (JBE_ISSTRING(*v))
 				{
 					result->type = jbvString;
 					result->val.string.val = data + JBE_OFF(*v);
 					result->val.string.len = JBE_LEN(*v);
-					result->estSize = sizeof(JEntry) + result->val.string.len;
 				}
 				else if (JBE_ISNUMERIC(*v))
 				{
 					result->type = jbvNumeric;
 					result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*v)));
-
-					result->estSize = 2 * sizeof(JEntry) +
-						VARSIZE_ANY(result->val.numeric);
 				}
 				else if (JBE_ISBOOL(*v))
 				{
 					result->type = jbvBool;
 					result->val.boolean = JBE_ISBOOL_TRUE(*v) != 0;
-					result->estSize = sizeof(JEntry);
 				}
 				else
 				{
@@ -443,7 +390,6 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 					result->val.binary.data = data + INTALIGN(JBE_OFF(*v));
 					result->val.binary.len = JBE_LEN(*v) -
 						(INTALIGN(JBE_OFF(*v)) - JBE_OFF(*v));
-					result->estSize = 2 * sizeof(JEntry) + result->val.binary.len;
 				}
 
 				return result;
@@ -500,34 +446,28 @@ getIthJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 i)
 	if (JBE_ISNULL(*e))
 	{
 		result->type = jbvNull;
-		result->estSize = sizeof(JEntry);
 	}
 	else if (JBE_ISSTRING(*e))
 	{
 		result->type = jbvString;
 		result->val.string.val = data + JBE_OFF(*e);
 		result->val.string.len = JBE_LEN(*e);
-		result->estSize = sizeof(JEntry) + result->val.string.len;
 	}
 	else if (JBE_ISNUMERIC(*e))
 	{
 		result->type = jbvNumeric;
 		result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));
-
-		result->estSize = 2 * sizeof(JEntry) + VARSIZE_ANY(result->val.numeric);
 	}
 	else if (JBE_ISBOOL(*e))
 	{
 		result->type = jbvBool;
 		result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
-		result->estSize = sizeof(JEntry);
 	}
 	else
 	{
 		result->type = jbvBinary;
 		result->val.binary.data = data + INTALIGN(JBE_OFF(*e));
 		result->val.binary.len = JBE_LEN(*e) - (INTALIGN(JBE_OFF(*e)) - JBE_OFF(*e));
-		result->estSize = result->val.binary.len + 2 * sizeof(JEntry);
 	}
 
 	return result;
@@ -558,7 +498,6 @@ pushJsonbValue(JsonbParseState **pstate, int seq, JsonbValue *scalarVal)
 			*pstate = pushState(pstate);
 			result = &(*pstate)->contVal;
 			(*pstate)->contVal.type = jbvArray;
-			(*pstate)->contVal.estSize = 3 * sizeof(JEntry);
 			(*pstate)->contVal.val.array.nElems = 0;
 			(*pstate)->contVal.val.array.rawScalar = (scalarVal &&
 											 scalarVal->val.array.rawScalar);
@@ -580,7 +519,6 @@ pushJsonbValue(JsonbParseState **pstate, int seq, JsonbValue *scalarVal)
 			*pstate = pushState(pstate);
 			result = &(*pstate)->contVal;
 			(*pstate)->contVal.type = jbvObject;
-			(*pstate)->contVal.estSize = 3 * sizeof(JEntry);
 			(*pstate)->contVal.val.object.nPairs = 0;
 			(*pstate)->size = 4;
 			(*pstate)->contVal.val.object.pairs = palloc(sizeof(JsonbPair) *
@@ -1209,244 +1147,209 @@ lexicalCompareJsonbStringValue(const void *a, const void *b)
 					  vb->val.string.len, DEFAULT_COLLATION_OID);
 }
 
+
 /*
- * Given a JsonbValue, convert to Jsonb and store in preallocated Jsonb buffer
- * sufficiently large to fit the value
+ * Reserve 'datalen' bytes at the end of StringInfo, enlarging the underlying
+ * buffer if necessary. Returns the offset to the reserved area.
  */
-static Size
-convertJsonb(JsonbValue *val, Jsonb *buffer)
+static int
+reserveStringInfo(StringInfo str, int datalen)
 {
-	convertState state;
-	Size		len;
+	int		offset;
 
-	/* Should not already have binary representation */
-	Assert(val->type != jbvBinary);
+	/* Make more room if needed */
+	enlargeStringInfo(str, datalen);
 
-	state.buffer = buffer;
-	/* Start from superheader */
-	state.ptr = VARDATA(state.buffer);
-	state.levelSz = 8;
-	state.allState = palloc(sizeof(convertLevel) * state.levelSz);
+	/* remember current offset */
+	offset = str->len;
 
-	walkJsonbValueConversion(val, &state, 0);
+	/* reserve the space */
+	str->len += datalen;
 
-	len = state.ptr - VARDATA(state.buffer);
+	/* XXX: we don't bother with the trailing null that normal StringInfo
+	 * functions append */
 
-	Assert(len <= val->estSize);
-	return len;
+	return offset;
 }
 
 /*
- * Walk the tree representation of Jsonb, as part of the process of converting
- * a JsonbValue to a Jsonb.
- *
- * This high-level function takes care of recursion into sub-containers, but at
- * the top level calls putJsonbValueConversion once per sequential processing
- * token (in a manner similar to generic iteration).
+ * Given a JsonbValue, convert to Jsonb. The result is palloc'd.
  */
-static void
-walkJsonbValueConversion(JsonbValue *val, convertState *cstate,
-						 uint32 nestlevel)
+static Jsonb *
+convertJsonb(JsonbValue *val)
 {
-	int			i;
+	StringInfoData buffer;
+	JEntry		header;
+	Jsonb	   *res;
 
-	check_stack_depth();
+	initStringInfo(&buffer);
 
-	if (!val)
-		return;
+	/* Should not already have binary representation */
+	Assert(val->type != jbvBinary);
 
-	switch (val->type)
-	{
-		case jbvArray:
+	/* Reserve the Jsonb header */
+	reserveStringInfo(&buffer, sizeof(VARHDRSZ));
 
-			putJsonbValueConversion(cstate, val, WJB_BEGIN_ARRAY, nestlevel);
-			for (i = 0; i < val->val.array.nElems; i++)
-			{
-				if (IsAJsonbScalar(&val->val.array.elems[i]) ||
-					val->val.array.elems[i].type == jbvBinary)
-					putJsonbValueConversion(cstate, val->val.array.elems + i,
-											WJB_ELEM, nestlevel);
-				else
-					walkJsonbValueConversion(val->val.array.elems + i, cstate,
-											 nestlevel + 1);
-			}
-			putJsonbValueConversion(cstate, val, WJB_END_ARRAY, nestlevel);
+	walkJsonbValueConversion(val, &buffer, &header, 0);
 
-			break;
-		case jbvObject:
+	res = (Jsonb *) buffer.data;
 
-			putJsonbValueConversion(cstate, val, WJB_BEGIN_OBJECT, nestlevel);
-			for (i = 0; i < val->val.object.nPairs; i++)
-			{
-				putJsonbValueConversion(cstate, &val->val.object.pairs[i].key,
-										WJB_KEY, nestlevel);
-
-				if (IsAJsonbScalar(&val->val.object.pairs[i].value) ||
-					val->val.object.pairs[i].value.type == jbvBinary)
-					putJsonbValueConversion(cstate,
-											&val->val.object.pairs[i].value,
-											WJB_VALUE, nestlevel);
-				else
-					walkJsonbValueConversion(&val->val.object.pairs[i].value,
-											 cstate, nestlevel + 1);
-			}
-			putJsonbValueConversion(cstate, val, WJB_END_OBJECT, nestlevel);
+	SET_VARSIZE(res, buffer.len);
 
-			break;
-		default:
-			elog(ERROR, "unknown type of jsonb container");
-	}
+	return res;
 }
 
 /*
- * walkJsonbValueConversion() worker.  Add padding sufficient to int-align our
- * access to conversion buffer.
+ * Convert a single JsonbValue to a Jsonb node. 
+ *
+ * The value is written out to 'buffer'. The JEntry header for this node is
+ * returned in *header. It is filled in with the length of this value, but if
+ * it is stored in an array or an object (which is always, except for the root
+ * node), it is the caller's responsibility to adjust it with the offset
+ * within the container.
+ *
+ * If the value is an array or an object, this recurses. 'level' is only used
+ * for debugging purposes.
  */
-static inline
-short
-addPaddingInt(convertState *cstate)
+static void
+walkJsonbValueConversion(JsonbValue *val, StringInfo buffer, JEntry *header, int level)
 {
-	short		padlen,
-				p;
-
-	padlen = INTALIGN(cstate->ptr - VARDATA(cstate->buffer)) -
-		(cstate->ptr - VARDATA(cstate->buffer));
+	check_stack_depth();
 
-	for (p = padlen; p > 0; p--)
-	{
-		*cstate->ptr = '\0';
-		cstate->ptr++;
-	}
+	if (!val)
+		return;
 
-	return padlen;
+	if (IsAJsonbScalar(val) || val->type == jbvBinary)
+		putScalarConversion(buffer, val, header);
+	else if (val->type == jbvArray)
+		walkJsonbArrayConversion(val, buffer, header, level);
+	else if (val->type == jbvObject)
+		walkJsonbObjectConversion(val, buffer, header, level);
+	else
+		elog(ERROR, "unknown type of jsonb container");
 }
 
-/*
- * walkJsonbValueConversion() worker.
- *
- * As part of the process of converting an arbitrary JsonbValue to a Jsonb,
- * copy over an arbitrary individual JsonbValue.  This function may copy any
- * type of value, even containers (Objects/arrays).  However, it is not
- * responsible for recursive aspects of walking the tree (so only top-level
- * Object/array details are handled).
- *
- * No details about their keys/values/elements are handled recursively -
- * rather, the function is called as required for the start of an Object/Array,
- * and the end (i.e.  there is one call per sequential processing WJB_* token).
- */
 static void
-putJsonbValueConversion(convertState *cstate, JsonbValue *val, uint32 flags,
-						uint32 level)
+walkJsonbArrayConversion(JsonbValue *val, StringInfo buffer, JEntry *pheader, int level)
 {
-	if (level == cstate->levelSz)
+	int			offset;
+	int			metaoffset;
+	int			i;
+	int			totallen;
+	JEntry		header;
+
+	/* Initialize pointer into conversion buffer at this level */
+	offset = buffer->len;
+
+	addPaddingInt(buffer);
+
+	/*
+	 * Construct the header Jentry, stored in the beginning of the variable-
+	 * length payload.
+	 */
+	header.header = val->val.array.nElems | JB_FARRAY;
+	if (val->val.array.rawScalar)
 	{
-		cstate->levelSz *= 2;
-		cstate->allState = repalloc(cstate->allState,
-									sizeof(convertLevel) * cstate->levelSz);
+		Assert(val->val.array.nElems == 1);
+		Assert(level == 0);
+		header.header |= JB_FSCALAR;
 	}
 
-	cstate->contPtr = cstate->allState + level;
+	appendBinaryStringInfo(buffer, (char *) &header, sizeof(uint32));
+	/* reserve space for the JEntries of the elements. */
+	metaoffset = reserveStringInfo(buffer, sizeof(JEntry) * val->val.array.nElems);
 
-	if (flags & (WJB_BEGIN_ARRAY | WJB_BEGIN_OBJECT))
+	totallen = 0;
+	for (i = 0; i < val->val.array.nElems; i++)
 	{
-		Assert(((flags & WJB_BEGIN_ARRAY) && val->type == jbvArray) ||
-			   ((flags & WJB_BEGIN_OBJECT) && val->type == jbvObject));
+		JsonbValue *elem = &val->val.array.elems[i];
+		int len;
+		JEntry meta;
+
+		walkJsonbValueConversion(elem, buffer, &meta, level + 1);
+		len = meta.header & JENTRY_POSMASK;
+		totallen += len;
+		if (i == 0)
+			meta.header |= JENTRY_ISFIRST;
+		else
+			meta.header = (meta.header & ~JENTRY_POSMASK) | totallen;
+		memcpy(&buffer->data[metaoffset + sizeof(JEntry) * i], &meta, sizeof(JEntry));
+	}
 
-		/* Initialize pointer into conversion buffer at this level */
-		cstate->contPtr->begin = cstate->ptr;
+	totallen = buffer->len - offset;
 
-		addPaddingInt(cstate);
+	/* Initialize the header of this node, in the container's JEntry array */
+	pheader->header = JENTRY_ISNEST | totallen;
+}
 
-		/* Initialize everything else at this level */
-		cstate->contPtr->header = (uint32 *) cstate->ptr;
-		/* Advance past header */
-		cstate->ptr += sizeof(uint32);
-		cstate->contPtr->meta = (JEntry *) cstate->ptr;
-		cstate->contPtr->i = 0;
+static void
+walkJsonbObjectConversion(JsonbValue *val, StringInfo buffer, JEntry *pheader, int level)
+{
+	JEntry		header;
+	int			offset;
+	int			metaoffset;
+	int			i;
+	int			totallen;
 
-		if (val->type == jbvArray)
-		{
-			*cstate->contPtr->header = val->val.array.nElems | JB_FARRAY;
-			cstate->ptr += sizeof(JEntry) * val->val.array.nElems;
+	/* Initialize pointer into conversion buffer at this level */
+	offset = buffer->len;
 
-			if (val->val.array.rawScalar)
-			{
-				Assert(val->val.array.nElems == 1);
-				Assert(level == 0);
-				*cstate->contPtr->header |= JB_FSCALAR;
-			}
-		}
-		else
-		{
-			*cstate->contPtr->header = val->val.object.nPairs | JB_FOBJECT;
-			cstate->ptr += sizeof(JEntry) * val->val.object.nPairs * 2;
-		}
-	}
-	else if (flags & WJB_ELEM)
-	{
-		putScalarConversion(cstate, val, level, cstate->contPtr->i);
-		cstate->contPtr->i++;
-	}
-	else if (flags & WJB_KEY)
-	{
-		Assert(val->type == jbvString);
+	addPaddingInt(buffer);
 
-		putScalarConversion(cstate, val, level, cstate->contPtr->i * 2);
-	}
-	else if (flags & WJB_VALUE)
-	{
-		putScalarConversion(cstate, val, level, cstate->contPtr->i * 2 + 1);
-		cstate->contPtr->i++;
-	}
-	else if (flags & (WJB_END_ARRAY | WJB_END_OBJECT))
-	{
-		convertLevel *prevPtr;	/* Prev container pointer */
-		uint32		len,
-					i;
+	/* Initialize header */
+	header.header = val->val.object.nPairs | JB_FOBJECT;
+	appendBinaryStringInfo(buffer, (char *) &header, sizeof(uint32));
 
-		Assert(((flags & WJB_END_ARRAY) && val->type == jbvArray) ||
-			   ((flags & WJB_END_OBJECT) && val->type == jbvObject));
+	/* reserve space for the JEntries of the keys and values */
+	metaoffset = reserveStringInfo(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);
 
-		if (level == 0)
-			return;
+	totallen = 0;
+	for (i = 0; i < val->val.object.nPairs; i++)
+	{
+		JsonbPair *pair = &val->val.object.pairs[i];
+		int len;
+		JEntry meta;
 
-		len = cstate->ptr - (char *) cstate->contPtr->begin;
+		/* put key */
+		putScalarConversion(buffer, &pair->key, &meta);
 
-		prevPtr = cstate->contPtr - 1;
+		len = meta.header & JENTRY_POSMASK;
+		totallen += len;
+		if (i == 0)
+			meta.header |= JENTRY_ISFIRST;
+		else
+			meta.header = (meta.header & ~JENTRY_POSMASK) | totallen;
+		memcpy(&buffer->data[metaoffset + sizeof(JEntry) * (i * 2)], &meta, sizeof(JEntry));
+
+		walkJsonbValueConversion(&pair->value, buffer, &meta, level);
+		len = meta.header & JENTRY_POSMASK;
+		totallen += len;
+		meta.header = (meta.header & ~JENTRY_POSMASK) | totallen;
+		memcpy(&buffer->data[metaoffset + sizeof(JEntry) * (i * 2 + 1)], &meta, sizeof(JEntry));
+	}
 
-		if (*prevPtr->header & JB_FARRAY)
-		{
-			i = prevPtr->i;
+	totallen = buffer->len - offset;
 
-			prevPtr->meta[i].header = JENTRY_ISNEST;
+	pheader->header = JENTRY_ISNEST | totallen;
+}
 
-			if (i == 0)
-				prevPtr->meta[0].header |= JENTRY_ISFIRST | len;
-			else
-				prevPtr->meta[i].header |=
-					(prevPtr->meta[i - 1].header & JENTRY_POSMASK) + len;
-		}
-		else if (*prevPtr->header & JB_FOBJECT)
-		{
-			i = 2 * prevPtr->i + 1;		/* Value, not key */
+/*
+ * Append padding, so that the length of the StringInfo is int-aligned.
+ * Returns the number of padding bytes appended.
+ */
+static inline
+short
+addPaddingInt(StringInfo buffer)
+{
+	short		padlen,
+				p;
 
-			prevPtr->meta[i].header = JENTRY_ISNEST;
+	padlen = INTALIGN(buffer->len) - buffer->len;
 
-			prevPtr->meta[i].header |=
-				(prevPtr->meta[i - 1].header & JENTRY_POSMASK) + len;
-		}
-		else
-		{
-			elog(ERROR, "invalid jsonb container type");
-		}
+	for (p = 0; p < padlen; p++)
+		appendStringInfoChar(buffer, '\0');
 
-		Assert(cstate->ptr - cstate->contPtr->begin <= val->estSize);
-		prevPtr->i++;
-	}
-	else
-	{
-		elog(ERROR, "unknown flag encountered during jsonb tree walk");
-	}
+	return padlen;
 }
 
 /*
@@ -1456,64 +1359,42 @@ putJsonbValueConversion(convertState *cstate, JsonbValue *val, uint32 flags,
  * This is a worker function for putJsonbValueConversion() (itself a worker for
  * walkJsonbValueConversion()).  It handles the details with regard to Jentry
  * metadata peculiar to each scalar type.
+ *
+ * It is the caller's responsibility to shift the offset if this is stored
+ * in an array or object.
  */
 static void
-putScalarConversion(convertState *cstate, JsonbValue *scalarVal, uint32 level,
-					uint32 i)
+putScalarConversion(StringInfo buffer, JsonbValue *scalarVal, JEntry *header)
 {
 	int			numlen;
 	short		padlen;
 
-	cstate->contPtr = cstate->allState + level;
-
-	if (i == 0)
-		cstate->contPtr->meta[0].header = JENTRY_ISFIRST;
-	else
-		cstate->contPtr->meta[i].header = 0;
-
 	switch (scalarVal->type)
 	{
 		case jbvNull:
-			cstate->contPtr->meta[i].header |= JENTRY_ISNULL;
-
-			if (i > 0)
-				cstate->contPtr->meta[i].header |=
-					cstate->contPtr->meta[i - 1].header & JENTRY_POSMASK;
+			header->header = JENTRY_ISNULL;
 			break;
+
 		case jbvString:
-			memcpy(cstate->ptr, scalarVal->val.string.val, scalarVal->val.string.len);
-			cstate->ptr += scalarVal->val.string.len;
+			appendBinaryStringInfo(buffer, scalarVal->val.string.val, scalarVal->val.string.len);
 
-			if (i == 0)
-				cstate->contPtr->meta[0].header |= scalarVal->val.string.len;
-			else
-				cstate->contPtr->meta[i].header |=
-					(cstate->contPtr->meta[i - 1].header & JENTRY_POSMASK) +
-					scalarVal->val.string.len;
+			header->header = scalarVal->val.string.len;
 			break;
+
 		case jbvNumeric:
 			numlen = VARSIZE_ANY(scalarVal->val.numeric);
-			padlen = addPaddingInt(cstate);
+			padlen = addPaddingInt(buffer);
 
-			memcpy(cstate->ptr, scalarVal->val.numeric, numlen);
-			cstate->ptr += numlen;
+			appendBinaryStringInfo(buffer, (char *) scalarVal->val.numeric, numlen);
 
-			cstate->contPtr->meta[i].header |= JENTRY_ISNUMERIC;
-			if (i == 0)
-				cstate->contPtr->meta[0].header |= padlen + numlen;
-			else
-				cstate->contPtr->meta[i].header |=
-					(cstate->contPtr->meta[i - 1].header & JENTRY_POSMASK)
-					+ padlen + numlen;
+			header->header = JENTRY_ISNUMERIC | (padlen + numlen);
 			break;
+
 		case jbvBool:
-			cstate->contPtr->meta[i].header |= (scalarVal->val.boolean) ?
+			header->header = (scalarVal->val.boolean) ?
 				JENTRY_ISTRUE : JENTRY_ISFALSE;
-
-			if (i > 0)
-				cstate->contPtr->meta[i].header |=
-					cstate->contPtr->meta[i - 1].header & JENTRY_POSMASK;
 			break;
+
 		default:
 			elog(ERROR, "invalid jsonb scalar type");
 	}
@@ -1584,7 +1465,6 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 	if (JBE_ISNULL(*ent))
 	{
 		val->type = jbvNull;
-		val->estSize = sizeof(JEntry);
 
 		return false;
 	}
@@ -1593,7 +1473,6 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 		val->type = jbvString;
 		val->val.string.val = (*it)->dataProper + JBE_OFF(*ent);
 		val->val.string.len = JBE_LEN(*ent);
-		val->estSize = sizeof(JEntry) + val->val.string.len;
 
 		return false;
 	}
@@ -1602,15 +1481,12 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 		val->type = jbvNumeric;
 		val->val.numeric = (Numeric) ((*it)->dataProper + INTALIGN(JBE_OFF(*ent)));
 
-		val->estSize = 2 * sizeof(JEntry) + VARSIZE_ANY(val->val.numeric);
-
 		return false;
 	}
 	else if (JBE_ISBOOL(*ent))
 	{
 		val->type = jbvBool;
 		val->val.boolean = JBE_ISBOOL_TRUE(*ent) != 0;
-		val->estSize = sizeof(JEntry);
 
 		return false;
 	}
@@ -1619,7 +1495,6 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 		val->type = jbvBinary;
 		val->val.binary.data = (*it)->dataProper + INTALIGN(JBE_OFF(*ent));
 		val->val.binary.len = JBE_LEN(*ent) - (INTALIGN(JBE_OFF(*ent)) - JBE_OFF(*ent));
-		val->estSize = val->val.binary.len + 2 * sizeof(JEntry);
 
 		return false;
 	}
@@ -1694,8 +1569,6 @@ appendKey(JsonbParseState *pstate, JsonbValue *string)
 
 	object->val.object.pairs[object->val.object.nPairs].key = *string;
 	object->val.object.pairs[object->val.object.nPairs].order = object->val.object.nPairs;
-
-	object->estSize += string->estSize;
 }
 
 /*
@@ -1710,7 +1583,6 @@ appendValue(JsonbParseState *pstate, JsonbValue *scalarVal)
 	Assert(object->type == jbvObject);
 
 	object->val.object.pairs[object->val.object.nPairs++].value = *scalarVal;
-	object->estSize += scalarVal->estSize;
 }
 
 /*
@@ -1737,7 +1609,6 @@ appendElement(JsonbParseState *pstate, JsonbValue *scalarVal)
 	}
 
 	array->val.array.elems[array->val.array.nElems++] = *scalarVal;
-	array->estSize += scalarVal->estSize;
 }
 
 /*
@@ -1832,11 +1703,7 @@ uniqueifyJsonbObject(JsonbValue *object)
 		while (ptr - object->val.object.pairs < object->val.object.nPairs)
 		{
 			/* Avoid copying over duplicate */
-			if (lengthCompareJsonbStringValue(ptr, res, NULL) == 0)
-			{
-				object->estSize -= ptr->key.estSize + ptr->value.estSize;
-			}
-			else
+			if (lengthCompareJsonbStringValue(ptr, res, NULL) != 0)
 			{
 				res++;
 				if (ptr != res)
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 6b1ce9b..a9fedcb 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -1477,7 +1477,7 @@ each_worker_jsonb(FunctionCallInfo fcinfo, bool as_text)
 						StringInfo	jtext = makeStringInfo();
 						Jsonb	   *jb = JsonbValueToJsonb(&v);
 
-						(void) JsonbToCString(jtext, VARDATA(jb), 2 * v.estSize);
+						(void) JsonbToCString(jtext, VARDATA(jb), 0);
 						sv = cstring_to_text_with_len(jtext->data, jtext->len);
 					}
 
@@ -1797,7 +1797,7 @@ elements_worker_jsonb(FunctionCallInfo fcinfo, bool as_text)
 						StringInfo	jtext = makeStringInfo();
 						Jsonb	   *jb = JsonbValueToJsonb(&v);
 
-						(void) JsonbToCString(jtext, VARDATA(jb), 2 * v.estSize);
+						(void) JsonbToCString(jtext, VARDATA(jb), 0);
 						sv = cstring_to_text_with_len(jtext->data, jtext->len);
 					}
 
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index dea64ad..2e62c25 100644
--- a/src/include/utils/jsonb.h
+++ b/src/include/utils/jsonb.h
@@ -16,17 +16,6 @@
 #include "utils/array.h"
 #include "utils/numeric.h"
 
-/*
- * JB_CMASK is used to extract count of items
- *
- * It's not possible to get more than 2^28 items into an Jsonb.
- */
-#define JB_CMASK				0x0FFFFFFF
-
-#define JB_FSCALAR				0x10000000
-#define JB_FOBJECT				0x20000000
-#define JB_FARRAY				0x40000000
-
 /* Get information on varlena Jsonb */
 #define JB_ROOT_COUNT(jbp_)		( *(uint32*) VARDATA(jbp_) & JB_CMASK)
 #define JB_ROOT_IS_SCALAR(jbp_) ( *(uint32*) VARDATA(jbp_) & JB_FSCALAR)
@@ -109,35 +98,88 @@ typedef char *JsonbSuperHeader;
  * representation.  Often, JsonbValues are just shims through which a Jsonb
  * buffer is accessed, but they can also be deep copied and passed around.
  *
- * We have an abstraction called a "superheader".  This is a pointer that
- * conventionally points to the first item after our 4-byte uncompressed
- * varlena header, from which we can read flags using bitwise operations.
+ * Jsonb is a tree structure. Each node in the tree consists of a JEntry
+ * header, and a variable-length content.  The JEntry header indicates what
+ * kind of a node it is, e.g. a string or an array (see JENTRY_IS* macros),
+ * and the offset and length of its variable-length portion within the
+ * container.
+ *
+ * The header and the content of a node are not stored physically together.
+ * Instead, the array or object containing the node has an array that holds
+ * the JEntry headers of all the child nodes, followed by their variable-length
+ * portions.
  *
+ * The root node is an exception; it has no parent array or object that could
+ * hold its JEntry. Hence, there is no JEntry header for the root node.
+ * It is implicitly known that the root node must be an array or an
+ * object. The content of both an array and an object begins with a uint32
+ * header field containing the number of elements, and a JB_FOBJECT or
+ * JB_FARRAY flag. By peeking into that header, we can determine which it is.
+ * When a naked scalar value needs to be stored as a Jsonb value, what we
+ * actually store is an array with one element, with the flags in the array's
+ * header field set to JB_FSCALAR | JB_FARRAY.
+ *
+ * The variable-length data of a container node, an array or an object,
+ * begins with a uint32 header. It contains the number of child nodes,
+ * and a flag indicating if it's an array or an object (JB_* macros).
+ * An array has one child node for each element, and an object has two
+ * child nodes for each key-value pair. After the uint32 header, there is
+ * an array of JEntry structs, one for each child node, followed by the
+ * variable-length data of each child.
+ *
+ * To encode the length and offset of the variable-length portion of each
+ * node in a compact way, the JEntry stores only the end offset within the
+ * variable-length portion of the container node. For the first JEntry in the
+ * container's JEntry array, that equals the length of the node data. For
+ * convenience, the JENTRY_ISFIRST flag is set. The begin offset and length
+ * of the rest of the entries can be calculated using the end offset of the
+ * previous JEntry in the array.
+ *
+ *
+ * Alignment
+ * ---------
+ *
+ * Overall, the Jsonb struct requires 4-byte alignment. Within the struct,
+ * the variable-length portion of some node types is aligned to a 4-byte
+ * boundary, while others are not. When alignment is needed, the padding is
+ * in the beginning of the node that requires it. For example, if a numeric
+ * node is stored after a string node, so that the numeric node begins at
+ * offset 3, the variable-length portion of the numeric node will begin with
+ * one padding byte.
+ *
+ * XXX: old comment below; what does this mean?
  * Frequently, we pass a superheader reference to a function, and it doesn't
  * matter if it points to just after the start of a Jsonb, or to a temp buffer.
  */
+
 typedef struct
 {
-	int32		vl_len_;		/* varlena header (do not touch directly!) */
-	uint32		superheader;
-	/* (array of JEntry follows, size determined using uint32 superheader) */
-} Jsonb;
+	uint32		header;			/* Shares some flags with superheader */
+} JEntry;
 
 /*
- * JEntry: there is one of these for each key _and_ value for objects.  Arrays
- * have one per element.
- *
- * The position offset points to the _end_ so that we can get the length by
- * subtraction from the previous entry.  The JENTRY_ISFIRST flag indicates if
- * there is a previous entry.
+ * A jsonb array or object node.
+ * 
+ * An array has one child for each element. An object has two children for
+ * each key/value pair.
  */
+typedef struct JsonbContainer
+{
+	uint32		header;			/* number of children, and flags (JB_* below) */
+	JEntry		children[1];	/* variable length */
+} JsonbContainer;
+
 typedef struct
 {
-	uint32		header;			/* Shares some flags with superheader */
-} JEntry;
+	int32		vl_len_;		/* varlena header (do not touch directly!) */
+	JsonbContainer root;
+} Jsonb;
 
-#define IsAJsonbScalar(jsonbval)	((jsonbval)->type >= jbvNull && \
-									 (jsonbval)->type <= jbvBool)
+#define JB_CMASK				0x0FFFFFFF
+
+#define JB_FSCALAR				0x10000000
+#define JB_FOBJECT				0x20000000
+#define JB_FARRAY				0x40000000
 
 /*
  * JsonbValue:	In-memory representation of Jsonb.  This is a convenient
@@ -161,8 +203,6 @@ struct JsonbValue
 		jbvBinary
 	}			type;			/* Influences sort order */
 
-	int			estSize;		/* Estimated size of node (including subnodes) */
-
 	union
 	{
 		Numeric numeric;
@@ -194,6 +234,9 @@ struct JsonbValue
 	}			val;
 };
 
+#define IsAJsonbScalar(jsonbval)	((jsonbval)->type >= jbvNull && \
+									 (jsonbval)->type <= jbvBool)
+
 /*
  * Pair within an Object.
  *
#12Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Heikki Linnakangas (#11)
1 attachment(s)
Re: Wanted: jsonb on-disk representation documentation

Continuing the review, I don't like the "superheader" terminology. To
me, "super" implies that there's some other kind of header involved, and
that the superheader somehow includes it or is its parent. But it actually
seems to refer to the header field in the beginning of an array or
object value. In essence, "superheader" is used as the common term to
refer to a value that can be either an array or an object. I propose that
we change that to "container".
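
To make the term concrete, here is a minimal sketch of reading the header
word at the start of a container. The JB_* values are the ones defined in
jsonb.h; the ContainerInfo struct and decode_container_header() are made up
for this illustration and are not part of the patch.

```c
#include <stdint.h>
#include <stdbool.h>

/* Flag and mask values as defined in jsonb.h */
#define JB_CMASK	0x0FFFFFFF	/* low 28 bits: number of items */
#define JB_FSCALAR	0x10000000	/* array that wraps a raw scalar */
#define JB_FOBJECT	0x20000000
#define JB_FARRAY	0x40000000

/* Hypothetical struct, for illustration only */
typedef struct
{
	uint32_t	count;			/* elements (array) or pairs (object) */
	bool		is_array;
	bool		is_object;
	bool		is_scalar;
} ContainerInfo;

/* Hypothetical helper: split a container header word into its parts */
static ContainerInfo
decode_container_header(uint32_t header)
{
	ContainerInfo info;

	info.count = header & JB_CMASK;
	info.is_array = (header & JB_FARRAY) != 0;
	info.is_object = (header & JB_FOBJECT) != 0;
	info.is_scalar = (header & JB_FSCALAR) != 0;
	return info;
}
```

A raw scalar such as `42` would show up here as an array of one element with
both JB_FARRAY and JB_FSCALAR set.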

Noticed something funny while looking at the convertJsonb function:

postgres=# select substr(((('["' ||repeat('x', 268435455) || '", "' ||
repeat('y', 2) || '"]')::jsonb)::text), 268435455);
substr

------------------------------------------------------------------------------
-----------------------------------------------------------
xxx",
0.000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000]
(1 row)

Somehow the second string element in the array, "yy", gets turned into a
numeric. The reason is that although we check that the length of a
single string doesn't exceed the maximum of 2^28 that can be stored in
the space reserved for the length in a JEntry, there are no length
checks for the end offset stored there in an array. So if the total size
of the elements in an array exceeds 2^28, funny things like the above happen.
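
The arithmetic behind this can be sketched in a few lines. This is only an
illustration under stated assumptions: the values of JENTRY_POSMASK and
JENTRY_ISNUMERIC are assumed here (they are not shown in this message), and
both helper functions are hypothetical, not code from the patch.

```c
#include <stdint.h>

/* Assumed values; the JEntry macros are not quoted in this message */
#define JENTRY_POSMASK		0x0FFFFFFFu	/* low 28 bits: end offset */
#define JENTRY_ISNUMERIC	0x10000000u	/* first bit above the offset */

/*
 * Hypothetical helper: build a JEntry header the way unchecked code would,
 * by ORing the new end offset into the type flags without a range check.
 */
static uint32_t
unchecked_entry_header(uint32_t type_flags, uint32_t prev_end, uint32_t len)
{
	return type_flags | (prev_end + len);
}

/*
 * Hypothetical helper: returns 1 if the end offset still fits in the
 * 28 bits reserved for it, 0 otherwise.
 */
static int
end_offset_fits(uint32_t prev_end, uint32_t len)
{
	return (uint64_t) prev_end + len <= JENTRY_POSMASK;
}
```

With the query above, the first string ends at offset 268435455 (0x0FFFFFFF)
and the second adds 2 bytes, so the end offset becomes 0x10000001. Under the
assumed bit layout, the carry lands on the numeric flag, which would explain
why a string entry is read back as a numeric. A check along the lines of
end_offset_fits() before storing the offset is the kind of test that is
missing.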

Attached is a WIP patch that fixes the above, renames "superheader" to
"container", and includes the refactorings and cleanup that I posted
earlier today.

- Heikki

Attachments:

jsonb-cleanup-2.patch (text/x-diff)
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index cf5d6f2..0d667c2 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -33,7 +33,6 @@ static void jsonb_in_array_end(void *pstate);
 static void jsonb_in_object_field_start(void *pstate, char *fname, bool isnull);
 static void jsonb_put_escaped_value(StringInfo out, JsonbValue *scalarVal);
 static void jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype);
-char	   *JsonbToCString(StringInfo out, char *in, int estimated_len);
 
 /*
  * jsonb type input function
@@ -65,7 +64,7 @@ jsonb_recv(PG_FUNCTION_ARGS)
 	if (version == 1)
 		str = pq_getmsgtext(buf, buf->len - buf->cursor, &nbytes);
 	else
-		elog(ERROR, "Unsupported jsonb version number %d", version);
+		elog(ERROR, "unsupported jsonb version number %d", version);
 
 	return jsonb_from_cstring(str, nbytes);
 }
@@ -79,7 +78,7 @@ jsonb_out(PG_FUNCTION_ARGS)
 	Jsonb	   *jb = PG_GETARG_JSONB(0);
 	char	   *out;
 
-	out = JsonbToCString(NULL, VARDATA(jb), VARSIZE(jb));
+	out = JsonbToCString(NULL, (JsonbContainer *) VARDATA(jb), VARSIZE(jb));
 
 	PG_RETURN_CSTRING(out);
 }
@@ -97,7 +96,7 @@ jsonb_send(PG_FUNCTION_ARGS)
 	StringInfo	jtext = makeStringInfo();
 	int			version = 1;
 
-	(void) JsonbToCString(jtext, VARDATA(jb), VARSIZE(jb));
+	(void) JsonbToCString(jtext, (JsonbContainer *) VARDATA(jb), VARSIZE(jb));
 
 	pq_begintypsend(&buf);
 	pq_sendint(&buf, version, 1);
@@ -130,7 +129,7 @@ jsonb_typeof(PG_FUNCTION_ARGS)
 	{
 		Assert(JB_ROOT_IS_SCALAR(in));
 
-		it = JsonbIteratorInit(VARDATA_ANY(in));
+		it = JsonbIteratorInit(&in->root);
 
 		/*
 		 * A root scalar is stored as an array of one element, so we get the
@@ -249,7 +248,6 @@ jsonb_in_object_field_start(void *pstate, char *fname, bool isnull)
 	v.type = jbvString;
 	v.val.string.len = checkStringLen(strlen(fname));
 	v.val.string.val = pnstrdup(fname, v.val.string.len);
-	v.estSize = sizeof(JEntry) + v.val.string.len;
 
 	_state->res = pushJsonbValue(&_state->parseState, WJB_KEY, &v);
 }
@@ -290,8 +288,6 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 	JsonbInState *_state = (JsonbInState *) pstate;
 	JsonbValue	v;
 
-	v.estSize = sizeof(JEntry);
-
 	switch (tokentype)
 	{
 
@@ -300,7 +296,6 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 			v.type = jbvString;
 			v.val.string.len = checkStringLen(strlen(token));
 			v.val.string.val = pnstrdup(token, v.val.string.len);
-			v.estSize += v.val.string.len;
 			break;
 		case JSON_TOKEN_NUMBER:
 
@@ -312,7 +307,6 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 			v.type = jbvNumeric;
 			v.val.numeric = DatumGetNumeric(DirectFunctionCall3(numeric_in, CStringGetDatum(token), 0, -1));
 
-			v.estSize += VARSIZE_ANY(v.val.numeric) +sizeof(JEntry) /* alignment */ ;
 			break;
 		case JSON_TOKEN_TRUE:
 			v.type = jbvBool;
@@ -374,7 +368,7 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
  * if they are converting it to a text* object.
  */
 char *
-JsonbToCString(StringInfo out, JsonbSuperHeader in, int estimated_len)
+JsonbToCString(StringInfo out, JsonbContainer *in, int estimated_len)
 {
 	bool		first = true;
 	JsonbIterator *it;
diff --git a/src/backend/utils/adt/jsonb_gin.c b/src/backend/utils/adt/jsonb_gin.c
index 9f8c178..cc71e5e 100644
--- a/src/backend/utils/adt/jsonb_gin.c
+++ b/src/backend/utils/adt/jsonb_gin.c
@@ -80,7 +80,7 @@ gin_extract_jsonb(PG_FUNCTION_ARGS)
 
 	entries = (Datum *) palloc(sizeof(Datum) * total);
 
-	it = JsonbIteratorInit(VARDATA(jb));
+	it = JsonbIteratorInit(&jb->root);
 
 	while ((r = JsonbIteratorNext(&it, &v, false)) != WJB_DONE)
 	{
@@ -487,7 +487,7 @@ gin_extract_jsonb_hash(PG_FUNCTION_ARGS)
 
 	entries = (Datum *) palloc(sizeof(Datum) * total);
 
-	it = JsonbIteratorInit(VARDATA(jb));
+	it = JsonbIteratorInit(&jb->root);
 
 	tail.parent = NULL;
 	tail.hash = 0;
diff --git a/src/backend/utils/adt/jsonb_op.c b/src/backend/utils/adt/jsonb_op.c
index 38bd567..1db6382 100644
--- a/src/backend/utils/adt/jsonb_op.c
+++ b/src/backend/utils/adt/jsonb_op.c
@@ -34,10 +34,10 @@ jsonb_exists(PG_FUNCTION_ARGS)
 	kval.val.string.val = VARDATA_ANY(key);
 	kval.val.string.len = VARSIZE_ANY_EXHDR(key);
 
-	v = findJsonbValueFromSuperHeader(VARDATA(jb),
-									  JB_FOBJECT | JB_FARRAY,
-									  NULL,
-									  &kval);
+	v = findJsonbValueFromContainer(&jb->root,
+									JB_FOBJECT | JB_FARRAY,
+									NULL,
+									&kval);
 
 	PG_RETURN_BOOL(v != NULL);
 }
@@ -66,9 +66,9 @@ jsonb_exists_any(PG_FUNCTION_ARGS)
 	 */
 	for (i = 0; i < arrKey->val.array.nElems; i++)
 	{
-		if (findJsonbValueFromSuperHeader(VARDATA(jb),
-										  JB_FOBJECT | JB_FARRAY,
-										  plowbound,
+		if (findJsonbValueFromContainer(&jb->root,
+										JB_FOBJECT | JB_FARRAY,
+										plowbound,
 										arrKey->val.array.elems + i) != NULL)
 			PG_RETURN_BOOL(true);
 	}
@@ -100,9 +100,9 @@ jsonb_exists_all(PG_FUNCTION_ARGS)
 	 */
 	for (i = 0; i < arrKey->val.array.nElems; i++)
 	{
-		if (findJsonbValueFromSuperHeader(VARDATA(jb),
-										  JB_FOBJECT | JB_FARRAY,
-										  plowbound,
+		if (findJsonbValueFromContainer(&jb->root,
+										JB_FOBJECT | JB_FARRAY,
+										plowbound,
 										arrKey->val.array.elems + i) == NULL)
 			PG_RETURN_BOOL(false);
 	}
@@ -123,8 +123,8 @@ jsonb_contains(PG_FUNCTION_ARGS)
 		JB_ROOT_IS_OBJECT(val) != JB_ROOT_IS_OBJECT(tmpl))
 		PG_RETURN_BOOL(false);
 
-	it1 = JsonbIteratorInit(VARDATA(val));
-	it2 = JsonbIteratorInit(VARDATA(tmpl));
+	it1 = JsonbIteratorInit(&val->root);
+	it2 = JsonbIteratorInit(&tmpl->root);
 
 	PG_RETURN_BOOL(JsonbDeepContains(&it1, &it2));
 }
@@ -143,8 +143,8 @@ jsonb_contained(PG_FUNCTION_ARGS)
 		JB_ROOT_IS_OBJECT(val) != JB_ROOT_IS_OBJECT(tmpl))
 		PG_RETURN_BOOL(false);
 
-	it1 = JsonbIteratorInit(VARDATA(val));
-	it2 = JsonbIteratorInit(VARDATA(tmpl));
+	it1 = JsonbIteratorInit(&val->root);
+	it2 = JsonbIteratorInit(&tmpl->root);
 
 	PG_RETURN_BOOL(JsonbDeepContains(&it1, &it2));
 }
@@ -156,7 +156,7 @@ jsonb_ne(PG_FUNCTION_ARGS)
 	Jsonb	   *jbb = PG_GETARG_JSONB(1);
 	bool		res;
 
-	res = (compareJsonbSuperHeaderValue(VARDATA(jba), VARDATA(jbb)) != 0);
+	res = (compareJsonbContainers(&jba->root, &jbb->root) != 0);
 
 	PG_FREE_IF_COPY(jba, 0);
 	PG_FREE_IF_COPY(jbb, 1);
@@ -173,7 +173,7 @@ jsonb_lt(PG_FUNCTION_ARGS)
 	Jsonb	   *jbb = PG_GETARG_JSONB(1);
 	bool		res;
 
-	res = (compareJsonbSuperHeaderValue(VARDATA(jba), VARDATA(jbb)) < 0);
+	res = (compareJsonbContainers(&jba->root, &jbb->root) < 0);
 
 	PG_FREE_IF_COPY(jba, 0);
 	PG_FREE_IF_COPY(jbb, 1);
@@ -187,7 +187,7 @@ jsonb_gt(PG_FUNCTION_ARGS)
 	Jsonb	   *jbb = PG_GETARG_JSONB(1);
 	bool		res;
 
-	res = (compareJsonbSuperHeaderValue(VARDATA(jba), VARDATA(jbb)) > 0);
+	res = (compareJsonbContainers(&jba->root, &jbb->root) > 0);
 
 	PG_FREE_IF_COPY(jba, 0);
 	PG_FREE_IF_COPY(jbb, 1);
@@ -201,7 +201,7 @@ jsonb_le(PG_FUNCTION_ARGS)
 	Jsonb	   *jbb = PG_GETARG_JSONB(1);
 	bool		res;
 
-	res = (compareJsonbSuperHeaderValue(VARDATA(jba), VARDATA(jbb)) <= 0);
+	res = (compareJsonbContainers(&jba->root, &jbb->root) <= 0);
 
 	PG_FREE_IF_COPY(jba, 0);
 	PG_FREE_IF_COPY(jbb, 1);
@@ -215,7 +215,7 @@ jsonb_ge(PG_FUNCTION_ARGS)
 	Jsonb	   *jbb = PG_GETARG_JSONB(1);
 	bool		res;
 
-	res = (compareJsonbSuperHeaderValue(VARDATA(jba), VARDATA(jbb)) >= 0);
+	res = (compareJsonbContainers(&jba->root, &jbb->root) >= 0);
 
 	PG_FREE_IF_COPY(jba, 0);
 	PG_FREE_IF_COPY(jbb, 1);
@@ -229,7 +229,7 @@ jsonb_eq(PG_FUNCTION_ARGS)
 	Jsonb	   *jbb = PG_GETARG_JSONB(1);
 	bool		res;
 
-	res = (compareJsonbSuperHeaderValue(VARDATA(jba), VARDATA(jbb)) == 0);
+	res = (compareJsonbContainers(&jba->root, &jbb->root) == 0);
 
 	PG_FREE_IF_COPY(jba, 0);
 	PG_FREE_IF_COPY(jbb, 1);
@@ -243,7 +243,7 @@ jsonb_cmp(PG_FUNCTION_ARGS)
 	Jsonb	   *jbb = PG_GETARG_JSONB(1);
 	int			res;
 
-	res = compareJsonbSuperHeaderValue(VARDATA(jba), VARDATA(jbb));
+	res = compareJsonbContainers(&jba->root, &jbb->root);
 
 	PG_FREE_IF_COPY(jba, 0);
 	PG_FREE_IF_COPY(jbb, 1);
@@ -265,7 +265,7 @@ jsonb_hash(PG_FUNCTION_ARGS)
 	if (JB_ROOT_COUNT(jb) == 0)
 		PG_RETURN_INT32(0);
 
-	it = JsonbIteratorInit(VARDATA(jb));
+	it = JsonbIteratorInit(&jb->root);
 
 	while ((r = JsonbIteratorNext(&it, &v, false)) != WJB_DONE)
 	{
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 1caaa4a..0ab1f37 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -37,50 +37,17 @@
 #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), \
 							 JENTRY_POSMASK))
 
-/*
- * State used while converting an arbitrary JsonbValue into a Jsonb value
- * (4-byte varlena uncompressed representation of a Jsonb)
- *
- * ConvertLevel:  Bookkeeping around particular level when converting.
- */
-typedef struct convertLevel
-{
-	uint32		i;				/* Iterates once per element, or once per pair */
-	uint32	   *header;			/* Pointer to current container header */
-	JEntry	   *meta;			/* This level's metadata */
-	char	   *begin;			/* Pointer into convertState.buffer */
-} convertLevel;
-
-/*
- * convertState:  Overall bookkeeping state for conversion
- */
-typedef struct convertState
-{
-	/* Preallocated buffer in which to form varlena/Jsonb value */
-	Jsonb	   *buffer;
-	/* Pointer into buffer */
-	char	   *ptr;
-
-	/* State for  */
-	convertLevel *allState,		/* Overall state array */
-			   *contPtr;		/* Cur container pointer (in allState) */
-
-	/* Current size of buffer containing allState array */
-	Size		levelSz;
-
-} convertState;
-
 static int	compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
 static int	lexicalCompareJsonbStringValue(const void *a, const void *b);
-static Size convertJsonb(JsonbValue *val, Jsonb *buffer);
-static inline short addPaddingInt(convertState *cstate);
-static void walkJsonbValueConversion(JsonbValue *val, convertState *cstate,
-						 uint32 nestlevel);
-static void putJsonbValueConversion(convertState *cstate, JsonbValue *val,
-						uint32 flags, uint32 level);
-static void putScalarConversion(convertState *cstate, JsonbValue *scalarVal,
-					uint32 level, uint32 i);
-static void iteratorFromContainerBuf(JsonbIterator *it, char *buffer);
+static Jsonb *convertJsonb(JsonbValue *val);
+static inline short addPaddingInt(StringInfo buffer);
+static void walkJsonbValueConversion(JsonbValue *val, StringInfo buffer, JEntry *header, int level);
+static void walkJsonbArrayConversion(JsonbValue *val, StringInfo buffer, JEntry *header, int level);
+static void walkJsonbObjectConversion(JsonbValue *val, StringInfo buffer, JEntry *header, int level);
+static void putScalarConversion(StringInfo buffer, JsonbValue *scalarVal,
+								JEntry *header);
+static int reserveStringInfo(StringInfo str, int datalen);
+static void iteratorFromContainer(JsonbIterator *it, JsonbContainer *container);
 static bool formIterIsContainer(JsonbIterator **it, JsonbValue *val,
 					JEntry *ent, bool skipNested);
 static JsonbIterator *freeAndGetParent(JsonbIterator *it);
@@ -110,7 +77,6 @@ Jsonb *
 JsonbValueToJsonb(JsonbValue *val)
 {
 	Jsonb	   *out;
-	Size		sz;
 
 	if (IsAJsonbScalar(val))
 	{
@@ -127,17 +93,11 @@ JsonbValueToJsonb(JsonbValue *val)
 		pushJsonbValue(&pstate, WJB_ELEM, val);
 		res = pushJsonbValue(&pstate, WJB_END_ARRAY, NULL);
 
-		out = palloc(VARHDRSZ + res->estSize);
-		sz = convertJsonb(res, out);
-		Assert(sz <= res->estSize);
-		SET_VARSIZE(out, sz + VARHDRSZ);
+		out = convertJsonb(res);
 	}
 	else if (val->type == jbvObject || val->type == jbvArray)
 	{
-		out = palloc(VARHDRSZ + val->estSize);
-		sz = convertJsonb(val, out);
-		Assert(sz <= val->estSize);
-		SET_VARSIZE(out, VARHDRSZ + sz);
+		out = convertJsonb(val);
 	}
 	else
 	{
@@ -161,7 +121,7 @@ JsonbValueToJsonb(JsonbValue *val)
  * memory here.
  */
 int
-compareJsonbSuperHeaderValue(JsonbSuperHeader a, JsonbSuperHeader b)
+compareJsonbContainers(JsonbContainer *a, JsonbContainer *b)
 {
 	JsonbIterator *ita,
 			   *itb;
@@ -288,15 +248,15 @@ compareJsonbSuperHeaderValue(JsonbSuperHeader a, JsonbSuperHeader b)
  *
  * In order to proceed with the search, it is necessary for callers to have
  * both specified an interest in exactly one particular container type with an
- * appropriate flag, as well as having the pointed-to Jsonb superheader be of
+ * appropriate flag, as well as having the pointed-to Jsonb container be of
  * one of those same container types at the top level. (Actually, we just do
  * whichever makes sense to save callers the trouble of figuring it out - at
- * most one can make sense, because the super header either points to an array
- * (possible a "raw scalar" pseudo array) or an object.)
+ * most one can make sense, because the container either points to an array
+ * (possibly a "raw scalar" pseudo array) or an object.)
  *
  * Note that we can return a jbvBinary JsonbValue if this is called on an
  * object, but we never do so on an array.  If the caller asks to look through
- * a container type that is not of the type pointed to by the superheader,
+ * a container type that is not of the type pointed to by the container,
  * immediately fall through and return NULL.  If we cannot find the value,
  * return NULL.  Otherwise, return palloc()'d copy of value.
  *
@@ -309,25 +269,24 @@ compareJsonbSuperHeaderValue(JsonbSuperHeader a, JsonbSuperHeader b)
  * item, picking it up where we left off knowing that the second or subsequent
  * item can not be at a point below the low bound set when the first was found.
  * This is only useful for objects, not arrays (which have a user-defined
- * order), so array superheader Jsonbs should just pass NULL.  Moreover, it's
+ * order), so array container Jsonbs should just pass NULL.  Moreover, it's
  * only useful because we only match object pairs on the basis of their key, so
  * presumably anyone exploiting this is only interested in matching Object keys
  * with a String.  lowbound is given in units of pairs, not underlying values.
  */
 JsonbValue *
-findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
-							  uint32 *lowbound, JsonbValue *key)
+findJsonbValueFromContainer(JsonbContainer *container, uint32 flags,
+							uint32 *lowbound, JsonbValue *key)
 {
-	uint32		superheader = *(uint32 *) sheader;
-	JEntry	   *array = (JEntry *) (sheader + sizeof(uint32));
-	int			count = (superheader & JB_CMASK);
+	JEntry	   *array = container->children;
+	int			count = (container->header & JB_CMASK);
 	JsonbValue *result = palloc(sizeof(JsonbValue));
 
 	Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);
 
-	if (flags & JB_FARRAY & superheader)
+	if (flags & JB_FARRAY & container->header)
 	{
-		char	   *data = (char *) (array + (superheader & JB_CMASK));
+		char	   *data = (char *) (array + (container->header & JB_CMASK));
 		int			i;
 
 		for (i = 0; i < count; i++)
@@ -337,28 +296,22 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 			if (JBE_ISNULL(*e) && key->type == jbvNull)
 			{
 				result->type = jbvNull;
-				result->estSize = sizeof(JEntry);
 			}
 			else if (JBE_ISSTRING(*e) && key->type == jbvString)
 			{
 				result->type = jbvString;
 				result->val.string.val = data + JBE_OFF(*e);
 				result->val.string.len = JBE_LEN(*e);
-				result->estSize = sizeof(JEntry) + result->val.string.len;
 			}
 			else if (JBE_ISNUMERIC(*e) && key->type == jbvNumeric)
 			{
 				result->type = jbvNumeric;
 				result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));
-
-				result->estSize = 2 * sizeof(JEntry) +
-					VARSIZE_ANY(result->val.numeric);
 			}
 			else if (JBE_ISBOOL(*e) && key->type == jbvBool)
 			{
 				result->type = jbvBool;
 				result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
-				result->estSize = sizeof(JEntry);
 			}
 			else
 				continue;
@@ -367,10 +320,10 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 				return result;
 		}
 	}
-	else if (flags & JB_FOBJECT & superheader)
+	else if (flags & JB_FOBJECT & container->header)
 	{
 		/* Since this is an object, account for *Pairs* of Jentrys */
-		char	   *data = (char *) (array + (superheader & JB_CMASK) * 2);
+		char	   *data = (char *) (array + (container->header & JB_CMASK) * 2);
 		uint32		stopLow = lowbound ? *lowbound : 0,
 					stopMiddle;
 
@@ -395,7 +348,6 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 			candidate.type = jbvString;
 			candidate.val.string.val = data + JBE_OFF(*entry);
 			candidate.val.string.len = JBE_LEN(*entry);
-			candidate.estSize = sizeof(JEntry) + candidate.val.string.len;
 
 			difference = lengthCompareJsonbStringValue(&candidate, key, NULL);
 
@@ -410,28 +362,22 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 				if (JBE_ISNULL(*v))
 				{
 					result->type = jbvNull;
-					result->estSize = sizeof(JEntry);
 				}
 				else if (JBE_ISSTRING(*v))
 				{
 					result->type = jbvString;
 					result->val.string.val = data + JBE_OFF(*v);
 					result->val.string.len = JBE_LEN(*v);
-					result->estSize = sizeof(JEntry) + result->val.string.len;
 				}
 				else if (JBE_ISNUMERIC(*v))
 				{
 					result->type = jbvNumeric;
 					result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*v)));
-
-					result->estSize = 2 * sizeof(JEntry) +
-						VARSIZE_ANY(result->val.numeric);
 				}
 				else if (JBE_ISBOOL(*v))
 				{
 					result->type = jbvBool;
 					result->val.boolean = JBE_ISBOOL_TRUE(*v) != 0;
-					result->estSize = sizeof(JEntry);
 				}
 				else
 				{
@@ -443,7 +389,6 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 					result->val.binary.data = data + INTALIGN(JBE_OFF(*v));
 					result->val.binary.len = JBE_LEN(*v) -
 						(INTALIGN(JBE_OFF(*v)) - JBE_OFF(*v));
-					result->estSize = 2 * sizeof(JEntry) + result->val.binary.len;
 				}
 
 				return result;
@@ -467,67 +412,57 @@ findJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 flags,
 }
 
 /*
- * Get i-th value of Jsonb array from superheader.
+ * Get i-th value of a Jsonb array.
  *
- * Returns palloc()'d copy of value.
+ * Returns palloc()'d copy of the value, or NULL if it does not exist.
  */
 JsonbValue *
-getIthJsonbValueFromSuperHeader(JsonbSuperHeader sheader, uint32 i)
+getIthJsonbValueFromContainer(JsonbContainer *container, uint32 i)
 {
-	uint32		superheader = *(uint32 *) sheader;
 	JsonbValue *result;
-	JEntry	   *array,
-			   *e;
+	JEntry	   *e;
 	char	   *data;
+	uint32		nelements;
 
-	result = palloc(sizeof(JsonbValue));
+	if ((container->header & JB_FARRAY) == 0)
+		elog(ERROR, "not a jsonb array");
 
-	if (i >= (superheader & JB_CMASK))
+	nelements = container->header & JB_CMASK;
+
+	if (i >= nelements)
 		return NULL;
 
-	array = (JEntry *) (sheader + sizeof(uint32));
+	e = &container->children[i];
 
-	if (superheader & JB_FARRAY)
-	{
-		e = array + i;
-		data = (char *) (array + (superheader & JB_CMASK));
-	}
-	else
-	{
-		elog(ERROR, "not a jsonb array");
-	}
+	data = (char *) &container->children[nelements];
+
+	result = palloc(sizeof(JsonbValue));
 
 	if (JBE_ISNULL(*e))
 	{
 		result->type = jbvNull;
-		result->estSize = sizeof(JEntry);
 	}
 	else if (JBE_ISSTRING(*e))
 	{
 		result->type = jbvString;
 		result->val.string.val = data + JBE_OFF(*e);
 		result->val.string.len = JBE_LEN(*e);
-		result->estSize = sizeof(JEntry) + result->val.string.len;
 	}
 	else if (JBE_ISNUMERIC(*e))
 	{
 		result->type = jbvNumeric;
 		result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));
-
-		result->estSize = 2 * sizeof(JEntry) + VARSIZE_ANY(result->val.numeric);
 	}
 	else if (JBE_ISBOOL(*e))
 	{
 		result->type = jbvBool;
 		result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
-		result->estSize = sizeof(JEntry);
 	}
 	else
 	{
 		result->type = jbvBinary;
 		result->val.binary.data = data + INTALIGN(JBE_OFF(*e));
 		result->val.binary.len = JBE_LEN(*e) - (INTALIGN(JBE_OFF(*e)) - JBE_OFF(*e));
-		result->estSize = result->val.binary.len + 2 * sizeof(JEntry);
 	}
 
 	return result;
@@ -558,7 +493,6 @@ pushJsonbValue(JsonbParseState **pstate, int seq, JsonbValue *scalarVal)
 			*pstate = pushState(pstate);
 			result = &(*pstate)->contVal;
 			(*pstate)->contVal.type = jbvArray;
-			(*pstate)->contVal.estSize = 3 * sizeof(JEntry);
 			(*pstate)->contVal.val.array.nElems = 0;
 			(*pstate)->contVal.val.array.rawScalar = (scalarVal &&
 											 scalarVal->val.array.rawScalar);
@@ -580,7 +514,6 @@ pushJsonbValue(JsonbParseState **pstate, int seq, JsonbValue *scalarVal)
 			*pstate = pushState(pstate);
 			result = &(*pstate)->contVal;
 			(*pstate)->contVal.type = jbvObject;
-			(*pstate)->contVal.estSize = 3 * sizeof(JEntry);
 			(*pstate)->contVal.val.object.nPairs = 0;
 			(*pstate)->size = 4;
 			(*pstate)->contVal.val.object.pairs = palloc(sizeof(JsonbPair) *
@@ -635,17 +568,17 @@ pushJsonbValue(JsonbParseState **pstate, int seq, JsonbValue *scalarVal)
 }
 
 /*
- * Given a Jsonb superheader, expand to JsonbIterator to iterate over items
+ * Given a JsonbContainer, expand to JsonbIterator to iterate over items
  * fully expanded to in-memory representation for manipulation.
  *
  * See JsonbIteratorNext() for notes on memory management.
  */
 JsonbIterator *
-JsonbIteratorInit(JsonbSuperHeader sheader)
+JsonbIteratorInit(JsonbContainer *container)
 {
 	JsonbIterator *it = palloc(sizeof(JsonbIterator));
 
-	iteratorFromContainerBuf(it, sheader);
+	iteratorFromContainer(it, container);
 	it->parent = NULL;
 
 	return it;
@@ -875,10 +808,10 @@ JsonbDeepContains(JsonbIterator **val, JsonbIterator **mContained)
 			Assert(rcont == WJB_KEY);
 
 			/* First, find value by key... */
-			lhsVal = findJsonbValueFromSuperHeader((*val)->buffer,
-												   JB_FOBJECT,
-												   NULL,
-												   &vcontained);
+			lhsVal = findJsonbValueFromContainer((JsonbContainer *) (*val)->buffer,
+												 JB_FOBJECT,
+												 NULL,
+												 &vcontained);
 
 			if (!lhsVal)
 				return false;
@@ -913,8 +846,8 @@ JsonbDeepContains(JsonbIterator **val, JsonbIterator **mContained)
 				Assert(lhsVal->type == jbvBinary);
 				Assert(vcontained.type == jbvBinary);
 
-				nestval = JsonbIteratorInit(lhsVal->val.binary.data);
-				nestContained = JsonbIteratorInit(vcontained.val.binary.data);
+				nestval = JsonbIteratorInit((JsonbContainer *) lhsVal->val.binary.data);
+				nestContained = JsonbIteratorInit((JsonbContainer *) vcontained.val.binary.data);
 
 				/*
 				 * Match "value" side of rhs datum object's pair recursively.
@@ -978,10 +911,10 @@ JsonbDeepContains(JsonbIterator **val, JsonbIterator **mContained)
 
 			if (IsAJsonbScalar(&vcontained))
 			{
-				if (!findJsonbValueFromSuperHeader((*val)->buffer,
-												   JB_FARRAY,
-												   NULL,
-												   &vcontained))
+				if (!findJsonbValueFromContainer((JsonbContainer *) (*val)->buffer,
+												 JB_FARRAY,
+												 NULL,
+												 &vcontained))
 					return false;
 			}
 			else
@@ -1025,8 +958,8 @@ JsonbDeepContains(JsonbIterator **val, JsonbIterator **mContained)
 							   *nestContained;
 					bool		contains;
 
-					nestval = JsonbIteratorInit(lhsConts[i].val.binary.data);
-					nestContained = JsonbIteratorInit(vcontained.val.binary.data);
+					nestval = JsonbIteratorInit((JsonbContainer *) lhsConts[i].val.binary.data);
+					nestContained = JsonbIteratorInit((JsonbContainer *) vcontained.val.binary.data);
 
 					contains = JsonbDeepContains(&nestval, &nestContained);
 
@@ -1209,244 +1142,230 @@ lexicalCompareJsonbStringValue(const void *a, const void *b)
 					  vb->val.string.len, DEFAULT_COLLATION_OID);
 }
 
+
+/*
+ * Reserve 'datalen' bytes at the end of StringInfo, enlarging the underlying
+ * buffer if necessary. Returns the offset to the reserved area.
+ */
+static int
+reserveStringInfo(StringInfo str, int datalen)
+{
+	int		offset;
+
+	/* Make more room if needed */
+	enlargeStringInfo(str, datalen);
+
+	/* remember current offset */
+	offset = str->len;
+
+	/* reserve the space */
+	str->len += datalen;
+
+	/* XXX: we don't bother with the trailing null that normal StringInfo
+	 * functions append */
+
+	return offset;
+}
+
 /*
- * Given a JsonbValue, convert to Jsonb and store in preallocated Jsonb buffer
- * sufficiently large to fit the value
+ * Given a JsonbValue, convert to Jsonb. The result is palloc'd.
  */
-static Size
-convertJsonb(JsonbValue *val, Jsonb *buffer)
+static Jsonb *
+convertJsonb(JsonbValue *val)
 {
-	convertState state;
-	Size		len;
+	StringInfoData buffer;
+	JEntry		header;
+	Jsonb	   *res;
+
+	initStringInfo(&buffer);
 
 	/* Should not already have binary representation */
 	Assert(val->type != jbvBinary);
 
-	state.buffer = buffer;
-	/* Start from superheader */
-	state.ptr = VARDATA(state.buffer);
-	state.levelSz = 8;
-	state.allState = palloc(sizeof(convertLevel) * state.levelSz);
+	/* Reserve the Jsonb header */
+	reserveStringInfo(&buffer, VARHDRSZ);
 
-	walkJsonbValueConversion(val, &state, 0);
+	walkJsonbValueConversion(val, &buffer, &header, 0);
 
-	len = state.ptr - VARDATA(state.buffer);
+	res = (Jsonb *) buffer.data;
 
-	Assert(len <= val->estSize);
-	return len;
+	SET_VARSIZE(res, buffer.len);
+
+	return res;
 }
 
 /*
- * Walk the tree representation of Jsonb, as part of the process of converting
- * a JsonbValue to a Jsonb.
+ * Convert a single JsonbValue to a Jsonb node.
  *
- * This high-level function takes care of recursion into sub-containers, but at
- * the top level calls putJsonbValueConversion once per sequential processing
- * token (in a manner similar to generic iteration).
+ * The value is written out to 'buffer'. The JEntry header for this node is
+ * returned in *header. It is filled in with the length of this value, but if
+ * it is stored in an array or an object (which is always, except for the root
+ * node), it is the caller's responsibility to adjust it with the offset
+ * within the container.
+ *
+ * If the value is an array or an object, this recurses. 'level' is only used
+ * for debugging purposes.
  */
 static void
-walkJsonbValueConversion(JsonbValue *val, convertState *cstate,
-						 uint32 nestlevel)
+walkJsonbValueConversion(JsonbValue *val, StringInfo buffer, JEntry *header, int level)
 {
-	int			i;
-
 	check_stack_depth();
 
 	if (!val)
 		return;
 
-	switch (val->type)
-	{
-		case jbvArray:
+	if (IsAJsonbScalar(val) || val->type == jbvBinary)
+		putScalarConversion(buffer, val, header);
+	else if (val->type == jbvArray)
+		walkJsonbArrayConversion(val, buffer, header, level);
+	else if (val->type == jbvObject)
+		walkJsonbObjectConversion(val, buffer, header, level);
+	else
+		elog(ERROR, "unknown type of jsonb container");
+}
 
-			putJsonbValueConversion(cstate, val, WJB_BEGIN_ARRAY, nestlevel);
-			for (i = 0; i < val->val.array.nElems; i++)
-			{
-				if (IsAJsonbScalar(&val->val.array.elems[i]) ||
-					val->val.array.elems[i].type == jbvBinary)
-					putJsonbValueConversion(cstate, val->val.array.elems + i,
-											WJB_ELEM, nestlevel);
-				else
-					walkJsonbValueConversion(val->val.array.elems + i, cstate,
-											 nestlevel + 1);
-			}
-			putJsonbValueConversion(cstate, val, WJB_END_ARRAY, nestlevel);
+static void
+walkJsonbArrayConversion(JsonbValue *val, StringInfo buffer, JEntry *pheader, int level)
+{
+	int			offset;
+	int			metaoffset;
+	int			i;
+	int			totallen;
+	JEntry		header;
 
-			break;
-		case jbvObject:
+	/* Initialize pointer into conversion buffer at this level */
+	offset = buffer->len;
 
-			putJsonbValueConversion(cstate, val, WJB_BEGIN_OBJECT, nestlevel);
-			for (i = 0; i < val->val.object.nPairs; i++)
-			{
-				putJsonbValueConversion(cstate, &val->val.object.pairs[i].key,
-										WJB_KEY, nestlevel);
-
-				if (IsAJsonbScalar(&val->val.object.pairs[i].value) ||
-					val->val.object.pairs[i].value.type == jbvBinary)
-					putJsonbValueConversion(cstate,
-											&val->val.object.pairs[i].value,
-											WJB_VALUE, nestlevel);
-				else
-					walkJsonbValueConversion(&val->val.object.pairs[i].value,
-											 cstate, nestlevel + 1);
-			}
-			putJsonbValueConversion(cstate, val, WJB_END_OBJECT, nestlevel);
+	addPaddingInt(buffer);
 
-			break;
-		default:
-			elog(ERROR, "unknown type of jsonb container");
+	/*
+	 * Construct the header Jentry, stored in the beginning of the variable-
+	 * length payload.
+	 */
+	header.header = val->val.array.nElems | JB_FARRAY;
+	if (val->val.array.rawScalar)
+	{
+		Assert(val->val.array.nElems == 1);
+		Assert(level == 0);
+		header.header |= JB_FSCALAR;
 	}
-}
-
-/*
- * walkJsonbValueConversion() worker.  Add padding sufficient to int-align our
- * access to conversion buffer.
- */
-static inline
-short
-addPaddingInt(convertState *cstate)
-{
-	short		padlen,
-				p;
 
-	padlen = INTALIGN(cstate->ptr - VARDATA(cstate->buffer)) -
-		(cstate->ptr - VARDATA(cstate->buffer));
+	appendBinaryStringInfo(buffer, (char *) &header, sizeof(uint32));
+	/* reserve space for the JEntries of the elements. */
+	metaoffset = reserveStringInfo(buffer, sizeof(JEntry) * val->val.array.nElems);
 
-	for (p = padlen; p > 0; p--)
+	totallen = 0;
+	for (i = 0; i < val->val.array.nElems; i++)
 	{
-		*cstate->ptr = '\0';
-		cstate->ptr++;
+		JsonbValue *elem = &val->val.array.elems[i];
+		int len;
+		JEntry meta;
+
+		walkJsonbValueConversion(elem, buffer, &meta, level + 1);
+		len = meta.header & JENTRY_POSMASK;
+		totallen += len;
+
+		if (totallen > JENTRY_POSMASK)
+			ereport(ERROR,
+					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+					 errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
+							JENTRY_POSMASK)));
+
+		if (i == 0)
+			meta.header |= JENTRY_ISFIRST;
+		else
+			meta.header = (meta.header & ~JENTRY_POSMASK) | totallen;
+		memcpy(&buffer->data[metaoffset + sizeof(JEntry) * i], &meta, sizeof(JEntry));
 	}
 
-	return padlen;
+	totallen = buffer->len - offset;
+
+	/* Initialize the header of this node, in the container's JEntry array */
+	pheader->header = JENTRY_ISNEST | totallen;
 }
 
-/*
- * walkJsonbValueConversion() worker.
- *
- * As part of the process of converting an arbitrary JsonbValue to a Jsonb,
- * copy over an arbitrary individual JsonbValue.  This function may copy any
- * type of value, even containers (Objects/arrays).  However, it is not
- * responsible for recursive aspects of walking the tree (so only top-level
- * Object/array details are handled).
- *
- * No details about their keys/values/elements are handled recursively -
- * rather, the function is called as required for the start of an Object/Array,
- * and the end (i.e.  there is one call per sequential processing WJB_* token).
- */
 static void
-putJsonbValueConversion(convertState *cstate, JsonbValue *val, uint32 flags,
-						uint32 level)
+walkJsonbObjectConversion(JsonbValue *val, StringInfo buffer, JEntry *pheader, int level)
 {
-	if (level == cstate->levelSz)
-	{
-		cstate->levelSz *= 2;
-		cstate->allState = repalloc(cstate->allState,
-									sizeof(convertLevel) * cstate->levelSz);
-	}
+	uint32		header;
+	int			offset;
+	int			metaoffset;
+	int			i;
+	int			totallen;
 
-	cstate->contPtr = cstate->allState + level;
+	/* Initialize pointer into conversion buffer at this level */
+	offset = buffer->len;
 
-	if (flags & (WJB_BEGIN_ARRAY | WJB_BEGIN_OBJECT))
-	{
-		Assert(((flags & WJB_BEGIN_ARRAY) && val->type == jbvArray) ||
-			   ((flags & WJB_BEGIN_OBJECT) && val->type == jbvObject));
+	addPaddingInt(buffer);
 
-		/* Initialize pointer into conversion buffer at this level */
-		cstate->contPtr->begin = cstate->ptr;
+	/* Initialize header */
+	header = val->val.object.nPairs | JB_FOBJECT;
+	appendBinaryStringInfo(buffer, (char *) &header, sizeof(uint32));
 
-		addPaddingInt(cstate);
+	/* reserve space for the JEntries of the keys and values */
+	metaoffset = reserveStringInfo(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);
 
-		/* Initialize everything else at this level */
-		cstate->contPtr->header = (uint32 *) cstate->ptr;
-		/* Advance past header */
-		cstate->ptr += sizeof(uint32);
-		cstate->contPtr->meta = (JEntry *) cstate->ptr;
-		cstate->contPtr->i = 0;
+	totallen = 0;
+	for (i = 0; i < val->val.object.nPairs; i++)
+	{
+		JsonbPair *pair = &val->val.object.pairs[i];
+		int len;
+		JEntry meta;
 
-		if (val->type == jbvArray)
-		{
-			*cstate->contPtr->header = val->val.array.nElems | JB_FARRAY;
-			cstate->ptr += sizeof(JEntry) * val->val.array.nElems;
+		/* put key */
+		putScalarConversion(buffer, &pair->key, &meta);
 
-			if (val->val.array.rawScalar)
-			{
-				Assert(val->val.array.nElems == 1);
-				Assert(level == 0);
-				*cstate->contPtr->header |= JB_FSCALAR;
-			}
-		}
-		else
-		{
-			*cstate->contPtr->header = val->val.object.nPairs | JB_FOBJECT;
-			cstate->ptr += sizeof(JEntry) * val->val.object.nPairs * 2;
-		}
-	}
-	else if (flags & WJB_ELEM)
-	{
-		putScalarConversion(cstate, val, level, cstate->contPtr->i);
-		cstate->contPtr->i++;
-	}
-	else if (flags & WJB_KEY)
-	{
-		Assert(val->type == jbvString);
+		len = meta.header & JENTRY_POSMASK;
+		totallen += len;
 
-		putScalarConversion(cstate, val, level, cstate->contPtr->i * 2);
-	}
-	else if (flags & WJB_VALUE)
-	{
-		putScalarConversion(cstate, val, level, cstate->contPtr->i * 2 + 1);
-		cstate->contPtr->i++;
-	}
-	else if (flags & (WJB_END_ARRAY | WJB_END_OBJECT))
-	{
-		convertLevel *prevPtr;	/* Prev container pointer */
-		uint32		len,
-					i;
+		if (totallen > JENTRY_POSMASK)
+			ereport(ERROR,
+					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+					 errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+							JENTRY_POSMASK)));
 
-		Assert(((flags & WJB_END_ARRAY) && val->type == jbvArray) ||
-			   ((flags & WJB_END_OBJECT) && val->type == jbvObject));
+		if (i == 0)
+			meta.header |= JENTRY_ISFIRST;
+		else
+			meta.header = (meta.header & ~JENTRY_POSMASK) | totallen;
+		memcpy(&buffer->data[metaoffset + sizeof(JEntry) * (i * 2)], &meta, sizeof(JEntry));
 
-		if (level == 0)
-			return;
+		walkJsonbValueConversion(&pair->value, buffer, &meta, level + 1);
+		len = meta.header & JENTRY_POSMASK;
+		totallen += len;
 
-		len = cstate->ptr - (char *) cstate->contPtr->begin;
+		if (totallen > JENTRY_POSMASK)
+			ereport(ERROR,
+					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+					 errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+							JENTRY_POSMASK)));
 
-		prevPtr = cstate->contPtr - 1;
+		meta.header = (meta.header & ~JENTRY_POSMASK) | totallen;
+		memcpy(&buffer->data[metaoffset + sizeof(JEntry) * (i * 2 + 1)], &meta, sizeof(JEntry));
+	}
 
-		if (*prevPtr->header & JB_FARRAY)
-		{
-			i = prevPtr->i;
+	totallen = buffer->len - offset;
 
-			prevPtr->meta[i].header = JENTRY_ISNEST;
+	pheader->header = JENTRY_ISNEST | totallen;
+}
 
-			if (i == 0)
-				prevPtr->meta[0].header |= JENTRY_ISFIRST | len;
-			else
-				prevPtr->meta[i].header |=
-					(prevPtr->meta[i - 1].header & JENTRY_POSMASK) + len;
-		}
-		else if (*prevPtr->header & JB_FOBJECT)
-		{
-			i = 2 * prevPtr->i + 1;		/* Value, not key */
+/*
+ * Append padding, so that the length of the StringInfo is int-aligned.
+ * Returns the number of padding bytes appended.
+ */
+static inline
+short
+addPaddingInt(StringInfo buffer)
+{
+	short		padlen,
+				p;
 
-			prevPtr->meta[i].header = JENTRY_ISNEST;
+	padlen = INTALIGN(buffer->len) - buffer->len;
 
-			prevPtr->meta[i].header |=
-				(prevPtr->meta[i - 1].header & JENTRY_POSMASK) + len;
-		}
-		else
-		{
-			elog(ERROR, "invalid jsonb container type");
-		}
+	for (p = 0; p < padlen; p++)
+		appendStringInfoChar(buffer, '\0');
 
-		Assert(cstate->ptr - cstate->contPtr->begin <= val->estSize);
-		prevPtr->i++;
-	}
-	else
-	{
-		elog(ERROR, "unknown flag encountered during jsonb tree walk");
-	}
+	return padlen;
 }
 
 /*
@@ -1456,84 +1375,59 @@ putJsonbValueConversion(convertState *cstate, JsonbValue *val, uint32 flags,
  * This is a worker function for putJsonbValueConversion() (itself a worker for
  * walkJsonbValueConversion()).  It handles the details with regard to Jentry
  * metadata peculiar to each scalar type.
+ *
+ * It is the caller's responsibility to shift the offset if this is stored
+ * in an array or object.
  */
 static void
-putScalarConversion(convertState *cstate, JsonbValue *scalarVal, uint32 level,
-					uint32 i)
+putScalarConversion(StringInfo buffer, JsonbValue *scalarVal, JEntry *header)
 {
 	int			numlen;
 	short		padlen;
 
-	cstate->contPtr = cstate->allState + level;
-
-	if (i == 0)
-		cstate->contPtr->meta[0].header = JENTRY_ISFIRST;
-	else
-		cstate->contPtr->meta[i].header = 0;
-
 	switch (scalarVal->type)
 	{
 		case jbvNull:
-			cstate->contPtr->meta[i].header |= JENTRY_ISNULL;
-
-			if (i > 0)
-				cstate->contPtr->meta[i].header |=
-					cstate->contPtr->meta[i - 1].header & JENTRY_POSMASK;
+			header->header = JENTRY_ISNULL;
 			break;
+
 		case jbvString:
-			memcpy(cstate->ptr, scalarVal->val.string.val, scalarVal->val.string.len);
-			cstate->ptr += scalarVal->val.string.len;
+			appendBinaryStringInfo(buffer, scalarVal->val.string.val, scalarVal->val.string.len);
 
-			if (i == 0)
-				cstate->contPtr->meta[0].header |= scalarVal->val.string.len;
-			else
-				cstate->contPtr->meta[i].header |=
-					(cstate->contPtr->meta[i - 1].header & JENTRY_POSMASK) +
-					scalarVal->val.string.len;
+			header->header = scalarVal->val.string.len;
 			break;
+
 		case jbvNumeric:
 			numlen = VARSIZE_ANY(scalarVal->val.numeric);
-			padlen = addPaddingInt(cstate);
+			padlen = addPaddingInt(buffer);
 
-			memcpy(cstate->ptr, scalarVal->val.numeric, numlen);
-			cstate->ptr += numlen;
+			appendBinaryStringInfo(buffer, (char *) scalarVal->val.numeric, numlen);
 
-			cstate->contPtr->meta[i].header |= JENTRY_ISNUMERIC;
-			if (i == 0)
-				cstate->contPtr->meta[0].header |= padlen + numlen;
-			else
-				cstate->contPtr->meta[i].header |=
-					(cstate->contPtr->meta[i - 1].header & JENTRY_POSMASK)
-					+ padlen + numlen;
+			header->header = JENTRY_ISNUMERIC | (padlen + numlen);
 			break;
+
 		case jbvBool:
-			cstate->contPtr->meta[i].header |= (scalarVal->val.boolean) ?
+			header->header = (scalarVal->val.boolean) ?
 				JENTRY_ISTRUE : JENTRY_ISFALSE;
-
-			if (i > 0)
-				cstate->contPtr->meta[i].header |=
-					cstate->contPtr->meta[i - 1].header & JENTRY_POSMASK;
 			break;
+
 		default:
 			elog(ERROR, "invalid jsonb scalar type");
 	}
 }
 
 /*
- * Given superheader pointer into buffer, initialize iterator.  Must be a
- * container type.
+ * Initialize an iterator for iterating all elements in a container.
  */
 static void
-iteratorFromContainerBuf(JsonbIterator *it, JsonbSuperHeader sheader)
+iteratorFromContainer(JsonbIterator *it, JsonbContainer *container)
 {
-	uint32		superheader = *(uint32 *) sheader;
-
-	it->containerType = superheader & (JB_FARRAY | JB_FOBJECT);
-	it->nElems = superheader & JB_CMASK;
-	it->buffer = sheader;
+	it->containerType = container->header & (JB_FARRAY | JB_FOBJECT);
+	it->nElems = container->header & JB_CMASK;
+	it->buffer = (char *) container;
 
 	/* Array starts just after header */
-	it->meta = (JEntry *) (sheader + sizeof(uint32));
+	it->meta = container->children;
 	it->state = jbi_start;
 
 	switch (it->containerType)
@@ -1541,7 +1435,7 @@ iteratorFromContainerBuf(JsonbIterator *it, JsonbSuperHeader sheader)
 		case JB_FARRAY:
 			it->dataProper =
 				(char *) it->meta + it->nElems * sizeof(JEntry);
-			it->isScalar = (superheader & JB_FSCALAR) != 0;
+			it->isScalar = (container->header & JB_FSCALAR) != 0;
 			/* This is either a "raw scalar", or an array */
 			Assert(!it->isScalar || it->nElems == 1);
 			break;
@@ -1584,7 +1478,6 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 	if (JBE_ISNULL(*ent))
 	{
 		val->type = jbvNull;
-		val->estSize = sizeof(JEntry);
 
 		return false;
 	}
@@ -1593,7 +1486,6 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 		val->type = jbvString;
 		val->val.string.val = (*it)->dataProper + JBE_OFF(*ent);
 		val->val.string.len = JBE_LEN(*ent);
-		val->estSize = sizeof(JEntry) + val->val.string.len;
 
 		return false;
 	}
@@ -1602,15 +1494,12 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 		val->type = jbvNumeric;
 		val->val.numeric = (Numeric) ((*it)->dataProper + INTALIGN(JBE_OFF(*ent)));
 
-		val->estSize = 2 * sizeof(JEntry) + VARSIZE_ANY(val->val.numeric);
-
 		return false;
 	}
 	else if (JBE_ISBOOL(*ent))
 	{
 		val->type = jbvBool;
 		val->val.boolean = JBE_ISBOOL_TRUE(*ent) != 0;
-		val->estSize = sizeof(JEntry);
 
 		return false;
 	}
@@ -1619,7 +1508,6 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 		val->type = jbvBinary;
 		val->val.binary.data = (*it)->dataProper + INTALIGN(JBE_OFF(*ent));
 		val->val.binary.len = JBE_LEN(*ent) - (INTALIGN(JBE_OFF(*ent)) - JBE_OFF(*ent));
-		val->estSize = val->val.binary.len + 2 * sizeof(JEntry);
 
 		return false;
 	}
@@ -1633,8 +1521,8 @@ formIterIsContainer(JsonbIterator **it, JsonbValue *val, JEntry *ent,
 		 */
 		JsonbIterator *child = palloc(sizeof(JsonbIterator));
 
-		iteratorFromContainerBuf(child,
-								 (*it)->dataProper + INTALIGN(JBE_OFF(*ent)));
+		iteratorFromContainer(child,
+							  (JsonbContainer *) ((*it)->dataProper + INTALIGN(JBE_OFF(*ent))));
 
 		child->parent = *it;
 		*it = child;
@@ -1694,8 +1582,6 @@ appendKey(JsonbParseState *pstate, JsonbValue *string)
 
 	object->val.object.pairs[object->val.object.nPairs].key = *string;
 	object->val.object.pairs[object->val.object.nPairs].order = object->val.object.nPairs;
-
-	object->estSize += string->estSize;
 }
 
 /*
@@ -1710,7 +1596,6 @@ appendValue(JsonbParseState *pstate, JsonbValue *scalarVal)
 	Assert(object->type == jbvObject);
 
 	object->val.object.pairs[object->val.object.nPairs++].value = *scalarVal;
-	object->estSize += scalarVal->estSize;
 }
 
 /*
@@ -1737,7 +1622,6 @@ appendElement(JsonbParseState *pstate, JsonbValue *scalarVal)
 	}
 
 	array->val.array.elems[array->val.array.nElems++] = *scalarVal;
-	array->estSize += scalarVal->estSize;
 }
 
 /*
@@ -1832,11 +1716,7 @@ uniqueifyJsonbObject(JsonbValue *object)
 		while (ptr - object->val.object.pairs < object->val.object.nPairs)
 		{
 			/* Avoid copying over duplicate */
-			if (lengthCompareJsonbStringValue(ptr, res, NULL) == 0)
-			{
-				object->estSize -= ptr->key.estSize + ptr->value.estSize;
-			}
-			else
+			if (lengthCompareJsonbStringValue(ptr, res, NULL) != 0)
 			{
 				res++;
 				if (ptr != res)
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 6b1ce9b..ac393ce 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -106,7 +106,7 @@ static inline Datum populate_recordset_worker(FunctionCallInfo fcinfo,
 						  bool have_record_arg);
 
 /* Worker that takes care of common setup for us */
-static JsonbValue *findJsonbValueFromSuperHeaderLen(JsonbSuperHeader sheader,
+static JsonbValue *findJsonbValueFromContainerLen(JsonbContainer *container,
 								 uint32 flags,
 								 char *key,
 								 uint32 keylen);
@@ -286,7 +286,7 @@ jsonb_object_keys(PG_FUNCTION_ARGS)
 		state->sent_count = 0;
 		state->result = palloc(state->result_size * sizeof(char *));
 
-		it = JsonbIteratorInit(VARDATA_ANY(jb));
+		it = JsonbIteratorInit(&jb->root);
 
 		while ((r = JsonbIteratorNext(&it, &v, skipNested)) != WJB_DONE)
 		{
@@ -484,7 +484,7 @@ jsonb_object_field(PG_FUNCTION_ARGS)
 
 	Assert(JB_ROOT_IS_OBJECT(jb));
 
-	it = JsonbIteratorInit(VARDATA_ANY(jb));
+	it = JsonbIteratorInit(&jb->root);
 
 	while ((r = JsonbIteratorNext(&it, &v, skipNested)) != WJB_DONE)
 	{
@@ -545,7 +545,7 @@ jsonb_object_field_text(PG_FUNCTION_ARGS)
 
 	Assert(JB_ROOT_IS_OBJECT(jb));
 
-	it = JsonbIteratorInit(VARDATA_ANY(jb));
+	it = JsonbIteratorInit(&jb->root);
 
 	while ((r = JsonbIteratorNext(&it, &v, skipNested)) != WJB_DONE)
 	{
@@ -580,7 +580,7 @@ jsonb_object_field_text(PG_FUNCTION_ARGS)
 					StringInfo	jtext = makeStringInfo();
 					Jsonb	   *tjb = JsonbValueToJsonb(&v);
 
-					(void) JsonbToCString(jtext, VARDATA(tjb), -1);
+					(void) JsonbToCString(jtext, &tjb->root, -1);
 					result = cstring_to_text_with_len(jtext->data, jtext->len);
 				}
 				PG_RETURN_TEXT_P(result);
@@ -628,7 +628,7 @@ jsonb_array_element(PG_FUNCTION_ARGS)
 
 	Assert(JB_ROOT_IS_ARRAY(jb));
 
-	it = JsonbIteratorInit(VARDATA_ANY(jb));
+	it = JsonbIteratorInit(&jb->root);
 
 	while ((r = JsonbIteratorNext(&it, &v, skipNested)) != WJB_DONE)
 	{
@@ -682,7 +682,7 @@ jsonb_array_element_text(PG_FUNCTION_ARGS)
 
 	Assert(JB_ROOT_IS_ARRAY(jb));
 
-	it = JsonbIteratorInit(VARDATA_ANY(jb));
+	it = JsonbIteratorInit(&jb->root);
 
 	while ((r = JsonbIteratorNext(&it, &v, skipNested)) != WJB_DONE)
 	{
@@ -711,7 +711,7 @@ jsonb_array_element_text(PG_FUNCTION_ARGS)
 					StringInfo	jtext = makeStringInfo();
 					Jsonb	   *tjb = JsonbValueToJsonb(&v);
 
-					(void) JsonbToCString(jtext, VARDATA(tjb), -1);
+					(void) JsonbToCString(jtext, &tjb->root, -1);
 					result = cstring_to_text_with_len(jtext->data, jtext->len);
 				}
 				PG_RETURN_TEXT_P(result);
@@ -1155,7 +1155,7 @@ get_jsonb_path_all(FunctionCallInfo fcinfo, bool as_text)
 				have_array = false;
 	JsonbValue *jbvp = NULL;
 	JsonbValue	tv;
-	JsonbSuperHeader superHeader;
+	JsonbContainer *container;
 
 	if (array_contains_nulls(path))
 		ereport(ERROR,
@@ -1170,15 +1170,15 @@ get_jsonb_path_all(FunctionCallInfo fcinfo, bool as_text)
 	else if (JB_ROOT_IS_ARRAY(jb) && !JB_ROOT_IS_SCALAR(jb))
 		have_array = true;
 
-	superHeader = (JsonbSuperHeader) VARDATA(jb);
+	container = &jb->root;
 
 	for (i = 0; i < npath; i++)
 	{
 		if (have_object)
 		{
-			jbvp = findJsonbValueFromSuperHeaderLen(superHeader,
-													JB_FOBJECT,
-													VARDATA_ANY(pathtext[i]),
+			jbvp = findJsonbValueFromContainerLen(container,
+												  JB_FOBJECT,
+												  VARDATA_ANY(pathtext[i]),
 											 VARSIZE_ANY_EXHDR(pathtext[i]));
 		}
 		else if (have_array)
@@ -1192,7 +1192,7 @@ get_jsonb_path_all(FunctionCallInfo fcinfo, bool as_text)
 			if (*endptr != '\0' || lindex > INT_MAX || lindex < 0)
 				PG_RETURN_NULL();
 			index = (uint32) lindex;
-			jbvp = getIthJsonbValueFromSuperHeader(superHeader, index);
+			jbvp = getIthJsonbValueFromContainer(container, index);
 		}
 		else
 		{
@@ -1210,11 +1210,11 @@ get_jsonb_path_all(FunctionCallInfo fcinfo, bool as_text)
 
 		if (jbvp->type == jbvBinary)
 		{
-			JsonbIterator *it = JsonbIteratorInit(jbvp->val.binary.data);
+			JsonbIterator *it = JsonbIteratorInit((JsonbContainer *) jbvp->val.binary.data);
 			int			r;
 
 			r = JsonbIteratorNext(&it, &tv, true);
-			superHeader = (JsonbSuperHeader) jbvp->val.binary.data;
+			container = (JsonbContainer *) jbvp->val.binary.data;
 			have_object = r == WJB_BEGIN_OBJECT;
 			have_array = r == WJB_BEGIN_ARRAY;
 		}
@@ -1238,7 +1238,7 @@ get_jsonb_path_all(FunctionCallInfo fcinfo, bool as_text)
 	if (as_text)
 	{
 		PG_RETURN_TEXT_P(cstring_to_text(JsonbToCString(NULL,
-														VARDATA(res),
+														&res->root,
 														VARSIZE(res))));
 	}
 	else
@@ -1428,7 +1428,7 @@ each_worker_jsonb(FunctionCallInfo fcinfo, bool as_text)
 									ALLOCSET_DEFAULT_MAXSIZE);
 
 
-	it = JsonbIteratorInit(VARDATA_ANY(jb));
+	it = JsonbIteratorInit(&jb->root);
 
 	while ((r = JsonbIteratorNext(&it, &v, skipNested)) != WJB_DONE)
 	{
@@ -1477,7 +1477,7 @@ each_worker_jsonb(FunctionCallInfo fcinfo, bool as_text)
 						StringInfo	jtext = makeStringInfo();
 						Jsonb	   *jb = JsonbValueToJsonb(&v);
 
-						(void) JsonbToCString(jtext, VARDATA(jb), 2 * v.estSize);
+						(void) JsonbToCString(jtext, &jb->root, 0);
 						sv = cstring_to_text_with_len(jtext->data, jtext->len);
 					}
 
@@ -1753,7 +1753,7 @@ elements_worker_jsonb(FunctionCallInfo fcinfo, bool as_text)
 									ALLOCSET_DEFAULT_MAXSIZE);
 
 
-	it = JsonbIteratorInit(VARDATA_ANY(jb));
+	it = JsonbIteratorInit(&jb->root);
 
 	while ((r = JsonbIteratorNext(&it, &v, skipNested)) != WJB_DONE)
 	{
@@ -1797,7 +1797,7 @@ elements_worker_jsonb(FunctionCallInfo fcinfo, bool as_text)
 						StringInfo	jtext = makeStringInfo();
 						Jsonb	   *jb = JsonbValueToJsonb(&v);
 
-						(void) JsonbToCString(jtext, VARDATA(jb), 2 * v.estSize);
+						(void) JsonbToCString(jtext, &jb->root, 0);
 						sv = cstring_to_text_with_len(jtext->data, jtext->len);
 					}
 
@@ -2219,8 +2219,8 @@ populate_record_worker(FunctionCallInfo fcinfo, bool have_record_arg)
 		{
 			char	   *key = NameStr(tupdesc->attrs[i]->attname);
 
-			v = findJsonbValueFromSuperHeaderLen(VARDATA(jb), JB_FOBJECT, key,
-												 strlen(key));
+			v = findJsonbValueFromContainerLen(&jb->root, JB_FOBJECT, key,
+											   strlen(key));
 		}
 
 		/*
@@ -2282,7 +2282,7 @@ populate_record_worker(FunctionCallInfo fcinfo, bool have_record_arg)
 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 							 errmsg("cannot populate with a nested object unless use_json_as_text is true")));
 				else if (v->type == jbvBinary)
-					s = JsonbToCString(NULL, v->val.binary.data, v->val.binary.len);
+					s = JsonbToCString(NULL, (JsonbContainer *) v->val.binary.data, v->val.binary.len);
 				else
 					elog(ERROR, "invalid jsonb type");
 			}
@@ -2529,8 +2529,8 @@ make_row_from_rec_and_jsonb(Jsonb *element, PopulateRecordsetState *state)
 
 		key = NameStr(tupdesc->attrs[i]->attname);
 
-		v = findJsonbValueFromSuperHeaderLen(VARDATA(element), JB_FOBJECT,
-											 key, strlen(key));
+		v = findJsonbValueFromContainerLen(&element->root, JB_FOBJECT,
+										   key, strlen(key));
 
 		/*
 		 * We can't just skip here if the key wasn't found since we might have
@@ -2582,7 +2582,7 @@ make_row_from_rec_and_jsonb(Jsonb *element, PopulateRecordsetState *state)
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("cannot populate with a nested object unless use_json_as_text is true")));
 			else if (v->type == jbvBinary)
-				s = JsonbToCString(NULL, v->val.binary.data, v->val.binary.len);
+				s = JsonbToCString(NULL, (JsonbContainer *) v->val.binary.data, v->val.binary.len);
 			else
 				elog(ERROR, "invalid jsonb type");
 
@@ -2750,7 +2750,7 @@ populate_recordset_worker(FunctionCallInfo fcinfo, bool have_record_arg)
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 			   errmsg("cannot call jsonb_populate_recordset on non-array")));
 
-		it = JsonbIteratorInit(VARDATA_ANY(jb));
+		it = JsonbIteratorInit(&jb->root);
 
 		while ((r = JsonbIteratorNext(&it, &v, skipNested)) != WJB_DONE)
 		{
@@ -3019,11 +3019,11 @@ populate_recordset_object_field_end(void *state, char *fname, bool isnull)
 }
 
 /*
- * findJsonbValueFromSuperHeader() wrapper that sets up JsonbValue key string.
+ * findJsonbValueFromContainer() wrapper that sets up JsonbValue key string.
  */
 static JsonbValue *
-findJsonbValueFromSuperHeaderLen(JsonbSuperHeader sheader, uint32 flags,
-								 char *key, uint32 keylen)
+findJsonbValueFromContainerLen(JsonbContainer *container, uint32 flags,
+							   char *key, uint32 keylen)
 {
 	JsonbValue	k;
 
@@ -3031,5 +3031,5 @@ findJsonbValueFromSuperHeaderLen(JsonbSuperHeader sheader, uint32 flags,
 	k.val.string.val = key;
 	k.val.string.len = keylen;
 
-	return findJsonbValueFromSuperHeader(sheader, flags, NULL, &k);
+	return findJsonbValueFromContainer(container, flags, NULL, &k);
 }
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index dea64ad..14ac2ee 100644
--- a/src/include/utils/jsonb.h
+++ b/src/include/utils/jsonb.h
@@ -16,17 +16,6 @@
 #include "utils/array.h"
 #include "utils/numeric.h"
 
-/*
- * JB_CMASK is used to extract count of items
- *
- * It's not possible to get more than 2^28 items into an Jsonb.
- */
-#define JB_CMASK				0x0FFFFFFF
-
-#define JB_FSCALAR				0x10000000
-#define JB_FOBJECT				0x20000000
-#define JB_FARRAY				0x40000000
-
 /* Get information on varlena Jsonb */
 #define JB_ROOT_COUNT(jbp_)		( *(uint32*) VARDATA(jbp_) & JB_CMASK)
 #define JB_ROOT_IS_SCALAR(jbp_) ( *(uint32*) VARDATA(jbp_) & JB_FSCALAR)
@@ -98,7 +87,6 @@
 
 typedef struct JsonbPair JsonbPair;
 typedef struct JsonbValue JsonbValue;
-typedef char *JsonbSuperHeader;
 
 /*
  * Jsonbs are varlena objects, so must meet the varlena convention that the
@@ -109,35 +97,87 @@ typedef char *JsonbSuperHeader;
  * representation.  Often, JsonbValues are just shims through which a Jsonb
  * buffer is accessed, but they can also be deep copied and passed around.
  *
- * We have an abstraction called a "superheader".  This is a pointer that
- * conventionally points to the first item after our 4-byte uncompressed
- * varlena header, from which we can read flags using bitwise operations.
+ * Jsonb is a tree structure. Each node in the tree consists of a JEntry
+ * header, and a variable-length content.  The JEntry header indicates what
+ * kind of a node it is, e.g. a string or an array (see JENTRY_IS* macros),
+ * and the offset and length of its variable-length portion within the
+ * container.
+ *
+ * The header and the content of a node are not stored physically together.
+ * Instead, the array or object containing the node has an array that holds
+ * the JEntry headers of all the child nodes, followed by their variable-length
+ * portions.
  *
- * Frequently, we pass a superheader reference to a function, and it doesn't
- * matter if it points to just after the start of a Jsonb, or to a temp buffer.
+ * The root node is an exception; it has no parent array or object that could
+ * hold its JEntry. Hence, there is no JEntry header for the root node.
+ * It is implicitly known that the root node must be an array or an
+ * object. The content of both an array and an object begins with a uint32
+ * header field containing the number of elements, and a JB_FOBJECT or
+ * JB_FARRAY flag. By peeking into that header, we can determine which it is.
+ * When a naked scalar value needs to be stored as a Jsonb value, what we
+ * actually store is an array with one element, with the flags in the array's
+ * header field set to JB_FSCALAR | JB_FARRAY.
+ *
+ * The variable-length data of a container node, an array or an object,
+ * begins with a uint32 header. It contains the number of child nodes,
+ * and a flag indicating if it's an array or an object (JB_* macros).
+ * An array has one child node for each element, and an object has two
+ * child nodes for each key-value pair. After the uint32 header, there is
+ * an array of JEntry structs, one for each child node, followed by the
+ * variable-length data of each child.
+ *
+ * To encode the length and offset of the variable-length portion of each
+ * node in a compact way, the JEntry stores only the end offset within the
+ * variable-length portion of the container node. For the first JEntry in the
+ * container's JEntry array, that equals the length of the node data. For
+ * convenience, the JENTRY_ISFIRST flag is set. The begin offset and length
+ * of the rest of the entries can be calculated using the end offset of the
+ * previous JEntry in the array.
+ *
+ *
+ * Alignment
+ * ---------
+ *
+ * Overall, the Jsonb struct requires 4-byte alignment. Within the struct,
+ * the variable-length portion of some node types is aligned to a 4-byte
+ * boundary, while others are not. When alignment is needed, the padding is
+ * in the beginning of the node that requires it. For example, if a numeric
+ * node is stored after a string node, so that the numeric node begins at
+ * offset 3, the variable-length portion of the numeric node will begin with
+ * one padding byte.
+ *
+ * Frequently, we pass a JsonbContainer reference to a function, and it doesn't
+ * matter if it points to Jsonb->root, or to a temp buffer.
  */
+
 typedef struct
 {
-	int32		vl_len_;		/* varlena header (do not touch directly!) */
-	uint32		superheader;
-	/* (array of JEntry follows, size determined using uint32 superheader) */
-} Jsonb;
+	uint32		header;			/* See JENTRY_* flags */
+} JEntry;
 
 /*
- * JEntry: there is one of these for each key _and_ value for objects.  Arrays
- * have one per element.
- *
- * The position offset points to the _end_ so that we can get the length by
- * subtraction from the previous entry.  The JENTRY_ISFIRST flag indicates if
- * there is a previous entry.
+ * A jsonb array or object node.
+ * 
+ * An array has one child for each element. An object has two children for
+ * each key/value pair. 
  */
+typedef struct JsonbContainer
+{
+	uint32		header;			/* number of elements, and flags (JB_* below) */
+	JEntry		children[1];	/* variable length */
+} JsonbContainer;
+
 typedef struct
 {
-	uint32		header;			/* Shares some flags with superheader */
-} JEntry;
+	int32		vl_len_;		/* varlena header (do not touch directly!) */
+	JsonbContainer root;
+} Jsonb;
 
-#define IsAJsonbScalar(jsonbval)	((jsonbval)->type >= jbvNull && \
-									 (jsonbval)->type <= jbvBool)
+#define JB_CMASK				0x0FFFFFFF
+
+#define JB_FSCALAR				0x10000000
+#define JB_FOBJECT				0x20000000
+#define JB_FARRAY				0x40000000
 
 /*
  * JsonbValue:	In-memory representation of Jsonb.  This is a convenient
@@ -161,8 +201,6 @@ struct JsonbValue
 		jbvBinary
 	}			type;			/* Influences sort order */
 
-	int			estSize;		/* Estimated size of node (including subnodes) */
-
 	union
 	{
 		Numeric numeric;
@@ -194,6 +232,9 @@ struct JsonbValue
 	}			val;
 };
 
+#define IsAJsonbScalar(jsonbval)	((jsonbval)->type >= jbvNull && \
+									 (jsonbval)->type <= jbvBool)
+
 /*
  * Pair within an Object.
  *
@@ -294,17 +335,16 @@ extern Datum gin_consistent_jsonb_hash(PG_FUNCTION_ARGS);
 extern Datum gin_triconsistent_jsonb_hash(PG_FUNCTION_ARGS);
 
 /* Support functions */
-extern int compareJsonbSuperHeaderValue(JsonbSuperHeader a,
-							 JsonbSuperHeader b);
-extern JsonbValue *findJsonbValueFromSuperHeader(JsonbSuperHeader sheader,
+extern int compareJsonbContainers(JsonbContainer *a, JsonbContainer *b);
+extern JsonbValue *findJsonbValueFromContainer(JsonbContainer *sheader,
 							  uint32 flags,
 							  uint32 *lowbound,
 							  JsonbValue *key);
-extern JsonbValue *getIthJsonbValueFromSuperHeader(JsonbSuperHeader sheader,
+extern JsonbValue *getIthJsonbValueFromContainer(JsonbContainer *sheader,
 								uint32 i);
 extern JsonbValue *pushJsonbValue(JsonbParseState **pstate, int seq,
 			   JsonbValue *scalarVal);
-extern JsonbIterator *JsonbIteratorInit(JsonbSuperHeader buffer);
+extern JsonbIterator *JsonbIteratorInit(JsonbContainer *container);
 extern int JsonbIteratorNext(JsonbIterator **it, JsonbValue *val,
 				  bool skipNested);
 extern Jsonb *JsonbValueToJsonb(JsonbValue *val);
@@ -314,7 +354,7 @@ extern JsonbValue *arrayToJsonbSortedArray(ArrayType *a);
 extern void JsonbHashScalarValue(const JsonbValue *scalarVal, uint32 *hash);
 
 /* jsonb.c support function */
-extern char *JsonbToCString(StringInfo out, JsonbSuperHeader in,
+extern char *JsonbToCString(StringInfo out, JsonbContainer *in,
 			   int estimated_len);
 
 #endif   /* __JSONB_H__ */
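
The end-offset scheme and the padding rule described in the new jsonb.h comment can be illustrated with a small standalone sketch. Everything named Sketch*/sketch_* below is made up for illustration and is not taken from the patch; the 28-bit position mask is an assumption, and the rounding macro only mimics what INTALIGN does on platforms with 4-byte ints.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: low 28 bits hold the end offset, the high bits hold flags. */
#define SKETCH_POSMASK 0x0FFFFFFFu

/* Round up to a multiple of 4 (stand-in for INTALIGN with 4-byte ints). */
#define SKETCH_INTALIGN(x) (((uint32_t) (x) + 3u) & ~3u)

/* Hypothetical stand-in for JEntry: one 32-bit header word per child. */
typedef struct
{
	uint32_t	header;
} SketchEntry;

/* End offset of this child within the container's variable-length area. */
static uint32_t
sketch_endpos(const SketchEntry *e)
{
	return e->header & SKETCH_POSMASK;
}

/*
 * Begin offset of child i: zero for the first child, otherwise the end
 * offset of the previous child.  The index alone tells us which case applies.
 */
static uint32_t
sketch_off(const SketchEntry *children, int i)
{
	assert(i >= 0);
	return (i == 0) ? 0 : sketch_endpos(&children[i - 1]);
}

/* Length of child i, including any leading padding bytes. */
static uint32_t
sketch_len(const SketchEntry *children, int i)
{
	return sketch_endpos(&children[i]) - sketch_off(children, i);
}

/* Padding bytes needed before a 4-byte-aligned payload starting at off. */
static uint32_t
sketch_padding(uint32_t off)
{
	return SKETCH_INTALIGN(off) - off;
}
```

With a 3-byte string followed by a numeric whose stored length is 9, the numeric starts at offset 3 and needs one padding byte, which matches the example given in the Alignment section of the comment.
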
#13Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Heikki Linnakangas (#12)
Re: Wanted: jsonb on-disk representation documentation

So, apart from cleaning up the code, we really need to take a close look
at the on-disk format now. The code can be cleaned up later, too, but
we're going to be stuck with the on-disk format forever, so it's
critical to get that right.

First, a few observations:

* JENTRY_ISFIRST is redundant. Whenever you deal with the JEntry struct,
you know from the context which element in the array it is.

* JENTRY_ISNEST is set but never used.

* JENTRY_ISBOOL is defined as (JENTRY_ISNUMERIC | JENTRY_ISNEST), which
seems confusing.
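
A minimal sketch of why a composite flag like this can be confusing, assuming made-up bit values: only the relation ISBOOL = ISNUMERIC | ISNEST comes from the observation above, while the SK_* names, the concrete bits and the type mask are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only to reproduce the composite flag. */
#define SK_ISNUMERIC 0x10000000u
#define SK_ISNEST    0x20000000u
#define SK_ISBOOL    (SK_ISNUMERIC | SK_ISNEST)
#define SK_TYPEMASK  0x70000000u

/* Naive test: checks only the numeric bit, so it is also true for booleans. */
static bool
sk_naive_is_numeric(uint32_t header)
{
	return (header & SK_ISNUMERIC) != 0;
}

/* Exact test: compares all type bits, so a boolean is not mistaken for a numeric. */
static bool
sk_exact_is_numeric(uint32_t header)
{
	return (header & SK_TYPEMASK) == SK_ISNUMERIC;
}
```

Any reader of a header word has to remember to compare the whole type field rather than test single bits, which is easy to get wrong when the flag names suggest independent bits.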

I'm going to proceed with refactoring those things, which will change the
on-disk format. It's late in the release cycle - these things really
should've been cleaned up earlier - but it's important to get the
on-disk format right. Shout if you have any objections.

- Heikki


#14Michael Paquier
michael.paquier@gmail.com
In reply to: Heikki Linnakangas (#13)
Re: Wanted: jsonb on-disk representation documentation

On Wed, May 7, 2014 at 8:20 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

So, apart from cleaning up the code, we really need to take a close look at
the on-disk format now. The code can be cleaned up later, too, but we're
going to be stuck with the on-disk format forever, so it's critical to get
that right.

First, a few observations:

* JENTRY_ISFIRST is redundant. Whenever you deal with the JEntry struct, you
know from the context which element in the array it is.

* JENTRY_ISNEST is set but never used.

* JENTRY_ISBOOL is defined as (JENTRY_ISNUMERIC | JENTRY_ISNEST), which
seems confusing.

I'm going to proceed with refactoring those things, which will change the on-disk
format. It's late in the release cycle - these things really should've been
cleaned up earlier - but it's important to get the on-disk format right.
Shout if you have any objections.

+1. It is saner to do that now than never.
-- 
Michael


#15Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#13)
Re: Wanted: jsonb on-disk representation documentation

Hi,

On 2014-05-07 14:20:19 +0300, Heikki Linnakangas wrote:

So, apart from cleaning up the code, we really need to take a close look at
the on-disk format now. The code can be cleaned up later, too, but we're
going to be stuck with the on-disk format forever, so it's critical to get
that right.

+1

First, a few observations:

Agreed.

I'd like to add that:
* Imo we need space in jsonb on-disk values to indicate a format
version. We won't fully get this right.
* The JEntry representation should be changed so it's possible to get the type
of an entry without checking individual types. That'll make code like
findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
much easier to read. Preferably so it can have the same values (after
shifting/masking) as the JBE variants. And it needs space for further
types (or representations thereof).
* I wonder if the hash/object pair representation is wise and if it
shouldn't be keys combined with offsets first, and then the
values. That will make access much faster. And that's what jsonb
essentially is about.
* I think both arrays and hashes should grow individual structs. With
space for additional flags.

* I have doubts about the wisdom of allowing jbvBinary values to be
embedded in JsonbValues. Although that can be changed later since it's
not on disk.
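
The second point, a type field that can be read with one shift and mask and that lines up with the in-memory type values, could look roughly like the sketch below. This is only an illustration under assumed names and an assumed bit layout (3 type bits above a 28-bit offset); none of the Sk*/sk_* identifiers exist in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type codes, standing in for the jbv* enum values. */
typedef enum
{
	SK_NULL = 0,
	SK_STRING,
	SK_NUMERIC,
	SK_BOOL,
	SK_ARRAY,
	SK_OBJECT
	/* codes 6 and 7 stay free for further types */
} SkType;

/* Assumed layout: 28 offset bits, then 3 type bits, top bit spare. */
#define SK_POS_MASK   0x0FFFFFFFu
#define SK_TYPE_SHIFT 28
#define SK_TYPE_MASK  0x7u

/* Pack a type code and an end offset into one header word. */
static uint32_t
sk_make_entry(SkType type, uint32_t endpos)
{
	assert(endpos <= SK_POS_MASK);
	return ((uint32_t) type << SK_TYPE_SHIFT) | endpos;
}

/* Recover the type code with a single shift and mask. */
static SkType
sk_entry_type(uint32_t header)
{
	return (SkType) ((header >> SK_TYPE_SHIFT) & SK_TYPE_MASK);
}

/* Scalars are the contiguous range SK_NULL .. SK_BOOL. */
static int
sk_type_is_scalar(SkType t)
{
	return t <= SK_BOOL;
}
```

Because the extracted value is already the type code, a lookup loop can compare it directly against the key's type instead of testing one JBE_IS* macro per type.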

I'm going to proceed with refactoring those things, which will change the on-disk
format. It's late in the release cycle - these things really should've been
cleaned up earlier - but it's important to get the on-disk format right.
Shout if you have any objections.

I don't think it's likely that beta1 will be binary compatible with the
final version at this point.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#16Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#15)
Re: Wanted: jsonb on-disk representation documentation

On Wed, May 7, 2014 at 7:40 AM, Andres Freund <andres@anarazel.de> wrote:

I'm going to proceed with refactoring those things, which will change the on-disk
format. It's late in the release cycle - these things really should've been
cleaned up earlier - but it's important to get the on-disk format right.
Shout if you have any objections.

+1.

I don't think it's likely that beta1 will be binary compatible with the
final version at this point.

I rather think we're not ready for beta1 at this point (but I expect
to lose that argument).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#17Magnus Hagander
magnus@hagander.net
In reply to: Heikki Linnakangas (#13)
Re: Wanted: jsonb on-disk representation documentation

On Wed, May 7, 2014 at 1:20 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

So, apart from cleaning up the code, we really need to take a close look
at the on-disk format now. The code can be cleaned up later, too, but we're
going to be stuck with the on-disk format forever, so it's critical to get
that right.

First, a few observations:

* JENTRY_ISFIRST is redundant. Whenever you deal with the JEntry struct,
you know from the context which element in the array it is.

* JENTRY_ISNEST is set but never used.

* JENTRY_ISBOOL is defined as (JENTRY_ISNUMERIC | JENTRY_ISNEST), which
seems confusing.

I'm going to proceed with refactoring those things, which will change the
on-disk format. It's late in the release cycle - these things really
should've been cleaned up earlier - but it's important to get the on-disk
format right. Shout if you have any objections.

+1. It's now or never. If the on-disk format needs changing, not doing it
now is going to leave us with a "jsonc" in the future... Better bite the
bullet now.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#18Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#16)
Re: Wanted: jsonb on-disk representation documentation

On 2014-05-07 08:50:33 -0400, Robert Haas wrote:

On Wed, May 7, 2014 at 7:40 AM, Andres Freund <andres@anarazel.de> wrote:

I don't think it's likely that beta1 will be binary compatible with the
final version at this point.

I rather think we're not ready for beta1 at this point (but I expect
to lose that argument).

Well, I guess it depends on what we define 'beta1' to be. Imo evaluating
problematic pieces of new code, locating unfinished pieces is part of
that. I don't see much point in forbidding incompatible changes in beta1
personally. That robs the development cycle of the only period where
users can actually test the new version in a halfway sane manner and
report back with things that are apparently broken.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#19Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#18)
Re: Wanted: jsonb on-disk representation documentation

On Wed, May 7, 2014 at 2:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-05-07 08:50:33 -0400, Robert Haas wrote:

On Wed, May 7, 2014 at 7:40 AM, Andres Freund <andres@anarazel.de>

wrote:

I don't think it's likely that beta1 will be binary compatible with the
final version at this point.

I rather think we're not ready for beta1 at this point (but I expect
to lose that argument).

Well, I guess it depends on what we define 'beta1' to be. Imo evaluating
problematic pieces of new code, locating unfinished pieces is part of
that. I don't see much point in forbidding incompatible changes in beta1
personally. That robs the development cycle of the only period where
users can actually test the new version in a halfway sane manner and
report back with things that are apparently broken.

We need to be very careful to tell people about it though. Preferably if
we *know* a dump/reload will be needed to go beta1->beta2, we should
actually document that in the release notes of beta1 already. So people can
make proper plans.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#20Andres Freund
andres@2ndquadrant.com
In reply to: Magnus Hagander (#19)
Re: Wanted: jsonb on-disk representation documentation

On 2014-05-07 15:00:01 +0200, Magnus Hagander wrote:

On Wed, May 7, 2014 at 2:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-05-07 08:50:33 -0400, Robert Haas wrote:

On Wed, May 7, 2014 at 7:40 AM, Andres Freund <andres@anarazel.de>

wrote:

I don't think it's likely that beta1 will be binary compatible with the
final version at this point.

I rather think we're not ready for beta1 at this point (but I expect
to lose that argument).

Well, I guess it depends on what we define 'beta1' to be. Imo evaluating
problematic pieces of new code, locating unfinished pieces is part of
that. I don't see much point in forbidding incompatible changes in beta1
personally. That robs the development cycle of the only period where
users can actually test the new version in a halfway sane manner and
report back with things that are apparently broken.

We need to be very careful to tell people about it though. Preferably if
we *know* a dump/reload will be needed to go beta1->beta2, we should
actually document that in the release notes of beta1 already. So people can
make proper plans.

Yes, I think it actually makes sense to add that to *all* beta release
notes. Even in beta2, although slightly weakened.
That's not a new thing btw. E.g. 9.3 has had a catversion bump between
beta1/2:
git diff 09bd2acbe5ac866ce9..817a89423f429a6a8b -- src/include/catalog/catversion.h

The more interesting note probably is that there quite possibly won't be
pg_upgrade'ability...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#21Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#20)
Re: Wanted: jsonb on-disk representation documentation

On Wed, May 7, 2014 at 3:04 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-05-07 15:00:01 +0200, Magnus Hagander wrote:

On Wed, May 7, 2014 at 2:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-05-07 08:50:33 -0400, Robert Haas wrote:

On Wed, May 7, 2014 at 7:40 AM, Andres Freund <andres@anarazel.de> wrote:

I don't think it's likely that beta1 will be binary compatible with the
final version at this point.

I rather think we're not ready for beta1 at this point (but I expect
to lose that argument).

Well, I guess it depends on what we define 'beta1' to be. Imo evaluating
problematic pieces of new code, locating unfinished pieces is part of
that. I don't see much point in forbidding incompatible changes in beta1
personally. That robs the development cycle of the only period where
users can actually test the new version in a halfway sane manner and
report back with things that are apparently broken.

We need to be very careful to tell people about it though. Preferably if
we *know* a dump/reload will be needed to go beta1->beta2, we should
actually document that in the release notes of beta1 already. So people can
make proper plans.

Yes, I think it actually makes sense to add that to *all* beta release
notes. Even in beta2, although slightly weakened.
That's not a new thing btw. E.g. 9.3 has had a catversion bump between
beta1/2:
git diff 09bd2acbe5ac866ce9..817a89423f429a6a8b -- src/include/catalog/catversion.h

The more interesting note probably is that there quite possibly won't be
pg_upgrade'ability...

Yeah, that's the big thing really.

Requiring pg_upgrade between betas might even be "good" in the sense that
then we get more testing of pg_upgrade :) But requiring a dump/reload is
going to hurt people more.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#19)
Re: Wanted: jsonb on-disk representation documentation

Magnus Hagander <magnus@hagander.net> writes:

On Wed, May 7, 2014 at 2:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Well, I guess it depends on what we define 'beta1' to be. Imo evaluating
problematic pieces of new code, locating unfinished pieces is part of
that. I don't see much point in forbidding incompatible changes in beta1
personally. That robs the development cycle of the only period where
users can actually test the new version in a halfway sane manner and
report back with things that are apparently broken.

We need to be very careful to tell people about it though. Preferably if
we *know* a dump/reload will be needed to go beta1->beta2, we should
actually document that in the release notes of beta1 already. So people can
make proper plans.

This seems like much ado about very little. The policy will be the same
as it ever was: once beta1 is out, we prefer to avoid forcing an initdb,
but we'll do it if we have to.

In any case, +1 for fixing whatever needs to be fixed now; I expect to
have a fix for the limited-GIN-index-length issue later today, and that
really is also an on-disk format change, though it won't affect short
index entries. ("Short" is TBD; I was thinking of hashing keys longer
than say 128 bytes.)
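
The idea of leaving short index entries alone and hashing only long keys can be sketched as follows. This is an illustration, not the fix being described: the 128-byte threshold is the "say 128 bytes" figure, which is explicitly still to be decided, FNV-1a is a stand-in hash chosen for brevity rather than anything PostgreSQL is known to use here, and all Sketch*/sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tentative threshold from the message above; the real value is TBD. */
#define SKETCH_MAX_PLAIN_KEY 128

/* Hypothetical index entry: either the key bytes themselves or a hash. */
typedef struct
{
	int			hashed;		/* 1 if 'hash' is valid, 0 if 'data'/'len' are */
	uint32_t	hash;
	size_t		len;
	char		data[SKETCH_MAX_PLAIN_KEY];
} SketchIndexKey;

/* 32-bit FNV-1a, used here only as a simple stand-in hash function. */
static uint32_t
sketch_fnv1a(const char *p, size_t len)
{
	uint32_t	h = 2166136261u;

	for (size_t i = 0; i < len; i++)
	{
		h ^= (unsigned char) p[i];
		h *= 16777619u;
	}
	return h;
}

/* Short keys are stored verbatim; longer keys are replaced by their hash. */
static SketchIndexKey
sketch_make_index_key(const char *key, size_t len)
{
	SketchIndexKey out;

	memset(&out, 0, sizeof(out));
	if (len <= SKETCH_MAX_PLAIN_KEY)
	{
		memcpy(out.data, key, len);
		out.len = len;
	}
	else
	{
		out.hashed = 1;
		out.hash = sketch_fnv1a(key, len);
	}
	return out;
}
```

In a scheme like this, entries at or below the threshold keep their existing representation, which is why short index entries would be unaffected. A hashed entry can only establish a probable match, so the original value would need to be rechecked.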

regards, tom lane


#23Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#22)
Re: Wanted: jsonb on-disk representation documentation

On 2014-05-07 09:44:36 -0400, Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Wed, May 7, 2014 at 2:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Well, I guess it depends on what we define 'beta1' to be. Imo evaluating
problematic pieces of new code, locating unfinished pieces is part of
that. I don't see much point in forbidding incompatible changes in beta1
personally. That robs the development cycle of the only period where
users can actually test the new version in a halfway sane manner and
report back with things that are apparently broken.

We need to be very careful to tell people about it though. Preferably if
we *know* a dump/reload will be needed to go beta1->beta2, we should
actually document that in the release notes of beta1 already. So people can
make proper plans.

This seems like much ado about very little. The policy will be the same
as it ever was: once beta1 is out, we prefer to avoid forcing an initdb,
but we'll do it if we have to.

I think Magnus' point is that we should tell users that we'll try but
won't guarantee anything. -hackers knowing about it doesn't mean users
will know.

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#24Peter Geoghegan
pg@heroku.com
In reply to: Andres Freund (#15)
Re: Wanted: jsonb on-disk representation documentation

On Wed, May 7, 2014 at 4:40 AM, Andres Freund <andres@anarazel.de> wrote:

* Imo we need space in jsonb ondisk values to indicate a format
version. We won't fully get this right.

That would be nice.

* The JEntry representation should be changed so it's possible to get the type
of an entry without checking individual types. That'll make code like
findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
much easier to read. Preferably so it can have the same values (after
shifting/masking) as the JBE variants. And it needs space for further
types (or representations thereof).

I don't know what you mean. Every place that macros like
JBE_ISNUMERIC() are used subsequently involves access to (say) numeric
union fields. At best, you could have those functions check that the
types match, and then handle each case in a switch that only looked at
(say) the "candidate", but that doesn't really save much at all. It
wouldn't take much to have the macros give enum constant values back
as you suggest, though.

* I wonder if the hash/object pair representation is wise and if it
shouldn't be keys combined with offsets first, and then the
values. That will make access much faster. And that's what jsonb
essentially is about.

I like that the physical layout reflects the layout of the original
JSON document. Besides, I don't think this is obviously the case. Are
you sure that it won't be more useful to make children as close to
their parents as possible? Particularly given idiomatic usage. Jsonb,
in point of fact, *is* fast.
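
The two object layouts under discussion differ only in where a pair's key and value land in the child array. A small sketch of the index arithmetic, assuming the current layout interleaves keys and values (k0 v0 k1 v1 ...) and the proposed one stores all keys first (k0 k1 ... v0 v1 ...); the function names are made up, and the sketch takes no position on which layout is faster in practice.

```c
#include <assert.h>
#include <stdint.h>

/* Interleaved pairs: k0 v0 k1 v1 ... */
static uint32_t
pair_key_index(uint32_t i)
{
	return 2 * i;
}

static uint32_t
pair_value_index(uint32_t i)
{
	return 2 * i + 1;
}

/* Keys first: k0 k1 ... k(n-1) v0 v1 ... v(n-1) */
static uint32_t
split_key_index(uint32_t i)
{
	return i;
}

static uint32_t
split_value_index(uint32_t npairs, uint32_t i)
{
	assert(i < npairs);
	return npairs + i;
}
```

With interleaving, a value sits directly next to its key. With keys first, a binary search over the keys walks a contiguous run of entries and touches the value only once a match is found. Which property matters more depends on the access pattern.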

* I think both arrays and hashes should grow individual structs. With
space for additional flags.

Why?

--
Peter Geoghegan


#25Andres Freund
andres@2ndquadrant.com
In reply to: Peter Geoghegan (#24)
Re: Wanted: jsonb on-disk representation documentation

On 2014-05-07 10:55:28 -0700, Peter Geoghegan wrote:

On Wed, May 7, 2014 at 4:40 AM, Andres Freund <andres@anarazel.de> wrote:

* The JEntry representation should be changed so it's possible to get the type
of an entry without checking individual types. That'll make code like
findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
much easier to read. Preferably so it can have the same values (after
shifting/masking) as the JBE variants. And it needs space for further
types (or representations thereof).

I don't know what you mean. Every place that macros like
JBE_ISNUMERIC() are used subsequently involves access to (say) numeric
union fields. At best, you could have those functions check that the
types match, and then handle each case in a switch that only looked at
(say) the "candidate", but that doesn't really save much at all. It
wouldn't take much to have the macros give enum constant values back
as you suggest, though.

Compare
    for (i = 0; i < count; i++)
    {
        JEntry *e = array + i;

        if (JBE_ISNULL(*e) && key->type == jbvNull)
        {
            result->type = jbvNull;
            result->estSize = sizeof(JEntry);
        }
        else if (JBE_ISSTRING(*e) && key->type == jbvString)
        {
            result->type = jbvString;
            result->val.string.val = data + JBE_OFF(*e);
            result->val.string.len = JBE_LEN(*e);
            result->estSize = sizeof(JEntry) + result->val.string.len;
        }
        else if (JBE_ISNUMERIC(*e) && key->type == jbvNumeric)
        {
            result->type = jbvNumeric;
            result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));

            result->estSize = 2 * sizeof(JEntry) +
                VARSIZE_ANY(result->val.numeric);
        }
        else if (JBE_ISBOOL(*e) && key->type == jbvBool)
        {
            result->type = jbvBool;
            result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
            result->estSize = sizeof(JEntry);
        }
        else
            continue;

        if (compareJsonbScalarValue(key, result) == 0)
            return result;
    }
with

    for (i = 0; i < count; i++)
    {
        JEntry *e = array + i;

        if (!JBE_TYPE_IS_SCALAR(*e))
            continue;

        if (JBE_TYPE(*e) != key->type)
            continue;

        result = getJsonbValue(e);

        if (compareJsonbScalarValue(key, result) == 0)
            return result;
    }

Yes, it's not a fair comparison because I made up getJsonbValue(). But
it's *much* more readable regardless. And there's more places that could
use it. Like the second half of findJsonbValueFromSuperHeader(). FWIW,
that's one of the requests I definitely made before.
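
[Editorial sketch, not part of the original message.] To make the
"type via shifting/masking" idea concrete, here is a minimal,
self-contained illustration. Everything in it is hypothetical: the
SketchEntry layout (3 type bits above a 29-bit offset), the SketchType
enum, and the SK_* macros are invented for this example and do not
describe the actual JEntry format or any macro in jsonb.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry header: top 3 bits = type tag, low 29 bits = offset. */
typedef uint32_t SketchEntry;

/* Hypothetical type tags; scalars are deliberately numbered first. */
typedef enum
{
    SK_NULL = 0,
    SK_STRING = 1,
    SK_NUMERIC = 2,
    SK_BOOL = 3,
    SK_ARRAY = 4,
    SK_OBJECT = 5
} SketchType;

#define SK_TYPE_SHIFT   29
#define SK_OFFSET_MASK  ((UINT32_C(1) << SK_TYPE_SHIFT) - 1)

/* One shift yields a value directly comparable with the enum. */
#define SK_TYPE(e)            ((SketchType) ((e) >> SK_TYPE_SHIFT))
#define SK_OFFSET(e)          ((e) & SK_OFFSET_MASK)
#define SK_TYPE_IS_SCALAR(e)  (SK_TYPE(e) <= SK_BOOL)

/* Pack a type tag and an offset into one entry word. */
static SketchEntry
sk_make_entry(SketchType type, uint32_t offset)
{
    assert(offset <= SK_OFFSET_MASK);
    return ((uint32_t) type << SK_TYPE_SHIFT) | offset;
}

/*
 * Return the index of the first scalar entry whose type tag equals
 * "wanted", or -1 if there is none. Containers are skipped.
 */
static int
sk_find_first_of_type(const SketchEntry *entries, int count, SketchType wanted)
{
    for (int i = 0; i < count; i++)
    {
        if (!SK_TYPE_IS_SCALAR(entries[i]))
            continue;
        if (SK_TYPE(entries[i]) != wanted)
            continue;
        return i;
    }
    return -1;
}
```

With a tag laid out like this, the loop body needs one comparison
against the key's type instead of a chain of per-type predicate macros,
which is the readability gain being argued for above.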

* I wonder if the hash/object pair representation is wise and if it
shouldn't be keys combined with offsets first, and then the
values. That will make access much faster. And that's what jsonb
essentially is about.

I like that the physical layout reflects the layout of the original
JSON document.

Don't see much value in that. This is a binary representation and it'd
be bijective.

Besides, I don't think this is obviously the case. Are
you sure that it won't be more useful to make children as close to
their parents as possible? Particularly given idiomatic usage.

Because - if done right - it would allow elementwise access without
scanning previous values.
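
[Editorial sketch, not part of the original message.] A rough
illustration of what "keys combined with offsets first, then the values"
could buy. The SketchSlot struct and sketch_lookup() are hypothetical
names made up for this example; they assume a key directory sorted in
strcmp() order with NUL-terminated keys, which is a simplification and
not a description of the jsonb format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory slot: a key plus the offset of its value. */
typedef struct
{
    const char *key;           /* NUL-terminated, for simplicity */
    size_t      value_offset;  /* byte offset into a separate value area */
} SketchSlot;

/*
 * Binary-search a key directory sorted by strcmp() order.
 * On success, store the value's offset in *offset and return 1;
 * return 0 if the key is absent. No value bytes are read while
 * searching, so earlier values never have to be scanned.
 */
static int
sketch_lookup(const SketchSlot *slots, size_t count,
              const char *key, size_t *offset)
{
    size_t lo = 0;
    size_t hi = count;

    assert(offset != NULL);

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        int    cmp = strcmp(key, slots[mid].key);

        if (cmp == 0)
        {
            *offset = slots[mid].value_offset;
            return 1;
        }
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}
```

The point of the sketch is only that the search touches the directory
and nothing else; whether that outweighs keeping children physically
close to their parents is exactly the trade-off under discussion.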

* I think both arrays and hashes should grow individual structs. With
space for additional flags.

Why?

Because a) it will make the code more readable b) it'll allow for
adding different representations of hashes/arrays.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#15)
Re: Wanted: jsonb on-disk representation documentation

Andres Freund <andres@anarazel.de> writes:

* The jentry representation should be changed so it's possible to get the type
of an entry without checking individual types. That'll make code like
findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
much easier to read. Preferably so it can have the same values (after
shifting/masking) as the JBE variants. And it needs space for further
types (or representations thereof).

Further types? What further types? JSON seems unlikely to grow any
other value types than what it's got.

regards, tom lane


#27Andres Freund
andres@2ndquadrant.com
In reply to: Tom Lane (#26)
Re: Wanted: jsonb on-disk representation documentation

On 2014-05-07 14:48:51 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

* The jentry representation should be changed so it's possible to get the type
of an entry without checking individual types. That'll make code like
findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
much easier to read. Preferably so it can have the same values (after
shifting/masking) as the JBE variants. And it needs space for further
types (or representations thereof).

Further types? What further types? JSON seems unlikely to grow any
other value types than what it's got.

I am not thinking about user level exposed types, but e.g. a hash/object
representation that allows duplicated keys and keeps the original
order. Because I am pretty sure that'll come.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#28Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#27)
Re: Wanted: jsonb on-disk representation documentation

On 05/07/2014 02:52 PM, Andres Freund wrote:

On 2014-05-07 14:48:51 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

* The jentry representation should be changed so it's possible to get the type
of an entry without checking individual types. That'll make code like
findJsonbValueFromSuperHeader() (well, whatever you've renamed it to)
much easier to read. Preferably so it can have the same values (after
shifting/masking) as the JBE variants. And it needs space for further
types (or representations thereof).

Further types? What further types? JSON seems unlikely to grow any
other value types than what it's got.

I am not thinking about user level exposed types, but e.g. a hash/object
representation that allows duplicated keys and keeps the original
order. Because I am pretty sure that'll come.

That makes one of you. We've had hstore for years and nobody that I
recall has asked for preservation of key order or duplicates. And any
app that relies on either in JSON is still, IMNSHO, broken by design.

OTOH, there are suggestions of "superjson" types that support other
scalar types such as timestamps.

cheers

andrew


#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#24)
Re: Wanted: jsonb on-disk representation documentation

btw ... in jsonb.h there is this comment:

* Jsonb Keys and string array elements are treated equivalently when
* serialized to text index storage. One day we may wish to create an opclass
* that only indexes values, but for now keys and values are stored in GIN
* indexes in a way that doesn't really consider their relationship to each
* other.

Is this an obsolete speculation about writing jsonb_hash_ops, or is there
still something worth preserving there? If so, what?

regards, tom lane


#30Peter Geoghegan
pg@heroku.com
In reply to: Tom Lane (#29)
Re: Wanted: jsonb on-disk representation documentation

On Wed, May 7, 2014 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

* Jsonb Keys and string array elements are treated equivalently when
* serialized to text index storage. One day we may wish to create an opclass
* that only indexes values, but for now keys and values are stored in GIN
* indexes in a way that doesn't really consider their relationship to each
* other.

Is this an obsolete speculation about writing jsonb_hash_ops, or is there
still something worth preserving there? If so, what?

This is not obsolete. It would equally apply to a GiST opclass that
wanted to support our current definition of existence. Array elements
are keys simply by virtue of being strings, but otherwise are treated
as values. See the large comment block within gin_extract_jsonb().

--
Peter Geoghegan


#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#29)
Re: Wanted: jsonb on-disk representation documentation

And while I'm looking at it ...

The jsonb_ops storage format for values is inherently lossy, because it
cannot distinguish the string values "n", "t", "f" from JSON null or
boolean true, false respectively; nor does it know the difference between
a number and a string containing digits. This appears to not quite be a
bug because the consistent functions force recheck for all queries that
care about values (as opposed to keys). But if it's documented anywhere
I don't see where. And in any case, is it a good idea? We could fairly
easily change things so that these cases are guaranteed distinguishable.
We're using an entire byte to convey one bit of information (key or
value); I'm inclined to redefine the flag byte so that it tells not just
that but which JSON datatype is involved.
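
[Editorial sketch, not part of the original message.] The collision
described above can be shown in a few lines. This is only an
illustration under stated assumptions: the flag characters ('K', 'V',
'Z', 'B', 'N', 'S'), the SketchJsonType enum, and both helper functions
are invented here and are not the values or functions used by
jsonb_ops. The only detail taken from the message is that null, true
and false are stored with the texts "n", "t" and "f".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical JSON value types, used to index value_flags[] below. */
typedef enum
{
    SK_JNULL = 0,
    SK_JBOOL = 1,
    SK_JNUMBER = 2,
    SK_JSTRING = 3
} SketchJsonType;

/* One-bit scheme: the flag byte says only "key" or "value". */
static void
sketch_item_lossy(char *out, size_t outlen, int is_key, const char *text)
{
    assert(outlen > 0);
    snprintf(out, outlen, "%c%s", is_key ? 'K' : 'V', text);
}

/* Typed scheme: for values, the flag byte also names the JSON type. */
static void
sketch_item_typed(char *out, size_t outlen, int is_key,
                  SketchJsonType type, const char *text)
{
    static const char value_flags[] = {'Z', 'B', 'N', 'S'};

    assert(outlen > 0);
    snprintf(out, outlen, "%c%s", is_key ? 'K' : value_flags[type], text);
}
```

Under the one-bit scheme the string "t" and boolean true both become
"Vt", so only a recheck against the heap tuple can tell them apart;
under the typed scheme they become "St" and "Bt" and the index entry
alone distinguishes them.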

regards, tom lane


#32Peter Geoghegan
pg@heroku.com
In reply to: Tom Lane (#31)
Re: Wanted: jsonb on-disk representation documentation

On Wed, May 7, 2014 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The jsonb_ops storage format for values is inherently lossy, because it
cannot distinguish the string values "n", "t", "f" from JSON null or
boolean true, false respectively; nor does it know the difference between
a number and a string containing digits. This appears to not quite be a
bug because the consistent functions force recheck for all queries that
care about values (as opposed to keys). But if it's documented anywhere
I don't see where.

The fact that we *don't* set the reset flag for
JsonbExistsStrategyNumber, and why that's okay is prominently
documented. So I'd say that it is.

And in any case, is it a good idea? We could fairly
easily change things so that these cases are guaranteed distinguishable.
We're using an entire byte to convey one bit of information (key or
value); I'm inclined to redefine the flag byte so that it tells not just
that but which JSON datatype is involved.

It seemed simpler to do it that way. As I've said before, jsonb_ops is
mostly concerned with hstore-style indexing. It could also be
particularly useful for expressional indexes on "tags" arrays of
strings, which is a common use-case.

jsonb_hash_ops on the other hand is for testing containment, which is
useful for querying heavily nested documents, where typically there is
a very low selectivity for keys. It's not the default largely because
I was concerned about not supporting all indexable operators by
default, and because I suspected that it would be preferred to have a
default closer to hstore.

Anyway, doing things that way for values won't obviate the need to set
the reset flag, unless you come up with a much more sophisticated
scheme. Existence (of keys) is only tested in respect of the
top-level. Containment (where values are tested) is more complicated.

--
Peter Geoghegan


#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#30)
Re: Wanted: jsonb on-disk representation documentation

Peter Geoghegan <pg@heroku.com> writes:

On Wed, May 7, 2014 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Is this an obsolete speculation about writing jsonb_hash_ops, or is there
still something worth preserving there? If so, what?

This is not obsolete. It would equally apply to a GiST opclass that
wanted to support our current definition of existence. Array elements
are keys simply by virtue of being strings, but otherwise are treated
as values. See the large comment block within gin_extract_jsonb().

It's not that aspect I'm complaining about, it's the bit about "one day we
may wish to write". This comment should restrict itself to describing
what jsonb_ops does, not make already-or-soon-to-be-obsolete statements
about what other opclasses might or might not do.

regards, tom lane


#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#32)
Re: Wanted: jsonb on-disk representation documentation

Peter Geoghegan <pg@heroku.com> writes:

On Wed, May 7, 2014 at 12:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The jsonb_ops storage format for values is inherently lossy, because it
cannot distinguish the string values "n", "t", "f" from JSON null or
boolean true, false respectively; nor does it know the difference between
a number and a string containing digits. This appears to not quite be a
bug because the consistent functions force recheck for all queries that
care about values (as opposed to keys). But if it's documented anywhere
I don't see where.

The fact that we *don't* set the reset flag for
JsonbExistsStrategyNumber, and why that's okay is prominently
documented. So I'd say that it is.

Meh. Are you talking about the large comment block in gin_extract_jsonb?
The readability of that comment starts to go downhill with its use of
"reset" to refer to what everything else calls a "recheck" flag, and in
any case it's claiming that we *don't* need a recheck for exists (a
statement I suspect to be false, but more later). It entirely fails to
explain that other query types *do* need a recheck because of arbitrary
decisions about not representing JSON datatype information fully. There's
another comment in gin_consistent_jsonb that's just as misleading, because
it mentions (vaguely) some reasons why recheck is necessary without
mentioning this one.

A larger issue here is that, to the extent that this comment even has
the information I'm after, the comment is in the wrong place. It is not
attached to the code that's actually making the lossy representational
choices (ie, make_scalar_key), nor to the code that decides whether or not
a recheck is needed (ie, gin_consistent_jsonb). I don't think that basic
architectural choices like these should be relegated to comment blocks
inside specific functions to begin with. A README file would be better,
perhaps, but there's not a specific directory associated with the jsonb
code; so I think this sort of info belongs either in jsonb.h or in the
file header comment for jsonb_gin.c.

Anyway, doing things that way for values won't obviate the need to set
the reset flag, unless you come up with a much more sophisticated
scheme. Existence (of keys) is only tested in respect of the
top-level. Containment (where values are tested) is more complicated.

I'm not expecting that it will make things better for the current query
operators. I am thinking that exact rather than lossy storage might help
for future operators wanting to use this same index representation.
Once we ship 9.4, it's going to be very hard to change the index
representation, especially for the default opclass (cf the business about
which opclass is default for type inet). So it behooves us to not throw
information away needlessly.

regards, tom lane


#35Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#34)
Re: Wanted: jsonb on-disk representation documentation

On 05/07/2014 04:13 PM, Tom Lane wrote:

A README file would be better,
perhaps, but there's not a specific directory associated with the jsonb
code; so I think this sort of info belongs either in jsonb.h or in the
file header comment for jsonb_gin.c.

Is there any reason we couldn't have a README.jsonb?

cheers

andrew


#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#35)
Re: Wanted: jsonb on-disk representation documentation

Andrew Dunstan <andrew@dunslane.net> writes:

On 05/07/2014 04:13 PM, Tom Lane wrote:

A README file would be better,
perhaps, but there's not a specific directory associated with the jsonb
code; so I think this sort of info belongs either in jsonb.h or in the
file header comment for jsonb_gin.c.

Is there any reason we couldn't have a README.jsonb?

We could, but the only place I can see to put it would be in
backend/utils/adt/, which seems like a poor precedent; I don't want to end
up with forty-two README.foo files in there. The header comments for the
jsonbxxx.c files are probably better candidates for collecting this sort
of info.

(The larger problem here is that utils/adt/ has become a catchbasin for
all kinds of stuff that can barely squeeze under the rubric of "abstract
data type". But fixing that is something I don't care to tackle now.)

regards, tom lane
