X-From-Line: mailagent Mon Sep 11 12:43:45 EST 2006
Return-Path: <tgl@sss.pgh.pa.us>
Received: from po11.mit.edu [18.7.21.73]
	by stark.xeocode.com with POP3 (fetchmail-5.9.7)
	for stark@localhost (single-drop); Mon, 11 Sep 2006 13:43:49 -0400 (EDT)
Received: from po11.mit.edu ([unix socket])
	by po11.mit.edu (Cyrus v2.1.5) with LMTP;
	Mon, 11 Sep 2006 13:15:53 -0400
X-Sieve: CMU Sieve 2.2
Received: from pacific-carrier-annex.mit.edu by po11.mit.edu (8.13.6/4.7) id
	k8BHFqgb001327; Mon, 11 Sep 2006 13:15:53 -0400 (EDT)
Received: from mit.edu (M24-004-BARRACUDA-1.MIT.EDU [18.7.7.111])
	by pacific-carrier-annex.mit.edu (8.13.6/8.9.2) with ESMTP id
	k8BHFkr7024937
	for <gsstark@mit.edu>; Mon, 11 Sep 2006 13:15:46 -0400 (EDT)
Received: from sss.pgh.pa.us (sss.pgh.pa.us [66.207.139.130])
	by mit.edu (Spam Firewall) with ESMTP id 9D239E6F2F
	for <gsstark@mit.edu>; Mon, 11 Sep 2006 13:15:45 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
	by sss.pgh.pa.us (8.13.6/8.13.6) with ESMTP id k8BHFhFo004134;
	Mon, 11 Sep 2006 13:15:43 -0400 (EDT)
To: Gregory Stark <stark@enterprisedb.com>
cc: Gregory Stark <gsstark@mit.edu>, Bruce Momjian <bruce@momjian.us>,
	Peter Eisentraut <peter_e@gmx.net>, pgsql-hackers@postgresql.org,
	Martijn van Oosterhout <kleptog@svana.org>
Subject: Re: [HACKERS] Fixed length data types issue 
In-reply-to: <87fyeyyb3c.fsf@enterprisedb.com> 
References: <200609102340.k8ANe7t20884@momjian.us>
	<5604.1157932798@sss.pgh.pa.us> <871wqjtf21.fsf@stark.xeocode.com>
	<6043.1157937411@sss.pgh.pa.us> <87fyeyyb3c.fsf@enterprisedb.com>
Comments: In-reply-to Gregory Stark <stark@enterprisedb.com>
	message dated "Mon, 11 Sep 2006 11:03:03 +0100"
Date: Mon, 11 Sep 2006 13:15:43 -0400
X-Gnus-Mail-Source: directory:~/incoming
Message-ID: <4133.1157994943@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Spam-Score: 0.00
X-Spam-Flag: NO
X-Scanned-By: MIMEDefang 2.42
X-Mailagent-Processed: Mon, 11 Sep 2006 12:44:17 -0500
X-Mailagent-Processed: Job:203423 File:fm3914
X-Filter: mailagent [version 3.0 PL73] for gsstark@mit.edu
Lines: 64
Xref: stark.xeocode.com work.enterprisedb:683

Gregory Stark <stark@enterprisedb.com> writes:
> In any case it seems a bit backwards to me. Wouldn't it be better to
> preserve bits in the case of short length words where they're precious
> rather than long ones? If we make 0xxxxxxx the 1-byte case it means ...

Well, I don't find that real persuasive: you're saying that it's
important to have a 1-byte not 2-byte header for datums between 64 and
127 bytes long.  Which is by definition less than a 2% savings for those
values.  I think it's more important to pick bitpatterns that reduce
the number of cases heap_deform_tuple has to think about while decoding
the length of a field --- every "if" in that inner loop is expensive.
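
To put a number on the "less than 2%" figure, here is a minimal sketch
of the arithmetic.  The function name and its parameter are made up for
illustration; it only assumes that the datum occupies total_len bytes
and that the shorter header saves exactly one of them.

```c
#include <assert.h>

/*
 * Percentage of space saved by trimming one header byte from a datum
 * that occupies total_len bytes.  Hypothetical helper, illustration only.
 */
double
header_savings_pct(int total_len)
{
    assert(total_len > 1);
    return 100.0 / (double) total_len;
}
```

The best case in the range under discussion is the smallest datum, 64
bytes, where one byte out of 64 is about 1.56%; the benefit only
shrinks from there as the datum grows toward 127 bytes.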

I realized this morning that if we are going to preserve the rule that
4-byte-header and compressed-header cases can be distinguished from the
data alone, there is no reason to be very worried about whether the
2-byte cases can represent the maximal length of an in-line datum.
If you want to do 16K inline (and your page is big enough for that)
you can just fall back to the 4-byte-header case.  So there's no real
disadvantage if the 2-byte headers can only go up to 4K or so.  This
gives us some more flexibility in the bitpattern choices.
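
A rough sketch of that fallback rule, for concreteness.  Everything
here is an assumption made for illustration: the function name, the
limit constants (63 bytes, 4K-1 and 1G-1, each counting the header
itself), and the fact that alignment padding is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limits on total length (header included). */
#define HYP_MAX_TOTAL_1B ((size_t) 63)
#define HYP_MAX_TOTAL_2B ((size_t) 4095)
#define HYP_MAX_TOTAL_4B ((size_t) 0x3FFFFFFF)

/*
 * Pick the smallest header that can describe data_len bytes of payload.
 * Anything too long for the 2-byte form simply falls back to the 4-byte
 * form, so the 2-byte form need not cover every possible in-line datum.
 */
int
choose_header_size(size_t data_len)
{
    if (data_len + 1 <= HYP_MAX_TOTAL_1B)
        return 1;
    if (data_len + 2 <= HYP_MAX_TOTAL_2B)
        return 2;
    assert(data_len + 4 <= HYP_MAX_TOTAL_4B);
    return 4;
}
```

With these made-up limits a 16K in-line datum gets a 4-byte header,
which is the fallback described above.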

Another thought that occurred to me is that if we preserve the
convention that a length word's value includes itself, then for a
1-byte header the bit pattern 10000000 is meaningless --- the count
has to be at least 1.  So one trick we could play is to take over
this value as the signal for "toast pointer follows", with the
assumption that the tuple-decoder code knows a priori how big a
toast pointer is.  I am not real enamored of this, because it certainly
adds one case to the inner heap_deform_tuple loop and it'll give us
problems if we ever want more than one kind of toast pointer.  But
it's a possibility.
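
Roughly, the trick would look like the sketch below.  It assumes a
1-byte header is flagged by the high bit with the length in the low 7
bits, and it invents a toast pointer size of 16 bytes; the names and
that size are hypothetical and not taken from any existing code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_TOAST_MARKER       0x80  /* 10000000: length 0, otherwise meaningless */
#define HYP_TOAST_POINTER_SIZE 16    /* made-up size, known a priori to the decoder */

/*
 * Total bytes occupied by a field whose first byte has the high bit set.
 * The extra comparison is the additional case the inner loop would pay for.
 */
size_t
one_byte_field_size(uint8_t first)
{
    assert(first & 0x80);
    if (first == HYP_TOAST_MARKER)
        return 1 + HYP_TOAST_POINTER_SIZE;   /* marker byte + pointer */
    return (size_t) (first & 0x7F);          /* count includes the header byte */
}
```

Because the pointer size is a compile-time constant here, a second kind
of toast pointer with a different size would not fit without another
marker value.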

Anyway, a couple of encodings that I'm thinking about now involve
limiting uncompressed data to 1G (same as now), so that we can play
with the first 2 bits instead of just 1:

00xxxxxx	4-byte length word, aligned, uncompressed data (up to 1G)
01xxxxxx	4-byte length word, aligned, compressed data (up to 1G)
100xxxxx	1-byte length word, unaligned, TOAST pointer
1010xxxx	2-byte length word, unaligned, uncompressed data (up to 4K)
1011xxxx	2-byte length word, unaligned, compressed data (up to 4K)
11xxxxxx	1-byte length word, unaligned, uncompressed data (up to 63 bytes)

or

00xxxxxx	4-byte length word, aligned, uncompressed data (up to 1G)
010xxxxx	2-byte length word, unaligned, uncompressed data (up to 8K)
011xxxxx	2-byte length word, unaligned, compressed data (up to 8K)
10000000	1-byte length word, unaligned, TOAST pointer
1xxxxxxx	1-byte length word, unaligned, uncompressed data (up to 127 bytes)
		(xxxxxxx not all zero)

This second choice allows longer datums in both the 1-byte and 2-byte
header formats, but it hardwires the length of a TOAST pointer and
requires four cases to be distinguished in the inner loop; the first
choice only requires three cases, because TOAST pointer and 1-byte
header can be handled by the same rule "length is low 6 bits of byte".
The second choice also loses the ability to store in-line compressed
data above 8K, but that's probably an insignificant loss.
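
To make the three-versus-four case count concrete, here is a sketch of
a decoder for each table.  It is not heap_deform_tuple code.  The type
and function names are invented, the tag bits are assumed to sit in the
first byte in memory with the remaining length bytes following
most-significant first, and the toast pointer size in the second
decoder is a made-up constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_TOAST_POINTER_SIZE 16   /* hypothetical hardwired size */

typedef enum { HDR_4B, HDR_4B_COMPRESSED, HDR_2B, HDR_2B_COMPRESSED,
               HDR_1B, HDR_TOAST } HdrKind;

typedef struct
{
    HdrKind  kind;
    size_t   hdr_size;  /* bytes used by the length word */
    uint32_t length;    /* value carried in the length word */
} HdrInfo;

static uint32_t
read_30_bits(const uint8_t *p)
{
    return ((uint32_t) (p[0] & 0x3F) << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* First table: three cases. */
HdrInfo
decode_choice1(const uint8_t *p)
{
    HdrInfo h;
    uint8_t b = p[0];

    if ((b & 0x80) == 0)
    {                           /* 00xxxxxx or 01xxxxxx */
        h.kind = (b & 0x40) ? HDR_4B_COMPRESSED : HDR_4B;
        h.hdr_size = 4;
        h.length = read_30_bits(p);
    }
    else if ((b & 0xE0) == 0xA0)
    {                           /* 1010xxxx or 1011xxxx */
        h.kind = (b & 0x10) ? HDR_2B_COMPRESSED : HDR_2B;
        h.hdr_size = 2;
        h.length = ((uint32_t) (b & 0x0F) << 8) | p[1];
    }
    else
    {                           /* 100xxxxx or 11xxxxxx: same length rule */
        h.kind = (b & 0x40) ? HDR_1B : HDR_TOAST;
        h.hdr_size = 1;
        h.length = b & 0x3F;    /* low 6 bits of byte */
    }
    return h;
}

/* Second table: four cases, toast pointer size hardwired. */
HdrInfo
decode_choice2(const uint8_t *p)
{
    HdrInfo h;
    uint8_t b = p[0];

    if (b == 0x80)
    {                           /* 10000000 */
        h.kind = HDR_TOAST;
        h.hdr_size = 1;
        h.length = HYP_TOAST_POINTER_SIZE;
    }
    else if (b & 0x80)
    {                           /* 1xxxxxxx, not all zero */
        h.kind = HDR_1B;
        h.hdr_size = 1;
        h.length = b & 0x7F;
    }
    else if (b & 0x40)
    {                           /* 010xxxxx or 011xxxxx */
        h.kind = (b & 0x20) ? HDR_2B_COMPRESSED : HDR_2B;
        h.hdr_size = 2;
        h.length = ((uint32_t) (b & 0x1F) << 8) | p[1];
    }
    else
    {                           /* 00xxxxxx */
        h.kind = HDR_4B;
        h.hdr_size = 4;
        h.length = read_30_bits(p);
    }
    return h;
}
```

In the first decoder the kind assignment in the last branch is only
there for reporting; the length itself comes out of a single mask for
both the TOAST-pointer and short-datum patterns, which is why that
layout needs one fewer test.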

There's more than one way to do it ...

			regards, tom lane


