X-From-Line: mailagent Tue Nov 16 16:52:32 EST 2004
Return-Path: <gsstark@MIT.EDU>
Received: from po11.mit.edu [18.7.21.73]
	by stark.xeocode.com with POP3 (fetchmail-5.9.7)
	for stark@localhost (single-drop); Tue, 16 Nov 2004 16:52:34 -0500 (EST)
Received: from po11.mit.edu (po11.mit.edu [18.7.21.73])
	by po11.mit.edu (Cyrus v2.1.5) with LMTP;
	Tue, 16 Nov 2004 16:13:06 -0500
X-Sieve: CMU Sieve 2.2
Received: from fort-point-station.mit.edu by po11.mit.edu (8.12.4/4.7) id
	iAGLCxWp023747; Tue, 16 Nov 2004 16:13:00 -0500 (EST)
Received: from stark.xeocode.com (gsstark.mtl.istop.com [66.11.160.162])
	by fort-point-station.mit.edu (8.12.4/8.9.2) with ESMTP id
	iAGLCww8019961
	for <gsstark@mit.edu>; Tue, 16 Nov 2004 16:12:58 -0500 (EST)
Received: from localhost ([127.0.0.1] helo=stark.xeocode.com)
	by stark.xeocode.com with smtp (Exim 3.36 #1 (Debian))
	id 1CUAdF-00076T-00; Tue, 16 Nov 2004 16:12:57 -0500
Sender: gsstark@MIT.EDU
To: Greg Stark <gsstark@mit.edu>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] GiST: PickSplit and multi-attr indexes
References: <419723A7.5040505@samurai.com> <27866.1100476459@sss.pgh.pa.us>
	<1100492090.23420.28.camel@localhost.localdomain>
	<7879.1100531985@sss.pgh.pa.us>
	<1100578365.23420.71.camel@localhost.localdomain>
	<17937.1100616003@sss.pgh.pa.us> <87wtwl7bqe.fsf@stark.xeocode.com>
In-Reply-To: <87wtwl7bqe.fsf@stark.xeocode.com>
From: Greg Stark <gsstark@MIT.EDU>
Organization: The Emacs Conspiracy; member since 1992
Date: 16 Nov 2004 16:12:57 -0500
X-Gnus-Mail-Source: directory:~/incoming
Message-ID: <87r7mt7b1i.fsf@stark.xeocode.com>
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Scanned-By: MIMEDefang 2.42
X-Spam-Score: -4.9
X-Mailagent-Processed: Tue, 16 Nov 2004 16:53:11 -0500
X-Mailagent-Processed: Job:158263 File:fm28493
X-Bogosity: No, tests=bogofilter, spamicity=0.379460, version=0.17.5
X-Spam-DCC: : 
X-Spam-Checker-Version: SpamAssassin 2.64 (2004-01-11) on stark.xeocode.com
X-Spam-Level: 
X-Spam-Status: No, hits=0.0 required=4.0 tests=none autolearn=ham version=2.64
X-Filter: mailagent [version 3.0 PL73] for gsstark@mit.edu
Lines: 30
Xref: stark.xeocode.com misc:37449

Greg Stark <gsstark@MIT.EDU> writes:

> I'm not sure that GiST indexes behave the same way as btree indexes for the
> multi-column case. 
> 
> In a btree index the second column is entirely subordinate to the first
> column. In a GiST index the data is multi-dimensional, and all dimensions are
> equally important.

In fact on further consideration I do have a proposal.

If you look at the GiST implementations for various index types you'll see
that many (all?) take the same approach for PickSplit. In fact they pretty
much all have the same code copy/pasted to handle it.

The approach they take is to have a function which calculates an abstract
"distance" between any two entries. There's an algorithm that they use to pick
the split based on this distance function.

If you abandoned "PickSplit" and instead exposed this distance function as the
external API then the behaviour for multi-column indexes is clear. You
calculate the distance along all the axes and calculate the diagonal distance.

I think abandoning PickSplit for the distance function might also mean you
don't need a separate function for Penalty either, but I'm not sure on that.

-- 
greg



