[PATCH] binary heap implementation
Hi.
There are two or three places in the Postgres source that implement heap
sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the
BDR code. It seemed reasonable to factor out the functionality.
I've attached a patch (binaryheap.diff) that contains a straightforward
implementation of a binary heap (originally written by Andres, with a
few bugfixes and cleanups by me).
There doesn't seem to be any place to put unit tests for code like this,
so at Alvaro's suggestion, I have attached a small standalone program I
used for testing (testbinheap.c) and a Makefile. If you copy them both
to src/backend/lib and run "make -f Makefile.binheap", it'll build the
test program. It's nothing exciting, just exercises the functions in
various ways.
I've also attached a patch (nodeMergeAppend.diff) that converts the code
in nodeMergeAppend.c to use binaryheap. It passes "make check", and also
the following test (which is planned as a merge append):
CREATE OR REPLACE VIEW gen AS SELECT * FROM
generate_series(1, 100000) g(i) ORDER BY g.i OFFSET 0;
SELECT * FROM (
SELECT * FROM gen UNION ALL SELECT * FROM gen UNION ALL
SELECT * FROM gen UNION ALL SELECT * FROM gen UNION ALL
SELECT * FROM gen UNION ALL SELECT * FROM gen) s
ORDER BY i OFFSET 1000000;
Initially there was a slight performance degradation with my patch, but
I speeded things up and now my code is at least at par with, and maybe
slightly faster than, the old code. On my laptop, both consistently
take ~2.4s on average to execute the above SELECT.
Comments? Suggestions?
-- Abhijit
Attachments:
binaryheap.difftext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 98ce3d7..327a1bc 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/lib
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-OBJS = ilist.o stringinfo.o
+OBJS = ilist.o binaryheap.o stringinfo.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/include/lib/binaryheap.h b/src/include/lib/binaryheap.h
new file mode 100644
index 0000000..aebf08f
--- /dev/null
+++ b/src/include/lib/binaryheap.h
@@ -0,0 +1,138 @@
+/*
+ * binaryheap.h
+ *
+ * A simple binary heap implementation
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * src/include/lib/binaryheap.h
+ */
+
+#ifndef BINARYHEAP_H
+#define BINARYHEAP_H
+
+/*
+ * This structure represents a single node in a binaryheap, and just
+ * holds two pointers. The heap management code doesn't care what is
+ * stored in a node (in particular, the key or value may be NULL),
+ * only that the comparator function can compare any two nodes.
+ */
+
+typedef struct binaryheap_node
+{
+ void *key;
+ void *value;
+} binaryheap_node;
+
+/*
+ * For a max-heap, the comparator must return:
+ * -1 iff a < b
+ * 0 iff a == b
+ * +1 iff a > b
+ * For a min-heap, the conditions are reversed.
+ */
+typedef int (*binaryheap_comparator)(binaryheap_node *a, binaryheap_node *b);
+
+/*
+ * binaryheap
+ *
+ * size how many nodes are currently in "nodes"
+ * space how many nodes can be stored in "nodes"
+ * comparator comparator to define the heap property
+ * nodes the first of a list of "space" nodes
+ */
+
+typedef struct binaryheap
+{
+ size_t size;
+ size_t space;
+ binaryheap_comparator compare;
+ binaryheap_node nodes[1];
+} binaryheap;
+
+/*
+ * binaryheap_allocate
+ *
+ * Returns a pointer to a newly-allocated heap that has the capacity to
+ * store the given number of nodes, with the heap property defined by
+ * the given comparator function.
+ */
+
+binaryheap *
+binaryheap_allocate(size_t capacity, binaryheap_comparator compare);
+
+/*
+ * binaryheap_free
+ *
+ * Releases memory used by the given binaryheap.
+ */
+
+void
+binaryheap_free(binaryheap *heap);
+
+/*
+ * binaryheap_add_unordered
+ *
+ * Adds the given key and value to the end of the heap's list of nodes
+ * in O(1) without preserving the heap property. This is a convenience
+ * to add elements quickly to a new heap. To obtain a valid heap, one
+ * must call binaryheap_build() afterwards.
+ */
+
+void
+binaryheap_add_unordered(binaryheap *heap, void *key, void *value);
+
+/*
+ * binaryheap_build
+ *
+ * Assembles a valid heap in O(n) from the nodes added by
+ * binaryheap_add_unordered(). Not needed otherwise.
+ */
+
+void
+binaryheap_build(binaryheap *heap);
+
+/*
+ * binaryheap_add
+ *
+ * Adds the given key and value to the heap in O(log n), while
+ * preserving the heap property.
+ */
+
+void
+binaryheap_add(binaryheap *heap, void *key, void *value);
+
+/*
+ * binaryheap_first
+ *
+ * Returns a pointer to the first (root, topmost) node in the heap
+ * without modifying the heap. Returns NULL if the heap is empty.
+ * Always O(1).
+ */
+
+binaryheap_node *
+binaryheap_first(binaryheap *heap);
+
+/*
+ * binaryheap_remove_first
+ *
+ * Removes the first (root, topmost) node in the heap and returns a
+ * pointer to it after rebalancing the heap. Returns NULL if the heap
+ * is empty. O(log n) worst case.
+ */
+
+binaryheap_node *
+binaryheap_remove_first(binaryheap *heap);
+
+/*
+ * binaryheap_replace_first
+ *
+ * Change the key and/or value of the first (root, topmost) node and
+ * ensure that the heap property is preserved. O(1) in the best case,
+ * or O(log n) if it must fall back to sifting the new node down.
+ */
+
+void
+binaryheap_replace_first(binaryheap *heap, void *newkey, void *newval);
+
+#endif /* BINARYHEAP_H */
diff --git a/src/backend/lib/binaryheap.c b/src/backend/lib/binaryheap.c
new file mode 100644
index 0000000..0fa8525
--- /dev/null
+++ b/src/backend/lib/binaryheap.c
@@ -0,0 +1,342 @@
+/*-------------------------------------------------------------------------
+ *
+ * binaryheap.c
+ * A simple binary heap implementaion
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/lib/binaryheap.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "lib/binaryheap.h"
+
+/* XXX
+ * The following is only to allow testbinheap.c to use this code outside
+ * the backend; should be removed before committing. */
+#ifdef TESTBINHEAP
+#define palloc malloc
+#define pfree free
+#endif
+
+static void sift_down(binaryheap *heap, size_t node_off);
+static void sift_up(binaryheap *heap, size_t node_off);
+static inline void swap_nodes(binaryheap *heap, size_t a, size_t b);
+
+/*
+ * binaryheap_allocate
+ *
+ * Returns a pointer to a newly-allocated heap that has the capacity to
+ * store the given number of nodes, with the heap property defined by
+ * the given comparator function.
+ */
+
+binaryheap *
+binaryheap_allocate(size_t capacity, binaryheap_comparator compare)
+{
+ int sz = sizeof(binaryheap) + sizeof(binaryheap_node) * (capacity-1);
+
+ binaryheap *heap = palloc(sz);
+ heap->compare = compare;
+ heap->space = capacity;
+ heap->size = 0;
+ return heap;
+}
+
+/*
+ * binaryheap_free
+ *
+ * Releases memory used by the given binaryheap.
+ */
+
+void
+binaryheap_free(binaryheap *heap)
+{
+ pfree(heap);
+}
+
+/*
+ * These utility functions return the offset of the left child, right
+ * child, and parent of the node at the given index, respectively.
+ *
+ * The heap is represented as an array of nodes, with the root node
+ * stored at index 0. The left child of node i is at index 2*i+1, and
+ * the right child at 2*i+2. The parent of node i is at index (i-1)/2.
+ */
+
+static inline int
+left_offset(size_t i)
+{
+ return 2 * i + 1;
+}
+
+static inline int
+right_offset(size_t i)
+{
+ return 2 * i + 2;
+}
+
+static inline int
+parent_offset(size_t i)
+{
+ return floor((i - 1) / 2);
+}
+
+/*
+ * binaryheap_add_unordered
+ *
+ * Adds the given key and value to the end of the heap's list of nodes
+ * in O(1) without preserving the heap property. This is a convenience
+ * to add elements quickly to a new heap. To obtain a valid heap, one
+ * must call binaryheap_build() afterwards.
+ */
+
+void
+binaryheap_add_unordered(binaryheap *heap, void *key, void *value)
+{
+ if (heap->size + 1 == heap->space)
+ Assert("binary heap is full");
+ heap->nodes[heap->size].key = key;
+ heap->nodes[heap->size++].value = value;
+}
+
+/*
+ * binaryheap_build
+ *
+ * Assembles a valid heap in O(n) from the nodes added by
+ * binaryheap_add_unordered(). Not needed otherwise.
+ */
+
+void
+binaryheap_build(binaryheap *heap)
+{
+ int i;
+
+ for (i = parent_offset(heap->size - 1); i >= 0; i--)
+ {
+ sift_down(heap, i);
+ }
+}
+
+/*
+ * binaryheap_add
+ *
+ * Adds the given key and value to the heap in O(log n), while
+ * preserving the heap property.
+ */
+
+void
+binaryheap_add(binaryheap *heap, void *key, void *value)
+{
+ binaryheap_add_unordered(heap, key, value);
+ sift_up(heap, heap->size - 1);
+}
+
+/*
+ * binaryheap_first
+ *
+ * Returns a pointer to the first (root, topmost) node in the heap
+ * without modifying the heap. Returns NULL if the heap is empty.
+ * Always O(1).
+ */
+
+binaryheap_node *
+binaryheap_first(binaryheap *heap)
+{
+ if (!heap->size)
+ return NULL;
+ return &heap->nodes[0];
+}
+
+/*
+ * binaryheap_remove_first
+ *
+ * Removes the first (root, topmost) node in the heap and returns a
+ * pointer to it after rebalancing the heap. Returns NULL if the heap
+ * is empty. O(log n) worst case.
+ */
+
+binaryheap_node *
+binaryheap_remove_first(binaryheap *heap)
+{
+ if (heap->size == 0)
+ return NULL;
+
+ if (heap->size == 1)
+ {
+ heap->size--;
+ return &heap->nodes[0];
+ }
+
+ /*
+ * Swap the root and last nodes, decrease the size of the heap (i.e.
+ * remove the former root node) and sift the new root node down to
+ * its correct position.
+ */
+
+ swap_nodes(heap, 0, heap->size - 1);
+
+ heap->size--;
+ sift_down(heap, 0);
+ return &heap->nodes[heap->size];
+}
+
+/*
+ * binaryheap_replace_first
+ *
+ * Change the key and/or value of the first (root, topmost) node and
+ * ensure that the heap property is preserved. O(1) in the best case,
+ * or O(log n) if it must fall back to sifting the new node down.
+ */
+
+void
+binaryheap_replace_first(binaryheap *heap, void *key, void *val)
+{
+ int ret;
+ size_t next_off = 0;
+
+ if (key)
+ heap->nodes[0].key = key;
+ if (val)
+ heap->nodes[0].value = val;
+
+ /*
+ * If the new root node is still larger than the largest of its
+ * children (of which there may be 0, 1, or 2), then the heap is
+ * still valid.
+ */
+
+ if (heap->size == 1) {
+ return;
+ }
+ if (heap->size == 2) {
+ next_off = 1;
+ }
+ else {
+ /* Two children: pick the larger one */
+ ret = heap->compare(&heap->nodes[2], &heap->nodes[1]);
+ if (ret == -1)
+ next_off = 1;
+ else
+ next_off = 2;
+ }
+
+ ret = heap->compare(&heap->nodes[next_off], &heap->nodes[0]);
+ if (ret == -1)
+ return;
+
+ /* The child is larger. Swap and sift the new node down. */
+
+ swap_nodes(heap, 0, next_off);
+ sift_down(heap, next_off);
+}
+
+/*
+ * Swap the contents of two nodes.
+ */
+
+static inline void
+swap_nodes(binaryheap *heap, size_t a, size_t b)
+{
+ binaryheap_node swap;
+ swap.value = heap->nodes[a].value;
+ swap.key = heap->nodes[a].key;
+
+ heap->nodes[a].value = heap->nodes[b].value;
+ heap->nodes[a].key = heap->nodes[b].key;
+
+ heap->nodes[b].key = swap.key;
+ heap->nodes[b].value = swap.value;
+}
+
+/*
+ * Sift a node up to the highest position it can hold according to the
+ * comparator.
+ */
+
+static void
+sift_up(binaryheap *heap, size_t node_off)
+{
+ /* manually unrolled tail recursion */
+ while (true)
+ {
+ size_t parent_off = parent_offset(node_off);
+
+ if (node_off == 0)
+ break;
+
+ if (heap->compare(&heap->nodes[parent_off],
+ &heap->nodes[node_off]) < 0)
+ {
+ /* heap property violated */
+ swap_nodes(heap, node_off, parent_off);
+
+ /* recurse */
+ node_off = parent_off;
+ }
+ else
+ break;
+ }
+}
+
+/*
+ * Sift a node down from its current position to satisfy the heap
+ * property.
+ */
+
+static void
+sift_down(binaryheap *heap, size_t node_off)
+{
+ /* manually unrolled tail recursion */
+ while (true)
+ {
+ size_t left_off = left_offset(node_off);
+ size_t right_off = right_offset(node_off);
+ size_t swap_off = 0;
+
+ /* Is the left child larger than the parent? */
+
+ if (left_off < heap->size &&
+ heap->compare(&heap->nodes[node_off],
+ &heap->nodes[left_off]) < 0)
+ {
+ swap_off = left_off;
+ }
+
+ /*
+ * If not (note: only one child can violate the heap property
+ * after a change), is the right child larger?
+ */
+
+ if (right_off < heap->size &&
+ heap->compare(&heap->nodes[node_off],
+ &heap->nodes[right_off]) < 0)
+ {
+ /* swap with the larger child */
+ if (!swap_off ||
+ heap->compare(&heap->nodes[left_off],
+ &heap->nodes[right_off]) < 0)
+ {
+ swap_off = right_off;
+ }
+ }
+
+ if (!swap_off)
+ {
+ /* heap condition fullfilled, abort */
+ break;
+ }
+
+ /* swap node with the child violating the property */
+ swap_nodes(heap, swap_off, node_off);
+
+ /* recurse, check child subtree */
+ node_off = swap_off;
+ }
+}
nodeMergeAppend.difftext/x-diff; charset=us-asciiDownload
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index fec07b8..d4911bd 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1100,10 +1100,8 @@ typedef struct AppendState
* nkeys number of sort key columns
* sortkeys sort keys in SortSupport representation
* slots current output tuple of each subplan
- * heap heap of active tuples (represented as array indexes)
- * heap_size number of active heap entries
+ * heap heap of active tuples
* initialized true if we have fetched first tuple from each subplan
- * last_slot last subplan fetched from (which must be re-called)
* ----------------
*/
typedef struct MergeAppendState
@@ -1114,10 +1112,8 @@ typedef struct MergeAppendState
int ms_nkeys;
SortSupport ms_sortkeys; /* array of length ms_nkeys */
TupleTableSlot **ms_slots; /* array of length ms_nplans */
- int *ms_heap; /* array of length ms_nplans */
- int ms_heap_size; /* current active length of ms_heap[] */
+ struct binaryheap *ms_heap; /* binary heap of slot indices */
bool ms_initialized; /* are subplans started? */
- int ms_last_slot; /* last subplan slot we returned from */
} MergeAppendState;
/* ----------------
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index d5141ba..6b2fef2 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -41,6 +41,8 @@
#include "executor/execdebug.h"
#include "executor/nodeMergeAppend.h"
+#include "lib/binaryheap.h"
+
/*
* It gets quite confusing having a heap array (indexed by integers) which
* contains integers which index into the slots array. These typedefs try to
@@ -49,9 +51,7 @@
typedef int SlotNumber;
typedef int HeapPosition;
-static void heap_insert_slot(MergeAppendState *node, SlotNumber new_slot);
-static void heap_siftup_slot(MergeAppendState *node);
-static int32 heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2);
+static int heap_compare_slots(binaryheap_node *a, binaryheap_node *b);
/* ----------------------------------------------------------------
@@ -88,7 +88,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_nplans = nplans;
mergestate->ms_slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
- mergestate->ms_heap = (int *) palloc0(sizeof(int) * nplans);
+ mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots);
/*
* Miscellaneous initialization
@@ -143,9 +143,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/*
* initialize to show we have not run the subplans yet
*/
- mergestate->ms_heap_size = 0;
mergestate->ms_initialized = false;
- mergestate->ms_last_slot = -1;
return mergestate;
}
@@ -160,6 +158,7 @@ TupleTableSlot *
ExecMergeAppend(MergeAppendState *node)
{
TupleTableSlot *result;
+ binaryheap_node *hn;
SlotNumber i;
if (!node->ms_initialized)
@@ -172,7 +171,9 @@ ExecMergeAppend(MergeAppendState *node)
{
node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
if (!TupIsNull(node->ms_slots[i]))
- heap_insert_slot(node, i);
+ {
+ binaryheap_add(node->ms_heap, node, (void *)i);
+ }
}
node->ms_initialized = true;
}
@@ -184,19 +185,22 @@ ExecMergeAppend(MergeAppendState *node)
* the logic a bit by doing this before returning from the prior call,
* but it's better to not pull tuples until necessary.)
*/
- i = node->ms_last_slot;
+ hn = binaryheap_first(node->ms_heap);
+ i = (int)hn->value;
node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
- if (!TupIsNull(node->ms_slots[i]))
- heap_insert_slot(node, i);
+ if (!TupIsNull(node->ms_slots[i])) {
+ binaryheap_replace_first(node->ms_heap, node, (void *)i);
+ }
+ else {
+ (void)binaryheap_remove_first(node->ms_heap);
+ }
}
- if (node->ms_heap_size > 0)
+ hn = binaryheap_first(node->ms_heap);
+ if (hn)
{
- /* Return the topmost heap node, and sift up the remaining nodes */
- i = node->ms_heap[0];
+ i = (int)hn->value;
result = node->ms_slots[i];
- node->ms_last_slot = i;
- heap_siftup_slot(node);
}
else
{
@@ -208,65 +212,15 @@ ExecMergeAppend(MergeAppendState *node)
}
/*
- * Insert a new slot into the heap. The slot must contain a valid tuple.
- */
-static void
-heap_insert_slot(MergeAppendState *node, SlotNumber new_slot)
-{
- SlotNumber *heap = node->ms_heap;
- HeapPosition j;
-
- Assert(!TupIsNull(node->ms_slots[new_slot]));
-
- j = node->ms_heap_size++; /* j is where the "hole" is */
- while (j > 0)
- {
- int i = (j - 1) / 2;
-
- if (heap_compare_slots(node, new_slot, node->ms_heap[i]) >= 0)
- break;
- heap[j] = heap[i];
- j = i;
- }
- heap[j] = new_slot;
-}
-
-/*
- * Delete the heap top (the slot in heap[0]), and sift up.
- */
-static void
-heap_siftup_slot(MergeAppendState *node)
-{
- SlotNumber *heap = node->ms_heap;
- HeapPosition i,
- n;
-
- if (--node->ms_heap_size <= 0)
- return;
- n = node->ms_heap_size; /* heap[n] needs to be reinserted */
- i = 0; /* i is where the "hole" is */
- for (;;)
- {
- int j = 2 * i + 1;
-
- if (j >= n)
- break;
- if (j + 1 < n && heap_compare_slots(node, heap[j], heap[j + 1]) > 0)
- j++;
- if (heap_compare_slots(node, heap[n], heap[j]) <= 0)
- break;
- heap[i] = heap[j];
- i = j;
- }
- heap[i] = heap[n];
-}
-
-/*
* Compare the tuples in the two given slots.
*/
static int32
-heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2)
+heap_compare_slots(binaryheap_node *a, binaryheap_node *b)
{
+ MergeAppendState *node = (MergeAppendState *)a->key;
+ SlotNumber slot1 = (int)a->value;
+ SlotNumber slot2 = (int)b->value;
+
TupleTableSlot *s1 = node->ms_slots[slot1];
TupleTableSlot *s2 = node->ms_slots[slot2];
int nkey;
@@ -291,7 +245,7 @@ heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2)
datum2, isNull2,
sortKey);
if (compare != 0)
- return compare;
+ return -compare;
}
return 0;
}
@@ -347,7 +301,5 @@ ExecReScanMergeAppend(MergeAppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
- node->ms_heap_size = 0;
node->ms_initialized = false;
- node->ms_last_slot = -1;
}
On 2012-11-14 18:41:12 +0530, Abhijit Menon-Sen wrote:
There are two or three places in the Postgres source that implement heap
sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the
BDR code. It seemed reasonable to factor out the functionality.
This replaces the "simpleheap.[ch]" I had in the last submitted series.
I've attached a patch (binaryheap.diff) that contains a straightforward
implementation of a binary heap (originally written by Andres, with a
few bugfixes and cleanups by me).
I want to emphasise that the only good thing about my submitted version
was that it actually compiled. So quite a bit of the work here is from
Abhijit...
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 14, 2012 at 8:11 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
Comments? Suggestions?
It could use a run through pgindent. And the header comments are
separated by a blank line from the functions to which they are
attached, which is not project style.
+ if (heap->size + 1 == heap->space)
+ Assert("binary heap is full");
This doesn't really make any sense. Assert("binary heap is full")
will never fail, because that expression is a character string which
is, therefore, not NULL. I think you want Assert(heap->size <
heap->space). Or you could use (heap->size + 1 >= heap->space)
elog(LEVEL, message) if there's some reason this needs to be checked
in non-debug builds.
+ {
+ sift_down(heap, i);
+ }
Project style is to omit braces for a single-line body. This comes up
a few other places as well.
Other than the coding style issues, I think this looks fine. If you
can fix that up, I believe I could go ahead and commit this, unless
anyone else objects.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas escribió:
On Wed, Nov 14, 2012 at 8:11 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
Comments? Suggestions?
It could use a run through pgindent. And the header comments are
separated by a blank line from the functions to which they are
attached, which is not project style.
Also there are some comments in the .h file which we typically don't
carry.
Other than the coding style issues, I think this looks fine. If you
can fix that up, I believe I could go ahead and commit this, unless
anyone else objects.
Does this include the changes in nodeMergeappend.c?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Assert(!"description of error") is an idiom I've seen more than once,
although I'm not sure I understand why it's not written as Robert says with
the condition in the brackets (or as a print to STDERR followed by abort()
instead).
On 15 November 2012 15:11, Robert Haas <robertmhaas@gmail.com> wrote:
Show quoted text
On Wed, Nov 14, 2012 at 8:11 AM, Abhijit Menon-Sen <ams@2ndquadrant.com>
wrote:Comments? Suggestions?
It could use a run through pgindent. And the header comments are
separated by a blank line from the functions to which they are
attached, which is not project style.+ if (heap->size + 1 == heap->space) + Assert("binary heap is full");This doesn't really make any sense. Assert("binary heap is full")
will never fail, because that expression is a character string which
is, therefore, not NULL. I think you want Assert(heap->size <
heap->space). Or you could use (heap->size + 1 >= heap->space)
elog(LEVEL, message) if there's some reason this needs to be checked
in non-debug builds.+ { + sift_down(heap, i); + }Project style is to omit braces for a single-line body. This comes up
a few other places as well.Other than the coding style issues, I think this looks fine. If you
can fix that up, I believe I could go ahead and commit this, unless
anyone else objects.--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 11/15/2012 10:11 AM, Robert Haas wrote:
+ { + sift_down(heap, i); + }Project style is to omit braces for a single-line body. This comes up
a few other places as well.
I thought we modified that some years ago, although my memory of it is a
bit hazy. Personally I think it's a bad rule, at least as stated, in the
case where there's an else clause with a compound statement. I'd prefer
to see a rule that says that either both branches of an if/else should
be compound statements or neither should be. I think the cases where
only one branch is a compound statement are rather ugly.
cheers
andrew
On Thu, Nov 15, 2012 at 10:21 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Other than the coding style issues, I think this looks fine. If you
can fix that up, I believe I could go ahead and commit this, unless
anyone else objects.Does this include the changes in nodeMergeappend.c?
I think so. I was going to double-check the performance before
committing, but it doesn't look like there should be a problem.
Coding style changes appear needed in that file as well, however.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2012-11-14 18:41:12 +0530, Abhijit Menon-Sen wrote:
There are two or three places in the Postgres source that implement heap
sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the
BDR code. It seemed reasonable to factor out the functionality.
pg_dump also contains a binary heap implementation if memory serves
right which makes me wonder whether we should try binaryheap.[ch]
backend clean... It currently uses palloc/pfree.
Too bad we don't have a memory allocation routine which easily works in
the backend and frontend... palloc references MemoryContextAlloc and
CurrentMemoryContext so thats not that easy to emulate.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andrew Dunstan escribió:
On 11/15/2012 10:11 AM, Robert Haas wrote:
+ { + sift_down(heap, i); + }Project style is to omit braces for a single-line body. This comes up
a few other places as well.I thought we modified that some years ago, although my memory of it
is a bit hazy.
No, we only modified pg_indent to not take the braces off, because
of PG_TRY blocks. But we keep using single statements instead of
compound in many places; but there is no hard rule about it. To me,
using braces were they are not needed is pointless and ugly. We're very
careful about ensuring that our macro definitions work nicely with
single-statement if/else, for example.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 15, 2012 at 11:22 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2012-11-14 18:41:12 +0530, Abhijit Menon-Sen wrote:
There are two or three places in the Postgres source that implement heap
sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the
BDR code. It seemed reasonable to factor out the functionality.pg_dump also contains a binary heap implementation if memory serves
right which makes me wonder whether we should try binaryheap.[ch]
backend clean... It currently uses palloc/pfree.Too bad we don't have a memory allocation routine which easily works in
the backend and frontend... palloc references MemoryContextAlloc and
CurrentMemoryContext so thats not that easy to emulate.
I think there's a clear need for a library of code that can be shared
by frontend and backend code. I don't necessarily want to punt
everything into libpq, though. What I wonder if we might do is create
something like src/lib/pgguts that contains routines for memory
allocation, error handling, stringinfos (so that you don't have to use
PQexpBuffer in the frontend and StringInfo in the backend), etc. and
make that usable by both front-end and back-end code. I think that
would be a tremendously nice thing to have.
I don't think we should make it this patch's problem to do that, but
I'd be strongly in favor of such a thing if someone wants to put it
together. It's been a huge nuisance for me in the past and all
indicates are that the work you're doing on LR is just going to
exacerbate that problem.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2012-11-15 11:37:16 -0500, Robert Haas wrote:
On Thu, Nov 15, 2012 at 11:22 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2012-11-14 18:41:12 +0530, Abhijit Menon-Sen wrote:
There are two or three places in the Postgres source that implement heap
sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the
BDR code. It seemed reasonable to factor out the functionality.pg_dump also contains a binary heap implementation if memory serves
right which makes me wonder whether we should try binaryheap.[ch]
backend clean... It currently uses palloc/pfree.Too bad we don't have a memory allocation routine which easily works in
the backend and frontend... palloc references MemoryContextAlloc and
CurrentMemoryContext so thats not that easy to emulate.
I think there's a clear need for a library of code that can be shared
by frontend and backend code. I don't necessarily want to punt
everything into libpq, though.
Aggreed.
What I wonder if we might do is create
something like src/lib/pgguts that contains routines for memory
allocation, error handling, stringinfos (so that you don't have to use
PQexpBuffer in the frontend and StringInfo in the backend), etc. and
make that usable by both front-end and back-end code. I think that
would be a tremendously nice thing to have.
Yes. But it also sounds like quite a bit of work :(
I don't think we should make it this patch's problem to do that
Me neither. I was thinking about letting the "user" allocate enough
memory like:
binaryheap *heap = palloc(binaryheap_size(/*capacity*/ 10));
binaryheap_init(heap, 10, comparator);
But thats pretty ugly.
but
I'd be strongly in favor of such a thing if someone wants to put it
together. It's been a huge nuisance for me in the past and all
indicates are that the work you're doing on LR is just going to
exacerbate that problem.
I actually hope I won't have to fight with that all that much, but we
will see :/
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 15, 2012 at 11:50 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Me neither. I was thinking about letting the "user" allocate enough
memory like:binaryheap *heap = palloc(binaryheap_size(/*capacity*/ 10));
binaryheap_init(heap, 10, comparator);But thats pretty ugly.
Yeah. I would vote against doing that for now. We can always uglify
the code later if we decide it's absolutely necessary.
One trick that we could potentially use to make code run in the
frontend and backend is to put it in src/port. That code gets
compiled twice, once within -DFRONTEND and once without. So you can
use #ifdef to handle things that need to work different ways. The
only real problem with that approach is that you end up putting the
code in libpgport where it seems rather out of place. Still, maybe we
could have a src/framework directory that uses the same trick to
produce a libpgframework that frontend code can use, and a lib
pgframework_srv that can be linked into the backend. That's might not
actually be that much work; we'd just need to clone all the existing
src/port logic.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
At 2012-11-15 10:11:58 -0500, robertmhaas@gmail.com wrote:
It could use a run through pgindent.
Done.
I think you want Assert(heap->size < heap->space).
Of course. Thanks for catching that.
Project style is to omit braces for a single-line body.
I've removed most of them. In the few cases where one branch of an
if/else would have a one-line body and the other would not, I chose
to rewrite the code to avoid the problem (because, like Andrew, I
hate having to do that).
I wasn't sure how to present the following:
if (right_off < heap->size &&
heap->compare(&heap->nodes[node_off],
&heap->nodes[right_off]) < 0)
{
/* swap with the larger child */
if (!swap_off ||
heap->compare(&heap->nodes[left_off],
&heap->nodes[right_off]) < 0)
{
swap_off = right_off;
}
}
…so I left it alone. If you want me to remove the inner braces and
resubmit, despite the multi-line condition, let me know.
I didn't know which header comments Álvaro meant I should remove,
so I haven't done that either. I did remove the TESTBINHEAP stuff,
though.
I have attached the updated patch here. Thanks for your review.
-- Abhijit
Attachments:
binaryheap.difftext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 98ce3d7..327a1bc 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/lib
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-OBJS = ilist.o stringinfo.o
+OBJS = ilist.o binaryheap.o stringinfo.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/include/lib/binaryheap.h b/src/include/lib/binaryheap.h
new file mode 100644
index 0000000..54d20b2
--- /dev/null
+++ b/src/include/lib/binaryheap.h
@@ -0,0 +1,121 @@
+/*
+ * binaryheap.h
+ *
+ * A simple binary heap implementation
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * src/include/lib/binaryheap.h
+ */
+
+#ifndef BINARYHEAP_H
+#define BINARYHEAP_H
+
+/*
+ * This structure represents a single node in a binaryheap, and just
+ * holds two pointers. The heap management code doesn't care what is
+ * stored in a node (in particular, the key or value may be NULL),
+ * only that the comparator function can compare any two nodes.
+ */
+
+typedef struct binaryheap_node
+{
+ void *key;
+ void *value;
+} binaryheap_node;
+
+/*
+ * For a max-heap, the comparator must return:
+ * -1 iff a < b
+ * 0 iff a == b
+ * +1 iff a > b
+ * For a min-heap, the conditions are reversed.
+ */
+typedef int (*binaryheap_comparator) (binaryheap_node * a, binaryheap_node * b);
+
+/*
+ * binaryheap
+ *
+ * size how many nodes are currently in "nodes"
+ * space how many nodes can be stored in "nodes"
+ * comparator comparator to define the heap property
+ * nodes the first of a list of "space" nodes
+ */
+typedef struct binaryheap
+{
+ size_t size;
+ size_t space;
+ binaryheap_comparator compare;
+ binaryheap_node nodes[1];
+} binaryheap;
+
+/*
+ * binaryheap_allocate
+ *
+ * Returns a pointer to a newly-allocated heap that has the capacity to
+ * store the given number of nodes, with the heap property defined by
+ * the given comparator function.
+ */
+binaryheap * binaryheap_allocate(size_t capacity, binaryheap_comparator compare);
+
+/*
+ * binaryheap_free
+ *
+ * Releases memory used by the given binaryheap.
+ */
+void binaryheap_free(binaryheap * heap);
+
+/*
+ * binaryheap_add_unordered
+ *
+ * Adds the given key and value to the end of the heap's list of nodes
+ * in O(1) without preserving the heap property. This is a convenience
+ * to add elements quickly to a new heap. To obtain a valid heap, one
+ * must call binaryheap_build() afterwards.
+ */
+void binaryheap_add_unordered(binaryheap * heap, void *key, void *value);
+
+/*
+ * binaryheap_build
+ *
+ * Assembles a valid heap in O(n) from the nodes added by
+ * binaryheap_add_unordered(). Not needed otherwise.
+ */
+void binaryheap_build(binaryheap * heap);
+
+/*
+ * binaryheap_add
+ *
+ * Adds the given key and value to the heap in O(log n), while
+ * preserving the heap property.
+ */
+void binaryheap_add(binaryheap * heap, void *key, void *value);
+
+/*
+ * binaryheap_first
+ *
+ * Returns a pointer to the first (root, topmost) node in the heap
+ * without modifying the heap. Returns NULL if the heap is empty.
+ * Always O(1).
+ */
+binaryheap_node * binaryheap_first(binaryheap * heap);
+
+/*
+ * binaryheap_remove_first
+ *
+ * Removes the first (root, topmost) node in the heap and returns a
+ * pointer to it after rebalancing the heap. Returns NULL if the heap
+ * is empty. O(log n) worst case.
+ */
+binaryheap_node * binaryheap_remove_first(binaryheap * heap);
+
+/*
+ * binaryheap_replace_first
+ *
+ * Change the key and/or value of the first (root, topmost) node and
+ * ensure that the heap property is preserved. O(1) in the best case,
+ * or O(log n) if it must fall back to sifting the new node down.
+ */
+void binaryheap_replace_first(binaryheap * heap, void *newkey, void *newval);
+
+#endif /* BINARYHEAP_H */
diff --git a/src/backend/lib/binaryheap.c b/src/backend/lib/binaryheap.c
new file mode 100644
index 0000000..c6d00d5
--- /dev/null
+++ b/src/backend/lib/binaryheap.c
@@ -0,0 +1,330 @@
+/*-------------------------------------------------------------------------
+ *
+ * binaryheap.c
+ * A simple binary heap implementaion
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/lib/binaryheap.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "lib/binaryheap.h"
+
+static void sift_down(binaryheap * heap, size_t node_off);
+static void sift_up(binaryheap * heap, size_t node_off);
+static inline void swap_nodes(binaryheap * heap, size_t a, size_t b);
+
+/*
+ * binaryheap_allocate
+ *
+ * Returns a pointer to a newly-allocated heap that has the capacity to
+ * store the given number of nodes, with the heap property defined by
+ * the given comparator function.
+ */
+binaryheap *
+binaryheap_allocate(size_t capacity, binaryheap_comparator compare)
+{
+ int sz = sizeof(binaryheap) + sizeof(binaryheap_node) * (capacity - 1);
+
+ binaryheap *heap = palloc(sz);
+
+ heap->compare = compare;
+ heap->space = capacity;
+ heap->size = 0;
+
+ return heap;
+}
+
+/*
+ * binaryheap_free
+ *
+ * Releases memory used by the given binaryheap.
+ */
+void
+binaryheap_free(binaryheap * heap)
+{
+ pfree(heap);
+}
+
+/*
+ * These utility functions return the offset of the left child, right
+ * child, and parent of the node at the given index, respectively.
+ *
+ * The heap is represented as an array of nodes, with the root node
+ * stored at index 0. The left child of node i is at index 2*i+1, and
+ * the right child at 2*i+2. The parent of node i is at index (i-1)/2.
+ */
+
+static inline int
+left_offset(size_t i)
+{
+ return 2 * i + 1;
+}
+
+static inline int
+right_offset(size_t i)
+{
+ return 2 * i + 2;
+}
+
+static inline int
+parent_offset(size_t i)
+{
+ return floor((i - 1) / 2);
+}
+
+/*
+ * binaryheap_add_unordered
+ *
+ * Adds the given key and value to the end of the heap's list of nodes
+ * in O(1) without preserving the heap property. This is a convenience
+ * to add elements quickly to a new heap. To obtain a valid heap, one
+ * must call binaryheap_build() afterwards.
+ */
+void
+binaryheap_add_unordered(binaryheap * heap, void *key, void *value)
+{
+ Assert(heap->size < heap->space);
+ heap->nodes[heap->size].key = key;
+ heap->nodes[heap->size].value = value;
+ heap->size++;
+}
+
+/*
+ * binaryheap_build
+ *
+ * Assembles a valid heap in O(n) from the nodes added by
+ * binaryheap_add_unordered(). Not needed otherwise.
+ */
+void
+binaryheap_build(binaryheap * heap)
+{
+ int i;
+
+ for (i = parent_offset(heap->size - 1); i >= 0; i--)
+ sift_down(heap, i);
+}
+
+/*
+ * binaryheap_add
+ *
+ * Adds the given key and value to the heap in O(log n), while
+ * preserving the heap property.
+ */
+void
+binaryheap_add(binaryheap * heap, void *key, void *value)
+{
+ binaryheap_add_unordered(heap, key, value);
+ sift_up(heap, heap->size - 1);
+}
+
+/*
+ * binaryheap_first
+ *
+ * Returns a pointer to the first (root, topmost) node in the heap
+ * without modifying the heap. Returns NULL if the heap is empty.
+ * Always O(1).
+ */
+binaryheap_node *
+binaryheap_first(binaryheap * heap)
+{
+ if (!heap->size)
+ return NULL;
+ return &heap->nodes[0];
+}
+
+/*
+ * binaryheap_remove_first
+ *
+ * Removes the first (root, topmost) node in the heap and returns a
+ * pointer to it after rebalancing the heap. Returns NULL if the heap
+ * is empty. O(log n) worst case.
+ */
+binaryheap_node *
+binaryheap_remove_first(binaryheap * heap)
+{
+ if (heap->size == 0)
+ return NULL;
+
+ if (heap->size == 1)
+ {
+ heap->size--;
+ return &heap->nodes[0];
+ }
+
+ /*
+ * Swap the root and last nodes, decrease the size of the heap (i.e.
+ * remove the former root node) and sift the new root node down to its
+ * correct position.
+ */
+
+ swap_nodes(heap, 0, heap->size - 1);
+
+ heap->size--;
+ sift_down(heap, 0);
+ return &heap->nodes[heap->size];
+}
+
+/*
+ * binaryheap_replace_first
+ *
+ * Change the key and/or value of the first (root, topmost) node and
+ * ensure that the heap property is preserved. O(1) in the best case,
+ * or O(log n) if it must fall back to sifting the new node down.
+ */
+void
+binaryheap_replace_first(binaryheap * heap, void *key, void *val)
+{
+ int cmp;
+ size_t next_off = 0;
+
+ if (key)
+ heap->nodes[0].key = key;
+ if (val)
+ heap->nodes[0].value = val;
+
+ /*
+ * If the new root node is still larger than the largest of its children
+ * (of which there may be 0, 1, or 2), then the heap is still valid.
+ */
+
+ if (heap->size == 1)
+ return;
+
+ next_off = 1;
+ if (heap->size > 2)
+ {
+ /* Two children: pick the larger one */
+ cmp = heap->compare(&heap->nodes[2], &heap->nodes[1]);
+ if (cmp > 0)
+ next_off = 2;
+ }
+
+ cmp = heap->compare(&heap->nodes[next_off], &heap->nodes[0]);
+ if (cmp < 0)
+ return;
+
+ /* The child is larger. Swap and sift the new node down. */
+
+ swap_nodes(heap, 0, next_off);
+ sift_down(heap, next_off);
+}
+
+/*
+ * Swap the contents of two nodes.
+ */
+static inline void
+swap_nodes(binaryheap * heap, size_t a, size_t b)
+{
+ binaryheap_node swap;
+
+ swap.value = heap->nodes[a].value;
+ swap.key = heap->nodes[a].key;
+
+ heap->nodes[a].value = heap->nodes[b].value;
+ heap->nodes[a].key = heap->nodes[b].key;
+
+ heap->nodes[b].key = swap.key;
+ heap->nodes[b].value = swap.value;
+}
+
+/*
+ * Sift a node up to the highest position it can hold according to the
+ * comparator.
+ */
+static void
+sift_up(binaryheap * heap, size_t node_off)
+{
+ while (true)
+ {
+ int cmp;
+ size_t parent_off;
+
+ if (node_off == 0)
+ break;
+
+ /*
+ * If this node is smaller than its parent, the heap condition is
+ * satisfied, and we're done.
+ */
+
+ parent_off = parent_offset(node_off);
+ cmp = heap->compare(&heap->nodes[node_off],
+ &heap->nodes[parent_off]);
+ if (cmp <= 0)
+ break;
+
+ /*
+ * Otherwise, swap the node and its parent and go on to check the
+ * node's new parent.
+ */
+
+ swap_nodes(heap, node_off, parent_off);
+ node_off = parent_off;
+ }
+}
+
+/*
+ * Sift a node down from its current position to satisfy the heap
+ * property.
+ */
+static void
+sift_down(binaryheap * heap, size_t node_off)
+{
+ while (true)
+ {
+ size_t left_off = left_offset(node_off);
+ size_t right_off = right_offset(node_off);
+ size_t swap_off = 0;
+
+ /* Is the left child larger than the parent? */
+
+ if (left_off < heap->size &&
+ heap->compare(&heap->nodes[node_off],
+ &heap->nodes[left_off]) < 0)
+ {
+ swap_off = left_off;
+ }
+
+ /*
+ * If not (note: only one child can violate the heap property after a
+ * change), is the right child larger?
+ */
+
+ if (right_off < heap->size &&
+ heap->compare(&heap->nodes[node_off],
+ &heap->nodes[right_off]) < 0)
+ {
+ /* swap with the larger child */
+ if (!swap_off ||
+ heap->compare(&heap->nodes[left_off],
+ &heap->nodes[right_off]) < 0)
+ {
+ swap_off = right_off;
+ }
+ }
+
+ /*
+ * If we didn't find anything to swap, the heap condition is
+ * satisfied, and we're done.
+ */
+
+ if (!swap_off)
+ break;
+
+ /*
+ * Otherwise, swap the node with the child that violates the heap
+ * property; then go on to check its children.
+ */
+
+ swap_nodes(heap, swap_off, node_off);
+ node_off = swap_off;
+ }
+}
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index fec07b8..d4911bd 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1100,10 +1100,8 @@ typedef struct AppendState
* nkeys number of sort key columns
* sortkeys sort keys in SortSupport representation
* slots current output tuple of each subplan
- * heap heap of active tuples (represented as array indexes)
- * heap_size number of active heap entries
+ * heap heap of active tuples
* initialized true if we have fetched first tuple from each subplan
- * last_slot last subplan fetched from (which must be re-called)
* ----------------
*/
typedef struct MergeAppendState
@@ -1114,10 +1112,8 @@ typedef struct MergeAppendState
int ms_nkeys;
SortSupport ms_sortkeys; /* array of length ms_nkeys */
TupleTableSlot **ms_slots; /* array of length ms_nplans */
- int *ms_heap; /* array of length ms_nplans */
- int ms_heap_size; /* current active length of ms_heap[] */
+ struct binaryheap *ms_heap; /* binary heap of slot indices */
bool ms_initialized; /* are subplans started? */
- int ms_last_slot; /* last subplan slot we returned from */
} MergeAppendState;
/* ----------------
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index d5141ba..fa05ddb 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -41,6 +41,8 @@
#include "executor/execdebug.h"
#include "executor/nodeMergeAppend.h"
+#include "lib/binaryheap.h"
+
/*
* It gets quite confusing having a heap array (indexed by integers) which
* contains integers which index into the slots array. These typedefs try to
@@ -49,9 +51,7 @@
typedef int SlotNumber;
typedef int HeapPosition;
-static void heap_insert_slot(MergeAppendState *node, SlotNumber new_slot);
-static void heap_siftup_slot(MergeAppendState *node);
-static int32 heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2);
+static int heap_compare_slots(binaryheap_node * a, binaryheap_node * b);
/* ----------------------------------------------------------------
@@ -88,7 +88,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_nplans = nplans;
mergestate->ms_slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
- mergestate->ms_heap = (int *) palloc0(sizeof(int) * nplans);
+ mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots);
/*
* Miscellaneous initialization
@@ -143,9 +143,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/*
* initialize to show we have not run the subplans yet
*/
- mergestate->ms_heap_size = 0;
mergestate->ms_initialized = false;
- mergestate->ms_last_slot = -1;
return mergestate;
}
@@ -160,6 +158,7 @@ TupleTableSlot *
ExecMergeAppend(MergeAppendState *node)
{
TupleTableSlot *result;
+ binaryheap_node *hn;
SlotNumber i;
if (!node->ms_initialized)
@@ -172,7 +171,7 @@ ExecMergeAppend(MergeAppendState *node)
{
node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
if (!TupIsNull(node->ms_slots[i]))
- heap_insert_slot(node, i);
+ binaryheap_add(node->ms_heap, node, (void *) i);
}
node->ms_initialized = true;
}
@@ -184,19 +183,20 @@ ExecMergeAppend(MergeAppendState *node)
* the logic a bit by doing this before returning from the prior call,
* but it's better to not pull tuples until necessary.)
*/
- i = node->ms_last_slot;
+ hn = binaryheap_first(node->ms_heap);
+ i = (int) hn->value;
node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
if (!TupIsNull(node->ms_slots[i]))
- heap_insert_slot(node, i);
+ binaryheap_replace_first(node->ms_heap, node, (void *) i);
+ else
+ (void) binaryheap_remove_first(node->ms_heap);
}
- if (node->ms_heap_size > 0)
+ hn = binaryheap_first(node->ms_heap);
+ if (hn)
{
- /* Return the topmost heap node, and sift up the remaining nodes */
- i = node->ms_heap[0];
+ i = (int) hn->value;
result = node->ms_slots[i];
- node->ms_last_slot = i;
- heap_siftup_slot(node);
}
else
{
@@ -208,65 +208,15 @@ ExecMergeAppend(MergeAppendState *node)
}
/*
- * Insert a new slot into the heap. The slot must contain a valid tuple.
- */
-static void
-heap_insert_slot(MergeAppendState *node, SlotNumber new_slot)
-{
- SlotNumber *heap = node->ms_heap;
- HeapPosition j;
-
- Assert(!TupIsNull(node->ms_slots[new_slot]));
-
- j = node->ms_heap_size++; /* j is where the "hole" is */
- while (j > 0)
- {
- int i = (j - 1) / 2;
-
- if (heap_compare_slots(node, new_slot, node->ms_heap[i]) >= 0)
- break;
- heap[j] = heap[i];
- j = i;
- }
- heap[j] = new_slot;
-}
-
-/*
- * Delete the heap top (the slot in heap[0]), and sift up.
- */
-static void
-heap_siftup_slot(MergeAppendState *node)
-{
- SlotNumber *heap = node->ms_heap;
- HeapPosition i,
- n;
-
- if (--node->ms_heap_size <= 0)
- return;
- n = node->ms_heap_size; /* heap[n] needs to be reinserted */
- i = 0; /* i is where the "hole" is */
- for (;;)
- {
- int j = 2 * i + 1;
-
- if (j >= n)
- break;
- if (j + 1 < n && heap_compare_slots(node, heap[j], heap[j + 1]) > 0)
- j++;
- if (heap_compare_slots(node, heap[n], heap[j]) <= 0)
- break;
- heap[i] = heap[j];
- i = j;
- }
- heap[i] = heap[n];
-}
-
-/*
* Compare the tuples in the two given slots.
*/
static int32
-heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2)
+heap_compare_slots(binaryheap_node * a, binaryheap_node * b)
{
+ MergeAppendState *node = (MergeAppendState *) a->key;
+ SlotNumber slot1 = (int) a->value;
+ SlotNumber slot2 = (int) b->value;
+
TupleTableSlot *s1 = node->ms_slots[slot1];
TupleTableSlot *s2 = node->ms_slots[slot2];
int nkey;
@@ -291,7 +241,7 @@ heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2)
datum2, isNull2,
sortKey);
if (compare != 0)
- return compare;
+ return -compare;
}
return 0;
}
@@ -347,7 +297,5 @@ ExecReScanMergeAppend(MergeAppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
- node->ms_heap_size = 0;
node->ms_initialized = false;
- node->ms_last_slot = -1;
}
On Thu, 2012-11-15 at 17:50 +0100, Andres Freund wrote:
Me neither. I was thinking about letting the "user" allocate enough
memory like:binaryheap *heap = palloc(binaryheap_size(/*capacity*/ 10));
binaryheap_init(heap, 10, comparator);But thats pretty ugly.
Why not pass the allocator in as a function pointer? It could either be
an argument to the initialization, or we could make a new global
variable that's present inside the server and outside.
(Slight complication: palloc is a macro)
Regards,
Jeff Davis
On Thu, Nov 15, 2012 at 8:56 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
[ new patch ]
I went over this again today with a view to getting it committed, but
discovered some compiler warnings that look like this:
warning: cast to pointer from integer of different size
The problem seems to be that the binary heap implementation wants the
key and value to be a void *, but the way it's being used in
nodeMergeAppend.c the value is actually an int. I don't think we want
to commit a binary heap implementation which has an impedance mismatch
of this type with its only client. The obvious solution seems to be
to make the key and value each a Datum, and then clients can use
WhateverGetDatum and DatumGetWhatever to pass whatever built-in data
type they happen to have. I'm trying that approach now and will post
an updated patch based on that approach if it seems to work OK and
nobody suggests something better between now and then.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2012-11-20 14:03:12 -0500, Robert Haas wrote:
On Thu, Nov 15, 2012 at 8:56 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
[ new patch ]
I went over this again today with a view to getting it committed, but
discovered some compiler warnings that look like this:warning: cast to pointer from integer of different size
The problem seems to be that the binary heap implementation wants the
key and value to be a void *, but the way it's being used in
nodeMergeAppend.c the value is actually an int. I don't think we want
to commit a binary heap implementation which has an impedance mismatch
of this type with its only client. The obvious solution seems to be
to make the key and value each a Datum, and then clients can use
WhateverGetDatum and DatumGetWhatever to pass whatever built-in data
type they happen to have. I'm trying that approach now and will post
an updated patch based on that approach if it seems to work OK and
nobody suggests something better between now and then.
Sounds like a good plan, one other alternative would have been declaring
the parameters a size_t (and thus explicitly always big enough to store
a pointer, but also valid to store inline data) instead of using a
void*. But given there is DatumGetPointer/PointerGetDatum et al, using
the postgres-y datatype seems fine to me.
In the LCR case those really are pointers, so preserving that case is
important to me ;), but your proposal does that, so ...
Andres
On Tue, Nov 20, 2012 at 2:29 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2012-11-20 14:03:12 -0500, Robert Haas wrote:
On Thu, Nov 15, 2012 at 8:56 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
[ new patch ]
I went over this again today with a view to getting it committed, but
discovered some compiler warnings that look like this:warning: cast to pointer from integer of different size
The problem seems to be that the binary heap implementation wants the
key and value to be a void *, but the way it's being used in
nodeMergeAppend.c the value is actually an int. I don't think we want
to commit a binary heap implementation which has an impedance mismatch
of this type with its only client. The obvious solution seems to be
to make the key and value each a Datum, and then clients can use
WhateverGetDatum and DatumGetWhatever to pass whatever built-in data
type they happen to have. I'm trying that approach now and will post
an updated patch based on that approach if it seems to work OK and
nobody suggests something better between now and then.Sounds like a good plan, one other alternative would have been declaring
the parameters a size_t (and thus explicitly always big enough to store
a pointer, but also valid to store inline data) instead of using a
void*. But given there is DatumGetPointer/PointerGetDatum et al, using
the postgres-y datatype seems fine to me.In the LCR case those really are pointers, so preserving that case is
important to me ;), but your proposal does that, so ...
On further reflection, I'm wondering if it wouldn't be a smarter idea
to just get rid of the binaryheap_node concept altogether. Instead,
when you allocate a binary heap, you specify an "element size" as well
as a capacity. Then, the binary heap can truly treat the elements as
an opaque array of bytes rather than a struct of any particular type.
Of course, the disadvantage of this approach is that it's likely to be
slightly slower, because we'll need to do a multiplication by a
variable rather than simple bit shift. But the extra flexibility
might be worth it, because for example for merge append this is kind
of inefficient from a storage perspective. We only really need N+1
pointers, but we've got storage for 2N. With the above change plus a
user-specified third argument for the comparator function (passed as
yet another argument to binaryheap_allocate) we could get around that
while preserving the option for LCR or any other client code to do as
it sees fit.
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2012-11-20 15:06:58 -0500, Robert Haas wrote:
On Tue, Nov 20, 2012 at 2:29 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2012-11-20 14:03:12 -0500, Robert Haas wrote:
On Thu, Nov 15, 2012 at 8:56 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
[ new patch ]
I went over this again today with a view to getting it committed, but
discovered some compiler warnings that look like this:warning: cast to pointer from integer of different size
The problem seems to be that the binary heap implementation wants the
key and value to be a void *, but the way it's being used in
nodeMergeAppend.c the value is actually an int. I don't think we want
to commit a binary heap implementation which has an impedance mismatch
of this type with its only client. The obvious solution seems to be
to make the key and value each a Datum, and then clients can use
WhateverGetDatum and DatumGetWhatever to pass whatever built-in data
type they happen to have. I'm trying that approach now and will post
an updated patch based on that approach if it seems to work OK and
nobody suggests something better between now and then.Sounds like a good plan, one other alternative would have been declaring
the parameters a size_t (and thus explicitly always big enough to store
a pointer, but also valid to store inline data) instead of using a
void*. But given there is DatumGetPointer/PointerGetDatum et al, using
the postgres-y datatype seems fine to me.In the LCR case those really are pointers, so preserving that case is
important to me ;), but your proposal does that, so ...
On further reflection, I'm wondering if it wouldn't be a smarter idea
to just get rid of the binaryheap_node concept altogether. Instead,
when you allocate a binary heap, you specify an "element size" as well
as a capacity. Then, the binary heap can truly treat the elements as
an opaque array of bytes rather than a struct of any particular type.
Of course, the disadvantage of this approach is that it's likely to be
slightly slower, because we'll need to do a multiplication by a
variable rather than simple bit shift. But the extra flexibility
might be worth it, because for example for merge append this is kind
of inefficient from a storage perspective. We only really need N+1
pointers, but we've got storage for 2N. With the above change plus a
user-specified third argument for the comparator function (passed as
yet another argument to binaryheap_allocate) we could get around that
while preserving the option for LCR or any other client code to do as
it sees fit.
Thoughts?
I don't have a fundamental problem with it, but I don't think it's worth
the cost. If you have variable sized (from the point of the compiler)
values you either have more complex offset computations to the
individual elements or one more indirection due to a pointer array.
Do you feel strongly about it? If so, go ahead, otherwise I would prefer
the current way (with parameters changed to be Datum's of course).
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Nov 20, 2012 at 3:19 PM, Andres Freund <andres@2ndquadrant.com> wrote:
I don't have a fundamental problem with it, but I don't think it's worth
the cost. If you have variable sized (from the point of the compiler)
values you either have more complex offset computations to the
individual elements or one more indirection due to a pointer array.Do you feel strongly about it? If so, go ahead, otherwise I would prefer
the current way (with parameters changed to be Datum's of course).
Here's a revised patch that takes the approach of just changing void *
to Datum; it includes a bunch of other cleanups that I felt were
worthwhile. I tested this using the following setup:
CREATE TABLE sample (a int, b numeric);
DO $$
BEGIN
FOR i IN 1 .. 1000 LOOP
EXECUTE 'CREATE TABLE sample' || i || ' (CHECK (b = ' || i
|| ')) INHERITS (sample)';
EXECUTE 'INSERT INTO sample' || i
|| ' SELECT g, ' || i || ' FROM
generate_series(1,100000) g;';
EXECUTE 'ALTER TABLE sample' || i
|| ' ADD PRIMARY KEY (a)';
END LOOP;
END
$$;
VACUUM ANALYZE;
And this test query:
select sum(1) from (select * from sample order by a limit 250) x;
On my system, on repeated execution, this takes about 113 ms with
unpatched master and about 114 ms with the patch. I'm inclined to
think that's not worth getting excited about, as it could be simply
code shifting around across cache line boundaries and not indicative
of an actual regression due to the difference in logic. Even if there
is a regression, it's pretty darn small: most people will not have
1000 partitions, and those that do will often find query execution
time dominated by other factors. Against that, there is positive
value in having this code somewhat more abstracted.
With respect to the question of the API, no, I don't feel
overwhelmingly strongly that we have to whack the API with a bat
that's larger than the one I've used here. On the flip side, it
wouldn't surprise me if, as the code acquires more users, we find that
we actually do want to make some changes, either the one I already
proposed or something else. Fortunately, it's not a lot of code, so
if we end up refactoring it some more later, it shouldn't be a big
deal. So I'm happy enough to commit this the way I have it and sort
out the rest as we go. If there are no objections I'll go ahead and
do that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
binaryheap-rmh.patchapplication/octet-stream; name=binaryheap-rmh.patchDownload
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index d5141ba..847257a 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -41,17 +41,16 @@
#include "executor/execdebug.h"
#include "executor/nodeMergeAppend.h"
+#include "lib/binaryheap.h"
+
/*
- * It gets quite confusing having a heap array (indexed by integers) which
- * contains integers which index into the slots array. These typedefs try to
- * clear it up, but they're only documentation.
+ * We have one slot for each item in the heap array. We use SlotNumber
+ * to store slot indexes. This doesn't actually provide any formal
+ * type-safety, but it makes the code more self-documenting.
*/
-typedef int SlotNumber;
-typedef int HeapPosition;
+typedef int32 SlotNumber;
-static void heap_insert_slot(MergeAppendState *node, SlotNumber new_slot);
-static void heap_siftup_slot(MergeAppendState *node);
-static int32 heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2);
+static int heap_compare_slots(binaryheap_node *a, binaryheap_node *b);
/* ----------------------------------------------------------------
@@ -88,7 +87,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_nplans = nplans;
mergestate->ms_slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
- mergestate->ms_heap = (int *) palloc0(sizeof(int) * nplans);
+ mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots);
/*
* Miscellaneous initialization
@@ -143,9 +142,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/*
* initialize to show we have not run the subplans yet
*/
- mergestate->ms_heap_size = 0;
mergestate->ms_initialized = false;
- mergestate->ms_last_slot = -1;
return mergestate;
}
@@ -160,6 +157,7 @@ TupleTableSlot *
ExecMergeAppend(MergeAppendState *node)
{
TupleTableSlot *result;
+ binaryheap_node *hn;
SlotNumber i;
if (!node->ms_initialized)
@@ -172,8 +170,10 @@ ExecMergeAppend(MergeAppendState *node)
{
node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
if (!TupIsNull(node->ms_slots[i]))
- heap_insert_slot(node, i);
+ binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i),
+ PointerGetDatum(node));
}
+ binaryheap_build(node->ms_heap);
node->ms_initialized = true;
}
else
@@ -184,19 +184,22 @@ ExecMergeAppend(MergeAppendState *node)
* the logic a bit by doing this before returning from the prior call,
* but it's better to not pull tuples until necessary.)
*/
- i = node->ms_last_slot;
+ hn = binaryheap_first(node->ms_heap);
+ Assert(hn != NULL);
+ i = DatumGetInt32(hn->key);
node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
if (!TupIsNull(node->ms_slots[i]))
- heap_insert_slot(node, i);
+ binaryheap_replace_first(node->ms_heap, Int32GetDatum(i),
+ PointerGetDatum(node));
+ else
+ (void) binaryheap_remove_first(node->ms_heap);
}
- if (node->ms_heap_size > 0)
+ hn = binaryheap_first(node->ms_heap);
+ if (hn != NULL)
{
- /* Return the topmost heap node, and sift up the remaining nodes */
- i = node->ms_heap[0];
+ i = DatumGetInt32(hn->key);
result = node->ms_slots[i];
- node->ms_last_slot = i;
- heap_siftup_slot(node);
}
else
{
@@ -208,65 +211,15 @@ ExecMergeAppend(MergeAppendState *node)
}
/*
- * Insert a new slot into the heap. The slot must contain a valid tuple.
- */
-static void
-heap_insert_slot(MergeAppendState *node, SlotNumber new_slot)
-{
- SlotNumber *heap = node->ms_heap;
- HeapPosition j;
-
- Assert(!TupIsNull(node->ms_slots[new_slot]));
-
- j = node->ms_heap_size++; /* j is where the "hole" is */
- while (j > 0)
- {
- int i = (j - 1) / 2;
-
- if (heap_compare_slots(node, new_slot, node->ms_heap[i]) >= 0)
- break;
- heap[j] = heap[i];
- j = i;
- }
- heap[j] = new_slot;
-}
-
-/*
- * Delete the heap top (the slot in heap[0]), and sift up.
- */
-static void
-heap_siftup_slot(MergeAppendState *node)
-{
- SlotNumber *heap = node->ms_heap;
- HeapPosition i,
- n;
-
- if (--node->ms_heap_size <= 0)
- return;
- n = node->ms_heap_size; /* heap[n] needs to be reinserted */
- i = 0; /* i is where the "hole" is */
- for (;;)
- {
- int j = 2 * i + 1;
-
- if (j >= n)
- break;
- if (j + 1 < n && heap_compare_slots(node, heap[j], heap[j + 1]) > 0)
- j++;
- if (heap_compare_slots(node, heap[n], heap[j]) <= 0)
- break;
- heap[i] = heap[j];
- i = j;
- }
- heap[i] = heap[n];
-}
-
-/*
* Compare the tuples in the two given slots.
*/
static int32
-heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2)
+heap_compare_slots(binaryheap_node *a, binaryheap_node *b)
{
+ MergeAppendState *node = (MergeAppendState *) a->value;
+ SlotNumber slot1 = DatumGetInt32(a->key);
+ SlotNumber slot2 = DatumGetInt32(b->key);
+
TupleTableSlot *s1 = node->ms_slots[slot1];
TupleTableSlot *s2 = node->ms_slots[slot2];
int nkey;
@@ -291,7 +244,7 @@ heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2)
datum2, isNull2,
sortKey);
if (compare != 0)
- return compare;
+ return -compare;
}
return 0;
}
@@ -347,7 +300,5 @@ ExecReScanMergeAppend(MergeAppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
- node->ms_heap_size = 0;
node->ms_initialized = false;
- node->ms_last_slot = -1;
}
diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 98ce3d7..327a1bc 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/lib
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-OBJS = ilist.o stringinfo.o
+OBJS = ilist.o binaryheap.o stringinfo.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/binaryheap.c b/src/backend/lib/binaryheap.c
new file mode 100644
index 0000000..5629bb3
--- /dev/null
+++ b/src/backend/lib/binaryheap.c
@@ -0,0 +1,321 @@
+/*-------------------------------------------------------------------------
+ *
+ * binaryheap.c
+ * A simple binary heap implementaion
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/lib/binaryheap.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "lib/binaryheap.h"
+
+static void sift_down(binaryheap *heap, size_t node_off);
+static void sift_up(binaryheap *heap, size_t node_off);
+static inline void swap_nodes(binaryheap *heap, size_t a, size_t b);
+
+/*
+ * binaryheap_allocate
+ *
+ * Returns a pointer to a newly-allocated heap that has the capacity to
+ * store the given number of nodes, with the heap property defined by
+ * the given comparator function.
+ */
+binaryheap *
+binaryheap_allocate(size_t capacity, binaryheap_comparator compare)
+{
+ int sz;
+ binaryheap *heap;
+
+ sz = sizeof(binaryheap) + sizeof(binaryheap_node) * (capacity - 1);
+ heap = palloc(sz);
+ heap->compare = compare;
+ heap->space = capacity;
+ heap->size = 0;
+
+ return heap;
+}
+
+/*
+ * binaryheap_free
+ *
+ * Releases memory used by the given binaryheap.
+ */
+void
+binaryheap_free(binaryheap *heap)
+{
+ pfree(heap);
+}
+
+/*
+ * These utility functions return the offset of the left child, right
+ * child, and parent of the node at the given index, respectively.
+ *
+ * The heap is represented as an array of nodes, with the root node
+ * stored at index 0. The left child of node i is at index 2*i+1, and
+ * the right child at 2*i+2. The parent of node i is at index (i-1)/2.
+ */
+
+static inline int
+left_offset(size_t i)
+{
+ return 2 * i + 1;
+}
+
+static inline int
+right_offset(size_t i)
+{
+ return 2 * i + 2;
+}
+
+static inline int
+parent_offset(size_t i)
+{
+ return (i - 1) / 2;
+}
+
+/*
+ * binaryheap_add_unordered
+ *
+ * Adds the given key and value to the end of the heap's list of nodes
+ * in O(1) without preserving the heap property. This is a convenience
+ * to add elements quickly to a new heap. To obtain a valid heap, one
+ * must call binaryheap_build() afterwards.
+ */
+void
+binaryheap_add_unordered(binaryheap *heap, Datum key, Datum value)
+{
+ Assert(heap->size < heap->space);
+ heap->nodes[heap->size].key = key;
+ heap->nodes[heap->size].value = value;
+ heap->size++;
+}
+
+/*
+ * binaryheap_build
+ *
+ * Assembles a valid heap in O(n) from the nodes added by
+ * binaryheap_add_unordered(). Not needed otherwise.
+ */
+void
+binaryheap_build(binaryheap *heap)
+{
+ int i;
+
+ for (i = parent_offset(heap->size - 1); i >= 0; i--)
+ sift_down(heap, i);
+}
+
+/*
+ * binaryheap_add
+ *
+ * Adds the given key and value to the heap in O(log n), while
+ * preserving the heap property.
+ */
+void
+binaryheap_add(binaryheap *heap, Datum key, Datum value)
+{
+ binaryheap_add_unordered(heap, key, value);
+ sift_up(heap, heap->size - 1);
+}
+
+/*
+ * binaryheap_first
+ *
+ * Returns a pointer to the first (root, topmost) node in the heap
+ * without modifying the heap. Returns NULL if the heap is empty.
+ * Always O(1).
+ */
+binaryheap_node *
+binaryheap_first(binaryheap *heap)
+{
+ if (!heap->size)
+ return NULL;
+ return &heap->nodes[0];
+}
+
+/*
+ * binaryheap_remove_first
+ *
+ * Removes the first (root, topmost) node in the heap and returns a
+ * pointer to it after rebalancing the heap. Returns NULL if the heap
+ * is empty. O(log n) worst case.
+ */
+binaryheap_node *
+binaryheap_remove_first(binaryheap *heap)
+{
+ if (heap->size == 0)
+ return NULL;
+
+ if (heap->size == 1)
+ {
+ heap->size--;
+ return &heap->nodes[0];
+ }
+
+ /*
+ * Swap the root and last nodes, decrease the size of the heap (i.e.
+ * remove the former root node) and sift the new root node down to its
+ * correct position.
+ */
+ swap_nodes(heap, 0, heap->size - 1);
+ heap->size--;
+ sift_down(heap, 0);
+
+ return &heap->nodes[heap->size];
+}
+
+/*
+ * binaryheap_replace_first
+ *
+ * Change the key and/or value of the first (root, topmost) node and
+ * ensure that the heap property is preserved. O(1) in the best case,
+ * or O(log n) if it must fall back to sifting the new node down.
+ */
+void
+binaryheap_replace_first(binaryheap *heap, Datum key, Datum val)
+{
+ int cmp;
+ size_t next_off = 0;
+
+ if (key)
+ heap->nodes[0].key = key;
+ if (val)
+ heap->nodes[0].value = val;
+
+ /*
+ * If the new root node is still larger than the largest of its children
+ * (of which there may be 0, 1, or 2), then the heap is still valid.
+ */
+ if (heap->size == 1)
+ return;
+
+ next_off = 1;
+ if (heap->size > 2)
+ {
+ /* Two children: pick the larger one */
+ cmp = heap->compare(&heap->nodes[2], &heap->nodes[1]);
+ if (cmp > 0)
+ next_off = 2;
+ }
+
+ cmp = heap->compare(&heap->nodes[next_off], &heap->nodes[0]);
+ if (cmp < 0)
+ return;
+
+ /* The child is larger. Swap and sift the new node down. */
+ swap_nodes(heap, 0, next_off);
+ sift_down(heap, next_off);
+}
+
+/*
+ * Swap the contents of two nodes.
+ */
+static inline void
+swap_nodes(binaryheap *heap, size_t a, size_t b)
+{
+ binaryheap_node swap;
+
+ swap.value = heap->nodes[a].value;
+ swap.key = heap->nodes[a].key;
+
+ heap->nodes[a].value = heap->nodes[b].value;
+ heap->nodes[a].key = heap->nodes[b].key;
+
+ heap->nodes[b].key = swap.key;
+ heap->nodes[b].value = swap.value;
+}
+
+/*
+ * Sift a node up to the highest position it can hold according to the
+ * comparator.
+ */
+static void
+sift_up(binaryheap *heap, size_t node_off)
+{
+ while (node_off != 0)
+ {
+ int cmp;
+ size_t parent_off;
+
+ /*
+ * If this node is smaller than its parent, the heap condition is
+ * satisfied, and we're done.
+ */
+ parent_off = parent_offset(node_off);
+ cmp = heap->compare(&heap->nodes[node_off],
+ &heap->nodes[parent_off]);
+ if (cmp <= 0)
+ break;
+
+ /*
+ * Otherwise, swap the node and its parent and go on to check the
+ * node's new parent.
+ */
+ swap_nodes(heap, node_off, parent_off);
+ node_off = parent_off;
+ }
+}
+
+/*
+ * Sift a node down from its current position to satisfy the heap
+ * property.
+ */
+static void
+sift_down(binaryheap *heap, size_t node_off)
+{
+ while (true)
+ {
+ size_t left_off = left_offset(node_off);
+ size_t right_off = right_offset(node_off);
+ size_t swap_off = 0;
+
+ /* Is the left child larger than the parent? */
+
+ if (left_off < heap->size &&
+ heap->compare(&heap->nodes[node_off],
+ &heap->nodes[left_off]) < 0)
+ {
+ swap_off = left_off;
+ }
+
+ /*
+ * If not (note: only one child can violate the heap property after a
+ * change), is the right child larger?
+ */
+
+ if (right_off < heap->size &&
+ heap->compare(&heap->nodes[node_off],
+ &heap->nodes[right_off]) < 0)
+ {
+ /* swap with the larger child */
+ if (!swap_off ||
+ heap->compare(&heap->nodes[left_off],
+ &heap->nodes[right_off]) < 0)
+ {
+ swap_off = right_off;
+ }
+ }
+
+ /*
+ * If we didn't find anything to swap, the heap condition is
+ * satisfied, and we're done.
+ */
+ if (!swap_off)
+ break;
+
+ /*
+ * Otherwise, swap the node with the child that violates the heap
+ * property; then go on to check its children.
+ */
+ swap_nodes(heap, swap_off, node_off);
+ node_off = swap_off;
+ }
+}
diff --git a/src/include/lib/binaryheap.h b/src/include/lib/binaryheap.h
new file mode 100644
index 0000000..c9e478d
--- /dev/null
+++ b/src/include/lib/binaryheap.h
@@ -0,0 +1,59 @@
+/*
+ * binaryheap.h
+ *
+ * A simple binary heap implementation
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * src/include/lib/binaryheap.h
+ */
+
+#ifndef BINARYHEAP_H
+#define BINARYHEAP_H
+
+/*
+ * This structure represents a single node in a binaryheap, and just
+ * holds two pointers. The heap management code doesn't care what is
+ * stored in a node, only that the comparator function can compare any
+ * two nodes.
+ */
+typedef struct binaryheap_node
+{
+ Datum key;
+ Datum value;
+} binaryheap_node;
+
+/*
+ * For a max-heap, the comparator must return -1 iff a < b, 0 iff a == b,
+ * and +1 iff a > b. For a min-heap, the conditions are reversed.
+ */
+typedef int (*binaryheap_comparator) (binaryheap_node *a, binaryheap_node *b);
+
+/*
+ * binaryheap
+ *
+ * size how many nodes are currently in "nodes"
+ * space how many nodes can be stored in "nodes"
+ * comparator comparator to define the heap property
+ * nodes the first of a list of "space" nodes
+ */
+typedef struct binaryheap
+{
+ size_t size;
+ size_t space;
+ binaryheap_comparator compare;
+ binaryheap_node nodes[1];
+} binaryheap;
+
+extern binaryheap *binaryheap_allocate(size_t capacity,
+ binaryheap_comparator compare);
+extern void binaryheap_free(binaryheap *heap);
+extern void binaryheap_add_unordered(binaryheap *heap, Datum key, Datum value);
+extern void binaryheap_build(binaryheap *heap);
+extern void binaryheap_add(binaryheap *heap, Datum key, Datum value);
+extern binaryheap_node *binaryheap_first(binaryheap *heap);
+extern binaryheap_node *binaryheap_remove_first(binaryheap *heap);
+extern void binaryheap_replace_first(binaryheap *heap, Datum newkey,
+ Datum newval);
+
+#endif /* BINARYHEAP_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index fec07b8..d4911bd 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1100,10 +1100,8 @@ typedef struct AppendState
* nkeys number of sort key columns
* sortkeys sort keys in SortSupport representation
* slots current output tuple of each subplan
- * heap heap of active tuples (represented as array indexes)
- * heap_size number of active heap entries
+ * heap heap of active tuples
* initialized true if we have fetched first tuple from each subplan
- * last_slot last subplan fetched from (which must be re-called)
* ----------------
*/
typedef struct MergeAppendState
@@ -1114,10 +1112,8 @@ typedef struct MergeAppendState
int ms_nkeys;
SortSupport ms_sortkeys; /* array of length ms_nkeys */
TupleTableSlot **ms_slots; /* array of length ms_nplans */
- int *ms_heap; /* array of length ms_nplans */
- int ms_heap_size; /* current active length of ms_heap[] */
+ struct binaryheap *ms_heap; /* binary heap of slot indices */
bool ms_initialized; /* are subplans started? */
- int ms_last_slot; /* last subplan slot we returned from */
} MergeAppendState;
/* ----------------
On 2012-11-20 15:47:25 -0500, Robert Haas wrote:
On Tue, Nov 20, 2012 at 3:19 PM, Andres Freund <andres@2ndquadrant.com> wrote:
I don't have a fundamental problem with it, but I don't think it's worth
the cost. If you have variable sized (from the point of the compiler)
values you either have more complex offset computations to the
individual elements or one more indirection due to a pointer array.Do you feel strongly about it? If so, go ahead, otherwise I would prefer
the current way (with parameters changed to be Datum's of course).Here's a revised patch that takes the approach of just changing void *
to Datum; it includes a bunch of other cleanups that I felt were
worthwhile. I tested this using the following setup:CREATE TABLE sample (a int, b numeric);
DO $$
BEGIN
FOR i IN 1 .. 1000 LOOP
EXECUTE 'CREATE TABLE sample' || i || ' (CHECK (b = ' || i
|| ')) INHERITS (sample)';
EXECUTE 'INSERT INTO sample' || i
|| ' SELECT g, ' || i || ' FROM
generate_series(1,100000) g;';
EXECUTE 'ALTER TABLE sample' || i
|| ' ADD PRIMARY KEY (a)';
END LOOP;
END
$$;
VACUUM ANALYZE;And this test query:
select sum(1) from (select * from sample order by a limit 250) x;
On my system, on repeated execution, this takes about 113 ms with
unpatched master and about 114 ms with the patch. I'm inclined to
think that's not worth getting excited about, as it could be simply
code shifting around across cache line boundaries and not indicative
of an actual regression due to the difference in logic. Even if there
is a regression, it's pretty darn small: most people will not have
1000 partitions, and those that do will often find query execution
time dominated by other factors. Against that, there is positive
value in having this code somewhat more abstracted.
To make sure were not regressing I ran some more testcases that I could
think of/find and didn't find anything problematic. Either both are
equally fast or the new code is faster.
FWIW for me the new code is very slightly faster in your example, but
only if I apply it ontop of fklocks (where I applied it accidentally at
first), not if ontop of 273986bf0d39e5166eb15ba42ebff4803e23a688 where
its reversed, so your code-layout theory sounds likely.
With respect to the question of the API, no, I don't feel
overwhelmingly strongly that we have to whack the API with a bat
that's larger than the one I've used here. On the flip side, it
wouldn't surprise me if, as the code acquires more users, we find that
we actually do want to make some changes, either the one I already
proposed or something else. Fortunately, it's not a lot of code, so
if we end up refactoring it some more later, it shouldn't be a big
deal. So I'm happy enough to commit this the way I have it and sort
out the rest as we go. If there are no objections I'll go ahead and
do that.
Looks good to me. Adapting when we have the actual requirements sounds
fine & sensible as well...
Thanks,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes:
Here's a revised patch that takes the approach of just changing void *
to Datum; it includes a bunch of other cleanups that I felt were
worthwhile.
The comment in binaryheap.h says
* For a max-heap, the comparator must return -1 iff a < b, 0 iff a == b,
* and +1 iff a > b. For a min-heap, the conditions are reversed.
Is it actually required that the output be exactly these three values,
or is the specification really <0, 0, >0 ? If the latter, it should
say that, rather than overdefine the implementation of comparison.
Also, wouldn't it be reasonable to const-ify the comparator function, ie
typedef int (*binaryheap_comparator) (const binaryheap_node *a, const binaryheap_node *b);
* nodes the first of a list of "space" nodes
s/list/array/, no? Also, it's standard to mark this sort of hack with
/* VARIABLE LENGTH ARRAY */, or even better use FLEXIBLE_ARRAY_MEMBER
(which would require fixing the space calculation in
binaryheap_allocate, not that that would be a bad thing anyway).
left_offset and friends are defined as taking size_t and returning int,
which is broken on machines where size_t is wider than int, or even on
machines where they're the same width since int is signed. Personally
I'd replace pretty much every single usage of size_t in this module with
int, as I am unconvinced either that we need 8-byte indexes or that the
logic is correct if it tries to form index -1 (as would happen for
example with binaryheap_build applied to an empty heap).
The Assert() in binaryheap_add_unordered fills me with no confidence.
If we're not going to support expansion of the array (which might be a
better idea anyway) then this needs to be an actual run-time check, not
something that won't be there in production.
What I think *would* be worth asserting for is whether the heap is
correctly ordered or not --- that is, I think you should add a flag that
is cleared by binaryheap_add_unordered and set by binaryheap_build, and
then the functions that presume correct ordering should assert it's set.
An Assert in binaryheap_replace_first that the heap is not empty seems
appropriate as well.
This in binaryheap_replace_first is simply bizarre:
if (key)
heap->nodes[0].key = key;
if (val)
heap->nodes[0].value = val;
Why are these assignments conditional? If they're sane at all, they
should not be testing the Datums directly in any case, but the result of
DatumGetSomething.
static int32
! heap_compare_slots(binaryheap_node *a, binaryheap_node *b)
{
+ MergeAppendState *node = (MergeAppendState *) a->value;
This should use DatumGetPointer(a->value), and the function result
should be int not int32 to comport with the typedef for it.
While I'm thinking about it, why are the fields of a binaryheap_node
called "key" and "value"? That implies a semantics not actually used
here. Maybe "value1" and "value2" instead?
regards, tom lane
Hi.
A brief response to earlier messages in this thread:
1. I agree that it's a good idea to use Datum rather than void *.
2. I don't think it's worth getting rid of binaryheap_node.
3. I agree that it makes sense to go with what we have now (after
Robert's reworking of my patch) and reconsider if needed when
there are more users of the code.
4. I agree with all of Tom's suggestions as well.
At 2012-11-20 18:01:10 -0500, tgl@sss.pgh.pa.us wrote:
Is it actually required that the output be exactly these three values,
or is the specification really <0, 0, >0 ?
The code did require -1/0/+1, but I changed it to expect only <0, 0, >0.
So you're right, the comment should be changed.
This in binaryheap_replace_first is simply bizarre:
if (key)
heap->nodes[0].key = key;
if (val)
heap->nodes[0].value = val;
It's a leftover from the earlier implementation that used pointers,
where you could pass in NULL to leave either key or value unchanged.
While I'm thinking about it, why are the fields of a binaryheap_node
called "key" and "value"? That implies a semantics not actually used
here. Maybe "value1" and "value2" instead?
Yes, I discussed this with Andres earlier (and considered ptr and value
or p1 and p2), but decided to wait for others to comment on the naming.
I have no problem with value1 and value2, though I'd very slightly
prefer d1 and d2 (d for Datum) now.
Robert: thanks again for your work on the patch. Do you want me to make
the changes Tom suggests above, or are you already doing it?
-- Abhijit
Abhijit Menon-Sen <ams@2ndQuadrant.com> writes:
While I'm thinking about it, why are the fields of a binaryheap_node
called "key" and "value"? That implies a semantics not actually used
here. Maybe "value1" and "value2" instead?
Yes, I discussed this with Andres earlier (and considered ptr and value
or p1 and p2), but decided to wait for others to comment on the naming.
I have no problem with value1 and value2, though I'd very slightly
prefer d1 and d2 (d for Datum) now.
d1/d2 would be fine with me.
BTW, I probably missed some context upthread, but why do we have two
fields at all? At least for the purposes of nodeMergeAppend, it
would make a lot more sense for the binaryheap elements to be just
single Datums, without all the redundant pointers to the MergeAppend
node. The need to get at the comparison support info could be handled
much more elegantly than that by providing a "void *" passthrough arg.
That is, the heap creation function becomes
binaryheap *binaryheap_allocate(size_t capacity,
binaryheap_comparator compare,
void *passthrough);
and the comparator signature becomes
typedef int (*binaryheap_comparator) (Datum a, Datum b, void *passthrough);
For nodeMergeAppend, the Datums would be the slot indexes and the
passthrough pointer could point to the MergeAppend node.
This would make equal sense in a lot of other contexts, such as if you
wanted a heap composed of tuples -- the info to compare tuples could be
handled via the passthrough argument, rather than doubling the size of
the heap in order to put that pointer into each heap element. If you
have no use for the passthrough arg in a particular usage, just pass
NULL.
regards, tom lane
At 2012-11-20 22:55:52 -0500, tgl@sss.pgh.pa.us wrote:
BTW, I probably missed some context upthread, but why do we have two
fields at all?
I would also have preferred to handle the nodeMergeAppend case using a
context pointer as you suggest, but Andres needs to store two pointers
in his heap nodes.
Andres: suppose we replace binaryheap_node with just a Datum, could you
live with storing a pointer to a struct with two pointers? If so, that
would address the concerns raised.
If not, maybe we should explore Robert's earlier suggestion to make
binaryheap_node user-definable (in effect).
-- Abhijit
On 2012-11-20 22:55:52 -0500, Tom Lane wrote:
Abhijit Menon-Sen <ams@2ndQuadrant.com> writes:
While I'm thinking about it, why are the fields of a binaryheap_node
called "key" and "value"? That implies a semantics not actually used
here. Maybe "value1" and "value2" instead?Yes, I discussed this with Andres earlier (and considered ptr and value
or p1 and p2), but decided to wait for others to comment on the naming.
I have no problem with value1 and value2, though I'd very slightly
prefer d1 and d2 (d for Datum) now.d1/d2 would be fine with me.
BTW, I probably missed some context upthread, but why do we have two
fields at all? At least for the purposes of nodeMergeAppend, it
would make a lot more sense for the binaryheap elements to be just
single Datums, without all the redundant pointers to the MergeAppend
node. The need to get at the comparison support info could be handled
much more elegantly than that by providing a "void *" passthrough arg.
That is, the heap creation function becomesbinaryheap *binaryheap_allocate(size_t capacity,
binaryheap_comparator compare,
void *passthrough);and the comparator signature becomes
typedef int (*binaryheap_comparator) (Datum a, Datum b, void *passthrough);
For nodeMergeAppend, the Datums would be the slot indexes and the
passthrough pointer could point to the MergeAppend node.This would make equal sense in a lot of other contexts, such as if you
wanted a heap composed of tuples -- the info to compare tuples could be
handled via the passthrough argument, rather than doubling the size of
the heap in order to put that pointer into each heap element. If you
have no use for the passthrough arg in a particular usage, just pass
NULL.
The passthrough argument definitely makes sense, I am all for adding it.
The reasons I originally wanted to have two values was that it made it
easy/possible for the comparison function to work without any pointer
dereferences which was somewhat beneficial to performance. Completely
without dereferences only worked on 64bit systems anyway though...
With just one value I need an extra struct, but thats not too bad.
So if you all prefer just having one value, I am ok with that. Who is
updating the patch?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 21, 2012 at 6:08 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2012-11-20 22:55:52 -0500, Tom Lane wrote:
Abhijit Menon-Sen <ams@2ndQuadrant.com> writes:
While I'm thinking about it, why are the fields of a binaryheap_node
called "key" and "value"? That implies a semantics not actually used
here. Maybe "value1" and "value2" instead?Yes, I discussed this with Andres earlier (and considered ptr and value
or p1 and p2), but decided to wait for others to comment on the naming.
I have no problem with value1 and value2, though I'd very slightly
prefer d1 and d2 (d for Datum) now.d1/d2 would be fine with me.
BTW, I probably missed some context upthread, but why do we have two
fields at all? At least for the purposes of nodeMergeAppend, it
would make a lot more sense for the binaryheap elements to be just
single Datums, without all the redundant pointers to the MergeAppend
node. The need to get at the comparison support info could be handled
much more elegantly than that by providing a "void *" passthrough arg.
That is, the heap creation function becomesbinaryheap *binaryheap_allocate(size_t capacity,
binaryheap_comparator compare,
void *passthrough);and the comparator signature becomes
typedef int (*binaryheap_comparator) (Datum a, Datum b, void *passthrough);
For nodeMergeAppend, the Datums would be the slot indexes and the
passthrough pointer could point to the MergeAppend node.This would make equal sense in a lot of other contexts, such as if you
wanted a heap composed of tuples -- the info to compare tuples could be
handled via the passthrough argument, rather than doubling the size of
the heap in order to put that pointer into each heap element. If you
have no use for the passthrough arg in a particular usage, just pass
NULL.The passthrough argument definitely makes sense, I am all for adding it.
The reasons I originally wanted to have two values was that it made it
easy/possible for the comparison function to work without any pointer
dereferences which was somewhat beneficial to performance. Completely
without dereferences only worked on 64bit systems anyway though...
With just one value I need an extra struct, but thats not too bad.So if you all prefer just having one value, I am ok with that. Who is
updating the patch?
I guess I'll take another whack at it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Nov 21, 2012 at 9:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I guess I'll take another whack at it.
New version attached.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
binaryheap-rmh2.patchapplication/octet-stream; name=binaryheap-rmh2.patchDownload
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index d5141ba..dd8d094 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -41,17 +41,16 @@
#include "executor/execdebug.h"
#include "executor/nodeMergeAppend.h"
+#include "lib/binaryheap.h"
+
/*
- * It gets quite confusing having a heap array (indexed by integers) which
- * contains integers which index into the slots array. These typedefs try to
- * clear it up, but they're only documentation.
+ * We have one slot for each item in the heap array. We use SlotNumber
+ * to store slot indexes. This doesn't actually provide any formal
+ * type-safety, but it makes the code more self-documenting.
*/
-typedef int SlotNumber;
-typedef int HeapPosition;
+typedef int32 SlotNumber;
-static void heap_insert_slot(MergeAppendState *node, SlotNumber new_slot);
-static void heap_siftup_slot(MergeAppendState *node);
-static int32 heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2);
+static int heap_compare_slots(Datum a, Datum b, void *arg);
/* ----------------------------------------------------------------
@@ -88,7 +87,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_nplans = nplans;
mergestate->ms_slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
- mergestate->ms_heap = (int *) palloc0(sizeof(int) * nplans);
+ mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots,
+ mergestate);
/*
* Miscellaneous initialization
@@ -143,9 +143,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/*
* initialize to show we have not run the subplans yet
*/
- mergestate->ms_heap_size = 0;
mergestate->ms_initialized = false;
- mergestate->ms_last_slot = -1;
return mergestate;
}
@@ -172,8 +170,9 @@ ExecMergeAppend(MergeAppendState *node)
{
node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
if (!TupIsNull(node->ms_slots[i]))
- heap_insert_slot(node, i);
+ binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
}
+ binaryheap_build(node->ms_heap);
node->ms_initialized = true;
}
else
@@ -184,89 +183,38 @@ ExecMergeAppend(MergeAppendState *node)
* the logic a bit by doing this before returning from the prior call,
* but it's better to not pull tuples until necessary.)
*/
- i = node->ms_last_slot;
+ i = DatumGetInt32(binaryheap_first(node->ms_heap));
node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
if (!TupIsNull(node->ms_slots[i]))
- heap_insert_slot(node, i);
+ binaryheap_replace_first(node->ms_heap, Int32GetDatum(i));
+ else
+ (void) binaryheap_remove_first(node->ms_heap);
}
- if (node->ms_heap_size > 0)
- {
- /* Return the topmost heap node, and sift up the remaining nodes */
- i = node->ms_heap[0];
- result = node->ms_slots[i];
- node->ms_last_slot = i;
- heap_siftup_slot(node);
- }
- else
+ if (binaryheap_empty(node->ms_heap))
{
/* All the subplans are exhausted, and so is the heap */
result = ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
-
- return result;
-}
-
-/*
- * Insert a new slot into the heap. The slot must contain a valid tuple.
- */
-static void
-heap_insert_slot(MergeAppendState *node, SlotNumber new_slot)
-{
- SlotNumber *heap = node->ms_heap;
- HeapPosition j;
-
- Assert(!TupIsNull(node->ms_slots[new_slot]));
-
- j = node->ms_heap_size++; /* j is where the "hole" is */
- while (j > 0)
+ else
{
- int i = (j - 1) / 2;
-
- if (heap_compare_slots(node, new_slot, node->ms_heap[i]) >= 0)
- break;
- heap[j] = heap[i];
- j = i;
+ i = DatumGetInt32(binaryheap_first(node->ms_heap));
+ result = node->ms_slots[i];
}
- heap[j] = new_slot;
-}
-/*
- * Delete the heap top (the slot in heap[0]), and sift up.
- */
-static void
-heap_siftup_slot(MergeAppendState *node)
-{
- SlotNumber *heap = node->ms_heap;
- HeapPosition i,
- n;
-
- if (--node->ms_heap_size <= 0)
- return;
- n = node->ms_heap_size; /* heap[n] needs to be reinserted */
- i = 0; /* i is where the "hole" is */
- for (;;)
- {
- int j = 2 * i + 1;
-
- if (j >= n)
- break;
- if (j + 1 < n && heap_compare_slots(node, heap[j], heap[j + 1]) > 0)
- j++;
- if (heap_compare_slots(node, heap[n], heap[j]) <= 0)
- break;
- heap[i] = heap[j];
- i = j;
- }
- heap[i] = heap[n];
+ return result;
}
/*
* Compare the tuples in the two given slots.
*/
static int32
-heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2)
+heap_compare_slots(Datum a, Datum b, void *arg)
{
+ MergeAppendState *node = (MergeAppendState *) arg;
+ SlotNumber slot1 = DatumGetInt32(a);
+ SlotNumber slot2 = DatumGetInt32(b);
+
TupleTableSlot *s1 = node->ms_slots[slot1];
TupleTableSlot *s2 = node->ms_slots[slot2];
int nkey;
@@ -291,7 +239,7 @@ heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2)
datum2, isNull2,
sortKey);
if (compare != 0)
- return compare;
+ return -compare;
}
return 0;
}
@@ -347,7 +295,5 @@ ExecReScanMergeAppend(MergeAppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
- node->ms_heap_size = 0;
node->ms_initialized = false;
- node->ms_last_slot = -1;
}
diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 98ce3d7..327a1bc 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/lib
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-OBJS = ilist.o stringinfo.o
+OBJS = ilist.o binaryheap.o stringinfo.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/binaryheap.c b/src/backend/lib/binaryheap.c
new file mode 100644
index 0000000..acc8861
--- /dev/null
+++ b/src/backend/lib/binaryheap.c
@@ -0,0 +1,311 @@
+/*-------------------------------------------------------------------------
+ *
+ * binaryheap.c
+ * A simple binary heap implementaion
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/lib/binaryheap.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "lib/binaryheap.h"
+
+static void sift_down(binaryheap *heap, int node_off);
+static void sift_up(binaryheap *heap, int node_off);
+static inline void swap_nodes(binaryheap *heap, int a, int b);
+
+/*
+ * binaryheap_allocate
+ *
+ * Returns a pointer to a newly-allocated heap that has the capacity to
+ * store the given number of nodes, with the heap property defined by
+ * the given comparator function.
+ */
+binaryheap *
+binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
+{
+ int sz;
+ binaryheap *heap;
+
+ sz = offsetof(binaryheap, nodes) + sizeof(Datum) * capacity;
+ heap = palloc(sz);
+ heap->size = 0;
+ heap->space = capacity;
+ heap->has_heap_property = true;
+ heap->compare = compare;
+ heap->arg = arg;
+
+ return heap;
+}
+
+/*
+ * binaryheap_free
+ *
+ * Releases memory used by the given binaryheap.
+ */
+void
+binaryheap_free(binaryheap *heap)
+{
+ pfree(heap);
+}
+
+/*
+ * These utility functions return the offset of the left child, right
+ * child, and parent of the node at the given index, respectively.
+ *
+ * The heap is represented as an array of nodes, with the root node
+ * stored at index 0. The left child of node i is at index 2*i+1, and
+ * the right child at 2*i+2. The parent of node i is at index (i-1)/2.
+ */
+
+static inline int
+left_offset(int i)
+{
+ return 2 * i + 1;
+}
+
+static inline int
+right_offset(int i)
+{
+ return 2 * i + 2;
+}
+
+static inline int
+parent_offset(int i)
+{
+ return (i - 1) / 2;
+}
+
+/*
+ * binaryheap_add_unordered
+ *
+ * Adds the given key and value to the end of the heap's list of nodes
+ * in O(1) without preserving the heap property. This is a convenience
+ * to add elements quickly to a new heap. To obtain a valid heap, one
+ * must call binaryheap_build() afterwards.
+ */
+void
+binaryheap_add_unordered(binaryheap *heap, Datum d)
+{
+ if (heap->size >= heap->space)
+ elog(ERROR, "out of binary heap slots");
+ heap->has_heap_property = false;
+ heap->nodes[heap->size] = d;
+ heap->size++;
+}
+
+/*
+ * binaryheap_build
+ *
+ * Assembles a valid heap in O(n) from the nodes added by
+ * binaryheap_add_unordered(). Not needed otherwise.
+ */
+void
+binaryheap_build(binaryheap *heap)
+{
+ int i;
+
+ for (i = parent_offset(heap->size - 1); i >= 0; i--)
+ sift_down(heap, i);
+ heap->has_heap_property = true;
+}
+
+/*
+ * binaryheap_add
+ *
+ * Adds the given key and value to the heap in O(log n), while
+ * preserving the heap property.
+ */
+void
+binaryheap_add(binaryheap *heap, Datum d)
+{
+ if (heap->size >= heap->space)
+ elog(ERROR, "out of binary heap slots");
+ heap->nodes[heap->size] = d;
+ heap->size++;
+ sift_up(heap, heap->size - 1);
+}
+
+/*
+ * binaryheap_first
+ *
+ * Returns a pointer to the first (root, topmost) node in the heap
+ * without modifying the heap. Returns NULL if the heap is empty.
+ * Always O(1).
+ */
+Datum
+binaryheap_first(binaryheap *heap)
+{
+ Assert(!binaryheap_empty(heap) && heap->has_heap_property);
+ return heap->nodes[0];
+}
+
+/*
+ * binaryheap_remove_first
+ *
+ * Removes the first (root, topmost) node in the heap and returns a
+ * pointer to it after rebalancing the heap. Returns NULL if the heap
+ * is empty. O(log n) worst case.
+ */
+Datum
+binaryheap_remove_first(binaryheap *heap)
+{
+ Assert(!binaryheap_empty(heap) && heap->has_heap_property);
+
+ if (heap->size == 1)
+ {
+ heap->size--;
+ return heap->nodes[0];
+ }
+
+ /*
+ * Swap the root and last nodes, decrease the size of the heap (i.e.
+ * remove the former root node) and sift the new root node down to its
+ * correct position.
+ */
+ swap_nodes(heap, 0, heap->size - 1);
+ heap->size--;
+ sift_down(heap, 0);
+
+ return heap->nodes[heap->size];
+}
+
+/*
+ * binaryheap_replace_first
+ *
+ * Change the key and/or value of the first (root, topmost) node and
+ * ensure that the heap property is preserved. O(1) in the best case,
+ * or O(log n) if it must fall back to sifting the new node down.
+ */
+void
+binaryheap_replace_first(binaryheap *heap, Datum d)
+{
+ int cmp;
+ int next_off = 0;
+
+ Assert(heap->has_heap_property);
+
+ heap->nodes[0] = d;
+
+ /*
+ * If the new root node is still larger than the largest of its children
+ * (of which there may be 0, 1, or 2), then the heap is still valid.
+ */
+ if (heap->size == 1)
+ return;
+
+ next_off = 1;
+ if (heap->size > 2)
+ {
+ /* Two children: pick the larger one */
+ cmp = heap->compare(heap->nodes[2], heap->nodes[1], heap->arg);
+ if (cmp > 0)
+ next_off = 2;
+ }
+
+ cmp = heap->compare(heap->nodes[next_off], heap->nodes[0], heap->arg);
+ if (cmp < 0)
+ return;
+
+ /* The child is larger. Swap and sift the new node down. */
+ swap_nodes(heap, 0, next_off);
+ sift_down(heap, next_off);
+}
+
+/*
+ * Swap the contents of two nodes.
+ */
+static inline void
+swap_nodes(binaryheap *heap, int a, int b)
+{
+ Datum swap;
+
+ swap = heap->nodes[a];
+ heap->nodes[a] = heap->nodes[b];
+ heap->nodes[b] = swap;
+}
+
+/*
+ * Sift a node up to the highest position it can hold according to the
+ * comparator.
+ */
+static void
+sift_up(binaryheap *heap, int node_off)
+{
+ while (node_off != 0)
+ {
+ int cmp;
+ int parent_off;
+
+ /*
+ * If this node is smaller than its parent, the heap condition is
+ * satisfied, and we're done.
+ */
+ parent_off = parent_offset(node_off);
+ cmp = heap->compare(heap->nodes[node_off], heap->nodes[parent_off],
+ heap->arg);
+ if (cmp <= 0)
+ break;
+
+ /*
+ * Otherwise, swap the node and its parent and go on to check the
+ * node's new parent.
+ */
+ swap_nodes(heap, node_off, parent_off);
+ node_off = parent_off;
+ }
+}
+
+/*
+ * Sift a node down from its current position to satisfy the heap
+ * property.
+ */
+static void
+sift_down(binaryheap *heap, int node_off)
+{
+ while (true)
+ {
+ int left_off = left_offset(node_off);
+ int right_off = right_offset(node_off);
+ int swap_off = 0;
+
+ /* Is the left child larger than the parent? */
+ if (left_off < heap->size &&
+ heap->compare(heap->nodes[node_off], heap->nodes[left_off],
+ heap->arg) < 0)
+ swap_off = left_off;
+
+ /* Is the right child larger than the parent? */
+ if (right_off < heap->size &&
+ heap->compare(heap->nodes[node_off], heap->nodes[right_off],
+ heap->arg) < 0)
+ {
+ /* swap with the larger child */
+ if (!swap_off ||
+ heap->compare(heap->nodes[left_off], heap->nodes[right_off],
+ heap->arg) < 0)
+ swap_off = right_off;
+ }
+
+ /*
+ * If we didn't find anything to swap, the heap condition is
+ * satisfied, and we're done.
+ */
+ if (!swap_off)
+ break;
+
+ /*
+ * Otherwise, swap the node with the child that violates the heap
+ * property; then go on to check its children.
+ */
+ swap_nodes(heap, swap_off, node_off);
+ node_off = swap_off;
+ }
+}
diff --git a/src/include/lib/binaryheap.h b/src/include/lib/binaryheap.h
new file mode 100644
index 0000000..2cd997a
--- /dev/null
+++ b/src/include/lib/binaryheap.h
@@ -0,0 +1,51 @@
+/*
+ * binaryheap.h
+ *
+ * A simple binary heap implementation
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * src/include/lib/binaryheap.h
+ */
+
+#ifndef BINARYHEAP_H
+#define BINARYHEAP_H
+
+/*
+ * For a max-heap, the comparator must return <0 iff a < b, 0 iff a == b,
+ * and >0 iff a > b. For a min-heap, the conditions are reversed.
+ */
+typedef int (*binaryheap_comparator) (Datum a, Datum b, void *arg);
+
+/*
+ * binaryheap
+ *
+ * size how many nodes are currently in "nodes"
+ * space how many nodes can be stored in "nodes"
+ * comparator comparator to define the heap property
+ * nodes the first of a list of "space" nodes
+ */
+typedef struct binaryheap
+{
+ int size;
+ int space;
+ bool has_heap_property; /* debugging cross-check */
+ binaryheap_comparator compare;
+ void *arg;
+ Datum nodes[FLEXIBLE_ARRAY_MEMBER];
+} binaryheap;
+
+extern binaryheap *binaryheap_allocate(int capacity,
+ binaryheap_comparator compare,
+ void *arg);
+extern void binaryheap_free(binaryheap *heap);
+extern void binaryheap_add_unordered(binaryheap *heap, Datum d);
+extern void binaryheap_build(binaryheap *heap);
+extern void binaryheap_add(binaryheap *heap, Datum d);
+extern Datum binaryheap_first(binaryheap *heap);
+extern Datum binaryheap_remove_first(binaryheap *heap);
+extern void binaryheap_replace_first(binaryheap *heap, Datum d);
+
+#define binaryheap_empty(h) ((h)->size == 0)
+
+#endif /* BINARYHEAP_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index fec07b8..d4911bd 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1100,10 +1100,8 @@ typedef struct AppendState
* nkeys number of sort key columns
* sortkeys sort keys in SortSupport representation
* slots current output tuple of each subplan
- * heap heap of active tuples (represented as array indexes)
- * heap_size number of active heap entries
+ * heap heap of active tuples
* initialized true if we have fetched first tuple from each subplan
- * last_slot last subplan fetched from (which must be re-called)
* ----------------
*/
typedef struct MergeAppendState
@@ -1114,10 +1112,8 @@ typedef struct MergeAppendState
int ms_nkeys;
SortSupport ms_sortkeys; /* array of length ms_nkeys */
TupleTableSlot **ms_slots; /* array of length ms_nplans */
- int *ms_heap; /* array of length ms_nplans */
- int ms_heap_size; /* current active length of ms_heap[] */
+ struct binaryheap *ms_heap; /* binary heap of slot indices */
bool ms_initialized; /* are subplans started? */
- int ms_last_slot; /* last subplan slot we returned from */
} MergeAppendState;
/* ----------------
On 2012-11-21 12:54:30 -0500, Robert Haas wrote:
On Wed, Nov 21, 2012 at 9:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I guess I'll take another whack at it.
New version attached.
I think the assert in replace_first should be
Assert(!binaryheap_empty(heap) && heap->has_heap_property);
instead of
Assert(heap->has_heap_property);
looks good otherwise.
Thanks,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas escribió:
On Wed, Nov 21, 2012 at 9:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I guess I'll take another whack at it.
New version attached.
The following comments still talk about "key and value", thus need
an update:
+ * binaryheap_add_unordered
+ *
+ * Adds the given key and value to the end of the heap's list of nodes
+/*
+ * binaryheap_add
+ *
+ * Adds the given key and value to the heap in O(log n), while
+/*
+ * binaryheap_replace_first
+ *
+ * Change the key and/or value of the first (root, topmost) node and
This comment needs updated (s/comparator/compare/, and also add
has_heap_property and arg):
+/*
+ * binaryheap
+ *
+ * size how many nodes are currently in "nodes"
+ * space how many nodes can be stored in "nodes"
+ * comparator comparator to define the heap property
+ * nodes the first of a list of "space" nodes
+ */
+typedef struct binaryheap
+{
+ int size;
+ int space;
+ bool has_heap_property; /* debugging cross-check */
+ binaryheap_comparator compare;
+ void *arg;
+ Datum nodes[FLEXIBLE_ARRAY_MEMBER];
+} binaryheap;
I would suggest to add prefixes to struct members, so bh_size, bh_space
and so on. This makes it easier to grep for stuff later.
Do we really need the struct definition be public? Couldn't it just
live in binaryheap.c?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 21, 2012 at 1:30 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
The following comments still talk about "key and value", thus need
an update:
Oops.
This comment needs updated (s/comparator/compare/, and also add
has_heap_property and arg):
Fixed.
I would suggest to add prefixes to struct members, so bh_size, bh_space
and so on. This makes it easier to grep for stuff later.
OK.
Do we really need the struct definition be public? Couldn't it just
live in binaryheap.c?
Well, that would preclude defining any macros, like
binaryheap_empty(). Since efficiency is among the reasons for
inventing this in the first place, that doesn't seem prudent to me.
Another new version attached.
P.S. to Abhijit: Please, please tell me the secret to getting five
people to review your patches. I'm lucky when I can get one!
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
binaryheap-rmh3.patchapplication/octet-stream; name=binaryheap-rmh3.patchDownload
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index d5141ba..9dc25ee 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -41,17 +41,16 @@
#include "executor/execdebug.h"
#include "executor/nodeMergeAppend.h"
+#include "lib/binaryheap.h"
+
/*
- * It gets quite confusing having a heap array (indexed by integers) which
- * contains integers which index into the slots array. These typedefs try to
- * clear it up, but they're only documentation.
+ * We have one slot for each item in the heap array. We use SlotNumber
+ * to store slot indexes. This doesn't actually provide any formal
+ * type-safety, but it makes the code more self-documenting.
*/
-typedef int SlotNumber;
-typedef int HeapPosition;
+typedef int32 SlotNumber;
-static void heap_insert_slot(MergeAppendState *node, SlotNumber new_slot);
-static void heap_siftup_slot(MergeAppendState *node);
-static int32 heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2);
+static int heap_compare_slots(Datum a, Datum b, void *arg);
/* ----------------------------------------------------------------
@@ -88,7 +87,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_nplans = nplans;
mergestate->ms_slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
- mergestate->ms_heap = (int *) palloc0(sizeof(int) * nplans);
+ mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots,
+ mergestate);
/*
* Miscellaneous initialization
@@ -143,9 +143,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/*
* initialize to show we have not run the subplans yet
*/
- mergestate->ms_heap_size = 0;
mergestate->ms_initialized = false;
- mergestate->ms_last_slot = -1;
return mergestate;
}
@@ -172,101 +170,53 @@ ExecMergeAppend(MergeAppendState *node)
{
node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
if (!TupIsNull(node->ms_slots[i]))
- heap_insert_slot(node, i);
+ binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
}
+ binaryheap_build(node->ms_heap);
node->ms_initialized = true;
}
else
{
/*
* Otherwise, pull the next tuple from whichever subplan we returned
- * from last time, and insert it into the heap. (We could simplify
- * the logic a bit by doing this before returning from the prior call,
- * but it's better to not pull tuples until necessary.)
+ * from last time, and reinsert the subplan index into the heap,
+ * because it might now compare differently against the existing
+ * elements of the heap. (We could perhaps simplify the logic a bit
+ * by doing this before returning from the prior call, but it's better
+ * to not pull tuples until necessary.)
*/
- i = node->ms_last_slot;
+ i = DatumGetInt32(binaryheap_first(node->ms_heap));
node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
if (!TupIsNull(node->ms_slots[i]))
- heap_insert_slot(node, i);
+ binaryheap_replace_first(node->ms_heap, Int32GetDatum(i));
+ else
+ (void) binaryheap_remove_first(node->ms_heap);
}
- if (node->ms_heap_size > 0)
- {
- /* Return the topmost heap node, and sift up the remaining nodes */
- i = node->ms_heap[0];
- result = node->ms_slots[i];
- node->ms_last_slot = i;
- heap_siftup_slot(node);
- }
- else
+ if (binaryheap_empty(node->ms_heap))
{
/* All the subplans are exhausted, and so is the heap */
result = ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
-
- return result;
-}
-
-/*
- * Insert a new slot into the heap. The slot must contain a valid tuple.
- */
-static void
-heap_insert_slot(MergeAppendState *node, SlotNumber new_slot)
-{
- SlotNumber *heap = node->ms_heap;
- HeapPosition j;
-
- Assert(!TupIsNull(node->ms_slots[new_slot]));
-
- j = node->ms_heap_size++; /* j is where the "hole" is */
- while (j > 0)
+ else
{
- int i = (j - 1) / 2;
-
- if (heap_compare_slots(node, new_slot, node->ms_heap[i]) >= 0)
- break;
- heap[j] = heap[i];
- j = i;
+ i = DatumGetInt32(binaryheap_first(node->ms_heap));
+ result = node->ms_slots[i];
}
- heap[j] = new_slot;
-}
-/*
- * Delete the heap top (the slot in heap[0]), and sift up.
- */
-static void
-heap_siftup_slot(MergeAppendState *node)
-{
- SlotNumber *heap = node->ms_heap;
- HeapPosition i,
- n;
-
- if (--node->ms_heap_size <= 0)
- return;
- n = node->ms_heap_size; /* heap[n] needs to be reinserted */
- i = 0; /* i is where the "hole" is */
- for (;;)
- {
- int j = 2 * i + 1;
-
- if (j >= n)
- break;
- if (j + 1 < n && heap_compare_slots(node, heap[j], heap[j + 1]) > 0)
- j++;
- if (heap_compare_slots(node, heap[n], heap[j]) <= 0)
- break;
- heap[i] = heap[j];
- i = j;
- }
- heap[i] = heap[n];
+ return result;
}
/*
* Compare the tuples in the two given slots.
*/
static int32
-heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2)
+heap_compare_slots(Datum a, Datum b, void *arg)
{
+ MergeAppendState *node = (MergeAppendState *) arg;
+ SlotNumber slot1 = DatumGetInt32(a);
+ SlotNumber slot2 = DatumGetInt32(b);
+
TupleTableSlot *s1 = node->ms_slots[slot1];
TupleTableSlot *s2 = node->ms_slots[slot2];
int nkey;
@@ -291,7 +241,7 @@ heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2)
datum2, isNull2,
sortKey);
if (compare != 0)
- return compare;
+ return -compare;
}
return 0;
}
@@ -347,7 +297,5 @@ ExecReScanMergeAppend(MergeAppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
- node->ms_heap_size = 0;
node->ms_initialized = false;
- node->ms_last_slot = -1;
}
diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 98ce3d7..327a1bc 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/lib
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-OBJS = ilist.o stringinfo.o
+OBJS = ilist.o binaryheap.o stringinfo.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/binaryheap.c b/src/backend/lib/binaryheap.c
new file mode 100644
index 0000000..83f6ab4
--- /dev/null
+++ b/src/backend/lib/binaryheap.c
@@ -0,0 +1,318 @@
+/*-------------------------------------------------------------------------
+ *
+ * binaryheap.c
+ * A simple binary heap implementaion
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/lib/binaryheap.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "lib/binaryheap.h"
+
+static void sift_down(binaryheap *heap, int node_off);
+static void sift_up(binaryheap *heap, int node_off);
+static inline void swap_nodes(binaryheap *heap, int a, int b);
+
+/*
+ * binaryheap_allocate
+ *
+ * Returns a pointer to a newly-allocated heap that has the capacity to
+ * store the given number of nodes, with the heap property defined by
+ * the given comparator function, which will be invoked with the additional
+ * argument specified by 'arg'.
+ */
+binaryheap *
+binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
+{
+ int sz;
+ binaryheap *heap;
+
+ sz = offsetof(binaryheap, bh_nodes) + sizeof(Datum) * capacity;
+ heap = palloc(sz);
+ heap->bh_size = 0;
+ heap->bh_space = capacity;
+ heap->bh_has_heap_property = true;
+ heap->bh_compare = compare;
+ heap->bh_arg = arg;
+
+ return heap;
+}
+
+/*
+ * binaryheap_free
+ *
+ * Releases memory used by the given binaryheap.
+ */
+void
+binaryheap_free(binaryheap *heap)
+{
+ pfree(heap);
+}
+
+/*
+ * These utility functions return the offset of the left child, right
+ * child, and parent of the node at the given index, respectively.
+ *
+ * The heap is represented as an array of nodes, with the root node
+ * stored at index 0. The left child of node i is at index 2*i+1, and
+ * the right child at 2*i+2. The parent of node i is at index (i-1)/2.
+ */
+
+static inline int
+left_offset(int i)
+{
+ return 2 * i + 1;
+}
+
+static inline int
+right_offset(int i)
+{
+ return 2 * i + 2;
+}
+
+static inline int
+parent_offset(int i)
+{
+ return (i - 1) / 2;
+}
+
+/*
+ * binaryheap_add_unordered
+ *
+ * Adds the given datum to the end of the heap's list of nodes in O(1) without
+ * preserving the heap property. This is a convenience to add elements quickly
+ * to a new heap. To obtain a valid heap, one must call binaryheap_build()
+ * afterwards.
+ */
+void
+binaryheap_add_unordered(binaryheap *heap, Datum d)
+{
+ if (heap->bh_size >= heap->bh_space)
+ elog(ERROR, "out of binary heap slots");
+ heap->bh_has_heap_property = false;
+ heap->bh_nodes[heap->bh_size] = d;
+ heap->bh_size++;
+}
+
+/*
+ * binaryheap_build
+ *
+ * Assembles a valid heap in O(n) from the nodes added by
+ * binaryheap_add_unordered(). Not needed otherwise.
+ */
+void
+binaryheap_build(binaryheap *heap)
+{
+ int i;
+
+ for (i = parent_offset(heap->bh_size - 1); i >= 0; i--)
+ sift_down(heap, i);
+ heap->bh_has_heap_property = true;
+}
+
+/*
+ * binaryheap_add
+ *
+ * Adds the given datum to the heap in O(log n) time, while preserving
+ * the heap property.
+ */
+void
+binaryheap_add(binaryheap *heap, Datum d)
+{
+ if (heap->bh_size >= heap->bh_space)
+ elog(ERROR, "out of binary heap slots");
+ heap->bh_nodes[heap->bh_size] = d;
+ heap->bh_size++;
+ sift_up(heap, heap->bh_size - 1);
+}
+
+/*
+ * binaryheap_first
+ *
+ * Returns a pointer to the first (root, topmost) node in the heap
+ * without modifying the heap. Returns NULL if the heap is empty.
+ * Always O(1).
+ */
+Datum
+binaryheap_first(binaryheap *heap)
+{
+ Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
+ return heap->bh_nodes[0];
+}
+
+/*
+ * binaryheap_remove_first
+ *
+ * Removes the first (root, topmost) node in the heap and returns a
+ * pointer to it after rebalancing the heap. Returns NULL if the heap
+ * is empty. O(log n) worst case.
+ */
+Datum
+binaryheap_remove_first(binaryheap *heap)
+{
+ Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
+
+ if (heap->bh_size == 1)
+ {
+ heap->bh_size--;
+ return heap->bh_nodes[0];
+ }
+
+ /*
+ * Swap the root and last nodes, decrease the size of the heap (i.e.
+ * remove the former root node) and sift the new root node down to its
+ * correct position.
+ */
+ swap_nodes(heap, 0, heap->bh_size - 1);
+ heap->bh_size--;
+ sift_down(heap, 0);
+
+ return heap->bh_nodes[heap->bh_size];
+}
+
+/*
+ * binaryheap_replace_first
+ *
+ * Replace the topmost element of the heap, preserving the heap property.
+ * O(1) in the best case, * or O(log n) if it must fall back to sifting
+ * the new node down.
+ */
+void
+binaryheap_replace_first(binaryheap *heap, Datum d)
+{
+ int cmp;
+ int next_off = 0;
+
+ Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
+
+ heap->bh_nodes[0] = d;
+
+ /*
+ * If the new root node is still larger than the largest of its children
+ * (of which there may be 0, 1, or 2), then the heap is still valid.
+ */
+ if (heap->bh_size == 1)
+ return;
+
+ next_off = 1;
+ if (heap->bh_size > 2)
+ {
+ /* Two children: pick the larger one */
+ cmp = heap->bh_compare(heap->bh_nodes[2], heap->bh_nodes[1],
+ heap->bh_arg);
+ if (cmp > 0)
+ next_off = 2;
+ }
+
+ cmp = heap->bh_compare(heap->bh_nodes[next_off], heap->bh_nodes[0],
+ heap->bh_arg);
+ if (cmp < 0)
+ return;
+
+ /* The child is larger. Swap and sift the new node down. */
+ swap_nodes(heap, 0, next_off);
+ sift_down(heap, next_off);
+}
+
+/*
+ * Swap the contents of two nodes.
+ */
+static inline void
+swap_nodes(binaryheap *heap, int a, int b)
+{
+ Datum swap;
+
+ swap = heap->bh_nodes[a];
+ heap->bh_nodes[a] = heap->bh_nodes[b];
+ heap->bh_nodes[b] = swap;
+}
+
+/*
+ * Sift a node up to the highest position it can hold according to the
+ * comparator.
+ */
+static void
+sift_up(binaryheap *heap, int node_off)
+{
+ while (node_off != 0)
+ {
+ int cmp;
+ int parent_off;
+
+ /*
+ * If this node is smaller than its parent, the heap condition is
+ * satisfied, and we're done.
+ */
+ parent_off = parent_offset(node_off);
+ cmp = heap->bh_compare(heap->bh_nodes[node_off],
+ heap->bh_nodes[parent_off],
+ heap->bh_arg);
+ if (cmp <= 0)
+ break;
+
+ /*
+ * Otherwise, swap the node and its parent and go on to check the
+ * node's new parent.
+ */
+ swap_nodes(heap, node_off, parent_off);
+ node_off = parent_off;
+ }
+}
+
+/*
+ * Sift a node down from its current position to satisfy the heap
+ * property.
+ */
+static void
+sift_down(binaryheap *heap, int node_off)
+{
+ while (true)
+ {
+ int left_off = left_offset(node_off);
+ int right_off = right_offset(node_off);
+ int swap_off = 0;
+
+ /* Is the left child larger than the parent? */
+ if (left_off < heap->bh_size &&
+ heap->bh_compare(heap->bh_nodes[node_off],
+ heap->bh_nodes[left_off],
+ heap->bh_arg) < 0)
+ swap_off = left_off;
+
+ /* Is the right child larger than the parent? */
+ if (right_off < heap->bh_size &&
+ heap->bh_compare(heap->bh_nodes[node_off],
+ heap->bh_nodes[right_off],
+ heap->bh_arg) < 0)
+ {
+ /* swap with the larger child */
+ if (!swap_off ||
+ heap->bh_compare(heap->bh_nodes[left_off],
+ heap->bh_nodes[right_off],
+ heap->bh_arg) < 0)
+ swap_off = right_off;
+ }
+
+ /*
+ * If we didn't find anything to swap, the heap condition is
+ * satisfied, and we're done.
+ */
+ if (!swap_off)
+ break;
+
+ /*
+ * Otherwise, swap the node with the child that violates the heap
+ * property; then go on to check its children.
+ */
+ swap_nodes(heap, swap_off, node_off);
+ node_off = swap_off;
+ }
+}
diff --git a/src/include/lib/binaryheap.h b/src/include/lib/binaryheap.h
new file mode 100644
index 0000000..449ceb5
--- /dev/null
+++ b/src/include/lib/binaryheap.h
@@ -0,0 +1,53 @@
+/*
+ * binaryheap.h
+ *
+ * A simple binary heap implementation
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * src/include/lib/binaryheap.h
+ */
+
+#ifndef BINARYHEAP_H
+#define BINARYHEAP_H
+
+/*
+ * For a max-heap, the comparator must return <0 iff a < b, 0 iff a == b,
+ * and >0 iff a > b. For a min-heap, the conditions are reversed.
+ */
+typedef int (*binaryheap_comparator) (Datum a, Datum b, void *arg);
+
+/*
+ * binaryheap
+ *
+ * bh_size how many nodes are currently in "nodes"
+ * bh_space how many nodes can be stored in "nodes"
+ * bh_has_heap_property no unordered operations since last heap build
+ * bh_compare comparison function to define the heap property
+ * bh_arg user data for comparison function
+ * bh_nodes variable-length array of "space" nodes
+ */
+typedef struct binaryheap
+{
+ int bh_size;
+ int bh_space;
+ bool bh_has_heap_property; /* debugging cross-check */
+ binaryheap_comparator bh_compare;
+ void *bh_arg;
+ Datum bh_nodes[FLEXIBLE_ARRAY_MEMBER];
+} binaryheap;
+
+extern binaryheap *binaryheap_allocate(int capacity,
+ binaryheap_comparator compare,
+ void *arg);
+extern void binaryheap_free(binaryheap *heap);
+extern void binaryheap_add_unordered(binaryheap *heap, Datum d);
+extern void binaryheap_build(binaryheap *heap);
+extern void binaryheap_add(binaryheap *heap, Datum d);
+extern Datum binaryheap_first(binaryheap *heap);
+extern Datum binaryheap_remove_first(binaryheap *heap);
+extern void binaryheap_replace_first(binaryheap *heap, Datum d);
+
+#define binaryheap_empty(h) ((h)->bh_size == 0)
+
+#endif /* BINARYHEAP_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index fec07b8..d4911bd 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1100,10 +1100,8 @@ typedef struct AppendState
* nkeys number of sort key columns
* sortkeys sort keys in SortSupport representation
* slots current output tuple of each subplan
- * heap heap of active tuples (represented as array indexes)
- * heap_size number of active heap entries
+ * heap heap of active tuples
* initialized true if we have fetched first tuple from each subplan
- * last_slot last subplan fetched from (which must be re-called)
* ----------------
*/
typedef struct MergeAppendState
@@ -1114,10 +1112,8 @@ typedef struct MergeAppendState
int ms_nkeys;
SortSupport ms_sortkeys; /* array of length ms_nkeys */
TupleTableSlot **ms_slots; /* array of length ms_nplans */
- int *ms_heap; /* array of length ms_nplans */
- int ms_heap_size; /* current active length of ms_heap[] */
+ struct binaryheap *ms_heap; /* binary heap of slot indices */
bool ms_initialized; /* are subplans started? */
- int ms_last_slot; /* last subplan slot we returned from */
} MergeAppendState;
/* ----------------
On 2012-11-21 15:09:12 -0500, Robert Haas wrote:
On Wed, Nov 21, 2012 at 1:30 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:The following comments still talk about "key and value", thus need
an update:Oops.
This comment needs updated (s/comparator/compare/, and also add
has_heap_property and arg):Fixed.
I would suggest to add prefixes to struct members, so bh_size, bh_space
and so on. This makes it easier to grep for stuff later.OK.
Do we really need the struct definition be public? Couldn't it just
live in binaryheap.c?Well, that would preclude defining any macros, like
binaryheap_empty(). Since efficiency is among the reasons for
inventing this in the first place, that doesn't seem prudent to me.Another new version attached.
One very minor nitpick I unfortunately just found now, not sure when
that changed:
binaryheap_replace_first() hardcodes the indices for the left/right node
of the root node. I would rather have it use (left|right)_offset(0).
P.S. to Abhijit: Please, please tell me the secret to getting five
people to review your patches. I'm lucky when I can get one!
Heh. This was rather surprising, yes. Its a rather easy patch to review
in a way, you don't need much pg internals knowledge for it...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
At 2012-11-21 15:09:12 -0500, robertmhaas@gmail.com wrote:
The following comments still talk about "key and value", thus need
an update:Oops.
In the same vein, "Returns NULL if the heap is empty" no longer applies
to binaryheap_first and binaryheap_remove_first. Plus there is an extra
asterisk in the middle of the comment about binaryheap_replace_first.
But it looks good otherwise.
I agree with Tom that we should support resizing the heap, but let's not
bother with that now. I'll write the few extra lines of code the first
time something actually needs to add more elements to a heap.
-- Abhijit
[ Sorry for the slow response on this, Thanksgiving interfered. ]
On Wed, Nov 21, 2012 at 3:41 PM, Andres Freund <andres@2ndquadrant.com> wrote:
One very minor nitpick I unfortunately just found now, not sure when
that changed:
binaryheap_replace_first() hardcodes the indices for the left/right node
of the root node. I would rather have it use (left|right)_offset(0).
Hmm, yeah... but come to think of it, why do we need that special case
at all? Why not just call sift_down on the root node and call it
good? See the attached version, which simplifies the code
considerably and also makes some comment adjustments per Abhijit's
comments.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
binaryheap-rmh4.patchapplication/octet-stream; name=binaryheap-rmh4.patchDownload
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index d5141ba..9dc25ee 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -41,17 +41,16 @@
#include "executor/execdebug.h"
#include "executor/nodeMergeAppend.h"
+#include "lib/binaryheap.h"
+
/*
- * It gets quite confusing having a heap array (indexed by integers) which
- * contains integers which index into the slots array. These typedefs try to
- * clear it up, but they're only documentation.
+ * We have one slot for each item in the heap array. We use SlotNumber
+ * to store slot indexes. This doesn't actually provide any formal
+ * type-safety, but it makes the code more self-documenting.
*/
-typedef int SlotNumber;
-typedef int HeapPosition;
+typedef int32 SlotNumber;
-static void heap_insert_slot(MergeAppendState *node, SlotNumber new_slot);
-static void heap_siftup_slot(MergeAppendState *node);
-static int32 heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2);
+static int heap_compare_slots(Datum a, Datum b, void *arg);
/* ----------------------------------------------------------------
@@ -88,7 +87,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ms_nplans = nplans;
mergestate->ms_slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * nplans);
- mergestate->ms_heap = (int *) palloc0(sizeof(int) * nplans);
+ mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots,
+ mergestate);
/*
* Miscellaneous initialization
@@ -143,9 +143,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/*
* initialize to show we have not run the subplans yet
*/
- mergestate->ms_heap_size = 0;
mergestate->ms_initialized = false;
- mergestate->ms_last_slot = -1;
return mergestate;
}
@@ -172,101 +170,53 @@ ExecMergeAppend(MergeAppendState *node)
{
node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
if (!TupIsNull(node->ms_slots[i]))
- heap_insert_slot(node, i);
+ binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
}
+ binaryheap_build(node->ms_heap);
node->ms_initialized = true;
}
else
{
/*
* Otherwise, pull the next tuple from whichever subplan we returned
- * from last time, and insert it into the heap. (We could simplify
- * the logic a bit by doing this before returning from the prior call,
- * but it's better to not pull tuples until necessary.)
+ * from last time, and reinsert the subplan index into the heap,
+ * because it might now compare differently against the existing
+ * elements of the heap. (We could perhaps simplify the logic a bit
+ * by doing this before returning from the prior call, but it's better
+ * to not pull tuples until necessary.)
*/
- i = node->ms_last_slot;
+ i = DatumGetInt32(binaryheap_first(node->ms_heap));
node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
if (!TupIsNull(node->ms_slots[i]))
- heap_insert_slot(node, i);
+ binaryheap_replace_first(node->ms_heap, Int32GetDatum(i));
+ else
+ (void) binaryheap_remove_first(node->ms_heap);
}
- if (node->ms_heap_size > 0)
- {
- /* Return the topmost heap node, and sift up the remaining nodes */
- i = node->ms_heap[0];
- result = node->ms_slots[i];
- node->ms_last_slot = i;
- heap_siftup_slot(node);
- }
- else
+ if (binaryheap_empty(node->ms_heap))
{
/* All the subplans are exhausted, and so is the heap */
result = ExecClearTuple(node->ps.ps_ResultTupleSlot);
}
-
- return result;
-}
-
-/*
- * Insert a new slot into the heap. The slot must contain a valid tuple.
- */
-static void
-heap_insert_slot(MergeAppendState *node, SlotNumber new_slot)
-{
- SlotNumber *heap = node->ms_heap;
- HeapPosition j;
-
- Assert(!TupIsNull(node->ms_slots[new_slot]));
-
- j = node->ms_heap_size++; /* j is where the "hole" is */
- while (j > 0)
+ else
{
- int i = (j - 1) / 2;
-
- if (heap_compare_slots(node, new_slot, node->ms_heap[i]) >= 0)
- break;
- heap[j] = heap[i];
- j = i;
+ i = DatumGetInt32(binaryheap_first(node->ms_heap));
+ result = node->ms_slots[i];
}
- heap[j] = new_slot;
-}
-/*
- * Delete the heap top (the slot in heap[0]), and sift up.
- */
-static void
-heap_siftup_slot(MergeAppendState *node)
-{
- SlotNumber *heap = node->ms_heap;
- HeapPosition i,
- n;
-
- if (--node->ms_heap_size <= 0)
- return;
- n = node->ms_heap_size; /* heap[n] needs to be reinserted */
- i = 0; /* i is where the "hole" is */
- for (;;)
- {
- int j = 2 * i + 1;
-
- if (j >= n)
- break;
- if (j + 1 < n && heap_compare_slots(node, heap[j], heap[j + 1]) > 0)
- j++;
- if (heap_compare_slots(node, heap[n], heap[j]) <= 0)
- break;
- heap[i] = heap[j];
- i = j;
- }
- heap[i] = heap[n];
+ return result;
}
/*
* Compare the tuples in the two given slots.
*/
static int32
-heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2)
+heap_compare_slots(Datum a, Datum b, void *arg)
{
+ MergeAppendState *node = (MergeAppendState *) arg;
+ SlotNumber slot1 = DatumGetInt32(a);
+ SlotNumber slot2 = DatumGetInt32(b);
+
TupleTableSlot *s1 = node->ms_slots[slot1];
TupleTableSlot *s2 = node->ms_slots[slot2];
int nkey;
@@ -291,7 +241,7 @@ heap_compare_slots(MergeAppendState *node, SlotNumber slot1, SlotNumber slot2)
datum2, isNull2,
sortKey);
if (compare != 0)
- return compare;
+ return -compare;
}
return 0;
}
@@ -347,7 +297,5 @@ ExecReScanMergeAppend(MergeAppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
- node->ms_heap_size = 0;
node->ms_initialized = false;
- node->ms_last_slot = -1;
}
diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 98ce3d7..327a1bc 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/lib
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-OBJS = ilist.o stringinfo.o
+OBJS = ilist.o binaryheap.o stringinfo.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/binaryheap.c b/src/backend/lib/binaryheap.c
new file mode 100644
index 0000000..73c80e4
--- /dev/null
+++ b/src/backend/lib/binaryheap.c
@@ -0,0 +1,293 @@
+/*-------------------------------------------------------------------------
+ *
+ * binaryheap.c
+ * A simple binary heap implementaion
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/lib/binaryheap.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <math.h>
+
+#include "lib/binaryheap.h"
+
+static void sift_down(binaryheap *heap, int node_off);
+static void sift_up(binaryheap *heap, int node_off);
+static inline void swap_nodes(binaryheap *heap, int a, int b);
+
+/*
+ * binaryheap_allocate
+ *
+ * Returns a pointer to a newly-allocated heap that has the capacity to
+ * store the given number of nodes, with the heap property defined by
+ * the given comparator function, which will be invoked with the additional
+ * argument specified by 'arg'.
+ */
+binaryheap *
+binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
+{
+ int sz;
+ binaryheap *heap;
+
+ sz = offsetof(binaryheap, bh_nodes) + sizeof(Datum) * capacity;
+ heap = palloc(sz);
+ heap->bh_size = 0;
+ heap->bh_space = capacity;
+ heap->bh_has_heap_property = true;
+ heap->bh_compare = compare;
+ heap->bh_arg = arg;
+
+ return heap;
+}
+
+/*
+ * binaryheap_free
+ *
+ * Releases memory used by the given binaryheap.
+ */
+void
+binaryheap_free(binaryheap *heap)
+{
+ pfree(heap);
+}
+
+/*
+ * These utility functions return the offset of the left child, right
+ * child, and parent of the node at the given index, respectively.
+ *
+ * The heap is represented as an array of nodes, with the root node
+ * stored at index 0. The left child of node i is at index 2*i+1, and
+ * the right child at 2*i+2. The parent of node i is at index (i-1)/2.
+ */
+
+static inline int
+left_offset(int i)
+{
+ return 2 * i + 1;
+}
+
+static inline int
+right_offset(int i)
+{
+ return 2 * i + 2;
+}
+
+static inline int
+parent_offset(int i)
+{
+ return (i - 1) / 2;
+}
+
+/*
+ * binaryheap_add_unordered
+ *
+ * Adds the given datum to the end of the heap's list of nodes in O(1) without
+ * preserving the heap property. This is a convenience to add elements quickly
+ * to a new heap. To obtain a valid heap, one must call binaryheap_build()
+ * afterwards.
+ */
+void
+binaryheap_add_unordered(binaryheap *heap, Datum d)
+{
+ if (heap->bh_size >= heap->bh_space)
+ elog(ERROR, "out of binary heap slots");
+ heap->bh_has_heap_property = false;
+ heap->bh_nodes[heap->bh_size] = d;
+ heap->bh_size++;
+}
+
+/*
+ * binaryheap_build
+ *
+ * Assembles a valid heap in O(n) from the nodes added by
+ * binaryheap_add_unordered(). Not needed otherwise.
+ */
+void
+binaryheap_build(binaryheap *heap)
+{
+ int i;
+
+ for (i = parent_offset(heap->bh_size - 1); i >= 0; i--)
+ sift_down(heap, i);
+ heap->bh_has_heap_property = true;
+}
+
+/*
+ * binaryheap_add
+ *
+ * Adds the given datum to the heap in O(log n) time, while preserving
+ * the heap property.
+ */
+void
+binaryheap_add(binaryheap *heap, Datum d)
+{
+ if (heap->bh_size >= heap->bh_space)
+ elog(ERROR, "out of binary heap slots");
+ heap->bh_nodes[heap->bh_size] = d;
+ heap->bh_size++;
+ sift_up(heap, heap->bh_size - 1);
+}
+
+/*
+ * binaryheap_first
+ *
+ * Returns a pointer to the first (root, topmost) node in the heap
+ * without modifying the heap. The caller must ensure that this
+ * routine is not used on an empty heap. Always O(1).
+ */
+Datum
+binaryheap_first(binaryheap *heap)
+{
+ Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
+ return heap->bh_nodes[0];
+}
+
+/*
+ * binaryheap_remove_first
+ *
+ * Removes the first (root, topmost) node in the heap and returns a
+ * pointer to it after rebalancing the heap. The caller must ensure
+ * that this routine is not used on an empty heap. O(log n) worst
+ * case.
+ */
+Datum
+binaryheap_remove_first(binaryheap *heap)
+{
+ Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
+
+ if (heap->bh_size == 1)
+ {
+ heap->bh_size--;
+ return heap->bh_nodes[0];
+ }
+
+ /*
+ * Swap the root and last nodes, decrease the size of the heap (i.e.
+ * remove the former root node) and sift the new root node down to its
+ * correct position.
+ */
+ swap_nodes(heap, 0, heap->bh_size - 1);
+ heap->bh_size--;
+ sift_down(heap, 0);
+
+ return heap->bh_nodes[heap->bh_size];
+}
+
+/*
+ * binaryheap_replace_first
+ *
+ * Replace the topmost element of a non-empty heap, preserving the heap
+ * property. O(1) in the best case, or O(log n) if it must fall back to
+ * sifting the new node down.
+ */
+void
+binaryheap_replace_first(binaryheap *heap, Datum d)
+{
+ Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
+
+ heap->bh_nodes[0] = d;
+
+ if (heap->bh_size > 1)
+ sift_down(heap, 0);
+}
+
+/*
+ * Swap the contents of two nodes.
+ */
+static inline void
+swap_nodes(binaryheap *heap, int a, int b)
+{
+ Datum swap;
+
+ swap = heap->bh_nodes[a];
+ heap->bh_nodes[a] = heap->bh_nodes[b];
+ heap->bh_nodes[b] = swap;
+}
+
+/*
+ * Sift a node up to the highest position it can hold according to the
+ * comparator.
+ */
+static void
+sift_up(binaryheap *heap, int node_off)
+{
+ while (node_off != 0)
+ {
+ int cmp;
+ int parent_off;
+
+ /*
+ * If this node is smaller than its parent, the heap condition is
+ * satisfied, and we're done.
+ */
+ parent_off = parent_offset(node_off);
+ cmp = heap->bh_compare(heap->bh_nodes[node_off],
+ heap->bh_nodes[parent_off],
+ heap->bh_arg);
+ if (cmp <= 0)
+ break;
+
+ /*
+ * Otherwise, swap the node and its parent and go on to check the
+ * node's new parent.
+ */
+ swap_nodes(heap, node_off, parent_off);
+ node_off = parent_off;
+ }
+}
+
+/*
+ * Sift a node down from its current position to satisfy the heap
+ * property.
+ */
+static void
+sift_down(binaryheap *heap, int node_off)
+{
+ while (true)
+ {
+ int left_off = left_offset(node_off);
+ int right_off = right_offset(node_off);
+ int swap_off = 0;
+
+ /* Is the left child larger than the parent? */
+ if (left_off < heap->bh_size &&
+ heap->bh_compare(heap->bh_nodes[node_off],
+ heap->bh_nodes[left_off],
+ heap->bh_arg) < 0)
+ swap_off = left_off;
+
+ /* Is the right child larger than the parent? */
+ if (right_off < heap->bh_size &&
+ heap->bh_compare(heap->bh_nodes[node_off],
+ heap->bh_nodes[right_off],
+ heap->bh_arg) < 0)
+ {
+ /* swap with the larger child */
+ if (!swap_off ||
+ heap->bh_compare(heap->bh_nodes[left_off],
+ heap->bh_nodes[right_off],
+ heap->bh_arg) < 0)
+ swap_off = right_off;
+ }
+
+ /*
+ * If we didn't find anything to swap, the heap condition is
+ * satisfied, and we're done.
+ */
+ if (!swap_off)
+ break;
+
+ /*
+ * Otherwise, swap the node with the child that violates the heap
+ * property; then go on to check its children.
+ */
+ swap_nodes(heap, swap_off, node_off);
+ node_off = swap_off;
+ }
+}
diff --git a/src/include/lib/binaryheap.h b/src/include/lib/binaryheap.h
new file mode 100644
index 0000000..449ceb5
--- /dev/null
+++ b/src/include/lib/binaryheap.h
@@ -0,0 +1,53 @@
+/*
+ * binaryheap.h
+ *
+ * A simple binary heap implementation
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * src/include/lib/binaryheap.h
+ */
+
+#ifndef BINARYHEAP_H
+#define BINARYHEAP_H
+
+/*
+ * For a max-heap, the comparator must return <0 iff a < b, 0 iff a == b,
+ * and >0 iff a > b. For a min-heap, the conditions are reversed.
+ */
+typedef int (*binaryheap_comparator) (Datum a, Datum b, void *arg);
+
+/*
+ * binaryheap
+ *
+ * bh_size how many nodes are currently in "nodes"
+ * bh_space how many nodes can be stored in "nodes"
+ * bh_has_heap_property no unordered operations since last heap build
+ * bh_compare comparison function to define the heap property
+ * bh_arg user data for comparison function
+ * bh_nodes variable-length array of "space" nodes
+ */
+typedef struct binaryheap
+{
+ int bh_size;
+ int bh_space;
+ bool bh_has_heap_property; /* debugging cross-check */
+ binaryheap_comparator bh_compare;
+ void *bh_arg;
+ Datum bh_nodes[FLEXIBLE_ARRAY_MEMBER];
+} binaryheap;
+
+extern binaryheap *binaryheap_allocate(int capacity,
+ binaryheap_comparator compare,
+ void *arg);
+extern void binaryheap_free(binaryheap *heap);
+extern void binaryheap_add_unordered(binaryheap *heap, Datum d);
+extern void binaryheap_build(binaryheap *heap);
+extern void binaryheap_add(binaryheap *heap, Datum d);
+extern Datum binaryheap_first(binaryheap *heap);
+extern Datum binaryheap_remove_first(binaryheap *heap);
+extern void binaryheap_replace_first(binaryheap *heap, Datum d);
+
+#define binaryheap_empty(h) ((h)->bh_size == 0)
+
+#endif /* BINARYHEAP_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index fec07b8..d4911bd 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1100,10 +1100,8 @@ typedef struct AppendState
* nkeys number of sort key columns
* sortkeys sort keys in SortSupport representation
* slots current output tuple of each subplan
- * heap heap of active tuples (represented as array indexes)
- * heap_size number of active heap entries
+ * heap heap of active tuples
* initialized true if we have fetched first tuple from each subplan
- * last_slot last subplan fetched from (which must be re-called)
* ----------------
*/
typedef struct MergeAppendState
@@ -1114,10 +1112,8 @@ typedef struct MergeAppendState
int ms_nkeys;
SortSupport ms_sortkeys; /* array of length ms_nkeys */
TupleTableSlot **ms_slots; /* array of length ms_nplans */
- int *ms_heap; /* array of length ms_nplans */
- int ms_heap_size; /* current active length of ms_heap[] */
+ struct binaryheap *ms_heap; /* binary heap of slot indices */
bool ms_initialized; /* are subplans started? */
- int ms_last_slot; /* last subplan slot we returned from */
} MergeAppendState;
/* ----------------
On 2012-11-27 11:56:41 -0500, Robert Haas wrote:
[ Sorry for the slow response on this, Thanksgiving interfered. ]
On Wed, Nov 21, 2012 at 3:41 PM, Andres Freund <andres@2ndquadrant.com> wrote:
One very minor nitpick I unfortunately just found now, not sure when
that changed:
binaryheap_replace_first() hardcodes the indices for the left/right node
of the root node. I would rather have it use (left|right)_offset(0).Hmm, yeah... but come to think of it, why do we need that special case
at all? Why not just call sift_down on the root node and call it
good? See the attached version, which simplifies the code
considerably and also makes some comment adjustments per Abhijit's
comments.
The simplification made me worry for a second but after checking it out
I realized that my fear was only based on my original version where I
did a
kv = simpleheap_remove_first(heap);
simpleheap_add(heap, kv->key, kv->value);
if there was work to be done. But Abhijit optimized that code to do less
work, so the amount of comparisons is exactly the same before/after your
simplifications. With considerably less code.
Looks ready to me.
Thanks,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Nov 28, 2012 at 3:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Looks ready to me.
Committed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
At 2012-11-15 12:08:12 -0500, robertmhaas@gmail.com wrote:
Still, maybe we could have a src/framework directory that uses the
same trick to produce a libpgframework that frontend code can use,
and a lib pgframework_srv that can be linked into the backend.
That's might not actually be that much work; we'd just need to
clone all the existing src/port logic.
Does anyone have further comments about Robert's idea above? Or
alternative suggestions about how to structure a library of routines
that can be used in either the backend or in client code (like the
binary heap implementation)?
-- Abhijit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jan 14, 2013 at 4:55 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
At 2012-11-15 12:08:12 -0500, robertmhaas@gmail.com wrote:
Still, maybe we could have a src/framework directory that uses the
same trick to produce a libpgframework that frontend code can use,
and a lib pgframework_srv that can be linked into the backend.
That's might not actually be that much work; we'd just need to
clone all the existing src/port logic.Does anyone have further comments about Robert's idea above? Or
alternative suggestions about how to structure a library of routines
that can be used in either the backend or in client code (like the
binary heap implementation)?
A couple observations:
*) Not sure about the name: something like "libcommon"?
*) This is useful and potentially good for many things. For example,
maybe type manipulation code could be pushed into the common library
at some point.
merlin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers