Perl modules for testing/viewing/corrupting/repairing your heap files

Started by Mark Dilger, almost 6 years ago, 5 messages
#1 Mark Dilger
mark.dilger@enterprisedb.com

Hackers,

Recently, as part of testing something else, I had need of a tool to create
surgically precise corruption within heap pages. I wanted to make the
corruption from within TAP tests, so I wrote the tool as a set of perl modules.

The modules allow you to "tie" a perl array to a heap file, in essence treating
the file as an array of heap pages. Each page within the file manifests as a
tied perl hash, where each page header field is an element in the hash, and the
tuples in the page are an array of tied hashes, with each field of the tuple
header appearing as a key in that tied hash.

This is all done in pure perl; there is no XS (compiled C) component.

The body of each tuple (everything beyond the tuple header) is treated merely as
binary data. I haven't done any work to decode it into perl data structures
equivalent to integer, text, timestamp, etc., nor have I needed that
functionality yet. That seems doable as an extension of this work, at least
if the caller passes tuple descriptor type information into the `tie @file`
command.
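
For illustration only, here is a rough sketch of the sort of decoding such an
extension might do, assuming a hypothetical per-column list of pack() codes
supplied by the caller (nothing like this exists in the modules yet):

# Hypothetical tuple descriptor: one pack() code per column, for a table of
# three int4 columns with no nulls and no varlena columns to complicate things
my @tupdesc = ('l', 'l', 'l');

# Stand-in for the raw bytes beyond t_hoff in some tuple
my $body = pack('lll', 11, 22, 33);

# Decode the body back into perl scalars using the caller-supplied type info
my @values = unpack(join('', @tupdesc), $body);
print "@values\n";    # prints "11 22 33"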

The following example works with the implementation as it stands. Note in
particular that the file is tied in O_RDWR mode. That means the changes all get
written back to the underlying file and truly update (corrupt) your data.
Everything also works in O_RDONLY mode, in which case the updates are made only
to the copy of the data in perl's memory and none of it goes back to disk. Of
course, nothing forces you to update anything; you can use this simply to read
the fields from the file/page/tuple without making modifications.

#!/usr/bin/perl

use HeapTuple;
use HeapPage;
use HeapFile;
use Fcntl;

my @file;
tie @file, 'HeapFile', path => 'base/12925/3599', pagesize => 8192, mode => O_RDWR;
for my $page (@file)
{
    $page->{pd_lsn_xrecoff}++;
    print $page->{pd_checksum}, "\n";
    for (@{$page->{'tuples'}})
    {
        $_->{HEAP_COMBOCID} = 1 if ($_->{HEAP_HASNULL});
        $_->{t_xmin} = $_->{t_xmax} if $_->{HEAP_XMAX_COMMITTED};
    }
}
untie @file;

In my TAP test usage of these modules, I tend to fall into the pattern of:

my $node = get_new_node('master');
$node->init;
my $pgdata = $node->data_dir;
$node->safe_psql('postgres', 'create table public.test (bar text)');
my $path = join('/', $pgdata, $node->safe_psql(
    'postgres', "SELECT pg_relation_filepath('public.test')"));
$node->stop;

my @file;
tie @file, 'HeapFile', path => $path, pagesize => 8192, mode => O_RDWR;
# do some corruption

$node->start;
# do some queries against the corrupt table, see what happens
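
As a purely illustrative sketch, the two placeholder comments might be filled in
along these lines (assuming a row or two were inserted before stopping the node;
the particular field, value, and query are arbitrary):

# do some corruption: point every tuple's xmin at a bogus transaction id
for my $page (@file)
{
    $_->{t_xmin} = 4000000000 for @{$page->{tuples}};
}
untie @file;    # make sure the modified pages are flushed to disk

$node->start;

# do some queries against the corrupt table, see what happens
my $ret = $node->psql('postgres', 'SELECT count(*) FROM public.test');
print "psql exited with code $ret\n";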

For kicks, I ran this one-liner and got many screenfuls of data; I'll include
just the tail end:

perl -e 'use HeapFile; tie @file, "HeapFile", path => "pgdata/base/12925/1255"; print(scalar(%$_)) for(@file);'

BODY AS HEX ===> PRINTABLE ASCII
ff 0f 06 00 00 00 00 00 ===> . . . . . . . .
47 20 00 00 46 06 46 43 ===> q 2 . . p l p g
49 47 06 05 3f 3d 06 06 ===> s q l _ c a l l
05 44 3d 06 40 06 41 48 ===> _ h a n d l e r
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 50 03 00 00 00 00 ===> . . . ? . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
42 00 00 00 00 4c 4b 00 ===> f . . . . v u .
00 00 00 00 00 08 00 00 ===> . . . . . . . .
3c 00 00 00 01 00 00 00 ===> ` . . . . . . .
00 00 00 00 01 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
02 46 06 46 43 49 47 06 ===> + p l p g s q l
05 3f 3d 06 06 05 44 3d ===> _ c a l l _ h a
06 40 06 41 48 15 18 06 ===> n d l e r ! $ l
45 3e 40 45 48 02 46 06 ===> i b d i r / p l
46 43 49 47 06 ===> p g s q l
b6 01 00 00 t_xmin: 438
00 00 00 00 t_xmax: 0
02 00 00 00 t_field3: 2
00 00 bi_hi: 0
50 00 bi_lo: 80
06 00 ip_posid: 6
1d 00 t_infomask2: 29
Natts: 29
HEAP_KEYS_UPDATED: 0
HEAP_HOT_UPDATED: 0
HEAP_ONLY_TUPLE: 0
03 0b t_infomask: 2819
HEAP_HASNULL: 1
HEAP_HASVARWIDTH: 1
HEAP_HASEXTERNAL: 0
HEAP_HASOID_OLD: 0
HEAP_XMAX_KEYSHR_LOCK: 0
HEAP_COMBOCID: 0
HEAP_XMAX_EXCL_LOCK: 0
HEAP_XMAX_LOCK_ONLY: 0
HEAP_XMIN_COMMITTED: 1
HEAP_XMIN_INVALID: 1
HEAP_XMAX_COMMITTED: 0
HEAP_XMAX_INVALID: 1
HEAP_XMAX_IS_MULTI: 0
HEAP_UPDATED: 0
HEAP_MOVED_OFF: 0
HEAP_MOVED_IN: 0
20 t_hoff: 32
ffff0f06 NULL_BITFIELD: 11111111111111111111000001100
OID_OLD:

BODY AS HEX ===> PRINTABLE ASCII
ff 0f 06 00 00 00 00 00 ===> . . . . . . . .
48 20 00 00 46 06 46 43 ===> r 2 . . p l p g
49 47 06 05 45 06 06 45 ===> s q l _ i n l i
06 41 05 44 3d 06 40 06 ===> n e _ h a n d l
41 48 00 00 00 00 00 00 ===> e r . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 50 03 00 00 00 00 ===> . . . ? . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
42 00 00 01 00 4c 4b 00 ===> f . . . . v u .
01 00 00 00 00 08 00 00 ===> . . . . . . . .
46 00 00 00 01 00 00 00 ===> p . . . . . . .
00 00 00 00 01 00 00 00 ===> . . . . . . . .
01 00 00 00 00 00 00 00 ===> . . . . . . . .
00 08 00 00 02 46 06 46 ===> . . . . / p l p
43 49 47 06 05 45 06 06 ===> g s q l _ i n l
45 06 41 05 44 3d 06 40 ===> i n e _ h a n d
06 41 48 15 18 06 45 3e ===> l e r ! $ l i b
40 45 48 02 46 06 46 43 ===> d i r / p l p g
49 47 06 ===> s q l
b6 01 00 00 t_xmin: 438
00 00 00 00 t_xmax: 0
03 00 00 00 t_field3: 3
00 00 bi_hi: 0
50 00 bi_lo: 80
07 00 ip_posid: 7
1d 00 t_infomask2: 29
Natts: 29
HEAP_KEYS_UPDATED: 0
HEAP_HOT_UPDATED: 0
HEAP_ONLY_TUPLE: 0
03 0b t_infomask: 2819
HEAP_HASNULL: 1
HEAP_HASVARWIDTH: 1
HEAP_HASEXTERNAL: 0
HEAP_HASOID_OLD: 0
HEAP_XMAX_KEYSHR_LOCK: 0
HEAP_COMBOCID: 0
HEAP_XMAX_EXCL_LOCK: 0
HEAP_XMAX_LOCK_ONLY: 0
HEAP_XMIN_COMMITTED: 1
HEAP_XMIN_INVALID: 1
HEAP_XMAX_COMMITTED: 0
HEAP_XMAX_INVALID: 1
HEAP_XMAX_IS_MULTI: 0
HEAP_UPDATED: 0
HEAP_MOVED_OFF: 0
HEAP_MOVED_IN: 0
20 t_hoff: 32
ffff0f06 NULL_BITFIELD: 11111111111111111111000001100
OID_OLD:

BODY AS HEX ===> PRINTABLE ASCII
ff 0f 06 00 00 00 00 00 ===> . . . . . . . .
49 20 00 00 46 06 46 43 ===> s 2 . . p l p g
49 47 06 05 4c 3d 06 45 ===> s q l _ v a l i
40 3d 4a 06 48 00 00 00 ===> d a t o r . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
00 00 50 03 00 00 00 00 ===> . . . ? . . . .
00 00 00 00 00 00 00 00 ===> . . . . . . . .
42 00 00 01 00 4c 4b 00 ===> f . . . . v u .
01 00 00 00 00 08 00 00 ===> . . . . . . . .
46 00 00 00 01 00 00 00 ===> p . . . . . . .
00 00 00 00 01 00 00 00 ===> . . . . . . . .
01 00 00 00 00 00 00 00 ===> . . . . . . . .
01 00 00 00 19 46 06 46 ===> . . . . % p l p
43 49 47 06 05 4c 3d 06 ===> g s q l _ v a l
45 40 3d 4a 06 48 15 18 ===> i d a t o r ! $
06 45 3e 40 45 48 02 46 ===> l i b d i r / p
06 46 43 49 47 06 ===> l p g s q l

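The same approach works for more targeted inspection. For example, this variant
(a sketch using the same field accessors as the example above) prints just the
xmin of every tuple in the file rather than the full dump:

perl -e 'use HeapFile; tie @file, "HeapFile", path => "pgdata/base/12925/1255"; print($_->{t_xmin}, "\n") for map { @{$_->{tuples}} } @file;'
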
Is there any interest in this stuff, and if so, where should it live? I'm happy to
reorganize this a bit if there is general interest in such a submission.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#2 Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Mark Dilger (#1)
1 attachment(s)
Re: Perl modules for testing/viewing/corrupting/repairing your heap files

Not having received any feedback on this, I've dusted the modules off for submission as-is.

Attachments:

v1-0001-Adding-HeapFile-related-perl-modules.patch (application/octet-stream, mode 0644)
From 5c7edcbe9e6f1c23aae40bbaf24f96d2de71242c Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 14 Apr 2020 10:09:49 -0700
Subject: [PATCH v1] Adding HeapFile related perl modules

Perl classes HeapFile, HeapPage, and HeapTuple implement a
tied structure over postgres heap files, making it easier to
read or modify them from within TAP tests while the server
is not running.
---
 src/test/perl/HeapFile.pm  |  645 ++++++++++++++++++++
 src/test/perl/HeapPage.pm  |  919 ++++++++++++++++++++++++++++
 src/test/perl/HeapTuple.pm | 1181 ++++++++++++++++++++++++++++++++++++
 3 files changed, 2745 insertions(+)
 create mode 100644 src/test/perl/HeapFile.pm
 create mode 100644 src/test/perl/HeapPage.pm
 create mode 100644 src/test/perl/HeapTuple.pm

diff --git a/src/test/perl/HeapFile.pm b/src/test/perl/HeapFile.pm
new file mode 100644
index 0000000000..b57cde60ac
--- /dev/null
+++ b/src/test/perl/HeapFile.pm
@@ -0,0 +1,645 @@
+
+=pod
+
+=head1 NAME
+
+HeapFile - definitions for tying arrays to PostgreSQL heap files.
+
+=head1 SYNOPSIS
+
+  use HeapFile;
+  use HeapPage;
+
+  my @immutable_file;
+  tie @immutable_file, 'HeapFile',
+    path => "/path/to/pgdata/heap/file",
+    pagesize => 8192,
+    mode => 'O_RDONLY';
+
+  for my $immutable_page (@immutable_file)
+  {
+    print $immutable_page->{xlogid}, "\n";
+  }
+
+  my @mutable_file;
+  tie @mutable_file, 'HeapFile',
+    path => "/path/to/pgdata/heap/file",
+    pagesize => 8192,
+    mode => 'O_RDWR';
+
+  # Bad ideas about managing your postgres data
+  for my $page (@mutable_file)
+  {
+    $page->{pd_checksum} = 0;
+    for my $tuple (@{$page->{tuples}})
+    {
+      $tuple->{HEAP_HASOID_OLD} = 0;
+      $tuple->{t_xmin} = $tuple->{t_xmax}
+        if ($tuple->{HEAP_XMIN_COMMITTED});
+    }
+  }
+
+  # Make sure all changes have been written to disk
+  untie @mutable_file;
+
+=head1 DESCRIPTION
+
+BEWARE: This module has nothing to do with tying perl arrays to a database for
+the purposes of storing and retrieving user data.  This module is a debugging
+tool for users who want to view and potentially change the data files that
+underlie a PostgreSQL cluster for debugging or recovering data.
+
+This module provides a mechanism for array-tying a PostgreSQL heap file (or a
+file pretending to be one) to a perl array.
+
+An array tied to a PostgreSQL heap file will contain as many elements as there
+are pages within the file.  Each element will be a HeapPage tied hash.  As
+such, this class, and its accompanying HeapPage and HeapTuple helper classes,
+implement a tied array of tied hashes.
+
+Arrays tied in O_RDONLY mode will only read the heap file, and though
+modifications may succeed in memory, they will not be written back to the file.
+Arrays tied in O_RDWR mode will read the heap file, and modifications will be
+made both in memory and in the disk file.
+
+=cut
+
+package HeapFile;
+
+use strict;
+use warnings;
+use Tie::Array;
+use HeapPage;
+use Fcntl qw(SEEK_SET SEEK_CUR SEEK_END O_RDONLY O_RDWR);
+
+our @ISA = qw(Tie::Array);
+
+# Pack 8192 4-byte zeros for a total of 32k.  This is large enough for
+# a zero page of any supported size.
+my $ZEROS = pack("I[8192]", map { 0 } (1..8192));
+
+# Constructor.  For inheritability, most work is performed in _init() rather
+# than here.  We also don't document this function in POD, as users should
+# be getting here through the tie interface, not through invoking new().
+#
+sub new
+{
+	my $thing = shift;
+	my $class = ref($thing) || $thing;
+	my $self = { };
+
+	bless $self, $class;
+	$self->_init(@_);
+
+	return $self;
+}
+
+sub _init
+{
+	my ($self, %params) = @_;
+	my $classname = ref($self) || "HeapFile";
+	foreach my $param (qw(path))
+	{
+		die "In $classname: Missing required parameter '$param'"
+			unless defined $params{$param};
+	}
+	$self->{path} = delete $params{path};
+	$self->{pagesize} = delete $params{pagesize};
+	$self->{pagesize} = HeapPage::DEFAULT_PAGESIZE unless(defined $self->{pagesize});
+	$self->{pageno} = delete $params{pageno};
+	$self->{mode} = delete $params{mode};
+	$self->{mode} = O_RDONLY unless defined $self->{mode};
+	die sprintf("In $classname: Unrecognized parameters: %s",
+				join(', ', sort keys %params))
+		if (scalar keys %params);
+
+	HeapPage::validate_pagesize($self->{pagesize});
+	$self->{mode} = HeapPage::validate_file_mode($self->{mode});
+
+	# Initialize all known fields.  These will get overwritten
+	# when we read the page from disk
+	$self->{pages} = [];
+	$self->_read();
+}
+
+sub _read
+{
+	my ($self) = @_;
+
+	# Open the file ourselves, rather than having HeapPage open it repeatedly,
+	# as we want the filesystem to give us a consistent view of the file, even
+	# if other processes are modifying it simultaneously.
+	#
+	# We're not really hardened against concurrent modification, so this is just
+	# a half-measure, but it seems better than taking no precautions at all.
+	# At least in O_RDONLY mode, we'll have a consistent view of the file as it
+	# was when we opened it.
+	my $fh;
+	sysopen($fh, $self->{path}, O_RDONLY)
+		or die "Cannot open $self->{path} for reading: $!";
+	binmode($fh);
+	sysseek($fh, 0, SEEK_SET);
+	my $fsize = (stat($fh))[7];
+	for (my $pageno = 0; $fsize > 0; $pageno++)
+	{
+		my %page;
+		tie %page, 'HeapPage',
+			path => $self->{path},
+			pagesize => $self->{pagesize},
+			pageno => $pageno,
+			mode => $self->{mode},
+			fh => $fh;
+		push (@{$self->{pages}}, \%page);
+		$fsize -= $self->{pagesize};
+	}
+}
+
+=pod
+
+=head1 METHODS
+
+=over
+
+=item TIEARRAY classname, OPTIONS
+
+The method invoked by the command "tie @array, 'HeapFile'".
+
+OPTIONS is a list of name => value pairs, as follows:
+
+=over 4
+
+=item path
+
+Required.  The filesystem path (relative or absolute) to the heap file.
+Typically, this will be a path into a stopped PostgreSQL cluster's
+PGDATA directory, something like data/base/12922/3456.
+
+The specified file must exist, and the caller must have permission to
+open the file in the specified mode.  No facility is provided to create
+new heap files through this module.
+
+=item mode
+
+Optional.  The mode to use when opening the underlying heap file.  Defaults
+to O_RDONLY.  Valid options are a subset of the modes defined by Fcntl,
+specifically:
+
+=over 4
+
+=item O_RDONLY
+
+Ties the array in read-only mode.  The underlying file will be read but not
+modified.
+
+=item O_RDWR
+
+Ties the array in read-write mode.  The underlying file will be read, and
+any changes to the array will be written back to the file.
+
+=back
+
+For callers who have not 'used' Fcntl, these modes can be specified as quoted
+strings rather than as numerical constants.  Both forms are accepted.
+
+=item pagesize
+
+Optional.  The page size used when PostgreSQL was configured.  Defaults
+to 8192 (8 kilobytes), which is the same as the default that PostgreSQL's
+configure program uses.  Using a page size other than the one the cluster
+was configured for may result in considerable confusion, and if opened
+in O_RDWR mode, may also cause corruption when the array is modified or
+untied and consequently written back to disk.
+
+=back
+
+=cut
+
+sub TIEARRAY
+{
+	my $classname = shift;
+	my $self = $classname->new(@_);
+}
+
+=pod
+
+=item FETCH this, index
+
+The method invoked when a field in the tied array is read.
+
+Returns a reference to the tied hash for the page at the given index, or undef
+if the index is beyond the end of the file.  The tied page belongs to the
+HeapFile tied array, not to the caller.  The caller is at liberty to read
+and modify the fields of the tied page, but should be careful not to untie
+the page, as the resulting behavior if the page is untied is undefined.
+
+=cut
+
+sub FETCH
+{
+	my ($self, $index) = @_;
+	return $self->{pages}->[$index];
+}
+
+=pod
+
+=item FETCHSIZE this
+
+The method invoked when calling scalar(@array) on the tied array.
+
+Returns the number of pages in the tied array.
+
+=cut
+
+sub FETCHSIZE
+{
+	my ($self) = @_;
+	return scalar(@{$self->{pages}});
+}
+
+=pod
+
+=item STORE this, index, value
+
+The method is invoked when a field in the tied array is written or modified.
+
+If the value is a reference to a hash with an appropriate structure, that
+structure will be assigned into the HeapPage tied hash at the given location
+in the array.  Fields not specified will be left unaltered, or if no tied
+hash yet exists at the specified array index, will be initialized as zero.
+Any unrecognized keys or inappropriate values in the argument will draw an
+exception and the tied array will be unaltered.
+
+=cut
+
+sub STORE
+{
+	my ($self, $index, $value) = @_;
+
+	# Check that the argument value is valid before modifying our array
+	# of HeapPages.  We don't want to perform a partial modification
+	# and then throw an exception, as it would leave us in an inconsistent
+	# state if the exception were caught.
+	HeapPage::validate_hash($value);
+
+	# If the index is beyond our array bounds, extend the heap file one
+	# page at a time with zero pages until it is big enough.  This includes
+	# creating a zero page for the target page.  We'll update the target
+	# fields within the page after creating the zero page.
+	my $pageno;
+	for ($pageno = scalar(@{$self->{pages}}); $pageno <= $index; $pageno++)
+	{
+		my %newpage;
+		tie %newpage, 'HeapPage',
+			path => $self->{path},
+			pagesize => $self->{pagesize},
+			pageno => $pageno,
+			mode => $self->{mode};
+		die "Overwriting previous value!"
+			if defined $self->{pages}->[$pageno];
+		$self->{pages}->[$pageno] = \%newpage;
+	}
+
+	# Check that the target page now exists in our array
+	die "No page allocated"
+		unless defined $self->{pages}->[$index];
+	my $target = $self->{pages}->[$index];
+
+	# Overwrite the target page with the fields we were given, leaving all
+	# unspecified fields zero
+	$target->{$_} = $value->{$_} for(keys %$value);
+}
+
+=pod
+
+=item STORESIZE this, newsize
+
+This method is invoked to make the tied array longer or shorter.
+
+Setting the length longer than the number of pages in the underlying file
+will cause zero pages to be created at the end of the array.  Setting the
+length shorter will truncate pages from the end of the array.  Whether this
+happens only in memory or also on disk depends on whether the array was
+tied in O_RDONLY or O_RDWR mode.
+
+=cut
+
+sub STORESIZE
+{
+	my ($self, $newsize) = @_;
+	my $pageno;
+	if ($newsize > scalar(@{$self->{pages}}))
+	{
+		for ($pageno = scalar(@{$self->{pages}}); $pageno < $newsize; $pageno++)
+		{
+			my %newpage;
+			tie %newpage, 'HeapPage',
+				path => $self->{path},
+				pagesize => $self->{pagesize},
+				pageno => $pageno,
+				mode => $self->{mode};
+			die "Overwriting previous value!"
+				if defined $self->{pages}->[$pageno];
+			$self->{pages}->[$pageno] = \%newpage;
+		}
+	}
+	elsif (scalar(@{$self->{pages}}) > $newsize)
+	{
+		while (scalar(@{$self->{pages}}) > $newsize)
+		{
+			my $hashref = pop(@{$self->{pages}});
+			untie %$hashref;
+		}
+		# Simply untying the pages won't erase ones beyond the end of the array,
+		# so we have to do extra work if we're in O_RDWR mode to shorten the
+		# underlying heap file.
+		if ($self->{mode} == O_RDWR)
+		{
+			my $fh;
+			sysopen($fh, $self->{path}, O_RDWR)
+				or die "Cannot open $self->{path} for truncating: $!";
+			binmode($fh);
+			sysseek($fh, 0, SEEK_SET);
+			my $current_fsize = (stat($fh))[7];
+			my $new_fsize = scalar(@{$self->{pages}}) * $self->{pagesize};
+			if ($new_fsize < $current_fsize)
+			{
+				warn "Attempt to truncate $self->{path} down to $new_fsize bytes failed"
+					unless (truncate($fh, $new_fsize));
+			}
+			else
+			{
+				warn "In HeapFile::STORESIZE, expected new_fsize < current_fsize " .
+					 "($new_fsize vs. $current_fsize)";
+			}
+			$fh->close();
+		}
+	}
+}
+
+=pod
+
+=item CLEAR this
+
+This method is invoked to remove all elements from the tied array.
+
+Truncate all pages from the array.  Whether this happens only in memory or also
+on disk depends on whether the array was tied in O_RDONLY or O_RDWR mode.
+
+=cut
+
+sub CLEAR
+{
+	my ($self) = @_;
+	$self->STORESIZE(0);
+}
+
+=pod
+
+=item PUSH this, value
+
+This method is invoked to add an element to the end of the tied array.
+
+Whether this happens only in memory or also on disk depends on whether the
+array was tied in O_RDONLY or O_RDWR mode.
+
+=cut
+
+sub PUSH
+{
+	my ($self, $value) = @_;
+	$self->STORE(scalar(@{$self->{pages}}), $value);
+}
+
+=pod
+
+=item POP this
+
+This method is invoked to remove an element from the end of the tied array.
+
+Whether this happens only in memory or also on disk depends on whether the
+array was tied in O_RDONLY or O_RDWR mode.
+
+The returned value is not a tied HeapPage, but merely a clone of the data
+from what was a tied HeapPage prior to being popped.  This is necessary, at
+least in O_RDWR mode, as the file is shortened by the action, and therefore
+the returned value no longer refers to an actual page on disk.  We could
+perhaps return a tied HeapPage when in O_RDONLY mode, but it seems sensible
+to handle both cases the same.
+
+=cut
+
+sub POP
+{
+	my ($self) = @_;
+
+	return undef unless @{$self->{pages}};
+
+	my $tied_page = pop(@{$self->{pages}});
+	my %untied_page;
+	$untied_page{$_} = $tied_page->{$_} for keys %$tied_page;
+	untie %$tied_page;
+	undef($tied_page);
+
+	# Simply untying the page won't erase it from the file, so we have to do
+	# extra work if we're in O_RDWR mode to shorten the underlying heap file.
+	if ($self->{mode} == O_RDWR)
+	{
+		my $fh;
+		sysopen($fh, $self->{path}, O_RDWR)
+			or die "Cannot open $self->{path} for truncating: $!";
+		binmode($fh);
+		sysseek($fh, 0, SEEK_SET);
+		my $current_fsize = (stat($fh))[7];
+		my $new_fsize = scalar(@{$self->{pages}}) * $self->{pagesize};
+		if ($new_fsize < $current_fsize)
+		{
+			warn "Attempt to truncate $self->{path} down to $new_fsize bytes failed"
+				unless (truncate($fh, $new_fsize));
+		}
+		else
+		{
+			warn "In HeapFile::POP, expected new_fsize < current_fsize " .
+				 "($new_fsize vs. $current_fsize)";
+		}
+		$fh->close();
+	}
+	return \%untied_page;
+}
+
+=item SHIFT this
+
+This method is invoked to remove an element from the beginning of the tied
+array.
+
+Whether this happens only in memory or also on disk depends on whether the
+array was tied in O_RDONLY or O_RDWR mode.
+
+The returned value is not a tied HeapPage, but merely a clone of the data from
+what was a tied HeapPage prior to being shifted off the array.  This is
+necessary, at least in O_RDWR mode, as the file is shortened by the action, and
+therefore the returned value no longer refers to an actual page on disk.  We
+could perhaps return a tied HeapPage when in O_RDONLY mode, but it seems
+sensible to handle both cases the same.
+
+=cut
+
+sub SHIFT
+{
+	my ($self) = @_;
+
+	return undef unless @{$self->{pages}};
+
+	# Store the first page, for returning later
+	my $tied_page = $self->{pages}->[0];
+	my %untied_page;
+	$untied_page{$_} = $tied_page->{$_} for keys %$tied_page;
+
+	# If we're in O_RDWR mode, move the content of the file page-by-page to be
+	# one page earlier than it started.  Then re-read the file to re-synchronize
+	# our in-memory array.
+	if ($self->{mode} == O_RDWR)
+	{
+		# Untie and destroy all pages we currently have in memory.  We don't
+		# want the pages re-writing themselves to disk after we've manually
+		# modified the file.
+		while (@{$self->{pages}})
+		{
+			my $tied = pop(@{$self->{pages}});
+			untie %$tied;
+			undef $tied;
+		}
+
+		# Shift the file down one page
+		my $fh;
+		sysopen($fh, $self->{path}, O_RDWR)
+			or die "Cannot open $self->{path} for truncating: $!";
+		binmode($fh);
+		sysseek($fh, 0, SEEK_SET);
+		my $old_fsize = (stat($fh))[7];
+		my $new_fsize = $old_fsize - $self->{pagesize};
+		my ($result, $buffer, $readpos, $writepos);
+		for ($readpos = $self->{pagesize}, $writepos = 0;
+			 $readpos < $old_fsize;
+			 $readpos += $self->{pagesize}, $writepos += $self->{pagesize})
+		{
+			sysseek($fh, $readpos, SEEK_SET);
+			$result = sysread($fh, $buffer, $self->{pagesize});
+			warn "Partial read" if ($result < $self->{pagesize});
+			warn "Read failure: $!" unless defined $result;
+			sysseek($fh, $writepos, SEEK_SET);
+			$result = syswrite($fh, $buffer, $self->{pagesize});
+			warn "Partial write" if ($result < $self->{pagesize});
+			warn "Write failure: $!" unless defined $result;
+		}
+		warn "Attempt to truncate $self->{path} down to $new_fsize bytes failed"
+			unless (truncate($fh, $new_fsize));
+		$fh->close();
+
+		# Re-read the file, restoring our in-memory array
+		$self->_read();
+	}
+	else
+	{
+		# In O_RDONLY mode, we would like to change each HeapPage to have a pageno
+		# one less than before.  We don't have a way of doing that, and it doesn't
+		# actually matter, because in O_RDONLY mode the pageno won't be used after
+		# the initial loading from disk, which has already happened.
+		#
+		# Shorten our array, leaving the pageno fields broken
+		shift (@{$self->{pages}});
+	}
+	return \%untied_page;
+}
+
+=item UNSHIFT this, value
+
+This method is invoked to insert an element at the beginning of the tied
+array.
+
+Whether this happens only in memory or also on disk depends on whether the
+array was tied in O_RDONLY or O_RDWR mode.
+
+=cut
+
+sub UNSHIFT
+{
+	my ($self, $value) = @_;
+
+	HeapPage::validate_hash($value);
+
+	# If we're in O_RDWR mode, move the content of the file page-by-page to be
+	# one page later than it started.  Then re-read the file to re-synchronize
+	# our in-memory array.
+	if ($self->{mode} == O_RDWR)
+	{
+		# Untie and destroy all pages we currently have in memory.  We don't
+		# want the pages re-writing themselves to disk after we've manually
+		# modified the file.
+		while (@{$self->{pages}})
+		{
+			my $tied = pop(@{$self->{pages}});
+			untie %$tied;
+			undef $tied;
+		}
+
+		# Shift the file contents up one page, making room at the beginning
+		my $fh;
+		sysopen($fh, $self->{path}, O_RDWR)
+			or die "Cannot open $self->{path} for truncating: $!";
+		binmode($fh);
+		sysseek($fh, 0, SEEK_SET);
+		my $old_fsize = (stat($fh))[7];
+		my $new_fsize = $old_fsize + $self->{pagesize};
+		my ($result, $buffer, $readpos, $writepos);
+		for ($readpos = 0, $writepos = $self->{pagesize};
+			 $readpos < $old_fsize;
+			 $readpos += $self->{pagesize}, $writepos += $self->{pagesize})
+		{
+			sysseek($fh, $readpos, SEEK_SET);
+			$result = sysread($fh, $buffer, $self->{pagesize});
+			warn "Partial read" if ($result < $self->{pagesize});
+			warn "Read failure: $!" unless defined $result;
+			sysseek($fh, $writepos, SEEK_SET);
+			$result = syswrite($fh, $buffer, $self->{pagesize});
+			warn "Partial write" if ($result < $self->{pagesize});
+			warn "Write failure: $!" unless defined $result;
+		}
+		# For now, just fill in the first page with zeros.
+		sysseek($fh, 0, SEEK_SET);
+		$result = syswrite($fh, $ZEROS, $self->{pagesize});
+		warn "Partial write" if ($result < $self->{pagesize});
+		warn "Write failure: $!" unless defined $result;
+		$fh->close();
+
+		# Re-read the file, restoring our in-memory array, and vivifying
+		# a zero page at the beginning
+		$self->_read();
+
+		# Copy the unshifted value into the first page
+		my $target = $self->{pages}->[0];
+
+		# Overwrite the target page with the fields we were given, leaving all
+		# unspecified fields zero
+		$target->{$_} = $value->{$_} for(keys %$value);
+	}
+	else
+	{
+		my %page;
+		tie %page, 'HeapPage',
+			path => $self->{path},
+			pagesize => $self->{pagesize},
+			pageno => 0,
+			mode => $self->{mode},
+			virtual => $value;
+		unshift(@{$self->{pages}}, \%page);
+	}
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
diff --git a/src/test/perl/HeapPage.pm b/src/test/perl/HeapPage.pm
new file mode 100644
index 0000000000..dce4d0ad1e
--- /dev/null
+++ b/src/test/perl/HeapPage.pm
@@ -0,0 +1,919 @@
+
+=pod
+
+=head1 NAME
+
+HeapPage - definitions for tying hashes to PostgreSQL heap pages.
+
+=head1 SYNOPSIS
+
+  use HeapPage;
+
+  # Open the 36th 8k page in read-only mode
+  my %immutable_page;
+  tie %immutable_page, 'HeapPage',
+    path => "/path/to/pgdata/heap/file",
+    pagesize => 8192,
+    pageno => 35,
+    mode => 'O_RDONLY';
+
+  # Open the first 8k page in read-write mode
+  my %mutable_page;
+  tie %mutable_page, 'HeapPage',
+    path => "/path/to/pgdata/heap/page",
+    pagesize => 8192,
+    pageno => 0,
+    mode => 'O_RDWR';
+
+  # Print human readable contents of the pages
+  print scalar(%immutable_page), "\n";
+  print scalar(%mutable_page), "\n";
+
+  # Read some header fields
+  print "Checksum: ", $immutable_page{pd_checksum}, "\n";
+
+  # Modify some header fields
+  $mutable_page{pd_checksum} += 12;
+  $mutable_page{pd_lsn_xlogid} = 123456;
+
+  # Copy fields from one page into another, overwriting them there
+  $mutable_page{$_} = $immutable_page{$_} for keys %immutable_page;
+
+=head1 DESCRIPTION
+
+BEWARE: This module has nothing to do with tying perl hashes to a database for
+the purposes of storing and retrieving user data.  This module is a debugging
+tool for users who want to view and potentially change the data files that
+underlie a PostgreSQL cluster for debugging or recovering data.  If you want to
+store data in PostgreSQL through a perl tied hash, see for example Tie::DBI.
+
+This module provides a mechanism for hash-tying a page from a PostgreSQL heap
+file (or a file pretending to be one) to a perl hash.
+
+The hash behaves like a regular perl hash, within limits.  Specifically, the
+hash can only be used to read or write fields that actually exist within
+PostgreSQL heap pages.  Attempts to read or write fields of any other name will
+draw an exception.  Generally speaking, any operation on the hash which
+preserves the data format of a heap page will succeed, but other operations
+(such as adding or removing keys) will fail.
+
+All HeapPage tied hashes must be tied to an existing file, and the caller must
+have filesystem permissions (read or read+write) to open the file in the
+specified mode.  Modifications to tied hashes opened O_RDONLY will succeed in
+memory, but the modification will not be written to disk.  Modifications to
+tied hashes opened in O_RDWR will succeed in memory and be written to disk,
+overwriting the existing page within the file.  No file locking is performed.
+Beware that using this module to tie pages belonging to a running postgresql
+cluster may give undefined (or catastrophic) results.
+
+Each page must be tied not only to a file, but to a specific page within that
+file.  Once tied, the page cannot be relocated to a different file nor to a
+different location within the same file, nor can the pagesize be altered.
+
+=cut
+
+package HeapPage;
+
+use strict;
+use warnings;
+use Tie::Hash;
+use IO::File;
+use Carp;
+use HeapTuple;
+use Fcntl qw(SEEK_SET SEEK_CUR SEEK_END O_RDONLY O_WRONLY O_RDWR);
+
+our @ISA = qw(Tie::StdHash);
+
+use constant SIZE_UINT16 => 2;
+use constant MIN_UINT16 => 0;
+use constant MAX_UINT16 => 2**16 - 1;
+
+use constant SIZE_UINT32 => 4;
+use constant MIN_UINT32 => 0;
+use constant MAX_UINT32 => 2**32 - 1;
+
+# Hard-code some constants from elsewhere in the PostgreSQL project.
+# Be sure to update these if you change the core code.
+use constant MAX_BLOCK_NUMBER => 0xFFFFFFFE;	# From storage/block.h
+use constant DEFAULT_PAGESIZE => 8192;			# From configure.in
+
+use constant LP_UNUSED => 0;		# From storage/itemid.h
+use constant LP_NORMAL => 1;		# From storage/itemid.h
+use constant LP_REDIRECT => 2;		# From storage/itemid.h
+use constant LP_DEAD => 3;			# From storage/itemid.h
+
+# Hard-code some information about the PageHeaderData struct.  We need to read
+# and write binary copies of this, using perl's pack() and unpack() functions,
+# but at least we can avoid scattering our assumptions about the format of the
+# PageHeaderData structure throughout the module and instead declare them here
+# in one place.
+#
+# pd_lsn:  struct
+#	xlogid:				uint32		=> L (Unsigned 32-bit Integer)	/ 4 bytes
+#	xrecoff:			uint32		=> L (Unsigned 32-bit Integer)	/ 4 bytes
+# pd_checksum:			uint16		=> S (Unsigned 16-bit Short)	/ 2 bytes
+# pd_flags:				uint16		=> S (Unsigned 16-bit Short)	/ 2 bytes
+# pd_lower:				uint16		=> S (Unsigned 16-bit Short)	/ 2 bytes
+# pd_upper:				uint16		=> S (Unsigned 16-bit Short)	/ 2 bytes
+# pd_special:			uint16		=> S (Unsigned 16-bit Short)	/ 2 bytes
+# pd_pagesize_version:	uint16		=> S (Unsigned 16-bit Short)	/ 2 bytes
+# pd_prune_xid:			uint32		=> L (Unsigned 32-bit Integer)	/ 4 bytes
+#                                                                  ----------
+#                                                         Total      24 bytes
+#
+use constant PAGEHEADER_PACK_CODE => 'LLSSSSSSL';
+use constant PAGEHEADER_PACK_LENGTH => 24;		# Total size
+
+# The names of page header fields.  If you modify this list, also modify the
+# tied hash function 'sub SCALAR' below, and %PageHeaderBytes and
+# %PageHeaderRange also.
+#
+# THESE MUST BE IN PAGE ORDER!
+#
+# Note that we treat the two subfields of pd_lsn as if they were top level
+# fields, and name them as such.  This avoids needing to have nested tied
+# hashes, which seems like it would be more work than it is worth.
+#
+our @PageHeaderKeys = qw(
+	pd_lsn_xlogid pd_lsn_xrecoff pd_checksum pd_flags pd_lower pd_upper
+	pd_special pd_pagesize_version pd_prune_xid
+);
+
+# All PageHeaderKeys must have entries here.  It is tempting to use the size of
+# the field to define the [min..max] range, but given the differences between
+# signed and unsigned types, and the option to use less than all bits in a
+# field, we keep a separate list of [min..max] in %PageHeaderRange, below.
+our %PageHeaderBytes = (
+	pd_lsn_xlogid => SIZE_UINT32,
+	pd_lsn_xrecoff => SIZE_UINT32,
+	pd_checksum => SIZE_UINT16,
+	pd_flags => SIZE_UINT16,
+	pd_lower => SIZE_UINT16,
+	pd_upper => SIZE_UINT16,
+	pd_special => SIZE_UINT16,
+	pd_pagesize_version => SIZE_UINT16,
+	pd_prune_xid => SIZE_UINT32,
+);
+our %PageHeaderMask = (
+	pd_lsn_xlogid => 0xFFFFFFFF,
+	pd_lsn_xrecoff => 0xFFFFFFFF,
+	pd_checksum => 0xFFFF,
+	pd_flags => 0xFFFF,
+	pd_lower => 0xFFFF,
+	pd_upper => 0xFFFF,
+	pd_special => 0xFFFF,
+	pd_pagesize_version => 0xFFFF,
+	pd_prune_xid => 0xFFFFFFFF,
+);
+our %PageHeaderRange = (
+	pd_lsn_xlogid => [MIN_UINT32, MAX_UINT32],
+	pd_lsn_xrecoff => [MIN_UINT32, MAX_UINT32],
+	pd_checksum => [MIN_UINT16, MAX_UINT16],
+	pd_flags => [MIN_UINT16, MAX_UINT16],
+	pd_lower => [MIN_UINT16, MAX_UINT16],
+	pd_upper => [MIN_UINT16, MAX_UINT16],
+	pd_special => [MIN_UINT16, MAX_UINT16],
+	pd_pagesize_version => [MIN_UINT16, MAX_UINT16],
+	pd_prune_xid => [MIN_UINT32, MAX_UINT32],
+);
+
+our @PageKeys = (@PageHeaderKeys, "linepointers");
+our %PageKeys = map { $_ => 1 } @PageKeys;
+
+#
+# Module utility functions
+#
+
+sub integer_in_range
+{
+	my ($int, $min, $max) = @_;
+	return (defined $int && $int =~ m/^\d+$/ && $int >= $min && $int <= $max);
+}
+
+sub integer_in_list
+{
+	my ($int, @values) = @_;
+	return unless (defined $int && $int =~ m/^\d+$/);
+	for (@values)
+	{
+		return 1 if ($int == $_);
+	}
+}
+
+sub validate_hash
+{
+	my ($hashref) = @_;
+
+	die "Not a hash ref"
+		unless(defined $hashref && ref($hashref) && ref($hashref) =~ m/^HASH$/);
+	foreach my $key (keys %$hashref)
+	{
+		my $value = $hashref->{$key};
+
+		die "Attempt to store unrecognized HeapPage key $key"
+			unless exists $PageKeys{$key};
+		die "cannot store undefined value for $key"
+			unless defined $value;
+
+		if (exists $PageHeaderRange{$key})
+		{
+			my ($min, $max) = @{$PageHeaderRange{$key}};
+			die "$key:$value not within supported range [$min..$max]"
+				unless integer_in_range($value, $min, $max);
+		}
+	}
+}
+
+sub validate_pagesize
+{
+	my $pagesize = shift;
+	die "Invalid pagesize: $pagesize"
+		unless integer_in_list($pagesize, (2**11, 2**12, 2**13, 2**14, 2**15));
+}
+
+sub validate_pageno
+{
+	my $pageno = shift;
+	die "Invalid pageno: $pageno"
+		unless integer_in_range($pageno, 0, MAX_BLOCK_NUMBER);
+}
+
+sub validate_file_mode
+{
+	my $mode = shift;
+	return O_RDONLY if ($mode eq 'O_RDONLY');
+	return O_RDWR if ($mode eq 'O_RDWR');
+	return $mode if ($mode == O_RDONLY || $mode == O_RDWR);
+
+	die "Invalid or unsupported file mode";
+}
+
+sub _pageheader_pack
+{
+	my ($self) = @_;
+
+	my $pack = pack(PAGEHEADER_PACK_CODE,
+					$self->{pagedata}->{pd_lsn_xlogid},
+					$self->{pagedata}->{pd_lsn_xrecoff},
+					$self->{pagedata}->{pd_checksum},
+					$self->{pagedata}->{pd_flags},
+					$self->{pagedata}->{pd_lower},
+					$self->{pagedata}->{pd_upper},
+					$self->{pagedata}->{pd_special},
+					$self->{pagedata}->{pd_pagesize_version},
+					$self->{pagedata}->{pd_prune_xid});
+
+	# sanity check -- make sure it round trips ok.  This is not
+	# a test of Perl's pack mechanism, but of the assumption that
+	# none of our fields will be truncated/altered when packed with PAGEHEADER_PACK_CODE.
+	my @unpack = unpack(PAGEHEADER_PACK_CODE, $pack);
+	die "_pageheader_pack does not roundtrip"
+		unless ($unpack[0] == $self->{pagedata}->{pd_lsn_xlogid} &&
+				$unpack[1] == $self->{pagedata}->{pd_lsn_xrecoff} &&
+				$unpack[2] == $self->{pagedata}->{pd_checksum} &&
+				$unpack[3] == $self->{pagedata}->{pd_flags} &&
+				$unpack[4] == $self->{pagedata}->{pd_lower} &&
+				$unpack[5] == $self->{pagedata}->{pd_upper} &&
+				$unpack[6] == $self->{pagedata}->{pd_special} &&
+				$unpack[7] == $self->{pagedata}->{pd_pagesize_version} &&
+				$unpack[8] == $self->{pagedata}->{pd_prune_xid});
+
+	return $pack;
+}
+
+sub _pageheader_unpack
+{
+	my ($self, $packed) = @_;
+	return unpack(PAGEHEADER_PACK_CODE, $packed);
+}
+
+sub _linepointer_encode
+{
+	my ($lp_off, $lp_flags, $lp_len) = @_;
+	my $uint32 = ($lp_len << 17) |
+				 ($lp_flags << 15) |
+				 ($lp_off);
+	return $uint32;
+}
+
+sub _linepointer_decode
+{
+	my ($uint32) = @_;
+	my $lp_off = $uint32 & 0x7FFF;
+	my $lp_flags = ($uint32 >> 15) & 0x03;
+	my $lp_len = ($uint32 >> 17) & 0x7FFF;
+	return ($lp_off, $lp_flags, $lp_len);
+}
+
+sub _linepointers_pack
+{
+	my ($self) = @_;
+
+	# my $test = _linepointer_encode(5, 1, 7);
+	# my @test = _linepointer_decode($test);
+	# die "Encoded (5,1,7), got back @test";
+
+	my @linewords;
+	foreach my $linepointer (@{$self->{pagedata}->{linepointers}})
+	{
+		my $uint32 = _linepointer_encode($linepointer->{lp_off},
+										 $linepointer->{lp_flags},
+										 $linepointer->{lp_len});
+
+		# Debugging
+		my ($a, $b, $c) = _linepointer_decode($uint32);
+		die sprintf("linepointers did not round-trip: (%u,%u,%u) => (%u,%u,%u)",
+						$linepointer->{lp_off},
+						$linepointer->{lp_flags},
+						$linepointer->{lp_len},
+						$a, $b, $c)
+			unless($a == $linepointer->{lp_off} &&
+				   $b == $linepointer->{lp_flags} &&
+				   $c == $linepointer->{lp_len});
+
+		push (@linewords, $uint32);
+	}
+	my $wordlength = scalar(@linewords);
+	my $packed = pack("L[$wordlength]", @linewords);
+
+	return ($wordlength*4, $packed);
+}
+
+sub _linepointers_unpack
+{
+	my ($self, $packed, $bytelength) = @_;
+	my @linepointers;
+
+	die "_linepointers_unpack handed bytelength = $bytelength"
+		if ($bytelength % 4 != 0);
+
+	my $wordlength = $bytelength / 4;
+
+	my @linewords = unpack("L[$wordlength]", $packed);
+	foreach my $lineword (@linewords)
+	{
+		my ($lp_off, $lp_flags, $lp_len) = _linepointer_decode($lineword);
+		push (@linepointers, { lp_off => $lp_off,
+							   lp_flags => $lp_flags,
+							   lp_len => $lp_len });
+	}
+	$self->{pagedata}->{linepointers} = \@linepointers;
+}
+
+sub _init
+{
+	my ($self, %params) = @_;
+	my $classname = ref($self) || "HeapPage";
+
+	foreach my $param (qw(path pageno))
+	{
+		die "In $classname: Missing required parameter '$param'"
+			unless defined $params{$param};
+	}
+	$self->{path} = delete $params{path};
+	$self->{pagesize} = delete $params{pagesize};
+	$self->{pagesize} = DEFAULT_PAGESIZE unless(defined $self->{pagesize});
+	$self->{pageno} = delete $params{pageno};
+	$self->{mode} = delete $params{mode};
+	$self->{mode} = O_RDONLY unless defined $self->{mode};
+
+	validate_pagesize($self->{pagesize});
+	validate_pageno($self->{pageno});
+	$self->{mode} = validate_file_mode($self->{mode});
+
+	$self->{seekto} = $self->{pagesize} * $self->{pageno};
+
+	# Initialize all known fields.  These will get overwritten
+	# when we read the page from disk or copy it from our virtual argument
+	$self->{pagedata} =
+		{
+			pd_lsn_xlogid => 0,
+			pd_lsn_xrecoff => 0,
+			pd_checksum => 0,
+			pd_flags => 0,
+			pd_lower => 0,
+			pd_upper => 0,
+			pd_special => 0,
+			pd_pagesize_version => 0,
+			pd_prune_xid => 0,
+		};
+	$self->{tuples} = [];
+
+	if (exists $params{virtual})
+	{
+		$self->{pagedata}->{$_} = $params{virtual}->{$_}
+			for keys %{$params{virtual}};
+	}
+	else
+	{
+		my $fh = delete $params{fh};
+		die sprintf("In $classname: Unrecognized parameters: %s",
+					join(', ', sort keys %params))
+			if (scalar keys %params);
+		$self->_read($fh);
+	}
+}
+
+sub _read
+{
+	my ($self, $fh) = @_;
+
+	# If the caller handed us an open file handle, use that, otherwise open
+	# the file ourselves.
+	my $did_open = 0;
+	unless (defined $fh)
+	{
+		sysopen($fh, $self->{path}, O_RDONLY)
+			or die "Cannot open $self->{path} for reading: $!";
+		binmode($fh);
+		$did_open = 1;
+	}
+	my ($result, $rawdata, $zeropage, $incompletepage);
+	sysseek($fh, $self->{seekto}, SEEK_SET);
+	$result = sysread($fh, $rawdata, PAGEHEADER_PACK_LENGTH);
+
+	# We attempted to read the entire page, but we really only have trouble
+	# if we got less than the header.  This will change, as eventually
+	# we'll implement processing of the entire page.
+	if ($result > 0 && $result < PAGEHEADER_PACK_LENGTH)
+	{
+		warn("_read: Read partial page header: Expected " .
+				PAGEHEADER_PACK_LENGTH .
+				" bytes, but got $result");
+		$incompletepage = 1;
+	}
+	elsif ($result == 0)
+	{
+		warn("_read: Read beyond end of heap page file: simulating read " .
+			 "of an all-zero page");
+		$zeropage = 1;
+	}
+	elsif (! defined $result)
+	{
+		die "Cannot read from $self->{path}: $!";
+	}
+
+	# Unpack the page header fields into perl scalars
+	my @unpacked = $self->_pageheader_unpack($rawdata);
+
+	# Sanity-check and store the page header fields, filling in
+	# zeroes for empty or incomplete page reads
+	foreach my $key (@PageHeaderKeys)
+	{
+		my $value = shift @unpacked;
+		$value = 0 if ($zeropage);
+		die "HeapPage unpacked fewer header fields than expected"
+			unless (defined $value || $incompletepage);
+		$value = 0 unless defined $value;
+		$value &= $PageHeaderMask{$key}
+			if (exists $PageHeaderMask{$key});
+		validate($key, $value);
+		$self->{pagedata}->{$key} = $value;
+	}
+	die "HeapPage unpacked more header fields than expected"
+		if scalar @unpacked;
+
+	# Read the line pointers, if possible.  We need to handle corrupted pages,
+	# so be careful to only read them if the page header is sensible.  We read
+	# and store line pointers even if they are unreasonable; we only skip them
+	# if the page header would have us reading outside the bounds of the page.
+	my $lplen = $self->{pagedata}->{pd_lower} - PAGEHEADER_PACK_LENGTH;
+	if ($lplen < 0 || $lplen % 4 != 0)
+	{
+		warn "Corrupt header in page $self->{pageno}: Not reading line pointers";
+		$self->{pagedata}->{linepointers} = [];
+	}
+	elsif ($lplen == 0)
+	{
+		warn "Header in page $self->{pageno} shows zero line pointers";
+		$self->{pagedata}->{linepointers} = [];
+	}
+	else
+	{
+		sysseek($fh, $self->{seekto} + PAGEHEADER_PACK_LENGTH, SEEK_SET);
+		$result = sysread($fh, $rawdata, $lplen);
+		die "line pointer partial read" if ($result > 0 && $result < $lplen);
+		die "line pointer read error" if ($result == 0);
+		die "line pointer read error: $!" unless defined $result;
+		$self->_linepointers_unpack($rawdata, $lplen);
+
+		# Debugging
+		my ($test_bytelength, $test_packed) = $self->_linepointers_pack();
+		warn "test_bytelength != lplen ($test_bytelength != $lplen)"
+			if ($test_bytelength != $lplen);
+		warn "test_packed != rawdata" if ($test_packed ne $rawdata);
+	}
+
+	# Read the tuples, if possible.  We need to handle corrupted pages, so be
+	# careful to only read tuples that are within sensible bounds on the page.
+	# Because we were lax about the exact contents of the line pointers above,
+	# we have to be ready for the line pointers to be unreasonable.
+	if ($self->{pagedata}->{pd_lower} <= $self->{pagedata}->{pd_upper} &&
+		$self->{pagedata}->{pd_upper} <= $self->{pagedata}->{pd_special} &&
+		$self->{pagedata}->{pd_special} <= $self->{pagesize})
+	{
+		foreach my $linepointer (@{$self->{pagedata}->{linepointers}})
+		{
+			my $lp_flags = $linepointer->{lp_flags};
+			my $lp_off = $linepointer->{lp_off};
+			my $lp_len = $linepointer->{lp_len};
+			if ($lp_flags == LP_NORMAL)
+			{
+				if ($lp_off >= $self->{pagedata}->{pd_upper} &&
+					$lp_off + $lp_len <= $self->{pagedata}->{pd_special})
+				{
+					my %tuple;
+					sysseek($fh, $self->{seekto} + $lp_off, SEEK_SET);
+					tie %tuple, 'HeapTuple',
+								fh => $fh,
+								lp_len => $lp_len,
+								lp_off => $lp_off;
+					push (@{$self->{tuples}}, \%tuple);
+				}
+				else
+				{
+					carp ("Skipping tuple due to violation of invariant: " .
+							"lp_off >= pd_upper && lp_off + lp_len <= pd_special");
+				}
+			}
+		}
+	}
+	else
+	{
+		carp sprintf("Skipping tuples on page due to violation of invariant: " .
+					 "pd_lower %u <= pd_upper %u <= pd_special %u <= pagesize %u",
+						$self->{pagedata}->{pd_lower},
+						$self->{pagedata}->{pd_upper},
+						$self->{pagedata}->{pd_special},
+						$self->{pagesize},
+						);
+	}
+
+	close($fh) if $did_open;
+}
+
+sub _write
+{
+	my ($self) = @_;
+
+	die "Attempt to _write tied HeapPage that was tied in O_RDONLY mode"
+		if ($self->{mode} == O_RDONLY);
+
+	# Pack the page header fields into bytes to write to the heap page file
+	my $packed = $self->_pageheader_pack();
+
+	# Write the page header
+	my $fh;
+	sysopen($fh, $self->{path}, O_WRONLY)
+		or die "Cannot open $self->{path} for writing: $!";
+	binmode($fh);
+	sysseek($fh, $self->{seekto}, SEEK_SET);
+	syswrite($fh, $packed, PAGEHEADER_PACK_LENGTH);
+
+	# Write the line pointers
+	my $bytelength;
+	($bytelength, $packed) = $self->_linepointers_pack();
+	sysseek($fh, $self->{seekto} + PAGEHEADER_PACK_LENGTH, SEEK_SET);
+	syswrite($fh, $packed, $bytelength);
+
+	# Write the tuples
+	for my $tuple (@{$self->{tuples}})
+	{
+		# Our job to seek to the beginning of the page
+		sysseek($fh, $self->{seekto}, SEEK_SET);
+
+		# Tuple's job to seek to the tuple's lp_off
+		my $sub = $tuple->{SERIALIZATION_CLOSURE};
+		$sub->($fh);
+	}
+
+	close($fh);
+}
+
+sub validate
+{
+	my ($key, $value) = @_;
+	my $range = $PageHeaderRange{$key}
+		or die "Attempt to validate unrecognized key: $key";
+	die "Invalid value for $key: $value not in $range->[0] .. $range->[1]"
+		unless (integer_in_range($value, @$range));
+}
+
+# Constructor.  For inheritability, most work is performed in _init() rather
+# than here.  We also don't document this function in POD, as users should
+# be getting here through the tie interface, not through invoking new().
+#
+sub new
+{
+	my $thing = shift;
+	my $class = ref($thing) || $thing;
+	my $self = { };
+
+	bless $self, $class;
+	$self->_init(@_);
+
+	return $self;
+}
+
+=pod
+
+=head1 METHODS
+
+=over
+
+=item TIEHASH classname, OPTIONS
+
+The method invoked by the command "tie %hash, 'HeapPage'".
+
+OPTIONS is a list of name => value pairs, as follows:
+
+=over 4
+
+=item path
+
+Required.  The filesystem path (relative or absolute) to the heap file.
+Typically, this will be a path into a stopped PostgreSQL cluster's
+PGDATA directory, something like data/base/12922/3456.
+
+The specified file must exist, and the caller must have permission to
+open the file in the specified mode.  No facility is provided to create
+new heap files through this module.
+
+=item pageno
+
+Required.  The number of the page within the file.  Page numbers start
+at zero.  If the page is beyond the end of the file, an initially all-zero
+page will be vivified for the purpose.  If the file is being tied in O_RDONLY
+mode, the new page will be in-memory only and will not affect the file.  If,
+however, the file is being tied in O_RDWR mode, the new page will ultimately
+get written back out to the file, changing and extending the file.
+
+=item mode
+
+Optional.  The mode to use when opening the underlying heap file.  Defaults
+to O_RDONLY.  Valid options are a subset of the modes defined by Fcntl,
+specifically:
+
+=over 4
+
+=item O_RDONLY
+
+Ties the hash in read-only mode.  The underlying file will be read but not
+modified.
+
+=item O_RDWR
+
+Ties the hash in read-write mode.  The underlying file will be read, and
+any changes to the hash will be written back to the file.
+
+=back
+
+For callers who have not 'used' Fcntl, these modes can be specified as quoted
+strings rather than as numerical constants.  Both forms are accepted.
+
+=item pagesize
+
+Optional.  The page size used when PostgreSQL was configured.  Defaults
+to 8192 (8 kilobytes), which is the same as the default that PostgreSQL's
+configure program uses.  Using a page size other than the one the cluster
+was configured for may result in considerable confusion, and if opened
+in O_RDWR mode, may also cause corruption when the hash is modified or
+untied and consequently written back to disk.
+
+=back
+
+=cut
+
+sub TIEHASH
+{
+	my $classname = shift;
+	my $self = $classname->new(@_);
+}
+
+=pod
+
+=item UNTIE this
+
+The method invoked by the command "untie %hash".
+
+If the hash was tied in O_RDWR mode, untying the hash will result in the
+contents of the hash being written to the underlying heap file.
+
+=cut
+
+sub UNTIE
+{
+	my ($self) = @_;
+	$self->_write() if ($self->{mode} == O_RDWR);
+}
+
+=pod
+
+=item DESTROY this
+
+The method invoked when the tied hash gets garbage collected.
+
+UNTIE's the hash.
+
+=cut
+
+sub DESTROY
+{
+	my ($self) = @_;
+	$self->UNTIE();
+}
+
+=pod
+
+=item STORE this, key, value
+
+The method invoked when a field in the tied hash is written or modified.
+
+Checks the key against the list of valid page field names and the value
+against the supported range of the field, raising exceptions for invalid
+key/value pairs.  For valid pairs, the hash is updated and, if tied in
+O_RDWR mode, the file is written.
+
+=cut
+
+sub STORE
+{
+	my ($self, $key, $value) = @_;
+	die "Attempt to store unrecognized HeapPage key $key"
+		unless exists $PageKeys{$key};
+	die "cannot store undefined value for $key"
+		unless defined $value;
+
+	$value &= $PageHeaderMask{$key}
+		if (exists $PageHeaderMask{$key});
+	if (exists $PageHeaderRange{$key})
+	{
+		my ($min, $max) = @{$PageHeaderRange{$key}};
+		die "$key:$value not within supported range [$min..$max]"
+			unless integer_in_range($value, $min, $max);
+	}
+	$self->{pagedata}->{$key} = $value;
+	$self->_write() if ($self->{mode} == O_RDWR);
+}
+
+=pod
+
+=item FETCH this, key
+
+The method invoked when a field in the tied hash is read.
+
+Checks the key against the list of valid page field names and raises an
+exception for unrecognized fields.  For valid fields, returns the value last
+read from the file.  The file is not re-checked; alterations by other programs
+or processes will not immediately be noticed.
+
+=cut
+
+sub FETCH
+{
+	my ($self, $key) = @_;
+	return $self->{tuples} if ($key eq 'tuples');
+	die "Attempt to fetch unrecognized HeapPage key $key"
+		unless exists $PageKeys{$key};
+	return $self->{pagedata}->{$key};
+}
+
+=pod
+
+=item FIRSTKEY this
+
+The method invoked when beginning iteration over the hash keys as a result of
+calling 'keys' or 'each' on the tied hash.
+
+=cut
+
+sub FIRSTKEY
+{
+	my ($self) = @_;
+	$self->{keys} = [@PageKeys];	# Start new iteration
+	return $self->NEXTKEY();
+}
+
+=pod
+
+=item NEXTKEY this
+
+The method invoked when continuing iteration over the hash keys as a result of
+calling 'keys' or 'each' on the tied hash.
+
+=cut
+
+sub NEXTKEY
+{
+	my ($self) = @_;
+	return shift(@{$self->{keys}});
+}
+
+=pod
+
+=item EXISTS this, key
+
+The method invoked when checking existence of a key within the tied hash.
+
+The set of keys which exist is invariable, because it is defined by the format
+of PostgreSQL heap pages.
+
+=cut
+
+sub EXISTS
+{
+	my ($self, $key) = @_;
+	return exists $PageKeys{$key};
+}
+
+=pod
+
+=item DELETE this, key
+
+The method invoked when deleting a key from a tied hash.
+
+This method always raises an exception, as the set of keys is static and
+unalterable, governed by the fixed format of PostgreSQL heap pages.
+
+=cut
+
+sub DELETE
+{
+	my ($self, $key) = @_;
+	die "Attempt to delete unrecognized HeapPage key $key"
+		unless exists $PageKeys{$key};
+	die "Operation not supported: Cannot delete keys from HeapPages: $key";
+}
+
+=pod
+
+=item CLEAR this
+
+The method invoked when trying to delete all keys from a tied hash.
+
+This method unties and empties the hash.
+
+=cut
+
+sub CLEAR
+{
+	my ($self) = @_;
+	$self->UNTIE();
+	my @keys = keys %$self;
+	delete $self->{$_} for (@keys);
+}
+
+=pod
+
+=item SCALAR this
+
+The method invoked when evaluating the hash in scalar context.
+
+Returns a string containing a single line of human readable text representing the
+page header fields and values.
+
+=back
+
+=cut
+
+# These strings are intentionally all the same length
+our %lp_flags_str = (
+	LP_UNUSED()   => "UNUSED   ",
+	LP_NORMAL()   => "NORMAL   ",
+	LP_REDIRECT() => "REDIRECT ",
+	LP_DEAD()     => "DEAD     ",
+);
+
+sub SCALAR
+{
+	my ($self) = @_;
+	my @lines;
+	push (@lines, sprintf("PAGE:%u pd_lsn.xlogid:%u pd_lsn.xrecoff:%u " .
+							"pd_checksum:%u pd_flags:%u pd_lower:%u " .
+							"pd_upper:%u pd_special:%u " .
+							"pd_pagesize_version:%u pd_prune_xid:%u",
+					$self->{pageno},
+					$self->{pagedata}->{pd_lsn_xlogid},
+					$self->{pagedata}->{pd_lsn_xrecoff},
+					$self->{pagedata}->{pd_checksum},
+					$self->{pagedata}->{pd_flags},
+					$self->{pagedata}->{pd_lower},
+					$self->{pagedata}->{pd_upper},
+					$self->{pagedata}->{pd_special},
+					$self->{pagedata}->{pd_pagesize_version},
+					$self->{pagedata}->{pd_prune_xid}));
+	foreach my $lineptr (@{$self->{pagedata}->{linepointers}})
+	{
+		push (@lines, sprintf("    [%s lp_off:% 5u lp_len:% 5u]",
+					$lp_flags_str{$lineptr->{lp_flags}},
+					map { $lineptr->{$_} } qw(lp_off lp_len)));
+	}
+	foreach my $tuple (@{$self->{tuples}})
+	{
+		push (@lines, scalar(%$tuple));
+	}
+	return join("\n", @lines);
+}
+
+1;
diff --git a/src/test/perl/HeapTuple.pm b/src/test/perl/HeapTuple.pm
new file mode 100644
index 0000000000..1e2d9f2c7e
--- /dev/null
+++ b/src/test/perl/HeapTuple.pm
@@ -0,0 +1,1181 @@
+
+=pod
+
+=head1 NAME
+
+HeapTuple - perl module for representing the contents of a heap tuple.
+
+=head1 SYNOPSIS
+
+  use HeapFile;
+  use HeapPage;
+  use HeapTuple;
+
+  # Use the HeapFile interface in user code
+  tie @file, 'HeapFile',
+    path => "/path/to/pgdata/heap/file",
+    pagesize => 8192,
+    mode => 'O_RDWR';
+
+  # Iterate over all pages in the heap file, changing the 't_xmin"
+  # field:
+  for my $page (@file)
+  {
+    for my $tuple (@{$page->{tuples}})
+    {
+      print "OLD_XMIN: ", $tuple->{t_xmin}, "\n";
+      $tuple->{t_xmin}++;
+      print "NEW_XMIN: ", $tuple->{t_xmin}, "\n";
+    }
+  }
+
+  # There was a little magic above, as the tuple hash does not have
+  # a "t_xmin" field.  If you iterated over keys(%$tuple), you
+  # wouldn't encounter any t_xmin.  The structure looks like this:
+  #
+  #   t_choice => {
+  #     t_heap => {
+  #       t_xmin => UINT32,
+  #       t_xmax => UINT32,
+  #       t_field3 => UINT32,
+  #     },
+  #   },
+  #   t_ctid => {
+  #     ip_blkid => {
+  #       bi_hi => UINT16,
+  #       bi_lo => UINT16,
+  #     },
+  #     ip_posid => UINT16,
+  #   },
+  #   t_infomask2 => UINT16,
+  #   t_infomask => UINT16
+  #   t_hoff => UINT8,
+  #
+  # But since this is a tied hash, it can do things ordinary hashes
+  # cannot, and in this instance it read and updated
+  # $tuple->{t_choice}->{t_heap}->{t_xmin} as a convenience.
+  #
+  # We can also update bit fields by name, using the naming
+  # conventions from htup_details.h.  The appropriate field for the
+  # name will be read or written automatically:
+
+  my $a = $heapfile_a[17]->{tuples}->[3];
+  my $b = $heapfile_b[17]->{tuples}->[17];
+
+  # Only copy the HEAP_HASNULL bit, not the whole t_infomask:
+  $a->{HEAP_HASNULL} = $b->{HEAP_HASNULL};
+
+  # Copy all the bits of t_infomask2 masked by HEAP_NATTS_MASK
+  $a->{HEAP_NATTS_MASK} = $b->{HEAP_NATTS_MASK};
+
+  # Copy t_heap field in its entirety:
+  $a->{t_heap} = $b->{t_heap};
+
+  # Read the body of the tuple (the bytes starting from t_hoff):
+  my @octets = $a->{PAYLOAD_CHR};
+
+  # Same thing, but as hex
+  my @hex = $a->{PAYLOAD_HEX};
+
+=head1 DESCRIPTION
+
+Each tuple in a HeapPage is represented as a HeapTuple tied hash.
+
+The tied hash is a rigid structure, refusing to allow fields to be
+added or removed, but allowing modifications within reason.
+Attempts to set a leaf field such as t_xmin to a value too large for
+the number of bits reserved for the field result in truncation of the
+value, while out-of-range values for fields such as t_infomask or
+t_hoff raise an exception.
+Tuples cannot be moved from one HeapPage to another (whether in the
+same file or a different file), nor can they be relocated on the
+same page.  Values from one tuple may be assigned to another tuple,
+of course, though the assignment is by value, and does not affect
+the original tuple.
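+
+For instance (a hypothetical snippet; $tuple is assumed to come from a
+tied HeapPage as in the SYNOPSIS):
+
+  $tuple->{t_xmin} = 2**32 + 5;   # stored as 5 (value & 0xFFFFFFFF on a 64-bit perl)
+  $tuple->{t_infomask} = 2**16;   # dies: t_infomask value 65536 out of bounds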
+
+=head1 LIMITATIONS
+
+The payload of the tuple can be read, but the structure of the data
+is unknown, so it comes back as octets or hexadecimal characters, at
+your option.  It cannot be written.
+
+This limitation may be eased in the future.
+
+=cut
+
+package HeapTuple;
+
+use strict;
+use warnings;
+use Tie::Hash;
+use Data::Dumper;
+use Fcntl qw(SEEK_SET SEEK_CUR SEEK_END O_RDONLY O_RDWR);
+use POSIX;
+use Carp;
+
+our @ISA = qw(Tie::StdHash);
+
+use constant SIZE_UINT8 => 1;
+use constant MIN_UINT8 => 0;
+use constant MAX_UINT8 => 2**8 - 1;
+
+use constant SIZE_UINT16 => 2;
+use constant MIN_UINT16 => 0;
+use constant MAX_UINT16 => 2**16 - 1;
+
+use constant SIZE_UINT32 => 4;
+use constant MIN_UINT32 => 0;
+use constant MAX_UINT32 => 2**32 - 1;
+
+use constant MaxTupleAttributeNumber => 1664;	# From access/htup_details.h
+use constant MaxHeapAttributeNumber => 1600;	# From access/htup_details.h
+
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_HASVARWIDTH => 0x0002;
+use constant HEAP_HASEXTERNAL => 0x0004;
+use constant HEAP_HASOID_OLD => 0x0008;
+use constant HEAP_XMAX_KEYSHR_LOCK => 0x0010;
+use constant HEAP_COMBOCID => 0x0020;
+use constant HEAP_XMAX_EXCL_LOCK => 0x0040;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_UPDATED => 0x2000;
+use constant HEAP_MOVED_OFF => 0x4000;
+use constant HEAP_MOVED_IN => 0x8000;
+
+use constant HEAP_NATTS_MASK => 0x07FF;			# From access/htup_details.h
+use constant HEAP_KEYS_UPDATED => 0x2000;		# From access/htup_details.h
+use constant HEAP_HOT_UPDATED => 0x4000;		# From access/htup_details.h
+use constant HEAP_ONLY_TUPLE =>  0x8000;		# From access/htup_details.h
+
+#define HEAP2_XACT_MASK         0xE000  /* visibility-related bits */
+
+
+our @TupleKeys = qw(t_choice t_ctid t_infomask2 t_infomask t_hoff t_bits);
+our %TupleKeys = map { $_ => 1 } @TupleKeys;
+
+# Hard-code some information about the HeapTupleHeaderData struct.  We need to
+# read and write binary copies of this, using perl's pack() and unpack()
+# functions, but at least we can avoid scattering our assumptions about the
+# format of the HeapTupleHeaderData structure throughout the module and instead
+# declare them here in one place.
+#
+# t_choice:		union
+#	t_heap:		struct
+#		t_xmin:			uint32		=> L (Unsigned 32-bit Integer)	/ 4 bytes
+#		t_xmax:			uint32		=> L (Unsigned 32-bit Integer)	/ 4 bytes
+#		t_field3:	union
+#			t_cid:		uint32		=> L (Unsigned 32-bit Integer)	/ 4 bytes
+#		  OR
+#			t_xvac:		uint32		=>				ditto
+# t_ctid:		struct
+#	ip_blkid:	struct
+#		bi_hi			uint16		=> S (Unsigned 16-bit Short)	/ 2 bytes
+#		bi_lo			uint16		=> S (Unsigned 16-bit Short)	/ 2 bytes
+#	ip_posid:			uint16		=> S (Unsigned 16-bit Short)	/ 2 bytes
+# t_infomask2:			uint16		=> S (Unsigned 16-bit Short)	/ 2 bytes
+# t_infomask:			uint16		=> S (Unsigned 16-bit Short)	/ 2 bytes
+# t_hoff:				uint8		=> C (Unsigned  8-bit Octet)	/ 1 byte
+#                                                                  ----------
+#                                                         Total      23 bytes
+#
+use constant HEAPTUPLEHEADER_PACK_CODE => 'LLLSSSSSC';
+use constant HEAPTUPLEHEADER_PACK_LENGTH => 23;		# Total size
+
+sub integer_in_range
+{
+	my ($int, $min, $max) = @_;
+	return (defined $int && $int =~ m/^\d+$/ && $int >= $min && $int <= $max);
+}
+
+# Examines the (nested) structure of two hash references to see that their
+# hash structures are the same.  Ignores scalars and array references, but
+# recurses on hash reference values.
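+#
+# A hypothetical illustration (not used by the module itself):
+#
+#   hash_has_format({ a => 1, b => { c => 2 } },
+#                   { a => [0, 10], b => { c => [0, 10] } })
+#
+# returns true, because both arguments have the same nested hash keys;
+# scalar and array-reference values are not themselves compared.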
+sub hash_has_format
+{
+	my ($href, $format) = @_;
+
+	return unless (defined $href && ref $href && ref($href) =~ m/HASH/ &&
+					defined $format && ref $format && ref($format) =~ m/HASH/);
+
+	my $a = join(' ', sort keys %$href);
+	my $b = join(' ', sort keys %$format);
+
+	return unless ($a eq $b);
+
+	foreach my $key (keys %$href)
+	{
+		my $a = $href->{$key};
+		my $b = $format->{$key};
+
+		if (defined($a) && ref($a) && ref($a) =~ m/HASH/)
+		{
+			# Both sides must be hash references; recurse on this field
+			return unless (defined($b) && ref($b) && ref($b) =~ m/HASH/);
+			return unless hash_has_format($a, $b);
+			next;
+		}
+		# A hash reference in the format must be matched by one in the data
+		return if (defined($b) && ref($b) && ref($b) =~ m/HASH/);
+	}
+	return 1;
+}
+
+# Called by new() when a hash is tied
+sub _init
+{
+	my ($self, %params) = @_;
+	my $classname = ref($self) || "HeapTuple";
+
+	my $fh = $params{fh};
+	my ($lp_len, $lp_off) =
+		map {
+			croak "missing required parameter '$_' in HeapTuple::TIE"
+				unless exists $params{$_};
+			croak "parameter '$_' must be defined in HeapTuple::TIE"
+				unless defined $params{$_};
+			croak "parameter '$_' must be a non-negative integer " .
+				  "in HeapTuple::TIE"
+				unless $params{$_} =~ m/^\d+$/;
+			$params{$_}
+		} qw(lp_len lp_off);
+
+	croak "tuple lp_len too short for tuple header: lp_len:$lp_len " .
+		 "in HeapTuple::TIE"
+		if ($lp_len < HEAPTUPLEHEADER_PACK_LENGTH);
+	croak "lp_off out of bounds: $lp_off in HeapTuple::TIE"
+		if ($lp_off < 0);
+
+	# On-disk tuples get tied with a filehandle that is already
+	# opened and seek'ed to the right location.  The caller gives
+	# us an lp_len so we know how much to read, and an lp_off
+	# that we just store for later.
+	if (defined $params{fh})
+	{
+		$self->{lp_len} = $lp_len;
+		$self->{lp_off} = $lp_off;
+		$self->readfh($params{fh}, $params{lp_len});
+		return $self;
+	}
+
+	# In-memory tuples do not get tied with a filehandle, but they
+	# still have an lp_len and lp_off for us to track.  We mock up
+	# a zero heap tuple of the given length.  We use HeapTuple::read
+	# to do the initialization, so that we don't get subtly different
+	# behavior from what read() would do.
+	my $packstr = "C[$lp_len]";
+	my @packarray = map { 0 } (1..$lp_len);
+	my $ZEROS = pack($packstr, @packarray);
+	$self->read($ZEROS, $ZEROS, 0);
+	return $self;
+}
+
+# Constructor.  For inheritability, most work is performed in _init() rather
+# than here.  We also don't document this function in POD, as users should
+# be getting here through the tie interface, not through invoking new().
+#
+sub new
+{
+	my $thing = shift;
+	my $class = ref($thing) || $thing;
+	my $self = { };
+
+	bless $self, $class;
+	$self->_init(@_);
+
+	return $self;
+}
+
+# Wrapper around our read() method, below, for callers who have an open
+# filehandle for reading.
+sub readfh {
+	my ($self, $fh, $lp_len) = @_;
+
+	# read the header
+	my $header;
+	my $result = sysread($fh, $header, HEAPTUPLEHEADER_PACK_LENGTH);
+	if (! defined $result)
+	{
+		confess "Cannot read tuple header from file: $!";
+	}
+	elsif ($result < HEAPTUPLEHEADER_PACK_LENGTH)
+	{
+		warn("Read partial tuple header: Expected " .
+				HEAPTUPLEHEADER_PACK_LENGTH .
+				" bytes, but got $result");
+	}
+
+	# read the body
+	my $body;
+	my $bodylen = $lp_len - HEAPTUPLEHEADER_PACK_LENGTH;
+	$result = sysread($fh, $body, $bodylen);
+	if (! defined $result)
+	{
+		confess "Cannot read tuple body from file: $!";
+	}
+	elsif ($result < $bodylen)
+	{
+		warn("Read partial tuple body: Expected $bodylen bytes, " .
+			 "but got $result");
+	}
+	$self->read($header, $body, $bodylen);
+}
+
+# Reads the data for a tuple from a packed string, such as would come from
+# reading a HeapTuple directly from a heap file, but no assumptions about the
+# origin of the packed data are made.
+sub read {
+	my ($self, $header, $body, $bodylen) = @_;
+
+	# Purge any pre-existing data
+	delete $self->{tuple};
+
+	# Save the packed data for output
+	$self->{packed_header} = $header;
+	$self->{packed_body} = $body;
+	$self->{packed_bodylen} = $bodylen;
+
+	# Read the HeapTupleHeader
+	my @unpacked = unpack(HEAPTUPLEHEADER_PACK_CODE, $header);
+	my ($t_xmin, $t_xmax, $t_field3, $bi_hi, $bi_lo,
+		$ip_posid, $t_infomask2, $t_infomask, $t_hoff) = @unpacked;
+
+	# Interpret the packed data as a HeapTuple, not a DatumTuple
+	$self->{tuple} = {
+		t_choice => {
+			t_heap => {
+				t_xmin => $t_xmin,
+				t_xmax => $t_xmax,
+				t_field3 => $t_field3,
+			},
+		},
+		t_ctid => {
+			ip_blkid => {
+				bi_hi => $bi_hi,
+				bi_lo => $bi_lo,
+			},
+			ip_posid => $ip_posid,
+		},
+		t_infomask2 => $t_infomask2,
+		t_infomask => $t_infomask,
+		t_hoff => $t_hoff,
+	};
+}
+
+# Pack the header into the binary file format appropriate for writing back to
+# the heap file.
+sub repack {
+	my ($self) = @_;
+	$self->{packed_header} =
+		pack(HEAPTUPLEHEADER_PACK_CODE,
+				$self->{tuple}->{t_choice}->{t_heap}->{t_xmin},
+				$self->{tuple}->{t_choice}->{t_heap}->{t_xmax},
+				$self->{tuple}->{t_choice}->{t_heap}->{t_field3},
+				$self->{tuple}->{t_ctid}->{ip_blkid}->{bi_hi},
+				$self->{tuple}->{t_ctid}->{ip_blkid}->{bi_lo},
+				$self->{tuple}->{t_ctid}->{ip_posid},
+				$self->{tuple}->{t_infomask2},
+				$self->{tuple}->{t_infomask},
+				$self->{tuple}->{t_hoff});
+}
+
+# Write our packed data to an open filehandle provided by the caller.  The
+# caller is responsible for seeking to the beginning of the page on which we are
+# to write the data, and we are responsible for writing it to the appropriate
+# offset into that page.
+sub writefh
+{
+	my ($self, $fh) = @_;
+	my $lp_len = $self->{lp_len};
+	my $lp_off = $self->{lp_off};
+	$self->repack();
+
+	# Sanity check
+	my $packsize = $self->{packed_bodylen} + HEAPTUPLEHEADER_PACK_LENGTH;
+	confess "Attempt to write tuple of size $packsize when $lp_len expected"
+		unless ($packsize == $lp_len);
+
+	# The caller should have seek'ed to the beginning of the page,
+	# but it's our job to seek to our lp_off within the page
+	my $startpos = sysseek($fh, $lp_off, SEEK_CUR);
+
+	my $written = syswrite($fh, $self->{packed_header}, HEAPTUPLEHEADER_PACK_LENGTH);
+	confess "syswrite failed while writing tuple header: $!"
+		unless defined $written;
+	confess "syswrite wrote fewer tuple header bytes than expected"
+		unless ($written == HEAPTUPLEHEADER_PACK_LENGTH);
+	my $newpos = sysseek($fh, 0, SEEK_CUR);
+	confess "syswrite did not advance fh as expected for header"
+		unless ($newpos == $startpos + HEAPTUPLEHEADER_PACK_LENGTH);
+
+	$written = syswrite($fh, $self->{packed_body}, $self->{packed_bodylen});
+	confess "syswrite failed while writing tuple body: $!"
+		unless defined $written;
+	confess "syswrite wrote fewer tuple body bytes than expected"
+		unless ($written == $self->{packed_bodylen});
+	$newpos = sysseek($fh, 0, SEEK_CUR);
+	confess "syswrite did not advance fh as expected for body"
+		unless ($newpos == $startpos + $lp_len);
+}
+
+#
+# Public methods provided through the "tie" interface begin here
+#
+
+=head1 METHODS
+
+=over
+
+=item TIEHASH classname
+
+The method invoked by the command "tie %hash, 'HeapTuple'".
+
+Don't do this yourself.  Use HeapPage tied hashes, or better yet,
+HeapFile tied arrays.  Tying a HeapTuple directly is difficult,
+error prone, and uses an interface that may change in the future.
+
+=cut
+
+sub TIEHASH
+{
+	my $classname = shift;
+	my $self = $classname->new(@_);
+}
+
+=pod
+
+=item UNTIE this
+
+The method invoked by the command "untie %hash".
+
+Don't do this yourself.  HeapTuples belong to the tied HeapPage.
+
+=cut
+
+sub UNTIE
+{
+	my ($self) = @_;
+}
+
+=item STORE this, key, value
+
+The method invoked when a field in the tied hash is written or modified.
+
+Checks the key against the list of valid tuple field names and the value
+against the supported range of the field, raising exceptions for invalid
+key/value pairs.  For valid pairs, the hash is updated.
+
+There are many "magical" keys supported.  See FETCH for details.
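+
+For example (a hypothetical snippet; $tuple is assumed to come from a
+tied HeapPage as in the SYNOPSIS):
+
+  # Set a single t_infomask bit by name; the other bits are unchanged
+  $tuple->{HEAP_XMAX_INVALID} = 1;
+
+  # Store an attribute count into the HEAP_NATTS_MASK bits of t_infomask2
+  $tuple->{HEAP_NATTS_MASK} = 3;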
+
+=cut
+
+sub STORE
+{
+	my ($self, $key, $value) = @_;
+
+	if ($key eq 't_choice')
+	{
+		if (hash_has_format(
+					$value, {
+						t_heap => {
+							t_xmin => [MIN_UINT32, MAX_UINT32],
+							t_xmax => [MIN_UINT32, MAX_UINT32],
+							t_field3 => [MIN_UINT32, MAX_UINT32],
+						},
+					}) ||
+			hash_has_format(
+					$value, {
+						t_datum => {
+							datum_len_ => [MIN_UINT32, MAX_UINT32],
+							datum_typmod => [MIN_UINT32, MAX_UINT32],
+							datum_typeid => [MIN_UINT32, MAX_UINT32],
+						}
+					}))
+		{
+			$self->{tuple}->{t_choice} = $value;
+		}
+		else
+		{
+			my $valdump = Dumper($value);
+			confess "t_choice value wrong format or values out of bounds: $valdump";
+		}
+	}
+	elsif ($key eq 't_ctid')
+	{
+		if (hash_has_format(
+					$value, {
+						ip_blkid => {
+							bi_hi => [MIN_UINT16, MAX_UINT16],
+							bi_lo => [MIN_UINT16, MAX_UINT16],
+						},
+						ip_posid => [MIN_UINT16, MAX_UINT16],
+					}))
+		{
+			$self->{tuple}->{t_ctid} = $value;
+		}
+		else
+		{
+			my $valdump = Dumper($value);
+			confess "t_ctid value wrong format or values out of bounds: $valdump";
+		}
+	}
+	elsif ($key eq 't_infomask2')
+	{
+		confess "t_infomask2 value $value out of bounds"
+			unless(integer_in_range($value, MIN_UINT16, MAX_UINT16));
+		$self->{tuple}->{t_infomask2} = $value;
+	}
+	elsif ($key eq 't_infomask')
+	{
+		confess "t_infomask value $value out of bounds"
+			unless(integer_in_range($value, MIN_UINT16, MAX_UINT16));
+		$self->{tuple}->{t_infomask} = $value;
+	}
+	elsif ($key eq 't_hoff')
+	{
+		confess "t_hoff value $value out of bounds"
+			unless(integer_in_range($value, MIN_UINT8, MAX_UINT8));
+		$self->{tuple}->{t_hoff} = $value;
+	}
+	elsif ($key eq 't_bits')
+	{
+		croak "Not yet implemented: Cannot overwrite t_bits in HeapTuple";
+	}
+
+	# Allow setting/clearing t_infomask bit fields by name
+	elsif ($key eq 'HEAP_HASNULL')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_HASNULL)) |
+			($value ? HEAP_HASNULL : 0);
+	}
+	elsif ($key eq 'HEAP_HASVARWIDTH')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_HASVARWIDTH)) |
+			($value ? HEAP_HASVARWIDTH : 0);
+	}
+	elsif ($key eq 'HEAP_HASEXTERNAL')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_HASEXTERNAL)) |
+			($value ? HEAP_HASEXTERNAL : 0);
+	}
+	elsif ($key eq 'HEAP_HASOID_OLD')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_HASOID_OLD)) |
+			($value ? HEAP_HASOID_OLD : 0);
+	}
+	elsif ($key eq 'HEAP_XMAX_KEYSHR_LOCK')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_XMAX_KEYSHR_LOCK)) |
+			($value ? HEAP_XMAX_KEYSHR_LOCK : 0);
+	}
+	elsif ($key eq 'HEAP_COMBOCID')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_COMBOCID)) |
+			($value ? HEAP_COMBOCID : 0);
+	}
+	elsif ($key eq 'HEAP_XMAX_EXCL_LOCK')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_XMAX_EXCL_LOCK)) |
+			($value ? HEAP_XMAX_EXCL_LOCK : 0);
+	}
+	elsif ($key eq 'HEAP_XMAX_LOCK_ONLY')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_XMAX_LOCK_ONLY)) |
+			($value ? HEAP_XMAX_LOCK_ONLY : 0);
+	}
+	elsif ($key eq 'HEAP_XMIN_COMMITTED')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_XMIN_COMMITTED)) |
+			($value ? HEAP_XMIN_COMMITTED : 0);
+	}
+	elsif ($key eq 'HEAP_XMIN_INVALID')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_XMIN_INVALID)) |
+			($value ? HEAP_XMIN_INVALID : 0);
+	}
+	elsif ($key eq 'HEAP_XMAX_COMMITTED')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_XMAX_COMMITTED)) |
+			($value ? HEAP_XMAX_COMMITTED : 0);
+	}
+	elsif ($key eq 'HEAP_XMAX_INVALID')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_XMAX_INVALID)) |
+			($value ? HEAP_XMAX_INVALID : 0);
+	}
+	elsif ($key eq 'HEAP_XMAX_IS_MULTI')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_XMAX_IS_MULTI)) |
+			($value ? HEAP_XMAX_IS_MULTI : 0);
+	}
+	elsif ($key eq 'HEAP_UPDATED')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_UPDATED)) |
+			($value ? HEAP_UPDATED : 0);
+	}
+	elsif ($key eq 'HEAP_MOVED_OFF')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_MOVED_OFF)) |
+			($value ? HEAP_MOVED_OFF : 0);
+	}
+	elsif ($key eq 'HEAP_MOVED_IN')
+	{
+		$self->{tuple}->{t_infomask} =
+			($self->{tuple}->{t_infomask} & (~HEAP_MOVED_IN)) |
+			($value ? HEAP_MOVED_IN : 0);
+	}
+
+	# Allow storing t_infomask2 bit fields by name
+	elsif ($key eq 'HEAP_NATTS_MASK')
+	{
+		$self->{tuple}->{t_infomask2} =
+			($self->{tuple}->{t_infomask2} & (~HEAP_NATTS_MASK)) |
+			($value & HEAP_NATTS_MASK);
+	}
+	elsif ($key eq 'HEAP_KEYS_UPDATED')
+	{
+		$self->{tuple}->{t_infomask2} =
+			($self->{tuple}->{t_infomask2} & (~HEAP_KEYS_UPDATED)) |
+			($value ? HEAP_KEYS_UPDATED : 0);
+	}
+	elsif ($key eq 'HEAP_HOT_UPDATED')
+	{
+		$self->{tuple}->{t_infomask2} =
+			($self->{tuple}->{t_infomask2} & (~HEAP_HOT_UPDATED)) |
+			($value ? HEAP_HOT_UPDATED : 0);
+	}
+	elsif ($key eq 'HEAP_ONLY_TUPLE')
+	{
+		$self->{tuple}->{t_infomask2} =
+			($self->{tuple}->{t_infomask2} & (~HEAP_ONLY_TUPLE)) |
+			($value ? HEAP_ONLY_TUPLE : 0);
+	}
+
+	# Allow storing leaf fields by name
+	elsif ($key eq 't_xmin')
+	{
+		($self->{tuple}->{t_choice}->{t_heap}->{t_xmin} = $value & 0xFFFFFFFF);
+	}
+	elsif ($key eq 't_xmax')
+	{
+		($self->{tuple}->{t_choice}->{t_heap}->{t_xmax} = $value & 0xFFFFFFFF);
+	}
+	elsif ($key eq 't_field3')
+	{
+		($self->{tuple}->{t_choice}->{t_heap}->{t_field3} = $value & 0xFFFFFFFF);
+	}
+	elsif ($key eq 'bi_hi')
+	{
+		($self->{tuple}->{t_ctid}->{ip_blkid}->{bi_hi} = $value & 0xFFFF);
+	}
+	elsif ($key eq 'bi_lo')
+	{
+		($self->{tuple}->{t_ctid}->{ip_blkid}->{bi_lo} = $value & 0xFFFF);
+	}
+	elsif ($key eq 'ip_posid')
+	{
+		($self->{tuple}->{t_ctid}->{ip_posid} = $value & 0xFFFF);
+	}
+	elsif ($key eq 'NULL_BITFIELD')
+	{
+		croak "Not yet implemented: Cannot overwrite NULL_BITFIELD in HeapTuple";
+	}
+	elsif ($key eq 'NULL_NIBBLEFIELD')
+	{
+		croak "Not yet implemented: Cannot overwrite NULL_NIBBLEFIELD in HeapTuple";
+	}
+	elsif ($key eq 'OID_OLD')
+	{
+		croak "Not yet implemented: Cannot overwrite OID_OLD in HeapTuple";
+	}
+
+	else
+	{
+		confess "Unrecognized heap tuple key: $key";
+	}
+}
+
+=pod
+
+=item FETCH this, key
+
+The method invoked when a field in the tied hash is read.
+
+Checks the key against the list of valid tuple field names and raises an
+exception for unrecognized fields.  For valid fields, returns the value last
+associated with the field name.
+
+Valid keys are those returned by keys(), but also a number of "magical"
+keys for direct access to subfields and even bitmasks within those
+subfields.  (A short usage example follows the lists below.)
+
+The "magical" bit fields take and return boolean values, not bits.
+
+"Magical" bit fields indexing into t_infomask:
+
+=over 4
+
+=item HEAP_HASNULL
+
+=item HEAP_HASVARWIDTH
+
+=item HEAP_HASEXTERNAL
+
+=item HEAP_HASOID_OLD
+
+=item HEAP_XMAX_KEYSHR_LOCK
+
+=item HEAP_COMBOCID
+
+=item HEAP_XMAX_EXCL_LOCK
+
+=item HEAP_XMAX_LOCK_ONLY
+
+=item HEAP_XMIN_COMMITTED
+
+=item HEAP_XMIN_INVALID
+
+=item HEAP_XMAX_COMMITTED
+
+=item HEAP_XMAX_INVALID
+
+=item HEAP_XMAX_IS_MULTI
+
+=item HEAP_UPDATED
+
+=item HEAP_MOVED_OFF
+
+=item HEAP_MOVED_IN
+
+=back
+
+"Magical" bit fields indexing into t_infomask2:
+
+=over 4
+
+=item HEAP_KEYS_UPDATED
+
+=item HEAP_HOT_UPDATED
+
+=item HEAP_ONLY_TUPLE
+
+=back
+
+"Magical" multibit field indexing into t_infomask2:
+
+=over 4
+
+=item HEAP_NATTS_MASK
+
+The number of attributes, not a boolean but a number.
+
+=back
+
+Other "magical" fields:
+
+=over 4
+
+=item t_xmin
+
+=item t_xmax
+
+=item t_field3
+
+=item bi_hi
+
+=item bi_lo
+
+=item ip_posid
+
+=item t_hoff
+
+=item OID_OLD
+
+If HEAP_HASOID_OLD is true, returns the Oid.  This should only
+be true for rows written by an older version of postgres.
+
+=item NULL_BITFIELD
+
+The t_bits field expressed as a bit field, like "1110011".
+Note that zeros in the field represent nulls, and ones represent
+not-null values.  The ordering of the bits in the field matches
+the ordering of the attributes.
+
+=item NULL_NIBBLEFIELD
+
+The t_bits field expressed in hexadecimal nibbles, like "e7".
+Note that the nibble field may not look quite right when compared
+against the bit field, as the big-endian vs. little-endian
+distinction might make the bits appear flipped around relative
+to the hexadecimal nibbles.
+
+=back
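+
+A hypothetical usage example (assuming $tuple was obtained from a tied
+HeapPage, as in the SYNOPSIS):
+
+  if ($tuple->{HEAP_HASNULL})
+  {
+    # e.g. "1101" for a four-attribute row whose third attribute is null
+    print "null bitmap: ", $tuple->{NULL_BITFIELD}, "\n";
+  }
+  print "old oid: ", $tuple->{OID_OLD}, "\n" if $tuple->{HEAP_HASOID_OLD};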
+
+=cut
+
+sub FETCH
+{
+	my ($self, $key) = @_;
+
+	# Allow fetching tuple fields by name.  These are the only fields
+	# that a caller would see if they iterated over the keys of the hash
+	return $self->{tuple}->{$key}
+		if ($TupleKeys{$key});
+
+	# The rest of the keys we support are auto-magical.  These keys
+	# cannot be seen through keys() or each(), but we still accept them
+	# as short-cuts for the fields they name
+
+	# Allow fetching t_infomask bit fields by name
+	return (($self->{tuple}->{t_infomask} & HEAP_HASNULL) ? 1 : 0)
+		if ($key eq 'HEAP_HASNULL');
+	return (($self->{tuple}->{t_infomask} & HEAP_HASVARWIDTH) ? 1 : 0)
+		if ($key eq 'HEAP_HASVARWIDTH');
+	return (($self->{tuple}->{t_infomask} & HEAP_HASEXTERNAL) ? 1 : 0)
+		if ($key eq 'HEAP_HASEXTERNAL');
+	return (($self->{tuple}->{t_infomask} & HEAP_HASOID_OLD) ? 1 : 0)
+		if ($key eq 'HEAP_HASOID_OLD');
+	return (($self->{tuple}->{t_infomask} & HEAP_XMAX_KEYSHR_LOCK) ? 1 : 0)
+		if ($key eq 'HEAP_XMAX_KEYSHR_LOCK');
+	return (($self->{tuple}->{t_infomask} & HEAP_COMBOCID) ? 1 : 0)
+		if ($key eq 'HEAP_COMBOCID');
+	return (($self->{tuple}->{t_infomask} & HEAP_XMAX_EXCL_LOCK) ? 1 : 0)
+		if ($key eq 'HEAP_XMAX_EXCL_LOCK');
+	return (($self->{tuple}->{t_infomask} & HEAP_XMAX_LOCK_ONLY) ? 1 : 0)
+		if ($key eq 'HEAP_XMAX_LOCK_ONLY');
+	return (($self->{tuple}->{t_infomask} & HEAP_XMIN_COMMITTED) ? 1 : 0)
+		if ($key eq 'HEAP_XMIN_COMMITTED');
+	return (($self->{tuple}->{t_infomask} & HEAP_XMIN_INVALID) ? 1 : 0)
+		if ($key eq 'HEAP_XMIN_INVALID');
+	return (($self->{tuple}->{t_infomask} & HEAP_XMAX_COMMITTED) ? 1 : 0)
+		if ($key eq 'HEAP_XMAX_COMMITTED');
+	return (($self->{tuple}->{t_infomask} & HEAP_XMAX_INVALID) ? 1 : 0)
+		if ($key eq 'HEAP_XMAX_INVALID');
+	return (($self->{tuple}->{t_infomask} & HEAP_XMAX_IS_MULTI) ? 1 : 0)
+		if ($key eq 'HEAP_XMAX_IS_MULTI');
+	return (($self->{tuple}->{t_infomask} & HEAP_UPDATED) ? 1 : 0)
+		if ($key eq 'HEAP_UPDATED');
+	return (($self->{tuple}->{t_infomask} & HEAP_MOVED_OFF) ? 1 : 0)
+		if ($key eq 'HEAP_MOVED_OFF');
+	return (($self->{tuple}->{t_infomask} & HEAP_MOVED_IN) ? 1 : 0)
+		if ($key eq 'HEAP_MOVED_IN');
+
+	# Allow fetching t_infomask2 bit fields by name
+	return $self->{tuple}->{t_infomask2} & HEAP_NATTS_MASK
+		if ($key eq 'HEAP_NATTS_MASK');
+	return (($self->{tuple}->{t_infomask2} & HEAP_KEYS_UPDATED) ? 1 : 0)
+		if ($key eq 'HEAP_KEYS_UPDATED');
+	return (($self->{tuple}->{t_infomask2} & HEAP_HOT_UPDATED) ? 1 : 0)
+		if ($key eq 'HEAP_HOT_UPDATED');
+	return (($self->{tuple}->{t_infomask2} & HEAP_ONLY_TUPLE) ? 1 : 0)
+		if ($key eq 'HEAP_ONLY_TUPLE');
+
+	# Allow fetching leaf fields by name
+	return $self->{tuple}->{t_choice}->{t_heap}->{t_xmin}
+		if ($key eq 't_xmin');
+	return $self->{tuple}->{t_choice}->{t_heap}->{t_xmax}
+		if ($key eq 't_xmax');
+	return $self->{tuple}->{t_choice}->{t_heap}->{t_field3}
+		if ($key eq 't_field3');
+	return $self->{tuple}->{t_ctid}->{ip_blkid}->{bi_hi}
+		if ($key eq 'bi_hi');
+	return $self->{tuple}->{t_ctid}->{ip_blkid}->{bi_lo}
+		if ($key eq 'bi_lo');
+	return $self->{tuple}->{t_ctid}->{ip_posid}
+		if ($key eq 'ip_posid');
+
+	# Allow fetching the t_bits field as a is_notnull bit field
+	if ($key eq 'NULL_BITFIELD')
+	{
+		return '' unless ($self->FETCH('HEAP_HASNULL'));
+		my $bitlen = $self->FETCH('HEAP_NATTS_MASK');
+		return unpack("b[$bitlen]", $self->{packed_body});
+	}
+
+	# Allow fetching the t_bits field as is_notnull nibbles
+	if ($key eq 'NULL_NIBBLEFIELD')
+	{
+		return '' unless ($self->FETCH('HEAP_HASNULL'));
+		my $nibblelen = POSIX::ceil($self->FETCH('HEAP_NATTS_MASK') / 4);
+		return unpack("H[$nibblelen]", $self->{packed_body});
+	}
+
+	# Allow fetching the old oid
+	if ($key eq 'OID_OLD')
+	{
+		my ($offset, $oid, $oidhex);
+		return unless($self->FETCH('HEAP_HASOID_OLD'));
+
+		# t_hoff is the offset into the tuple, not the tuple body.  We
+		# need to know where to read the old oid from the packed body.
+		# Subtract the header length for that, then the length of the
+		# oid itself.
+		$offset = $self->FETCH('t_hoff') - (HEAPTUPLEHEADER_PACK_LENGTH + 4);
+		if ($offset < 0)
+		{
+			confess "Calculated negative offset when trying to read OID_OLD: $offset";
+			return;
+		}
+		($_, $oid) = unpack("C[$offset]L", $self->{packed_body});
+		return $oid;
+	}
+
+	# Allow fetching the body payload that comes after the t_hoff
+	if ($key eq 'PAYLOAD_CHR')
+	{
+		my $offset = $self->FETCH('t_hoff') - HEAPTUPLEHEADER_PACK_LENGTH;
+		my $bytelen = $self->{packed_bodylen} - $offset;
+		my $format = "C" x $bytelen;
+		my ($junk, @chr) = unpack("C[$offset]$format", $self->{packed_body});
+		return @chr;
+	}
+
+	if ($key eq 'PAYLOAD_HEX')
+	{
+		my $offset = $self->FETCH('t_hoff') - HEAPTUPLEHEADER_PACK_LENGTH;
+		my $bytelen = $self->{packed_bodylen} - $offset;
+		my $format = "H2" x $bytelen;
+		my ($junk, @hex) = unpack("C[$offset]$format", $self->{packed_body});
+		return @hex;
+	}
+
+	# PRIVATE: Allow fetching a closure that allows the caller to serialize this
+	# tuple.  This is for use by HeapPage, not end users
+	if ($key eq 'SERIALIZATION_CLOSURE')
+	{
+		return sub {
+			my $tuple = $self;
+			my $fh = shift;
+			return $self->writefh($fh);
+		}
+	}
+
+	confess "Unrecognized heap tuple key: $key";
+}
+
+=pod
+
+=item FIRSTKEY this
+
+The method invoked when beginning iteration over the hash keys as a
+result of calling 'keys' or 'each' on the tied hash.
+
+=cut
+
+sub FIRSTKEY
+{
+	my ($self) = @_;
+	$self->{keys} = [@TupleKeys];	# Start new iteration
+	return $self->NEXTKEY();
+}
+
+=pod
+
+=item NEXTKEY this
+
+The method invoked when continuing iteration over the hash keys as a
+result of calling 'keys' or 'each' on the tied hash.
+
+=cut
+
+sub NEXTKEY
+{
+	my ($self) = @_;
+	return shift(@{$self->{keys}});
+}
+
+=item EXISTS this, key
+
+The method invoked when checking existence of a key within the tied
+hash.
+
+The set of keys which exist is invariable, because it is defined by
+the format of PostgreSQL heap tuples.
+
+=cut
+
+sub EXISTS
+{
+	my ($self, $key) = @_;
+	return exists $TupleKeys{$key};
+}
+
+=pod
+
+=item DELETE this, key
+
+The method invoked when deleting a key from a tied hash.
+
+This method always raises an exception, as the set of keys is static
+and unalterable, governed by the fixed format of PostgreSQL heap
+tuples.
+
+=cut
+
+sub DELETE
+{
+	my ($self, $key) = @_;
+	confess "Attempt to delete unrecognized HeapTuple key $key"
+		unless exists $TupleKeys{$key};
+	confess "Operation not supported: Cannot delete keys from HeapTuples: $key";
+}
+
+=pod
+
+=item CLEAR this
+
+The method invoked when trying to delete all keys from a tied hash.
+
+This method unties and empties the hash.
+
+=cut
+
+sub CLEAR
+{
+	my ($self) = @_;
+	$self->UNTIE();
+	my @keys = keys %$self;
+	delete $self->{$_} for (@keys);
+}
+
+=item SCALAR this
+
+The method invoked when evaluating the hash in scalar context.
+
+Returns a string containing human readable text representing the
+heap tuple fields and values.  To facilitate debugging, the returned
+string shows some of the fields multiple times, under different
+field names, such as packed vs. unpacked values.
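+
+A hypothetical example (with $tuple coming from a tied HeapPage):
+
+  print scalar(%$tuple), "\n";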
+
+=cut
+
+sub SCALAR
+{
+	my ($self) = @_;
+	$self->repack();
+
+	my $template = "H[" . HEAPTUPLEHEADER_PACK_LENGTH*2 . "]";
+	my @h_hex = split(//, unpack($template, $self->{packed_header}));
+
+	confess sprintf("Do not have the expected number of hex characters: %s vs %s",
+			scalar(@h_hex), 2 * HEAPTUPLEHEADER_PACK_LENGTH)
+		unless (scalar(@h_hex) == 2 * HEAPTUPLEHEADER_PACK_LENGTH);
+
+	my $null_bits = $self->FETCH('NULL_BITFIELD');
+	my $null_nibbles = $self->FETCH('NULL_NIBBLEFIELD');
+	my $null_padding = ' ' x (15 - length($null_nibbles));
+
+	my $oid_old = $self->FETCH('OID_OLD');
+	$oid_old = "" unless defined $oid_old;
+
+	my @b_hex = $self->FETCH('PAYLOAD_HEX');
+	my @b_chr = map { chr($_) =~ /[[:print:]]/ ? chr($_) : '.' }
+					$self->FETCH('PAYLOAD_CHR');
+
+	# Make "body" footer
+	my @bodyrows = ("BODY AS HEX               ===>  PRINTABLE ASCII");
+	die sprintf("programmatic error: %u characters, %u hexadecimals",
+					scalar(@b_chr), scalar(@b_hex))
+		 unless (scalar(@b_hex) == scalar(@b_chr));
+
+	@b_hex = map { sprintf("%02x", $_) } @b_hex;
+	while (scalar(@b_chr))
+	{
+		my $h = join(' ', splice(@b_hex, 0, 8));
+		my $c = join(' ', splice(@b_chr, 0, 8));
+		my $padding = ' ' x (24 - length($h));
+		push (@bodyrows, join("$padding  ===>  ", $h, $c));
+	}
+
+	my $headerstr = sprintf(
+		"%s%s %s%s %s%s %s%s            t_xmin: %u\n" .
+		"%s%s %s%s %s%s %s%s            t_xmax: %u\n" .
+		"%s%s %s%s %s%s %s%s          t_field3: %u\n" .
+		"%s%s %s%s                   bi_hi: %u\n" .
+		"%s%s %s%s                   bi_lo: %u\n" .
+		"%s%s %s%s                ip_posid: %u\n" .
+		"%s%s %s%s             t_infomask2: %u\n" .
+		"                        Natts: %u\n" .
+		"            HEAP_KEYS_UPDATED: %s\n" .
+		"             HEAP_HOT_UPDATED: %s\n" .
+		"              HEAP_ONLY_TUPLE: %s\n" .
+		"%s%s %s%s              t_infomask: %u\n" .
+		"                 HEAP_HASNULL: %s\n" .
+		"             HEAP_HASVARWIDTH: %s\n" .
+		"             HEAP_HASEXTERNAL: %s\n" .
+		"              HEAP_HASOID_OLD: %s\n" .
+		"        HEAP_XMAX_KEYSHR_LOCK: %s\n" .
+		"                HEAP_COMBOCID: %s\n" .
+		"          HEAP_XMAX_EXCL_LOCK: %s\n" .
+		"          HEAP_XMAX_LOCK_ONLY: %s\n" .
+		"          HEAP_XMIN_COMMITTED: %s\n" .
+		"            HEAP_XMIN_INVALID: %s\n" .
+		"          HEAP_XMAX_COMMITTED: %s\n" .
+		"            HEAP_XMAX_INVALID: %s\n" .
+		"           HEAP_XMAX_IS_MULTI: %s\n" .
+		"                 HEAP_UPDATED: %s\n" .
+		"               HEAP_MOVED_OFF: %s\n" .
+		"                HEAP_MOVED_IN: %s\n" .
+		"%s%s                     t_hoff: %u\n" .
+		"${null_nibbles}${null_padding} NULL_BITFIELD: %s\n" .
+		"                      OID_OLD: %s\n" .
+		"",
+		@h_hex[0..7],   $self->FETCH('t_xmin'),
+		@h_hex[8..15],  $self->FETCH('t_xmax'),
+		@h_hex[16..23], $self->FETCH('t_field3'),
+		@h_hex[24..27], $self->FETCH('bi_hi'),
+		@h_hex[28..31], $self->FETCH('bi_lo'),
+		@h_hex[32..35], $self->FETCH('ip_posid'),
+		@h_hex[36..39], $self->FETCH('t_infomask2'),
+			$self->FETCH('HEAP_NATTS_MASK'),
+			$self->FETCH('HEAP_KEYS_UPDATED'),
+			$self->FETCH('HEAP_HOT_UPDATED'),
+			$self->FETCH('HEAP_ONLY_TUPLE'),
+		@h_hex[40..43], $self->FETCH('t_infomask'),
+			$self->FETCH('HEAP_HASNULL'),
+			$self->FETCH('HEAP_HASVARWIDTH'),
+			$self->FETCH('HEAP_HASEXTERNAL'),
+			$self->FETCH('HEAP_HASOID_OLD'),
+			$self->FETCH('HEAP_XMAX_KEYSHR_LOCK'),
+			$self->FETCH('HEAP_COMBOCID'),
+			$self->FETCH('HEAP_XMAX_EXCL_LOCK'),
+			$self->FETCH('HEAP_XMAX_LOCK_ONLY'),
+			$self->FETCH('HEAP_XMIN_COMMITTED'),
+			$self->FETCH('HEAP_XMIN_INVALID'),
+			$self->FETCH('HEAP_XMAX_COMMITTED'),
+			$self->FETCH('HEAP_XMAX_INVALID'),
+			$self->FETCH('HEAP_XMAX_IS_MULTI'),
+			$self->FETCH('HEAP_UPDATED'),
+			$self->FETCH('HEAP_MOVED_OFF'),
+			$self->FETCH('HEAP_MOVED_IN'),
+		@h_hex[44..45], $self->FETCH('t_hoff'),
+			$null_bits,
+			$oid_old,
+		);
+
+	return join("\n", $headerstr, @bodyrows);
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
-- 
2.21.1 (Apple Git-122.3)

#3Peter Geoghegan
pg@bowt.ie
In reply to: Mark Dilger (#1)
Re: Perl modules for testing/viewing/corrupting/repairing your heap files

On Wed, Apr 8, 2020 at 3:51 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

Recently, as part of testing something else, I had need of a tool to create
surgically precise corruption within heap pages. I wanted to make the
corruption from within TAP tests, so I wrote the tool as a set of perl modules.

There is also pg_hexedit:

https://github.com/petergeoghegan/pg_hexedit

--
Peter Geoghegan

#4Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Peter Geoghegan (#3)
Re: Perl modules for testing/viewing/corrupting/repairing your heap files

On Apr 14, 2020, at 6:17 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Apr 8, 2020 at 3:51 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

Recently, as part of testing something else, I had need of a tool to create
surgically precise corruption within heap pages. I wanted to make the
corruption from within TAP tests, so I wrote the tool as a set of perl modules.

There is also pg_hexedit:

https://github.com/petergeoghegan/pg_hexedit

I steered away from software released under the GPL, such as pg_hexedit, owing to difficulties in getting anything I develop accepted. (That's a hard enough problem without licensing issues.) I'm not taking a political stand for or against the GPL here, just a pragmatic position that I wouldn't be able to integrate pg_hexedit into a postgres submission.

(Thanks for writing pg_hexedit, BTW. I'm not criticizing it.)

The purpose of these perl modules is not the viewing of files, but the intentional and targeted corruption of files from within TAP tests. There are limited examples of tests in the postgres source tree that intentionally corrupt files, and as I read them, they employ a blunt force trauma approach:

In src/bin/pg_basebackup/t/010_pg_basebackup.pl:

# induce corruption
system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
open $file, '+<', "$pgdata/$file_corrupt1";
seek($file, $pageheader_size, 0);
syswrite($file, "\0\0\0\0\0\0\0\0\0");
close $file;
system_or_bail 'pg_ctl', '-D', $pgdata, 'start';

In src/bin/pg_checksums/t/002_actions.pl:

# Time to create some corruption
open my $file, '+<', "$pgdata/$file_corrupted";
seek($file, $pageheader_size, 0);
syswrite($file, "\0\0\0\0\0\0\0\0\0");
close $file;

These blunt force trauma tests are fine, as far as they go. But I wanted to be able to do things like

# Corrupt the tuple to look like it has lots of attributes, some of
# them null. This falsely creates the impression that the t_bits
# array is longer than just one byte, but t_hoff still says otherwise.
$tup->{HEAP_HASNULL} = 1;
$tup->{HEAP_NATTS_MASK} = 0x3FF;
$tup->{t_bits} = 0xAA;

or

# Same as above, but this time t_hoff plays along
$tup->{HEAP_HASNULL} = 1;
$tup->{HEAP_NATTS_MASK} = 0x3FF;
$tup->{t_bits} = 0xAA;
$tup->{t_hoff} = 32;

That's hard to do from a TAP test without modules like this, as you have to calculate by hand the offsets where you're going to write the corruption, and the bit pattern you are going to write to that location. Even if you do all that, nobody else is likely going to be able to read and maintain your tests.
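
For comparison, the hand-rolled equivalent would be something along these lines (the file offset and bit pattern below are made up for illustration, not computed from a real page):

open my $fh, '+<', $path or die "$path: $!";
# Suppose 0x21d8 were the hand-computed offset of the target tuple's
# t_infomask2 within the file, and 0x83ff the bit pattern to plant there.
seek($fh, 0x21d8, 0);
syswrite($fh, pack('S', 0x83ff), 2);
close $fh;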

I'd like an easy way from within TAP tests to selectively corrupt files, to test whether various parts of the system fail gracefully in the presence of corruption. What happens when a child partition is corrupted? Does that impact queries that only access other partitions? What kinds of corruption cause pg_upgrade to fail? ...to expand the scope of the corruption? What happens to logical replication when there is corruption on the primary? ...on the standby? What kinds of corruption cause a query to return data from neighboring tuples that the querying role does not have permission to view? What happens when a NAS is only intermittently corrupt?

The modules I've submitted thus far are incomplete for this purpose. They don't yet handle toast tables, btree, hash, gist, gin, fsm, or vm, and I might be forgetting a few other things in the list. Before I go and implement all of that, I thought perhaps others would express preferences about how this should all work, even stuff like, "Don't bother implementing that in perl, as I'm reimplementing the entire testing structure in COBOL", or similarly unexpected feedback.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In reply to: Mark Dilger (#4)
Re: Perl modules for testing/viewing/corrupting/repairing your heap files

On Wed, Apr 15, 2020 at 7:22 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

I steered away from software released under the GPL, such as pg_hexedit, owing to difficulties in getting anything I develop accepted. (That's a hard enough problem without licensing issues.) I'm not taking a political stand for or against the GPL here, just a pragmatic position that I wouldn't be able to integrate pg_hexedit into a postgres submission.

(Thanks for writing pg_hexedit, BTW. I'm not criticizing it.)

The only reason that pg_hexedit is under the GPL is that it's derived
from pg_filedump, which was and is also GPL 2. Note that pg_filedump
is hosted on community resources, and is something that index access
methods know about and try not to break (grep for pg_filedump in the
Postgres source code). pg_hexedit supports all index access methods
with the core distribution, including even the unpopular ones, like
SP-GiST.

That's hard to do from a TAP test without modules like this, as you have to calculate by hand the offsets where you're going to write the corruption, and the bit pattern you are going to write to that location. Even if you do all that, nobody else is likely going to be able to read and maintain your tests.

Logical corruption is almost inherently a once-off thing. I think that
a tool like pg_hexedit is useful for seeing how the system behaves
with certain novel kinds of logical corruption, which it will tolerate
to varying degrees and with diverse symptoms. Pretty much for
investigating on a once-off basis.

I have occasionally wished for an SQL-like interface to bufpage.c
routines like PageIndexTupleDelete(), PageRepairFragmentation(), etc.
That would probably be a great deal more maintainable than what you
propose to do. It's not really equivalent, of course, but it would
give tests a way to dynamically manipulate/damage pages at the
"logical level". That seems like the thing that's hard to simulate
right now.

--
Peter Geoghegan