New form of index "persistent reference"

Started by Nonamealmost 21 years ago5 messages
#1Noname
pgsql@mohawksoft.com

For about 5 years now, I have been using a text search engine that I wrote
and maintain.

In the beginning, I hacked up function mechanisms to return multiple value
sets and columns. Then PostgreSQL aded "setof" and it is was cool. Then it
was able to return a set of rows, which was even better.

Lately, I have been thinking that a cool form of index would be some sort
of "persistent reference" index. Like the old ISAM days of yore, a fixed
number could point you right to the row that you want. I'm not sure if the
"persistent reference" is a specific auto numbering column type or
separate index structure or both.

I asked the question how do you get a record without going through an
index, the answer was CTID, which unfortunately changes when the row is
updated.

Now, what I want to brainstorm is some sort of "persistent reference"
where the value is not algorithmically stored, maybe just an offset into a
table. The number of operations should be about 1 per lookup.

Imagine a dynamically growing array that has one slot per row. Every row
is considered unique. Rows which are updated, their CTID is updated in the
reference. (with vacuum?)

Imagine something like this:

create table foobar(id reference, name varchar, value varchar);

select * from foobar where id = 100;

The reference type has an implicit index that is basically a lookup table.
On unique references where the reference value is fairly arbitrary, this
would be a HUGE gain for direct lookups. There is no need for the NlogN of
a tree.

On the surface level, this would be a huge win for websites that use
semi-fixed tables of data.

#2Bort, Paul
pbort@tmwsystems.com
In reply to: Noname (#1)
Re: New form of index "persistent reference"

If that ID is the only thing you use to access that data, why not just store
it in a flat file with fixed-length records? seek() (or your language's
equivalent) is usually fast.

If you need to drive that from within PostgreSQL, you would need an
untrusted language to read the file, but you could also generate it from a
table using a trigger.

Or maybe use a serial column, an index on that column, and cluster the table
on that index. It's more than one lookup, but not much with a Btree index.
(Not sure if this is better than just using a serial and an index.
http://www.postgresql.org/docs/8.0/interactive/sql-cluster.html says it
isn't, if I read it correctly.)

Then anytime there is a batch of updates to the table, re-cluster it.

Show quoted text

-----Original Message-----
From: pgsql@mohawksoft.com [mailto:pgsql@mohawksoft.com]
Sent: Thursday, February 10, 2005 11:22 AM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] New form of index "persistent reference"

For about 5 years now, I have been using a text search engine
that I wrote
and maintain.

In the beginning, I hacked up function mechanisms to return
multiple value
sets and columns. Then PostgreSQL aded "setof" and it is was
cool. Then it
was able to return a set of rows, which was even better.

Lately, I have been thinking that a cool form of index would
be some sort
of "persistent reference" index. Like the old ISAM days of
yore, a fixed
number could point you right to the row that you want. I'm
not sure if the
"persistent reference" is a specific auto numbering column type or
separate index structure or both.

I asked the question how do you get a record without going through an
index, the answer was CTID, which unfortunately changes when
the row is
updated.

Now, what I want to brainstorm is some sort of "persistent reference"
where the value is not algorithmically stored, maybe just an
offset into a
table. The number of operations should be about 1 per lookup.

Imagine a dynamically growing array that has one slot per
row. Every row
is considered unique. Rows which are updated, their CTID is
updated in the
reference. (with vacuum?)

Imagine something like this:

create table foobar(id reference, name varchar, value varchar);

select * from foobar where id = 100;

The reference type has an implicit index that is basically a
lookup table.
On unique references where the reference value is fairly
arbitrary, this
would be a HUGE gain for direct lookups. There is no need for
the NlogN of
a tree.

On the surface level, this would be a huge win for websites that use
semi-fixed tables of data.

---------------------------(end of
broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to
majordomo@postgresql.org

#3Noname
pgsql@mohawksoft.com
In reply to: Bort, Paul (#2)
Re: New form of index "persistent reference"

If that ID is the only thing you use to access that data, why not just
store
it in a flat file with fixed-length records? seek() (or your language's
equivalent) is usually fast.

As a matter of policy, I would never manage data outside of the database.

If you need to drive that from within PostgreSQL, you would need an
untrusted language to read the file, but you could also generate it from a
table using a trigger.

Very ugly.

Or maybe use a serial column, an index on that column, and cluster the
table
on that index. It's more than one lookup, but not much with a Btree index.
(Not sure if this is better than just using a serial and an index.
http://www.postgresql.org/docs/8.0/interactive/sql-cluster.html says it
isn't, if I read it correctly.)

Clustering is OK, but it doesn't handle updates and additions until you
recluster the data.

If a static reference is all that is needed, then merely using CTID would
suffice. I was thinking a little overhead for a reference table would
allow it to hook into PostgreSQL and keep it up to date.

Then anytime there is a batch of updates to the table, re-cluster it.

Yea, like I said, there are easier ways of doing that with fairly static
data.

Show quoted text

-----Original Message-----
From: pgsql@mohawksoft.com [mailto:pgsql@mohawksoft.com]
Sent: Thursday, February 10, 2005 11:22 AM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] New form of index "persistent reference"

For about 5 years now, I have been using a text search engine
that I wrote
and maintain.

In the beginning, I hacked up function mechanisms to return
multiple value
sets and columns. Then PostgreSQL aded "setof" and it is was
cool. Then it
was able to return a set of rows, which was even better.

Lately, I have been thinking that a cool form of index would
be some sort
of "persistent reference" index. Like the old ISAM days of
yore, a fixed
number could point you right to the row that you want. I'm
not sure if the
"persistent reference" is a specific auto numbering column type or
separate index structure or both.

I asked the question how do you get a record without going through an
index, the answer was CTID, which unfortunately changes when
the row is
updated.

Now, what I want to brainstorm is some sort of "persistent reference"
where the value is not algorithmically stored, maybe just an
offset into a
table. The number of operations should be about 1 per lookup.

Imagine a dynamically growing array that has one slot per
row. Every row
is considered unique. Rows which are updated, their CTID is
updated in the
reference. (with vacuum?)

Imagine something like this:

create table foobar(id reference, name varchar, value varchar);

select * from foobar where id = 100;

The reference type has an implicit index that is basically a
lookup table.
On unique references where the reference value is fairly
arbitrary, this
would be a HUGE gain for direct lookups. There is no need for
the NlogN of
a tree.

On the surface level, this would be a huge win for websites that use
semi-fixed tables of data.

---------------------------(end of
broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to
majordomo@postgresql.org

#4Merlin Moncure
merlin.moncure@rcsonline.com
In reply to: Noname (#3)
Re: New form of index "persistent reference"

Lately, I have been thinking that a cool form of index would be some

sort

of "persistent reference" index. Like the old ISAM days of yore, a

fixed

number could point you right to the row that you want. I'm not sure if

the

"persistent reference" is a specific auto numbering column type or
separate index structure or both.

What you are talking about is a 'relative file'. It turns out on modern
ISAM file systems, the win you get over b-tree indexing is not worth
losing the ability to do simple things like run-length compression on
strings.

Anyways, while storing a physical offset is O(1), so is computing a
hash. How would a hash index not fill your need?

Merlin

#5Zeugswetter Andreas DAZ SD
ZeugswetterA@spardat.at
In reply to: Merlin Moncure (#4)
Re: New form of index "persistent reference"

I asked the question how do you get a record without going through an
index, the answer was CTID, which unfortunately changes when the row is
updated.

The ctid is a physical location of the row. On update a new tuple is written
in a new location, that is why the ctid changes. The old tuple has a system field
t_ctid which is then a forward pointer to the new tuple.

Thus you can follow that chain until the visible tuple is found.
The current tid = tid does not do that (I think because the ODBC driver
which was the first to use it (for result set modification) needed to notice when the
tuple was updated underneath).

But you can use:
select * from atab where ctid = currtid2('atab', '(0,1)'); -- '(0,1)' is the old ctid

Of course the old ctid is only valid until (auto)vacuum marks it free.
Without vacuum you are currently safe to use the currtid functions.

Andreas