PG index architecture

Started by Andy Colsonabout 12 years ago6 messagesgeneral

andy@squeakycode.net

about 12 years ago

I was thinking about indexes, and am kinda curious about sequential access.

I know nothing of PG guts, so this might even be a dumb question.

As I understand indexes, they are a key value pair, that contain a value
and a position. You lookup the value then use the position to seek into
the database to load the record.

Do we, or could we, load all the the matching index records, then sort
them by position? (maybe not all, maybe large batches)

When loading from the database, if access was slightly more sequential
(vs very random), would it increase performance?

Said another way:

I think of table scanning as sequential, and fast. That would be
loading db record 1,2,3, etc.

Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Igor Neyman

ineyman@perceptron.com

about 12 years ago

In reply to: Andy Colson (#1)

Re: PG index architecture

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Andy Colson
Sent: Tuesday, July 15, 2014 4:27 PM
To: pgsql
Subject: [GENERAL] PG index architecture

I was thinking about indexes, and am kinda curious about sequential access.

I know nothing of PG guts, so this might even be a dumb question.

As I understand indexes, they are a key value pair, that contain a value and a
position. You lookup the value then use the position to seek into the
database to load the record.

Do we, or could we, load all the the matching index records, then sort them
by position? (maybe not all, maybe large batches)

When loading from the database, if access was slightly more sequential (vs
very random), would it increase performance?

Said another way:

I think of table scanning as sequential, and fast. That would be loading db
record 1,2,3, etc.

Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4

-Andy

It is called CLUSTER:

http://www.postgresql.org/docs/9.2/static/sql-cluster.html

Regards,
Igor Neyman

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Andy Colson (#1)

Re: PG index architecture

Andy Colson <andy@squeakycode.net> writes:

As I understand indexes, they are a key value pair, that contain a value
and a position. You lookup the value then use the position to seek into
the database to load the record.

Do we, or could we, load all the the matching index records, then sort
them by position? (maybe not all, maybe large batches)

This is more or less what a "bitmap index scan" does.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

John R Pierce

pierce@hogranch.com

about 12 years ago

In reply to: Andy Colson (#1)

Re: PG index architecture

On 7/15/2014 1:26 PM, Andy Colson wrote:

As I understand indexes, they are a key value pair, that contain a
value and a position. You lookup the value then use the position to
seek into the database to load the record.

indexes are stored as a B-tree. each terminal node has a block number
for the target record.

Do we, or could we, load all the the matching index records, then sort
them by position? (maybe not all, maybe large batches)

b-trees are inherently ordered. data records, however are not.

When loading from the database, if access was slightly more sequential
(vs very random), would it increase performance?

Said another way:

I think of table scanning as sequential, and fast. That would be
loading db record 1,2,3, etc.

database tables are unordered sets, there is no record 1,2,3.

Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4

its unclear to me what you mean here.

--
john r pierce 37N 122W
somewhere on the middle of the left coast

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Andy Colson

andy@squeakycode.net

about 12 years ago

In reply to: John R Pierce (#4)

Re: PG index architecture

On 7/15/2014 3:54 PM, John R Pierce wrote:

On 7/15/2014 1:26 PM, Andy Colson wrote:

As I understand indexes, they are a key value pair, that contain a
value and a position. You lookup the value then use the position to
seek into the database to load the record.

indexes are stored as a B-tree. each terminal node has a block number
for the target record.

Do we, or could we, load all the the matching index records, then sort
them by position? (maybe not all, maybe large batches)

b-trees are inherently ordered. data records, however are not.

When loading from the database, if access was slightly more sequential
(vs very random), would it increase performance?

Said another way:

I think of table scanning as sequential, and fast. That would be
loading db record 1,2,3, etc.

database tables are unordered sets, there is no record 1,2,3.

Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4

its unclear to me what you mean here.

Ah, yea, sorry, I don't really mean record #1, I mean hard drive seek to
address 1. (where address came from the index lookup)

Know what I mean?

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

John R Pierce

pierce@hogranch.com

about 12 years ago

In reply to: Andy Colson (#5)

Re: PG index architecture

On 7/15/2014 2:54 PM, Andy Colson wrote:

Ah, yea, sorry, I don't really mean record #1, I mean hard drive seek
to address 1. (where address came from the index lookup)

of course, its a file+offset, so there's at least 1-2 more levels of
indirection before you get to physical addresses on the hard disk.

--
john r pierce 37N 122W
somewhere on the middle of the left coast

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general