How does Postgres store a B-Tree on disk while using the OS file system?

Started by Siddharth Jainabout 3 years ago4 messagesgeneral

siddhsql@gmail.com

about 3 years ago

I am trying to sharpen my understanding of Postgres. As I understand,
Postgres does not write directly to disk blocks. It uses the file system
provided by the OS:
https://dba.stackexchange.com/questions/80036/is-there-a-way-to-store-a-postgresql-database-directly-on-a-block-device-not-fi
My question: How can it then store a B Tree on disk? I would think storing
a B Tree requires storing disk offset addresses and so on (for a node to
navigate to another etc.). For this, one would need to write directly to
the disk using low-level disk access functions and not use file system API.

ronljohnsonjr@gmail.com

about 3 years ago

In reply to: Siddharth Jain (#1)

Re: How does Postgres store a B-Tree on disk while using the OS file system?

On 3/6/23 18:24, Siddharth Jain wrote:

I am trying to sharpen my understanding of Postgres. As I understand,
Postgres does not write directly to disk blocks. It uses the file system
provided by the OS:
https://dba.stackexchange.com/questions/80036/is-there-a-way-to-store-a-postgresql-database-directly-on-a-block-device-not-fi
My question: How can it then store a B Tree on disk? I would think storing
a B Tree requires storing disk offset addresses and so on (for a node to
navigate to another etc.). For this, one would need to write directly to
the disk using low-level disk access functions and not use file system API.

Disk offsets... *file* offsets... one is just a step up on the abstraction
layer.

How can that be? The OS only knows about Logical Block Addresses, which are
a huge list of block numbers (just like userland software sees files as long
lists of bytes). It hasn't see cylinders, heads and platters in almost
FORTY (more, probably, for SCSI and mainframe systems) years.

--
Born in Arizona, moved to Babylonia.

Christophe Pettus

xof@thebuild.com

about 3 years ago

In reply to: Siddharth Jain (#1)

Re: How does Postgres store a B-Tree on disk while using the OS file system?

On Mar 6, 2023, at 16:24, Siddharth Jain <siddhsql@gmail.com> wrote:
My question: How can it then store a B Tree on disk? I would think storing a B Tree requires storing disk offset addresses and so on (for a node to navigate to another etc.). For this, one would need to write directly to the disk using low-level disk access functions and not use file system API.

All of PostgreSQL's relations (tables and indexes) are stored in files. (They're actually stored in a set of files if it's larger than 1GB, but each relation is treated as one logical file.) The "pointers" in this case are just offsets from the start of that file.

There's some additional information here:

https://www.postgresql.org/docs/current/storage.html

and here:

https://www.postgresql.org/docs/current/btree-implementation.html

siddhsql@gmail.com

about 3 years ago

In reply to: Christophe Pettus (#3)

Re: How does Postgres store a B-Tree on disk while using the OS file system?

thanks Christophe. will try to go through it. its bit difficult to grasp.

On Mon, Mar 6, 2023 at 5:08 PM Christophe Pettus <xof@thebuild.com> wrote:

Show quoted text

On Mar 6, 2023, at 16:24, Siddharth Jain <siddhsql@gmail.com> wrote:
My question: How can it then store a B Tree on disk? I would think

storing a B Tree requires storing disk offset addresses and so on (for a
node to navigate to another etc.). For this, one would need to write
directly to the disk using low-level disk access functions and not use file
system API.

All of PostgreSQL's relations (tables and indexes) are stored in files.
(They're actually stored in a set of files if it's larger than 1GB, but
each relation is treated as one logical file.) The "pointers" in this case
are just offsets from the start of that file.

There's some additional information here:

https://www.postgresql.org/docs/current/storage.html

and here:

https://www.postgresql.org/docs/current/btree-implementation.html