Tablespaces on tertiary media
Where does PostgreSQL stand with storing /really/ large amounts of data
offline? Specifically, if a FUSE is used to move a tablespace to something
like a tape archiver can the planner be warned that access might take an
extended period?
I know that at one point (v6?) there were hooks in the code for experimental
Berkeley code to do this sort of thing but as far as I know there has never
been anything publicly available.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk
[Opinions above are the author's, not those of his employers or colleagues]
"Mark Morgan Lloyd" <markMLl.pgsql-general@telemetry.co.uk> writes:
> Where does PostgreSQL stand with storing /really/ large amounts of data
> offline? Specifically, if a FUSE is used to move a tablespace to something
> like a tape archiver can the planner be warned that access might take an
> extended period?
No, Postgres can't deal with this. You'll have to dump the tables with pg_dump
or COPY or something like that and then drop them from the database. If you
need them again you have to load them again.
Actually if the tables are missing but nobody tries to access them (including
autovacuum) then nothing will notice they're missing. But if you do try to
access them you'll get an error. And if you leave it in this situation too
long your database will shut down from getting too close to transaction
wraparound.
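Something like the following, say (table, column, and path names made up for illustration):

```sql
-- Assume an old table we want offline:
CREATE TABLE old_measurements (ts timestamp, sensor int, reading float8);

-- Archive: copy the rows out to a flat file the archiver can manage,
-- then drop the table from the database.
COPY old_measurements TO '/archive/old_measurements.copy';
DROP TABLE old_measurements;

-- Restore: recreate the table and load the rows back in.
CREATE TABLE old_measurements (ts timestamp, sensor int, reading float8);
COPY old_measurements FROM '/archive/old_measurements.copy';
```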
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote:
>> Where does PostgreSQL stand with storing /really/ large amounts of data
>> offline? Specifically, if a FUSE is used to move a tablespace to something
>> like a tape archiver can the planner be warned that access might take an
>> extended period?
>
> No, Postgres can't deal with this. You'll have to dump the tables with
> pg_dump or COPY or something like that and then drop them from the
> database. If you need them again you have to load them again.
>
> Actually if the tables are missing but nobody tries to access them
> (including autovacuum) then nothing will notice they're missing. But if
> you do try to access them you'll get an error. And if you leave it in this
> situation too long your database will shut down from getting too close to
> transaction wraparound.
Thanks. If the tables were in a tablespace that was stored on something that
looked like a conventional filesystem would the server code be prepared to
wait the minutes that it took the operating system and FUSE implementation to
load the tables onto disc?
The earlier work e.g. http://www.vldb.org/conf/1996/P156.PDF apparently warned
the planner about long-latency devices but that's probably unnecessary if the
application program was aware that a table had been partitioned by age and
accessing old data could be slow.
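For reference, the age-based partitioning I have in mind is just the inheritance scheme from that section of the manual, roughly like this (table names and date ranges invented for the sketch):

```sql
-- Parent table plus one child per year; the CHECK constraints let
-- constraint exclusion skip partitions a query provably can't touch.
CREATE TABLE measurements (ts timestamp NOT NULL, reading float8);

CREATE TABLE measurements_2005 (
    CHECK (ts >= '2005-01-01' AND ts < '2006-01-01')
) INHERITS (measurements);

CREATE TABLE measurements_2006 (
    CHECK (ts >= '2006-01-01' AND ts < '2007-01-01')
) INHERITS (measurements);

-- The old partition could live in a tablespace on the slow medium:
-- ALTER TABLE measurements_2005 SET TABLESPACE slow_archive;

SET constraint_exclusion = on;
-- A query constrained to recent data never opens the archived partition:
SELECT count(*) FROM measurements WHERE ts >= '2006-06-01';
```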
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk
[Opinions above are the author's, not those of his employers or colleagues]
"Mark Morgan Lloyd" <markMLl.pgsql-general@telemetry.co.uk> writes:
> Thanks. If the tables were in a tablespace that was stored on something
> that looked like a conventional filesystem would the server code be
> prepared to wait the minutes that it took the operating system and FUSE
> implementation to load the tables onto disc?
Ah, I see what you mean now. I think you might have a problem with the planner
opening the files to do an lseek to measure how large they are. I'm not sure
if that gets triggered before or after constraint exclusion. That's the only
problem I can think of.
> The earlier work e.g. http://www.vldb.org/conf/1996/P156.PDF apparently
> warned the planner about long-latency devices but that's probably
> unnecessary if the application program was aware that a table had been
> partitioned by age and accessing old data could be slow.
Well it's not like there are any alternative plans that will avoid the need to
access the data at all. I assume the FUSE setup will always have to load the
entire file, so there's not even any difference between indexed and sequential
access. (Unless the table is over 1G in which case you might want to avoid
sequential scans if index scans would avoid accessing some segments.)
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote:
>> Thanks. If the tables were in a tablespace that was stored on something
>> that looked like a conventional filesystem would the server code be
>> prepared to wait the minutes that it took the operating system and FUSE
>> implementation to load the tables onto disc?
>
> Ah, I see what you mean now. I think you might have a problem with the
> planner opening the files to do an lseek to measure how large they are.
> I'm not sure if that gets triggered before or after constraint exclusion.
> That's the only problem I can think of.
The size could be stored in the catalogue though. However at that point I
guess that anything that was used before constraint exclusion would have to be
in the catalogue and anything after would have to initiate retrieval from
tertiary media if it's not already cached.
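As far as I can tell the catalogue does already carry a rough, planner-visible size estimate in pg_class, refreshed by VACUUM and ANALYZE rather than by touching the file, e.g. (the table name here is made up):

```sql
-- relpages/reltuples are estimates maintained by VACUUM and ANALYZE,
-- so reading them does not open the underlying data files:
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'old_measurements';
```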
>> The earlier work e.g. http://www.vldb.org/conf/1996/P156.PDF apparently
>> warned the planner about long-latency devices but that's probably
>> unnecessary if the application program was aware that a table had been
>> partitioned by age and accessing old data could be slow.
>
> Well it's not like there are any alternative plans that will avoid the
> need to access the data at all. I assume the FUSE setup will always have
> to load the entire file, so there's not even any difference between
> indexed and sequential access. (Unless the table is over 1G in which case
> you might want to avoid sequential scans if index scans would avoid
> accessing some segments.)
I'd imagine in most cases that sequential scan time would be dwarfed by
medium-load and seek time. It would be important here that the server didn't
time out assuming that it had hit a hardware problem when in actual fact the
table was still being pulled from tape.
I'd presume that when Sarawagi (who I believe is now with IBM) was doing the
work that there wasn't a straightforward way to partition tables (as is
currently described in section 5.9 of the manual) so she had to add internal
hooks. Granted, I don't pretend to really understand how things work (I'm a
luser, not a guru), but it seems to me that it would not be difficult to
extend the tablespace definition from

  CREATE TABLESPACE tablespacename LOCATION '/directory'

to something like

  CREATE TABLESPACE tablespacename LOCATION '|check_loaded.pl /directory'

where the check_loaded.pl script could check that the table was cached and
return its name when available. However I guess that the script would
probably need to see the initial lseek or whatever as well... there's
probably a whole lot of non-obvious details that I've totally overlooked.
Just my 2d-worth :-)
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk
[Opinions above are the author's, not those of his employers or colleagues]
On 09/14/07 04:06, Mark Morgan Lloyd wrote:
> Where does PostgreSQL stand with storing /really/ large amounts of data
> offline? Specifically, if a FUSE is used to move a tablespace to
> something like a tape archiver can the planner be warned that access
> might take an extended period?
>
> I know that at one point (v6?) there were hooks in the code for
> experimental Berkeley code to do this sort of thing but as far as I know
> there has never been anything publicly available.
While tertiary media certainly was relevant 10 years ago, is it
really necessary in 2007? A couple of MSA-1000s stuffed with 1TB
disks would hold an l-o-t *lot* of historical data.
--
Ron Johnson, Jr.
Jefferson LA USA
Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
Ron Johnson wrote:
>> I know that at one point (v6?) there were hooks in the code for
>> experimental Berkeley code to do this sort of thing but as far as I know
>> there has never been anything publicly available.
>
> While tertiary media certainly was relevant 10 years ago, is it
> really necessary in 2007? A couple of MSA-1000s stuffed with 1TB
> disks would hold an l-o-t *lot* of historical data.
I was considering it from the point of view of completeness rather than
anything else, but as a specific example I seem to recall that one of the
particle accelerator sites uses PostgreSQL for cataloging captured data but
actually stores it on either tape or optical disc (I forget which). I'm sure
that there would be advantages to being able to retrieve both metadata and
data using the same API, rather than using database queries for the former and
something like an AMANDA-compatible interface for the latter.
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk
[Opinions above are the author's, not those of his employers or colleagues]