storing zipped SQLite inside PG ?

Started by David Gauthier · over 4 years ago · 7 messages · general
#1David Gauthier
davegauthierpg@gmail.com

Hi: I need a sanity check (brainstorming) before I jump into coding
something that might have a better solution known to a community like this
one. Here's the situation...

To begin with... PG 11.5 on Linux.
Some code (nature unimportant to this discussion) generates a lot (a lot)
of data and stuffs it in a SQLite DB which, once the process is complete,
gets zipped (compression ratio seems to be about 5). We want to keep this
data in a persistent store which others can get at should they need it.
PG seems like a great place to put this especially as the identifying
context of the SQLite already exists in our PG DB.

So I was thinking about storing the zipped SQLite as a blob in PG. The
record it would be added to would give it all the context needed for proper
retrieval. After retrieval (in a perl script) I was thinking about writing
it out to the shell, unzipping it and then opening it using perl/DBI. The
metadata of the SQLite could be replicated in a temp table in PG and the
data content loaded into that for use. In theory, multiple SQLite DBs
could be opened like this and then loaded in the same temp tables (as long
as the metadata is the same... which it will be).
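The round trip described above can be sketched in Python (the post mentions Perl/DBI, but the flow is identical): build the SQLite file, gzip its bytes (that blob is what would go into a bytea column), then later decompress to a scratch file and reopen it. All table and column names here are made up for illustration.

```python
# Sketch of the store/retrieve round trip. The gzipped bytes stand in for
# the blob that would be INSERTed into / SELECTed from a PG bytea column.
import gzip
import os
import sqlite3
import tempfile

def make_sqlite_bytes():
    # Generate a small throwaway SQLite DB on disk and return its raw bytes.
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, val TEXT)")
    con.executemany("INSERT INTO results (val) VALUES (?)",
                    [("row-%d" % i,) for i in range(100)])
    con.commit()
    con.close()
    with open(path, "rb") as f:
        raw = f.read()
    os.unlink(path)
    return raw

def reopen_compressed(blob):
    # 'blob' plays the role of the bytea fetched back from PG: write it
    # out decompressed, reopen with sqlite3, and query it.
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    with os.fdopen(fd, "wb") as f:
        f.write(gzip.decompress(blob))
    con = sqlite3.connect(path)
    (n,) = con.execute("SELECT count(*) FROM results").fetchone()
    con.close()
    os.unlink(path)
    return n

blob = gzip.compress(make_sqlite_bytes())   # what you'd store as bytea
print(reopen_compressed(blob))              # prints 100
```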

OK, so that's the plan. What I want to ask this community about is whether
or not there's a better way to approach this. Brute force loading the
SQLite data in regular PG tables would result in billions of records and
the DB management issues that come with that. And there's really no need
to keep all that data immediately accessible like that. Is there some sort
of innate PG functionality that would allow me to store and then compress
my data for targeted retrieval/decompression (even if it's not accessible
to SQL in that form) ?

OK, you get the picture. I'm all ears :-) And thanks in advance for any
suggestions !

#2Steve Baldwin
steve.baldwin@gmail.com
In reply to: David Gauthier (#1)
Re: storing zipped SQLite inside PG ?

Could you use the SQLite FDW - https://github.com/pgspider/sqlite_fdw ?

Steve


#3David G. Johnston
david.g.johnston@gmail.com
In reply to: David Gauthier (#1)


This is basically a variant of “should I store images in the database” but
the content sizes involved are insanely high compared to most images.
External storage, with a pointer in the DB, would seem like a much better
choice.

Either way, though, not applying a minor release in over two years is just
wrong; you should get on top of that before building new functionality on
top of said database.

David J.

#4David Gauthier
davegauthierpg@gmail.com
In reply to: David G. Johnston (#3)
Re: storing zipped SQLite inside PG ?

I'll have to read more about sqlite_fdw. Thanks for that Steve !

Each SQLite isn't that big (not billions of records), more like 30K records
or so. But there are lots and lots of these SQLite DBs, which add up over
time to perhaps billions of records.

This is for a big corp with an IT dept. Maybe I can get them to upgrade
the DB itself.
Thank You too David !


#5David G. Johnston
david.g.johnston@gmail.com
In reply to: David Gauthier (#4)
Re: storing zipped SQLite inside PG ?


So, more similar to the image storage question than I first thought, but
still large enough where the specific usage patterns and needs end up being
the deciding factor (keeping in mind you can pick multiple solutions - so
that really old data, ideally on a partition, can be removed from the DB
while still remaining accessible if just more slowly or laboriously).

One possibility to consider - ditch the SQLite dependency and just store
CSV (but maybe with a funky delimiter sequence). You can then use
"string_to_table(...)" on that delimiter to materialize a table out of the
data right in a query.
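The funky-delimiter idea can be sketched client-side: serialize rows into one text value using separators that cannot appear in the data, then split them back. (Note that string_to_table() appeared in PG 14; on an 11.x server, regexp_split_to_table() does the same job.) The ASCII unit/record separators used here are one possible choice, not a recommendation from the thread.

```python
# Client-side sketch of "CSV with a funky delimiter": pack rows into one
# string with separators that won't collide with the data, and unpack by
# splitting. Server-side, string_to_table()/regexp_split_to_table() would
# perform the same split in SQL.
FIELD_SEP = "\x1f"   # ASCII unit separator
ROW_SEP = "\x1e"     # ASCII record separator

def pack(rows):
    return ROW_SEP.join(FIELD_SEP.join(map(str, r)) for r in rows)

def unpack(text):
    return [tuple(r.split(FIELD_SEP)) for r in text.split(ROW_SEP)]

blob = pack([(1, "a,b"), (2, "c")])   # commas in the data are now safe
assert unpack(blob) == [("1", "a,b"), ("2", "c")]
```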

David J.

#6Дмитрий Иванов
firstdismay@gmail.com
In reply to: David G. Johnston (#5)
Re: storing zipped SQLite inside PG ?

Or, if you want to extend this theme, you can build a PostgreSQL-based
"SQLite file player" using PostgreSQL + a PL/Python (sqlite3) extension.
This way you can provide direct access to SQLite files without duplicating
the data in PostgreSQL cluster tables.
PS: It may seem that this would reduce performance. When I started my
project, I had some preconceptions about Python, but analyzing projects
like Patroni changed my mind.
--
Regards, Dmitry!
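The "SQLite file player" idea Dmitry describes can be sketched as a plain Python generator; wrapped in a PL/Python set-returning function, the same body could stream rows straight from a SQLite file on the server without loading them into PG tables. The file and table names below are invented for the example.

```python
# Sketch: stream rows out of a SQLite file. In a PL/Python SRF the body
# would be essentially identical, with the path coming from a PG column.
import os
import sqlite3
import tempfile

def read_sqlite_rows(path, table):
    con = sqlite3.connect(path)
    try:
        # Table name interpolated for brevity; validate it in real code,
        # since it cannot be passed as a bound parameter.
        for row in con.execute("SELECT * FROM %s" % table):
            yield row
    finally:
        con.close()

# Example: build a throwaway SQLite file and stream it back.
fd, path = tempfile.mkstemp(suffix=".sqlite")
os.close(fd)
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
con.commit()
con.close()
rows = list(read_sqlite_rows(path, "t"))
assert rows == [(1,), (2,), (3,)]
```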


#7amihay gonen
agonenil@gmail.com
In reply to: Дмитрий Иванов (#6)
Re: storing zipped SQLite inside PG ?

Is SQLite a constraint?

From a big-data perspective it is better to find a more portable format,
like Parquet or even JSON. For example, what if you have different SQLite
versions?
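The portability point can be sketched by dumping each SQLite table to an engine-neutral, self-describing format (JSON Lines here; Parquet would need an extra library such as pyarrow). Column names come from the cursor, so the output survives SQLite version and schema drift better than the raw file. Names below are made up.

```python
# Sketch: convert a SQLite table to JSON Lines, one self-describing
# object per row, decoupling the archived data from the SQLite format.
import json
import sqlite3

def sqlite_table_to_jsonl(con, table):
    # Table name interpolated for brevity; validate it in real code.
    cur = con.execute("SELECT * FROM %s" % table)
    cols = [d[0] for d in cur.description]
    return "\n".join(json.dumps(dict(zip(cols, row))) for row in cur)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE m (id INTEGER, val TEXT)")
con.execute("INSERT INTO m VALUES (1, 'a')")
lines = sqlite_table_to_jsonl(con, "m")
assert json.loads(lines.splitlines()[0]) == {"id": 1, "val": "a"}
```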
