nooby Q: temp tables good for web apps?
I am porting a datamining web app to postgres from a non-sql datastore
and plan to use temporary tables quite a bit, to manage collections the
user will be massaging interactively. They might search and find
anywhere from 50 to 50k items, then filter that, unfilter, sort, etc.
Currently I manage those collections in the server application, meaning
everything gets pulled from the datastore into RAM. I see postgres
temporary tables and postgres features in general can greatly simplify
my code because so much of what I do can be expressed in postgres-ese. Yayyy.
Some on the team think I am nuts, but one reason given was the absence
of indices and I see (a) temporary tables *can* be indexed and (b)
postgres does not even use an index for small sets, and many collections
will be relatively small (as a design goal in fact--we hope to make
search smarter and return fewer hits).
I thought it would not hurt to check with the gurus before spending a
week on the wrong code, so... dumb idea?
kenny
On Tue, Apr 7, 2009 at 3:11 PM, Kenneth Tilton <kentilton@gmail.com> wrote:
I am porting a datamining web app to postgres from a non-sql datastore and
plan to use temporary tables quite a bit, to manage collections the user
will be massaging interactively. They might search and find anywhere from 50
to 50k items, then filter that, unfilter, sort, etc.
Currently I manage those collections in the server application, meaning
everything gets pulled from the datastore into RAM. I see postgres temporary
tables and postgres features in general can greatly simplify my code because
so much of what I do can be expressed in postgres-ese. Yayyy.
Some on the team think I am nuts,
People are often resistant to new ideas, even good ones.
but one reason given was the absence of
indices and I see (a) temporary tables *can* be indexed
Correct
and (b) postgres
does not even use an index for small sets, and many collections will be
relatively small (as a design goal in fact--we hope to make search smarter
and return fewer hits).
Correct again.
I thought it would not hurt to check with the gurus before spending a week
on the wrong code, so... dumb idea?
Good idea to at least come up with some tests to prove (or disprove)
your point. We can wait for the gurus to check in later...
Kenneth Tilton <kentilton@gmail.com> writes:
I am porting a datamining web app to postgres from a non-sql datastore
and plan to use temporary tables quite a bit, to manage collections the
user will be massaging interactively. They might search and find
anywhere from 50 to 50k items, then filter that, unfilter, sort, etc.
The main issue you should think about is whether the required lifespan
of the temp tables matches up with your application's use of database
connections. If you are going through a connection pooler, for example,
it can be pretty awkward to hold onto the connection that has got the
temp table instances you need. Web apps in general tend to have a hard
time maintaining such state across successive page references, so I'm
afraid this could be a show-stopper for you.
Some on the team think I am nuts, but one reason given was the absence
of indices and I see (a) temporary tables *can* be indexed
Yeah, whoever claimed that is simply uninformed, or at least is
well-informed about some other database.
regards, tom lane
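A minimal sketch of both points (table and column names here are invented): a temp table is private to the connection that created it, goes away when that connection closes, and can be indexed like any ordinary table.

```sql
-- Visible only to this connection; dropped automatically on disconnect.
CREATE TEMP TABLE search_hits (
    item_id integer,
    score   real
);

-- Temp tables can be indexed just like permanent ones.
CREATE INDEX search_hits_score_idx ON search_hits (score);

-- Interactive filter/sort steps then run against the temp table.
SELECT item_id FROM search_hits WHERE score > 0.5 ORDER BY score DESC;
```

The catch Tom describes is exactly this connection-scoping: lose the connection between page hits and the table is gone.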
Temp tables can be great for simplifying your code into more logical
sections. When making a case for using them, make sure to point out that
using them more aggressively can cut down on the amount of indexing you
need on the big tables, which has positive implications in terms of
getting simpler and robust query plans and cutting down on insertion
overhead.
You should be sure to turn on log_temp_files (which is handy in general,
that's not specific to temp tables). One specific thing to look for to
support your case is that sorts that used to execute in RAM and spill to
disk when they exceed work_mem might instead execute with less memory
usage; you'll be doing the final sort/filter steps using the temp tables
instead. If that is already happening, the overhead of using the temp
table can end up looking pretty good.
One thing I like doing when in the early development stages is to create a
separate disk partition for the temporary tables, turn that into a
tablespace, and then use temp_tablespaces to point the temp tables toward
it. The idea is to separate out I/O to the temp tables so that you can
measure it to see how significant it is.
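A rough sketch of that setup; the mount point and tablespace name below are placeholders:

```sql
-- Assumes a dedicated partition is already mounted at /mnt/pg_temp
-- and is writable by the postgres user.
CREATE TABLESPACE temp_space LOCATION '/mnt/pg_temp';

-- Route temp tables (and temp sort/hash files) there for this session;
-- setting temp_tablespaces in postgresql.conf makes it the default.
SET temp_tablespaces = 'temp_space';

-- With log_temp_files = 0 in postgresql.conf, every temp file created
-- gets logged, which makes the temp I/O easy to measure.
```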
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Tom Lane wrote:
Kenneth Tilton <kentilton@gmail.com> writes:
I am porting a datamining web app to postgres from a non-sql datastore
and plan to use temporary tables quite a bit, to manage collections the
user will be massaging interactively. They might search and find
anywhere from 50 to 50k items, then filter that, unfilter, sort, etc.
The main issue you should think about is whether the required lifespan
of the temp tables matches up with your application's use of database
connections. If you are going through a connection pooler, for example,
it can be pretty awkward to hold onto the connection that has got the
temp table instances you need. Web apps in general tend to have a hard
time maintaining such state across successive page references, so I'm
afraid this could be a show-stopper for you.
Ah, I should have disclosed that I am a relative web application nooby as well.
But our design depends anyway on the same server process handling a web
session from start to finish, and I thought this was doable with
sufficient effort. I mean, I asked and people said it could be arranged.
Not so? Then yes, I am doomed. But if we can do that, I already have a ton of
logic for keeping sessions separate and for hanging onto a connection
for the life of a session (most requests use a PG connection pooler; the
ones that need to see the temp tables use a dedicated connection, or two
I think I might need). I keep sessions separate by working the session key
into the temp table name... well, that's the plan anyway.
Some on the team think I am nuts, but one reason given was the absence
of indices and I see (a) temporary tables *can* be indexed
Yeah, whoever claimed that is simply uninformed, or at least is
well-informed about some other database.
I think I misrepresented their position. They did not say it, but I
think they were referring to some hairy freetext indexing they did on
the permanent tables. And that's OK, we do not need that on the temp tables.
thx for the input, I will study up on the viability of getting a session
managed by the same process throughout.
ken
Greg Smith wrote:
Temp tables can be great for simplifying your code into more logical
sections. When making a case for using them, make sure to point out
that using them more aggressively can cut down on the amount of indexing
you need on the big tables, which has positive implications in terms of
getting simpler and robust query plans and cutting down on insertion
overhead.
You should be sure to turn on log_temp_files (which is handy in general,
that's not specific to temp tables). One specific thing to look for to
support your case is that sorts that used to execute in RAM and spill to
disk when they exceed work_mem might instead execute with less memory
usage; you'll be doing the final sort/filter steps using the temp tables
instead. If that is already happening, the overhead of using the temp
table can end up looking pretty good.
One thing I like doing when in the early development stages is to create
a separate disk partition for the temporary tables, turn that into a
tablespace, and then use temp_tablespaces to point the temp tables
toward it. The idea is to separate out I/O to the temp tables so that
you can measure it to see how significant it is.
Thx, I will keep that in mind as a good way of really seeing what is
going on. I did notice the tablespace feature but wasn't sure how to
leverage it. Mgmt has been lusting after those new solid-state memory
disks (SSDs?), this could be a good excuse for a PO. We are a skunkworks
project getting as much praise so far for the speed of the web app as
anything else so we don't want to give up this plus.
ken
On Tue, Apr 7, 2009 at 5:05 PM, Kenneth Tilton <kentilton@gmail.com> wrote:
Greg Smith wrote:
Temp tables can be great for simplifying your code into more logical
sections. When making a case for using them, make sure to point out that
using them more aggressively can cut down on the amount of indexing you need
on the big tables, which has positive implications in terms of getting
simpler and robust query plans and cutting down on insertion overhead.
You should be sure to turn on log_temp_files (which is handy in general,
that's not specific to temp tables). One specific thing to look for to
support your case is that sorts that used to execute in RAM and spill to
disk when they exceed work_mem might instead execute with less memory usage;
you'll be doing the final sort/filter steps using the temp tables instead.
If that is already happening, the overhead of using the temp table can end
up looking pretty good.
One thing I like doing when in the early development stages is to create a
separate disk partition for the temporary tables, turn that into a
tablespace, and then use temp_tablespaces to point the temp tables toward
it. The idea is to separate out I/O to the temp tables so that you can
measure it to see how significant it is.
Thx, I will keep that in mind as a good way of really seeing what is going
on. I did notice the tablespace feature but wasn't sure how to leverage it.
Mgmt has been lusting after those new solid-state memory disks (SSDs?), this
could be a good excuse for a PO. We are a skunkworks project getting as much
praise so far for the speed of the web app as anything else so we don't want
to give up this plus.
Make sure to get the newer generation, like Intel's, that are fast under
concurrent access. Most of the older SSDs are horrifically slow when
handling multiple random accesses.
You can use a different method if you need a table available to the
same session. Create a schema based on the session id, and put your
temp tables there, only don't call them temp tables. You'll either
need to make sure you always clean up the temp schema your session
created, or come up with a daemon that comes along every hour or so and
kills off old schemas that aren't in use anymore.
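A sketch of that approach, with a made-up session key:

```sql
-- One ordinary (non-temp) schema per web session; the session key
-- lives in the schema name rather than in every table name.
CREATE SCHEMA sess_ab12cd34;

CREATE TABLE sess_ab12cd34.search_hits (
    item_id integer,
    score   real
);

-- Any connection from the pool can now see this session's state.
SELECT item_id FROM sess_ab12cd34.search_hits ORDER BY score DESC;

-- Cleanup when the session ends, or from the reaper daemon:
DROP SCHEMA sess_ab12cd34 CASCADE;
```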
Hi Kenneth,
One concern I have with SSD drives is that the performance degrades over time. If you were not familiar with this issue already, take a look at the following article.
http://www.anandtech.com/storage/showdoc.aspx?i=3531
It is not a huge problem and I have faith in Intel to come up with a good solution fairly quickly, but it is worth noting. Given the cost of SSD, it does make me think that perhaps a more cost effective solution is to have plenty of RAM on the box.
----
John L. Cheng
Scott Marlowe wrote:
You can use a different method if you need a table available to the
same session. Create a schema based on the session id, and put your
temp tables there, only don't call them temp tables. You'll either
need to make sure you always clean up your temp schema your session
created or come up with a daemon that comes along every hour or so and
kills off old schemas that aren't in use anymore.
I am LMAO because Lisp (my server-side lang) does this to noobs, too:
three (at least) ways to do everything. Well, if all things are equal
dropping one schema and not kludging up mangled table names has a lot of
appeal. Thx.
ken
On Tue, Apr 7, 2009 at 7:12 PM, Kenneth Tilton <kentilton@gmail.com> wrote:
Scott Marlowe wrote:
You can use a different method if you need a table available to the
same session. Create a schema based on the session id, and put your
temp tables there, only don't call them temp tables. You'll either
need to make sure you always clean up your temp schema your session
created or come up with a daemon that comes along every hour or so and
kills off old schemas that aren't in use anymore.
I am LMAO because Lisp (my server-side lang) does this to noobs, too: three
(at least) ways to do everything. Well, if all things are equal dropping one
schema and not kludging up mangled table names has a lot of appeal. Thx.
Schemas, search_path and views together can let you do some pretty
cool things in terms of integrating external postgresql based apps
with each other.
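For instance (schema and table names invented):

```sql
-- Unqualified names now resolve in the session schema first,
-- falling back to public.
SET search_path = sess_ab12cd34, public;
SELECT * FROM search_hits;   -- finds sess_ab12cd34.search_hits if present

-- A view in the session schema can overlay a shared table:
CREATE VIEW sess_ab12cd34.recent_items AS
    SELECT * FROM public.items
    WHERE created_at > now() - interval '1 day';
```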
Scott Marlowe wrote:
On Tue, Apr 7, 2009 at 7:12 PM, Kenneth Tilton <kentilton@gmail.com> wrote:
Scott Marlowe wrote:
You can use a different method if you need a table available to the
same session. Create a schema based on the session id, and put your
temp tables there, only don't call them temp tables. You'll either
need to make sure you always clean up your temp schema your session
created or come up with a daemon that comes along every hour or so and
kills off old schemas that aren't in use anymore.
I am LMAO because Lisp (my server-side lang) does this to noobs, too: three
(at least) ways to do everything. Well, if all things are equal dropping one
schema and not kludging up mangled table names has a lot of appeal. Thx.
Schemas, search_path and views together can let you do some pretty
cool things in terms of integrating external postgresql based apps
with each other.
Or between XHRs? It just occurred to me that if I go with a schema
instead of temp tables then I do not need to worry about hanging on to a
connection/pgsession, or even worry about routing a web session to the
same process if all state is stored in pg under the session id.
ken *coming up to speed slowly, going to look up search_path*
On Tue, 7 Apr 2009, John Cheng wrote:
One concern I have with SSD drives is that the performance degrades over
time.
The bigger concern I have with them is that even the Intel drives have a
volatile write cache in them. You have to either turn off the write cache
(which degrades performance substantially and might even have a longevity
impact) or use a battery-backed disk controller for them to be safe
database storage. There's a good article about this at
http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/
If there's a disk controller with a write cache involved, that narrows the
gap between SSD and regular drives quite a bit.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Apr 7, 2009, at 7:32 PM, Kenneth Tilton wrote:
Scott Marlowe wrote:
On Tue, Apr 7, 2009 at 7:12 PM, Kenneth Tilton <kentilton@gmail.com> wrote:
Scott Marlowe wrote:
You can use a different method if you need a table available to the
same session. Create a schema based on the session id, and put your
temp tables there, only don't call them temp tables. You'll either
need to make sure you always clean up your temp schema your session
created or come up with a daemon that comes along every hour or so and
kills off old schemas that aren't in use anymore.
I am LMAO because Lisp (my server-side lang) does this to noobs, too:
three (at least) ways to do everything. Well, if all things are equal
dropping one schema and not kludging up mangled table names has a lot
of appeal. Thx.
Schemas, search_path and views together can let you do some pretty
cool things in terms of integrating external postgresql based apps
with each other.
Or between XHRs? It just occurred to me that if I go with a schema
instead of temp tables then I do not need to worry about hanging on
to a connection/pgsession, or even worry about routing a web session
to the same process if all state is stored in pg under the session id.
ken *coming up to speed slowly, going to look up search_path*
If you're using pg_dump for backups then you'll probably want at least
a standard prefix on your "temp" schemas so that you can easily have
pg_dump ignore them when doing backups with its -N flag.
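For instance, if every session schema starts with sess_ (prefix and database name invented):

```shell
# Back up everything except the per-session scratch schemas.
pg_dump -N 'sess_*' mydb > mydb.dump
```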
Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k