Again, sorry, caching.
I traded a couple of emails with a guy using one of my open source projects. To
make a long story short, he is moving to the new version of MySQL for his
web site because of the new caching feature. He is convinced that it will speed
up his web site, and he is probably right.
On a web site, a few specific queries get executed, unchanged, repeatedly.
Think about an ecommerce site: most of the time it is just a handful of basic
queries. These basic queries are usually against pretty large product tables. A
caching mechanism would make these queries pretty lightweight.
The arguments against caching:
"It is an application issue"
This is completely wrong. Caching cannot be done against a database without
knowledge of the database, i.e. of when the data changes.
"If it is mostly static data, why not just make it a static page?"
Because a static page is a maintenance nightmare. One uses a database in a web
site to allow content to be changed and upgraded dynamically and with a minimum
of work.
"It isn't very useful"
I disagree completely. A cache of most frequently used queries, or specific
ones, could make for REALLY good performance in some very specific, but very
common, applications. Any system that has a hierarchical "drill down" interface
to a data set, ecommerce, libraries, document management systems, etc. will
greatly benefit from a query cache.
I was thinking that it could be implemented as a keyword or comment in a query.
Such as:
select * from table where column = 'foo' cacheable
or
select * from table where column = 'bar' /* cacheable */
Either way, it would speed up a lot of common application types. It would even
be very cool if you could just cache the results of subqueries, such as:
select * from (select * from table where col1 = 'foo' cacheable) as subset
where subset.col2 = 'bar';
This would mean that the subquery gets cached, but the outer select need not
be. The cache could be like a global temp table. Perhaps the user could even
name the cache entry:
select * from table where column = 'foo' cache on foo
Where one could also do:
select * from cache_foo
Using a keyword is probably the better idea: it can be picked up by the parser
and instruct PostgreSQL to use the cache; otherwise there will be no additional
overhead.
Having caching within PostgreSQL will be good for data integrity. Application
caches can't tell when an update/delete/insert happens, so they often have to
use a time-out mechanism.
OK, let me have it, tell me how terrible an idea this is. Tell me how wrong I
am.
I previously replied to you vaguely describing a way you could implement
this by using a combination of client side caching and database tables
and triggers to allow you to determine if your cache is still valid.
Someone came right behind me, Tom maybe??, and indicated that the
proper/ideal way to do this would be to use Postgres' asynchronous
database notification mechanism (LISTEN/NOTIFY, I believe, were the
semantics) to alert your application that your cache has become
invalid. Basically, a couple of triggers and the use of the listen/notify
model, and you should be all set.
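For what it's worth, a minimal sketch of that trigger-plus-listen/notify setup
might look like the following (the table and notification names are made up,
and the syntax is present-day PL/pgSQL, so treat it as an illustration rather
than a recipe):
create function notify_products_changed() returns trigger as $$
begin
    notify products_changed;
    return null;   -- return value is ignored for AFTER triggers
end;
$$ language plpgsql;
create trigger products_changed_trig
    after insert or update or delete on products
    for each row execute procedure notify_products_changed();
-- each caching client subscribes once per connection:
listen products_changed;
When the notification arrives, the client simply discards whatever result sets
it has cached for that table and re-queries on the next request.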
Done properly, a client side cache which is asynchronously notified by
the database when its contents become invalid should be faster than
relying on MySQL's database caching scheme. Basically, a strong client
side cache is going to prevent your database from even having to return
a cached result set, while a database side cache is going to always
return a result set. Of course, one of the extra cool things you can do
is to cache a gzip'd copy of the data contents, which would further act
as an optimization preventing the client or web server (in case they are
different) from having to recompress every result set.
In the long run, again, if properly done, you should be able to beat
MySQL's implementation without too much extra effort. Why? Because a
client side cache can be much smarter in the way that it uses its
cached contents, much in the same way an application is able to cache
its data better than the file system can. This is why a client side
cache should be preferred over a database result set cache.
Greg
References:
http://www.postgresql.org/idocs/index.php?sql-notify.html
http://www.postgresql.org/idocs/index.php?sql-listen.html
Triggers and asynchronous notification are not substitutes for real, hard, ACID
compliant caching. The way you suggest implies only one access model. Take the
notion of a library: it has both web and application access. These should
both be able to use the cache.
Also, your suggestion does not address the sub-select case, which I think is
a much bigger performance win, and more efficient than MySQL's cache.
This whole discussion could be moot, and this could be developed as an
extension, if there were a function API that could return sets of whole rows.
On Sat, 2002-03-16 at 08:01, mlw wrote:
[snip]
"If it is mostly static data, why not just make it a static page?"
Because a static page is a maintenance nightmare. One uses a database in a web
site to allow content to be changed and upgraded dynamically and with a minimum
of work.
Oh yeah, I forgot to reply to that part. I think you are forgetting
that you can use a database to generate a static page. That is, only
regenerate the static page when the data within the database changes.
Again, this is another example of efficient application caching. If you
have an application which listens for your cache invalidation event, you
can then recreate your static page. Again, database result set caching
is not required. And again, this should be significantly faster than
MySQL's result set caching. It is also worth noting that you could then gzip
your static page (keeping both the compressed and uncompressed copies),
resulting in yet another optimization for most web servers and browsers.
Greg
On Sat, 2002-03-16 at 08:36, mlw wrote:
Triggers and asynchronous notification are not substitutes for real, hard, ACID
compliant caching. The way you suggest implies only one access model. Take the
notion of a library: it has both web and application access. These should
both be able to use the cache.
Well, obviously, you'd need to re-implement the client side cache in
each implementation of the client. That is a downside and I certainly
won't argue that. As for the "no substitute" comment, I guess I'll
plead ignorance, because I'm not sure what I'm missing here. What am I
missing that would not be properly covered by that model?
Also, your suggestion does not address the sub-select case, which I think is
a much bigger performance win, and more efficient than MySQL's cache.
I'm really not sure what you mean by that. Doesn't address it but is
more efficient? Maybe it's because I've not had my morning coffee
yet... ;)
This whole discussion could be moot, and this could be developed as an
extension, if there were a function API that could return sets of whole rows.
Maybe...but you did ask for feedback. :)
Greg
I was thinking that it could be implemented as a keyword or comment in a query.
Such as:
select * from table where column = 'foo' cacheable
or
select * from table where column = 'bar' /* cacheable */
Having caching within PostgreSQL will be good for data integrity. Application
caches can't tell when an update/delete/insert happens, so they often have to
use a time-out mechanism.
OK, let me have it, tell me how terrible an idea this is. Tell me how wrong I
am.
I don't think it's a bad idea, but a cache that takes a query string (or
subquery string) and looks for a match based on that is flawed without
special consideration for non-cacheable functions and constructs
(CURRENT_USER, things that depend on the timezone, things that depend on
the datestyle). We'd also need to work out an appropriate mechanism to deal
with cache invalidation and adding things to the cache.
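To make that concern concrete, here is an illustration (not from the original
mail) of queries whose text alone is not a safe cache key:
select current_user;       -- depends on which user is asking
select now();              -- depends on when it is asked
select '1/2/2002'::date;   -- depends on the DateStyle setting (Jan 2 or Feb 1)
Two sessions issuing the identical string could legitimately expect different
answers, so a text-keyed cache would have to refuse to cache such queries.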
Andrew Sullivan wrote:
On Sat, Mar 16, 2002 at 09:01:28AM -0500, mlw wrote:
"If it is mostly static data, why not just make it a static page?"
Because a static page is a maintenance nightmare. One uses a
database in a web site to allow content to be changed and upgraded
dynamically and with a minimum of work.
This seems wrong to me. Why not build an extra bit of functionality
so that when the admin makes a static-data change, the new static
data gets pushed into the static files?
I was originally intrigued by the suggestion you made, but the more I
thought about it (and read the arguments of others) the more
convinced I became that the MySQL approach is a mistake. It's
probably worth it for their users, who seem not to care that much
about ACID anyway. But I think for a system that really wants to
play in the big leagues, the cache is a big feature that requires a
lot of development, but which is not adequately useful for most
cases. If we had infinite developer resources, it might be worth it.
In the actual case, I think it's too low a priority.
Again, I can't speak to priority, but I can name a few common applications where
caching would be a great benefit. The more I think about it, the more I like
the idea of a 'cacheable' keyword in the select statement.
My big problem with putting the cache outside of the database is that it is now
incumbent on the application programmer to write a cache. A database should
manage the data; the application should handle how the data is presented.
Forcing the application to implement a cache feels wrong.
Greg Copeland wrote:
On Sat, 2002-03-16 at 08:36, mlw wrote:
Triggers and asynchronous notification are not substitutes for real, hard, ACID
compliant caching. The way you suggest implies only one access model. Take the
notion of a library: it has both web and application access. These should
both be able to use the cache.
Well, obviously, you'd need to re-implement the client side cache in
each implementation of the client. That is a downside and I certainly
won't argue that. As for the "no substitute" comment, I guess I'll
plead ignorance, because I'm not sure what I'm missing here. What am I
missing that would not be properly covered by that model?
It would not be guaranteed to be up to date with the state of the database. By
implementing the cache within the database, PostgreSQL could maintain the
consistency.
Also, your suggestion does not address the sub-select case, which I think is
a much bigger performance win, and more efficient than MySQL's cache.
I'm really not sure what you mean by that. Doesn't address it but is
more efficient? Maybe it's because I've not had my morning coffee
yet... ;)
If an internal caching system could be implemented within PostgreSQL (and trust
me, I understand what a hairball it would be with multiversion concurrency),
consider a complex query such as:
select * from (select * from mytable where foo = 'bar' cacheable) as subset
where subset.col = 'value'
The 'cacheable' keyword applied to the query would mean that PostgreSQL could
keep that result set handy for later use. If that subselect on mytable always
does a table scan, no one can argue that caching the subquery would not be a
huge win.
As a side note, I REALLY like the idea of a keyword for caching as opposed to
automated caching. It would allow the DBA or developer more control over
PostgreSQL's behavior, and potentially make the feature easier to implement.
This whole discussion could be moot, and this could be developed as an
extension, if there were a function API that could return sets of whole rows.
Currently a function can only return one value or a setof a single type,
implemented as one function call for each entry in a set. If there were a
function interface which could return a row, and multiple rows similar to the
'setof' return, that would be very cool. That way caching could be implemented
as:
select * from pgcache('select * from mytable where foo = ''bar''') as subset
where subset.col = 'value';
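As an aside, functions returning whole rows are exactly what later PostgreSQL
releases ended up providing; purely for illustration (the table and function
names here are hypothetical, and there is no caching involved), such an
interface looks like:
create function scifi_books() returns setof books as $$
    select * from books where genre = 'scifi';
$$ language sql stable;
select * from scifi_books() as subset where subset.authorid = 42;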
I think the notion that data should be managed outside of the database is bogus.
Query caching can improve performance in some specific, but popular, scenarios.
Saying it does not belong within the database and is the job of the
application is like saying file caching is not the job of the file system but
the job of the application.
This is functionality many users want, and it can be justified by some very
specific, but very common, scenarios. It is not for me to say whether it is
worth the work, or whether it should be done. But from the perspective of the
user, having this capability within the database is an important feature; that
is the argument I want to make.
On Sat, Mar 16, 2002 at 09:01:28AM -0500, mlw wrote:
"If it is mostly static data, why not just make it a static page?"
Because a static page is a maintenance nightmare. One uses a database in a web
site to allow content to be changed and upgraded dynamically and with a minimum
of work.
That is a weak argument for a DB cache. Why not generate the web page after the
data changes, and the next time use it as a static page?
I was thinking that it could be implemented as a keyword or comment in a query.
Such as:
select * from table where column = 'foo' cacheable
You can insert the "mostly static data" into a temp table and use this temp
table in subsequent queries. After an update/delete/insert your application can
rebuild the temp table (or a trigger can?).
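A minimal sketch of what Karel describes, with made-up table and column names
(using an ordinary table rather than a true temp table so that every session
can see it):
create table scifi_books_cache as
    select * from books where genre = 'scifi';
-- later queries read the pre-computed copy:
select * from scifi_books_cache where authorid = 42;
-- after the application changes "books", it rebuilds the copy:
delete from scifi_books_cache;
insert into scifi_books_cache select * from books where genre = 'scifi';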
Karel
--
Karel Zak <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/
C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
Karel Zak wrote:
On Sat, Mar 16, 2002 at 09:01:28AM -0500, mlw wrote:
"If it is mostly static data, why not just make it a static page?"
Because a static page is a maintenance nightmare. One uses a database in a web
site to allow content to be changed and upgraded dynamically and with a minimum
of work.
That is a weak argument for a DB cache. Why not generate the web page after the
data changes, and the next time use it as a static page?
I was thinking that it could be implemented as a keyword or comment in a query.
Such as:
select * from table where column = 'foo' cacheable
You can insert the "mostly static data" into a temp table and use this temp
table in subsequent queries. After an update/delete/insert your application can
rebuild the temp table (or a trigger can?).
Yes, I could, as could most of the guys reading these messages. I am thinking
about a feature in PostgreSQL that would make that easier for the average DBA
or web producer.
Let's face it, MySQL wins a lot of people because they put in features that
people want. All the ways people have suggested to "compete" with MySQL's
caching have been ugly kludges.
I understand that there is a fair amount of work involved in doing caching, and
the value of caching is debatable to some; however, it is demonstrable that
caching can improve a very common, albeit specific, set of deployments. Also,
managing data is the job of the database, not the application. It does belong
in PostgreSQL; if someone is forced to write a caching scheme around
PostgreSQL, it is because PostgreSQL lacks that feature.
On Monday, 18 March 2002 13:23, mlw wrote:
Let's face it, MySQL wins a lot of people because they put in features that
people want.
MySQL is very interested in benchmarks.
It does not really care for data consistency.
Cheers, Jean-Michel POURE
Yes. EVERY person I've ever known who runs MySQL runs it for two
very simple reasons. First, they believe it to be wicked fast. Second,
they don't understand what ACID is, what a transaction is, or why
running a single session against a database to perform a benchmark is a
completely bogus concept. In case it's not obvious, these are usually
people who are trying to take a step up from Access. While I do
believe MySQL, from a performance perspective, is a step up from Access,
I always tell my clients: if you wouldn't use an Access database for
this project, you shouldn't use MySQL either.
To me, this means we need better advertising, PR, and education rather
than a result set cache. :P
Speaking of which, I'm wondering if there are any design patterns we can
look at which would address client side caching...well, at least make it
easier to implement as well as implement it in a consistent manner.
Greg
Jean-Michel POURE wrote:
On Monday, 18 March 2002 13:23, mlw wrote:
Let's face it, MySQL wins a lot of people because they put in features that
people want.
MySQL is very interested in benchmarks.
It does not really care for data consistency.
In no way am I suggesting we avoid ACID compliance. In no way am I suggesting
that PostgreSQL change. All I am suggesting is that tables which change
infrequently can and should be cached.
select * from table where foo = 'bar'
need not be executed twice if the table has not changed.
select * from table1, (select * from table2 where foo = 'bar' cacheable) as
subset where subset.col1 = table1.col1;
In the above query, if table2 changes 4 times a day, and it is queried a couple
of times a minute or second, caching the subset could save a huge amount of
disk I/O.
This sort of query could improve many catalog-based implementations, from
music, to movies, to books. A library could implement a SQL query for book
lookups like this:
select * from authors, (select * from books where genre = 'scifi' cacheable) as
subset where authors.id = subset.authorid and authors.id in (....)
Yes, it is arguable that index scans may work better and, obviously, summary
tables may help, etc., but imagine a more complex join which produces fewer
records but is executed frequently. Caching could help the performance of
PostgreSQL in some very real applications.
MySQL's quest for benchmarking numbers, I agree, is shameful because they
create numbers which are not really applicable in the real world. This time,
however, I think they may be on to something.
(1) PostgreSQL uses a "cacheable" or "iscacheable" keyword.
(2) If the query uses functions which are not marked as "iscacheable," then it
is not cached.
(3) If any table contained within the cacheable portion of the query is
modified, the cache is marked as dirty.
(4) No provisions are made to recreate the cache after an insert/update/delete.
(5) The first query marked as "iscacheable" that encounters a "dirty" flag in a
table does an exhaustive search on the cache and removes all entries that are
affected.
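A hypothetical session under these rules (remember, the keyword does not exist;
this only illustrates the behavior being proposed, with made-up table names):
select * from products where category = 'books' iscacheable;  -- executed, result cached (1)
select * from products where category = 'books' iscacheable;  -- served from the cache
update products set price = 9.95 where id = 7;                -- "products" cache entries marked dirty (3), not rebuilt (4)
select * from products where category = 'books' iscacheable;  -- sees the dirty flag, purges the affected
                                                              -- entries (5), then executes normally again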
As far as I can see, if the above parameters are used to define caching, it
could improve performance on sites where a high number of transactions are
made and where there is also a large amount of static data, i.e. an ecommerce
site, a library, etc. If the "iscacheable" keyword is not used, PostgreSQL will
not incur any performance degradation. However, if the "iscacheable" keyword is
used, the performance loss could very well be made up by the benefits of
caching.
"Matthew T. O'Connor" wrote:
My big problem with putting the cache outside of the database is that it is
now incumbent on the application programmer to write a cache. A database
should manage the data; the application should handle how the data is
presented. Forcing the application to implement a cache feels wrong.
I believe someone suggested a possible solution that was in the pg client,
using NOTIFY and triggers. The argument given against it was that
it would not be ACID compliant. I say, who cares. I would think that the
"select cacheable" would only be allowed for simple selects; it would not be
used for select for update or anything else. Anytime you are given the
result of a simple select, you are not guaranteed that the data won't change
underneath you.
Not true: if you begin a transaction, you can be isolated from changes made to
the database.
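For example (just a sketch, with a made-up table name):
begin;
set transaction isolation level serializable;
select * from products where category = 'books';
-- updates committed by other sessions are not visible inside this transaction,
select * from products where category = 'books';  -- so this sees the same snapshot
commit;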
The primary use that you have suggested is for web sites,
and they certainly won't mind if the cache is 0.3 seconds out of date.
Again, if they don't care about accuracy, then they will use MySQL. PostgreSQL
is a far better system. Making PostgreSQL less accurate, less "correct" takes
away, IMHO, the very reasons to use it.
On Mon, 2002-03-18 at 08:15, mlw wrote:
"Matthew T. O'Connor" wrote:
[snip]
The primary use that you have suggested is for web sites,
and they certainly won't mind if the cache is 0.3 seconds out of date.
Again, if they don't care about accuracy, then they will use MySQL. PostgreSQL
is a far better system. Making PostgreSQL less accurate, less "correct", takes
away, IMHO, the very reasons to use it.
If you are building a web site and you need real-time data within 0.3s,
you've implemented it on the wrong platform. It's as simple as that. In
the web world, there are few applications where a 0.3s window is
notable. After all, that 0.3s window can be anywhere within the
system, including the web server, network, any front end caches, DNS
resolution, etc.
I tend to agree with Matthew. Granted, there are some application
domains where this can be critical...generally speaking, web serving
isn't one of them.
That's why all of the solutions I offered were pointedly addressing a
web server scenario and not a generalized database cache. I completely
agree with you on that. In a generalized situation, the database should
be managing and caching the data (which it already does).
Greg
Greg Copeland wrote:
On Mon, 2002-03-18 at 08:15, mlw wrote:
"Matthew T. O'Connor" wrote:
[snip]
The primary use that you have suggested is for web sites,
and they certainly won't mind if the cache is 0.3 seconds out of date.
Again, if they don't care about accuracy, then they will use MySQL. PostgreSQL
is a far better system. Making PostgreSQL less accurate, less "correct", takes
away, IMHO, the very reasons to use it.
If you are building a web site and you need real-time data within 0.3s,
you've implemented it on the wrong platform. It's as simple as that. In
the web world, there are few applications where a 0.3s window is
notable. After all, that 0.3s window can be anywhere within the
system, including the web server, network, any front end caches, DNS
resolution, etc.
This is totally wrong! An out-of-date cache can cause errors by returning
results that are no longer valid, thus causing lookup problems. That is what
ACID compliance is all about.
I tend to agree with Matthew. Granted, there are some application
domains where this can be critical...generally speaking, web serving
isn't one of them.
That's why all of the solutions I offered were pointedly addressing a
web server scenario and not a generalized database cache. I completely
agree with you on that. In a generalized situation, the database should
be managing and caching the data (which it already does).
But it does not cache a query. An expensive query which does an index range
scan and filters by a where clause could evict a good number of buffers from
the buffer cache. If this, or a number of queries like it, is frequently
repeated, verbatim, against a seldom-changed table, why not cache the results
within PostgreSQL? It would improve overall performance by preserving more
blocks in the buffer cache and eliminating a number of queries being executed.
I don't see how caching can be argued against on the grounds of applicability.
I can understand an argument about the time and work involved, but to debate
whether it is a useful feature seems ludicrous.
On Mon, 2002-03-18 at 10:08, mlw wrote:
Greg Copeland wrote:
On Mon, 2002-03-18 at 08:15, mlw wrote:
"Matthew T. O'Connor" wrote:
[snip]
If you are building a web site and you need real-time data within 0.3s,
you've implemented it on the wrong platform. It's as simple as that. In
the web world, there are few applications where a 0.3s window is
notable. After all, that 0.3s window can be anywhere within the
system, including the web server, network, any front end caches, DNS
resolution, etc.
This is totally wrong! An out-of-date cache can cause errors by returning
results that are no longer valid, thus causing lookup problems. That is what
ACID compliance is all about.
I understand what ACID is about. Question: was the result set valid
when it was cached? Yes. So will it be valid when it's returned as a
cached result set? Yes. Might it be an out-of-date view? Sure...with
a horribly small window for becoming "out of date". Will it cause lookup
problems? It might, but no more than what you are proposing. In the
meantime, the FE-cached result set, performance-wise, is beating the pants
off of the database-cached solution on both the specific workload and
overall system performance.
I should point out that once the FE cache has been notified that its
contents are invalid, the FE would no longer return the invalidated result
set. I consider that to be a given; however, from some of your comments
I get the impression that you think the invalid result set would
continue to be served. Another way of thinking about it is...it's
really not any different from the notification acting as the returned
result set, from a validity perspective. That is...if the notification
had been the result set returned from the database...it would be
accurate (which in this case means the FE cache is now dirty and treated
as such)...and if the query is refreshed because it is now invalid...the
result set is once again accurate and reflective of the database.
Example...
Database cache:
    1. Query result set
    2. Result set returned (cached on database)
    3. Local change to database (result set cache invalid)
    4. New query based on the out-of-date result set
Application cache:
    1. Query result set (cached)
    2. Result set returned
    3. Local change to database (app cache invalid and signaled)
    4. New query based on the out-of-date result set
Both have that problem since transactional boundaries are hard to keep
across HTTP requests. This again, is why for web applications, a FE
cache is perfectly acceptable for *most* needs. Also notice that your
margin for error is more or less the same.
[snip]
I don't see how caching can be argued against on the grounds of applicability.
I can understand an argument about the time and work involved, but to debate
whether it is a useful feature seems ludicrous.
I don't think I'm arguing about whether it's applicable or useful. Rather, I'm
saying that faster results can be yielded by implementing it in the
client, with far less effort than it would take to implement it in the BE.
I am arguing that its impact on overall system performance (though I
really didn't do more than just touch on this topic) is
questionable...granted, it may greatly enhance specific workloads...at
the expense of others. Which shouldn't be too surprising, as trade-offs
of some type are pretty common.
At this point in time, I think we've both pretty well beaten this topic
up. Obviously there are two primary ways of viewing the situation. I
don't think anyone is saying it's a bad idea...I think everyone is
saying that it's easier to address elsewhere and that, overall, the net
returns may come at the expense of some other workloads. So, unless
there are new pearls to be shared and gleaned, I think the topic has been
fairly well addressed. Does more need to be said?
Greg
On Sat, 2002-03-16 at 09:01, mlw wrote:
On a web site, a few specific queries get executed, unchanged, repeatedly.
Think about an ecommerce site: most of the time it is just a handful of basic
queries. These basic queries are usually against pretty large product tables. A
caching mechanism would make these queries pretty lightweight.
The arguments against caching:
"It is an application issue"
This is completely wrong. Caching cannot be done against a database without
knowledge of the database, i.e. of when the data changes.
But can't this be achieved by using a LISTEN/NOTIFY model, with
user-created rules to NOTIFY the appropriate listener when a table
changes? With a good notification scheme like this, you don't need to
continually poll the DB for changes. You don't need to teach your cache
a lot of things about the database, since most of that knowledge is
encapsulated inside the rules, and supporting tables.
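A sketch of one such user-created rule set (the table and notification names
are invented for the example):
create rule products_notify_ins as on insert to products do notify products_changed;
create rule products_notify_upd as on update to products do notify products_changed;
create rule products_notify_del as on delete to products do notify products_changed;
-- the cache (or the caching layer described below) then just issues:
listen products_changed;
and drops its cached results for that table whenever the notification arrives.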
My impression (I could be wrong) is that LISTEN/NOTIFY doesn't get the
press that it deserves. If this model isn't widely used because of some
deficiencies in the LISTEN/NOTIFY implementation, IMHO our time would be
better spent fixing those problems than implementing the proposed
caching scheme.
If we're looking to provide a "quick and easy" caching scheme for users
attracted to MySQL's query cache, why not provide this functionality
through another application? I'm thinking about a generic "caching
layer" that would sit in between Postgres and the database client. It
could speak the FE/BE protocol as necessary; it would use LISTEN/NOTIFY
to allow it to efficiently be aware of database changes; and it would create
the necessary rules for the user, providing a simple interface for
enabling query caching for a table or a set of tables.
What does everyone think?
OK, let me have it, tell me how terrible an idea this is. Tell me how wrong I
am.
I think your goals are laudable (and I also appreciate the effort that
you and everyone else puts into Postgres); I just think we could get
most of the benefits without needing to implement potentially complex
changes to Postgres internals.
Cheers,
Neil
--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC
On Mon, 2002-03-18 at 20:35, Neil Conway wrote:
[snip]
My impression (I could be wrong) is that LISTEN/NOTIFY doesn't get the
press that it deserves. If this model isn't widely used because of some
deficiencies in the LISTEN/NOTIFY implementation, IMHO our time would be
better spent fixing those problems than implementing the proposed
caching scheme.
If we're looking to provide a "quick and easy" caching scheme for users
attracted to MySQL's query cache, why not provide this functionality
through another application? I'm thinking about a generic "caching
layer" that would sit in between Postgres and the database client. It
could speak the FE/BE protocol as necessary; it would use LISTEN/NOTIFY
to allow it to efficiently be aware of database changes; and it would create
the necessary rules for the user, providing a simple interface for
enabling query caching for a table or a set of tables.
What does everyone think?
Yes...I was thinking that a generic library interface with a nice design
pattern might meet this need rather well. Done properly, I think we can
make it so that all that would be needed, more or less, is application
hooks which accept the result set to be cached, and a mechanism to signal
invalidation of the current cache...obviously that's not an exhaustive
list... :)
I haven't spent much time on this, but I'm fairly sure some library
routines can be put together which would greatly reduce the effort required
of application coders to support FE data caches, and still be portable
even to the Win32 port.
Greg