Query meltdown: caching results
I'm working on a CMS that, in addition to the database-stored version
of articles for easy searching, sorting, etc, also stores a HTML file
version so pages can be fetched with the minimum of overhead (browsing
articles has no more overhead than accessing any other HTML file on
the server).
As I've been trying to keep the system modular I've taken to using an
OO approach to the server side scripts, which are all written in PHP 5
and use PDO for database access. I've also been using prepared
statements almost exclusively for security and performance reasons.
I've tried to wrap sequences of queries in transactions as well, to
eliminate the "every query is its own transaction" overhead.
With previous projects which I wrote using structured programming
methods it was quite easy to hold caches of results and keep database
queries to a minimum, but I've found this extremely difficult to pull
off when using the OO approach, and now it's starting to have some
real performance consequences. The biggest one comes when publishing
a document that has siblings. CMS content is organized in a tree with
folders, subfolders and documents. A document can be published, where
both a HTML and database copy exist, or unpublished, where only the
database version exists, thus denying visitors to the site access to
it. Documents in a folder get a sidebar with links to the other
documents in the same folder, and when you change the published status
of a document then all the other documents that are also published in
that folder have to be republished in order to update their
sidebars.
This means fetching a list of all the documents that share the same
parent and have a published flag of true, using the text stored in the
database to generate the HTML page, and saving it to disk.
Documents have an associated template, which also has to be fetched
from the database. And all documents have data such as their path,
which is a chain of the document's parents back to the root so that
things like breadcrumbs can be generated.
In the structured approach I'd have just cached stuff like the trail
back to the root as I know it'll be the same for all documents, so I'd
only have to run the sequences of queries to get the full trail once.
But as each instance of a document is independent of all the others
doing things like this is proving really difficult.
I need to find a way of not running queries that I don't need to,
either in the PHP script, or in the Postgres database. What I need is
for a result set to be cached somewhere, either by Postgres or PHP, so
when it sees the same query again in a given session it just returns
the previously fetched result set. The cache also needs to be able to
disregard its cached result sets when an event that changes a table
occurs (insert, update, delete, etc).
On the PHP side I've written a simple Database class that extends PDO
and that I use in its place. It's a simple class that basically I use
to allow me to nest calls to beginTransaction(), commit () and
rollback (). (It only starts an actual transaction if a counter is 0.
Otherwise it just increments the counter. Commit only actually
commits when the counter is 1, and decrements it otherwise. Rollback
sets an error flag and decrements the counter, and only rolls back
when the counter is 1. If the error flag is set then commit will
actually roll back instead.)
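That counting scheme can be sketched as follows. This is a minimal illustration of the behaviour described above, not the actual class; the field names are invented:

```php
<?php
// Sketch of a PDO subclass that allows nested transaction calls by
// counting depth. Only the outermost beginTransaction()/commit() pair
// touches the real transaction; an inner rollback sets a flag that
// forces the outermost commit() to roll back instead.
class Database extends PDO
{
    private $txDepth = 0;
    private $txFailed = false;

    public function beginTransaction(): bool
    {
        if ($this->txDepth === 0) {
            $this->txFailed = false;
            parent::beginTransaction();   // real transaction starts here
        }
        $this->txDepth++;
        return true;
    }

    public function commit(): bool
    {
        if ($this->txDepth === 1) {
            // Outermost commit: honour any earlier rollback request
            $result = $this->txFailed ? parent::rollback() : parent::commit();
        } else {
            $result = true;               // inner commit is a no-op
        }
        $this->txDepth--;
        return $result;
    }

    public function rollback(): bool
    {
        $this->txFailed = true;           // poison the whole transaction
        if ($this->txDepth === 1) {
            parent::rollback();
        }
        $this->txDepth--;
        return true;
    }
}
```

(PHP method names are case-insensitive, so `rollback()` here overrides PDO's `rollBack()`.)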
My options are, as far as I can tell,
1) replace the Database PDO extending class with something else that
provides query results caching in PHP, or
2) get Postgres itself to cache the results of queries to avoid
running them repeatedly during a session.
I seem to remember MySQL providing some kind of results caching, can
Postgres do the same? Has anyone else run into similar problems and
how did they overcome them?
Gordon wrote:
I need to find a way of not running queries that I don't need to,
either in the PHP script, or in the Postgres database. What I need is
for a result set to be cached somewhere, either by Postgres or PHP, so
when it sees the same query again in a given session it just returns
the previously fetched result set. The cache also needs to be able to
disregard its cached result sets when an event that changes a table
occurs (insert, update, delete, etc).
It's the second part that's fiddly (in the general case) if you do it in
PHP.
If you're looking for a large-scale cache then memcached would suit your
needs. There's an add-on for PG that can keep it notified of changes:
http://pgfoundry.org/projects/pgmemcache/
My options are, as far as I can tell,
1) replace the Database PDO extending class with something else that
provides query results caching in PHP, or
There are a whole bunch of Pear classes for caching - Cache_Lite is
simple to plug into an existing structure.
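Cache_Lite itself is a PEAR package, but its get()/save() pattern is easy to see in a self-contained stand-in (the class below is a minimal sketch mimicking that interface, not Cache_Lite's real code):

```php
<?php
// Minimal file-based result cache illustrating the Cache_Lite-style
// get()/save() pattern: check the cache first, fall back to the real
// work on a miss, and store the result with a lifetime for the next
// caller.
class SimpleCache
{
    private $dir;
    private $lifeTime;

    public function __construct(string $dir, int $lifeTime = 3600)
    {
        $this->dir = rtrim($dir, '/');
        $this->lifeTime = $lifeTime;
    }

    private function path(string $id): string
    {
        return $this->dir . '/cache_' . md5($id);
    }

    public function get(string $id)
    {
        $file = $this->path($id);
        if (is_file($file) && filemtime($file) > time() - $this->lifeTime) {
            return unserialize(file_get_contents($file));
        }
        return false;                     // cache miss
    }

    public function save($data, string $id): void
    {
        file_put_contents($this->path($id), serialize($data));
    }
}

// Usage: wrap an expensive query in a cache check.
$cache = new SimpleCache(sys_get_temp_dir());
$id = 'siblings_of_42';                   // invented cache key
if (($rows = $cache->get($id)) === false) {
    $rows = ['doc1', 'doc2'];             // stand-in for the real query
    $cache->save($rows, $id);
}
```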
2) get Postgres itself to cache the results of queries to avoid
running them repeatedly during a session.
I seem to remember MySQL providing some kind of results caching, can
Postgres do the same? Has anyone else run into similar problems and
how did they overcome them?
No, but if you're serious about the caching you'll want to do it well
above the data-access layer.
The main gains I've seen with a simple caching system have been:
1. Big, static lookup lists (countries, catalogue sections etc).
2. Whole pages / sections of pages
The trick with both is to cache as close to rendering as possible. So,
the HTML in the case of pages/controls.
Make sure your data-access layer invalidates any relevant cache entries
and you'll be fine (as long as you don't do any database manipulation
outside your app - always have an "invalidate whole cache" function /
script available for this).
Oh, and *do* make sure you've identified real gains first. It's
distressing to spend two days optimising your caching only to realise
you've gained 2% because you've missed the real bottle-neck.
--
Richard Huxton
Archonet Ltd
On Feb 26, 11:11 am, Gordon <gordon.mc...@ntlworld.com> wrote:
What I need is for a result set to be cached somewhere, either by
Postgres or PHP, so when it sees the same query again in a given
session it just returns the previously fetched result set. [rest of
original post snipped]
I have an idea for how to do it but I'm not quite sure how to
accomplish it fully. Aspects involving modifications to the tables
are going to be particularly problematic.
My idea is to extend the PDOStatement class with an internal result
cache. I'm already caching PDOStatements in order to prevent the
script from trying to prepare the same queries over and over again.
The cache will be an array. The execute(), fetch(), fetchAll() etc
methods will be aware of the array and return values from it if
possible.
Things risk getting really tricky really quickly, however. If a
modification is made to a table, then any or all of the cached data in
all the PDOStatements may no longer be valid and will need to be
flushed. This is leading me to suspect that this is a far from ideal
way of doing things.
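The idea might be easier to see as a decorator around PDOStatement rather than a subclass (a subclass is possible via PDO::ATTR_STATEMENT_CLASS, but fiddlier). A rough sketch, with invented names:

```php
<?php
// Sketch: a wrapper around a prepared PDOStatement whose execute()
// checks a per-statement cache keyed by the bound parameters, so a
// repeated execution with the same parameters never hits the database.
class CachingStatement
{
    private $stmt;
    private $cache = [];                  // param-key => rows
    private $current = null;              // rows for the last execute()

    public function __construct(PDOStatement $stmt)
    {
        $this->stmt = $stmt;
    }

    public function execute(array $params = []): bool
    {
        $key = serialize($params);
        if (!array_key_exists($key, $this->cache)) {
            $this->stmt->execute($params);
            $this->cache[$key] = $this->stmt->fetchAll(PDO::FETCH_ASSOC);
        }
        $this->current = $this->cache[$key];
        return true;
    }

    public function fetchAll(): array
    {
        return $this->current;
    }

    public function flush(): void
    {
        // Called when a table-modifying event may have invalidated us
        $this->cache = [];
        $this->current = null;
    }
}
```

The flush() method is exactly the tricky part: something has to call it on every such statement when a table changes.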
I know that Postgres can cache query plans, but what about results?
Can/do they get cached too?
Gordon wrote:
I seem to remember MySQL providing some kind of results caching, can
Postgres do the same? Has anyone else run into similar problems and
how did they overcome them? [rest of original post snipped]
Don't know about Postgres but yes MySQL does have caching. You could
also take a look at APC (Alternative PHP Cache) depending on your setup.
Very easy to use. And very easy to monitor what's actually going on with
your pages in the cache. Once a page is generated you can store it in
the cache and give it a time to live before going back to the db.
--
Norman
Registered Linux user #461062
"Norman Peelman" <npeelman@cfl.rr.com> writes:
My options are, as far as I can tell,
1) replace the Database PDO extending class with something else that
provides query results caching in PHP, or
2) get Postgres itself to cache the results of queries to avoid
running them repeatedly during a session.
You might consider looking at memcached. One way to use it would be to have
the PHP application check for the cached object first and use it rather than
do any database queries. Then you can use pgmemcached to allow triggers to
invalidate cached objects whenever the underlying data changes. (Or you could
even just use pl/php to update or invalidate the cached object through the
same code library)
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
On Feb 26, 5:26 pm, st...@enterprisedb.com (Gregory Stark) wrote:
"Norman Peelman" <npeel...@cfl.rr.com> writes:
My options are, as far as I can tell,
1) replace the Database PDO extending class with something else that
provides query results caching in PHP, or
2) get Postgres itself to cache the results of queries to avoid
running them repeatedly during a session.You might consider looking at memcached. One way to use it would be to have
the PHP application check for the cached object first and use it rather than
do any database queries. Then you can use pgmemcached to allow triggers to
invalidate cached objects whenever the underlying data changes. (Or you could
even just use pl/php to update or invalidate the cached object through the
same code library)--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Thanks for the replies, but the problem is really centered around how
my script interacts with the database. I know what the problem is,
and if I was using a structured approach I could figure out
workarounds fairly quickly, but I chose to do this project in OOP for
a few reasons: the old CMS was hacked together in a hurry by an
employee who is now long gone, and is horribly written and completely
unmaintainable; the size of the project seemed to warrant an OOP
approach; we recently upgraded to PHP 5 and I wanted to take advantage
of the new features; PDO and prepared statements appeared from the
documentation to offer major security and performance enhancements;
and I wanted to increase my experience with OOP development, as most
work I've done before now has been structured. Now the high level of
compartmentalization that OOP demands is coming back to haunt me.
The problem comes when a document is published or unpublished. All
the documents that relate to that document need to be republished to
add or remove the link to the document in question. When the document
is published or unpublished the script gets related documents (at the
moment other documents that share the same folder, but this may change
in the future to cover more related content) and republishes them.
Each document has associated data, such as the ID of the template to
apply, its parent, its path back to root, etc, that are generated by
querying the database in various ways. For example, the path back to
the root is fetched by iteratively getting the parent folder, getting
that folder's parent, etc. until the script hits the root. Templates
are fetched by looking at the template ID associated with the
document. If this is 0, then the script goes to the parent folder and
gets the template associated with the folder. If that is 0 as well
then it iteratively goes up until it finds a template to apply or
until it hits the root, in which case it applies a default template. The code
fragment from the script that does this looks like this:
$db -> beginTransaction ();
if ($doc = CmsItem::factory ('CmsDoc', intval ($_GET ['itm_id']),
$db, $user))
{
if ((($doc -> itemProps ['itm_publish']) && ($doc -> unpublish ()))
|| ($doc -> publish ()))
{
// Republish related documents
foreach ($doc -> getSiblingObjects () as $thisDoc)
{
if ((get_class ($thisDoc) == 'CmsDoc')
&& ($thisDoc -> itemProps ['itm_publish']))
{
$thisDoc -> republish ();
}
}
// Item status changed
$db -> commit ();
$_SESSION ['messageStack'][] = ($doc -> itemProps ['itm_publish']?
'Item published':
'Item unpublished');
}
else
{
// Couldn't change published status
$db -> rollback ();
$_SESSION ['messageStack'][] = ($doc -> itemProps ['itm_publish']?
'Unable to unpublish item':
'Unable to publish item');
}
}
getSiblingObjects () runs a query that gets a list of IDs that share
the same parent as the current document. It then iterates the list
and spawns a new CMS item for each item in the list and returns them
as an array. As folders could be returned as well as documents we
only run republish () on those items.
CmsDoc -> publish () and CmsDoc -> unpublish () toggle a boolean
column in the database between true and false for the item being (un)
published. unpublish () also deletes the concrete file associated
with the DB entry.
publish () and republish () write out a concrete HTML file based on
the content stored in the table for the document in question and the
template that should be applied. The template is determined from a
template ID column. If it's 0 then the script walks up the tree until
it finds a template to use as described above.
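That walk-up search is itself a good memoization candidate, since every sibling with a template ID of 0 repeats the same climb. A sketch of the technique, where the $tree array and the default-template ID stand in for the real parent/template queries (all names here are invented):

```php
<?php
// Sketch: resolve a folder's effective template by walking up the
// tree, memoizing each folder's answer so sibling documents that
// trigger the same climb hit the memo instead of the database.
// $tree maps folder ID => ['parent' => id|null, 'template' => id],
// standing in for the real parent/template lookup queries.
function resolveTemplate(int $folderId, array $tree, array &$memo): int
{
    if (isset($memo[$folderId])) {
        return $memo[$folderId];          // already climbed from here
    }
    $templateId = $tree[$folderId]['template'];
    if ($templateId === 0) {
        $parent = $tree[$folderId]['parent'];
        $templateId = ($parent === null)
            ? 1                            // default template at the root
            : resolveTemplate($parent, $tree, $memo);
    }
    return $memo[$folderId] = $templateId;
}
```

Because the memo also records every intermediate folder on the way up, republishing a whole folder of siblings only pays for the climb once.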
publish () and republish () rely on a method of CmsDoc called generate
(), which creates a Smarty instance, does the work described above and
generates a HTML string for publish () and republish () to write out
to disk.
public function generate ()
// Generate document HTML
{
$paths = $this -> getTemplatePaths ();
$page = new Template ($paths ['tplPath'], $paths ['cplPath']);
$page -> assign_by_ref ('document', $this -> itemProps);
$page -> assign ('siblings', $this -> getSiblings ());
$page -> assign ('path', $this -> getTrail ());
return ($page -> fetch ($paths ['tplFile']));
}
Template is a class that inherits from Smarty, that just does a little
setup housekeeping for me and allows me to specify both the template
and the compile dirs from arguments instead of just the template dir.
getTemplatePaths returns where templates are stored and where they'll
be compiled. getSiblings () is like getSiblingObjects () but doesn't
create objects, just returns a list of siblings. getTrail returns an
array of folders from the root to the document's containing folder.
The problem is that there is a vast number of redundant queries being
run. The documents all share the same parent (the list was obtained
in the first place with getSiblingObjects ()), but for every one that
gets republished the getSiblings () and getTrail () queries get run
again. (As every document in a folder can have a different template,
the getTemplatePaths () query would have to be run for every one
anyway.)
All the solutions I can think of for this problem would involve
breaking the principle that objects should not depend on a knowledge
of the inner workings of unrelated objects. The only thing these
objects are guaranteed to all have in common is that they all share
the same Database object between them (Database is my PDO extending
class). This suggests to me that the best solution is to cache the
results of read queries either in the database engine itself, or in
the Database object. Results caching would eliminate the problem of
the same queries being run over and over, because only the first
invocation of a query would actually query the database. All
subsequent queries would hit the cache instead.
The problem is the mechanics of actually implementing this caching.
I'm using prepared statements almost exclusively throughout the
design, meaning that the PDOStatement class probably needs to be
extended somehow and my Database prepare() method should return my
extended class instead of a generic PDOStatement. The Database class
does caching of PDOStatements to save the overhead of each instance
of a class attempting to prepare the same statement over and over
again. If I can extend PDOStatement to cache its results then the
repeated query overhead will be more or less eliminated. The problem
is what happens when a database modifying event occurs? The easiest
thing to do would be to invalidate the cache in all existing
PDOStatements, which, while it's not the most efficient solution,
would be the safest. But when one PDOStatement executes a modifying
query, how will all the other PDOStatements know?
This is why I was interested in caching at the database level, as it
would sidestep this problem.
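One way the other statements could "know" is to not let them own any cache at all: since every object already shares the one Database instance, that instance can hold a single result cache keyed on SQL plus parameters and flush it whenever any modifying query goes through it. A rough sketch of that shape (method names invented; whole-cache invalidation is the coarse-but-safe option discussed above):

```php
<?php
// Sketch: the shared Database object owns one result cache. Reads are
// keyed on SQL + parameters; any modifying statement routed through
// modify() flushes the whole cache, so no statement can return stale
// rows afterwards.
class CachingDatabase extends PDO
{
    private $results = [];

    public function cachedQuery(string $sql, array $params = []): array
    {
        $key = $sql . '|' . serialize($params);
        if (!array_key_exists($key, $this->results)) {
            $stmt = $this->prepare($sql);
            $stmt->execute($params);
            $this->results[$key] = $stmt->fetchAll(PDO::FETCH_ASSOC);
        }
        return $this->results[$key];
    }

    public function modify(string $sql, array $params = []): int
    {
        $stmt = $this->prepare($sql);
        $stmt->execute($params);
        $this->results = [];              // invalidate everything
        return $stmt->rowCount();
    }
}
```

With this shape the CmsDoc objects stay ignorant of each other: sibling documents share each other's getTrail()/getSiblings() results purely because they share the one Database object.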
I can't have been the first person to run up against this problem,
somebody somewhere must have implemented a framework for PDO that can
handle caching of results from prepared statements. I've just not been
able to find one. Please, if anyone out there knows something I don't
then please let me know.
PS: For those of you who replied via private mail instead of in the
group, thank you for the responses, I appreciate the help and advice.
but could you please reply in the group? Those messages get sent to
my home address, and I'm posting this from work, so I won't see them
until I get home. Besides, it'll be helpful for the next poor sod who
hits this problem if there's a publicly viewable thread that'll turn
up in Google :)
Gordon wrote:
The problem is that there is a vast number of redundant queries being
run. .... the principle that objects should not depend on a knowledge
of the inner workings of unrelated objects. .... Results caching
would eliminate the problem of the same queries being run over and
over .... I can't have been the first person to run up against this
problem
With memcached, your methods to retrieve data go from "get data from db"
to "get data from cache, and on cache miss get from db and leave a copy
for the next guy in cache". Updating the data is not much more
complicated. I don't see why this doesn't work for you? It won't
compromise anything on the encapsulation front you are concerned about,
and you can still use your prepared statements for hitting the db, etc.?
Regards,
Paul
On Feb 27, 10:37 am, Gordon <gordon.mc...@ntlworld.com> wrote:
On Feb 26, 5:26 pm, st...@enterprisedb.com (Gregory Stark) wrote:
"Norman Peelman" <npeel...@cfl.rr.com> writes:
My options are, as far as I can tell,
1) replace the Database PDO extending class with something else that
provides query results caching in PHP, or
2) get Postgres itself to cache the results of queries to avoid
running them repeatedly during a session.You might consider looking at memcached. One way to use it would be to have
the PHP application check for the cached object first and use it rather than
do any database queries. Then you can use pgmemcached to allow triggers to
invalidate cached objects whenever the underlying data changes. (Or you could
even just use pl/php to update or invalidate the cached object through the
same code library)--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friendThanks for the replies, but the problem is really centered around how
my script interacts with the database. I know what the problem is,
and if I was using a structured approach I could figure out
workarounds fairly quickly, but I chose to do this project in OOP for
a few reasons (the old CMS was hacked together in a hurry by an
employee who is now long gone, is horribly written and completely
unmaintainable, the size of the project seemed to warrant an OOP
approach, we recently upgraded to PHP 5 and I wanted to take advantage
of the new features, PDO and prepared statements appeared from the
documentation to offer major security and performance enhancements,
and I wanted to increase my experience with OOP development as most
work I've done before now has been structured) and the high level of
compartmentalization that OOP demands is coming back to haunt me
now.The problem comes when a document is published or unpublished. All
the documents that relate to that document need to be republished to
add or remove the link to the document in question. When the document
is published or unpublished the script gets related documents (at the
moment other documents that share the same folder, but this may change
in the future to cover more related content) and republishes them.Each document has associated data, such as the ID of the template to
apply, its parent, its path back to root, etc, that are generated by
querying the database in various ways. For example, the route back to
path is fetched by iterativly getting the parent folder, getting that
folder's parent, etc until the script hits the route. Templates are
fetched by looking ath the template ID associated with the document.
If this is 0, then the script goes to the parent folder and gets the
template associated with the folder. If that is 0 as well then it
iterativly goes up until it finds a template to apply or until it hits
the root, in which case it applies a default template. The code
fragment from the script that does this looks like this:$db -> beginTransaction ();
if ($doc = CmsItem::factory ('CmsDoc', intval ($_GET ['itm_id']),
$db, $user))
{
if ((($doc -> itemProps ['itm_publish']) && ($doc -> unpublish ()))
|| ($doc -> publish ()))
{
// Republish related documents
foreach ($doc -> getSiblingObjects () as $thisDoc)
{
if ((get_class ($thisDoc) == 'CmsDoc')
&& ($thisDoc -> itemProps ['itm_publish']))
{
$thisDoc -> republish ();
}
}
// Item status changed
$db -> commit ();
$_SESSION ['messageStack'][] = ($doc -> itemProps ['itm_publish']?
'Item published':
'Item unpublished');
}
else
{
// Couldn't change published status
$db -> rollback ();
$_SESSION ['messageStack'][] = ($doc -> itemProps ['itm_publish']?
'Unable to unpublish item':
'Unable to publish item');
}}
getSiblingObjects () runs a query that gets a list of IDs that share
the same parent as the current document. It then iterates the list,
spawns a new CMS item object for each entry, and returns them as an
array. As folders can be returned as well as documents, we only run
republish () on the document items.

CmsDoc -> publish () and CmsDoc -> unpublish () toggle a boolean
column in the database between true and false for the item being
(un)published. unpublish () also deletes the concrete file associated
with the DB entry.

publish () and republish () write out a concrete HTML file based on
the content stored in the table for the document in question and the
template that should be applied. The template is determined from a
template ID column. If it's 0 then the script walks up the tree until
it finds a template to use, as described above.

publish () and republish () rely on a method of CmsDoc called
generate (), which creates a Smarty instance, does the work described
above and generates a HTML string for publish () and republish () to
write out to disk.

public function generate ()
// Generate document HTML
{
    $paths = $this -> getTemplatePaths ();
    $page = new Template ($paths ['tplPath'], $paths ['cplPath']);
    $page -> assign_by_ref ('document', $this -> itemProps);
    $page -> assign ('siblings', $this -> getSiblings ());
    $page -> assign ('path', $this -> getTrail ());
    return ($page -> fetch ($paths ['tplFile']));
}

Template is a class that inherits from Smarty; it just does a little
setup housekeeping for me and allows me to specify both the template
and the compile dirs from arguments instead of just the template dir.
getTemplatePaths () returns where templates are stored and where
they'll be compiled. getSiblings () is like getSiblingObjects () but
doesn't create objects, it just returns a list of siblings.
getTrail () returns an array of folders from the root to the
document's containing folder.

The problem is that there is a vast number of redundant queries being
run. The documents all share the same parent (the list was obtained
in the first place with getSiblingObjects ()), but for every one that
gets republished the getSiblings () and getTrail () queries get run.
(As every document in a folder can have a different template, the
getTemplatePaths () query would have to be run for every one anyway.)

All the solutions I can think of for this problem would involve
breaking the principle that objects should not depend on a knowledge
of the inner workings of unrelated objects. The only thing these
objects are guaranteed to all have in common is that they all share
the same Database object between them (Database is my PDO-extending
class). This suggests to me that the best solution is to cache the
results of read queries either in the database engine itself, or in
the Database object. Results caching would eliminate the problem of
the same queries being run over and over, because only the first
invocation of a query would actually query the database. All
subsequent queries would hit the cache instead.

The problem is the mechanics of actually implementing such a cache.
I'm using prepared statements almost exclusively throughout the
design, meaning that the PDOStatement class probably needs to be
extended somehow, and my Database prepare () method should return my
extended class instead of a generic PDOStatement. The Database class
does caching of PDOStatements to save the overhead of each instance
of a class attempting to prepare the same statement over and over
again. If I can extend PDOStatement to cache its results then the
repeated query overhead will be more or less eliminated. The problem
is what happens when a database-modifying event occurs? The easiest
thing to do would be to invalidate the cache in all existing
PDOStatements, which, while it's not the most efficient solution,
would be the safest. But when one PDOStatement executes a modifying
query, how will all the other PDOStatements know?

This is why I was interested in caching at the database level, as it
would sidestep this problem.

I can't have been the first person to run up against this problem;
somebody somewhere must have implemented a framework for PDO that can
handle caching of results from prepared statements. I've just not
been able to find one. Please, if anyone out there knows something I
don't then please let me know.

PS: For those of you who replied via private mail instead of in the
group, thank you for the responses; I appreciate the help and advice.
But could you please reply in the group? Those messages get sent to
my home address, and I'm posting this from work, so I won't see them
until I get home. Besides, it'll be helpful for the next poor sod who
hits this problem if there's a publicly viewable thread that'll turn
up in Google :)
After a lot of hair-pulling, I finally found a mechanism in PHP for
doing what I wanted. I just had to know 2 things:
1) How to get the PDO engine to use my customized prepared statement
class instead of PDOStatement
2) That extending PDOStatement to transparently add results caching
was too difficult and complex in the timeframe required
Once I knew these things I made a PDOStatement extension class that,
instead of trying to transparently add caching to the existing
methods, added a couple of new ones instead. Code below:
<?php
class Statement extends PDOStatement
{
    private $resultCache = array ();
    private $database = NULL;
    public $hits = 0;
    public $misses = 0;

    public function ask (array $params = array ())
    // Executes a prepared statement on the database that fetches data
    {
        // serialize () rather than implode () so that distinct
        // parameter arrays can't produce the same cache key
        $hash = md5 (serialize ($params));
        if (!isset ($this -> resultCache [$hash]))
        {
            $this -> misses++;
            // Execute the query
            if ($this -> execute ($params))
            {
                // Cache the results
                $this -> resultCache [$hash] = $this -> fetchAll (PDO::FETCH_ASSOC);
            }
        }
        else
        {
            $this -> hits++;
        }
        return ($this -> resultCache [$hash]);
    }

    public function tell (array $params = array ())
    // Execute a prepared statement that causes the database to be modified
    {
        // Execute the query
        if ($this -> execute ($params))
        {
            $rowCount = $this -> rowCount ();
            if ($rowCount)
            {
                // Tell the parent Database object to clear statement caches
                $this -> database -> clearResults ();
            }
            return ($rowCount);
        }
        return (false);
    }

    public function clearResults ()
    // Clear cache
    {
        $this -> resultCache = array ();
    }

    private function __construct ($db)
    // Class constructor
    {
        $this -> database = $db;
    }
}
class Database extends PDO
// Adds some extra functionality to the built in PDO class
{
    private $statementCache = array ();
    private $txCount = 0;
    private $txErr = false;

    // Prepared statement caching
    public function prepare ($statement, array $options = array ())
    {
        $hash = md5 ($statement);
        if ((!isset ($this -> statementCache [$hash]))
            || (!is_object ($this -> statementCache [$hash])))
        {
            $this -> statementCache [$hash] = parent::prepare ($statement, $options);
        }
        return ($this -> statementCache [$hash]);
    }

    public function clearResults ()
    // Clear the results cache of all associated prepared statements
    {
        foreach ($this -> statementCache as $thisStatement)
        {
            $thisStatement -> clearResults ();
        }
    }
    // Add support for transaction nesting
    public function beginTransaction ()
    {
        $result = true;
        if (($this -> txCount == 0) && (!$this -> txErr))
        {
            $result = parent::beginTransaction ();
        }
        $this -> txCount++;
        if (DEBUG_TX)
        {
            echo ('begin: ' . $this -> txCount . ' transaction(s)<br />');
        }
        return ($result);
    }

    public function commit ()
    {
        $result = true;
        $this -> txCount--;
        if ($this -> txCount <= 0)
        {
            // Roll back instead of committing if an inner transaction failed
            $result = $this -> txErr? parent::rollback (): parent::commit ();
            $this -> txErr = false;
        }
        if (DEBUG_TX)
        {
            echo ('commit: ' . $this -> txCount . ' transaction(s)<br />');
        }
        return ($result);
    }

    public function rollback ()
    {
        $result = true;
        $this -> txErr = true;
        $this -> txCount--;
        if ($this -> txCount <= 0)
        {
            $result = parent::rollback ();
            $this -> txErr = false;
        }
        if (DEBUG_TX)
        {
            echo ('rollback: ' . $this -> txCount . ' transaction(s)<br />');
        }
        return ($result);
    }
    // Housekeeping
    private function removeExpiredLocks ()
    {
        $query = 'DELETE FROM cms_locks
                  WHERE lck_timestamp + lck_duration < NOW();';
        $preparedQuery = $this -> prepare ($query);
        if ($preparedQuery -> execute ())
        {
            return ($preparedQuery -> rowCount ());
        }
    }

    // Class constructor
    public function __construct ()
    {
        parent::__construct ('pgsql:host=' . CFG_DB_HOST
                            . ' port=' . CFG_DB_PORT
                            . ' dbname=' . CFG_DB_DBNAME,
                            CFG_DB_USERNAME,
                            CFG_DB_PASSWORD);
        /*
        The documentation for the following line on php.net is really bad!
        What it does is set which class will be used to handle prepared
        statements. By default the PDO -> prepare () method returns a
        prepared statement as a PDOStatement object. We want to extend
        PDOStatement to provide some extra functionality, so when we run
        the prepare () method we want it to return something other than a
        PDOStatement. This line tells the database to use our Statement
        class for prepared statements instead of the PDOStatement class.
        It also passes a reference to the database object that spawned it
        to the constructor as an argument. We'll use this reference in
        the Statement class to send messages back to the Database class.
        */
        $this -> setAttribute (PDO::ATTR_STATEMENT_CLASS, array ('Statement', array ($this)));
        $this -> setAttribute (PDO::ATTR_ERRMODE, PDO::ERRMODE_WARNING);
        $this -> query ('SET search_path = ' . CFG_DB_PATH . ';');
        $this -> removeExpiredLocks ();
    }
}
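The transaction-nesting counter can be exercised on its own. Below is a stripped-down sketch of just that part (the class name is made up for the demo, and it runs against in-memory SQLite instead of the CMS's Postgres connection):

```php
<?php
// Standalone sketch of the transaction-nesting counter: only the
// outermost beginTransaction ()/commit () pair touches the database.
class NestingPDO extends PDO
{
    private $txCount = 0;

    public function beginTransaction ()
    {
        if ($this -> txCount == 0)
        {
            parent::beginTransaction (); // only the outermost begin is real
        }
        $this -> txCount++;
        return true;
    }

    public function commit ()
    {
        $this -> txCount--;
        if ($this -> txCount <= 0)
        {
            return parent::commit ();    // only the outermost commit is real
        }
        return true;
    }
}

$db = new NestingPDO ('sqlite::memory:');
$db -> exec ('CREATE TABLE t (id INTEGER)');
$db -> beginTransaction ();
$db -> beginTransaction ();              // nested: no real BEGIN issued
$db -> exec ('INSERT INTO t VALUES (1)');
$db -> commit ();                        // inner commit: just decrements
$db -> commit ();                        // outermost: actually commits
$row = $db -> query ('SELECT COUNT(*) AS n FROM t') -> fetch (PDO::FETCH_ASSOC);
echo $row ['n']; // 1
```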
The system works like this: I use my Database class where I would
have used PDO to create a database connection. When I prepare() a
statement I either get a new prepared statement back, or the
previously created one is returned. The prepared statement contains
an array that caches the results produced with various parameters. My
prepared statement class also adds 2 database querying methods, ask ()
and tell (), that I use where I normally would have used an execute
(). If I want to retrieve information from the database I ask () it
to return the data matching my parameters. Whenever I want to make a
change I tell () it the data I want saved to the database. The ask ()
method checks the statement's local cache of results and if it finds
one matching the passed parameters, it returns it. If not, then it
execute()s the query to retrieve them.
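Since ATTR_STATEMENT_CLASS is the poorly documented piece, here is a self-contained cut-down version of the same ask () mechanism that can be run against an in-memory SQLite database (the class and table names are invented for the demo; this is a sketch of the technique, not the CMS code itself):

```php
<?php
// Minimal sketch of the ask () caching mechanism via ATTR_STATEMENT_CLASS,
// runnable wherever PDO's SQLite driver is available.
class CachingStatement extends PDOStatement
{
    private $resultCache = array ();
    public $hits = 0;

    protected function __construct ()
    {
        // PDO instantiates this class itself; a non-public constructor
        // stops anything else from doing so
    }

    public function ask (array $params = array ())
    {
        $hash = md5 (serialize ($params));
        if (!isset ($this -> resultCache [$hash]))
        {
            $this -> execute ($params);    // cache miss: hit the database
            $this -> resultCache [$hash] = $this -> fetchAll (PDO::FETCH_ASSOC);
        }
        else
        {
            $this -> hits++;               // cache hit: no query executed
        }
        return ($this -> resultCache [$hash]);
    }
}

$db = new PDO ('sqlite::memory:');
$db -> setAttribute (PDO::ATTR_STATEMENT_CLASS, array ('CachingStatement'));
$db -> exec ('CREATE TABLE demo (id INTEGER)');
$db -> exec ('INSERT INTO demo VALUES (1)');
$db -> exec ('INSERT INTO demo VALUES (2)');

$stmt = $db -> prepare ('SELECT id FROM demo WHERE id = ?');
$rows = $stmt -> ask (array (1));  // miss: queries the database
$rows = $stmt -> ask (array (1));  // hit: served from the statement's cache
echo count ($rows) . ' ' . $stmt -> hits; // 1 1
```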
When I tell () the database to make a change the statement that
received the message to do so sends a message to its parent Database
object. The object goes through its list of cached prepared
statements and sends them messages telling them to dump the contents
of their caches.
It's not an ideal solution: if two different queries produce identical
result sets, or ones that overlap in some way, then they get cached
twice, and the cache invalidation mechanism is extremely primitive (if
any change is made, dump everything). Nonetheless this little caching
system has had quite a dramatic effect on performance. It's still not
as fast as I would like though. :)
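One concrete improvement along those lines: building the cache key by imploding the parameters lets two distinct parameter arrays collide on the same key, whereas serialize () keeps them apart. A quick illustration:

```php
<?php
// Two different parameter arrays that implode () cannot tell apart,
// but serialize () can distinguish.
$a = array ('x,y');
$b = array ('x', 'y');

echo (implode (',', $a) === implode (',', $b))? 'implode collides': 'implode distinct';
echo "\n";
echo (serialize ($a) === serialize ($b))? 'serialize collides': 'serialize distinct';
```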
I reposted the code here in the hope that somebody finds it useful,
and/or in case somebody has suggestions for improvement. While the
code itself was quite simple in the end, finding the information
needed to make it work wasn't. The ATTR_STATEMENT_CLASS parameter is
not well documented. Hopefully others can learn from my difficulties
here.
Gordon,
Thought you might be looking to cache the completed pages (as they
are requested). I would have thought something like APC would fit the bill:
$key = $_SERVER['HTTP_REFERER'];
if ($cached_html = apc_fetch($key))
{
    echo $cached_html;
    exit;
}
<rest of your code>
...when you output a page down here you store it like:
echo $html; // your output
apc_store($key, $html, 300); // 300 secs = 5 mins
exit;
This way you never even touch a query unless the time limit is up. If
a page is changed, all you need to do is:
apc_delete($key);
...which will automatically re-cache the new page (due to the logic
flow) on the next request.
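The same serve-or-generate flow can be sketched with the APC calls stubbed out by a plain array, just to show the control flow (swap the array accesses for apc_fetch () and apc_store ($key, $html, 300) in real code; the function name here is invented):

```php
<?php
// Serve-or-generate flow with the APC calls stubbed by an array
// so the logic runs without the extension.
$generated = 0;

function get_page ($key, array &$store, &$generated)
{
    if (isset ($store [$key]))
    {
        return $store [$key];    // cache hit: no queries, no rendering
    }
    $generated++;
    $html = '<html>page for ' . $key . '</html>'; // expensive build goes here
    $store [$key] = $html;       // apc_store ($key, $html, 300) in real code
    return $html;
}

$store = array ();
get_page ('/articles/1', $store, $generated);
get_page ('/articles/1', $store, $generated); // second request served from cache
echo $generated; // 1
```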
--
Norman
Registered Linux user #461062