intercepting WAL writes

Started by Mikeover 17 years ago13 messages
#1Mike
mike@fonolo.com

Hello,

I'm new to the core PostgreSQL code, so pardon the question if the answer is
really obvious, and I'm just missing it, but I've got a relatively large web
application that uses PostgreSQL as a back-end database, and we're heavily
using memcached to cache frequently accessed data.

I'm looking at modifying PostgreSQL (in some way) to push changes directly
to our memcache servers, in hopes of moving towards a system where only
writes are actually sent to the databases, and reads are exclusively sent to
the memcache servers.

I'm guessing that I could intercept the WAL writes, and use this information
to push out to my memcache servers, similar to a replication model.

Can somebody point to the most logical place in the code to intercept the
WAL writes? (just a rough direction would be enough)- or if this doesn't
make sense at all, another suggestion on where to get the data? (I'm trying
to avoid doing it using triggers).

Thanks,

Mike

#2Jonah H. Harris
jonah.harris@gmail.com
In reply to: Mike (#1)
Re: intercepting WAL writes

On Wed, May 28, 2008 at 7:11 PM, Mike <mike@fonolo.com> wrote:

Can somebody point to the most logical place in the code to intercept the
WAL writes? (just a rough direction would be enough)

XLogInsert

or if this doesn't make sense at all, another suggestion on where to get
the data? (I'm trying to avoid doing it using triggers).

Without triggers, you don't have many options. With triggers, you
could use pg_memcache. If you take it from the WAL, you'll have to do
a bit of decoding to make it usable in the context you're looking for,
which is quite a bit of work.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/

#3Jeff Davis
pgsql@j-davis.com
In reply to: Mike (#1)
Re: intercepting WAL writes

On Wed, 2008-05-28 at 19:11 -0400, Mike wrote:

Can somebody point to the most logical place in the code to intercept
the WAL writes? (just a rough direction would be enough)- or if this
doesn’t make sense at all, another suggestion on where to get the
data? (I’m trying to avoid doing it using triggers).

Why are you avoiding triggers? One solution might be to use Slony to
just create the log, and then read the log of events into memcached
rather than another PostgreSQL instance.

http://slony.info/documentation/logshipping.html

Those logs might be easier to process than the WAL.

Also, why do you need to intercept the WAL writes, and not just read
from a WAL archive? Does this need to be synchronous?

Regards,
Jeff Davis

#4Mike
mike@fonolo.com
In reply to: Jonah H. Harris (#2)
Re: intercepting WAL writes

On Wed, May 28, 2008 at 7:11 PM, Mike <mike@fonolo.com> wrote:

Can somebody point to the most logical place in the code to intercept the
WAL writes? (just a rough direction would be enough)

XLogInsert

Great- I'll take a look at that code.

or if this doesn't make sense at all, another suggestion on where to get
the data? (I'm trying to avoid doing it using triggers).

Without triggers, you don't have many options. With triggers, you
could use pg_memcache. If you take it from the WAL, you'll have to do
a bit of decoding to make it usable in the context you're looking for,
which is quite a bit of work.

When you say a bit of decoding, is that because the data written to the logs
is after the query parser/planner? Or because it's written in several
chunks? Or?

I turned on WAL_DEBUG; it wasn't immediately helpful, tho I don't know what
it means quite yet. I guess perfectly formatted SQL statements would be too
easy ;)

Mike

#5Jonah H. Harris
jonah.harris@gmail.com
In reply to: Mike (#4)
Re: intercepting WAL writes

On Wed, May 28, 2008 at 8:30 PM, Mike <mike@fonolo.com> wrote:

When you say a bit of decoding, is that because the data written to the logs
is after the query parser/planner? Or because it's written in several
chunks? Or?

Because that's the actual recovery record. There is no SQL text, just
the WAL record type (XLOG_HEAP_INSERT, XLOG_HEAP_UPDATE,
XLOG_XACT_COMMIT, ...) and the data as it relates to that operation.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/

#6Mike
mike@fonolo.com
In reply to: Jonah H. Harris (#5)
Re: intercepting WAL writes

On Wed, May 28, 2008 at 8:30 PM, Mike <mike@fonolo.com> wrote:

When you say a bit of decoding, is that because the data written to the

logs

is after the query parser/planner? Or because it's written in several
chunks? Or?

Because that's the actual recovery record. There is no SQL text, just
the WAL record type (XLOG_HEAP_INSERT, XLOG_HEAP_UPDATE,
XLOG_XACT_COMMIT, ...) and the data as it relates to that operation.

Oh- right- that makes sense.

I installed and started looking at the source code for xlogviewer and
xlogdump; seems like a reasonable place to start.

Thanks for your help,

Mike

#7Koichi Suzuki
koichi.szk@gmail.com
In reply to: Mike (#6)
Re: intercepting WAL writes

And you will have a chance to encounter full page writes, whole page
image, which could be produced during the hot backup and the first
modification to the data page after a checkpoint (if you turn full
page write option "on" by GUC).

2008/5/29 Mike <mike@fonolo.com>:

On Wed, May 28, 2008 at 8:30 PM, Mike <mike@fonolo.com> wrote:

When you say a bit of decoding, is that because the data written to the

logs

is after the query parser/planner? Or because it's written in several
chunks? Or?

Because that's the actual recovery record. There is no SQL text, just
the WAL record type (XLOG_HEAP_INSERT, XLOG_HEAP_UPDATE,
XLOG_XACT_COMMIT, ...) and the data as it relates to that operation.

Oh- right- that makes sense.

I installed and started looking at the source code for xlogviewer and
xlogdump; seems like a reasonable place to start.

Thanks for your help,

Mike

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
------
Koichi Suzuki

#8Hannu Krosing
hannu@krosing.net
In reply to: Mike (#1)
Re: intercepting WAL writes

On Wed, 2008-05-28 at 19:11 -0400, Mike wrote:

Can somebody point to the most logical place in the code to intercept
the WAL writes? (just a rough direction would be enough)- or if this
doesn’t make sense at all, another suggestion on where to get the
data?

I don't think that intercepting (and then decoding ) WAL is very
productive. It is too low level to be of much help.

The way I'd do it would be using pgQ from SkyTools package where change
events can be queued when happening and then moved in bulk to memcached
with not too much effort.

Marko Kreen, the primary author of pgQ did a presentation on it at this
years pgcon :

http://www.pgcon.org/2008/schedule/events/79.en.html

You can download the presentation slides to get the overview.



(I’m trying to avoid doing it using triggers).

I can't see a good point in avoiding triggers, if you end up doing a lot
more work to reconstruct the data which would have been easy to get
using a trigger.

I've heard that there was and add-on replication solution for Oracle
which extracted data from WAL and I was told that it maxed out at about
200 tps. pgQ approach can do much better.

---------------
Hannu

#9Simon Riggs
simon@2ndquadrant.com
In reply to: Mike (#6)
Re: intercepting WAL writes

On Wed, 2008-05-28 at 21:47 -0400, Mike wrote:

On Wed, May 28, 2008 at 8:30 PM, Mike <mike@fonolo.com> wrote:

When you say a bit of decoding, is that because the data written to the

logs

is after the query parser/planner? Or because it's written in several
chunks? Or?

Because that's the actual recovery record. There is no SQL text, just
the WAL record type (XLOG_HEAP_INSERT, XLOG_HEAP_UPDATE,
XLOG_XACT_COMMIT, ...) and the data as it relates to that operation.

Oh- right- that makes sense.

Hmm, yes and no.

WAL doesn't carry enough information to reconstruct updates or deletes
external to the database in question. That's the barrier you need to
cross, not just piping the results somewhere.

I envision being able to do this in the future and exposing an API to
allow it to happen, but we aren't there yet.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#10Mike
mike@fonolo.com
In reply to: Jonah H. Harris (#5)
Re: intercepting WAL writes

On Wed, May 28, 2008 at 8:30 PM, Mike <mike@fonolo.com> wrote:

When you say a bit of decoding, is that because the data written to the

logs

is after the query parser/planner? Or because it's written in several
chunks? Or?

Because that's the actual recovery record. There is no SQL text, just
the WAL record type (XLOG_HEAP_INSERT, XLOG_HEAP_UPDATE,
XLOG_XACT_COMMIT, ...) and the data as it relates to that operation.

Hmm... after looking at the xlogdump and xlogview applications, I can see
the difficulty pulling this information you were talking about.

I see that I can get the OID's for the relations, and then lookup their
names from the source database; but pulling where clause and field info
seems to be a bit tougher.

Is there another place in the code, I can get access to the statements (or
statement "like" information), after a transaction commit?

Thanks for all your help,

Mike

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mike (#10)
Re: intercepting WAL writes

"Mike" <mike@fonolo.com> writes:

Is there another place in the code, I can get access to the statements (or
statement "like" information), after a transaction commit?

No.

Bear in mind that what you have decided to do amounts to rolling your
own replication system. This is a Hard Problem. I would suggest
thinking in terms of adapting one of the existing replication projects
rather than doing it from scratch.

regards, tom lane

#12Mario Weilguni
mweilguni@sime.com
In reply to: Mike (#1)
Re: intercepting WAL writes

Mike schrieb:

Hello,

I�m new to the core PostgreSQL code, so pardon the question if the
answer is really obvious, and I�m just missing it, but I�ve got a
relatively large web application that uses PostgreSQL as a back-end
database, and we�re heavily using memcached to cache frequently
accessed data.

I�m looking at modifying PostgreSQL (in some way) to push changes
directly to our memcache servers, in hopes of moving towards a system
where only writes are actually sent to the databases, and reads are
exclusively sent to the memcache servers.

I�m guessing that I could intercept the WAL writes, and use this
information to push out to my memcache servers, similar to a
replication model.

Can somebody point to the most logical place in the code to intercept
the WAL writes? (just a rough direction would be enough)- or if this
doesn�t make sense at all, another suggestion on where to get the
data? (I�m trying to avoid doing it using triggers).

Thanks,

Mike

Why not use rules? They are far more easy to use than patching at C
Level, and you can simply write some functions at C level and load those
as a postgres extension, and interact with MemCache at this level.

#13Decibel!
decibel@decibel.org
In reply to: Hannu Krosing (#8)
1 attachment(s)
Re: intercepting WAL writes

On May 29, 2008, at 1:57 AM, Hannu Krosing wrote:

On Wed, 2008-05-28 at 19:11 -0400, Mike wrote:

Can somebody point to the most logical place in the code to intercept
the WAL writes? (just a rough direction would be enough)- or if this
doesn’t make sense at all, another suggestion on where to get the
data?

I don't think that intercepting (and then decoding ) WAL is very
productive. It is too low level to be of much help.

The way I'd do it would be using pgQ from SkyTools package where
change
events can be queued when happening and then moved in bulk to
memcached
with not too much effort.

Actually, you might look one step further and see about adding
memcached as a subscriber type to londiste; it might be easier than
just using PgQ... not that using PgQ would be all that hard.

Also, keep in mind that no matter what you do you'll always have a
race condition between data in the database and in memcached.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Attachments:

smime.p7sapplication/pkcs7-signature; name=smime.p7sDownload