Extracting SQL from logs in a usable format
Hi all,
I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system. Does such a thing already exist or should I start writing my
own log parser?
Thank you for your help.
Chris Ernst
eSoft, Inc.
On 18/12/09, Chris Ernst (cernst@esoft.com) wrote:
I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system. Does such a thing already exist or should I start writing my
own log parser?
I'm sure there are a number of log replay systems already in existence.
Perhaps you could use Londiste, and introduce a lag to the replication
process if it is required?
http://skytools.projects.postgresql.org/doc/londiste.ref.html#toc17
A very simple way of doing this is to log all of the SQL statements by
setting the postgresql.conf parameter log_min_duration_statement to 0.
You can then easily parse out the SQL statements from the log file.
I have done this before quite successfully when wishing to replay a long
set of SQL statements to test an upgrade of a PostgreSQL server.
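To illustrate, here is a minimal sketch of that parsing step (my own illustration, not a tested tool), assuming stderr-format logs, log_min_duration_statement = 0, and the default tab-indented continuation lines for multi-line statements; adjust the regex to your log_line_prefix:

```python
import re

# Matches the default "LOG:  duration: N ms  statement: ..." lines that
# log_min_duration_statement = 0 produces; anything before "LOG:" (the
# log_line_prefix, e.g. a timestamp) is skipped by re.search.
STMT_RE = re.compile(r"LOG:\s+duration: [\d.]+ ms\s+statement: (.*)")

def extract_statements(lines):
    statements = []
    for line in lines:
        m = STMT_RE.search(line)
        if m:
            statements.append(m.group(1))
        elif statements and line.startswith(("\t", " ")):
            # multi-line statements continue on indented lines
            statements[-1] += "\n" + line.strip()
    return statements

log = [
    "2009-12-18 14:08:07 MST LOG:  duration: 0.135 ms  statement: SELECT 1;",
    "2009-12-18 14:08:08 MST LOG:  duration: 1.2 ms  statement: SELECT *",
    "\tFROM users;",
]
print(extract_statements(log))  # ['SELECT 1;', 'SELECT *\nFROM users;']
```

Of course, as noted downthread, prepared statements and concurrency make the real job considerably harder than this.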
Rory
--
Rory Campbell-Lange
Director
rory@campbell-lange.net
Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928
Rory Campbell-Lange wrote:
On 18/12/09, Chris Ernst (cernst@esoft.com) wrote:
I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system. Does such a thing already exist or should I start writing my
own log parser?
I'm sure there are a number of log replay systems already in existence.
Perhaps you could use Londiste, and introduce a lag to the replication
process if it is required?
http://skytools.projects.postgresql.org/doc/londiste.ref.html#toc17
A very simple way of doing this is to log all of the SQL statements by
setting the postgresql.conf parameter log_min_duration_statement to 0.
You can then easily parse out the SQL statements from the log file.
I have done this before quite successfully when wishing to replay a long
set of SQL statements to test an upgrade of a PostgreSQL server.
Hi Rory,
Thank you for the quick reply.
Londiste isn't really an option as it (apparently) would only get
INSERT, UPDATE and DELETE queries. I would want to capture every query
that is run, including SELECTs. Plus, the production master is already
running slony1, and I don't think they will play nice together.
My goal is to be able to replay a set of actual production traffic on
the staging server, starting from a snapshot at the point where the
statement logging began. Then make some changes (tweak settings,
upgrade versions, make DDL changes, etc.) and rerun the same set of
statements to analyze the results with pgFouine.
I started writing my own log parser to pull the statements from the
postgres logs, but as I get into the details, it's not quite as
straightforward as I had thought. Keeping track of which client
connections have prepared queries, merging the correct arguments and
maintaining concurrency are all critical and far from trivial.
Basically, I'm curious whether anyone has already created something that
does this, or whether I'm treading into uncharted waters. I've been googling around
a bit and haven't come up with anything yet.
Thanks again,
- Chris
Chris Ernst wrote:
I started writing my own log parser to pull the statements from the
postgres logs, but as I get into the details, it's not quite as
straightforward as I had thought. Keeping track of which client
connections have prepared queries, merging the correct arguments and
maintaining concurrency are all critical and far from trivial.
I hope you're using the CSV format logs, which should make the job a lot
easier than the standard text one.
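To sketch why (not from the original message; the column positions are an assumption based on the 8.4 csvlog layout, where error_severity is the 12th field and message the 14th): each csvlog entry is one well-formed CSV record, so a stock CSV parser handles the quoting and embedded newlines for you:

```python
import csv
import io

# With log_destination = 'csvlog', every entry is a proper CSV record, so
# the csv module copes with quoted fields and embedded newlines for us.
# Column positions assume the 8.4 csvlog layout: error_severity is the
# 12th field (index 11) and message the 14th (index 13).
def statements_from_csvlog(fileobj):
    statements = []
    for row in csv.reader(fileobj):
        severity, message = row[11], row[13]
        if severity == "LOG" and message.startswith("duration:"):
            # e.g. "duration: 0.135 ms  statement: SELECT ..."
            _, _, stmt = message.partition("statement: ")
            if stmt:
                statements.append(stmt)
    return statements

sample = io.StringIO(
    '2009-12-18 14:08:07 MST,chris,prod,1234,"[local]",abc,1,SELECT,'
    '2009-12-18 14:00:00 MST,1/2,0,LOG,00000,'
    '"duration: 0.135 ms  statement: SELECT * FROM users;",,,,,,,,\n'
)
print(statements_from_csvlog(sample))  # ['SELECT * FROM users;']
```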
--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com
Tsung [http://tsung.erlang-projects.org/] has a nice pg proxy module. It
records your "query stream" and is able to replay it later.
2009/12/18 Chris Ernst <cernst@esoft.com>
Hi all,
I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system. Does such a thing already exist or should I start writing my
own log parser?
Thank you for your help.
Chris Ernst
eSoft, Inc.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
Filip Rembiałkowski
JID: filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/
Greg Smith wrote:
Chris Ernst wrote:
I started writing my own log parser to pull the statements from the
postgres logs, but as I get into the details, it's not quite as
straightforward as I had thought. Keeping track of which client
connections have prepared queries, merging the correct arguments and
maintaining concurrency are all critical and far from trivial.
I hope you're using the CSV format logs, which should make the job a lot
easier than the standard text one.
HOLY COW! I hadn't even thought of that. Excellent point. Thank you,
Greg.
- Chris
Hmm.. That does look very interesting. The only thing that concerns me
is where it says it supports "Basic Queries (Extended queries not yet
supported)". I'm not sure what is meant by "Extended queries". Any idea?
Thank you for the pointer, Filip. I'll check it out.
- Chris
Filip Rembiałkowski wrote:
Tsung [http://tsung.erlang-projects.org/] has a nice pg proxy module. It
records your "query stream" and is able to replay it later.
2009/12/18 Chris Ernst <cernst@esoft.com>
Hi all,
I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system. Does such a thing already exist or should I start writing my
own log parser?
Thank you for your help.
Chris Ernst
eSoft, Inc.
Chris Ernst wrote:
I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system. Does such a thing already exist or should I start writing my
own log parser?
I am currently developing such a beast, it is currently still quite alpha.
If you are interested I can send you a copy.
I'll try to publish it once it is sufficiently beta.
Yours,
Laurenz Albe
On Sun, Dec 20, 2009 at 9:37 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Chris Ernst wrote:
I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system. Does such a thing already exist or should I start writing my
own log parser?
I am not sure if it's still available, but there used to be a nice tool for doing
the same; I guess it was named "Playr", by myYearbook.
Regards,
--
Shoaib Mir
http://shoaibmir.wordpress.com/
Hey guys,
Albe Laurenz wrote:
Chris Ernst wrote:
I have a project where I need to be able to capture every query
from a production system into a file such that I can "replay"
them on a staging system. Does such a thing already exist or
should I start writing my own log parser?
I have the same problem. Our goal would be to capture a representative
period from an 8.1 system and replay it against an 8.4 one to find out
every possible issue.
I am currently developing such a beast, it is currently still quite
alpha. If you are interested I can send you a copy. I'll try to
publish it once it is sufficiently beta.
Interesting project, but I have one big issue under 8.1 with
advanced queries (prepare / bind / execute): we cannot extract the values of
parameters from the logs with < 8.2. So I am not able to parse /
rebuild queries from logs under 8.1.
I started something as well, based on tcpdump/tshark output (tshark
-VT text ...). My project is at a pre-alpha stage, but at least I can
extract both simple queries and advanced queries with parameters. The only
known limitations with this approach are:
- it cannot extract from SSL connections
- ISTM tshark only supports the PostgreSQL V3 protocol, so it only works with 7.4 and later, IIRC
--
Jehan-Guillaume (ioguix) de Rorthais
www.dalibo.com
Shoaib Mir wrote:
On Sun, Dec 20, 2009 at 9:37 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Chris Ernst wrote:
I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system. Does such a thing already exist or should I start writing my
own log parser?
I am not sure if it's still available, but there used to be a nice tool
for doing the same; I guess it was named "Playr", by myYearbook.
AFAIK, it's a dead project and its home at myyearbook is unavailable.
However, you can still access their download page:
https://area51.myyearbook.com/downloads/
I did find the svn repo with some googling some time ago...
I did some quick investigation on it; it seems it doesn't work with
extended queries either.
--
Jehan-Guillaume (ioguix) de Rorthais
www.dalibo.com
On Fri, Dec 18, 2009 at 02:08:07PM -0700, Chris Ernst wrote:
Hi all,
I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system. Does such a thing already exist or should I start writing my
own log parser?
log to CSV format.
But:
1. not all parts of the query will always be logged in the query itself
(prepared statements)
2. replaying queries on a 2nd machine doesn't guarantee that you will get
the same data afterwards.
depesz
--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
On 20/12/2009 7:59 PM, hubert depesz lubaczewski wrote:
On Fri, Dec 18, 2009 at 02:08:07PM -0700, Chris Ernst wrote:
Hi all,
I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system. Does such a thing already exist or should I start writing my
own log parser?
log to CSV format.
But:
1. not all parts of the query will always be logged in the query itself
(prepared statements)
2. replaying queries on a 2nd machine doesn't guarantee that you will get
the same data afterwards.
... because of global settings (DATESTYLE etc) that may affect
interpretation of the data, and because the log order of statements
can't accurately represent concurrent execution.
With the same server settings, the same starting values for sequences
etc, no time-based function use, no non-deterministic function use (eg:
random()) and no non-deterministic interactions between concurrent
transactions, you should be able to get data that's the same when
examined at the SQL level. It might not be in the same order, though,
and it certainly won't be the same on-disk.
So ... why do you need this replay? What sorts of limitations can you
live with?
It sounds like concurrency is a concern, and that's one that will give
you pain, because the Pg logs don't record statement start _and_ end
time, nor do they record at what points along the execution timeline the
backend got a chance to do work. So it's hard to know about lock
acquisition order, among other things.
--
Craig Ringer
JGuillaume (ioguix) de Rorthais wrote:
I am currently developing such a beast, it is currently still quite
alpha. If you are interested I can send you a copy. I'll try to
publish it once it is sufficiently beta.
Interesting project, but I have one big issue under 8.1 with
advanced queries (prepare / bind / execute): we cannot extract the values of
parameters from the logs with < 8.2. So I am not able to parse /
rebuild queries from logs under 8.1.
Hmm, that doesn't bother me. 8.1 is pretty old now, and who knows when
my program will be stable :^)
I started something as well, based on tcpdump/tshark output (tshark
-VT text ...). My project is at a pre-alpha stage, but at least I can
extract both simple queries and advanced queries with parameters. The only
known limitations with this approach are:
- it cannot extract from SSL connections
- ISTM tshark only supports the PostgreSQL V3 protocol, so it only works with 7.4 and later, IIRC
I guess each approach has some limitations.
The limitations I encountered for log parsing:
- COPY data are not logged.
- Fast Path API calls are not logged (that includes large object functions).
- Unless you have log_min_messages at DEBUG2 or better, you cannot
determine when exactly a prepared statement was parsed.
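For the >= 8.2 case, where parameter values do make it into the logs, here is a naive sketch (my own illustration, not a tested tool) of stitching an "execute" line back together with its "DETAIL: parameters:" line; the quoting of the substituted values is simplistic and only good enough for a demo:

```python
import re

# On >= 8.2, executing a prepared statement logs something like
#   LOG:  duration: 0.5 ms  execute S_1: SELECT ... WHERE id = $1
#   DETAIL:  parameters: $1 = '42'
# This naively substitutes the logged values back into the query text.
EXECUTE_RE = re.compile(r"LOG:\s+(?:duration: [\d.]+ ms\s+)?execute [^:]+: (.*)")
PARAMS_RE = re.compile(r"DETAIL:\s+parameters: (.*)")

def merge_params(log_line, detail_line):
    query = EXECUTE_RE.search(log_line).group(1)
    params = dict(re.findall(r"(\$\d+) = ('[^']*'|NULL)",
                             PARAMS_RE.search(detail_line).group(1)))
    # substitute longer placeholders first so $10 is not clobbered by $1
    for name in sorted(params, key=len, reverse=True):
        query = query.replace(name, params[name])
    return query

print(merge_params(
    "LOG:  duration: 0.5 ms  execute S_1: SELECT * FROM users WHERE id = $1",
    "DETAIL:  parameters: $1 = '42'",
))  # SELECT * FROM users WHERE id = '42'
```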
Yours,
Laurenz Albe
On 19 Dec 2009, at 16:20, Chris Ernst wrote:
Hmm.. That does look very interesting. The only thing that concerns me
is where it says it supports "Basic Queries (Extended queries not yet
supported)". I'm not sure what is meant by "Extended queries". Any idea?
I think it refers to the Extended Query support in the frontend / backend protocol, as in the documentation:
http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY
The extended query protocol breaks down the above-described simple query protocol into multiple steps. The results of preparatory steps can be re-used multiple times for improved efficiency. Furthermore, additional features are available, such as the possibility of supplying data values as separate parameters instead of having to insert them directly into a query string.
So that's for parse/bind/execute communications, which are used in prepare/execute and queryParam I think.
--
dim
Dimitri Fontaine wrote:
On 19 Dec 2009, at 16:20, Chris Ernst wrote:
Hmm.. That does look very interesting. The only thing that concerns me
is where it says it supports "Basic Queries (Extended queries not yet
supported)". I'm not sure what is meant by "Extended queries". Any idea?
I think it refers to the Extended Query support in the frontend / backend protocol, as in the documentation:
http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY
The extended query protocol breaks down the above-described simple query protocol into multiple steps. The results of preparatory steps can be re-used multiple times for improved efficiency. Furthermore, additional features are available, such as the possibility of supplying data values as separate parameters instead of having to insert them directly into a query string.
So that's for parse/bind/execute communications, which are used in prepare/execute and queryParam I think.
Ouch! You're right. And that would be a deal killer for me. About
90% of the traffic is prepared queries that are run over and over with
different parameters.
- Chris
On 21 Dec 2009, at 15:24, Chris Ernst wrote:
Ouch! You're right. And that's would be a deal killer for me. About
90% of the traffic is prepared queries that are run over and over with
different parameters.
The driver project and code are now there it seems:
http://frihjul.net/pgsql
http://github.com/noss/pgsql/tree
Maybe you could ask the author about supporting the extended protocol; a quick browse tonight shows me prepare/execute support. I'm not sure if improvements in the pgsql driver would translate to improvements in Tsung's support of it, but it should be about that.
Regards,
--
dim