contrib/pg_stat_statements 1202
Here is an updated version of contrib/pg_stat_statements.
New modifications from the last version are:
1. New counters in struct Instrumentation.
2. EXPLAIN ANALYZE VERBOSE shows buffer statistics in 'actual' section.
3. Buffer statistics counters are not reset to zero anymore.
----
1. New counters in struct Instrumentation.
[in include/executor/instrument.h, backend/executor/instrument.c]
The following fields are added. They are used by pg_stat_statements
and EXPLAIN ANALYZE VERBOSE. getrusage() is called for each node.
Write-counters are not included because I think they are not so useful.
buffer_gets; /* # of buffer gets */
buffer_hits; /* # of buffer hits */
buffile_reads; /* # of buffile reads */
utime; /* user time in sec */
stime; /* sys time in sec */
----
2. EXPLAIN ANALYZE VERBOSE shows buffer statistics in 'actual' section.
[in backend/commands/explain.c]
I borrowed the idea from Vladimir,
"Buffer pool statistics in Explain Analyze".
http://archives.postgresql.org/message-id/1d709ecc0810111624i7419b179v1789b2ca681987c0@mail.gmail.com
Here is a sample output. We'd better add a line break before
the 'actual' section because the line is too wide to display.
=# EXPLAIN ANALYZE VERBOSE SELECT * FROM accounts;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on accounts (cost=0.00..2688.29 rows=101829 width=97) (actual time=0.072..119.633 rows=100000 loops=1 gets=1670 reads=1638 local_reads=0 CPU=0.06/0.03 sec)
Output: aid, bid, abalance, filler
Total runtime: 209.556 ms
(3 rows)
----
3. Buffer statistics counters are not reset to zero anymore.
[in storage/buffer/bufmgr.c]
ResetBufferUsage() now saves the current counters in global variables as a
baseline, and buffer statistics are measured as the difference from that
baseline, because the counters are also used by struct Instrumentation.
----
Comments welcome.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
2. EXPLAIN ANALYZE VERBOSE shows buffer statistics in 'actual' section.
I do not get the point of "VERBOSE".
As far as I understand, explain analyze (without verbose) will anyway add
overhead for calculation of "gets/hits/cpu". Why discard that information in
"non verbose" mode? Just to make the investigation more complex?
Write-counters are not included because I think they are not so useful.
Never say never. I guess they (or just one counter for accumulated writes)
could be useful for monitoring operations that spill to the disk. For
instance, PostgreSQL does not show the amount of temp used for the join.
buffer_gets; /* # of buffer gets */
buffer_hits; /* # of buffer hits */
buffile_reads; /* # of buffile reads */
I guess it makes sense to expand "buffile reads" into "buffer file reads" or
just "file reads".
Here is a sample output. We'd better add a line break before
the 'actual' section because the line is too wide to display.
I wish there was a way to get the results of explain into some table. I wish
it was the default output "format". That would make life of pgAdmin easier,
and improve readability even in psql. Don't you think there is something
wrong with having "cost=... rows=... loops=..." in each and every row?
ResetBufferUsage() now saves the current counters in global variables as a
baseline, and buffer statistics are measured as the difference from that
baseline, because the counters are also used by struct Instrumentation.
That would definitely work well for Instrumentation (it will not notice
ResetBufferUsage() any more); however, that will not isolate the guys who do
the reset. I am afraid the idea of having api for "reset" is broken and it
makes sense removing that function. However it looks like it is out of scope
of this patch.
Regards,
Vladimir Sitnikov
"Vladimir Sitnikov" <sitnikov.vladimir@gmail.com> writes:
I wish there was a way to get the results of explain into some table. I wish
it was the default output "format". That would make life of pgAdmin easier,
and improve readability even in psql. Do not you think there is something
wrong with having "cost=... rows=... loops=..." in each and every row?
A number of people have suggested we switch to XML.
An alternative would be to build up a tuplestore of data and then send that to
the client in a separate result set. That's kind of nice because it would give
us a way to send both the real results and the explain results. And at least
we already have an API for accessing result sets.
Oracle's approach is to have the explain command stuff the results into a
table. That has advantages for tools, especially if you want to be able to
look at plans generated by other sessions. But it's pretty awkward for the
simple case.
I'm sure there are dozens of ways to skin this cat. Anyone have any more?
We probably just have to pick one and run with it.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
"Vladimir Sitnikov" <sitnikov.vladimir@gmail.com> writes:
I wish there was a way to get the results of explain into some table. I wish
it was the default output "format". That would make life of pgAdmin easier,
and improve readability even in psql. Don't you think there is something
wrong with having "cost=... rows=... loops=..." in each and every row?

A number of people have suggested we switch to XML.
I do not see much benefit of XML:
* XML is not human-readable
* Plain old result set is even easier to process since it is the main
PostgreSQL interface at this point
The only benefit of XML I could imagine is that it could provide a nicer
markup for sort/hash/etc nodes. It is not that nice to have a column
"sort method" that would be empty for nearly all the rows. At the same time
it looks fine to have a column with XML inside for any additional information
an execution node wants to provide (like topN-allrows sort / number of
batches in hash join or whatever).
An alternative would be to build up a tuplestore of data and then send that to
the client in a separate result set. That's kind of nice because it would give
us a way to send both the real results and the explain results. And at least
we already have an API for accessing result sets.
Sounds good. As for me, the current output of explain is not very easy to
read: it suits only the "find timings for a particular node" workflow (I
mean, the source is a particular node, the result is
timings/rows/buffers/etc). However, from my point of view, when it comes to
query tuning the main workflow is "find node by suspicious timings". If all
the relevant data were displayed in the same column it would be easier to
read. Consider all the row counts in the very first column.
Oracle's approach is to have the explain command stuff the results into a
table. That has advantages for tools, especially if you want to be able to
look at plans generated by other sessions.
I do not believe that workflow makes sense. I have never ever thought of it.
An external table makes sense if you have several output formats (say, create
a formatting function for psql and let pgAdmin format the plan on its own).
Regards,
Vladimir Sitnikov
On Fri, 5 Dec 2008, Vladimir Sitnikov wrote:
Oracle's approach is to have the explain command stuff the results into a
table. That has advantages for tools, especially if you want to be able to
look at plans generated by other sessions.

I do not believe that workflow makes sense. I have never ever thought of it.
The main benefit is that you can track how EXPLAIN plans change over time.
Archive a snapshot of what the plans for all your common queries look like
every day, and the day one of them blows up and starts doing something
wrong you've got a *lot* more information to work with for figuring out
what happened--whether it was a minor query change, some stats that got
slowly out of whack, etc. I wouldn't just immediately dismiss that
workflow as nonsensical without thinking about it a bit first; there are
some good advantages to it.
Greg Sabino Mullane had a neat blog entry on saving plans to tables in
PostgreSQL, unfortunately the Planet PostgreSQL outage seems to have eaten
it.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
The main benefit is that you can track how EXPLAIN plans change over time.
It is not required to output plan *into* some table to be able track it over
time. If EXPLAIN returns a table, it is up to you to perform "insert into
history select * from explain(...)".
Workflow that does not make sense for me is "look at plans generated _into
some plan_table_ by other sessions in Oracle".
I am 100% sure it really makes sense to have some view like pg_execute_plan
that will reveal execution plans for currently running queries (see
v$sql_plan in Oracle). However, I would stress once again I have no idea
what the sense could be in "one session explained into plan_table, while the
other reads that plan".
Does that make sense?
Regards,
Vladimir Sitnikov
On Tue, Dec 2, 2008 at 02:35, ITAGAKI Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
Here is an updated version of contrib/pg_stat_statements.
Hello again!
I was assigned to review this.
Submission review:
Is the patch in standard form? Yes
Does it apply cleanly to current HEAD? Yes (with fuzz)
Does it include reasonable tests, docs, etc? Yes
Usability review:
Does the patch actually implement that? Yes
Do we want that? I think so
Do we already have it? No
Does it follow SQL spec, or the community-agreed behavior? Sure
Are there dangers? No
Have all the bases been covered? Yes
Feature test:
Does the feature work as advertised? Yes
Are there corner cases the author has failed to consider? No
Performance review
Does the patch slow down simple tests?
Does not seem to...
(test.sql)
select * from tenk1 a join tenk1 b using (unique1);
(dual core machine, --enable-debug, --enable-cassert build)
pgbench -c 2 -T60 -n -f test.sql
HEAD: tps = 9.674423
PATCH: tps = 9.695784
If it claims to improve performance, does it?
Does it slow down other things?
Coding review:
Does it follow the project coding guidelines? Yes
Are there portability issues? No
Will it work on Windows/BSD etc? Think so
Are the comments sufficient and accurate? I think so
Does it do what it says, correctly? Yes
Does it produce compiler warnings? No
Can you make it crash? No
I'm not sure about the new counters in struct Instrumentation or the
hooks (but did not see anything obviously wrong with them)... A
committer can better comment on those. Also find attached some very
minor verbiage changes. If there is nothing else on your todo list
for this I'll mark it as Ready for Committer on the wiki.
Attachments:
patch.patch
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 216,222 **** error:
}
/*
! * pgss_shutdown - Load statistics from file.
*/
static void
pgss_startup(void)
--- 216,222 ----
}
/*
! * pgss_startup - Load statistics from file.
*/
static void
pgss_startup(void)
*** a/doc/src/sgml/pgstatstatements.sgml
--- b/doc/src/sgml/pgstatstatements.sgml
***************
*** 68,74 ****
<entry><structfield>calls</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
! <entry>Number of being executed</entry>
</row>
<row>
--- 68,74 ----
<entry><structfield>calls</structfield></entry>
<entry><type>bigint</type></entry>
<entry></entry>
! <entry>Number of times executed</entry>
</row>
<row>
"Alex Hunsaker" <badalex@gmail.com> wrote:
I was assigned to review this.
Thanks for your reviewing.
I assume that the basic concepts are OK and the focus of discussion is:
- New counters in struct Instrumentation.
(buffer usage and CPU usage)
- Should EXPLAIN ANALYZE show those counters.
Performance review
HEAD: tps = 9.674423
PATCH: tps = 9.695784

If it claims to improve performance, does it?
Does it slow down other things?
The patch should not slow down normal use if you don't use
pg_stat_statements module, but it might slow down EXPLAIN ANALYZE
because some fields are added in struct Instrumentation and
they are counted up per tuple in EXPLAIN ANALYZE.
Also find attached some very minor verbiage changes.
Thanks. I'll apply your fixes.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
"Vladimir Sitnikov" <sitnikov.vladimir@gmail.com> wrote:
2. EXPLAIN ANALYZE VERBOSE shows buffer statistics in 'actual' section.
I do not get the point of "VERBOSE".
As far as I understand, explain analyze (without verbose) will anyway add
overhead for calculation of "gets/hits/cpu". Why discard that information in
"non verbose" mode? Just to make the investigation more complex?
I thought that the output of the new counters is too wide and it breaks
compatibility of EXPLAIN ANALYZE. On the other hand, we don't have to
think of compatibility in EXPLAIN ANALYZE VERBOSE because it is newly
added in 8.4. However, overheads should be avoided. We could have
two kinds of instrumentation: time-only or all-stats.
Another idea is to have buffer and cpu counters not in Instrumentation
but in QueryDesc (i.e., only one per query). A new field for the counter
will be initialized in the pg_stat_statements module. EXPLAIN ANALYZE could
also use the field, but the result is printed only at the end of the query
plan, as follows. We can avoid the too-wide-line problem with this approach.
=# EXPLAIN ANALYZE SELECT ...;
QUERY PLAN
---------------------------
...
Total runtime: 24.273 ms
CPU usage: user=20.2 sys=2.4 #
Buffer usage: gets=100 reads=10 writes=20 # added lines
Tempfile i/o: reads=50 writes=50 #
(10 rows)
I wish there was a way to get the results of explain into some table. I wish
it was the default output "format". That would make life of pgAdmin easier,
and improve readability even in psql. Do not you think there is something
wrong with having "cost=... rows=... loops=..." in each and every row?
I absolutely agree that XML-explain and TABLE-explain are useful,
but they are independent features from pg_stat_statements. So I'd
like to discuss them in separate threads (and they will come in 8.5).
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
2. EXPLAIN ANALYZE VERBOSE shows buffer statistics in 'actual' section.
I do not get the point of "VERBOSE".
As far as I understand, explain analyze (without verbose) will anyway add
overhead for calculation of "gets/hits/cpu". Why discard that information in
"non verbose" mode? Just to make the investigation more complex?
I thought that the output of the new counters is too wide and it breaks
compatibility of EXPLAIN ANALYZE. On the other hand, we don't have to
think of compatibility in EXPLAIN ANALYZE VERBOSE because it is newly
added in 8.4. However, overheads should be avoided. We could have
two kinds of instrumentation: time-only or all-stats.
I am not sure if this impact (in terms of compatibility) is really that big.
As far as I could understand, pgAdmin parses modified explain analyze output
well. However, pgAdmin does not support "verbose".
The impact in terms of "computational" overhead could be measured. I guess
it depends only on the number of calls to the instrumentation (that is
either rows or loops). Thus, after explaining some basic select from
generate_series, one could adjust the timings. We could even incorporate
this into EXPLAIN ANALYZE, so it would display "cpu is 0.5s, while
instrumentation cpu is 0.3s".
Another idea is to have buffer and cpu counters not in Instrumentation
but in QueryDesc (i.e., only one per query). A new field for the counter
will be initialized in the pg_stat_statements module. EXPLAIN ANALYZE could
also use the field, but the result is printed only at the end of the query
plan, as follows. We can avoid the too-wide-line problem with this approach.
A single number per query is sufficient only for pg_stat_statements' purposes.
That will give an insight into what the top consuming queries are (by cpu
time, by gets, etc).
However, a single "gets=... reads=..." is not sufficient to pinpoint the
problem, especially in the case of a complex query (that is comparable to
"query returned N rows" vs "this plan node returned N rows").
=# EXPLAIN ANALYZE SELECT ...;
QUERY PLAN
---------------------------
...
Total runtime: 24.273 ms
CPU usage: user=20.2 sys=2.4 #
Buffer usage: gets=100 reads=10 writes=20 # added lines
Tempfile i/o: reads=50 writes=50 #
(10 rows)
I wish pgAdmin (or whatever client) had an option to fetch those counters for
each and every SQL query and display the consumed resources in a separate
tab. I mean, even before/after a plain "select" (without any explain). That
will show you how the query would behave without any instrumentation.
Regards,
Vladimir Sitnikov
On Sun, 7 Dec 2008, Alex Hunsaker wrote:
(dual core machine, --enable-debug, --enable-cassert build)
pgbench -c 2 -T60 -n -f test.sql
HEAD: tps = 9.674423
PATCH: tps = 9.695784
Two general suggestions here, not specific to this patch:
While it's good to do most testing with debug and cassert turned on, you
shouldn't report performance results with those two flags enabled. What
if the patch has some large amount of overhead that only shows up when
compiled with debug or asserts on? You'd end up reporting a performance
loss that doesn't actually exist in a real build. Unfortunately, the only
way to get good performance results is to have a parallel build done with
those off, in addition to the debug/assert one used to catch bugs.
The above pgbench is executing less than 600 actual tests (60 seconds @
9.7 TPS). That seems a bit short to me. If you sort out the above and
run this again, it would be good to let pgbench run for a lot longer than
1 minute, to see if the results show some more significant difference.
With this few TPS, it would be nice to let that run for 30 minutes or more
if you can find some time to schedule that.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
I thought that the output of the new counters is too wide and it breaks
compatibility of EXPLAIN ANALYZE. On the other hand, we don't have to
think of compatibility in EXPLAIN ANALYZE VERBOSE because it is newly
added in 8.4. However, overheads should be avoided. We could have
two kinds of instrumentation: time-only or all-stats.
I've got a serious problem with the way that this patch is being
presented. It's being named and described as though it's "just another
contrib module", but in fact it makes invasive, undocumented changes to
the behavior of the core EXPLAIN functionality --- changes that
certainly cannot be claimed to having been agreed to by the community,
since most of us probably weren't aware that there was any such thing
going on inside this patch.
Please split this into two separate patches that can be separately
evaluated.
regards, tom lane
On the other hand, we don't have to
think of compatibility in EXPLAIN ANALYZE VERBOSE because it is newly
added in 8.4.
Uh, it exists for me in 8.2.9.
Welcome to psql 8.2.9, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
portal=# explain analyze verbose select 1;
QUERY PLAN
--------------------------------------------------------------------------------
----
{RESULT
:startup_cost 0.00
:total_cost 0.01
:plan_rows 1
:plan_width 0
:targetlist (
{TARGETENTRY
:expr
{CONST
:consttype 23
:constlen 4
:constbyval true
:constisnull false
:constvalue 4 [ 1 0 0 0 ]
}
:resno 1
:resname ?column?
:ressortgroupref 0
:resorigtbl 0
:resorigcol 0
:resjunk false
}
)
:qual <>
:lefttree <>
:righttree <>
:initPlan <>
:extParam (b)
:allParam (b)
:nParamExec 0
:resconstantqual <>
}
Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.010..0.012 rows=1 loops
=1)
Total runtime: 0.244 ms
(35 rows)
...Robert
"Robert Haas" <robertmhaas@gmail.com> writes:
On the other hand, we don't have to
think of compatibility in EXPLAIN ANALYZE VERBOSE because it is newly
added in 8.4.

Uh, it exists for me in 8.2.9.
The current behaviour is newly added in 8.4. In 8.2 it meant something
completely different and quite useless for end-users in any case, so backwards
compatibility isn't important.
What strikes me as a convenient approach is basically using EXPLAIN VERBOSE as
a playground where we feel free to add everything we think of. If people run a
command marked "VERBOSE" and complain it prints too much...
As stuff matures and becomes indispensable we could consider moving it to the
regular EXPLAIN or implement some way to specify precisely which data the user
wants. Or just say XML/table data/whatever will solve the problem for us.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
"Robert Haas" <robertmhaas@gmail.com> writes:
On the other hand, we don't have to
think of compatibility in EXPLAIN ANALYZE VERBOSE because it is newly
added in 8.4.
Uh, it exists for me in 8.2.9.
EXPLAIN VERBOSE has existed at least back to 7.0, probably further.
However, we've felt free to whack around what it outputs, so maybe
the backwards-compatibility issue isn't very strong.
A possibly stronger complaint is that ANALYZE and VERBOSE have always
been orthogonal options to EXPLAIN, and now there'd be some interaction
between them.
regards, tom lane
As stuff matures and becomes indispensable we could consider moving it to the
regular EXPLAIN or implement some way to specify precisely which data the user
wants. Or just say XML/table data/whatever will solve the problem for us.
I think some way to specify precisely which data the user wants is the
way to go. The amount of data that there is to be printed is only
going to continue to increase. If the only toggle is a boolean flag
to display ALL or NONE of it, then every time someone proposes a new
type of output, we're going to argue about whether it's useful enough
to be worth the display real estate.
I'm not sure what the best way is though. I don't think continuing to
add keywords between EXPLAIN and the start of the query is very
scalable. Putting parentheses around the option list seems like it
might eliminate a lot of grammar headaches:
EXPLAIN (option, option, option...) SELECT ...
...Robert
I'm not sure what the best way is though. I don't think continuing to
add keywords between EXPLAIN and the start of the query is very
scalable. Putting parentheses around the option list seems like it
might eliminate a lot of grammar headaches:
Do you think it is required to invent special grammar just for presentation
purposes?
I guess the database should not deal with presentation. Provided "explain"
returns a table, it is up to the client to do the formatting. I do not believe
it makes sense creating several different explain outputs and redoing all the
work in 8.5.
It could still make sense to have several options for "explain" if that would
result in *different instrumentation* (e.g. "explain" vs "explain analyze").
Regards,
Vladimir Sitnikov
On Mon, Dec 8, 2008 at 23:28, ITAGAKI Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
"Alex Hunsaker" <badalex@gmail.com> wrote:
I was assigned to review this.
Thanks for your reviewing.
I assume that the basic concepts are ok and focus of discussion is in:
- New counters in struct Instrumentation.
(buffer usage and CPU usage)
- Should EXPLAIN ANALYZE show those counters.
Right, I would split your next patch out into 3 parts: the hooks you
need, the contrib module and the new counters. I think I saw older
versions of the patch that did this... did it just get lost in this version?
Performance review
HEAD: tps = 9.674423
PATCH: tps = 9.695784

If it claims to improve performance, does it?
Does it slow down other things?
The patch should not slow down normal use if you don't use
pg_stat_statements module, but it might slow down EXPLAIN ANALYZE
because some fields are added in struct Instrumentation and
they are counted up per tuple in EXPLAIN ANALYZE.
Err, yes, sorry. I was just following
http://wiki.postgresql.org/wiki/Reviewing_a_Patch; those two did not
seem pertinent so I did not answer them.
On Tue, Dec 9, 2008 at 01:20, Greg Smith <gsmith@gregsmith.com> wrote:
On Sun, 7 Dec 2008, Alex Hunsaker wrote:
(dual core machine, --enable-debug, --enable-cassert build)
pgbench -c 2 -T60 -n -f test.sql
HEAD: tps = 9.674423
PATCH: tps = 9.695784

Two general suggestions here, not specific to this patch:
While it's good to do most testing with debug and cassert turned on, you
shouldn't report performance results with those two flags enabled. What if
the patch has some large amount of overhead that only shows up when compiled
with debug or asserts on? You'd end up reporting a performance loss that
doesn't actually exist in a real build. Unfortunately, the only way to get
good performance results is to have a parallel build done with those off, in
addition to the debug/assert one used to catch bugs.
Right, which is part of the reason I noted it was a cassert build.
The above pgbench is executing less than 600 actual tests (60 seconds @
9.7 TPS). That seems a bit short to me. If you sort out the above and run
this again, it would be good to let pgbench run for a lot longer than 1
minute, to see if the results show some more significant difference. With
this few TPS, it would be nice to let that run for 30 minutes or more if you
can find some time to schedule that.
Ok, that's useful to know as well, thanks! (I'll go re-run them)
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Yes this is one reasonable option, as is the idea of using XML or a
table and making it the client's problem. Neither are going to happen
for this release I think.
And in any case it will always be useful to have an option to print
all the available information anyway, so we may as well do that with
"verbose".
--
Greg
On 9 Dec 2008, at 16:35, "Robert Haas" <robertmhaas@gmail.com> wrote:
As stuff matures and becomes indispensable we could consider moving it to the
regular EXPLAIN or implement some way to specify precisely which data the user
wants. Or just say XML/table data/whatever will solve the problem for us.

I think some way to specify precisely which data the user wants is the
way to go. The amount of data that there is to be printed is only
going to continue to increase. If the only toggle is a boolean flag
to display ALL or NONE of it, then every time someone proposes a new
type of output, we're going to argue about whether it's useful enough
to be worth the display real estate.

I'm not sure what the best way is though. I don't think continuing to
add keywords between EXPLAIN and the start of the query is very
scalable. Putting parentheses around the option list seems like it
might eliminate a lot of grammar headaches:

EXPLAIN (option, option, option...) SELECT ...
...Robert
On Tue, Dec 9, 2008 at 12:44 PM, Greg Stark <greg.stark@enterprisedb.com> wrote:
Yes this is one reasonable option, as is the idea of using XML or a table
and making it the client's problem. Neither are going to happen for this
release I think.
Agreed.
And in any case it will always be useful to have an option to print all the
available information anyways so we make as well do that with "verbose".
Sounds very nice.
...Robert
On Tue, Dec 9, 2008 at 8:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 9, 2008 at 12:44 PM, Greg Stark <greg.stark@enterprisedb.com>
wrote:
Yes this is one reasonable option, as is the idea of using XML or a table
and making it the client's problem. Neither are going to happen for this
release I think.

Agreed.
I 100% agree with that point. Thus I suggest outputting the additional
information in "explain analyze", since:
1) it will require minimal code change
2) it will be consistent with previous behaviour
3) it looks like a natural improvement of EXPLAIN's features
4) it will be changed anyway when "table" output for explain comes
And in any case it will always be useful to have an option to print all the
available information anyway, so we may as well do that with "verbose".
Sounds very nice.
Can I ask my question once again?
Why don't you want to make "print all the info" the default output format?
When it comes to "pgsql-performance", people used to recommend: "please,
provide EXPLAIN ANALYZE, not just EXPLAIN".
If the default output format is not changed in 8.4, this will transform into
"please, provide EXPLAIN ANALYZE VERBOSE, not just EXPLAIN ANALYZE or
EXPLAIN". Do you really want that?
Regards,
Vladimir Sitnikov
On Sun, Dec 7, 2008 at 19:13, Alex Hunsaker <badalex@gmail.com> wrote:
On Tue, Dec 2, 2008 at 02:35, ITAGAKI Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
Here is an updated version of contrib/pg_stat_statements.
Hello again!
I was assigned to review this.
... Some other things I accidentally left out.
#define GUCNAME(name) ("statistics." name)
Why statistics?
Would not something like stat_statements make more sense? Statistics
seems fairly arbitrary...
Also per the
/* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
Maybe it should be configurable; personally I would want something
like # of calls / time. Mainly because I don't, for instance, really
care that my backups get tracked, but would be more interested in the
things that get called most often that also take the longest. (aka
the most bang for the buck, as far as optimizing those goes...)
?
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Please split this into two separate patches that can be separately
evaluated.
Sure. I want to discuss only where to add the counters of buffer usage
and cpu usage, or whether they should be added at all. However, it seems
to affect the future of EXPLAIN ANALYZE, so we might also need to discuss
EXPLAIN itself.
I assume we have 3 choices here:
1. Add those counters to struct Instrumentation.
We can get statistics for each line in EXPLAIN ANALYZE,
but it might have overhead to update counters.
2. Add those counters only to top instruments (one per query).
We can get accumulated statistics for each query.
It might be insufficient for complex queries.
3. Should not add any counters.
No changes to core, but usability of pg_stat_statement module
would be very poor...
Which should we take? Or is there another idea?
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
"Alex Hunsaker" <badalex@gmail.com> wrote:
#define GUCNAME(name) ("statistics." name)
Why statistics?
Would not something like stat_statements make more sense? Statistics
seems fairly arbitrary...
To avoid using the word "statements" twice;
the variable names contain "statements" already.
- stat_statements.max_statements
- stat_statements.track_statements
seem ugly to me, but avoiding arbitrariness might be more
important. If there is agreement, I will change the prefix.
Also per the
/* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */

Maybe it should be configurable; personally I would want something
like # of calls / time. Mainly because I don't, for instance, really
care that my backups get tracked, but would be more interested in the
things that get called most often that also take the longest. (aka
the most bang for the buck, as far as optimizing those goes...)
Configurability is better, but we need documentation of how to
configure them and I have no clear idea for it. Also, we already have
means for logging slow queries. We could use the logging for slow
queries executed rarely and use the module for queries executed many times.
Excluding backup scripts is easy; You can create a database role for
backup and disable statement-tracking for the user using ALTER ROLE.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Here is an updated version of pg_stat_statements.
[Changes]
- A new GUC variable 'explain_analyze_format' is added.
- Statistics counters are converted into a variable 'BufferStats'.
"Vladimir Sitnikov" <sitnikov.vladimir@gmail.com> wrote:
Can I ask my question once again?
Why don't you want to make "print all the info" the default output format?
I added a new GUC variable 'explain_analyze_format' to print additional
statistics counters in EXPLAIN ANALYZE. We can put special variables
in it to get non-default counters. I don't think everyone always
requires all the info. Of course TABLE-explain and XML-explain could
handle the counters better, but this patch should not include them.
The variable 'BufferStats' cleans up the calculations over many separate
counters. It is exported (marked as PGDLLIMPORT) so that users can add
a user-defined view like pg_stat_session_buffer_usage if needed.
Comments welcome.
[Sample output of explain_analyze_format]
=# SET explain_analyze_format = ' gets=%g reads=%r temp=%t cpu=%p';
=# EXPLAIN ANALYZE SELECT * FROM accounts;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on accounts (cost=0.00..2640.00 rows=100000 width=97) (actual time=0.054..117.046 rows=100000 loops=1 gets=1640 reads=1608 temp=0 cpu=125.00)
Total runtime: 208.167 ms
(2 rows)
The following special variables are available:
%g : # of shared buffer gets (= %h + %r)
%h : # of shared buffer hits
%r : # of shared buffer reads
%w : # of shared buffer writes
%G : # of local buffer gets (= %H + %R)
%H : # of local buffer hits
%R : # of local buffer reads
%W : # of local buffer writes
%t : # of buffile reads
%T : # of buffile writes
%u : user cpu time
%s : sys cpu time
%p : total cpu times (= %u + %s)
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Attachments:
ITAGAKI Takahiro wrote:
- A new GUC variable 'explain_analyze_format' is added.
I'm afraid that this variable name doesn't tell what it means. What about
'explain_analyze_stats_format' or even 'explain_stats_format'?
--
Euler Taveira de Oliveira
http://www.timbira.com/
On Fri, Dec 12, 2008 at 02:51, ITAGAKI Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
Here is an updated version of pg_stat_statements.
I took the liberty of splitting this out into 3 patches for easier review:
patch: 001-hooks.patch
-adds 2 new hooks (startup_hook and shutdown_hook).
-fixes missing process_shared_preload_libraries on EXEC_BACKEND
-changes int pgstat_track_activity_query_size to PGDLLIMPORT int
pgstat_track_activity_query_size
-adds sourceText to QueryDesc and modifies callers to populate it
patch: 002-explain-fmt.patch
-adds new guc explain_analyze_format to print out statistic counters
-adds rtime/utime to explain analyze printout
-moves the statistics counters (ReadBufferCount etc.) into struct
Instrumentation via a new struct BufferCounter (giving modules like
the proposed one easy access)
patch: pg_stat_statements-1212.patch
the contrib module
A few comments:
Is there a reason you add sourceText to QueryDesc? AFAICT you can do
ActivePortal->sourceText and it will always be populated correctly.
I think the explain_analyze_format guc is a clever way of getting
around the EXPLAIN ANALYZE VERBOSE you proposed earlier. But I don't
see any doc updates for it.
I'm still not overly fond of the "statistics." custom guc name, but
what can you do...
Other than that it looks good, though I admit I have not had the time
to sit down and thoroughly test it yet...
Attachments:
001-hooks.patchtext/x-patch; name=001-hooks.patchDownload
diff -cprN HEAD/src/backend/bootstrap/bootstrap.c pg_stat_statements/src/backend/bootstrap/bootstrap.c
*** HEAD/src/backend/bootstrap/bootstrap.c Fri Nov 14 21:39:34 2008
--- pg_stat_statements/src/backend/bootstrap/bootstrap.c Fri Dec 12 18:26:03 2008
*************** typedef struct _IndexList
*** 193,198 ****
--- 193,199 ----
static IndexList *ILHead = NULL;
+ startup_hook_type startup_hook = NULL;
/*
* AuxiliaryProcessMain
*************** AuxiliaryProcessMain(int argc, char *arg
*** 419,424 ****
--- 420,427 ----
bootstrap_signals();
StartupXLOG();
BuildFlatFiles(false);
+ if (startup_hook)
+ startup_hook();
proc_exit(0); /* startup done */
case BgWriterProcess:
diff -cprN HEAD/src/backend/commands/copy.c pg_stat_statements/src/backend/commands/copy.c
*** HEAD/src/backend/commands/copy.c Fri Dec 5 00:11:18 2008
--- pg_stat_statements/src/backend/commands/copy.c Fri Dec 12 18:26:03 2008
*************** DoCopy(const CopyStmt *stmt, const char
*** 1056,1062 ****
/* Create a QueryDesc requesting no output */
cstate->queryDesc = CreateQueryDesc(plan, GetActiveSnapshot(),
InvalidSnapshot,
! dest, NULL, false);
/*
* Call ExecutorStart to prepare the plan for execution.
--- 1056,1062 ----
/* Create a QueryDesc requesting no output */
cstate->queryDesc = CreateQueryDesc(plan, GetActiveSnapshot(),
InvalidSnapshot,
! dest, NULL, false, queryString);
/*
* Call ExecutorStart to prepare the plan for execution.
diff -cprN HEAD/src/backend/executor/functions.c pg_stat_statements/src/backend/executor/functions.c
*** HEAD/src/backend/executor/functions.c Fri Dec 5 00:11:18 2008
--- pg_stat_statements/src/backend/executor/functions.c Fri Dec 12 18:26:03 2008
*************** postquel_start(execution_state *es, SQLF
*** 413,424 ****
es->qd = CreateQueryDesc((PlannedStmt *) es->stmt,
snapshot, InvalidSnapshot,
dest,
! fcache->paramLI, false);
else
es->qd = CreateUtilityQueryDesc(es->stmt,
snapshot,
dest,
! fcache->paramLI);
/* We assume we don't need to set up ActiveSnapshot for ExecutorStart */
--- 413,424 ----
es->qd = CreateQueryDesc((PlannedStmt *) es->stmt,
snapshot, InvalidSnapshot,
dest,
! fcache->paramLI, false, fcache->src);
else
es->qd = CreateUtilityQueryDesc(es->stmt,
snapshot,
dest,
! fcache->paramLI, fcache->src);
/* We assume we don't need to set up ActiveSnapshot for ExecutorStart */
diff -cprN HEAD/src/backend/postmaster/bgwriter.c pg_stat_statements/src/backend/postmaster/bgwriter.c
*** HEAD/src/backend/postmaster/bgwriter.c Thu Nov 27 23:42:00 2008
--- pg_stat_statements/src/backend/postmaster/bgwriter.c Fri Dec 12 18:26:03 2008
*************** static BgWriterShmemStruct *BgWriterShme
*** 141,146 ****
--- 141,148 ----
/* interval for calling AbsorbFsyncRequests in CheckpointWriteDelay */
#define WRITES_PER_ABSORB 1000
+ shutdown_hook_type shutdown_hook = NULL;
+
/*
* GUC parameters
*/
*************** BackgroundWriterMain(void)
*** 396,401 ****
--- 398,405 ----
*/
ExitOnAnyError = true;
/* Close down the database */
+ if (shutdown_hook)
+ shutdown_hook();
ShutdownXLOG(0, 0);
/* Normal exit from the bgwriter is here */
proc_exit(0); /* done */
diff -cprN HEAD/src/backend/executor/spi.c pg_stat_statements/src/backend/executor/spi.c
*** HEAD/src/backend/executor/spi.c Fri Dec 5 00:11:18 2008
--- pg_stat_statements/src/backend/executor/spi.c Fri Dec 12 18:26:03 2008
*************** _SPI_execute_plan(SPIPlanPtr plan, Param
*** 1794,1800 ****
qdesc = CreateQueryDesc((PlannedStmt *) stmt,
snap, crosscheck_snapshot,
dest,
! paramLI, false);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? tcount : 0);
FreeQueryDesc(qdesc);
--- 1794,1801 ----
qdesc = CreateQueryDesc((PlannedStmt *) stmt,
snap, crosscheck_snapshot,
dest,
! paramLI, false,
! plansource->query_string);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? tcount : 0);
FreeQueryDesc(qdesc);
diff -cprN HEAD/src/backend/storage/lmgr/proc.c pg_stat_statements/src/backend/storage/lmgr/proc.c
*** HEAD/src/backend/storage/lmgr/proc.c Wed Dec 10 08:05:02 2008
--- pg_stat_statements/src/backend/storage/lmgr/proc.c Fri Dec 12 18:26:03 2008
*************** InitAuxiliaryProcess(void)
*** 381,386 ****
--- 381,390 ----
if (MyProc != NULL)
elog(ERROR, "you already exist");
+ #ifdef EXEC_BACKEND
+ process_shared_preload_libraries();
+ #endif
+
/*
* We use the ProcStructLock to protect assignment and releasing of
* AuxiliaryProcs entries.
diff -cprN HEAD/src/backend/tcop/pquery.c pg_stat_statements/src/backend/tcop/pquery.c
*** HEAD/src/backend/tcop/pquery.c Fri Dec 5 00:11:18 2008
--- pg_stat_statements/src/backend/tcop/pquery.c Fri Dec 12 18:26:03 2008
*************** Portal ActivePortal = NULL;
*** 38,44 ****
static void ProcessQuery(PlannedStmt *plan,
ParamListInfo params,
DestReceiver *dest,
! char *completionTag);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint32 RunFromStore(Portal portal, ScanDirection direction, long count,
DestReceiver *dest);
--- 38,45 ----
static void ProcessQuery(PlannedStmt *plan,
ParamListInfo params,
DestReceiver *dest,
! char *completionTag,
! const char *sourceText);
static void FillPortalStore(Portal portal, bool isTopLevel);
static uint32 RunFromStore(Portal portal, ScanDirection direction, long count,
DestReceiver *dest);
*************** CreateQueryDesc(PlannedStmt *plannedstmt
*** 65,71 ****
Snapshot crosscheck_snapshot,
DestReceiver *dest,
ParamListInfo params,
! bool doInstrument)
{
QueryDesc *qd = (QueryDesc *) palloc(sizeof(QueryDesc));
--- 66,73 ----
Snapshot crosscheck_snapshot,
DestReceiver *dest,
ParamListInfo params,
! bool doInstrument,
! const char *sourceText)
{
QueryDesc *qd = (QueryDesc *) palloc(sizeof(QueryDesc));
*************** CreateQueryDesc(PlannedStmt *plannedstmt
*** 78,83 ****
--- 80,86 ----
qd->dest = dest; /* output dest */
qd->params = params; /* parameter values passed into query */
qd->doInstrument = doInstrument; /* instrumentation wanted? */
+ qd->sourceText = sourceText;
/* null these fields until set by ExecutorStart */
qd->tupDesc = NULL;
*************** QueryDesc *
*** 95,101 ****
CreateUtilityQueryDesc(Node *utilitystmt,
Snapshot snapshot,
DestReceiver *dest,
! ParamListInfo params)
{
QueryDesc *qd = (QueryDesc *) palloc(sizeof(QueryDesc));
--- 98,105 ----
CreateUtilityQueryDesc(Node *utilitystmt,
Snapshot snapshot,
DestReceiver *dest,
! ParamListInfo params,
! const char *sourceText)
{
QueryDesc *qd = (QueryDesc *) palloc(sizeof(QueryDesc));
*************** CreateUtilityQueryDesc(Node *utilitystmt
*** 107,112 ****
--- 111,117 ----
qd->dest = dest; /* output dest */
qd->params = params; /* parameter values passed into query */
qd->doInstrument = false; /* uninteresting for utilities */
+ qd->sourceText = sourceText;
/* null these fields until set by ExecutorStart */
qd->tupDesc = NULL;
*************** static void
*** 155,161 ****
ProcessQuery(PlannedStmt *plan,
ParamListInfo params,
DestReceiver *dest,
! char *completionTag)
{
QueryDesc *queryDesc;
--- 160,167 ----
ProcessQuery(PlannedStmt *plan,
ParamListInfo params,
DestReceiver *dest,
! char *completionTag,
! const char *sourceText)
{
QueryDesc *queryDesc;
*************** ProcessQuery(PlannedStmt *plan,
*** 171,177 ****
*/
queryDesc = CreateQueryDesc(plan,
GetActiveSnapshot(), InvalidSnapshot,
! dest, params, false);
/*
* Set up to collect AFTER triggers
--- 177,183 ----
*/
queryDesc = CreateQueryDesc(plan,
GetActiveSnapshot(), InvalidSnapshot,
! dest, params, false, sourceText);
/*
* Set up to collect AFTER triggers
*************** PortalStart(Portal portal, ParamListInfo
*** 507,513 ****
InvalidSnapshot,
None_Receiver,
params,
! false);
/*
* We do *not* call AfterTriggerBeginQuery() here. We assume
--- 513,520 ----
InvalidSnapshot,
None_Receiver,
params,
! false,
! portal->sourceText);
/*
* We do *not* call AfterTriggerBeginQuery() here. We assume
*************** PortalRunMulti(Portal portal, bool isTop
*** 1259,1272 ****
/* statement can set tag string */
ProcessQuery(pstmt,
portal->portalParams,
! dest, completionTag);
}
else
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
portal->portalParams,
! altdest, NULL);
}
if (log_executor_stats)
--- 1266,1279 ----
/* statement can set tag string */
ProcessQuery(pstmt,
portal->portalParams,
! dest, completionTag, portal->sourceText);
}
else
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
portal->portalParams,
! altdest, NULL, portal->sourceText);
}
if (log_executor_stats)
diff -cprN HEAD/src/include/storage/ipc.h pg_stat_statements/src/include/storage/ipc.h
*** HEAD/src/include/storage/ipc.h Sat Nov 1 15:22:28 2008
--- pg_stat_statements/src/include/storage/ipc.h Fri Dec 12 18:26:03 2008
***************
*** 20,25 ****
--- 20,30 ----
typedef void (*pg_on_exit_callback) (int code, Datum arg);
+ typedef void (*startup_hook_type) (void);
+ extern PGDLLIMPORT startup_hook_type startup_hook;
+ typedef void (*shutdown_hook_type) (void);
+ extern PGDLLIMPORT shutdown_hook_type shutdown_hook;
+
/*----------
* API for handling cleanup that must occur during either ereport(ERROR)
* or ereport(FATAL) exits from a block of code. (Typical examples are
diff -cprN HEAD/src/backend/commands/explain.c pg_stat_statements/src/backend/commands/explain.c
*** HEAD/src/backend/commands/explain.c Sat Nov 22 11:15:56 2008
--- pg_stat_statements/src/backend/commands/explain.c Fri Dec 12 18:26:03 2008
*************** ExplainOneQuery(Query *query, ExplainStm
*** 172,178 ****
plan = pg_plan_query(query, 0, params);
/* run it (if needed) and produce output */
! ExplainOnePlan(plan, params, stmt, tstate);
}
}
--- 176,182 ----
plan = pg_plan_query(query, 0, params);
/* run it (if needed) and produce output */
! ExplainOnePlan(plan, params, stmt, tstate, queryString);
}
}
*************** ExplainOneUtility(Node *utilityStmt, Exp
*** 219,225 ****
*/
void
ExplainOnePlan(PlannedStmt *plannedstmt, ParamListInfo params,
! ExplainStmt *stmt, TupOutputState *tstate)
{
QueryDesc *queryDesc;
instr_time starttime;
--- 223,230 ----
*/
void
ExplainOnePlan(PlannedStmt *plannedstmt, ParamListInfo params,
! ExplainStmt *stmt, TupOutputState *tstate,
! const char *queryString)
{
QueryDesc *queryDesc;
instr_time starttime;
*************** ExplainOnePlan(PlannedStmt *plannedstmt,
*** 237,243 ****
queryDesc = CreateQueryDesc(plannedstmt,
GetActiveSnapshot(), InvalidSnapshot,
None_Receiver, params,
! stmt->analyze);
INSTR_TIME_SET_CURRENT(starttime);
--- 242,248 ----
queryDesc = CreateQueryDesc(plannedstmt,
GetActiveSnapshot(), InvalidSnapshot,
None_Receiver, params,
! stmt->analyze, queryString);
INSTR_TIME_SET_CURRENT(starttime);
diff -cprN HEAD/src/backend/commands/prepare.c pg_stat_statements/src/backend/commands/prepare.c
*** HEAD/src/backend/commands/prepare.c Sat Nov 1 15:22:28 2008
--- pg_stat_statements/src/backend/commands/prepare.c Fri Dec 12 18:26:03 2008
*************** ExplainExecuteQuery(ExecuteStmt *execstm
*** 697,703 ****
pstmt->intoClause = execstmt->into;
}
! ExplainOnePlan(pstmt, paramLI, stmt, tstate);
}
else
{
--- 697,703 ----
pstmt->intoClause = execstmt->into;
}
! ExplainOnePlan(pstmt, paramLI, stmt, tstate, queryString);
}
else
{
diff -cprN HEAD/src/include/commands/explain.h pg_stat_statements/src/include/commands/explain.h
*** HEAD/src/include/commands/explain.h Sat Nov 22 11:15:56 2008
--- pg_stat_statements/src/include/commands/explain.h Fri Dec 12 18:26:03 2008
*************** extern void ExplainOneUtility(Node *util
*** 39,45 ****
TupOutputState *tstate);
extern void ExplainOnePlan(PlannedStmt *plannedstmt, ParamListInfo params,
! ExplainStmt *stmt, TupOutputState *tstate);
extern void ExplainPrintPlan(StringInfo str, QueryDesc *queryDesc,
bool analyze, bool verbose);
--- 39,46 ----
TupOutputState *tstate);
extern void ExplainOnePlan(PlannedStmt *plannedstmt, ParamListInfo params,
! ExplainStmt *stmt, TupOutputState *tstate,
! const char *queryString);
extern void ExplainPrintPlan(StringInfo str, QueryDesc *queryDesc,
bool analyze, bool verbose);
diff -cprN HEAD/src/include/executor/execdesc.h pg_stat_statements/src/include/executor/execdesc.h
*** HEAD/src/include/executor/execdesc.h Sat Nov 22 11:15:56 2008
--- pg_stat_statements/src/include/executor/execdesc.h Fri Dec 12 18:26:03 2008
*************** typedef struct QueryDesc
*** 42,47 ****
--- 42,48 ----
DestReceiver *dest; /* the destination for tuple output */
ParamListInfo params; /* param values being passed in */
bool doInstrument; /* TRUE requests runtime instrumentation */
+ const char *sourceText; /* Source text for the query */
/* These fields are set by ExecutorStart */
TupleDesc tupDesc; /* descriptor for result tuples */
*************** extern QueryDesc *CreateQueryDesc(Planne
*** 58,69 ****
Snapshot crosscheck_snapshot,
DestReceiver *dest,
ParamListInfo params,
! bool doInstrument);
extern QueryDesc *CreateUtilityQueryDesc(Node *utilitystmt,
Snapshot snapshot,
DestReceiver *dest,
! ParamListInfo params);
extern void FreeQueryDesc(QueryDesc *qdesc);
--- 59,72 ----
Snapshot crosscheck_snapshot,
DestReceiver *dest,
ParamListInfo params,
! bool doInstrument,
! const char *sourceText);
extern QueryDesc *CreateUtilityQueryDesc(Node *utilitystmt,
Snapshot snapshot,
DestReceiver *dest,
! ParamListInfo params,
! const char *sourceText);
extern void FreeQueryDesc(QueryDesc *qdesc);
diff -cprN HEAD/src/include/pgstat.h pg_stat_statements/src/include/pgstat.h
*** HEAD/src/include/pgstat.h Fri Nov 14 21:39:34 2008
--- pg_stat_statements/src/include/pgstat.h Fri Dec 12 18:26:03 2008
*************** typedef struct PgStat_FunctionCallUsage
*** 591,597 ****
extern bool pgstat_track_activities;
extern bool pgstat_track_counts;
extern int pgstat_track_functions;
! extern int pgstat_track_activity_query_size;
extern char *pgstat_stat_tmpname;
extern char *pgstat_stat_filename;
--- 591,597 ----
extern bool pgstat_track_activities;
extern bool pgstat_track_counts;
extern int pgstat_track_functions;
! extern PGDLLIMPORT int pgstat_track_activity_query_size;
extern char *pgstat_stat_tmpname;
extern char *pgstat_stat_filename;
002-explain-fmt.patchtext/x-patch; name=002-explain-fmt.patchDownload
diff -cprN HEAD/src/backend/commands/explain.c pg_stat_statements/src/backend/commands/explain.c
*** HEAD/src/backend/commands/explain.c Sat Nov 22 11:15:56 2008
--- pg_stat_statements/src/backend/commands/explain.c Fri Dec 12 18:26:03 2008
***************
*** 33,38 ****
--- 33,41 ----
#include "utils/snapmgr.h"
+ /* GUC parameter */
+ char *explain_analyze_format;
+
/* Hook for plugins to get control in ExplainOneQuery() */
ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
*************** static void show_sort_keys(Plan *sortpla
*** 73,78 ****
--- 76,82 ----
static void show_sort_info(SortState *sortstate,
StringInfo str, int indent, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
+ static void appendInstrument(StringInfo str, const Instrumentation *instr);
/*
*************** report_triggers(ResultRelInfo *rInfo, bo
*** 396,403 ****
appendStringInfo(buf, " on %s",
RelationGetRelationName(rInfo->ri_RelationDesc));
! appendStringInfo(buf, ": time=%.3f calls=%.0f\n",
1000.0 * instr->total, instr->ntuples);
}
}
--- 401,410 ----
appendStringInfo(buf, " on %s",
RelationGetRelationName(rInfo->ri_RelationDesc));
! appendStringInfo(buf, ": time=%.3f calls=%.0f",
1000.0 * instr->total, instr->ntuples);
+ appendInstrument(buf, instr);
+ appendStringInfoChar(buf, '\n');
}
}
*************** explain_outNode(StringInfo str,
*** 801,811 ****
{
double nloops = planstate->instrument->nloops;
! appendStringInfo(str, " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
1000.0 * planstate->instrument->startup / nloops,
1000.0 * planstate->instrument->total / nloops,
planstate->instrument->ntuples / nloops,
planstate->instrument->nloops);
}
else if (es->printAnalyze)
appendStringInfo(str, " (never executed)");
--- 808,820 ----
{
double nloops = planstate->instrument->nloops;
! appendStringInfo(str, " (actual time=%.3f..%.3f rows=%.0f loops=%.0f",
1000.0 * planstate->instrument->startup / nloops,
1000.0 * planstate->instrument->total / nloops,
planstate->instrument->ntuples / nloops,
planstate->instrument->nloops);
+ appendInstrument(str, planstate->instrument);
+ appendStringInfoChar(str, ')');
}
else if (es->printAnalyze)
appendStringInfo(str, " (never executed)");
*************** explain_get_index_name(Oid indexId)
*** 1335,1338 ****
--- 1344,1423 ----
result = quote_identifier(result);
}
return result;
+ }
+
+ /*
+ * append formatted instrumentation to the provided buffer.
+ */
+ static void
+ appendInstrument(StringInfo buf, const Instrumentation *instr)
+ {
+ int format_len;
+ int i;
+
+ format_len = strlen(explain_analyze_format);
+
+ for (i = 0; i < format_len; i++)
+ {
+ if (explain_analyze_format[i] != '%')
+ {
+ /* literal char, just copy */
+ appendStringInfoChar(buf, explain_analyze_format[i]);
+ continue;
+ }
+ /* go to char after '%' */
+ i++;
+ if (i >= format_len)
+ break; /* format error - ignore it */
+
+ /* process the option */
+ switch (explain_analyze_format[i])
+ {
+ case 'g':
+ appendStringInfo(buf, "%ld", instr->buffer.shared_gets);
+ break;
+ case 'h':
+ appendStringInfo(buf, "%ld", instr->buffer.shared_hits);
+ break;
+ case 'r':
+ appendStringInfo(buf, "%ld", instr->buffer.shared_gets - instr->buffer.shared_hits);
+ break;
+ case 'w':
+ appendStringInfo(buf, "%ld", instr->buffer.shared_writes);
+ break;
+ case 'G':
+ appendStringInfo(buf, "%ld", instr->buffer.local_gets);
+ break;
+ case 'H':
+ appendStringInfo(buf, "%ld", instr->buffer.local_hits);
+ break;
+ case 'R':
+ appendStringInfo(buf, "%ld", instr->buffer.local_gets - instr->buffer.local_hits);
+ break;
+ case 'W':
+ appendStringInfo(buf, "%ld", instr->buffer.local_writes);
+ break;
+ case 't':
+ appendStringInfo(buf, "%ld", instr->buffer.buffile_reads);
+ break;
+ case 'T':
+ appendStringInfo(buf, "%ld", instr->buffer.buffile_writes);
+ break;
+ case 'p':
+ appendStringInfo(buf, "%.2f", 1000.0 * (instr->utime + instr->stime));
+ break;
+ case 'u':
+ appendStringInfo(buf, "%.2f", 1000.0 * instr->utime);
+ break;
+ case 's':
+ appendStringInfo(buf, "%.2f", 1000.0 * instr->stime);
+ break;
+ case '%':
+ appendStringInfoChar(buf, '%');
+ break;
+ default:
+ /* format error - ignore it */
+ break;
+ }
+ }
}
diff -cprN HEAD/src/backend/executor/instrument.c pg_stat_statements/src/backend/executor/instrument.c
*** HEAD/src/backend/executor/instrument.c Sat Nov 1 15:22:28 2008
--- pg_stat_statements/src/backend/executor/instrument.c Fri Dec 12 18:26:03 2008
***************
*** 14,22 ****
--- 14,31 ----
#include "postgres.h"
#include <unistd.h>
+ #ifdef HAVE_SYS_RESOURCE_H
+ #include <sys/time.h>
+ #include <sys/resource.h>
+ #endif
+
+ #ifndef HAVE_GETRUSAGE
+ #include "rusagestub.h"
+ #endif
#include "executor/instrument.h"
+ #define timeval_to_sec(t) ((double) (t).tv_sec + (t).tv_usec / 1000000.0)
/* Allocate new instrumentation structure(s) */
Instrumentation *
*************** void
*** 34,40 ****
--- 43,59 ----
InstrStartNode(Instrumentation *instr)
{
if (INSTR_TIME_IS_ZERO(instr->starttime))
+ {
+ struct rusage ru;
+
INSTR_TIME_SET_CURRENT(instr->starttime);
+
+ BufferCounterSub(&instr->buffer, &BufferStats);
+
+ getrusage(RUSAGE_SELF, &ru);
+ instr->utime -= timeval_to_sec(ru.ru_utime);
+ instr->stime -= timeval_to_sec(ru.ru_stime);
+ }
else
elog(DEBUG2, "InstrStartNode called twice in a row");
}
*************** void
*** 44,49 ****
--- 63,69 ----
InstrStopNode(Instrumentation *instr, double nTuples)
{
instr_time endtime;
+ struct rusage ru;
/* count the returned tuples */
instr->tuplecount += nTuples;
*************** InstrStopNode(Instrumentation *instr, do
*** 56,61 ****
--- 76,87 ----
INSTR_TIME_SET_CURRENT(endtime);
INSTR_TIME_ACCUM_DIFF(instr->counter, endtime, instr->starttime);
+
+ BufferCounterAdd(&instr->buffer, &BufferStats);
+
+ getrusage(RUSAGE_SELF, &ru);
+ instr->utime += timeval_to_sec(ru.ru_utime);
+ instr->stime += timeval_to_sec(ru.ru_stime);
INSTR_TIME_SET_ZERO(instr->starttime);
diff -cprN HEAD/src/backend/storage/buffer/buf_init.c pg_stat_statements/src/backend/storage/buffer/buf_init.c
*** HEAD/src/backend/storage/buffer/buf_init.c Sat Nov 1 15:22:28 2008
--- pg_stat_statements/src/backend/storage/buffer/buf_init.c Fri Dec 12 18:26:03 2008
*************** char *BufferBlocks;
*** 23,37 ****
int32 *PrivateRefCount;
/* statistics counters */
! long int ReadBufferCount;
! long int ReadLocalBufferCount;
! long int BufferHitCount;
! long int LocalBufferHitCount;
! long int BufferFlushCount;
! long int LocalBufferFlushCount;
! long int BufFileReadCount;
! long int BufFileWriteCount;
/*
* Data Structures:
--- 23,55 ----
int32 *PrivateRefCount;
/* statistics counters */
! BufferCounter BufferStats;
+ void
+ BufferCounterAdd(BufferCounter *into, const BufferCounter *arg)
+ {
+ into->shared_gets += arg->shared_gets;
+ into->shared_hits += arg->shared_hits;
+ into->shared_writes += arg->shared_writes;
+ into->local_gets += arg->local_gets;
+ into->local_hits += arg->local_hits;
+ into->local_writes += arg->local_writes;
+ into->buffile_reads += arg->buffile_reads;
+ into->buffile_writes += arg->buffile_writes;
+ }
+
+ void
+ BufferCounterSub(BufferCounter *into, const BufferCounter *arg)
+ {
+ into->shared_gets -= arg->shared_gets;
+ into->shared_hits -= arg->shared_hits;
+ into->shared_writes -= arg->shared_writes;
+ into->local_gets -= arg->local_gets;
+ into->local_hits -= arg->local_hits;
+ into->local_writes -= arg->local_writes;
+ into->buffile_reads -= arg->buffile_reads;
+ into->buffile_writes -= arg->buffile_writes;
+ }
/*
* Data Structures:
diff -cprN HEAD/src/backend/storage/buffer/bufmgr.c pg_stat_statements/src/backend/storage/buffer/bufmgr.c
*** HEAD/src/backend/storage/buffer/bufmgr.c Sat Nov 22 11:15:56 2008
--- pg_stat_statements/src/backend/storage/buffer/bufmgr.c Fri Dec 12 18:26:03 2008
*************** double bgwriter_lru_multiplier = 2.0;
*** 69,74 ****
--- 69,77 ----
static volatile BufferDesc *InProgressBuf = NULL;
static bool IsForInput;
+ /* baseline of buffer counters */
+ static BufferCounter BufferStats_base;
+
/* local state for LockBufferForCleanup */
static volatile BufferDesc *PinCountWaitBuf = NULL;
*************** ReadBuffer_common(SMgrRelation smgr, boo
*** 208,218 ****
if (isLocalBuf)
{
! ReadLocalBufferCount++;
bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
if (found)
{
! LocalBufferHitCount++;
TRACE_POSTGRESQL_BUFFER_HIT(true); /* true == local buffer */
}
else
--- 211,221 ----
if (isLocalBuf)
{
! BufferStats.local_gets++;
bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
if (found)
{
! BufferStats.local_hits++;
TRACE_POSTGRESQL_BUFFER_HIT(true); /* true == local buffer */
}
else
*************** ReadBuffer_common(SMgrRelation smgr, boo
*** 222,228 ****
}
else
{
! ReadBufferCount++;
/*
* lookup the buffer. IO_IN_PROGRESS is set if the requested block is
--- 225,231 ----
}
else
{
! BufferStats.shared_gets++;
/*
* lookup the buffer. IO_IN_PROGRESS is set if the requested block is
*************** ReadBuffer_common(SMgrRelation smgr, boo
*** 231,237 ****
bufHdr = BufferAlloc(smgr, forkNum, blockNum, strategy, &found);
if (found)
{
! BufferHitCount++;
TRACE_POSTGRESQL_BUFFER_HIT(false); /* false != local buffer */
}
else
--- 234,240 ----
bufHdr = BufferAlloc(smgr, forkNum, blockNum, strategy, &found);
if (found)
{
! BufferStats.shared_hits++;
TRACE_POSTGRESQL_BUFFER_HIT(false); /* false != local buffer */
}
else
*************** ShowBufferUsage(void)
*** 1524,1551 ****
StringInfoData str;
float hitrate;
float localhitrate;
initStringInfo(&str);
! if (ReadBufferCount == 0)
hitrate = 0.0;
else
! hitrate = (float) BufferHitCount *100.0 / ReadBufferCount;
! if (ReadLocalBufferCount == 0)
localhitrate = 0.0;
else
! localhitrate = (float) LocalBufferHitCount *100.0 / ReadLocalBufferCount;
appendStringInfo(&str,
"!\tShared blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
! ReadBufferCount - BufferHitCount, BufferFlushCount, hitrate);
appendStringInfo(&str,
"!\tLocal blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
! ReadLocalBufferCount - LocalBufferHitCount, LocalBufferFlushCount, localhitrate);
appendStringInfo(&str,
"!\tDirect blocks: %10ld read, %10ld written\n",
! BufFileReadCount, BufFileWriteCount);
return str.data;
}
--- 1527,1557 ----
StringInfoData str;
float hitrate;
float localhitrate;
+ BufferCounter c = BufferStats;
+
+ BufferCounterSub(&c, &BufferStats_base);
initStringInfo(&str);
! if (c.shared_gets == 0)
hitrate = 0.0;
else
! hitrate = c.shared_hits * 100.0f / c.shared_gets;
! if (c.local_gets == 0)
localhitrate = 0.0;
else
! localhitrate = c.local_hits * 100.0f / c.local_gets;
appendStringInfo(&str,
"!\tShared blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
! c.shared_gets - c.shared_hits, c.shared_writes, hitrate);
appendStringInfo(&str,
"!\tLocal blocks: %10ld read, %10ld written, buffer hit rate = %.2f%%\n",
! c.local_gets - c.local_hits, c.local_writes, localhitrate);
appendStringInfo(&str,
"!\tDirect blocks: %10ld read, %10ld written\n",
! c.buffile_reads, c.buffile_writes);
return str.data;
}
*************** ShowBufferUsage(void)
*** 1553,1566 ****
void
ResetBufferUsage(void)
{
! BufferHitCount = 0;
! ReadBufferCount = 0;
! BufferFlushCount = 0;
! LocalBufferHitCount = 0;
! ReadLocalBufferCount = 0;
! LocalBufferFlushCount = 0;
! BufFileReadCount = 0;
! BufFileWriteCount = 0;
}
/*
--- 1559,1565 ----
void
ResetBufferUsage(void)
{
! BufferStats_base = BufferStats;
}
/*
*************** FlushBuffer(volatile BufferDesc *buf, SM
*** 1818,1824 ****
(char *) BufHdrGetBlock(buf),
false);
! BufferFlushCount++;
TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(reln->smgr_rnode.spcNode,
reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode);
--- 1817,1823 ----
(char *) BufHdrGetBlock(buf),
false);
! BufferStats.shared_writes++;
TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(reln->smgr_rnode.spcNode,
reln->smgr_rnode.dbNode, reln->smgr_rnode.relNode);
diff -cprN HEAD/src/backend/storage/buffer/localbuf.c pg_stat_statements/src/backend/storage/buffer/localbuf.c
*** HEAD/src/backend/storage/buffer/localbuf.c Fri Dec 5 00:11:18 2008
--- pg_stat_statements/src/backend/storage/buffer/localbuf.c Fri Dec 12 18:26:03 2008
*************** LocalBufferAlloc(SMgrRelation smgr, Fork
*** 172,178 ****
/* Mark not-dirty now in case we error out below */
bufHdr->flags &= ~BM_DIRTY;
! LocalBufferFlushCount++;
}
/*
--- 172,178 ----
/* Mark not-dirty now in case we error out below */
bufHdr->flags &= ~BM_DIRTY;
! BufferStats.local_writes++;
}
/*
diff -cprN HEAD/src/backend/storage/file/buffile.c pg_stat_statements/src/backend/storage/file/buffile.c
*** HEAD/src/backend/storage/file/buffile.c Sat Nov 1 15:22:28 2008
--- pg_stat_statements/src/backend/storage/file/buffile.c Fri Dec 12 18:26:03 2008
*************** BufFileLoadBuffer(BufFile *file)
*** 240,246 ****
file->offsets[file->curFile] += file->nbytes;
/* we choose not to advance curOffset here */
! BufFileReadCount++;
}
/*
--- 240,246 ----
file->offsets[file->curFile] += file->nbytes;
/* we choose not to advance curOffset here */
! BufferStats.buffile_reads++;
}
/*
*************** BufFileDumpBuffer(BufFile *file)
*** 304,310 ****
file->curOffset += bytestowrite;
wpos += bytestowrite;
! BufFileWriteCount++;
}
file->dirty = false;
--- 304,310 ----
file->curOffset += bytestowrite;
wpos += bytestowrite;
! BufferStats.buffile_writes++;
}
file->dirty = false;
diff -cprN HEAD/src/backend/utils/misc/guc.c pg_stat_statements/src/backend/utils/misc/guc.c
*** HEAD/src/backend/utils/misc/guc.c Fri Dec 5 00:11:18 2008
--- pg_stat_statements/src/backend/utils/misc/guc.c Fri Dec 12 18:26:03 2008
*************** extern char *default_tablespace;
*** 113,118 ****
--- 113,119 ----
extern char *temp_tablespaces;
extern bool synchronize_seqscans;
extern bool fullPageWrites;
+ extern char *explain_analyze_format;
#ifdef TRACE_SORT
extern bool trace_sort;
*************** static struct config_string ConfigureNam
*** 2493,2498 ****
--- 2494,2508 ----
},
#endif /* USE_SSL */
+ {
+ {"explain_analyze_format", PGC_USERSET, CLIENT_CONN_STATEMENT,
+ gettext_noop("Controls information returned in explain analyze."),
+ gettext_noop("If blank, basic statistics are returned.")
+ },
+ &explain_analyze_format,
+ "", NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, NULL, NULL, NULL
*************** define_custom_variable(struct config_gen
*** 5707,5713 ****
case PGC_S_ENV_VAR:
case PGC_S_FILE:
case PGC_S_ARGV:
! phcontext = PGC_SIGHUP;
break;
case PGC_S_DATABASE:
case PGC_S_USER:
--- 5717,5726 ----
case PGC_S_ENV_VAR:
case PGC_S_FILE:
case PGC_S_ARGV:
! if (variable->context == PGC_POSTMASTER)
! phcontext = PGC_POSTMASTER;
! else
! phcontext = PGC_SIGHUP;
break;
case PGC_S_DATABASE:
case PGC_S_USER:
diff -cprN HEAD/src/backend/utils/misc/postgresql.conf.sample pg_stat_statements/src/backend/utils/misc/postgresql.conf.sample
*** HEAD/src/backend/utils/misc/postgresql.conf.sample Fri Nov 14 21:39:34 2008
--- pg_stat_statements/src/backend/utils/misc/postgresql.conf.sample Fri Dec 12 18:26:03 2008
***************
*** 443,448 ****
--- 443,450 ----
# default configuration for text search
#default_text_search_config = 'pg_catalog.simple'
+ #explain_analyze_format = '' # special values %[ghrwGHRWtTpus%]
+
# - Other Defaults -
#dynamic_library_path = '$libdir'
diff -cprN HEAD/src/include/executor/instrument.h pg_stat_statements/src/include/executor/instrument.h
*** HEAD/src/include/executor/instrument.h Sat Nov 1 15:22:28 2008
--- pg_stat_statements/src/include/executor/instrument.h Fri Dec 12 18:26:03 2008
***************
*** 14,19 ****
--- 14,20 ----
#define INSTRUMENT_H
#include "portability/instr_time.h"
+ #include "storage/buf_internals.h"
typedef struct Instrumentation
*************** typedef struct Instrumentation
*** 29,34 ****
--- 30,39 ----
double total; /* Total total time (in seconds) */
double ntuples; /* Total tuples produced */
double nloops; /* # of run cycles for this node */
+
+ BufferCounter buffer; /* buffer stats */
+ double utime; /* user time in sec */
+ double stime; /* sys time in sec */
} Instrumentation;
extern Instrumentation *InstrAlloc(int n);
diff -cprN HEAD/src/include/pgstat.h pg_stat_statements/src/include/pgstat.h
*** HEAD/src/include/pgstat.h Fri Nov 14 21:39:34 2008
--- pg_stat_statements/src/include/pgstat.h Fri Dec 12 18:26:03 2008
*************** typedef struct PgStat_FunctionCallUsage
*** 591,597 ****
extern bool pgstat_track_activities;
extern bool pgstat_track_counts;
extern int pgstat_track_functions;
! extern int pgstat_track_activity_query_size;
extern char *pgstat_stat_tmpname;
extern char *pgstat_stat_filename;
--- 591,597 ----
extern bool pgstat_track_activities;
extern bool pgstat_track_counts;
extern int pgstat_track_functions;
! extern PGDLLIMPORT int pgstat_track_activity_query_size;
extern char *pgstat_stat_tmpname;
extern char *pgstat_stat_filename;
diff -cprN HEAD/src/include/storage/buf_internals.h pg_stat_statements/src/include/storage/buf_internals.h
*** HEAD/src/include/storage/buf_internals.h Sat Nov 1 15:22:28 2008
--- pg_stat_statements/src/include/storage/buf_internals.h Fri Dec 12 18:26:03 2008
*************** extern PGDLLIMPORT BufferDesc *BufferDes
*** 174,188 ****
extern BufferDesc *LocalBufferDescriptors;
/* event counters in buf_init.c */
! extern long int ReadBufferCount;
! extern long int ReadLocalBufferCount;
! extern long int BufferHitCount;
! extern long int LocalBufferHitCount;
! extern long int BufferFlushCount;
! extern long int LocalBufferFlushCount;
! extern long int BufFileReadCount;
! extern long int BufFileWriteCount;
/*
* Internal routines: only called by bufmgr
--- 174,195 ----
extern BufferDesc *LocalBufferDescriptors;
/* event counters in buf_init.c */
! typedef struct BufferCounter
! {
! long shared_gets;
! long shared_hits;
! long shared_writes;
! long local_gets;
! long local_hits;
! long local_writes;
! long buffile_reads;
! long buffile_writes;
! } BufferCounter;
+ extern PGDLLIMPORT BufferCounter BufferStats;
+
+ extern void BufferCounterAdd(BufferCounter *into, const BufferCounter *arg);
+ extern void BufferCounterSub(BufferCounter *into, const BufferCounter *arg);
/*
* Internal routines: only called by bufmgr
Attachment: pg_stat_statements-1212.patch (text/x-patch)
diff -cprN HEAD/contrib/Makefile pg_stat_statements/contrib/Makefile
*** HEAD/contrib/Makefile Sat Nov 22 11:15:56 2008
--- pg_stat_statements/contrib/Makefile Fri Dec 12 18:26:03 2008
*************** WANTED_DIRS = \
*** 31,36 ****
--- 31,37 ----
pgbench \
pgcrypto \
pgrowlocks \
+ pg_stat_statements \
pgstattuple \
seg \
spi \
diff -cprN HEAD/contrib/pg_stat_statements/Makefile pg_stat_statements/contrib/pg_stat_statements/Makefile
*** HEAD/contrib/pg_stat_statements/Makefile Thu Jan 1 09:00:00 1970
--- pg_stat_statements/contrib/pg_stat_statements/Makefile Mon Jun 23 11:05:19 2008
***************
*** 0 ****
--- 1,15 ----
+ MODULE_big = pg_stat_statements
+ DATA_built = pg_stat_statements.sql
+ DATA = uninstall_pg_stat_statements.sql
+ OBJS = pg_stat_statements.o
+
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/pg_stat_statements
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
diff -cprN HEAD/contrib/pg_stat_statements/pg_stat_statements.c pg_stat_statements/contrib/pg_stat_statements/pg_stat_statements.c
*** HEAD/contrib/pg_stat_statements/pg_stat_statements.c Thu Jan 1 09:00:00 1970
--- pg_stat_statements/contrib/pg_stat_statements/pg_stat_statements.c Fri Dec 12 10:52:02 2008
***************
*** 0 ****
--- 1,810 ----
+ /*-------------------------------------------------------------------------
+ *
+ * pg_stat_statements.c
+ *
+ *-------------------------------------------------------------------------
+ */
+ #include "postgres.h"
+
+ #include <unistd.h>
+
+ #ifdef HAVE_SYS_RESOURCE_H
+ #include <sys/time.h>
+ #include <sys/resource.h>
+ #endif
+ #ifndef HAVE_GETRUSAGE
+ #include "rusagestub.h"
+ #endif
+
+ #include "access/hash.h"
+ #include "executor/instrument.h"
+ #include "funcapi.h"
+ #include "mb/pg_wchar.h"
+ #include "miscadmin.h"
+ #include "pgstat.h"
+ #include "storage/ipc.h"
+ #include "storage/spin.h"
+ #include "utils/builtins.h"
+ #include "utils/guc.h"
+
+ PG_MODULE_MAGIC;
+
+ #define GUCNAME(name) ("statistics." name)
+
+ #define MAX_STATEMENTS_DEFAULT 1000
+ #define CHUNK_SIZE MAXALIGN(pgstat_track_activity_query_size)
+
+ const uint32 PGSS_FILE_HEADER = 0x20081202;
+
+ /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
+ #define USAGE_EXEC(duration) (1.0)
+ #define USAGE_INIT (1.0) /* including initial planning */
+ #define USAGE_DECREASE_FACTOR (0.99) /* decreased every entry_dealloc */
+ #define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */
+
+ #define timeval_diff_in_usec(end, beg) \
+ ( (int64) ((end).tv_sec - (beg).tv_sec) * 1000000 \
+ + ((end).tv_usec - (beg).tv_usec))
+
+ #define track_statements_enabled() \
+ (track_statements == PGSS_TRACK_ALL || \
+ (track_statements == PGSS_TRACK_TOP && nested_level == 0))
+
+ /*
+ * Chunk buffer for query text.
+ */
+ typedef union Chunk Chunk;
+ union Chunk
+ {
+ char query[1]; /* query text. true size is [CHUNK_SIZE] */
+ Chunk *next; /* next free chunk */
+ };
+
+ typedef struct Counters
+ {
+ int64 calls; /* # of executed */
+ double total_time; /* total execution time in sec */
+ double cpu_user; /* user cpu time in sec */
+ double cpu_sys; /* sys cpu time in sec */
+ int64 gets; /* # of buffer gets during execution */
+ int64 reads; /* # of buffer reads during execution */
+ int64 lreads; /* # of buffile reads during execution */
+ int64 rows; /* # of retrieved or affected rows */
+ } Counters;
+
+ /*
+ * Statistics per statement
+ */
+ typedef struct Entry
+ {
+ uint32 tag; /* hash value of query */
+ Oid userid; /* user oid */
+ Oid dbid; /* database oid */
+ Counters counters; /* counters */
+ double usage; /* usage factor */
+ Chunk *chunk; /* query text */
+ slock_t mutex; /* protect above fields */
+ } Entry;
+
+ /*
+ * Global shared state
+ */
+ typedef struct pgStatStmt
+ {
+ LWLockId lock; /* protect fields and hash */
+ int num_statements; /* # of entries in hash */
+ int chunk_size; /* max query length in bytes */
+ Chunk *free_chunks; /* single linked list of free chunks */
+ Chunk chunks[1]; /* chunk buffers */
+ } pgStatStmt;
+
+ /*---- Local variables ----*/
+
+ static int nested_level = 0;
+ static startup_hook_type prev_startup_hook = NULL;
+ static shutdown_hook_type prev_shutdown_hook = NULL;
+ static ExecutorStart_hook_type prev_ExecutorStart = NULL;
+ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
+ static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
+ static pgStatStmt *pgss;
+ static HTAB *pgss_hash;
+ static bool reset_requested = false;
+
+ /*---- Function declarations ----*/
+
+ void _PG_init(void);
+ void _PG_fini(void);
+ Datum pg_stat_statements_reset(PG_FUNCTION_ARGS);
+ Datum pg_stat_statements(PG_FUNCTION_ARGS);
+
+ PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
+ PG_FUNCTION_INFO_V1(pg_stat_statements);
+
+ static uint32 pgss_tag(Oid userid, Oid dbid, const char *query);
+ static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+ static void pgss_ExecutorRun(QueryDesc *queryDesc,
+ ScanDirection direction,
+ long count);
+ static void pgss_ExecutorEnd(QueryDesc *queryDesc);
+ static void pgss_executed(const char *query,
+ const Instrumentation *instr, uint32 rows);
+
+ static Size pgss_memsize(void);
+ static void pgss_init(void);
+ static volatile Entry *entry_alloc(uint32 tag, Oid userid, Oid dbid,
+ const char *query);
+ static void entry_dealloc(void);
+ static void entry_reset(void);
+ static Chunk *chunk_alloc(void);
+ static void chunk_free(Chunk *chunk);
+ static void chunk_reset(void);
+
+ static const char *assign_saved_file(const char *newval, bool doit, GucSource source);
+
+ /*---- GUC variables ----*/
+
+ typedef enum
+ {
+ PGSS_TRACK_NONE, /* log no statements */
+ PGSS_TRACK_TOP, /* only top level statements */
+ PGSS_TRACK_ALL, /* all statements, including nested ones */
+ } PGSSTrackLevel;
+
+ static const struct config_enum_entry track_options[] = {
+ {"none", PGSS_TRACK_NONE, false},
+ {"top", PGSS_TRACK_TOP, false},
+ {"all", PGSS_TRACK_ALL, false},
+ {NULL, 0, false}
+ };
+
+ static int max_statements;
+ static int track_statements;
+ static char *saved_file;
+
+ /*
+ * pgss_shutdown - Dump statistics into file.
+ */
+ static void
+ pgss_shutdown(void)
+ {
+ FILE *file;
+ HASH_SEQ_STATUS hash_seq;
+ Entry *entry;
+ int save_errno;
+
+ if (prev_shutdown_hook)
+ prev_shutdown_hook();
+
+ if (saved_file == NULL || saved_file[0] == '\0')
+ return;
+
+ if (!pgss)
+ pgss_init();
+
+ file = fopen(saved_file, PG_BINARY_W);
+ if (file == NULL ||
+ fwrite(&PGSS_FILE_HEADER, sizeof(uint32), 1, file) != 1 ||
+ fwrite(&pgss->num_statements, sizeof(int32), 1, file) != 1)
+ goto error;
+
+ hash_seq_init(&hash_seq, pgss_hash);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ int32 len = (int32) strlen(entry->chunk->query);
+
+ if (fwrite(entry, offsetof(Entry, chunk), 1, file) != 1 ||
+ fwrite(&len, sizeof(int32), 1, file) != 1 ||
+ fwrite(entry->chunk->query, 1, len, file) != len)
+ goto error;
+ }
+
+ fclose(file);
+ return;
+
+ error:
+ save_errno = errno;
+
+ if (file)
+ fclose(file);
+ unlink(saved_file);
+
+ errno = save_errno;
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not write pg_stat_statement file \"%s\": %m",
+ saved_file)));
+ }
+
+ /*
+ * pgss_startup - Load statistics from file.
+ */
+ static void
+ pgss_startup(void)
+ {
+ FILE *file;
+ uint32 header;
+ int32 num;
+ int32 i;
+ char *buffer = NULL;
+ int save_errno;
+
+ if (prev_startup_hook)
+ prev_startup_hook();
+
+ if (saved_file == NULL || saved_file[0] == '\0')
+ return;
+
+ file = fopen(saved_file, PG_BINARY_R);
+ if (file == NULL)
+ {
+ if (errno == ENOENT)
+ return; /* ignore not found error */
+ goto error;
+ }
+
+ if (!pgss)
+ pgss_init();
+
+ buffer = palloc(CHUNK_SIZE);
+
+ if (fread(&header, sizeof(uint32), 1, file) != 1 ||
+ header != PGSS_FILE_HEADER ||
+ fread(&num, sizeof(int32), 1, file) != 1)
+ goto error;
+
+ for (i = 0; i < num; i++)
+ {
+ Entry temp;
+ Entry *entry;
+ int32 templen;
+ int32 len;
+
+ if (fread(&temp, offsetof(Entry, chunk), 1, file) != 1 ||
+ fread(&templen, sizeof(int32), 1, file) != 1)
+ goto error;
+
+ /*
+ * We might not have enough buffer to read query text
+ * when pgstat_track_activity_query_size is set smaller.
+ */
+ if (templen < CHUNK_SIZE)
+ len = templen;
+ else
+ len = CHUNK_SIZE - 1;
+
+ if (fread(buffer, 1, len, file) != len)
+ goto error;
+ buffer[len] = '\0';
+ entry = (Entry *) entry_alloc(
+ temp.tag, temp.userid, temp.dbid, buffer);
+ memcpy(entry, &temp, offsetof(Entry, chunk));
+
+ /* Discard the tail of query text. */
+ if (templen > len)
+ fseek(file, templen - len, SEEK_CUR);
+ }
+
+ pfree(buffer);
+ fclose(file);
+ return;
+
+ error:
+ save_errno = errno;
+
+ if (file)
+ fclose(file);
+ unlink(saved_file);
+ if (buffer)
+ pfree(buffer);
+
+ errno = save_errno;
+ ereport(LOG,
+ (errcode_for_file_access(),
+ errmsg("could not read pg_stat_statement file \"%s\": %m",
+ saved_file)));
+ }
+
+ void
+ _PG_init(void)
+ {
+ /* Define custom guc variables. */
+ DefineCustomIntVariable(
+ GUCNAME("max_statements"),
+ "Sets the maximum number of statements tracked by pg_stat_statements.",
+ NULL,
+ &max_statements,
+ MAX_STATEMENTS_DEFAULT,
+ 100,
+ INT_MAX,
+ PGC_POSTMASTER,
+ 0,
+ NULL,
+ NULL);
+
+ DefineCustomEnumVariable(
+ GUCNAME("track_statements"),
+ "Collects information about executed statements.",
+ NULL,
+ &track_statements,
+ PGSS_TRACK_NONE,
+ track_options,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL);
+
+ DefineCustomStringVariable(
+ GUCNAME("saved_file"),
+ "Writes the statistics file to the specified path on shutdown.",
+ NULL,
+ &saved_file,
+ "global/pg_stat_statements.dump",
+ PGC_USERSET,
+ GUC_SUPERUSER_ONLY,
+ assign_saved_file,
+ NULL);
+
+ /* Request additional shared resources. */
+ RequestAddinShmemSpace(pgss_memsize());
+ RequestAddinLWLocks(1);
+
+ /* Install hooks. */
+ prev_startup_hook = startup_hook;
+ startup_hook = pgss_startup;
+ prev_shutdown_hook = shutdown_hook;
+ shutdown_hook = pgss_shutdown;
+ prev_ExecutorStart = ExecutorStart_hook;
+ ExecutorStart_hook = pgss_ExecutorStart;
+ prev_ExecutorRun = ExecutorRun_hook;
+ ExecutorRun_hook = pgss_ExecutorRun;
+ prev_ExecutorEnd = ExecutorEnd_hook;
+ ExecutorEnd_hook = pgss_ExecutorEnd;
+ }
+
+ void
+ _PG_fini(void)
+ {
+ /* Uninstall hooks. */
+ ExecutorStart_hook = prev_ExecutorStart;
+ ExecutorRun_hook = prev_ExecutorRun;
+ ExecutorEnd_hook = prev_ExecutorEnd;
+ shutdown_hook = prev_shutdown_hook;
+ startup_hook = prev_startup_hook;
+ }
+
+ static uint32
+ pgss_tag(Oid userid, Oid dbid, const char *query)
+ {
+ return oid_hash(&userid, sizeof(Oid)) ^
+ oid_hash(&dbid, sizeof(Oid)) ^
+ DatumGetUInt32(hash_any((const unsigned char *) query,
+ (int) strlen(query)));
+ }
+
+ static void
+ pgss_executed(const char *query, const Instrumentation *instr, uint32 rows)
+ {
+ volatile Entry *entry;
+ Oid userid;
+ Oid dbid;
+ uint32 tag;
+ long gets;
+ long reads;
+ double usage;
+
+ Assert(query != NULL);
+
+ if (!track_statements)
+ return;
+ if (!pgss)
+ pgss_init();
+
+ userid = GetUserId();
+ dbid = MyDatabaseId;
+ tag = pgss_tag(userid, dbid, query);
+ gets = instr->buffer.shared_gets + instr->buffer.local_gets;
+ reads = gets - instr->buffer.shared_hits - instr->buffer.local_hits;
+ usage = USAGE_EXEC(duration);
+
+ LWLockAcquire(pgss->lock, LW_SHARED);
+
+ entry = hash_search(pgss_hash, &tag, HASH_FIND, NULL);
+ if (!entry)
+ {
+ /* Re-acquire exclusive lock to add a new entry. */
+ LWLockRelease(pgss->lock);
+ LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
+ entry = entry_alloc(tag, userid, dbid, query);
+ }
+
+ SpinLockAcquire(&entry->mutex);
+ entry->counters.calls += 1;
+ entry->counters.total_time += instr->total;
+ entry->counters.cpu_user += instr->utime;
+ entry->counters.cpu_sys += instr->stime;
+ entry->counters.gets += gets;
+ entry->counters.reads += reads;
+ entry->counters.lreads += instr->buffer.buffile_reads;
+ entry->counters.rows += rows;
+ entry->usage += usage;
+ SpinLockRelease(&entry->mutex);
+
+ LWLockRelease(pgss->lock);
+ }
+
+ void
+ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
+ {
+ if (track_statements_enabled() && queryDesc->totaltime == NULL)
+ queryDesc->totaltime = InstrAlloc(1);
+
+ if (prev_ExecutorStart)
+ prev_ExecutorStart(queryDesc, eflags);
+ else
+ standard_ExecutorStart(queryDesc, eflags);
+ }
+
+ void
+ pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
+ {
+ nested_level++;
+ PG_TRY();
+ {
+ if (prev_ExecutorRun)
+ prev_ExecutorRun(queryDesc, direction, count);
+ else
+ standard_ExecutorRun(queryDesc, direction, count);
+ nested_level--;
+ }
+ PG_CATCH();
+ {
+ nested_level--;
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+ }
+
+
+ void
+ pgss_ExecutorEnd(QueryDesc *queryDesc)
+ {
+ if (queryDesc->totaltime && track_statements_enabled())
+ {
+ /*
+ * Make sure stats accumulation is done. (Note: it's okay if
+ * several levels of hook all do this.)
+ */
+ InstrEndLoop(queryDesc->totaltime);
+
+ pgss_executed(queryDesc->sourceText,
+ queryDesc->totaltime,
+ queryDesc->estate->es_processed);
+ }
+
+ if (prev_ExecutorEnd)
+ prev_ExecutorEnd(queryDesc);
+ else
+ standard_ExecutorEnd(queryDesc);
+
+ /* Reset statistics if requested. */
+ if (reset_requested && nested_level == 0)
+ {
+ entry_reset();
+ reset_requested = false;
+ }
+ }
+
+ Datum
+ pg_stat_statements_reset(PG_FUNCTION_ARGS)
+ {
+ if (!pgss)
+ pgss_init();
+ reset_requested = true;
+ PG_RETURN_BOOL(true);
+ }
+
+ #define pg_stat_statements_cols 11
+
+ Datum
+ pg_stat_statements(PG_FUNCTION_ARGS)
+ {
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ MemoryContext per_query_ctx;
+ MemoryContext oldcontext;
+ Oid userid = GetUserId();
+ bool is_superuser = superuser();
+ HASH_SEQ_STATUS hash_seq;
+ Entry *entry;
+
+ if (!pgss)
+ pgss_init();
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("materialize mode required, but it is not " \
+ "allowed in this context")));
+
+ per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+ tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
+ tupstore = tuplestore_begin_heap(true, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = tupstore;
+ rsinfo->setDesc = tupdesc;
+
+ LWLockAcquire(pgss->lock, LW_SHARED);
+
+ hash_seq_init(&hash_seq, pgss_hash);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ Datum values[pg_stat_statements_cols];
+ bool nulls[pg_stat_statements_cols] = { 0 };
+ int i = 0;
+
+ /* generate junk in short-term context */
+ MemoryContextSwitchTo(oldcontext);
+
+ values[i++] = ObjectIdGetDatum(entry->userid);
+ values[i++] = ObjectIdGetDatum(entry->dbid);
+
+ if (is_superuser || entry->userid == userid)
+ values[i++] = CStringGetTextDatum(entry->chunk->query);
+ else
+ values[i++] = CStringGetTextDatum("<insufficient privilege>");
+
+ SpinLockAcquire(&entry->mutex);
+ values[i++] = Int64GetDatumFast(entry->counters.calls);
+ values[i++] = Float8GetDatumFast(entry->counters.total_time);
+ values[i++] = Float8GetDatumFast(entry->counters.cpu_user);
+ values[i++] = Float8GetDatumFast(entry->counters.cpu_sys);
+ values[i++] = Int64GetDatumFast(entry->counters.gets);
+ values[i++] = Int64GetDatumFast(entry->counters.reads);
+ values[i++] = Int64GetDatumFast(entry->counters.lreads);
+ values[i++] = Int64GetDatumFast(entry->counters.rows);
+ SpinLockRelease(&entry->mutex);
+ Assert(i == pg_stat_statements_cols);
+
+ /* switch to appropriate context while storing the tuple */
+ MemoryContextSwitchTo(per_query_ctx);
+ tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+ }
+
+ LWLockRelease(pgss->lock);
+
+ /* clean up and return the tuplestore */
+ tuplestore_donestoring(tupstore);
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return (Datum) 0;
+ }
+
+ static Size
+ pgss_memsize(void)
+ {
+ Size size;
+
+ size = offsetof(pgStatStmt, chunks);
+ size = add_size(size, mul_size(max_statements, CHUNK_SIZE));
+ size = add_size(size, hash_estimate_size(max_statements, sizeof(Entry)));
+
+ return size;
+ }
+
+ static void
+ pgss_init(void)
+ {
+ bool found;
+ Size size;
+ HASHCTL info = { 0 };
+
+ Assert(pgss == NULL);
+
+ size = pgss_memsize();
+
+ LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+
+ pgss = ShmemInitStruct("pg_stat_statements", size, &found);
+ if (!pgss)
+ elog(ERROR, "out of shared memory");
+
+ if (!found)
+ {
+ pgss->lock = LWLockAssign();
+ pgss->num_statements = 0;
+ pgss->chunk_size = CHUNK_SIZE;
+ chunk_reset();
+ }
+
+ info.keysize = sizeof(uint32);
+ info.entrysize = sizeof(Entry);
+ info.hash = oid_hash;
+ pgss_hash = ShmemInitHash("pg_stat_statements hash",
+ max_statements, max_statements,
+ &info,
+ HASH_ELEM | HASH_FUNCTION);
+ if (!pgss_hash)
+ elog(ERROR, "out of shared memory");
+
+ LWLockRelease(AddinShmemInitLock);
+ }
+
+ /*
+ * Caller must hold an exclusive lock on pgStatStmt->lock.
+ */
+ static volatile Entry *
+ entry_alloc(uint32 tag, Oid userid, Oid dbid, const char *query)
+ {
+ bool found;
+ Entry *entry;
+
+ Assert(query != NULL);
+
+ while (pgss->num_statements >= max_statements)
+ entry_dealloc();
+
+ entry = hash_search(pgss_hash, &tag, HASH_ENTER, &found);
+
+ if (!found)
+ {
+ Size len;
+
+ pgss->num_statements++;
+
+ entry->userid = userid;
+ entry->dbid = dbid;
+ memset(&entry->counters, 0, sizeof(Counters));
+ entry->usage = USAGE_INIT;
+
+ entry->chunk = chunk_alloc();
+ len = strlen(query);
+ len = pg_mbcliplen(query, len, pgss->chunk_size - 1);
+ memcpy(entry->chunk->query, query, len);
+ entry->chunk->query[len] = '\0';
+
+ SpinLockInit(&entry->mutex);
+ }
+
+ return entry;
+ }
+
+ /*
+ * qsort comparator used to find the least-used entries.
+ * Caller must hold an exclusive lock on pgStatStmt->lock.
+ */
+ static int
+ entry_cmp(const void *lhs, const void *rhs)
+ {
+ double l_usage = (*(const Entry **)lhs)->usage;
+ double r_usage = (*(const Entry **)rhs)->usage;
+
+ if (l_usage < r_usage)
+ return -1;
+ else if (l_usage > r_usage)
+ return +1;
+ else
+ return 0;
+ }
+
+ /*
+ * Deallocate least used entries.
+ */
+ static void
+ entry_dealloc(void)
+ {
+ HASH_SEQ_STATUS hash_seq;
+ Entry **entries;
+ Entry *entry;
+ int nvictims;
+ int i;
+
+ /* Sort entries by usage and deallocate USAGE_DEALLOC_PERCENT of them. */
+
+ entries = palloc(pgss->num_statements * sizeof(Entry *));
+
+ i = 0;
+ hash_seq_init(&hash_seq, pgss_hash);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ entries[i++] = entry;
+ entry->usage *= USAGE_DECREASE_FACTOR;
+ }
+
+ qsort(entries, i, sizeof(Entry *), entry_cmp);
+ nvictims = Max(10, i * USAGE_DEALLOC_PERCENT / 100);
+
+ for (i = 0; i < nvictims; i++)
+ {
+ chunk_free(entries[i]->chunk);
+ (void) SpinLockFree(&entries[i]->mutex);
+ hash_search(pgss_hash, &entries[i]->tag, HASH_REMOVE, NULL);
+ pgss->num_statements--;
+ }
+
+ pfree(entries);
+ }
+
+ static void
+ entry_reset(void)
+ {
+ HASH_SEQ_STATUS hash_seq;
+ Entry *entry;
+
+ LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
+
+ hash_seq_init(&hash_seq, pgss_hash);
+ while ((entry = hash_seq_search(&hash_seq)) != NULL)
+ {
+ (void) SpinLockFree(&entry->mutex);
+ hash_search(pgss_hash, &entry->tag, HASH_REMOVE, NULL);
+ }
+
+ pgss->num_statements = 0;
+ chunk_reset();
+
+ LWLockRelease(pgss->lock);
+ }
+
+ static Chunk *
+ chunk_alloc(void)
+ {
+ Chunk *chunk;
+
+ Assert(pgss->free_chunks != NULL);
+
+ chunk = pgss->free_chunks;
+ pgss->free_chunks = pgss->free_chunks->next;
+ return chunk;
+ }
+
+ static void
+ chunk_free(Chunk *chunk)
+ {
+ Assert(chunk != NULL);
+
+ chunk->next = pgss->free_chunks;
+ pgss->free_chunks = chunk;
+ }
+
+ static void
+ chunk_reset(void)
+ {
+ Chunk *chunk;
+ Chunk *next;
+ int i;
+
+ chunk = pgss->free_chunks = pgss->chunks;
+ for (i = 0; i < max_statements - 1; i++)
+ {
+ next = (Chunk *) ((char *)chunk + pgss->chunk_size);
+ chunk->next = next;
+ chunk = next;
+ }
+ chunk->next = NULL;
+ }
+
+ const char *
+ assign_saved_file(const char *newval, bool doit, GucSource source)
+ {
+ if (doit)
+ {
+ char *canon_val = strdup(newval);
+ if (canon_val == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory")));
+ canonicalize_path(canon_val);
+ return canon_val;
+ }
+ else
+ return newval;
+ }
diff -cprN HEAD/contrib/pg_stat_statements/pg_stat_statements.sql.in pg_stat_statements/contrib/pg_stat_statements/pg_stat_statements.sql.in
*** HEAD/contrib/pg_stat_statements/pg_stat_statements.sql.in Thu Jan 1 09:00:00 1970
--- pg_stat_statements/contrib/pg_stat_statements/pg_stat_statements.sql.in Tue Dec 2 16:07:20 2008
***************
*** 0 ****
--- 1,36 ----
+ -- Adjust this setting to control where the objects get created.
+ SET search_path = public;
+
+ -- Register functions.
+ CREATE FUNCTION pg_stat_statements_reset()
+ RETURNS bool
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+
+ CREATE FUNCTION pg_stat_statements(
+ OUT userid oid,
+ OUT dbid oid,
+ OUT query text,
+ OUT calls int8,
+ OUT total_time float8,
+ OUT cpu_user float8,
+ OUT cpu_sys float8,
+ OUT gets int8,
+ OUT reads int8,
+ OUT local_reads int8,
+ OUT rows int8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+
+ -- Register a view.
+ CREATE VIEW pg_stat_statements AS
+ SELECT userid, dbid, query, calls,
+ (total_time * 1000)::int8 AS total_time,
+ ((cpu_user + cpu_sys) * 1000)::int8 AS cpu_time,
+ gets, reads, local_reads, rows
+ FROM pg_stat_statements();
+
+ -- Don't want these to be available at public.
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
diff -cprN HEAD/contrib/pg_stat_statements/uninstall_pg_stat_statements.sql pg_stat_statements/contrib/pg_stat_statements/uninstall_pg_stat_statements.sql
*** HEAD/contrib/pg_stat_statements/uninstall_pg_stat_statements.sql Thu Jan 1 09:00:00 1970
--- pg_stat_statements/contrib/pg_stat_statements/uninstall_pg_stat_statements.sql Mon Jul 14 18:03:04 2008
***************
*** 0 ****
--- 1,6 ----
+ -- Adjust this setting to control where the objects get dropped.
+ SET search_path = public;
+
+ DROP VIEW pg_stat_statements;
+ DROP FUNCTION pg_stat_statements();
+ DROP FUNCTION pg_stat_statements_reset();
diff -cprN HEAD/doc/src/sgml/contrib.sgml pg_stat_statements/doc/src/sgml/contrib.sgml
*** HEAD/doc/src/sgml/contrib.sgml Sat Nov 22 11:15:56 2008
--- pg_stat_statements/doc/src/sgml/contrib.sgml Fri Dec 12 18:26:03 2008
*************** psql -d dbname -f <replaceable>SHAREDIR<
*** 103,108 ****
--- 103,109 ----
&pgfreespacemap;
&pgrowlocks;
&pgstandby;
+ &pgstatstatements;
&pgstattuple;
&pgtrgm;
&seg;
diff -cprN HEAD/doc/src/sgml/filelist.sgml pg_stat_statements/doc/src/sgml/filelist.sgml
*** HEAD/doc/src/sgml/filelist.sgml Sat Nov 22 11:15:56 2008
--- pg_stat_statements/doc/src/sgml/filelist.sgml Fri Dec 12 18:26:03 2008
***************
*** 116,121 ****
--- 116,122 ----
<!entity pgfreespacemap SYSTEM "pgfreespacemap.sgml">
<!entity pgrowlocks SYSTEM "pgrowlocks.sgml">
<!entity pgstandby SYSTEM "pgstandby.sgml">
+ <!entity pgstatstatements SYSTEM "pgstatstatements.sgml">
<!entity pgstattuple SYSTEM "pgstattuple.sgml">
<!entity pgtrgm SYSTEM "pgtrgm.sgml">
<!entity seg SYSTEM "seg.sgml">
diff -cprN HEAD/doc/src/sgml/pgstatstatements.sgml pg_stat_statements/doc/src/sgml/pgstatstatements.sgml
*** HEAD/doc/src/sgml/pgstatstatements.sgml Thu Jan 1 09:00:00 1970
--- pg_stat_statements/doc/src/sgml/pgstatstatements.sgml Mon Dec 8 11:48:40 2008
***************
*** 0 ****
--- 1,281 ----
+ <sect1 id="pgstatstatements">
+ <title>pg_stat_statements</title>
+
+ <indexterm zone="pgstatstatements">
+ <primary>pg_stat_statements</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_stat_statements</filename> module provides a means for
+ tracking and analyzing SQL statements executed by a server.
+ </para>
+
+ <para>
+ The module should be loaded through <varname>shared_preload_libraries</>
+ because it requires additional shared memory. This means a server
+ restart is needed to enable the module.
+ </para>
+
+ <para>
+ NOTE: Execute statements as prepared statements, or via the extended
+ protocol, if you want a more useful report from this module. With the
+ simple protocol, statements that differ only in their query parameters
+ are not grouped together in the report.
+ </para>
+
+ <sect2>
+ <title>The <structname>pg_stat_statements</structname> view</title>
+
+ <para>
+ The definitions of the columns exposed by the view are:
+ </para>
+
+ <table>
+ <title><structname>pg_stat_statements</> columns</title>
+
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Type</entry>
+ <entry>References</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><structfield>userid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal>pg_authid.oid</literal></entry>
+ <entry>User who executed the statement</entry>
+ </row>
+
+ <row>
+ <entry><structfield>dbid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry><literal>pg_database.oid</literal></entry>
+ <entry>Database in which the statement is executed</entry>
+ </row>
+
+ <row>
+ <entry><structfield>query</structfield></entry>
+ <entry><type>text</type></entry>
+ <entry></entry>
+ <entry>Query text of the statement</entry>
+ </row>
+
+ <row>
+ <entry><structfield>calls</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>Number of times executed</entry>
+ </row>
+
+ <row>
+ <entry><structfield>total_time</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>Total time spent for the statement in milliseconds</entry>
+ </row>
+
+ <row>
+ <entry><structfield>cpu_time</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>CPU time spent for the statement in milliseconds</entry>
+ </row>
+
+ <row>
+ <entry><structfield>gets</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>Total buffer gets during execution</entry>
+ </row>
+
+ <row>
+ <entry><structfield>reads</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>Total buffer reads during execution</entry>
+ </row>
+
+ <row>
+ <entry><structfield>local_reads</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>Total temporary-file reads during execution, for example from disk sorts and materialization</entry>
+ </row>
+
+ <row>
+ <entry><structfield>rows</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry></entry>
+ <entry>Number of rows retrieved or affected by the statement</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ There is one row for each statement. Statements are grouped when they have
+ the same SQL text, are in the same database, and are executed by the same user.
+ </para>
+
+ <para>
+ For security reasons, non-superusers cannot see the query text of statements
+ executed by other users.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>pg_stat_statements_reset() returns bool</function>
+ </term>
+
+ <listitem>
+ <para>
+ <function>pg_stat_statements_reset</function> discards all statement
+ statistics; the <structname>pg_stat_statements</> view will then be empty.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </sect2>
+
+ <sect2>
+ <title>Configuration parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <varname>statistics.max_statements (<type>integer</type>)</varname>
+ </term>
+
+ <listitem>
+ <para>
+ <varname>statistics.max_statements</varname> is the maximum number of
+ statements tracked by the module. If more distinct statements than this
+ are observed, statistics for the least-used statements are discarded.
+ The default value is 1000. The module requires additional shared memory of
+ about <varname>statistics.max_statements</varname> * <varname>track_activity_query_size</varname> bytes.
+ This parameter can only be set at server start.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <varname>statistics.track_statements (<type>string</type>)</varname>
+ </term>
+
+ <listitem>
+ <para>
+ <varname>statistics.track_statements</varname> enables collection of
+ statement statistics by the module. Specify <literal>top</literal> to track
+ only top-level statements, <literal>all</literal> to also track nested
+ statements, and <literal>none</literal> to disable collection.
+ The default value is <literal>none</literal>.
+ Only superusers can change this setting.
+ </para>
+ <para>
+ NOTE: Shared memory allocated by the module is not released even if
+ statistics collection is disabled.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <varname>statistics.saved_file (<type>string</type>)</varname>
+ </term>
+
+ <listitem>
+ <para>
+ <varname>statistics.saved_file</varname> specifies the name of the file
+ to which statement statistics are dumped at shutdown and from which they
+ are reloaded at startup. The default value is
+ 'global/pg_stat_statements.dump'. If the value is an empty string,
+ statistics are neither saved nor reloaded. This parameter can only be set
+ in the postgresql.conf file or on the server command line.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>
+ To use the above configuration parameters, you must also add a
+ <literal>statistics</literal> entry to <varname>custom_variable_classes</varname>.
+ </para>
+
+ <programlisting>
+ # postgresql.conf
+ shared_preload_libraries = 'pg_stat_statements'
+
+ custom_variable_classes = 'statistics'
+ #statistics.max_statements = 1000
+ #statistics.track_statements = top
+ #statistics.saved_file = 'global/pg_stat_statements.dump'
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Sample output</title>
+
+ <programlisting>
+ $ pgbench -i
+
+ postgres=# SELECT pg_stat_statements_reset();
+
+ $ pgbench -c10 -t300 -M prepared
+
+ postgres=# \x
+ postgres=# SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 3;
+ -[ RECORD 1 ]+-------------------------------------------------------------
+ userid | 10
+ dbid | 16384
+ query | UPDATE branches SET bbalance = bbalance + $1 WHERE bid = $2;
+ calls | 3000
+ total_time | 34683
+ cpu_time | 1843
+ gets | 48166
+ reads | 12
+ local_reads | 0
+ rows | 3000
+ -[ RECORD 2 ]+-------------------------------------------------------------
+ userid | 10
+ dbid | 16384
+ query | UPDATE tellers SET tbalance = tbalance + $1 WHERE tid = $2;
+ calls | 3000
+ total_time | 34611
+ cpu_time | 593
+ gets | 26447
+ reads | 7
+ local_reads | 0
+ rows | 3000
+ -[ RECORD 3 ]+-------------------------------------------------------------
+ userid | 10
+ dbid | 16384
+ query | UPDATE accounts SET abalance = abalance + $1 WHERE aid = $2;
+ calls | 3000
+ total_time | 546
+ cpu_time | 359
+ gets | 18025
+ reads | 274
+ local_reads | 0
+ rows | 3000
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Authors</title>
+
+ <para>
+ Takahiro Itagaki <email>itagaki.takahiro@oss.ntt.co.jp</email>
+ </para>
+ </sect2>
+
+ </sect1>
"Alex Hunsaker" <badalex@gmail.com> wrote:
A few comments:
Is there a reason you add sourceText to QueryDesc? AFAICT you can do
ActivePortal->sourceText and it will always be populated correctly.
That's for nested statements (SQLs called in stored functions).
ActivePortal->sourceText shows text of only top most query.
I think the explain_analyze_format guc is a clever way of getting
around the explain analyze verbose you proposed earlier. But I dont
see any doc updates for it.
Sure, no docs for now. Is the GUC approach acceptable?
(I'm not sure whether it is the best way...)
If ok, I'll write docs for it.
Im still not overly fond of the "statistics." custom guc name, but
what can you do...
I'm not attached to the name. "pg_stat_statements.*" might
be better to avoid prefix conflicts. If so, we'd better rename the
variables to avoid repeating "statements" in the names.
Ex.
statistics.max_statements -> pg_stat_statements.limit
statistics.track_statements -> pg_stat_statements.target
statistics.saved_file -> pg_stat_statements.saved_file
Other than that it looks good, though I admit I have not had the time
to sit down and thoroughly test it yet...
I found another bug in my patch.
[pg_stat_statements-1212.patch # pg_stat_statements()]
SpinLockAcquire(&entry->mutex);
values[i++] = Int64GetDatumFast(entry->counters.calls);
values[i++] = Float8GetDatumFast(entry->counters.total_time);
values[i++] = Float8GetDatumFast(entry->counters.cpu_user);
values[i++] = Float8GetDatumFast(entry->counters.cpu_sys);
values[i++] = Int64GetDatumFast(entry->counters.gets);
values[i++] = Int64GetDatumFast(entry->counters.reads);
values[i++] = Int64GetDatumFast(entry->counters.lreads);
values[i++] = Int64GetDatumFast(entry->counters.rows);
SpinLockRelease(&entry->mutex);
The values are not actually protected by the spinlock when float8 and
int64 are passed by reference (especially on 32-bit platforms).
It'd be better to copy the values:
Counters tmp;
/* copy the actual values in spinlock */
SpinLockAcquire(&entry->mutex);
tmp = entry->counters;
SpinLockRelease(&entry->mutex);
/* create a tuple after lock is released. */
values[i++] = Int64GetDatumFast(tmp.calls);
values[i++] = Float8GetDatumFast(tmp.total_time);
...
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
On Mon, Dec 22, 2008 at 00:44, ITAGAKI Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
"Alex Hunsaker" <badalex@gmail.com> wrote:
A few comments:
Is there a reason you add sourceText to QueryDesc? AFAICT you can do
ActivePortal->sourceText and it will always be populated correctly.
That's for nested statements (SQLs called in stored functions).
ActivePortal->sourceText shows text of only top most query.
I think the explain_analyze_format guc is a clever way of getting
around the explain analyze verbose you proposed earlier. But I dont
see any doc updates for it.
Sure, no docs for now. Is the GUC approach acceptable?
(I'm not sure whether it is the best way...)
If ok, I'll write docs for it.
I dunno, I'm hoping that by splitting up the patches and making the
change more visible, some more people might chime in :)
Im still not overly fond of the "statistics." custom guc name, but
what can you do...
I'm not attached to the name. "pg_stat_statements.*" might
be better to avoid prefix conflicts. If so, we'd better rename the
variables to avoid repeating "statements" in the names.
Ex.
statistics.max_statements -> pg_stat_statements.limit
statistics.track_statements -> pg_stat_statements.target
How about just pg_stat_statements.track ?
statistics.saved_file -> pg_stat_statements.saved_file
I do like the consistency of having the custom gucs be the same as the
module name, easy to grep or google for.
Other than that it looks good, though I admit I have not had the time
to sit down and thoroughly test it yet...
I found another bug in my patch.
[pg_stat_statements-1212.patch # pg_stat_statements()]
SpinLockAcquire(&entry->mutex);
values[i++] = Int64GetDatumFast(entry->counters.calls);
values[i++] = Float8GetDatumFast(entry->counters.total_time);
values[i++] = Float8GetDatumFast(entry->counters.cpu_user);
values[i++] = Float8GetDatumFast(entry->counters.cpu_sys);
values[i++] = Int64GetDatumFast(entry->counters.gets);
values[i++] = Int64GetDatumFast(entry->counters.reads);
values[i++] = Int64GetDatumFast(entry->counters.lreads);
values[i++] = Int64GetDatumFast(entry->counters.rows);
SpinLockRelease(&entry->mutex);
The values are not actually protected by the spinlock when float8 and
int64 are passed by reference (especially on 32-bit platforms).
It'd be better to copy the values:
Counters tmp;
/* copy the actual values in spinlock */
SpinLockAcquire(&entry->mutex);
tmp = entry->counters;
SpinLockRelease(&entry->mutex);
/* create a tuple after lock is released. */
values[i++] = Int64GetDatumFast(tmp.calls);
values[i++] = Float8GetDatumFast(tmp.total_time);
...
Ive only been testing on 64bit... maybe thats why I never ran into this.
Here is an updated version of contrib/pg_stat_statements patch.
"Alex Hunsaker" <badalex@gmail.com> wrote:
I think the explain_analyze_format guc is a clever way of getting
around the explain analyze verbose you proposed earlier. But I dont
see any doc updates for it.
Documentation is added.
How about just pg_stat_statements.track ?
I renamed the variables to:
- pg_stat_statements.limit
- pg_stat_statements.track
- pg_stat_statements.saved_file
I also modified assign_custom_variable_classes()
to accept '_' as a prefix character, not only 0-9A-Za-z.
I do like the consistency of having the custom gucs be the same as the
module name, easy to grep or google for.
Should I also rename variables used in auto_explain module?
It uses 'explain.*' now.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Attachments:
On Thu, Dec 25, 2008 at 23:04, ITAGAKI Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
Here is an updated version of contrib/pg_stat_statements patch.
I'm going to do some more testing, give the typedef Chunk stuff another
look (you did fix the race / not-protected-by-a-spinlock issue you found
earlier, right?). And if all looks good, mark it as ready for
committer. (Assuming I find time tonight :))
I think the other questions a -commiter needs to respond to, namely:
- explain_analyze_format guc, do we want it?
- queryDesc->sourceText changes
(look good to me though, and I don't see any other obvious way to do it)
- rtime/utime to explain analyze printout
- moves statistic counters (ReadBufferCount etc...) into struct
Instrumentation via a new struct BufferCounter
(looks like a good cleanup regardless...)
Should I also rename variables used in auto_explain module?
It uses 'explain.*' now.
Well, on second thought, you *do* have to manually define
them in custom_variable_classes, so maybe it's fine. Personally, though,
I would like them to be auto_explain.*... it seems harder to mistake
them later as having something to do with EXPLAIN.
On Mon, Dec 29, 2008 at 15:46, Alex Hunsaker <badalex@gmail.com> wrote:
On Thu, Dec 25, 2008 at 23:04, ITAGAKI Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
Here is an updated version of contrib/pg_stat_statements patch.
I'm going to do some more testing, give the typedef Chunk stuff another
look (you did fix the race / not-protected-by-a-spinlock issue you found
earlier, right?). And if all looks good, mark it as ready for
committer. (Assuming I find time tonight :))
Ok, I ran a portion of TPC-H with 4 simultaneous clients doing the same
query on a 32-bit dual-core machine about a hundred times (let's call
this A). I then compared the # of calls, # of gets, and # of rows from
the pg_stat_statements view to a single run (let's call this B) to make
sure they matched (i.e. made sure A.gets = A.calls * B.gets), and that
part all looks good. So I think if there was a race, you squashed it.
My only nit this time around was patch complaining about (Stripping
trailing CRs from patch.), but that's no big deal. So I'm going to mark
it as ready for committer.
"Alex Hunsaker" <badalex@gmail.com> writes:
... So Im going to mark it as
ready for committer.
Has this patch been tested on Windows? (Or more generally, with EXEC_BACKEND?)
The reason I ask is that eyeballing the code suggests a couple of major
problems in that area:
* the startup/shutdown hooks will be installed in the postmaster
process, but the patch expects them to be executed in a child process.
I think nothing will happen.
* in an EXEC_BACKEND situation, we re-execute
process_shared_preload_libraries() when starting a fresh backend
(but not in other kinds of child processes, which is why the other
problem is a problem). This means re-executing the _PG_init function,
which will try to redefine the custom GUC variables, which will fail.
I don't think this is really a bug in this patch per se, it's a bug
in the custom-GUC support; but nonetheless it looks like a problem.
regards, tom lane
On Thu, Jan 1, 2009 at 17:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Alex Hunsaker" <badalex@gmail.com> writes:
... So Im going to mark it as
ready for committer.
Has this patch been tested on Windows? (Or more generally, with EXEC_BACKEND?)
I was under the impression that's where Itagaki-san develops. You'll
note some other Windows-specific changes:
pgstat_track_activity_query_size gains PGDLLIMPORT
process_shared_preload_libraries()
also seems of interest:
http://archives.postgresql.org/pgsql-hackers/2008-12/msg01416.php
various carriage returns in the patch...
I could be wrong though.
The reason I ask is that eyeballing the code suggests a couple of major
problems in that area:
* the startup/shutdown hooks will be installed in the postmaster
process, but the patch expects them to be executed in a child process.
I think nothing will happen.
I dunno about this one, not very familiar with EXEC_BACKEND
* in an EXEC_BACKEND situation, we re-execute
process_shared_preload_libraries() when starting a fresh backend
(but not in other kinds of child processes, which is why the other
problem is a problem). This means re-executing the _PG_init function,
which will try to redefine the custom GUC variables, which will fail.
I don't think this is really a bug in this patch per se, it's a bug
in the custom-GUC support; but nonetheless it looks like a problem.
I see 3 options:
- add a GUC_CUSTOM_REDEFINE flag
- ignore redefines of custom gucs
- change define_custom_variable() to return a bool (or something):
true if it got added
false if it was already there
Seems to me we could probably just ignore any redefines of custom GUCs
outright. I'm not too worried about some other module picking the same
custom GUC. And frankly the op should notice, especially when they
go to add it to custom_variable_classes.
I wrote:
* in an EXEC_BACKEND situation, we re-execute
process_shared_preload_libraries() when starting a fresh backend
(but not in other kinds of child processes, which is why the other
problem is a problem). This means re-executing the _PG_init function,
which will try to redefine the custom GUC variables, which will fail.
I don't think this is really a bug in this patch per se, it's a bug
in the custom-GUC support; but nonetheless it looks like a problem.
Oh, never mind that part. I was thinking that the child process would
already know the real definition of the custom variable at the time it
tries to load the shared library, but actually the mechanism for pushing
GUC values into EXEC_BACKEND child processes doesn't transfer the whole
variable definition. It causes any such values to be loaded as
placeholders, and then the later load of the shared library converts the
placeholder to a regular variable. So it all works, or nearly anyway:
the code fails on a custom variable class whose name alphabetically
precedes "custom_variable_class".
http://archives.postgresql.org/pgsql-committers/2009-01/msg00008.php
regards, tom lane
On Thu, Jan 1, 2009 at 19:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
* in an EXEC_BACKEND situation, we re-execute
process_shared_preload_libraries() when starting a fresh backend
(but not in other kinds of child processes, which is why the other
problem is a problem). This means re-executing the _PG_init function,
which will try to redefine the custom GUC variables, which will fail.
I don't think this is really a bug in this patch per se, it's a bug
in the custom-GUC support; but nonetheless it looks like a problem.
Oh, never mind that part. I was thinking that the child process would
already know the real definition of the custom variable at the time it
tries to load the shared library, but actually the mechanism for pushing
GUC values into EXEC_BACKEND child processes doesn't transfer the whole
variable definition. It causes any such values to be loaded as
placeholders, and then the later load of the shared library converts the
placeholder to a regular variable.
So it all works, or nearly anyway:
the code fails on a custom variable class whose name alphabetically
precedes "custom_variable_class".
Cool! Err interesting...
http://archives.postgresql.org/pgsql-committers/2009-01/msg00008.php
Yeah I saw your commits just shortly after hitting send on my reply :)
I wrote:
* the startup/shutdown hooks will be installed in the postmaster
process, but the patch expects them to be executed in a child process.
I think nothing will happen.
OK, I figured out what is happening there: the patch makes it work by
means of this expedient:
diff -cprN HEAD/src/backend/storage/lmgr/proc.c pg_stat_statements/src/backend/storage/lmgr/proc.c
*** HEAD/src/backend/storage/lmgr/proc.c 2008-12-10 02:03:02.366590000 +0900
--- pg_stat_statements/src/backend/storage/lmgr/proc.c 2008-12-26 14:51:58.062500000 +0900
*************** InitAuxiliaryProcess(void)
*** 381,386 ****
--- 381,390 ----
if (MyProc != NULL)
elog(ERROR, "you already exist");
+ #ifdef EXEC_BACKEND
+ process_shared_preload_libraries();
+ #endif
+
/*
* We use the ProcStructLock to protect assignment and releasing of
* AuxiliaryProcs entries.
I find this mighty Rube Goldbergian. We have a startup hook which is
declared in include/storage/ipc.h, but defined and called in bootstrap.c
(whence it will actually be executed down inside the startup process).
We have a shutdown hook which is also declared in include/storage/ipc.h,
but defined and called in bgwriter.c (executed in the bgwriter process,
of course). And to make those hooks work in the EXEC_BACKEND case, we
have a kluge inserted in proc.c, miles away from where the existing
process_shared_preload_libraries() calls are (in postmaster.c), and with
extremely high probability of someday resulting in duplicate preload
operations if the postmaster.c code gets shuffled.
I don't really care for the idea of initializing the pg_stat_statements
shared memory down inside the startup process. All the rest of shared
memory is initialized by the postmaster process itself, and I think
pg_stat_statements should probably work the same way. Normally I am
against having more functionality in the postmaster than absolutely
necessary, but I don't see any robustness loss in this situation: if we
have the pg_stat_statements init code in the startup subprocess, and it
dies, we'll abort startup anyway. So we might as well run it directly
in the postmaster. I think the right place is probably at the bottom of
CreateSharedMemoryAndSemaphores(). This will serve two purposes: to
create the pg_stat_statements shared memory when run in the postmaster,
or to re-attach to it when starting an EXEC_BACKEND child. The existing
coding in the patch that will try to create or attach to the shared
memory at any old random instant ought to go away --- if one of these
functions is called and doesn't find the pgss shared memory pointer
ready-to-go, it should error out or return quietly as appropriate.
That would indicate that pg_stat_statements.so got loaded into an
already-running backend, which means that the shared memory situation
can't possibly be right.
As for the shutdown hook, I don't think we need it at all in this
design. When loaded into the postmaster process, pg_stat_statements can
insert itself into the on_proc_exit or on_shmem_exit hook lists ... it
doesn't need a private hook.
BTW, as for the question of where to call
process_shared_preload_libraries: we currently have postmaster.c doing
this for itself, and when spawning a regular backend child in the
EXEC_BACKEND case. We don't do it for other child process types;
which is the omission that Itagaki-san tried to work around with
the above diff hunk. You could argue that this is a case of premature
optimization (a/k/a the root of all evil). I think the idea was that
we'd not possibly need any loadable libraries installed in special child
processes --- but that seems more than a little bit dubious if you
think about a loadable library whose goal is to monitor system activity,
as opposed to implementing some SQL-callable functionality. Moreover
it fails to duplicate what will happen in the non-EXEC_BACKEND case;
all child processes will inherit loadable libraries then. So I'm
thinking that Itagaki-san had the right idea in wanting to make
auxiliary processes load libraries, just the wrong implementation.
The right way to make that happen is to rearrange the coding in
SubPostmasterMain() so that process_shared_preload_libraries is
done in all cases, just after the read_nondefault_variables call.
Comments?
regards, tom lane
On Fri, Jan 2, 2009 at 20:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
* the startup/shutdown hooks will be installed in the postmaster
process, but the patch expects them to be executed in a child process.
I think nothing will happen.
OK, I figured out what is happening there: the patch makes it work by
means of this expedient:
<snip>
I find this mighty Rube Goldbergian. We have a startup hook which is
declared in include/storage/ipc.h, but defined and called in bootstrap.c
(whence it will actually be executed down inside the startup process).
We have a shutdown hook which is also declared in include/storage/ipc.h,
but defined and called in bgwriter.c (executed in the bgwriter process,
of course). And to make those hooks work in the EXEC_BACKEND case, we
have a kluge inserted in proc.c, miles away from where the existing
process_shared_preload_libraries() calls are (in postmaster.c), and with
extremely high probability of someday resulting in duplicate preload
operations if the postmaster.c code gets shuffled.
Kudos to Itagaki-san for getting that to work?
As for the shutdown hook, I don't think we need it at all in this
design. When loaded into the postmaster process, pg_stat_statements can
insert itself into the on_proc_exit or on_shmem_exit hook lists ... it
doesn't need a private hook.
Ok cool.
The right way to make that happen is to rearrange the coding in
SubPostmasterMain() so that process_shared_preload_libraries is
done in all cases, just after the read_nondefault_variables call.
This should also fix the rmg hooks patch on EXEC_BACKEND.
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
"Alex Hunsaker" <badalex@gmail.com> wrote:
How about just pg_stat_statements.track ?
I renamed the variables to:
- pg_stat_statements.limit
- pg_stat_statements.track
- pg_stat_statements.saved_file
Hmm, this has got a problem:
regression=# show pg_stat_statements.limit ;
ERROR: syntax error at or near "limit"
LINE 1: show pg_stat_statements.limit ;
^
LIMIT is a reserved word ...
I thought max_statements was fine, but if people think that's too long
I guess we could go with just "max".
I'm not enamored of "saved_file" either, it seems like the wrong
part of speech somehow. Maybe "save_in_file"?
regards, tom lane
I wrote:
I'm not enamored of "saved_file" either, it seems like the wrong
part of speech somehow. Maybe "save_in_file"?
Actually ... what is the point of letting users control the filename at
all? It seems like the only useful nondefault value is the empty string
(to suppress storing the stats). So why not simplify this to a boolean?
"pg_stat_statements.save" = on or off.
regards, tom lane
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
Here is an updated version of contrib/pg_stat_statements patch.
I've committed this with significant revisions. Other than the points
already mentioned in previous messages:
* I removed the proposed changes to the behavior of the core EXPLAIN
code. I think that that should be submitted and discussed as a separate
patch, not slide in under the misleading title of being a "contrib
module". I'm personally against those changes anyway, on two grounds:
1. The proposed change to track system/user CPU time presents an
enormous cost, and no argument has been made to show that there is any
comparable benefit. The change causes each EXPLAIN ANALYZE tracking
call to invoke getrusage() as well as gettimeofday(). I did a little
bit of testing and found that this is over seven times slower on Fedora
9 on x86_64 (Xeon hardware) and over twenty-seven times slower on Darwin
(on Core 2 Duo hardware). Considering that EXPLAIN ANALYZE overhead is
already higher than anyone would like, you would need a pretty darn
convincing argument to persuade us to accept that kind of slowdown.
At the very least the code would need to be modified so that it doesn't
execute getrusage() unless the user is actually going to look at the
results.
2. I'm unconvinced by the proposed changes to accumulate backend-local
I/O counters, too. The fact of the matter is that those counters are
left over from Berkeley days, a time when PG hackers tended to do their
performance measurements in standalone backends (!). They're obviously
not the full story now on write measurements, and I don't have any
confidence in them as read measurements either, particularly seeing that
the wave of the future is likely to be asynchronous read operations
(with the posix_fadvise patch and foreseeable follow-on work). I think
those counters should more likely be done away with than
institutionalized in EXPLAIN ANALYZE output. You can get more reliable
information about what's happening from the existing pgstats system-wide
I/O counts.
* I changed the default track setting to "top". I don't see the
likelihood that someone would load this module into their server
and not want it turned on.
* I'm not entirely seeing the point of a server-wide tracking facility
that only counts SELECT/INSERT/UPDATE/DELETE. ISTM this should be
modified to count utility commands too; which probably means adding some
hooks around ProcessUtility (and what about autovacuum?). I left that
work for someone else to do, though.
* As already mentioned I find the entry_dealloc logic pretty darn
dubious; but I didn't touch that either in this go-round. If we do
keep it in its current form, ISTM that usage ought to be proportional
to total execution time not total call count.
regards, tom lane
Thank you for many works.
Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. The proposed change to track system/user CPU time presents an
enormous cost, and no argument has been made to show that there is any
comparable benefit.
2. I'm unconvinced by the proposed changes to accumulate backend-local
I/O counters, too. The fact of the matter is that those counters are
left over from Berkeley days, a time when PG hackers tended to do their
performance measurements in standalone backends (!).
Ok, I need to reconsider the performance and design of the I/O counters.
I think that information is still useful because we can determine
not only which query is bad, but also why it is bad,
*without re-running the query on production servers*.
But the rework will have to wait for 8.5...
* I changed the default track setting to "top". I don't see the
likelihood that someone would load this module into their server
and not want it turned on.
Looks good. Setting shared_preload_libraries is enough for normal use.
* I'm not entirely seeing the point of a server-wide tracking facility
that only counts SELECT/INSERT/UPDATE/DELETE. ISTM this should be
modified to count utility commands too; which probably means adding some
hooks around ProcessUtility (and what about autovacuum?). I left that
work for someone else to do, though.
Sure, but we should also consider that utility commands should not kick
out DML queries, because maintenance commands typically take a long time.
* As already mentioned I find the entry_dealloc logic pretty darn
dubious; but I didn't touch that either in this go-round. If we do
keep it in its current form, ISTM that usage ought to be proportional
to total execution time not total call count.
I think the current implementation is not ideal, too.
I also don't like using fixed-size shared memory. I'm thinking
that, in the future, shared memory used by extension modules could be
allocated from the shared buffer pool. If we could do that,
memory management would be flexible and we could load libraries
after the server starts.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Tom Lane <tgl@sss.pgh.pa.us> writes:
2. I'm unconvinced by the proposed changes to accumulate backend-local
I/O counters, too. The fact of the matter is that those counters are
left over from Berkeley days, a time when PG hackers tended to do their
performance measurements in standalone backends (!). They're obviously
not the full story now on write measurements, and I don't have any
confidence in them as read measurements either, particularly seeing that
the wave of the future is likely to be asynchronous read operations
(with the posix_fadvise patch and foreseeable follow-on work). I think
those counters should more likely be done away with than
institutionalized in EXPLAIN ANALYZE output. You can get more reliable
information about what's happening from the existing pgstats system-wide
I/O counts.
It's clear that these counters are not the whole picture. But I think that
just means we need more counters with more sources of data, not that we need
to get rid of the ones we have. There's no reason we shouldn't have counters
for advised buffers along with reads.
I'm also picturing using dtrace to find out how many reads were satisfied from
cache and how many required physical reads for example.
The system-wide stats satisfy a very different need from this. The sysadmin or
DBA might want to know about system-wide stats but a programmer or a DBA
analyzing a specific query needs to know what that query is doing.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!