Access statistics

Started by Jan Wieckover 24 years ago13 messages
#1Jan Wieck
JanWieck@Yahoo.com

One more feature for discussion :-)

In the next couple of hours (at least tomorrow) I would be
ready to commit the backend changes for table-/index-access
statistics and current backend activity views.

Should I apply the patches or provide a separate patch for
review first?

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#1)
Re: Access statistics

Jan Wieck <JanWieck@Yahoo.com> writes:

In the next couple of hours (at least tomorrow) I would be
ready to commit the backend changes for table-/index-access
statistics and current backend activity views.
Should I apply the patches or provide a separate patch for
review first?

Considering that you've not offered any detailed information about
what you plan to do (AFAIR), a patch for review first would be polite ...

regards, tom lane

#3Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Jan Wieck (#1)
Re: Access statistics

One more feature for discussion :-)

In the next couple of hours (at least tomorrow) I would be
ready to commit the backend changes for table-/index-access
statistics and current backend activity views.

Should I apply the patches or provide a separate patch for
review first?

I like doing a cvs diff -c and throwing the patch to PATCHES just before
commit. That way, people can see may changes easier and find problems.
Of course, if I am at all unsure, I post to patches and wait 2 days.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#4Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#2)
Re: Access statistics

Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

In the next couple of hours (at least tomorrow) I would be
ready to commit the backend changes for table-/index-access
statistics and current backend activity views.
Should I apply the patches or provide a separate patch for
review first?

Considering that you've not offered any detailed information about
what you plan to do (AFAIR), a patch for review first would be polite ...

We had that discussion a couple of weeks ago down to if it's
better to use UNIX or INET domain UDP sockets. But I expected
it not to be detailed enough for our current quality level
:-)

It's incomplete anyway since the per database configuration
in pg_database is missing and some other details I need to
tidy up. So I'll wrap up a patch tomorrow.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#5Peter Eisentraut
peter_e@gmx.net
In reply to: Jan Wieck (#1)
Re: Access statistics

Jan Wieck writes:

In the next couple of hours (at least tomorrow) I would be
ready to commit the backend changes for table-/index-access
statistics and current backend activity views.

Should I apply the patches or provide a separate patch for
review first?

Maybe you could describe what it's going to do and how it's going to work.

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter

#6Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Peter Eisentraut (#5)
AW: Access statistics

One more feature for discussion :-)

In the next couple of hours (at least tomorrow) I would be
ready to commit the backend changes for table-/index-access
statistics and current backend activity views.

Should I apply the patches or provide a separate patch for
review first?

One concern I remember from memory was, that the table names
did not conform to the system table semantics of "pg_*". (pgstat_*)
Have you, or would you change that ?

Andreas

#7Jan Wieck
JanWieck@Yahoo.com
In reply to: Peter Eisentraut (#5)
Re: Access statistics

Peter Eisentraut wrote:

Jan Wieck writes:

In the next couple of hours (at least tomorrow) I would be
ready to commit the backend changes for table-/index-access
statistics and current backend activity views.

Should I apply the patches or provide a separate patch for
review first?

Maybe you could describe what it's going to do and how it's going to work.

Real programmers don't comment - if it was hard to write it
should be hard to read :-)

So outing myself not beeing a *real programmer*, this is what
I have so far:

* On startup the postmaster creates an INET domain UDP socket
and bind(2)'s it to localhost:0, meaning the kernel will
assign a yet unused, unprivileged port that could be seen
with getsockaddr(2).

It then starts two background processes of which one is
simply a wraparound buffer doing recvfrom(2) on the socket,
checking that the source address of the received packets is
the sockets own address (!) and forwarding approved ones
over a pipe to the second one, discribed later.

* Backends call some collector functions at various places
now (these will finally be macros), that count up table
scans, tuples returned by scans, buffer fetches/hits and
the like. At the beginning of a statement the backends send
a message telling the first couple of hundred bytes of the
querystring and after the statement is done (just before
getting ready for the next command) they send the collected
access numbers.

Tables, indexes etc. in these statistics are identified by
OID, so the data doesn't tell much so far.

* The second background process fired by the postmaster
collects these numbers into hashtables and as long as it
receives messages, it'll write out a summary file every 500
or so milliseconds, telling a snapshot of current stats.
Using select(2) with timeouts ensures that a complete idle
DB instance not to waste a single CPU cycle or IO to write
these snapshots.

On startup it tries to read the last snapshot file in, so
the collected statistics survive a postmaster restart.

Vacuum reads the file too and sends bulk delete messages
for objects that are gone. So the stats don't grow
infinitely.

* A bunch of new builtin functions gain access to the
snapshot file. At first call of one of these functions
during a transaction, the backend will read the current
file and return the numbers from in memory then.

Based on these functions a couple of views can tell these
collected stats. Information from the databases system
catalog is of course required to identify the objects in
the stats, but I think those informations should only be
visible to someone who identified herself as a valid DB
user anyway.

The visibility of querystrings (this info is available
cross DB) is restricted to DB superusers.

There has been discussion already about using an INET vs.
UNIX UDP socket for the communication. At least for Linux I
found INET to be the most effective way of communication. And
for security concerns: If someone else than root can really
send packets to that socket that show up with a source
address of 127.0.0.1:n, where n is a portnumber actually
occupied by your own socket, be sure you'll have more severe
problems than modified access statistics.

The views should be considered examples. The final naming and
layout is subject for discussion.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#7)
Re: Access statistics

Jan Wieck <JanWieck@Yahoo.com> writes:

So outing myself not beeing a *real programmer*, this is what
I have so far:

Hmm ... what is the performance of all this like? Seems like a lot
of overhead. Can it be turned off?

* Backends call some collector functions at various places
now (these will finally be macros), that count up table
scans, tuples returned by scans, buffer fetches/hits and
the like.

Have you removed the existing stats-gathering support
(backend/access/heap/stats.c and so on)? That would buy back
at least a few of the cycles involved ...

regards, tom lane

#9Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#8)
Re: Access statistics

Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

So outing myself not beeing a *real programmer*, this is what
I have so far:

Hmm ... what is the performance of all this like? Seems like a lot
of overhead. Can it be turned off?

Current performance loss is about 2-4% wallclock measured. I
expect it to become better when turning some of the functions
into macros.

The plan is to add another column to pg_database that can be
used to turn it on/off on a per database level. Backends just
decide at startup if they collect and send for their session
lifetime.

* Backends call some collector functions at various places
now (these will finally be macros), that count up table
scans, tuples returned by scans, buffer fetches/hits and
the like.

Have you removed the existing stats-gathering support
(backend/access/heap/stats.c and so on)? That would buy back
at least a few of the cycles involved ...

Not sure if we really should. Let's later decide if it's
really obsolete.

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#9)
Re: Access statistics

Jan Wieck <JanWieck@yahoo.com> writes:

Tom Lane wrote:

Have you removed the existing stats-gathering support
(backend/access/heap/stats.c and so on)? That would buy back
at least a few of the cycles involved ...

Not sure if we really should. Let's later decide if it's
really obsolete.

Considering that Bruce long ago ifdef'd out all the code that could
actually *do* anything with those stats (like print them), I'd say
it's obsolete. In any case, it's too confusing to have two sets of
stats-gathering code in there. I vote for getting rid of the old
stuff.

regards, tom lane

#11Mike Mascari
mascarm@mascari.com
In reply to: Tom Lane (#10)
RE: Access statistics

For what its worth,

Oracle has a nice feature for resource management called PROFILEs:

CREATE PROFILE profile LIMIT
[ SESSION_PER_USER [ session_limit | UNLIMITED | DEFAULT ] ]
[ CPU_PER_SESSION cpu_session_limit | UNLIMITED | DEFAULT ] ]
[ CPU_PER_CALL cpu_call_limit | UNLIMITED | DEFAULT ] ]
[ CONNECT_TIME connect_limit | UNLIMITED | DEFAULT ] ]
[ IDLE_TIME idle_limit | UNLIMITED | DEFAULT ] ]
[ LOGICAL READS_PER_SESSION read_session_limit | UNLIMITED | DEFAULT
] ]
[ LOGICAL READS_PER_CALL read_call_limit | UNLIMITED | DEFAULT ] ]
[ PRIVATE_SGA memory_limit | UNLIMITED | DEFAULT ] ]
[ COMPOSITE_LIMIT resource_cost_limit | UNLIMITED | DEFAULT ] ]

which limits things like CPU_PER_CALL and LOGICAL_READS_PER_SESSION
to a profile. The ALTER USER command then allows you to assign a
PROFILE to a user. This is really nice, since you can prevent
run-away queries from denying service by the database to other users.
It also can prevent a user from soaking up all of the available
connections. You must set a flag in your initSID.ora configuration
profile for ORACLE to support profiles. Since Jan is collecting these
statistics any way (if the appropriate configuration flag is set), it
would be pretty trivial to implement PROFILEs in PostgreSQL.

Mike Mascari
mascarm@mascari.com

-----Original Message-----
From: Tom Lane [SMTP:tgl@sss.pgh.pa.us]

Jan Wieck <JanWieck@Yahoo.com> writes:

So outing myself not beeing a *real programmer*, this is what
I have so far:

Hmm ... what is the performance of all this like? Seems like a lot
of overhead. Can it be turned off?

* Backends call some collector functions at various places
now (these will finally be macros), that count up table
scans, tuples returned by scans, buffer fetches/hits and
the like.

Have you removed the existing stats-gathering support
(backend/access/heap/stats.c and so on)? That would buy back
at least a few of the cycles involved ...

regards, tom lane

#12Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Jan Wieck (#9)
Re: Access statistics

Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

So outing myself not beeing a *real programmer*, this is what
I have so far:

Hmm ... what is the performance of all this like? Seems like a lot
of overhead. Can it be turned off?

Current performance loss is about 2-4% wallclock measured. I
expect it to become better when turning some of the functions
into macros.

At 2-4%, I assume it is not enabled by default. I can see the query
string part being enabled by default though.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#13Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#10)
Re: Access statistics

Jan Wieck <JanWieck@yahoo.com> writes:

Tom Lane wrote:

Have you removed the existing stats-gathering support
(backend/access/heap/stats.c and so on)? That would buy back
at least a few of the cycles involved ...

Not sure if we really should. Let's later decide if it's
really obsolete.

Considering that Bruce long ago ifdef'd out all the code that could
actually *do* anything with those stats (like print them), I'd say
it's obsolete. In any case, it's too confusing to have two sets of
stats-gathering code in there. I vote for getting rid of the old
stuff.

I agree. Rip away.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026