Progress indication prototype
Here is a small prototype for a query progress indicator.
Past discussions seemed to indicate that the best place to report this
would be in pg_stat_activity. So that's what this does. You can try it
out with any of the following (on a sufficiently large table): VACUUM
(lazy) (also autovacuum), COPY out from table, COPY in from file,
table-rewriting ALTER TABLE (e.g., add column with default), or a very
simple query. Run the command, and stare at pg_stat_activity (perhaps
via "watch") in a separate session.
More can be added, and the exact placement of the counters is debatable,
but those might be details to be worked out later. Note, my emphasis
here is on maintenance commands; I don't plan to do a progress
estimation of complex queries at this time.
Some thoughts:
- Are the interfaces OK?
- Is this going to be too slow to be useful?
- Should there be a separate switch to turn it on (currently
track_activities)?
- How to handle commands that process multiple tables? For example,
lazy VACUUM on a single table is pretty easy to track (count the block
numbers), but what about a database-wide lazy VACUUM?
Other comments?
Attachments:
progress-indication.patch (text/x-patch, +90 -5)
* Peter Eisentraut (peter_e@gmx.net) wrote:
Other comments?
Will we be able to use it for psql while keeping just one database
connection? I assume the answer is 'no', but it sure would be nice..
Perhaps psql could do something for \copy commands, anyway, but it'd be
independent.
Thanks,
Stephen
On Tue, August 17, 2010 07:19, Peter Eisentraut wrote:
Here is a small prototype for a query progress indicator.
The patch applies to cvs HEAD (9.1devel) and compiles OK, but make check fails.
./configure --prefix=/var/data1/pg_stuff/pg_installations/pgsql.progress_indicator
--with-pgport=6548 --quiet --enable-depend --enable-cassert --enable-debug --with-openssl
--with-perl --with-libxml
Running initdb manually gives the following error:
$ /var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/bin/initdb -U rijkers -D
/var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/data -E UTF8 -A md5
--pwfile=/var/data1/pg_stuff/.90devel
The files belonging to this database system will be owned by user "rijkers".
This user must also own the server process.
The database cluster will be initialized with locale en_US.UTF-8.
The default text search configuration will be set to "english".
creating directory /var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in
/var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/data/base/1 ... FATAL: could not
create unique index "pg_proc_oid_index"
DETAIL: Key (oid)=(2614) is duplicated.
child process exited with exit code 1
initdb: removing data directory "/var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/data"
This is on CentOS 5.4, x86_64 GNU/Linux (2.6.18-164.el5).
Erik Rijkers
On Tue, 2010-08-17 at 08:31 -0400, Stephen Frost wrote:
Will we be able to use it for psql while keeping just one database
connection? I assume the answer is 'no', but it sure would be nice..
How do you expect that to behave? I suppose the backend could send
NOTICE-like messages every 1% or so, and then psql could try to display
that in some way (which?), but then I suspect that a) it will annoy some
people, so b) it will have to be off by default, and then c) it won't be
enabled when you need it.
On Tue, 2010-08-17 at 15:59 +0200, Erik Rijkers wrote:
creating template1 database in
/var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/data/base/1 ... FATAL: could not
create unique index "pg_proc_oid_index"
DETAIL: Key (oid)=(2614) is duplicated.
Probably merge conflict with parallel developments. Try changing the
OID.
On Tue, Aug 17, 2010 at 06:31, Stephen Frost <sfrost@snowman.net> wrote:
* Peter Eisentraut (peter_e@gmx.net) wrote:
Other comments?
Will we be able to use it for psql while keeping just one database
connection? I assume the answer is 'no', but it sure would be nice..
I think that's something that could be worked out in libpq after this
patch. Although I'd bump your nice to an awesome.
* Alex Hunsaker (badalex@gmail.com) wrote:
On Tue, Aug 17, 2010 at 06:31, Stephen Frost <sfrost@snowman.net> wrote:
* Peter Eisentraut (peter_e@gmx.net) wrote:
Other comments?
Will we be able to use it for psql while keeping just one database
connection? I assume the answer is 'no', but it sure would be nice..
I think that's something that could be worked out in libpq after this
patch. Although I'd bump your nice to an awesome.
If it was configurable via \set and I could drop it in my .psqlrc, and
it knew to only do it after a few seconds (otherwise it'd be far too
much)...
I don't like how the backend would have to send something NOTICE-like, I
had originally been thinking "gee, it'd be nice if psql could query
pg_stat while doing something else", but that's not really possible...
So, I guess NOTICE-like messages would work, if the backend could be
taught to do it.
Thanks,
Stephen
On Tue, August 17, 2010 19:13, Peter Eisentraut wrote:
On Tue, 2010-08-17 at 15:59 +0200, Erik Rijkers wrote:
creating template1 database in
/var/data1/pg_stuff/pg_installations/pgsql.progress_indicator/data/base/1 ... FATAL: could not
create unique index "pg_proc_oid_index"
DETAIL: Key (oid)=(2614) is duplicated.
Probably merge conflict with parallel developments. Try changing the
OID.
Could you elaborate? What is a 'merge conflict'? Or 'parallel developments'?
Do you mean the current git conversion? (I get source from a local rsync'ed cvs repository)
How can I 'change OID'? This error comes out of an initial initdb run. (There are several other
test-instances on this machine (several running), but with their own $PGDATA, $PGPORT. - they
can't interfere with each other, can they?)
thanks,
Erik Rijkers
--On 17. August 2010 20:08:51 +0200 Erik Rijkers <er@xs4all.nl> wrote:
How can I 'change OID'? This error comes out of an initial initdb run.
(There are several other test-instances on this machine (several
running), but with their own $PGDATA, $PGPORT. - they can't interfere
with each other, can they?)
I assume Peter means an OID conflict, resulting from concurrent patches or
drifting code.
Looks like pg_stat_get_backend_progress() has a conflict in current HEAD
with xmlexists() (both will get 2614 in my current version of pg_proc.h).
You need to resolve this to have initdb succeed.
--
Thanks
Bernd
(Sorry for top posting and for any typos -- typing on my phone)
In my earlier patch to do progress indicators for arbitrary SQL queries I
had envisioned a setup similar to how we handle query cancellation. Psql
could support a key like SIGINFO which would make it request an update.
Clients like pgadmin would either do that periodically or set some guc or
protocol option to request periodic updates in advance.
greg
On 17 Aug 2010 19:07, "Stephen Frost" <sfrost@snowman.net> wrote:
* Alex Hunsaker (badalex@gmail.com) wrote:
On Tue, Aug 17, 2010 at 06:31, Stephen Frost <sfrost@sn...
If it was configurable via \set and I could drop it in my .psqlrc, and
it knew to only do it after a few seconds (otherwise it'd be far too
much)...
I don't like how the backend would have to send something NOTICE-like, I
had originally been thinking "gee, it'd be nice if psql could query
pg_stat while doing something else", but that's not really possible...
So, I guess NOTICE-like messages would work, if the backend could be
taught to do it.
Thanks,
Stephen
On Tue, Aug 17, 2010 at 10:53 PM, Greg Stark <stark@mit.edu> wrote:
(Sorry for top posting and for any typos -- typing on my phone)
In my earlier patch to do progress indicators for arbitrary SQL queries I
had envisioned a setup similar to how we handle query cancellation. Psql
could support a key like SIGINFO which would make it request an update.
Clients like pgadmin would either do that periodically or set some guc or
protocol option to request periodic updates in advance.
Which is ideal for monitoring your own connection - having the info in
the pg_stat_activity is also valuable for monitoring and system
administration. Both would be ideal :-)
--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company
On Tue, Aug 17, 2010 at 11:29 PM, Dave Page <dpage@pgadmin.org> wrote:
Which is ideal for monitoring your own connection - having the info in
the pg_stat_activity is also valuable for monitoring and system
administration. Both would be ideal :-)
Hm, I think I've come around to the idea that having the info in
pg_stat_activity would be very nice. I can just picture sitting in
pgadmin while a bunch of reports are running and seeing progress bars
for all of them...
But progress bars alone aren't really the big prize. I would really
love to see the explain plans for running queries. This would improve
the DBA's view of what's going on in the system immensely. Currently
you have to grab the query and try to set up a similar environment for
it to run explain on it. If analyze has run since or if the tables
have grown or shrunk or if the query was run with some constants as
parameters it can be awkward. If some of the tables in the query were
temporary tables it can be impossible. You can never really be sure
you're looking at precisely the same plan as the other user's
session is running.
But stuffing the whole json or xml explain plan into pg_stat_activity
seems like it doesn't really fit the same model that the existing
infrastructure is designed around. It could be quite large and if we
want to support progress feedback it could change quite frequently.
We do stuff the whole query there (up to a limited size) so maybe I'm
all wet and stuffing the explain plan in there would be fine?
--
greg
On 18 August 2010 13:45, Greg Stark <stark@mit.edu> wrote:
On Tue, Aug 17, 2010 at 11:29 PM, Dave Page <dpage@pgadmin.org> wrote:
Which is ideal for monitoring your own connection - having the info in
the pg_stat_activity is also valuable for monitoring and system
administration. Both would be ideal :-)
Hm, I think I've come around to the idea that having the info in
pg_stat_activity would be very nice. I can just picture sitting in
pgadmin while a bunch of reports are running and seeing progress bars
for all of them...
But progress bars alone aren't really the big prize. I would really
love to see the explain plans for running queries.
Do you mean just see the explain plan? Or see at what stage of the
plan the query has reached? I think the latter would be awesome. And
if it's broken down by step, wouldn't it be feasible to know how far
through that step it's got for some steps? Obviously for ones with a
LIMIT applied it wouldn't know how far through it had got, but for
things like a sequential scan or sort it should be able to indicate
how far through it is.
--
Thom Brown
Registered Linux user: #516935
On Wed, Aug 18, 2010 at 8:45 AM, Greg Stark <stark@mit.edu> wrote:
On Tue, Aug 17, 2010 at 11:29 PM, Dave Page <dpage@pgadmin.org> wrote:
Which is ideal for monitoring your own connection - having the info in
the pg_stat_activity is also valuable for monitoring and system
administration. Both would be ideal :-)
Hm, I think I've come around to the idea that having the info in
pg_stat_activity would be very nice. I can just picture sitting in
pgadmin while a bunch of reports are running and seeing progress bars
for all of them...
But progress bars alone aren't really the big prize. I would really
love to see the explain plans for running queries. This would improve
the DBA's view of what's going on in the system immensely. Currently
you have to grab the query and try to set up a similar environment for
it to run explain on it. If analyze has run since or if the tables
have grown or shrunk or if the query was run with some constants as
parameters it can be awkward. If some of the tables in the query were
temporary tables it can be impossible. You can never really be sure
you're looking at precisely the same plan as the other user's
session is running.
But stuffing the whole json or xml explain plan into pg_stat_activity
seems like it doesn't really fit the same model that the existing
infrastructure is designed around. It could be quite large and if we
want to support progress feedback it could change quite frequently.
We do stuff the whole query there (up to a limited size) so maybe I'm
all wet and stuffing the explain plan in there would be fine?
It seems to me that progress reporting could add quite a bit of
overhead. For example, in the whole-database vacuum case, the most
logical way to report progress would be to compute pages visited
divided by pages to be visited. But the total number of pages to be
visited is something that doesn't need to be computed in advance
unless someone cares about progress. I don't think we want to incur
that overhead in all cases just on the off chance someone might ask.
We need to think about ways to structure this so that it only costs
when someone's using it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Tue, 2010-08-17 at 13:52 -0400, Stephen Frost wrote:
I don't like how the backend would have to send something NOTICE-like,
I had originally been thinking "gee, it'd be nice if psql could query
pg_stat while doing something else", but that's not really possible...
So, I guess NOTICE-like messages would work, if the backend could be
taught to do it.
That should be doable; you'd just have to do some ereport(NOTICE)
variant inside pgstat_report_progress and have a switch to turn it on
and off, and have psql do something with it. The latter is really the
interesting part; the former is relatively easy once the general
framework is in place.
On Wed, 2010-08-18 at 13:45 +0100, Greg Stark wrote:
But progress bars alone aren't really the big prize. I would really
love to see the explain plans for running queries.
The auto_explain module does that already.
On Aug 18, 2010, at 9:02 AM, Robert Haas wrote:
On Wed, Aug 18, 2010 at 8:45 AM, Greg Stark <stark@mit.edu> wrote:
On Tue, Aug 17, 2010 at 11:29 PM, Dave Page <dpage@pgadmin.org> wrote:
Which is ideal for monitoring your own connection - having the info in
the pg_stat_activity is also valuable for monitoring and system
administration. Both would be ideal :-)
Hm, I think I've come around to the idea that having the info in
pg_stat_activity would be very nice. I can just picture sitting in
pgadmin while a bunch of reports are running and seeing progress bars
for all of them...
But progress bars alone aren't really the big prize. I would really
love to see the explain plans for running queries. This would improve
the DBA's view of what's going on in the system immensely. Currently
you have to grab the query and try to set up a similar environment for
it to run explain on it. If analyze has run since or if the tables
have grown or shrunk or if the query was run with some constants as
parameters it can be awkward. If some of the tables in the query were
temporary tables it can be impossible. You can never really be sure
you're looking at precisely the same plan as the other user's
session is running.
But stuffing the whole json or xml explain plan into pg_stat_activity
seems like it doesn't really fit the same model that the existing
infrastructure is designed around. It could be quite large and if we
want to support progress feedback it could change quite frequently.
We do stuff the whole query there (up to a limited size) so maybe I'm
all wet and stuffing the explain plan in there would be fine?
It seems to me that progress reporting could add quite a bit of
overhead. For example, in the whole-database vacuum case, the most
logical way to report progress would be to compute pages visited
divided by pages to be visited. But the total number of pages to be
visited is something that doesn't need to be computed in advance
unless someone cares about progress. I don't think we want to incur
that overhead in all cases just on the off chance someone might ask.
We need to think about ways to structure this so that it only costs
when someone's using it.
I wish that I could get explain analyze output step-by-step while running a long query instead of seeing it jump out at the end of execution. Some queries "never" end and it would be nice to see which step is spinning (explain can be a red herring). To me the "progress bar" is nice, but I don't see how it would be reliable enough to draw any inferences (such as execution time). If I could get the explain analyze results *and* the actual query results, that would be a huge win, too.
Cheers,
M
On Tue, Aug 17, 2010 at 2:19 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
Here is a small prototype for a query progress indicator.
I read and tested the patch. Here are comments to the code itself.
- The OID of pg_stat_get_backend_progress() must be changed because that
OID is already used by another function.
- One compiler warning:
copy.c:1702: warning: ‘file_size’ may be used uninitialized in this function
- We can move the division "work_done/work_total" outside of the
st_changecount++ block.
Past discussions seemed to indicate that the best place to report this
would be in pg_stat_activity.
Agreed. BTW, the "query_progress" column shows NaN if the progress
counter is unavailable, but NULL would be better.
VACUUM (lazy) (also autovacuum), table-rewriting ALTER TABLE
We could also support VACUUM FULL, CLUSTER, CREATE INDEX and REINDEX.
COPY out from table, COPY in from file,
COPY FROM STDIN shows Infinity, but NULL might be better, too.
a very simple query.
SELECT * FROM tbl;
can report reasonable progress, but
SELECT count(*) FROM tbl;
cannot, because planned_tuple_count of the aggregation is 1.
I hope for better solutions for the grouping case, because aggregations are
used in complex queries, where a progress counter is most wanted.
- Are the interfaces OK?
I like the new column in pg_stat_activity to "pull" the progress.
In addition, as previously discussed, we could also have "push"
notifications; e.g., a GUC parameter "notice_per_progress" (0.0-1.0),
or periodic NOTIFY messages.
- Is this going to be too slow to be useful?
- Should there be a separate switch to turn it on (currently
track_activities)?
I think we can always track the counters because shared memory
based counters are lightweight enough.
- How to handle commands that process multiple tables? For example,
lazy VACUUM on a single table is pretty easy to track (count the block
numbers), but what about a database-wide lazy VACUUM?
Not only database-wide lazy VACUUM but also some maintenance
commands have non-linear progress -- progress of index scans in VACUUM
is not linear, and ALTER TABLE could have a REINDEX after table rewrites.
We might need command-specific knowledge for the non-uniform commands;
For example, "CREATE INDEX assigns 75% of the progress for table scan,
and 25% for the final merging of tapes".
--
Itagaki Takahiro
On Thu, 2010-09-16 at 15:47 +0900, Itagaki Takahiro wrote:
On Tue, Aug 17, 2010 at 2:19 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
VACUUM (lazy) (also autovacuum), table-rewriting ALTER TABLE
We could also support VACUUM FULL, CLUSTER, CREATE INDEX and REINDEX.
Well, yeah, but those are a lot harder to do. ;-)
a very simple query.
SELECT * FROM tbl;
can report reasonable progress, but
SELECT count(*) FROM tbl;
cannot, because planned_tuple_count of the aggregation is 1.
I hope for better solutions for the grouping case, because aggregations are
used in complex queries, where a progress counter is most wanted.
I think that's a problem for a later day. Once we have the interfaces
to report the progress, someone (else) can investigate how to track
progress of arbitrary queries.
- Are the interfaces OK?
I like the new column in pg_stat_activity to "pull" the progress.
In addition, as previously discussed, we could also have "push"
notifications; e.g., a GUC parameter "notice_per_progress" (0.0-1.0),
or periodic NOTIFY messages.
That's a three-line change in pgstat_report_progress() in the simplest
case. Maybe also something to consider later.
- How to handle commands that process multiple tables? For example,
lazy VACUUM on a single table is pretty easy to track (count the block
numbers), but what about a database-wide lazy VACUUM?
Not only database-wide lazy VACUUM but also some maintenance
commands have non-linear progress -- Progress of index scans in VACUUM
is not linear. ALTER TABLE could have REINDEX after table rewrites.
We might need command-specific knowledge for the non-uniform commands;
For example, "CREATE INDEX assigns 75% of the progress for table scan,
and 25% for the final merging of tapes".
Maybe another approach is to forget about presenting progress
numerically. Instead, make it a string saying something like, for
example for database-wide VACUUM, 'table 1/14 block 5/32'. That way you
can cover anything you want, and you give the user the most accurate
information available, but then you can't do things like sort
pg_stat_activity by expected end time, or display a progress bar. Or
of course we could do both the number and the string, but that might be a bit
too much clutter.
On Thu, Sep 16, 2010 at 2:52 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
a very simple query.
SELECT * FROM tbl;
can report reasonable progress, but
SELECT count(*) FROM tbl;
cannot, because planned_tuple_count of the aggregation is 1.
I hope for better solutions for the grouping case, because aggregations are
used in complex queries, where a progress counter is most wanted.
I think that's a problem for a later day. Once we have the interfaces
to report the progress, someone (else) can investigate how to track
progress of arbitrary queries.
I reiterate my earlier criticism of this whole approach: it seems to
assume that computing query progress is something inexpensive enough
that we can afford to do it regardless of whether anyone is looking.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company