Thick indexes - a look at count(1) query

Started by Gokulakannan Somasundaramalmost 18 years ago5 messages
#1Gokulakannan Somasundaram
gokul007@gmail.com

Hi,
I have submitted a new patch against thick indexes(indexes with snapshot)
http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php.<http://archives.postgresql.org/pgsql-hackers/2007-10/msg00220.php>

I did look closely at improving the performance of count(1) queries. It
worked well, when we are selecting a sub-set of the result-set. But when i
did a query like "select count(1) from table", it showed a improvement in
response time, but not to the extent, i wanted it to be. Let's have a look
at the stats.

gokul=# explain analyze select count(1) from dd;
LOG: EXECUTOR STATISTICS
DETAIL: ! system usage stats:
! 0.277208 elapsed 0.275457 user 0.000148 system sec
! [1.128422 user 0.004976 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/0 [0/0] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [6/12] messages rcvd/sent
! 0/39 [5/160] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 1024 Logical Reads, 0 Physical
Reads, 0 written, buffer hit rate = 100.00%
! Local blocks: 0 read, 0 written, buffer
hit rate = 0.00%
! Direct blocks: 0 read, 0 written
STATEMENT: explain analyze select count(1) from dd;
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=7323.10..7323.11 rows=1 width=0) (actual time=
276.838..276.838 rows=1 loops=1)
-> Index Only Scan using idx on dd (cost=0.00..6741.42 rows=232671
width=0) (actual time=0.042..160.753 rows=232679 loops=1)
Total runtime: 276.928 ms
(3 rows)

gokul=# set enable_indexscan=off;
SET

gokul=# explain analyze select count(1) from dd;
LOG: EXECUTOR STATISTICS
DETAIL: ! system usage stats:
! 0.331441 elapsed 0.258903 user 0.067953 system sec
! [1.906069 user 0.211479 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/0 [0/0] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [10/19] messages rcvd/sent
! 0/39 [9/312] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 5223 Logical Reads, 4391 Physical
Reads, 0 written, buffer hit rate = 15.93%
! Local blocks: 0 read, 0 written, buffer
hit rate = 0.00%
! Direct blocks: 0 read, 0 written
STATEMENT: explain analyze select count(1) from dd;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=8131.39..8131.40 rows=1 width=0) (actual time=
331.075..331.076 rows=1 loops=1)
-> Seq Scan on dd (cost=0.00..7549.71 rows=232671 width=0) (actual
time=0.042..203.958 rows=232679 loops=1)
Total runtime: 331.167 ms
(3 rows)

I have run the query multiple times and it shows the response time, around
what is shown here
The table is just a multiple copy of pg_class table( approx 200000 rows). As
it can be seen, the Logical reads show a ratio of 1:5, but the response
time is not in the same ratio. I tried to profile and couldn't find anything
significant. Eventhough it shows 4391 physical reads, that's from OS cache,
since i ave already run the query multiple times.

One more disadvantage with using select count(1) using index scan is that,
it pollutes the shared memory, unlike full-table scans. But something can be
done in the regard.

Any thoughts?

Thanks,
Gokul.

#2Decibel!
decibel@decibel.org
In reply to: Gokulakannan Somasundaram (#1)
1 attachment(s)
Re: Thick indexes - a look at count(1) query

On Jan 16, 2008, at 8:07 AM, Gokulakannan Somasundaram wrote:

gokul=# explain analyze select count(1) from dd;
LOG: EXECUTOR STATISTICS
DETAIL: ! system usage stats:
! 0.277208 elapsed 0.275457 user 0.000148 system sec
! [1.128422 user 0.004976 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/0 [0/0] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [6/12] messages rcvd/sent
! 0/39 [5/160] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 1024 Logical Reads, 0
Physical Reads, 0 written, buffer hit rate = 100.00%
! Local blocks: 0 read, 0 written,
buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written
STATEMENT: explain analyze select count(1) from dd;
QUERY PLAN
----------------------------------------------------------------------
------------------------------------------------------------
Aggregate (cost=7323.10..7323.11 rows=1 width=0) (actual
time=276.838..276.838 rows=1 loops=1)
-> Index Only Scan using idx on dd (cost=0.00..6741.42
rows=232671 width=0) (actual time=0.042..160.753 rows=232679 loops=1)
Total runtime: 276.928 ms
(3 rows)

gokul=# set enable_indexscan=off;
SET

gokul=# explain analyze select count(1) from dd;
LOG: EXECUTOR STATISTICS
DETAIL: ! system usage stats:
! 0.331441 elapsed 0.258903 user 0.067953 system sec
! [1.906069 user 0.211479 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/0 [0/0] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [10/19] messages rcvd/sent
! 0/39 [9/312] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 5223 Logical Reads, 4391
Physical Reads, 0 written, buffer hit rate = 15.93%
! Local blocks: 0 read, 0 written,
buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written
STATEMENT: explain analyze select count(1) from dd;
QUERY PLAN
----------------------------------------------------------------------
-------------------------------------------
Aggregate (cost=8131.39..8131.40 rows=1 width=0) (actual
time=331.075..331.076 rows=1 loops=1)
-> Seq Scan on dd (cost=0.00..7549.71 rows=232671 width=0)
(actual time=0.042..203.958 rows=232679 loops=1)
Total runtime: 331.167 ms
(3 rows)

I have run the query multiple times and it shows the response time,
around what is shown here
The table is just a multiple copy of pg_class table( approx 200000
rows). As it can be seen, the Logical reads show a ratio of 1:5,
but the response time is not in the same ratio. I tried to profile
and couldn't find anything significant. Eventhough it shows 4391
physical reads, that's from OS cache, since i ave already run the
query multiple times.

Have you tried just executing the query with executor stats on? You
could be seeing the overhead of explain analyze...
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Attachments:

smime.p7sapplication/pkcs7-signature; name=smime.p7sDownload
#3Gokulakannan Somasundaram
gokul007@gmail.com
In reply to: Decibel! (#2)
Re: Thick indexes - a look at count(1) query

As you said, 'gettimeofday' is occupying 40% of the execution time. But
without explain analyze(with just executor stats), i am not getting the time
duration at all. Is there any other way, i can get the timings without this
overhead?

Thanks,
Gokul.

On Jan 18, 2008 1:23 AM, Decibel! <decibel@decibel.org> wrote:

Show quoted text

On Jan 16, 2008, at 8:07 AM, Gokulakannan Somasundaram wrote:

gokul=# explain analyze select count(1) from dd;
LOG: EXECUTOR STATISTICS
DETAIL: ! system usage stats:
! 0.277208 elapsed 0.275457 user 0.000148 system sec
! [1.128422 user 0.004976 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/0 [0/0] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [6/12] messages rcvd/sent
! 0/39 [5/160] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 1024 Logical Reads, 0
Physical Reads, 0 written, buffer hit rate = 100.00%
! Local blocks: 0 read, 0 written,
buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written
STATEMENT: explain analyze select count(1) from dd;
QUERY PLAN
----------------------------------------------------------------------
------------------------------------------------------------
Aggregate (cost=7323.10..7323.11 rows=1 width=0) (actual
time=276.838..276.838 rows=1 loops=1)
-> Index Only Scan using idx on dd (cost=0.00..6741.42
rows=232671 width=0) (actual time=0.042..160.753 rows=232679 loops=1)
Total runtime: 276.928 ms
(3 rows)

gokul=# set enable_indexscan=off;
SET

gokul=# explain analyze select count(1) from dd;
LOG: EXECUTOR STATISTICS
DETAIL: ! system usage stats:
! 0.331441 elapsed 0.258903 user 0.067953 system sec
! [1.906069 user 0.211479 sys total]
! 0/0 [0/0] filesystem blocks in/out
! 0/0 [0/0] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [10/19] messages rcvd/sent
! 0/39 [9/312] voluntary/involuntary context switches
! buffer usage stats:
! Shared blocks: 5223 Logical Reads, 4391
Physical Reads, 0 written, buffer hit rate = 15.93%
! Local blocks: 0 read, 0 written,
buffer hit rate = 0.00%
! Direct blocks: 0 read, 0 written
STATEMENT: explain analyze select count(1) from dd;
QUERY PLAN
----------------------------------------------------------------------
-------------------------------------------
Aggregate (cost=8131.39..8131.40 rows=1 width=0) (actual
time=331.075..331.076 rows=1 loops=1)
-> Seq Scan on dd (cost=0.00..7549.71 rows=232671 width=0)
(actual time=0.042..203.958 rows=232679 loops=1)
Total runtime: 331.167 ms
(3 rows)

I have run the query multiple times and it shows the response time,
around what is shown here
The table is just a multiple copy of pg_class table( approx 200000
rows). As it can be seen, the Logical reads show a ratio of 1:5,
but the response time is not in the same ratio. I tried to profile
and couldn't find anything significant. Eventhough it shows 4391
physical reads, that's from OS cache, since i ave already run the
query multiple times.

Have you tried just executing the query with executor stats on? You
could be seeing the overhead of explain analyze...
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

#4Martijn van Oosterhout
kleptog@svana.org
In reply to: Gokulakannan Somasundaram (#3)
Re: Thick indexes - a look at count(1) query

On Wed, Jan 23, 2008 at 10:02:14PM +0530, Gokulakannan Somasundaram wrote:

As you said, 'gettimeofday' is occupying 40% of the execution time. But
without explain analyze(with just executor stats), i am not getting the time
duration at all. Is there any other way, i can get the timings without this
overhead?

\timing iirc.

Inside psql, check the help.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Those who make peaceful revolution impossible will make violent revolution inevitable.
-- John F Kennedy

#5Guillaume Smet
guillaume.smet@gmail.com
In reply to: Martijn van Oosterhout (#4)
Re: Thick indexes - a look at count(1) query

On Jan 23, 2008 5:48 PM, Martijn van Oosterhout <kleptog@svana.org> wrote:

On Wed, Jan 23, 2008 at 10:02:14PM +0530, Gokulakannan Somasundaram wrote:

As you said, 'gettimeofday' is occupying 40% of the execution time. But
without explain analyze(with just executor stats), i am not getting the time
duration at all. Is there any other way, i can get the timings without this
overhead?

\timing iirc.

Yep, it's \timing.

You can also use log_min_duration_statement and pgFouine to have
average stats on a longer run.

--
Guillaume