visibility map - what do i miss?
--- repost to hackers as suggested by RhodiumToad ---
hi,
i tried to test new "visibility map" feature.
to do so i:
1. fetched postgresql sources from cvs
2. compiled
3. turned autovacuum off
4. started pg
5. ran this queries:
- CREATE TABLE test_1 (i INT4);
- CREATE TABLE test_2 (i INT4);
- CREATE TABLE test_3 (i INT4);
- CREATE TABLE test_4 (i INT4);
- INSERT INTO test_1 SELECT generate_series(1, 100000000);
- INSERT INTO test_2 SELECT generate_series(1, 100000000);
- INSERT INTO test_3 SELECT generate_series(1, 100000000);
- INSERT INTO test_4 SELECT generate_series(1, 100000000);
- UPDATE test_2 SET i = i + 1 WHERE i < 10000000;
- UPDATE test_3 SET i = i + 1 WHERE i < 50000000;
- UPDATE test_4 SET i = i + 1 WHERE i < 90000000;
- VACUUM test_1;
- VACUUM test_2;
- VACUUM test_3;
- VACUUM test_4;
I did it 2 times, first with sources of pg from 1st of november, and
second - with head from yesterday evening (warsaw, poland time).
results puzzled me.
First run - without visibility maps, timing of vacuums:
Time: 267844.822 ms
Time: 138854.592 ms
Time: 305467.950 ms
Time: 487133.179 ms
Second run - on head:
Time: 252218.609 ms
Time: 234388.763 ms
Time: 334016.413 ms
Time: 575698.750 ms
Now - as I understand the change - visilibity maps should make second run much faster?
Tests were performed on laptop. During first test I used it to browse the web,
read mail. During second test - nobody used the laptop.
Relation forms seem to exist:
# select oid from pg_database where datname = 'depesz';
oid
-------
16389
(1 row)
# select relfilenode from pg_class where relname ~ 'test_';
relfilenode
-------------
26756
26759
26762
26765
(4 rows)
=> ls -l {26756,26759,26762,26765}*
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 01:31 26756
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 01:33 26756.1
-rw------- 1 pgdba pgdba 1065066496 2008-12-06 01:34 26756.2
-rw------- 1 pgdba pgdba 811008 2008-12-06 01:34 26756_fsm
-rw------- 1 pgdba pgdba 57344 2008-12-06 01:34 26756_vm
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 01:35 26759
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 01:36 26759.1
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 01:37 26759.2
-rw------- 1 pgdba pgdba 312582144 2008-12-06 01:39 26759.3
-rw------- 1 pgdba pgdba 892928 2008-12-06 01:39 26759_fsm
-rw------- 1 pgdba pgdba 57344 2008-12-06 01:39 26759_vm
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 01:39 26762
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 01:49 26762.1
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 01:41 26762.2
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 01:42 26762.3
-rw------- 1 pgdba pgdba 523862016 2008-12-06 01:43 26762.4
-rw------- 1 pgdba pgdba 1204224 2008-12-06 01:43 26762_fsm
-rw------- 1 pgdba pgdba 81920 2008-12-06 01:53 26762_vm
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 02:01 26765
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 02:08 26765.1
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 02:18 26765.2
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 01:50 26765.3
-rw------- 1 pgdba pgdba 1073741824 2008-12-06 01:51 26765.4
-rw------- 1 pgdba pgdba 735141888 2008-12-06 02:00 26765.5
-rw------- 1 pgdba pgdba 1523712 2008-12-06 02:00 26765_fsm
-rw------- 1 pgdba pgdba 98304 2008-12-06 02:18 26765_vm
What do I miss?
Best regards,
depesz
--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
On Sat, Dec 6, 2008 at 12:53 PM, hubert depesz lubaczewski
<depesz@depesz.com> wrote:
First run - without visibility maps, timing of vacuums:
Time: 267844.822 ms
Time: 138854.592 ms
Time: 305467.950 ms
Time: 487133.179 msSecond run - on head:
Time: 252218.609 ms
Time: 234388.763 ms
Time: 334016.413 ms
Time: 575698.750 msNow - as I understand the change - visilibity maps should make second run much faster?
If I understand correctly what Heikki explained, not if you run VACUUM
only once (and you confirmed me on IRC you run it only once). It's the
VACUUM which sets the PD_ALL_VISIBLE flag on the pages so the first
VACUUM should be slower with HEAD than with 8.3 as it has far more
work to do.
The second VACUUM should then be faster.
--
Guillaume
Guillaume Smet wrote:
On Sat, Dec 6, 2008 at 12:53 PM, hubert depesz lubaczewski
<depesz@depesz.com> wrote:First run - without visibility maps, timing of vacuums:
Time: 267844.822 ms
Time: 138854.592 ms
Time: 305467.950 ms
Time: 487133.179 msSecond run - on head:
Time: 252218.609 ms
Time: 234388.763 ms
Time: 334016.413 ms
Time: 575698.750 msNow - as I understand the change - visilibity maps should make second run much faster?
If I understand correctly what Heikki explained, not if you run VACUUM
only once (and you confirmed me on IRC you run it only once). It's the
VACUUM which sets the PD_ALL_VISIBLE flag on the pages so the first
VACUUM should be slower with HEAD than with 8.3 as it has far more
work to do.The second VACUUM should then be faster.
That diagnosis is not quite right, but the prognosis is correct. The
first VACUUM after the UPDATEs doesn't set the PD_ALL_VISIBLE flags,
because there's still dead tuples on the pages. The dead tuples are
removed in the 2nd pass of the first vacuum, but it doesn't try to set
the PD_ALL_VISIBLE flags; that's only done in the first phase.
The second vacuum is just as slow as the first one, because the
visibility map doesn't have any bits set yet. The second vacuum will set
the bits, though, so the *third* vacuum should go faster.
So setting the PD_ALL_VISIBLE flags doesn't slow things down. That
should be just a tiny bit of extra CPU work per vacuumed page, not
something that would show up in performance tests.
This is the 1st issue I mentioned in this mail:
http://archives.postgresql.org/message-id/4925664C.3090605@enterprisedb.com
There was some suggestions in that thread, but none has been implemented.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com