Script to compute random page cost
Because we have seen many complaints about sequential vs. index scans, I
wrote a script which computes the random_page_cost value for your
OS/hardware combination.
Under BSD/OS on one SCSI disk, I get a random_page_cost around 60. Our
current postgresql.conf default is 4.
What do other people get for this value?
Keep in mind that if we increase this value, we will get more sequential
scans vs. index scans.
One flaw in this test is that it randomly reads blocks from different
files rather than randomly reading from the same file. Do people have a
suggestion on how to correct this? Does it matter?
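One possible way to sidestep the cross-file issue would be to confine the random reads to a single scratch file and seek within it. A minimal sketch; the scratch file name, size, and read count here are hypothetical stand-ins, not anything the script itself uses:

```shell
# Sketch: random page reads confined to ONE file instead of many.
# The scratch file name and size are hypothetical.
BLCKSZ=8192
FILE=/tmp/randcost.scratch
PAGES=2048                              # 2048 * 8192 bytes = 16MB
dd if=/dev/zero of="$FILE" bs="$BLCKSZ" count="$PAGES" 2>/dev/null

# Generate all the random page numbers up front with one awk call.
for OFFSET in `awk 'BEGIN {srand(); for (i = 0; i < 100; i++)
                    print int(rand() * '"$PAGES"')}'`
do
	# skip= (not seek=) positions within the INPUT file.
	dd bs="$BLCKSZ" skip="$OFFSET" count=1 if="$FILE" of=/dev/null 2>/dev/null
done
rm -f "$FILE"
echo "read 100 random pages from a single file"
```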
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Attachments:
/root/randcost (text/plain)
OK, turns out that the loop for sequential scan ran fewer times and was
skewing the numbers. I have a new version at:
ftp://candle.pha.pa.us/pub/postgresql/randcost
I get _much_ lower numbers now for random_page_cost.
---------------------------------------------------------------------------
Bruce Momjian wrote:
#!/bin/bash
trap "rm -f /tmp/$$" 0 1 2 3 15
BLCKSZ=8192
if [ "$RANDOM" = "$RANDOM" ]
then	echo "Your shell does not support \$RANDOM.  Try using bash." 1>&2
	exit 1
fi

# XXX We assume 0 <= $RANDOM <= 32767

echo "Collecting sizing information ..."
TEMPLATE1=`du -s "$PGDATA/base/1" | awk '{print $1}'`
FULL=`du -s "$PGDATA/base" | awk '{print $1}'`
if [ "$FULL" -lt `expr "$TEMPLATE1" \* 4` ]
then	echo "Your installation should have at least four times the data stored in template1 to yield meaningful results" 1>&2
	exit 1
fi

find "$PGDATA/base" -type f -exec ls -ld {} \; |
awk '$5 % '"$BLCKSZ"' == 0 {print $5 / '"$BLCKSZ"', $9}' |
grep -v '^0 ' > /tmp/$$

TOTAL=`awk 'BEGIN {sum = 0}
	{sum += $1}
	END {print sum}' /tmp/$$`

echo "Running random access timing test ..."
START=`date '+%s'`
PAGES=1000
while [ "$PAGES" -ne 0 ]
do
	BIGRAND=`expr "$RANDOM" \* 32768 + "$RANDOM"`
	OFFSET=`awk 'BEGIN {printf "%d\n", ('"$BIGRAND"' / 2^30) * '"$TOTAL"'}'`
	RESULT=`awk 'BEGIN {offset = 0}
		offset + $1 > '"$OFFSET"' {print $2, '"$OFFSET"' - offset; exit}
		{offset += $1}' /tmp/$$`
	FILE=`echo "$RESULT" | awk '{print $1}'`
	OFFSET=`echo "$RESULT" | awk '{print $2}'`
	# skip= (not seek=) positions within the input file
	dd bs="$BLCKSZ" skip="$OFFSET" count=1 if="$FILE" of="/dev/null" >/dev/null 2>&1
	PAGES=`expr "$PAGES" - 1`
done
STOP=`date '+%s'`
RANDTIME=`expr "$STOP" - "$START"`

echo "Running sequential access timing test ..."
START=`date '+%s'`
# We read ten times as many pages here because sequential access
# is quicker and the test must run a while to get accurate results.
PAGES=10000
while [ "$PAGES" -ne 0 ]
do
	BIGRAND=`expr "$RANDOM" \* 32768 + "$RANDOM"`
	OFFSET=`awk 'BEGIN {printf "%d\n", ('"$BIGRAND"' / 2^30) * '"$TOTAL"'}'`
	RESULT=`awk 'BEGIN {offset = 0}
		offset + $1 > '"$OFFSET"' {print $2, $1; exit}
		{offset += $1}' /tmp/$$`
	FILE=`echo "$RESULT" | awk '{print $1}'`
	FILEPAGES=`echo "$RESULT" | awk '{print $2}'`
	if [ "$FILEPAGES" -gt "$PAGES" ]
	then	FILEPAGES="$PAGES"
	fi
	dd bs="$BLCKSZ" count="$FILEPAGES" if="$FILE" of="/dev/null" >/dev/null 2>&1
	PAGES=`expr "$PAGES" - "$FILEPAGES"`
done
STOP=`date '+%s'`
SEQTIME=`expr "$STOP" - "$START"`

echo
awk 'BEGIN {printf "random_page_cost = %f\n", ('"$RANDTIME"' / '"$SEQTIME"') * 10}'
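For anyone puzzling over the BIGRAND arithmetic in the script: $RANDOM is a 15-bit value (0..32767), so RANDOM * 32768 + RANDOM spans 0 .. 2^30 - 1, and dividing by 2^30 scales it to a fraction of TOTAL. A quick check with the maximum value (the TOTAL figure below is just an example, not taken from any real installation):

```shell
# $RANDOM is 15 bits, so the largest BIGRAND is 2^30 - 1 and the
# computed OFFSET always lands inside 0 .. TOTAL-1.
MAXRAND=32767
BIGMAX=`expr "$MAXRAND" \* 32768 + "$MAXRAND"`
echo "$BIGMAX"                          # 1073741823 = 2^30 - 1
TOTAL=250000                            # example page count
OFFSET=`awk 'BEGIN {printf "%d\n", ('"$BIGMAX"' / 2^30) * '"$TOTAL"'}'`
echo "$OFFSET"                          # just under TOTAL
```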
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
I got:
random_page_cost = 0.807018
For FreeBSD 4.4/i386
With 512MB RAM & SCSI HDD
Chris
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Bruce Momjian
Sent: Monday, 9 September 2002 2:14 PM
To: PostgreSQL-development
Subject: Re: [HACKERS] Script to compute random page cost

OK, turns out that the loop for sequential scan ran fewer times and was
skewing the numbers. I have a new version at:
ftp://candle.pha.pa.us/pub/postgresql/randcost
I get _much_ lower numbers now for random_page_cost.
TIP 4: Don't 'kill -9' the postmaster
Dell Inspiron 8100 laptop, 1.2GHz Pentium, 512Mb RAM, Windows XP Pro
CYGWIN_NT-5.1 PC9 1.3.10(0.51/3/2) 2002-02-25 11:14 i686 unknown
random_page_cost = 0.924119
Regards, Dave.
-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: 09 September 2002 07:14
To: PostgreSQL-development
Subject: Re: [HACKERS] Script to compute random page cost

OK, turns out that the loop for sequential scan ran fewer
times and was skewing the numbers. I have a new version at:
ftp://candle.pha.pa.us/pub/postgresql/randcost
I get _much_ lower numbers now for random_page_cost.
OK, turns out that the loop for sequential scan ran fewer times and was
skewing the numbers. I have a new version at:
ftp://candle.pha.pa.us/pub/postgresql/randcost
I get _much_ lower numbers now for random_page_cost.
I got:
random_page_cost = 1.047619
Linux kernel 2.4.18
Pentium III 750MHz
Memory 256MB
IDE HDD
(A notebook/SONY VAIO PCG-Z505CR/K)
--
Tatsuo Ishii
On Mon, 9 Sep 2002, Bruce Momjian wrote:
What do other people get for this value?
With your new script, with a 1.5 GHz Athlon, 512 MB RAM, and a nice fast
IBM 7200 RPM IDE disk, I get random_page_cost = 0.933333.
One flaw in this test is that it randomly reads blocks from different
files rather than randomly reading from the same file. Do people have a
suggestion on how to correct this? Does it matter?
From my quick glance, it also does a lot of work to read each
block, including forking off several other programs. This would tend to
push up the cost of a random read. You might want to look at modifying
the randread program (http://randread.sourceforge.net) to do what you
want....
cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
What do other people get for this value?
Keep in mind that if we increase this value, we will get more sequential
scans vs. index scans.
With the new script I get 0.929825 on 2 IBM DTLA 5400RPM (80GB) with a 3Ware
6400 Controller (RAID-1)
Best regards,
Mario Weilguni
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Mon, 2002-09-09 at 07:13, Bruce Momjian wrote:
OK, turns out that the loop for sequential scan ran fewer times and was
skewing the numbers. I have a new version at:
ftp://candle.pha.pa.us/pub/postgresql/randcost
I get _much_ lower numbers now for random_page_cost.
---------------------------------------------------------------------------
Five successive runs:
random_page_cost = 0.947368
random_page_cost = 0.894737
random_page_cost = 0.947368
random_page_cost = 0.894737
random_page_cost = 0.894737
linux 2.4.18 SMP
dual Athlon MP 1900+
512Mb RAM
SCSI
--
Oliver Elphick Oliver.Elphick@lfix.co.uk
Isle of Wight, UK
http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
========================================
"Submit yourselves therefore to God. Resist the devil,
and he will flee from you." James 4:7
On Mon, 2002-09-09 at 02:13, Bruce Momjian wrote:
OK, turns out that the loop for sequential scan ran fewer times and was
skewing the numbers. I have a new version at:
ftp://candle.pha.pa.us/pub/postgresql/randcost
I get _much_ lower numbers now for random_page_cost.
The current script pulls way more data for the sequential scan than the
random scan now.
Random is pulling a single page (count=1 for dd) with every loop.
Sequential does the same number of loops, but pulls count > 1 in each.
In effect, sequential is random with more data load -- which explains
all of the 0.9's.
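The fairer comparison implied here would read the same total number of pages in both modes, so the timing ratio reflects per-page cost rather than per-loop cost. A sketch, with a made-up scratch file name and deliberately small sizes:

```shell
# Read the SAME total page count sequentially and randomly, so
# the timing ratio is per-page. File name and sizes are hypothetical.
BLCKSZ=8192
FILE=/tmp/randcost.demo
TOTALPAGES=256
dd if=/dev/zero of="$FILE" bs="$BLCKSZ" count="$TOTALPAGES" 2>/dev/null

# Sequential: one dd streaming every page.
START=`date '+%s'`
dd bs="$BLCKSZ" count="$TOTALPAGES" if="$FILE" of=/dev/null 2>/dev/null
SEQTIME=`expr \`date '+%s'\` - "$START"`

# Random: the same number of pages, one page per dd.
START=`date '+%s'`
for OFFSET in `awk 'BEGIN {srand(); for (i = 0; i < '"$TOTALPAGES"'; i++)
                    print int(rand() * '"$TOTALPAGES"')}'`
do
	dd bs="$BLCKSZ" skip="$OFFSET" count=1 if="$FILE" of=/dev/null 2>/dev/null
done
RANDTIME=`expr \`date '+%s'\` - "$START"`

echo "sequential: $SEQTIME seconds, random: $RANDTIME seconds"
rm -f "$FILE"
```

Note that the per-dd fork cost still dominates at these sizes; this only equalizes the page counts.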
Rod Taylor
Bruce-
With the change in the script that I mentioned to you off-list (which I
believe just pointed it at our "real world" data), I got the following
results with 6 successive runs on each of our two development platforms:
(We're running PGSQL 7.2.1 on Debian Linux 2.4)
System 1:
1.2 GHz Athlon Processor, 512MB RAM, Database on IDE hard drive
random_page_cost = 0.857143
random_page_cost = 0.809524
random_page_cost = 0.809524
random_page_cost = 0.809524
random_page_cost = 0.857143
random_page_cost = 0.884615
System 2:
Dual 1.2Ghz Athlon MP Processors, SMP enabled, 1 GB RAM, Database on Ultra
SCSI RAID 5 with Hardware controller.
random_page_cost = 0.894737
random_page_cost = 0.842105
random_page_cost = 0.894737
random_page_cost = 0.894737
random_page_cost = 0.842105
random_page_cost = 0.894737
I was surprised that the SCSI RAID drive is generally slower than IDE, but
the values are in line with the results that others have been getting.
-Nick
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Bruce Momjian
Sent: Monday, September 09, 2002 1:14 AM
To: PostgreSQL-development
Subject: Re: [HACKERS] Script to compute random page cost

OK, turns out that the loop for sequential scan ran fewer times and was
skewing the numbers. I have a new version at:
ftp://candle.pha.pa.us/pub/postgresql/randcost
I get _much_ lower numbers now for random_page_cost.
I'm getting an infinite wait on that file, could someone post it to the
list please?
On Mon, 9 Sep 2002, Bruce Momjian wrote:
OK, turns out that the loop for sequential scan ran fewer times and was
skewing the numbers. I have a new version at:
ftp://candle.pha.pa.us/pub/postgresql/randcost
I get _much_ lower numbers now for random_page_cost.
Hi again-
I bounced these numbers off of Ray Ontko here at our shop, and he pointed
out that random page cost is measured in multiples of a sequential page
fetch. It seems almost impossible that a random fetch would be less
expensive than a sequential fetch, yet we all seem to be getting results <
1. I can't see anything obviously wrong with the script, but something very
odd is going on.
-Nick
"Nick Fankhauser" <nickf@ontko.com> writes:
I bounced these numbers off of Ray Ontko here at our shop, and he pointed
out that random page cost is measured in multiples of a sequential page
fetch. It seems almost impossible that a random fetch would be less
expensive than a sequential fetch, yet we all seem to be getting results <
1. I can't see anything obviously wrong with the script, but something very
odd is going on.
The big problem with the script is that it involves an invocation of
"dd" --- hence, at least one process fork --- for every page read
operation. The seqscan part of the test is even worse, as it adds a
test(1) call and a shell if/then/else to the overhead. My guess is that
we are measuring script overhead here, and not the desired I/O quantities
at all --- the script overhead is completely swamping the latter. The
apparent stability of the results across a number of different platforms
bolsters that thought.
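The script-overhead point is easy to check directly: time dd invocations that copy zero blocks, so whatever elapses is pure fork/exec cost rather than I/O. A rough sketch; the invocation count is an arbitrary choice:

```shell
# Time N dd invocations that read nothing: any elapsed time is
# process-startup overhead, not I/O. N is arbitrary.
N=200
I=0
START=`date '+%s'`
while [ "$I" -lt "$N" ]
do
	dd if=/dev/null of=/dev/null count=0 2>/dev/null
	I=`expr "$I" + 1`
done
STOP=`date '+%s'`
echo "null loop: `expr "$STOP" - "$START"` seconds for $N dd calls"
```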
Someone else opined that the script was also not comparing equal
numbers of pages read for the random and sequential cases. I haven't
tried to decipher the logic enough to see if that allegation is true,
but it's not obviously false.
Finally, I wouldn't believe the results for a moment if they were taken
against databases that are not several times the size of physical RAM
on the test machine, with a total I/O volume also much more than
physical RAM. We are trying to measure the behavior when kernel
caching is not helpful; if the database fits in RAM then you are just
naturally going to get random_page_cost close to 1, because the kernel
will avoid doing any I/O at all.
regards, tom lane
Nick Fankhauser wrote:
Hi again-
I bounced these numbers off of Ray Ontko here at our shop, and he pointed
out that random page cost is measured in multiples of a sequential page
fetch. It seems almost impossible that a random fetch would be less
expensive than a sequential fetch, yet we all seem to be getting results <
1. I can't see anything obviously wrong with the script, but something very
odd is going on.
OK, new version at:
ftp://candle.pha.pa.us/pub/postgresql/randcost
What I have done is to take all of the computation stuff out of the
timed loop so only the 'dd' is done in the loop.
I am getting a 1.0 for random pages cost with this new code, but I don't
have much data in the database so it is very possible I have it all
cached. Would others please test it?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Mon, 9 Sep 2002, Tom Lane wrote:
Finally, I wouldn't believe the results for a moment if they were taken
against databases that are not several times the size of physical RAM
on the test machine, with a total I/O volume also much more than
physical RAM. We are trying to measure the behavior when kernel
caching is not helpful; if the database fits in RAM then you are just
naturally going to get random_page_cost close to 1, because the kernel
will avoid doing any I/O at all.
Um...yeah; another reason to use randread against a raw disk device.
(A little hard to use on linux systems, I bet, but works fine on
BSD systems.)
cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
Curt Sampson <cjs@cynic.net> writes:
On Mon, 9 Sep 2002, Tom Lane wrote:
... We are trying to measure the behavior when kernel
caching is not helpful; if the database fits in RAM then you are just
naturally going to get random_page_cost close to 1, because the kernel
will avoid doing any I/O at all.
Um...yeah; another reason to use randread against a raw disk device.
(A little hard to use on linux systems, I bet, but works fine on
BSD systems.)
Umm... not really; surely randread wouldn't know anything about
read-ahead logic?
The reason this is a difficult topic is that we are trying to measure
certain kernel behaviors --- namely readahead for sequential reads ---
and not others --- namely caching, because we have other parameters
of the cost models that purport to deal with that.
Mebbe this is an impossible task and we need to restructure the cost
models from the ground up. But I'm not convinced of that. The fact
that a one-page shell script can't measure the desired quantity doesn't
mean we can't measure it with more effort.
regards, tom lane
On Mon, 9 Sep 2002, Tom Lane wrote:
Curt Sampson <cjs@cynic.net> writes:
On Mon, 9 Sep 2002, Tom Lane wrote:
... We are trying to measure the behavior when kernel
caching is not helpful; if the database fits in RAM then you are just
naturally going to get random_page_cost close to 1, because the kernel
will avoid doing any I/O at all.

Um...yeah; another reason to use randread against a raw disk device.
(A little hard to use on linux systems, I bet, but works fine on
BSD systems.)

Umm... not really; surely randread wouldn't know anything about
read-ahead logic?
Randread doesn't know anything about read-ahead logic, but I don't
see how that matters one way or the other. The chances of it reading
blocks sequentially are pretty much infinitesimal if you're reading
across a reasonably large area of disk (I recommend at least 4GB),
so readahead will never be triggered.
The reason this is a difficult topic is that we are trying to measure
certain kernel behaviors --- namely readahead for sequential reads ---
and not others --- namely caching, because we have other parameters
of the cost models that purport to deal with that.
Well, for the sequential reads, the readahead should be triggered
even when reading from a raw device. So just use dd to measure
that. If you want to model postgres' behaviour slightly more
accurately, you probably want to pick a random area of the disk,
read a gigabyte, switch areas, read another gigabyte, and so on.
This will model the "split into 1GB files" thing.
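The read-a-chunk-then-jump pattern described above can be modelled against an ordinary file; a raw device with 1GB chunks is the real target, and the scratch file name and much smaller sizes below are stand-ins purely to show the access pattern:

```shell
# Model of the pattern: stream one whole chunk sequentially, then
# jump to another random chunk. A raw device with 1GB chunks is
# the real target; a 4MB scratch file with 512KB chunks stands in.
FILE=/tmp/seqmodel.scratch
CHUNK=64                                # pages per chunk (64 * 8192 = 512KB)
NCHUNKS=8
dd if=/dev/zero of="$FILE" bs=8192 count=`expr "$CHUNK" \* "$NCHUNKS"` 2>/dev/null

for C in `awk 'BEGIN {srand(); for (i = 0; i < 4; i++)
              print int(rand() * '"$NCHUNKS"')}'`
do
	# Stream one chunk sequentially, then the loop jumps elsewhere.
	dd if="$FILE" of=/dev/null bs=8192 count="$CHUNK" \
		skip=`expr "$C" \* "$CHUNK"` 2>/dev/null
done
rm -f "$FILE"
echo "streamed 4 random chunks sequentially"
```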
cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
OK, I have a better version at:
ftp://candle.pha.pa.us/pub/postgresql/randcost
I have added a null loop which does a dd on a single file without
reading any data, and by netting that loop out of the total computation
and increasing the number of tests, I have gotten the following results
for three runs:
random test: 36
sequential test: 33
null timing test: 27
random_page_cost = 1.500000
random test: 38
sequential test: 32
null timing test: 27
random_page_cost = 2.200000
random test: 40
sequential test: 31
null timing test: 27
random_page_cost = 3.250000
Interesting that random time is increasing, while the others were
stable. I think this may have to do with other system activity at the
time of the test. I will run it some more tomorrow but clearly we are
seeing reasonable numbers now.
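Assuming the new version nets the null-loop time out of both measurements, the printed values are consistent with (random - null) / (sequential - null). Checking the three runs above:

```shell
# Check: (random - null) / (sequential - null) reproduces the
# reported random_page_cost for each of the three runs.
for RUN in "36 33 27" "38 32 27" "40 31 27"
do
	set -- $RUN
	awk 'BEGIN {printf "random_page_cost = %f\n",
	            ('"$1"' - '"$3"') / ('"$2"' - '"$3"')}'
done
# prints 1.500000, 2.200000, 3.250000
```

Chris's figures further down (13, 15, 11) fit the same formula, giving 0.5.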
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
I got somewhat different:
$ ./randcost /usr/local/pgsql/data
Collecting sizing information ...
Running random access timing test ...
Running sequential access timing test ...
Running null loop timing test ...
random test: 13
sequential test: 15
null timing test: 11
random_page_cost = 0.500000
Chris
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Bruce Momjian
Sent: Tuesday, 10 September 2002 2:02 PM
To: Curt Sampson
Cc: Tom Lane; nickf@ontko.com; PostgreSQL-development; Ray Ontko
Subject: Re: [HACKERS] Script to compute random page cost

OK, I have a better version at:
ftp://candle.pha.pa.us/pub/postgresql/randcost
On Tue, 10 Sep 2002, Bruce Momjian wrote:
Interesting that random time is increasing, while the others were
stable. I think this may have to do with other system activity at the
time of the test.
Actually, the random versus sequential time may also be different
depending on how many processes are competing for disk access, as
well. If the OS isn't maintaining readahead for whatever reason,
sequential access could, in theory, degrade to being the same speed
as random access. It might be interesting to test this, too.
cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC