Maximum transaction rate

Started by Jack Orensteinabout 17 years ago48 messagesgeneral
Jump to latest
#1Jack Orenstein
jack.orenstein@hds.com

I'm using postgresql 8.3.6 through JDBC, and trying to measure the maximum
transaction rate on a given Linux box. I wrote a test program that:

- Creates a table with two int columns and no indexes,

- loads the table through a configurable number of threads, with each
transaction writing one row and then committing, (auto commit is
false), and

- reports transactions/sec.

The postgres configuration regarding syncing is standard: fsync = on,
synchronous_commit = on, wal_sync_method = fsync. My linux kernel is
2.6.27.19-78.2.30.fc9.i686.

The transaction rates I'm getting seem way too high: 2800-2900 with one thread,
5000-7000 with ten threads. I'm guessing that writes aren't really reaching the
disk. Can someone suggest how to figure out where, below postgres, someone is
lying about writes reaching the disk?

Jack

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jack Orenstein (#1)
Re: Maximum transaction rate

Jack Orenstein <jack.orenstein@hds.com> writes:

The transaction rates I'm getting seem way too high: 2800-2900 with
one thread, 5000-7000 with ten threads. I'm guessing that writes
aren't really reaching the disk. Can someone suggest how to figure out
where, below postgres, someone is lying about writes reaching the
disk?

AFAIK there are two trouble sources in recent Linux machines: LVM and
the disk drive itself. LVM is apparently broken by design --- it simply
fails to pass fsync requests. If you're using it you have to stop.
(Which sucks, because it's exactly the kind of thing DBAs tend to want.)
Otherwise you need to reconfigure your drive to not cache writes.
I forget the incantation for that but it's in the PG list archives.

regards, tom lane

#3Greg Smith
gsmith@gregsmith.com
In reply to: Tom Lane (#2)
Re: Maximum transaction rate

On Fri, 6 Mar 2009, Tom Lane wrote:

Otherwise you need to reconfigure your drive to not cache writes.
I forget the incantation for that but it's in the PG list archives.

There's a dicussion of this in the docs now,
http://www.postgresql.org/docs/8.3/interactive/wal-reliability.html

hdparm -I lets you check if write caching is on, hdparm -W lets you toggle
it off. That's for ATA disks; SCSI ones can use sdparm instead, but
usually it's something you can adjust more permanently in the card
configuration or BIOS instead for those.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#4Ben
bench@silentmedia.com
In reply to: Greg Smith (#3)
Re: Maximum transaction rate

On Fri, 6 Mar 2009, Greg Smith wrote:

On Fri, 6 Mar 2009, Tom Lane wrote:

Otherwise you need to reconfigure your drive to not cache writes.
I forget the incantation for that but it's in the PG list archives.

There's a dicussion of this in the docs now,
http://www.postgresql.org/docs/8.3/interactive/wal-reliability.html

How does turning off write caching on the disk stop the problem with LVM?
It still seems like you have to get the data out of the OS buffer, and if
fsync() doesn't do that for you....

#5Scott Marlowe
scott.marlowe@gmail.com
In reply to: Ben (#4)
Re: Maximum transaction rate

On Fri, Mar 6, 2009 at 2:22 PM, Ben Chobot <bench@silentmedia.com> wrote:

On Fri, 6 Mar 2009, Greg Smith wrote:

On Fri, 6 Mar 2009, Tom Lane wrote:

 Otherwise you need to reconfigure your drive to not cache writes.
 I forget the incantation for that but it's in the PG list archives.

There's a dicussion of this in the docs now,
http://www.postgresql.org/docs/8.3/interactive/wal-reliability.html

How does turning off write caching on the disk stop the problem with LVM? It
still seems like you have to get the data out of the OS buffer, and if
fsync() doesn't do that for you....

I think he was saying otherwise (if you're not using LVM and you still
have this super high transaction rate) you'll need to turn off the
drive's write caches. I kinda wondered at it for a second too.

#6Greg Smith
gsmith@gregsmith.com
In reply to: Ben (#4)
Re: Maximum transaction rate

On Fri, 6 Mar 2009, Ben Chobot wrote:

How does turning off write caching on the disk stop the problem with LVM?

It doesn't. Linux LVM is awful and broken, I was just suggesting more
details on what you still need to check even when it's not involved.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#7Marco Colombo
pgsql@esiway.net
In reply to: Scott Marlowe (#5)
Re: Maximum transaction rate

Scott Marlowe wrote:

On Fri, Mar 6, 2009 at 2:22 PM, Ben Chobot <bench@silentmedia.com> wrote:

On Fri, 6 Mar 2009, Greg Smith wrote:

On Fri, 6 Mar 2009, Tom Lane wrote:

Otherwise you need to reconfigure your drive to not cache writes.
I forget the incantation for that but it's in the PG list archives.

There's a dicussion of this in the docs now,
http://www.postgresql.org/docs/8.3/interactive/wal-reliability.html

How does turning off write caching on the disk stop the problem with LVM? It
still seems like you have to get the data out of the OS buffer, and if
fsync() doesn't do that for you....

I think he was saying otherwise (if you're not using LVM and you still
have this super high transaction rate) you'll need to turn off the
drive's write caches. I kinda wondered at it for a second too.

And I'm still wondering. The problem with LVM, AFAIK, is missing support
for write barriers. Once you disable the write-back cache on the disk,
you no longer need write barriers. So I'm missing something, what else
does LVM do to break fsync()?

It was my understanding that disabling disk caches was enough.

.TM.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marco Colombo (#7)
Re: Maximum transaction rate

Marco Colombo <pgsql@esiway.net> writes:

And I'm still wondering. The problem with LVM, AFAIK, is missing support
for write barriers. Once you disable the write-back cache on the disk,
you no longer need write barriers. So I'm missing something, what else
does LVM do to break fsync()?

I think you're imagining that the disk hardware is the only source of
write reordering, which isn't the case ... various layers in the kernel
can reorder operations before they get sent to the disk.

regards, tom lane

#9Marco Colombo
pgsql@esiway.net
In reply to: Tom Lane (#8)
Re: Maximum transaction rate

Tom Lane wrote:

Marco Colombo <pgsql@esiway.net> writes:

And I'm still wondering. The problem with LVM, AFAIK, is missing support
for write barriers. Once you disable the write-back cache on the disk,
you no longer need write barriers. So I'm missing something, what else
does LVM do to break fsync()?

I think you're imagining that the disk hardware is the only source of
write reordering, which isn't the case ... various layers in the kernel
can reorder operations before they get sent to the disk.

regards, tom lane

You mean some layer (LVM) is lying about the fsync()?

write(A);
fsync();
...
write(B);
fsync();
...
write(C);
fsync();

you mean that the process may be awakened after the first fsync() while
A is still somewhere in OS buffers and not sent to disk yet, so it's
possible that B gets to the disk BEFORE A. And if the system crashes,
A never hits the platters while B (possibly) does. Is it this you
mean by "write reodering"?

But doesn't this break any application with transactional-like behavior,
such as sendmail? The problem being 3rd parties, if sendmail declares
"ok, I saved the message" (*after* a fsync()) to the SMTP client,
it's actually lying 'cause the message hasn't hit the platters yet.
Same applies to IMAP/POP server, say. Well, it applies to anything
using fsync().

I mean, all this with disk caches in write-thru modes? It's the OS
lying, not the disks?

Wait, this breaks all journaled FSes as well, a DM device is just
a block device to them, if it's lying about synchronous writes the
whole purpose of the journal is defeated... I find it hard to
believe, I have to say.

.TM.

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marco Colombo (#9)
Re: Maximum transaction rate

Marco Colombo <pgsql@esiway.net> writes:

You mean some layer (LVM) is lying about the fsync()?

Got it in one.

regards, tom lane

#11Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#10)
Re: Maximum transaction rate

On Fri, 2009-03-13 at 14:00 -0400, Tom Lane wrote:

Marco Colombo <pgsql@esiway.net> writes:

You mean some layer (LVM) is lying about the fsync()?

Got it in one.

I wouldn't think this would be a problem with the proper battery backed
raid controller correct?

Joshua D. Drake

regards, tom lane

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

#12Ben
bench@silentmedia.com
In reply to: Joshua D. Drake (#11)
Re: Maximum transaction rate

On Fri, 13 Mar 2009, Joshua D. Drake wrote:

On Fri, 2009-03-13 at 14:00 -0400, Tom Lane wrote:

Marco Colombo <pgsql@esiway.net> writes:

You mean some layer (LVM) is lying about the fsync()?

Got it in one.

I wouldn't think this would be a problem with the proper battery backed
raid controller correct?

It seems to me that all you get with a BBU-enabled card is the ability to
get burts of writes out of the OS faster. So you still have the problem,
it's just less like to be encountered.

#13Joshua D. Drake
jd@commandprompt.com
In reply to: Ben (#12)
Re: Maximum transaction rate

On Fri, 2009-03-13 at 11:17 -0700, Ben Chobot wrote:

On Fri, 13 Mar 2009, Joshua D. Drake wrote:

On Fri, 2009-03-13 at 14:00 -0400, Tom Lane wrote:

Marco Colombo <pgsql@esiway.net> writes:

You mean some layer (LVM) is lying about the fsync()?

Got it in one.

I wouldn't think this would be a problem with the proper battery backed
raid controller correct?

It seems to me that all you get with a BBU-enabled card is the ability to
get burts of writes out of the OS faster. So you still have the problem,
it's just less like to be encountered.

A BBU controller is about more than that. It is also supposed to be
about data integrity. The ability to have unexpected outages and have
the drives stay consistent because the controller remembers the state
(if that is a reasonable way to put it).

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

#14Ben
bench@silentmedia.com
In reply to: Joshua D. Drake (#13)
Re: Maximum transaction rate

On Fri, 13 Mar 2009, Joshua D. Drake wrote:

It seems to me that all you get with a BBU-enabled card is the ability to
get burts of writes out of the OS faster. So you still have the problem,
it's just less like to be encountered.

A BBU controller is about more than that. It is also supposed to be
about data integrity. The ability to have unexpected outages and have
the drives stay consistent because the controller remembers the state
(if that is a reasonable way to put it).

Of course. But if you can't reliably flush the OS buffers (because, say,
you're using LVM so fsync() doesn't work), then you can't say what
actually has made it to the safety of the raid card.

#15Joshua D. Drake
jd@commandprompt.com
In reply to: Ben (#14)
Re: Maximum transaction rate

On Fri, 2009-03-13 at 11:41 -0700, Ben Chobot wrote:

On Fri, 13 Mar 2009, Joshua D. Drake wrote:

Of course. But if you can't reliably flush the OS buffers (because, say,
you're using LVM so fsync() doesn't work), then you can't say what
actually has made it to the safety of the raid card.

Good point. So the next question of course is, does EVMS do it right?

http://evms.sourceforge.net/

This is actually a pretty significant issue.

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

#16Joshua D. Drake
jd@commandprompt.com
In reply to: Ben (#14)
Re: Maximum transaction rate

On Fri, 2009-03-13 at 11:41 -0700, Ben Chobot wrote:

On Fri, 13 Mar 2009, Joshua D. Drake wrote:

It seems to me that all you get with a BBU-enabled card is the ability to
get burts of writes out of the OS faster. So you still have the problem,
it's just less like to be encountered.

A BBU controller is about more than that. It is also supposed to be
about data integrity. The ability to have unexpected outages and have
the drives stay consistent because the controller remembers the state
(if that is a reasonable way to put it).

Of course. But if you can't reliably flush the OS buffers (because, say,
you're using LVM so fsync() doesn't work), then you can't say what
actually has made it to the safety of the raid card.

Wait, actually a good BBU RAID controller will disable the cache on the
drives. So everything that is cached is already on the controller vs.
the drives itself.

Or am I missing something?

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

#17Christophe Pettus
xof@thebuild.com
In reply to: Joshua D. Drake (#16)
Re: Maximum transaction rate

On Mar 13, 2009, at 11:59 AM, Joshua D. Drake wrote:

Wait, actually a good BBU RAID controller will disable the cache on
the
drives. So everything that is cached is already on the controller vs.
the drives itself.

Or am I missing something?

Maybe I'm missing something, but a BBU controller moves the "safe
point" from the platters to the controller, but it doesn't move it all
the way into the OS.

So, if the software calls fsync, but fsync doesn't actually push the
data to the controller, you are still at risk... right?

#18Scott Marlowe
scott.marlowe@gmail.com
In reply to: Christophe Pettus (#17)
Re: Maximum transaction rate

On Fri, Mar 13, 2009 at 1:09 PM, Christophe <xof@thebuild.com> wrote:

On Mar 13, 2009, at 11:59 AM, Joshua D. Drake wrote:

Wait, actually a good BBU RAID controller will disable the cache on the
drives. So everything that is cached is already on the controller vs.
the drives itself.

Or am I missing something?

Maybe I'm missing something, but a BBU controller moves the "safe point"
from the platters to the controller, but it doesn't move it all the way into
the OS.

So, if the software calls fsync, but fsync doesn't actually push the data to
the controller, you are still at risk... right?

Ding!

#19Marco Colombo
pgsql@esiway.net
In reply to: Scott Marlowe (#18)
Re: Maximum transaction rate

Scott Marlowe wrote:

On Fri, Mar 13, 2009 at 1:09 PM, Christophe <xof@thebuild.com> wrote:

So, if the software calls fsync, but fsync doesn't actually push the data to
the controller, you are still at risk... right?

Ding!

I've been doing some googling, now I'm not sure that not supporting barriers
implies not supporting (of lying) at blkdev_issue_flush(). It seems that
it's pretty common (and well-defined) for block devices to report
-EOPNOTSUPP at BIO_RW_BARRIER requests. device mapper apparently falls in
this category.

See:
http://lkml.org/lkml/2007/5/25/71
this is an interesting discussion on barriers and flushing.

It seems to me that PostgreSQL needs both ordered and synchronous
writes, maybe at different times (not that EVERY write must be both ordered
and synchronous).

You can emulate ordered with single+synchronous althought with a price.
You can't emulate synchronous writes with just barriers.

OPTIMAL: write-barrier-write-barrier-write-barrier-flush

SUBOPTIMAL: write-flush-write-flush-write-flush

As I understand it, fsync() should always issue a real flush: it's unrelated
to the barriers issue.
There's no API to issue ordered writes (or barriers) at user level,
AFAIK. (Uhm... O_DIRECT, maybe implies that?)

FS code may internally issue barrier requests to the block device, for
its own purposes (e.g. journal updates), but there's not useland API for
that.

Yet, there's no reference to DM not supporting flush correctly in the
whole thread... actually there are refereces to the opposite. DM devices
are defined as FLUSHABLE.

Also see:
http://lkml.org/lkml/2008/2/26/41
but it seems to me that all this discussion is under the assuption that
disks have write-back caches.
"The alternative is to disable the disk write cache." says it all.

.TM.

#20Joshua D. Drake
jd@commandprompt.com
In reply to: Marco Colombo (#19)
Re: Maximum transaction rate

On Sat, 2009-03-14 at 05:25 +0100, Marco Colombo wrote:

Scott Marlowe wrote:

Also see:
http://lkml.org/lkml/2008/2/26/41
but it seems to me that all this discussion is under the assuption that
disks have write-back caches.
"The alternative is to disable the disk write cache." says it all.

If this applies to raid based cache as well then performance is going to
completely tank. For users of Linux + PostgreSQL using LVM.

Joshua D. Drake

.TM.

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

#21Marco Colombo
pgsql@esiway.net
In reply to: Joshua D. Drake (#20)
#22Joshua D. Drake
jd@commandprompt.com
In reply to: Marco Colombo (#21)
#23Marco Colombo
pgsql@esiway.net
In reply to: Joshua D. Drake (#22)
#24Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#2)
#25Scott Marlowe
scott.marlowe@gmail.com
In reply to: Stefan Kaltenbrunner (#24)
#26John R Pierce
pierce@hogranch.com
In reply to: Stefan Kaltenbrunner (#24)
#27Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Scott Marlowe (#25)
#28Marco Colombo
pgsql@esiway.net
In reply to: John R Pierce (#26)
#29Greg Smith
gsmith@gregsmith.com
In reply to: Marco Colombo (#28)
#30Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Greg Smith (#29)
#31Greg Smith
gsmith@gregsmith.com
In reply to: Ron Mayer (#30)
#32Marco Colombo
pgsql@esiway.net
In reply to: Greg Smith (#29)
#33Marco Colombo
pgsql@esiway.net
In reply to: Ron Mayer (#30)
#34Greg Smith
gsmith@gregsmith.com
In reply to: Marco Colombo (#32)
#35Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Marco Colombo (#33)
#36Marco Colombo
pgsql@esiway.net
In reply to: Greg Smith (#34)
#37Martijn van Oosterhout
kleptog@svana.org
In reply to: Marco Colombo (#36)
#38Marco Colombo
pgsql@esiway.net
In reply to: Ron Mayer (#35)
#39Greg Smith
gsmith@gregsmith.com
In reply to: Martijn van Oosterhout (#37)
#40Marco Colombo
pgsql@esiway.net
In reply to: Martijn van Oosterhout (#37)
#41Joshua D. Drake
jd@commandprompt.com
In reply to: Marco Colombo (#40)
#42Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Marco Colombo (#38)
#43Baron Schwartz
baron@xaprb.com
In reply to: Ron Mayer (#42)
#44Martijn van Oosterhout
kleptog@svana.org
In reply to: Marco Colombo (#40)
#45Marco Colombo
pgsql@esiway.net
In reply to: Martijn van Oosterhout (#44)
#46Marco Colombo
pgsql@esiway.net
In reply to: Ron Mayer (#42)
#47Markus Wanner
markus@bluegap.ch
In reply to: Martijn van Oosterhout (#44)
#48Marco Colombo
pgsql@esiway.net
In reply to: Markus Wanner (#47)