Maximum transaction rate
I'm using postgresql 8.3.6 through JDBC, and trying to measure the maximum
transaction rate on a given Linux box. I wrote a test program that:
- Creates a table with two int columns and no indexes,
- loads the table through a configurable number of threads, with each
transaction writing one row and then committing, (auto commit is
false), and
- reports transactions/sec.
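A back-of-the-envelope check shows why rates in the thousands look wrong for this workload. Assuming the box has a single commodity 7200 RPM drive (an assumption, not stated in the thread), each honestly synced commit has to wait for the platter:

```python
# Sanity check: an honest synchronous commit rate is bounded by disk rotation.
# Assumption (not stated in the thread): a commodity 7200 RPM drive, no
# battery-backed cache.
rpm = 7200
rotations_per_sec = rpm / 60        # 120 rotations per second

# With fsync honored and the drive's write cache off, each commit waits for
# the WAL block to reach the platter, so a single thread gets at most
# roughly one commit per rotation.
max_tps_single_thread = rotations_per_sec

print(f"~{max_tps_single_thread:.0f} commits/sec upper bound per thread")
```

Anything well above that per thread, on such a disk, means some layer is acknowledging writes before they are durable.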
The postgres configuration regarding syncing is standard: fsync = on,
synchronous_commit = on, wal_sync_method = fsync. My linux kernel is
2.6.27.19-78.2.30.fc9.i686.
The transaction rates I'm getting seem way too high: 2800-2900 with one thread,
5000-7000 with ten threads. I'm guessing that writes aren't really reaching the
disk. Can someone suggest how to figure out where, below postgres, someone is
lying about writes reaching the disk?
Jack
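One way to look below Postgres is to take it out of the picture entirely and time raw write+fsync round-trips on the same filesystem. This is only a sketch (the file location, write size, and iteration count are arbitrary), but if it also reports thousands of syncs per second on a single spinning disk, some layer under the OS is absorbing the flush:

```python
import os
import time
import tempfile

def fsync_rate(iterations=500):
    """Measure write+fsync round-trips per second on the current filesystem."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.time()
        for _ in range(iterations):
            os.write(fd, b"x" * 512)   # small write, like a WAL record
            os.fsync(fd)               # ask the OS to push it to stable storage
        elapsed = time.time() - start
        return iterations / elapsed
    finally:
        os.close(fd)
        os.unlink(path)

print(f"{fsync_rate():.0f} write+fsync round-trips/sec")
```

On an honest single 7200 RPM disk this should come out on the order of a hundred per second; thousands suggest a cache somewhere is lying about durability.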
Jack Orenstein <jack.orenstein@hds.com> writes:
The transaction rates I'm getting seem way too high: 2800-2900 with
one thread, 5000-7000 with ten threads. I'm guessing that writes
aren't really reaching the disk. Can someone suggest how to figure out
where, below postgres, someone is lying about writes reaching the
disk?
AFAIK there are two trouble sources in recent Linux machines: LVM and
the disk drive itself. LVM is apparently broken by design --- it simply
fails to pass fsync requests. If you're using it you have to stop.
(Which sucks, because it's exactly the kind of thing DBAs tend to want.)
Otherwise you need to reconfigure your drive to not cache writes.
I forget the incantation for that but it's in the PG list archives.
regards, tom lane
On Fri, 6 Mar 2009, Tom Lane wrote:
Otherwise you need to reconfigure your drive to not cache writes.
I forget the incantation for that but it's in the PG list archives.
There's a discussion of this in the docs now,
http://www.postgresql.org/docs/8.3/interactive/wal-reliability.html
hdparm -I lets you check whether write caching is on, and hdparm -W lets you
toggle it off. That's for ATA disks; for SCSI disks you can use sdparm
instead, though for those it's usually something you adjust more permanently
in the controller card's configuration or BIOS.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Fri, 6 Mar 2009, Greg Smith wrote:
On Fri, 6 Mar 2009, Tom Lane wrote:
Otherwise you need to reconfigure your drive to not cache writes.
I forget the incantation for that but it's in the PG list archives.
There's a discussion of this in the docs now,
http://www.postgresql.org/docs/8.3/interactive/wal-reliability.html
How does turning off write caching on the disk stop the problem with LVM?
It still seems like you have to get the data out of the OS buffer, and if
fsync() doesn't do that for you....
On Fri, Mar 6, 2009 at 2:22 PM, Ben Chobot <bench@silentmedia.com> wrote:
On Fri, 6 Mar 2009, Greg Smith wrote:
On Fri, 6 Mar 2009, Tom Lane wrote:
Otherwise you need to reconfigure your drive to not cache writes.
I forget the incantation for that but it's in the PG list archives.
There's a discussion of this in the docs now,
http://www.postgresql.org/docs/8.3/interactive/wal-reliability.html
How does turning off write caching on the disk stop the problem with LVM? It
still seems like you have to get the data out of the OS buffer, and if
fsync() doesn't do that for you....
I think he was saying otherwise (if you're not using LVM and you still
have this super high transaction rate) you'll need to turn off the
drive's write caches. I kinda wondered at it for a second too.
On Fri, 6 Mar 2009, Ben Chobot wrote:
How does turning off write caching on the disk stop the problem with LVM?
It doesn't. Linux LVM is awful and broken, I was just suggesting more
details on what you still need to check even when it's not involved.
Scott Marlowe wrote:
On Fri, Mar 6, 2009 at 2:22 PM, Ben Chobot <bench@silentmedia.com> wrote:
On Fri, 6 Mar 2009, Greg Smith wrote:
On Fri, 6 Mar 2009, Tom Lane wrote:
Otherwise you need to reconfigure your drive to not cache writes.
I forget the incantation for that but it's in the PG list archives.
There's a discussion of this in the docs now,
http://www.postgresql.org/docs/8.3/interactive/wal-reliability.html
How does turning off write caching on the disk stop the problem with LVM? It
still seems like you have to get the data out of the OS buffer, and if
fsync() doesn't do that for you....
I think he was saying otherwise (if you're not using LVM and you still
have this super high transaction rate) you'll need to turn off the
drive's write caches. I kinda wondered at it for a second too.
And I'm still wondering. The problem with LVM, AFAIK, is missing support
for write barriers. Once you disable the write-back cache on the disk,
you no longer need write barriers. So I'm missing something, what else
does LVM do to break fsync()?
It was my understanding that disabling disk caches was enough.
.TM.
Marco Colombo <pgsql@esiway.net> writes:
And I'm still wondering. The problem with LVM, AFAIK, is missing support
for write barriers. Once you disable the write-back cache on the disk,
you no longer need write barriers. So I'm missing something, what else
does LVM do to break fsync()?
I think you're imagining that the disk hardware is the only source of
write reordering, which isn't the case ... various layers in the kernel
can reorder operations before they get sent to the disk.
regards, tom lane
Tom Lane wrote:
Marco Colombo <pgsql@esiway.net> writes:
And I'm still wondering. The problem with LVM, AFAIK, is missing support
for write barriers. Once you disable the write-back cache on the disk,
you no longer need write barriers. So I'm missing something, what else
does LVM do to break fsync()?
I think you're imagining that the disk hardware is the only source of
write reordering, which isn't the case ... various layers in the kernel
can reorder operations before they get sent to the disk.
regards, tom lane
You mean some layer (LVM) is lying about the fsync()?
write(A);
fsync();
...
write(B);
fsync();
...
write(C);
fsync();
you mean that the process may be awakened after the first fsync() while
A is still somewhere in OS buffers and not sent to disk yet, so it's
possible that B gets to the disk BEFORE A. And if the system crashes,
A never hits the platters while B (possibly) does. Is it this you
mean by "write reordering"?
But doesn't this break any application with transactional-like behavior,
such as sendmail? The problem being 3rd parties, if sendmail declares
"ok, I saved the message" (*after* a fsync()) to the SMTP client,
it's actually lying 'cause the message hasn't hit the platters yet.
Same applies to IMAP/POP server, say. Well, it applies to anything
using fsync().
I mean, all this with disk caches in write-thru modes? It's the OS
lying, not the disks?
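The pattern Marco describes, acknowledging only after fsync() returns, is the standard durable-accept idiom for mail servers and anything transactional. A minimal sketch of it (the function and file names are illustrative, not sendmail's actual code):

```python
import os

def accept_message(spool_dir, name, payload):
    """Durably store a message before acknowledging it, sendmail-style.
    If any layer below fsync() lies, the 'OK' can precede the platters."""
    path = os.path.join(spool_dir, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)            # the message contents must be stable...
    finally:
        os.close(fd)
    # ...and so must the directory entry that names the file.
    dfd = os.open(spool_dir, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
    return "250 OK"             # only now is it safe to tell the client
```

If fsync() silently returns before the data is on stable storage, every application built on this idiom gives the same false acknowledgement.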
Wait, this breaks all journaled FSes as well, a DM device is just
a block device to them, if it's lying about synchronous writes the
whole purpose of the journal is defeated... I find it hard to
believe, I have to say.
.TM.
Marco Colombo <pgsql@esiway.net> writes:
You mean some layer (LVM) is lying about the fsync()?
Got it in one.
regards, tom lane
On Fri, 2009-03-13 at 14:00 -0400, Tom Lane wrote:
Marco Colombo <pgsql@esiway.net> writes:
You mean some layer (LVM) is lying about the fsync()?
Got it in one.
I wouldn't think this would be a problem with the proper battery backed
raid controller correct?
Joshua D. Drake
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
On Fri, 13 Mar 2009, Joshua D. Drake wrote:
On Fri, 2009-03-13 at 14:00 -0400, Tom Lane wrote:
Marco Colombo <pgsql@esiway.net> writes:
You mean some layer (LVM) is lying about the fsync()?
Got it in one.
I wouldn't think this would be a problem with the proper battery backed
raid controller correct?
It seems to me that all you get with a BBU-enabled card is the ability to
get bursts of writes out of the OS faster. So you still have the problem,
it's just less likely to be encountered.
On Fri, 2009-03-13 at 11:17 -0700, Ben Chobot wrote:
On Fri, 13 Mar 2009, Joshua D. Drake wrote:
On Fri, 2009-03-13 at 14:00 -0400, Tom Lane wrote:
Marco Colombo <pgsql@esiway.net> writes:
You mean some layer (LVM) is lying about the fsync()?
Got it in one.
I wouldn't think this would be a problem with the proper battery backed
raid controller correct?
It seems to me that all you get with a BBU-enabled card is the ability to
get bursts of writes out of the OS faster. So you still have the problem,
it's just less likely to be encountered.
A BBU controller is about more than that. It is also supposed to be
about data integrity. The ability to have unexpected outages and have
the drives stay consistent because the controller remembers the state
(if that is a reasonable way to put it).
Joshua D. Drake
On Fri, 13 Mar 2009, Joshua D. Drake wrote:
It seems to me that all you get with a BBU-enabled card is the ability to
get bursts of writes out of the OS faster. So you still have the problem,
it's just less likely to be encountered.
A BBU controller is about more than that. It is also supposed to be
about data integrity. The ability to have unexpected outages and have
the drives stay consistent because the controller remembers the state
(if that is a reasonable way to put it).
Of course. But if you can't reliably flush the OS buffers (because, say,
you're using LVM so fsync() doesn't work), then you can't say what
actually has made it to the safety of the raid card.
On Fri, 2009-03-13 at 11:41 -0700, Ben Chobot wrote:
On Fri, 13 Mar 2009, Joshua D. Drake wrote:
Of course. But if you can't reliably flush the OS buffers (because, say,
you're using LVM so fsync() doesn't work), then you can't say what
actually has made it to the safety of the raid card.
Good point. So the next question of course is, does EVMS do it right?
This is actually a pretty significant issue.
Joshua D. Drake
On Fri, 2009-03-13 at 11:41 -0700, Ben Chobot wrote:
On Fri, 13 Mar 2009, Joshua D. Drake wrote:
It seems to me that all you get with a BBU-enabled card is the ability to
get bursts of writes out of the OS faster. So you still have the problem,
it's just less likely to be encountered.
A BBU controller is about more than that. It is also supposed to be
about data integrity. The ability to have unexpected outages and have
the drives stay consistent because the controller remembers the state
(if that is a reasonable way to put it).
Of course. But if you can't reliably flush the OS buffers (because, say,
you're using LVM so fsync() doesn't work), then you can't say what
actually has made it to the safety of the raid card.
Wait, actually a good BBU RAID controller will disable the cache on the
drives. So everything that is cached is already on the controller rather
than on the drives themselves.
Or am I missing something?
Joshua D. Drake
On Mar 13, 2009, at 11:59 AM, Joshua D. Drake wrote:
Wait, actually a good BBU RAID controller will disable the cache on the
drives. So everything that is cached is already on the controller rather
than on the drives themselves.
Or am I missing something?
Maybe I'm missing something, but a BBU controller moves the "safe
point" from the platters to the controller, but it doesn't move it all
the way into the OS.
So, if the software calls fsync, but fsync doesn't actually push the
data to the controller, you are still at risk... right?
On Fri, Mar 13, 2009 at 1:09 PM, Christophe <xof@thebuild.com> wrote:
On Mar 13, 2009, at 11:59 AM, Joshua D. Drake wrote:
Wait, actually a good BBU RAID controller will disable the cache on the
drives. So everything that is cached is already on the controller rather
than on the drives themselves.
Or am I missing something?
Maybe I'm missing something, but a BBU controller moves the "safe point"
from the platters to the controller, but it doesn't move it all the way into
the OS.
So, if the software calls fsync, but fsync doesn't actually push the data to
the controller, you are still at risk... right?
Ding!
Scott Marlowe wrote:
On Fri, Mar 13, 2009 at 1:09 PM, Christophe <xof@thebuild.com> wrote:
So, if the software calls fsync, but fsync doesn't actually push the data to
the controller, you are still at risk... right?
Ding!
I've been doing some googling, and now I'm not sure that not supporting
barriers implies not supporting (or lying about) blkdev_issue_flush(). It
seems that it's pretty common (and well-defined) for block devices to report
-EOPNOTSUPP on BIO_RW_BARRIER requests; device mapper apparently falls into
this category.
See:
http://lkml.org/lkml/2007/5/25/71
this is an interesting discussion on barriers and flushing.
It seems to me that PostgreSQL needs both ordered and synchronous
writes, maybe at different times (not that EVERY write must be both ordered
and synchronous).
You can emulate ordered writes with synchronous ones, although at a price.
You can't emulate synchronous writes with just barriers.
OPTIMAL: write-barrier-write-barrier-write-barrier-flush
SUBOPTIMAL: write-flush-write-flush-write-flush
As I understand it, fsync() should always issue a real flush: it's unrelated
to the barriers issue.
There's no API to issue ordered writes (or barriers) at user level,
AFAIK. (Uhm... O_DIRECT, maybe implies that?)
FS code may internally issue barrier requests to the block device for
its own purposes (e.g. journal updates), but there's no userland API for
that.
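Since userland has only fsync(), ordering between writes can be guaranteed only by the SUBOPTIMAL pattern above: pay a full flush where a barrier would have sufficed. A minimal sketch of that emulation (function name and file layout are illustrative):

```python
import os

def ordered_writes(path, chunks):
    """Force on-disk ordering of chunks using only fsync():
    write-flush-write-flush, paying a full cache flush where a
    write barrier would have been enough."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for chunk in chunks:
            os.write(fd, chunk)
            os.fsync(fd)   # chunk N is durable before chunk N+1 is issued
    finally:
        os.close(fd)
```

This is the journal-style append pattern: record order survives a crash, but every record costs a synchronous flush, which is exactly why barriers are attractive when they work.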
Yet, there's no reference to DM not supporting flush correctly in the
whole thread... actually there are references to the opposite. DM devices
are defined as FLUSHABLE.
Also see:
http://lkml.org/lkml/2008/2/26/41
but it seems to me that all this discussion is under the assumption that
disks have write-back caches.
"The alternative is to disable the disk write cache." says it all.
.TM.
On Sat, 2009-03-14 at 05:25 +0100, Marco Colombo wrote:
Scott Marlowe wrote:
Also see:
http://lkml.org/lkml/2008/2/26/41
but it seems to me that all this discussion is under the assumption that
disks have write-back caches.
"The alternative is to disable the disk write cache." says it all.
If this applies to RAID controller cache as well, then performance is going
to completely tank for users of Linux + PostgreSQL on LVM.
Joshua D. Drake