PostgreSQL on 64 bit Linux

Started by Naz Gassiep over 19 years ago · 61 messages
#1Naz Gassiep
naz@mira.net

I have a PostgreSQL installation on a Debian box that had the 64bit SMP
kernel installed before PostgreSQL was compiled and installed on it.
Does PostgreSQL take any advantage of the 64 bit environment or have we
not done anything to move into the 64 bit world yet?
Regards,
- Naz

#2Douglas McNaught
doug@mcnaught.org
In reply to: Naz Gassiep (#1)
Re: PostgreSQL on 64 bit Linux

Naz Gassiep <naz@mira.net> writes:

I have a PostgreSQL installation on a Debian box that had the 64bit
SMP kernel installed before PostgreSQL was compiled and installed on
it. Does PostgreSQL take any advantage of the 64 bit environment or
have we not done anything to move into the 64 bit world yet?

Depends on whether PG was compiled as 64-bit or 32-bit--is your
toolchain 64-bit all the way, or is it just the kernel?

-Doug

#3Naz Gassiep
naz@mira.net
In reply to: Douglas McNaught (#2)
Re: PostgreSQL on 64 bit Linux

Douglas McNaught wrote:

Naz Gassiep <naz@mira.net> writes:

I have a PostgreSQL installation on a Debian box that had the 64bit
SMP kernel installed before PostgreSQL was compiled and installed on
it. Does PostgreSQL take any advantage of the 64 bit environment or
have we not done anything to move into the 64 bit world yet?

Depends on whether PG was compiled as 64-bit or 32-bit--is your
toolchain 64-bit all the way, or is it just the kernel?

-Doug

I just compiled as the manual says. I guess I must have compiled it in
32. I'll recompile in 64 when I upgrade to 8.2 when it's out.
Thanks,
- Naz.

#4Douglas McNaught
doug@mcnaught.org
In reply to: Naz Gassiep (#3)
Re: PostgreSQL on 64 bit Linux

Naz Gassiep <naz@mira.net> writes:

I just compiled as the manual says. I guess I must have compiled it
in 32. I'll recompile in 64 when I upgrade to 8.2 when it's out.

The 'file' command will tell you whether a binary is 32- or 64-bit.
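
For example (illustrative output; the path and exact wording vary by
install and by 'file' version):

    $ file `which postgres`
    /usr/local/pgsql/bin/postgres: ELF 64-bit LSB executable, x86-64 ...

A 32-bit build would say "ELF 32-bit LSB executable, Intel 80386"
instead.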

If you have a full 64-bit install, you'll get a 64-bit compile by
default, but it sounds like you just added a 64-bit kernel to a 32-bit
Debian system?

-Doug

#5Noname
mark@mark.mielke.cc
In reply to: Douglas McNaught (#2)
Re: PostgreSQL on 64 bit Linux

On Sun, Aug 20, 2006 at 04:46:30PM -0400, Douglas McNaught wrote:

Naz Gassiep <naz@mira.net> writes:

I have a PostgreSQL installation on a Debian box that had the 64bit
SMP kernel installed before PostgreSQL was compiled and installed on
it. Does PostgreSQL take any advantage of the 64 bit environment or
have we not done anything to move into the 64 bit world yet?

Depends on whether PG was compiled as 64-bit or 32-bit--is your
toolchain 64-bit all the way, or is it just the kernel?

I think he means - have benchmarks or profiling been done with the
goal of specifically improving performance on 64-bit platforms.

For most applications available today, the answer is no. Compiling an
application designed for 32-bit on a 64-bit architecture does not
automatically improve performance. Too frequently, it can actually
reduce performance. Pointers are twice as large, which means that any
application that is heavily pointer-based can be forced to move twice
as much data through memory, which reduces the effectiveness of the
various cache levels, and of RAM itself.
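
A trivial C illustration of the pointer growth (compile the same file
as 32-bit and as 64-bit, e.g. with and without gcc's -m64):

    /* "int" stays 4 bytes on both ILP32 and LP64 Linux, while "long"
       and pointers grow from 4 to 8 bytes */
    #include <stdio.h>

    int main(void)
    {
        printf("sizeof(int)    = %u\n", (unsigned) sizeof(int));
        printf("sizeof(long)   = %u\n", (unsigned) sizeof(long));
        printf("sizeof(void *) = %u\n", (unsigned) sizeof(void *));
        return 0;
    }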

Hopefully GLIBC counts here, in that it should contain 64-bit specific
code where it might count, so libc calls should be able to take
advantage of the 64-bit machine instructions.

Is there an interest, or any active project to examine PostgreSQL in
the area of 64-bit processors? Has it already been done? I don't recall
seeing a reference to it in my travels. I'm also not sure on what to
expect for results, as the territory is still new. 64-bit processors
have existed for a while, but 32-bit processors have been the popular
choice, making 64-bit support an afterthought?

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#6Andrej Ricnik-Bay
andrej.groups@gmail.com
In reply to: Noname (#5)
Re: PostgreSQL on 64 bit Linux

On 8/21/06, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:

Is there an interest, or any active project to examine PostgreSQL in
the area of 64-bit processors? Has it already been done? I don't recall
seeing a reference to it in my travels. I'm also not sure on what to
expect for results, as the territory is still new. 64-bit processors
have existed for a while, but 32-bit processors have been the popular
choice, making 64-bit support an afterthought?

That's certainly just a reference to the wintel world? AIX, HP-UX
and Solaris-Sparc have been 64-bit for a while now...


Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm

#7Douglas McNaught
doug@mcnaught.org
In reply to: Noname (#5)
Re: PostgreSQL on 64 bit Linux

mark@mark.mielke.cc writes:

Is there an interest, or any active project to examine PostgreSQL in
the area of 64-bit processors? Has it already been done? I don't recall
seeing a reference to it in my travels. I'm also not sure on what to
expect for results, as the territory is still new. 64-bit processors
have existed for a while, but 32-bit processors have been the popular
choice, making 64-bit support an afterthought?

I find this question a bit amusing, since PG has run on 64-bit
architectures such as MIPS, Sparc, Alpha and PA-RISC for quite a while
now. :)

As I said in a private email to Naz, the main advantage I think you'd
see from 64-bit is the ability to run with more than 2GB or so of
shared buffers on a system with lots of RAM. Whether you'd want to do
that, or let the OS do most of the buffering, is an open question...
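For reference, in an 8.1-era postgresql.conf that setting is a count of
8kB buffers; a hypothetical large-memory box might use something like
this (the number is purely illustrative):

    shared_buffers = 262144        # 262144 x 8kB = 2GB; going much
                                   # higher needs a 64-bit build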

-Doug

#8Luke Lonergan
llonergan@greenplum.com
In reply to: Naz Gassiep (#1)
Re: PostgreSQL on 64 bit Linux

Naz,

On 8/20/06 12:59 PM, "Naz Gassiep" <naz@mira.net> wrote:

I have a PostgreSQL installation on a Debian box that had the 64bit SMP
kernel installed before PostgreSQL was compiled and installed on it.
Does PostgreSQL take any advantage of the 64 bit environment or have we
not done anything to move into the 64 bit world yet?

Very likely the default gcc compiles for 64-bit; if not, you need to specify
"-m64". As another respondent said - do a "file `which initdb`" to find out
whether you have compiled for 64-bit or not.

WRT 64-bit and Postgres, it depends on the CPU as to whether you see a
simple performance benefit. On the Opteron you will see a benefit when
doing CPU bound work. When doing the CPU portion, the additional registers
of the Opteron running in 64-bit mode are used by the compiler to produce a
20-30% boost in performance. On the Xeon in 64-bit mode, the same regions
of execution will slow down by about 5%.

Postgres benefits automatically from the larger memory addressing of the
64-bit kernel by using the larger I/O cache of Linux.

- Luke

#9Joshua D. Drake
jd@commandprompt.com
In reply to: Luke Lonergan (#8)
Re: PostgreSQL on 64 bit Linux

WRT 64-bit and Postgres, it depends on the CPU as to whether you see a
simple performance benefit. On the Opteron you will see a benefit when
doing CPU bound work. When doing the CPU portion, the additional registers
of the Opteron running in 64-bit mode are used by the compiler to produce a
20-30% boost in performance. On the Xeon in 64-bit mode, the same regions
of execution will slow down by about 5%.

Is that true of even Woodcrest?

Joshua D. Drake

Postgres benefits automatically from the larger memory addressing of the
64-bit kernel by using the larger I/O cache of Linux.

- Luke

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#10mdean
mdean@xn1.com
In reply to: Luke Lonergan (#8)
Replication

One person who commented on "The business of PostgreSQL" made this comment:

Posted Aug 3, 2006 8:45 UTC (Thu) by subscriber jgarzik
[Link: http://lwn.net/Articles/193946/]

Cluster immaturity. MySQL has been shipping a workable single-master
replication+failover for quite a while now in most Linux distros.
MySQL's multi-master solution, while requiring RAM (not disk) for
storage, is also well-integrated and deployed in production.

In contrast, the PostgreSQL team has chosen to provide hooks for
replication and failover. This has led to a situation where there are
multiple projects supporting replications/failover, none of which are
production-ready nor shipped in a modern Linux distro.

Modern systems *must* scale beyond a single computer, and the PostgreSQL
support shipped in modern Linux distros is completely incapable of this.

I really would appreciate a response. Thanks, Michael

#11Joshua D. Drake
jd@commandprompt.com
In reply to: mdean (#10)
Re: Replication

In contrast, the PostgreSQL team has chosen to provide hooks for
replication and failover. This has led to a situation where there are
multiple projects supporting replications/failover, none of which are
production-ready nor shipped in a modern Linux distro.

And no, we don't really provide hooks :). However there are several
projects trying to solve different problems with PostgreSQL.

Modern systems *must* scale beyond a single computer, and the PostgreSQL
support shipped in modern Linux distros is completely incapable of this.

Slony-I is quite capable as a production class FOSS replication system
and is in use widely.

Joshua D. Drake

#12Luke Lonergan
llonergan@greenplum.com
In reply to: Joshua D. Drake (#9)
Re: PostgreSQL on 64 bit Linux

Josh,

On 8/20/06 8:52 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:

Is that true of even Woodcrest?

Joshua D. Drake

Not sure - haven't read anything about the register set on the Core 2 to
make me think it benefits from 64 bit.

The point may be academic from now on though - the comparisons between
Opteron and Core 2 will all likely be in 64-bit mode from now on.

- Luke

#13Alexander Kirpa
postgres@bilteks.com
In reply to: Luke Lonergan (#12)
Re: PostgreSQL on 64 bit Linux

WRT 64-bit and Postgres, it depends on the CPU as to whether you
see a simple performance benefit. On the Opteron you will see a
benefit when doing CPU bound work. When doing the CPU portion, the
additional registers of the Opteron running in 64-bit mode are used
by the compiler to produce a 20-30% boost in performance. On the
Xeon in 64-bit mode, the same regions of execution will slow down
by about 5%.

Postgres benefits automatically from the larger memory addressing
of the 64-bit kernel by using the larger I/O cache of Linux.

The main benefit for Postgres in 64-bit mode is possible only on a
dedicated DB server with more than 3GB of RAM, where most of the RAM is
used for shared buffers, avoiding the constant movement of buffers
between the OS cache and shared memory. On a system with less than
2-3GB of RAM it is difficult to find a serious performance gain.

Best regards,
Alexander Kirpa

#14Fujii Masao
fujii.masao@oss.ntt.co.jp
In reply to: Joshua D. Drake (#11)
Re: Replication

Joshua D. Drake wrote:

Modern systems *must* scale beyond a single computer, and the PostgreSQL
support shipped in modern Linux distros is completely incapable of this.

Slony-I is quite capable as a production class FOSS replication system
and is in use widely.

Slony-I is not enough because it can cause data inconsistency between servers.
IMO, log-based replication is also needed for PostgreSQL, just like MySQL.

Regards;

#15Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Fujii Masao (#14)
Re: Replication

Fujii Masao wrote:

Joshua D. Drake wrote:

Modern systems *must* scale beyond a single computer, and the
PostgreSQL support shipped in modern Linux distros is completely
incapable of this.

Slony-I is quite capable as a production class FOSS replication system
and is in use widely.

Slony-I is not enough because it can cause data inconsistency
between servers.

Hmm, what are you referring to here? slony1 does row-level replication
(something that MySQL cannot do until 5.1, which is still beta) - so it
should not be possible to cause data inconsistency.
It is however async replication, so you can lose data committed on the
master but not yet replicated to the slaves in case you lose the master
completely.

Stefan

#16Fujii Masao
fujii.masao@oss.ntt.co.jp
In reply to: Stefan Kaltenbrunner (#15)
Re: Replication

Stefan Kaltenbrunner wrote:

It is however async replication, so you can lose data committed on the
master but not yet replicated to the slaves in case you lose the master
completely.

Yes, this is where Slony-I is insufficient, I think.
Most systems will not permit committed data to be lost, so its use is limited.

IMO, log-based replication is also needed for PostgreSQL, just like MySQL.

Well, I had misunderstood MySQL. Its replication is also asynchronous.

regards;

#17Noname
mark@mark.mielke.cc
In reply to: Andrej Ricnik-Bay (#6)
Re: PostgreSQL on 64 bit Linux

On Mon, Aug 21, 2006 at 02:56:10PM +1200, Andrej Ricnik-Bay wrote:

On 8/21/06, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:

Is there an interest, or any active project to examine PostgreSQL in
the area of 64-bit processors? Has it already been done? I don't recall
seeing a reference to it in my travels. I'm also not sure on what to
expect for results, as the territory is still new. 64-bit processors
have existed for a while, but 32-bit processors have been the popular
choice, making 64-bit support an afterthought?

That's certainly just a reference to the wintel world? AIX, HP-UX
and Solaris-Sparc have been 64-bit for a while now...

I don't think so. In the Open Source world, most projects are still 32-bit
centric, regardless of how many years the products have been supported on
64-bit platforms.

What application were you thinking of that takes full advantage of 64-bit,
making the 64-bit application significantly faster than the 32-bit
application? The only area I am aware of is video processing.

It's often a surprise to people that an upgrade to 64-bit, regardless of
CPU architecture, quite often ends up slower rather than faster.

Cheers,
mark

#18Noname
mark@mark.mielke.cc
In reply to: Douglas McNaught (#7)
Re: PostgreSQL on 64 bit Linux

On Sun, Aug 20, 2006 at 11:00:26PM -0400, Douglas McNaught wrote:

mark@mark.mielke.cc writes:

Is there an interest, or any active project to examine PostgreSQL in
the area of 64-bit processors? Has it already been done? I don't recall
seeing a reference to it in my travels. I'm also not sure on what to
expect for results, as the territory is still new. 64-bit processors
have existed for a while, but 32-bit processors have been the popular
choice, making 64-bit support an afterthought?

I find this question a bit amusing, since PG has run on 64-bit
architectures such as MIPS, Sparc, Alpha and PA-RISC for quite a while
now. :)

I don't think so. Software can be designed to take best advantage of
hardware. Recompiling it for a different architecture, running test
cases, and declaring support is not the same as optimizing for it.

As I said in a private email to Naz, the main advantage I think you'd
see from 64-bit is the ability to run with more than 2GB or so of
shared buffers on a system with lots of RAM. Whether you'd want to do
that, or let the OS do most of the buffering, is an open question...

This is what I mean by afterthought. PostgreSQL is designed for
32-bit processors. Which is fine. I'm not complaining. The question
was whether there is an interest in pursuing 64-bit specific
optimizations. In the PostgreSQL code, a quick check points me only to
"has long int 64" as a 64-bit source code #ifdef. Of the six places
that reference this, five of them actually slow down the code, as they
check for overflow of the 'long int' result beyond 4 bytes of
data. The sixth place is used to define the 64-bit type in use by
PostgreSQL, which I suspect is infrequently used.

I believe the answer is no. No or few 64-bit optimization possibilities
have been chased down, probably because some or many of these would:

1) require significant re-architecture

2) reduce the performance in a 32-bit world

It's a question that only half interests me. As with most projects, I
don't think the projects are ready to re-architect for this
purpose. Perhaps once 50%+ of people are running PostgreSQL in 64-bit
mode, the question will be more serious to more people.

As a half interesting question, I'm defending it as a valid question.
Please don't write it off, but it is fine to say "not yet, we have more
important things to work on".

Cheers,
mark

#19Martijn van Oosterhout
kleptog@svana.org
In reply to: Noname (#18)
Re: PostgreSQL on 64 bit Linux

On Mon, Aug 21, 2006 at 09:16:46AM -0400, mark@mark.mielke.cc wrote:

This is what I mean by afterthought. PostgreSQL is designed for
32-bit processors. Which is fine. I'm not complaining. The question
was whether there is an interest in pursuing 64-bit specific
optimizations. In the PostgreSQL code, a quick check points me only to
"has long int 64" as a 64-bit source code #ifdef. Of the six places
that reference this, five of them actually slow down the code, as they
check for overflow of the 'long int' result beyond 4 bytes of
data. The sixth place is used to define the 64-bit type in use by
PostgreSQL, which I suspect is infrequently used.

There are two defines, the end result being to declare an int64 type
which is used a fair bit around the place. bigint and bigserial
being the obvious ones.

The checks I see relate to strtol, where the code only wants an int4.
There's no strtoi so on 32 bit the range check is built-in, but if long
is 64 bit you have to do the check separately.

That's just an interface problem, there's not a lot we can do about
that really.
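
For illustration, the pattern is something like this (a sketch, not
the actual source; parse_int4 is a made-up name):

    #include <errno.h>
    #include <limits.h>
    #include <stdlib.h>

    /* On 32 bit, long is already int4-sized and strtol's own ERANGE
       check suffices; with a 64-bit long the result has to be
       range-checked separately, which is the small extra cost. */
    static int parse_int4(const char *s, int *out)
    {
        char *end;
        long v;

        errno = 0;
        v = strtol(s, &end, 10);
        if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
            return 0;               /* out of int4 range */
        if (end == s)
            return 0;               /* no digits at all */
        *out = (int) v;
        return 1;
    }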

I believe the answer is no. No or few 64-bit optimization possibilities
have been chased down, probably because some or many of these would:

1) require significant re-architecture

2) reduce the performance in a 32-bit world

Can you think of any places at all where 64-bit would make a difference
to processing? 64-bit gives you more memory, and on some x86 chips, more
registers, but that's it.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

#20Florian Pflug
In reply to: Noname (#18)
Re: PostgreSQL on 64 bit Linux

mark@mark.mielke.cc wrote:

This is what I mean by afterthought. PostgreSQL is designed for
32-bit processors. Which is fine. I'm not complaining. The question
was whether there is an interest in pursuing 64-bit specific
optimizations. In the PostgreSQL code, a quick check points me only to
"has long int 64" as a 64-bit source code #ifdef. Of the six places
that reference this, five of them actually slow down the code, as they
check for overflow of the 'long int' result beyond 4 bytes of
data. The sixth place is used to define the 64-bit type in use by
PostgreSQL, which I suspect is infrequently used.

I believe the answer is no. No or few 64-bit optimization possibilities
have been chased down, probably because some or many of these would:

1) require significant re-architecture

2) reduce the performance in a 32-bit world

Just out of interest - what areas in postgres do you think could be
improved (performance wise) on 64-bit machines? The only area that
I can see is the int64 datatype - it's stored in palloc()'ed memory
on 32-bit machines AFAIK - I'm not sure if it uses the "long long"
datatype on 64-bit archs. But I can't imagine any other area that
could be tuned by making use of (native) 64-bit ints.

greetings, Florian Pflug

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#19)
Re: PostgreSQL on 64 bit Linux

Martijn van Oosterhout <kleptog@svana.org> writes:

Can you think of any places at all where 64-bit would make a difference
to processing? 64-bit gives you more memory, and on some x86 chips, more
registers, but that's it.

It would be interesting to look into making int8 and float8 be
pass-by-value datatypes to save palloc overhead on machines where
Datum is going to be 8 bytes wide anyway. Of course this would
only help queries that use those datatypes extensively, and it
might be an insignificant savings anyhow :-(
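
A rough sketch of the difference (stand-in names, with malloc standing
in for palloc; not the real fmgr code):

    #include <stdlib.h>

    typedef unsigned long Datum;    /* pointer-sized: 8 bytes on LP64 */

    /* today: an int8 result is pass-by-reference, one allocation per value */
    Datum int8_result_byref(long long v)
    {
        long long *p = malloc(sizeof *p);
        *p = v;
        return (Datum) p;
    }

    /* the idea: where sizeof(Datum) == 8, carry the value in the Datum
       itself and skip the allocation entirely */
    Datum int8_result_byval(long long v)
    {
        return (Datum) v;
    }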

For the most part though I think that the main advantage of 64-bit
for a database is the ability to address more memory. We've been
working slowly towards allowing PG to make better use of large
numbers of shared buffers, for instance.

regards, tom lane

#22Douglas McNaught
doug@mcnaught.org
In reply to: Noname (#18)
Re: PostgreSQL on 64 bit Linux

mark@mark.mielke.cc writes:

I believe the answer is no. No or few 64-bit optimization possibilities
have been chased down, probably because some or many of these would:

1) require significant re-architecture

2) reduce the performance in a 32-bit world

Honestly, I think the main "optimizations" happen automatically: the
compiler uses the extra registers (if present) and pointers in LP64
mode are automatically 64-bit, which allows much larger memory areas.
That's probably 95% of the win right there. What other
"optimizations" need doing?

People have been running PG with big databases on 64-bit iron for
years; don't you think any glaring infelicities would have been
addressed?

It's a question that only half interests me. As with most projects, I
don't think the projects are ready to re-architect for this
purpose.

What re-architecting would be needed?

-Doug

#23Joshua D. Drake
jd@commandprompt.com
In reply to: Fujii Masao (#16)
Re: Replication

Fujii Masao wrote:

Stefan Kaltenbrunner wrote:

It is however async replication, so you can lose data committed on the
master but not yet replicated to the slaves in case you lose the master
completely.

Yes, this is where Slony-I is insufficient, I think.
Most systems will not permit committed data to be lost, so its use is
limited.

Wanna bet?

It is very, very common to have asynchronous replication. I would say
the need for synchronous is far more limited (although more strongly desired).

Joshua D. Drake

IMO, log-based replication is also needed for PostgreSQL, just like
MySQL.

Well, I had misunderstood MySQL. Its replication is also asynchronous.

regards;

#24Noname
mark@mark.mielke.cc
In reply to: Douglas McNaught (#22)
Re: PostgreSQL on 64 bit Linux

On Mon, Aug 21, 2006 at 10:23:16AM -0400, Douglas McNaught wrote:

mark@mark.mielke.cc writes:

I believe the answer is no. No or few 64-bit optimization possibilities
have been chased down, probably because some or many of these would:
1) require significant re-architecture
2) reduce the performance in a 32-bit world

Honestly, I think the main "optimizations" happen automatically: the
compiler uses the extra registers (if present) and pointers in LP64
mode are automatically 64-bit, which allows much larger memory areas.
That's probably 95% of the win right there. What other
"optimizations" need doing?

Compilers are getting smarter, but code that has been profiled and
hand-tuned for 32 bits does not automatically get compiled by a 64-bit
compiler into code that has been profiled and hand-tuned for 64 bits.

An example of this is in how algorithms are implemented. For
Intel-style processors at least, using array offsets is cheaper than
using many pointers. For 64-bit processors, 32-bit array offsets would
use less memory.
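
For instance (illustrative struct layouts, assuming LP64 with 4-byte
ints):

    /* chasing 64-bit pointers: 16 bytes per node after padding */
    struct node_ptr { int value; struct node_ptr *next; };

    /* 32-bit index into a preallocated array: 8 bytes per node */
    struct node_idx { int value; unsigned next; };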

For re-architecture, the first thing that came to mind was Datum
increasing in size. Copy by value instead of allocating a pointer,
and passing the pointer. The pointer will be 64-bits, so Datum
could support 64-bits as well. Tom caught this, but he did point
out what I missed. This would only benefit data types that are
64-bits in length, which are still likely uncommon (most people
would use serial, not bigserial).

Has anybody taken any of the 64-bit processor optimization manuals
and walked through them, cross-referencing against the bottleneck
programs in PostgreSQL? To counter this, I suggest to you that most
every attempt to optimize PostgreSQL for performance reasons, likely
considered variations in algorithms measured on a 32-bit processor,
finally selecting the version that was simplest, and best performing,
on a 32-bit processor. This is normal evolution for a piece of
software. It is naturally optimized for the processor that is most
used by the users.

I'm not the expert on the matter. I don't see evidence that anybody
has taken this route though.

People have been running PG with big databases on 64-bit iron for
years; don't you think any glaring infelicities would have been
addressed?

I doubt there are 'glaring infelicities'. I doubt any changes required
to provide a noticeable speed improvement would be one-liners. They will
not be obvious. They will require a complete understanding of the many
architectural differences between common 32-bit processors, and common
64-bit processors. Few have this information in their head. I don't.
I suspect that anybody who claims that the only difference is the amount
of addressable memory, and the number of registers doesn't. :-)

It's a question that only half interests me. As with most projects, I
don't think the projects are ready to re-architect for this
purpose.

What re-architecting would be needed?

I'm asking that it be admitted that it has not been looked at. Not
seriously. Nor is this unique to PostgreSQL. I expect the performance
for Linux applications in general to slowly improve on 64-bit
processors as more and more people begin to use it. I recall a few
threads on other mailing lists where the primary developers admitted
to not having ready access to a 64-bit machine. Until they do, the
chance that these applications are hand-tuned for 64-bit is unlikely.

I'm not claiming that I have the answers. I am claiming that a few of
you share this weakness, and that an assumption that PostgreSQL is
designed for 64-bit would be incorrect. :-)

Cheers,
mark

#25Noname
mark@mark.mielke.cc
In reply to: Alexander Kirpa (#13)
Re: PostgreSQL on 64 bit Linux

On Mon, Aug 21, 2006 at 12:00:58PM +0300, Alexander Kirpa wrote:

WRT 64-bit and Postgres, it depends on the CPU as to whether you
see a simple performance benefit. On the Opteron you will see a
benefit when doing CPU bound work. When doing the CPU portion, the
additional registers of the Opteron running in 64-bit mode are used
by the compiler to produce a 20-30% boost in performance. On the
Xeon in 64-bit mode, the same regions of execution will slow down
by about 5%.

Postgres benefits automatically from the larger memory addressing
of the 64-bit kernel by using the larger I/O cache of Linux.

The main benefit for Postgres in 64-bit mode is possible only on a
dedicated DB server with more than 3GB of RAM, where most of the RAM is
used for shared buffers, avoiding the constant movement of buffers
between the OS cache and shared memory. On a system with less than
2-3GB of RAM it is difficult to find a serious performance gain.

This is the main difference between PostgreSQL today - designed for
32-bit - when recompiled with a 64-bit compiler.

The additional registers are barely enough to counter the increased
cost of processing in 64-bits.

Cheers,
mark

#26Martijn van Oosterhout
kleptog@svana.org
In reply to: Noname (#24)
Re: PostgreSQL on 64 bit Linux

On Mon, Aug 21, 2006 at 10:46:56AM -0400, mark@mark.mielke.cc wrote:

Compilers are getting smarter, but code that has been profiled and
hand-tuned for 32 bits does not automatically get compiled by a 64-bit
compiler into code that has been profiled and hand-tuned for 64 bits.

I don't see any evidence that anyone has done any kind of hand-tuning,
not for 32-bit, not for 64-bit. That's the compiler's job, really.

That said, if you want to go hand-tuning, go for the low-hanging
fruit first: use likely()/unlikely() in often-used code paths
(heap_get_attr for example), find the parts of the code that could
benefit from strict-aliasing, etc.
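
The likely()/unlikely() bit is just the usual gcc idiom (a sketch;
none of this is in the tree today):

    #include <stddef.h>

    #define likely(x)    __builtin_expect(!!(x), 1)
    #define unlikely(x)  __builtin_expect(!!(x), 0)

    /* hypothetical use: mark the error path cold so gcc lays out the
       common case as straight-line code */
    int fetch_value(const int *attr)
    {
        if (unlikely(attr == NULL))
            return -1;              /* rare path */
        return *attr;               /* hot path */
    }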

In general we avoid architecture-specific code because the cost of
maintenance outweighs the benefits.

For re-architecture, the first thing that came to mind was Datum
increasing in size. Copy by value instead of allocating a pointer,
and passing the pointer. The pointer will be 64-bits, so Datum
could support 64-bits as well. Tom caught this, but he did point
out what I missed. This would only benefit data types that are
64-bits in length, which are still likely uncommon (most people
would use serial, not bigserial).

A Datum must be 64-bit on a 64-bit arch; how else is it going to hold
the pointer for a pass-by-ref value?

On the whole, PostgreSQL is still in the phase where we're trying to
find the best algorithms. When that's done (if ever), we can start
worrying about processor optimisations...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

#27AgentM
agentm@themactionfaction.com
In reply to: Joshua D. Drake (#23)
Re: Replication

On Aug 21, 2006, at 10:30 , Joshua D. Drake wrote:

Fujii Masao wrote:

Stefan Kaltenbrunner wrote:

It is however async replication, so you can lose data committed on the
master but not yet replicated to the slaves in case you lose the master
completely.

Yes, this is where Slony-I is insufficient, I think.
Most systems will not permit committed data to be lost, so its use
is limited.

Wanna bet?

It is very, very common to have asynchronous replication. I would
say the need for synchronous is far more limited (although more
strongly desired).

I would imagine that multi-master synchronous replication would be
fairly trivial to implement with 2PC and wal-shipping available, no?

#28Joshua D. Drake
jd@commandprompt.com
In reply to: AgentM (#27)
Re: Replication

It is very, very common to have asynchronous replication. I would say
the need for synchronous is far more limited (although more strongly desired).

I would imagine that multi-master synchronous replication would be
fairly trivial to implement with 2PC and wal-shipping available, no?

Trivial? I would say... no. There is a reason it hasn't been done yet,
and a reason why CMD (for example) has zero plans to even try.

Sincerely,

Joshua D. Drake

#29Markus Schiltknecht
markus@bluegap.ch
In reply to: AgentM (#27)
Re: Replication

Hi,

AgentM wrote:

I would imagine that multi-master synchronous replication would be
fairly trivial to implement with 2PC and wal-shipping available, no?

Yes, that could be done. And AFAIK either pgpool or PgCluster (1) try to
do sync, multi-master replication that way.

The problem is that such an implementation is very network intensive and
scales very badly for writing transactions. If you're interested in a
good sync, multi-master replication algorithm, you might want to have a
look at what Slony-II [1] or Postgres-R [2] try to do. I also recommend
reading "Don't be lazy, be consistent" by Bettina Kemme [3].

Regards

Markus

[1]: http://www.slony2.org/wiki/index.php?title=Documentation
[2]: http://www.postgres-r.org/
[3]: http://www.cs.mcgill.ca/~kemme/papers/vldb00.html

#30Noname
mark@mark.mielke.cc
In reply to: Martijn van Oosterhout (#26)
Re: PostgreSQL on 64 bit Linux

On Mon, Aug 21, 2006 at 05:13:58PM +0200, Martijn van Oosterhout wrote:

On the whole, PostgreSQL is still in the phase where we're trying to
find the best algorithms. When that's done (if ever), we can start
worrying about processor optimisations...

I don't disagree. :-)

You claimed that PostgreSQL is not tuned to a specific processor, and
relies on the kernel, the compiler, and libc to perform some tuning. I
agree with this as well.

Cheers,
mark

#31Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Fujii Masao (#16)
Re: Replication

Fujii Masao wrote:

Stefan Kaltenbrunner wrote:

It is however async replication, so you can lose data committed on the
master but not yet replicated to the slaves in case you lose the master
completely.

Yes, this is where Slony-I is insufficient, I think.
Most systems will not permit committed data to be lost, so its use is
limited.

Not sure I agree with "most systems" here - a _LOT_ of use cases
actually want async (and note that slony1 can do a controlled failover
without losing any transactions).

Nevertheless there are also arguments for having sync replication, but
calling slony1 "insufficient" in that regard is a bit much, since it is
actually designed to be async and does quite a good job of that.

Stefan

#32Jeff Davis
pgsql@j-davis.com
In reply to: AgentM (#27)
Re: Replication

On Mon, 2006-08-21 at 11:33 -0400, AgentM wrote:

I would imagine that multi-master synchronous replication would be
fairly trivial to implement with 2PC and wal-shipping available, no?

How does WAL shipping help synchronous replication?

Regards,
Jeff Davis

#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#24)
Re: PostgreSQL on 64 bit Linux

mark@mark.mielke.cc writes:

What re-architecting would be needed?

I'm asking that it be admitted that it has not been looked at. Not
seriously. Nor is this unique to PostgreSQL. I expect the performance
for Linux applications in general to slowly improve on 64-bit
processors as more and more people begin to use it. I recall a few
threads on other mailing lists where the primary developers admitted
to not having ready access to a 64-bit machine. Until they do, the
chance that these applications are hand-tuned for 64-bit is unlikely.

What you seem to be asking for is that we engage in a lot of
machine-specific tuning for some particular processor or other.
That has not happened and will not happen. This project believes
in portable code, not "optimized for Xeon-of-the-month" code.

regards, tom lane

#34Markus Schiltknecht
markus@bluegap.ch
In reply to: Jeff Davis (#32)
Re: Replication

Jeff Davis wrote:

How does WAL shipping help synchronous replication?

The WAL is written _before_ commit, logging all the changes the
transaction wants to write to the disk. This makes it look very similar
to what is needed for synchronous replication.

Instead of waiting for confirmation from the disk that your WAL has
been written, you would wait for the network to confirm that the
changes have been delivered.

Regards

Markus

#35Alvaro Herrera
alvherre@commandprompt.com
In reply to: Markus Schiltknecht (#34)
Re: Replication

Markus Schiltknecht wrote:

Jeff Davis wrote:

How does WAL shipping help synchronous replication?

The WAL is written _before_ commit, logging all the changes the
transaction wants to write to the disk. This makes it look very similar
to what is needed for synchronous replication.

Instead of waiting for confirmation from the disk that your WAL has
been written, you would wait for the network to confirm that the
changes have been delivered.

But the confirmation that needs to come is that the WAL changes have
been applied (fsync'ed), so the performance will be terrible. So bad,
that I don't think anyone will want to use such a replication system ...

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#36Noname
mark@mark.mielke.cc
In reply to: Tom Lane (#33)
Re: PostgreSQL on 64 bit Linux

On Mon, Aug 21, 2006 at 01:05:13PM -0400, Tom Lane wrote:

mark@mark.mielke.cc writes:

What re-architecting would be needed?

I'm asking that it be admitted that it has not been looked at. Not
seriously. Nor is this unique to PostgreSQL. I expect the performance
for Linux applications in general to slowly improve on 64-bit
processors as more and more people begin to use it. I recall a few
threads on other mailing lists where the primary developers admitted
to not having ready access to a 64-bit machine. Until they do, the
chance that these applications are hand-tuned for 64-bit is unlikely.

What you seem to be asking for is that we engage in a lot of
machine-specific tuning for some particular processor or other.
That has not happened and will not happen. This project believes
in portable code, not "optimized for Xeon-of-the-month" code.

Hi Tom.

I'm asking for a statement such as the above, that you provide. :-)

The question was whether PostgreSQL was optimized for 64-bit Linux.
The answer is "not specifically".

The answer is quite acceptable to me. It's not the original answer that
was given to the original poster though... :-)

Cheers,
mark

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#36)
Re: PostgreSQL on 64 bit Linux

mark@mark.mielke.cc writes:

On Mon, Aug 21, 2006 at 01:05:13PM -0400, Tom Lane wrote:

What you seem to be asking for is that we engage in a lot of
machine-specific tuning for some particular processor or other.

The question was whether PostgreSQL was optimized for 64-bit Linux.
The answer is "not specifically".

Well, my notion of "optimizing for 64 bit" seems to have little to do
with yours. To me, stuff like making sure we can use lots of buffers
effectively is a 64-bit optimization. What you seem to have in mind is
assembly-code-level optimization for specific processors.

regards, tom lane

#38Gregory Maxwell
gmaxwell@gmail.com
In reply to: Alvaro Herrera (#35)
Re: Replication

On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:

But the confirmation that needs to come is that the WAL changes have
been applied (fsync'ed), so the performance will be terrible. So bad,
that I don't think anyone will want to use such a replication system ...

Okay. I give up... Why is waiting for fsync on a fast local network
which takes 15us to send a message (infiniband is cheap..) an
unimaginable delay when we tolerate a local 8ms fsync delay on systems
without writeback cache?
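
(For scale, taking those numbers at face value: 8 ms + 0.015 ms is
about 8.015 ms per commit, i.e. well under 1% added latency.)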

#39Jeff Davis
pgsql@j-davis.com
In reply to: Markus Schiltknecht (#34)
Re: Replication

On Mon, 2006-08-21 at 19:42 +0200, Markus Schiltknecht wrote:

Jeff Davis wrote:

How does WAL shipping help synchronous replication?

The WAL is written _before_ commit, logging all the changes the
transaction wants to write to the disk. This makes it look very similar
to what is needed for synchronous replication.

Instead of waiting for confirmation from the disk that your WAL has
been written, you would wait for the network to confirm that the
changes have been delivered.

Synchronous replication (to me) means that the data has been written to
permanent storage on all masters and all slaves before any master or
slave reports a successful COMMIT. Are you suggesting that you ship the
WAL over the network, wait for it to be written to the slave, and then
report a successful commit?

Regards,
Jeff Davis

#40D'Arcy J.M. Cain
darcy@druid.net
In reply to: Gregory Maxwell (#38)
Re: Replication

On Mon, 21 Aug 2006 14:46:05 -0400
"Gregory Maxwell" <gmaxwell@gmail.com> wrote:

On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:

But the confirmation that needs to come is that the WAL changes have
been applied (fsync'ed), so the performance will be terrible. So bad,
that I don't think anyone will want to use such a replication system ...

Okay. I give up... Why is waiting for fsync on a fast local network
which takes 15us to send a message (infiniband is cheap..) an
unimaginable delay when we tolerate a local 8ms fsync delay on systems
without writeback cache?

OK, that solves your problem. How about my problem where replication
has to happen on servers in three countries on two continents and
thousands of updates a second have to happen in less than 10ms? This is
the critical issue with replication - one size does not fit all.
Synchronous replication, in particular, fits almost no one.

My experience is that any replication needs to be based on your business
rules which will vary widely.

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
#41Markus Schiltknecht
markus@bluegap.ch
In reply to: Alvaro Herrera (#35)
Re: Replication

Alvaro Herrera wrote:

But the confirmation that needs to come is that the WAL changes have
been applied (fsync'ed), so the performance will be terrible. So bad,
that I don't think anyone will want to use such a replication system ...

Yeah, that's the big problem of sync, multi-master replication. IMHO the
key to high performance sync, multi-master replication is to minimize
the effort needed to get that confirmation.

A hard drive has the built-in functionality to confirm what has been
written (and hopefully it does not lie about that). A cluster does not
have this ability. Now, what does it mean to get a confirmation from the
cluster at all? First of all, you have to be sure the cluster has
received your message, then you also need to be sure it can apply your
changes (the so-called WriteSet).

Unfortunately 'the cluster' consists of many independent machines. If
you wait for every single machine in the cluster to confirm being able
to apply your WriteSet, you won't get beyond single-machine performance
for writing transactions.

Postgres-R uses a Group Communication System to get very fast
'confirmation' by optimizing the logic a little: like with patches, you
can be sure a WriteSet can be applied if you try to apply it to the same
base. So Postgres-R uses totally ordered communication to ensure all
transactions are processed in the very same order on all nodes. Another
optimization is buried in the depth of the GCS: it only guarantees that
the message you sent _will_be_delivered_. So thanks to the GCS you don't
have to wait for all other nodes, but only for the GCS to confirm that
your message will be delivered on the other nodes. Of course the GCS
also needs to send messages around to be able to confirm that, but GCSes
are designed to do exactly that very efficiently.
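
In pseudo-C, the commit path is roughly this (every name is invented
for illustration; this is not the Postgres-R API):

    typedef struct Txn Txn;
    typedef struct WriteSet WriteSet;

    WriteSet *collect_writeset(Txn *txn);
    void gcs_send_total_order(WriteSet *ws);
    void gcs_wait_own_delivery(WriteSet *ws);
    int conflicts_with_earlier(WriteSet *ws);
    int txn_abort(Txn *txn);
    int txn_commit(Txn *txn);

    int commit(Txn *txn)
    {
        WriteSet *ws = collect_writeset(txn);  /* changes made locally */
        gcs_send_total_order(ws);      /* one GCS message per write txn */
        gcs_wait_own_delivery(ws);     /* total order fixes the outcome */
        if (conflicts_with_earlier(ws))
            return txn_abort(txn);     /* conflict loser: abort */
        return txn_commit(txn);        /* all nodes decide identically */
    }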

I hope to have brought some light to the sync, multi-master replication
problem. Please ask if you have more questions. I propose to continue
that discussion on the Postgres-R mailing list I have set up.

Regards

Markus

#42Markus Schiltknecht
markus@bluegap.ch
In reply to: Gregory Maxwell (#38)
Re: Replication

Gregory Maxwell wrote:

infiniband is cheap..

Can I get one? I'd love to run some tests with Postgres-R ;-)

#43AgentM
agentm@themactionfaction.com
In reply to: D'Arcy J.M. Cain (#40)
Re: Replication

On Aug 21, 2006, at 15:00 , D'Arcy J.M. Cain wrote:

On Mon, 21 Aug 2006 14:46:05 -0400
"Gregory Maxwell" <gmaxwell@gmail.com> wrote:

On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:

But the confirmation that needs to come is that the WAL changes have
been applied (fsync'ed), so the performance will be terrible. So bad,
that I don't think anyone will want to use such a replication system ...

Okay. I give up... Why is waiting for fsync on a fast local network
which takes 15us to send a message (infiniband is cheap..) an
unimaginable delay when we tolerate a local 8ms fsync delay on systems
without writeback cache?

OK, that solves your problem. How about my problem where replication
has to happen on servers in three countries on two continents and
thousands of updates a second have to happen in less than 10ms? This is
the critical issue with replication - one size does not fit all.
Synchronous replication, in particular, fits almost no one.

My experience is that any replication needs to be based on your business
rules which will vary widely.

Sure- and more specifically, replication rules may differ on every
table according to those rules. The current solutions are on/off for
a list of tables. I wonder if the various pgsql replication engines
have any problems co-existing...

-M

#44Markus Schiltknecht
markus@bluegap.ch
In reply to: Jeff Davis (#39)
Re: Replication

Jeff Davis wrote:

Synchronous replication (to me) means that the data has been written to
permanent storage on all masters and all slaves before any master or
slave reports a successful COMMIT. Are you suggesting that you ship the
WAL over the network, wait for it to be written to the slave, and then
report a successful commit?

I'm not suggesting doing replication like that. But I think that's what
people mean when they propose that "multi-master synchronous replication
would be fairly trivial to implement with 2PC and wal-shipping".

This is more or less trivial to implement, yes. But it won't scale.

Regards

Markus

#45Alvaro Herrera
alvherre@commandprompt.com
In reply to: AgentM (#43)
Re: Replication

AgentM wrote:

On Aug 21, 2006, at 15:00 , D'Arcy J.M. Cain wrote:

On Mon, 21 Aug 2006 14:46:05 -0400
"Gregory Maxwell" <gmaxwell@gmail.com> wrote:

On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:

But the confirmation that needs to come is that the WAL changes have
been applied (fsync'ed), so the performance will be terrible. So bad,
that I don't think anyone will want to use such a replication system ...

Okay. I give up... Why is waiting for fsync on a fast local network
which takes 15us to send a message (infiniband is cheap..) an
unimaginable delay when we tolerate a local 8ms fsync delay on systems
without writeback cache?

OK, that solves your problem. How about my problem where replication
has to happen on servers in three countries on two continents and
thousands of updates a second have to happen in less than 10ms? This is
the critical issue with replication - one size does not fit all.
Synchronous replication, in particular, fits almost no one.

My experience is that any replication needs to be based on your business
rules which will vary widely.

Sure- and more specifically, replication rules may differ on every
table according to those rules. The current solutions are on/off for
a list of tables. I wonder if the various pgsql replication engines
have any problems co-existing...

Although I have never tried it, I am sure Mammoth Replicator could
coexist relatively sanely with Slony-I.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#46Alexander Kirpa
postgres@bilteks.com
In reply to: Noname (#25)
Re: PostgreSQL on 64 bit Linux

On 21 Aug 2006, at 10:48, mark@mark.mielke.cc wrote:

On Mon, Aug 21, 2006 at 12:00:58PM +0300, Alexander Kirpa wrote:

WRT 64-bit and Postgres, it depends on the CPU as to whether you
see a simple performance benefit. On the Opteron you will see a
benefit when doing CPU bound work. When doing the CPU portion,
the additional registers of the Opteron running in 64-bit mode are
used by the compiler to produce a 20-30% boost in performance. On
the Xeon in 64-bit mode, the same regions of execution will slow
down by about 5%.

Postgres benefits automatically from the larger memory addressing
of the 64-bit kernel by using the larger I/O cache of Linux.

The main benefit for Postgres in 64-bit mode is possible only on a
dedicated DB server with more than 3GB of RAM, where most of the RAM is
used for shared buffers, avoiding the constant movement of buffers
between the OS cache and shared memory. On a system with less than
2-3GB of RAM it is difficult to find a serious performance gain.

This is the main difference between PostgreSQL today - designed for
32-bit - when recompiled with a 64-bit compiler.

The additional registers are barely enough to counter the increased
cost of processing in 64-bits.

Cheers,
mark

The current 32-bit-oriented Postgres architecture already allows the
main benefit of a 64-bit OS to be exploited: a huge memory size for
shared buffers. At present it is possible to use 2G x 8KB = 16TB as
shared memory, and for that case it would be necessary to use O_DIRECT
to avoid the OS cache, especially when the databases fit into shared
memory.

Best regards,
Alexander Kirpa

#47D'Arcy J.M. Cain
darcy@druid.net
In reply to: AgentM (#43)
Re: Replication

On Mon, 21 Aug 2006 15:14:10 -0400
AgentM <agentm@themactionfaction.com> wrote:

My experience is that any replication needs to be based on your business
rules which will vary widely.

Sure- and more specifically, replication rules may differ on every
table according to those rules. The current solutions are on/off for

In fact this is exactly what I did for my app. Some tables were
multi-master and some were required to be modified from a single master.

#48Hannu Krosing
hannu@skype.net
In reply to: Fujii Masao (#16)
Re: Replication

On Mon, 2006-08-21 at 21:46, Fujii Masao wrote:

Stefan Kaltenbrunner wrote:

It is however async replication, so you can lose data committed on the
master but not yet replicated to the slaves in case you lose the master
completely.

Yes, this is where Slony-I is insufficient, I think.
Most systems will not permit committed data to be lost, so its use is limited.

But any sync _replication_ system will have a severe impact on
performance. My guess is that for a full sync replication, going from 1
server to 2 will actually lower performance, and some small gains would
be possible only starting from the 3rd server.

This has nothing to do with PostgreSQL, but is just due to latencies of
memory, disk and network and the need to do remote locking.

My guess is based on using standard 100Gb ethernet. It may be possible
to do better on some more advanced interconnects like Myrinet.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

#49Hannu Krosing
hannu@skype.net
In reply to: D'Arcy J.M. Cain (#40)
Re: Replication

On Mon, 2006-08-21 at 15:00, D'Arcy J.M. Cain wrote:

On Mon, 21 Aug 2006 14:46:05 -0400
"Gregory Maxwell" <gmaxwell@gmail.com> wrote:

On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:

But the confirmation that needs to come is that the WAL changes have
been applied (fsync'ed), so the performance will be terrible. So bad,
that I don't think anyone will want to use such a replication system ...

Okay. I give up... Why is waiting for fsync on a fast local network
which takes 15us to send a message (infiniband is cheap..) an
unimaginable delay when we tolerate a local 8ms fsync delay on systems
without writeback cache?

OK, that solves your problem. How about my problem where replication
has to happen on servers in three countries on two continents and
thousands of updates a second have to happen in less than 10ms?

For this scenario you are far better off with partitioning than
replication.

That is if your data is partitionable. But geographically distributed
data often is.

This is the critical issue with replication - one size does not fit all.
Synchronous replication, in particular, fits almost no one.

My experience is that any replication needs to be based on your business
rules which will vary widely.

#50Markus Schiltknecht
markus@bluegap.ch
In reply to: Hannu Krosing (#48)
Re: Replication

Hannu Krosing wrote:

But any sync _replication_ system will have a severe impact on
performance. My guess is that for a full sync replication, going from 1
server to 2 will actually lower performance, and some small gains would
be possible only starting from the 3rd server.

Only testing will show concrete results, but for sure such a general
answer does not make much sense. It all depends very much on the type of
your load. Especially the read/write ratio is very important. Another
important factor is the number of conflicting transactions.

My guess is based on using standard 100Gb ethernet. It may be possible
to do better on some more advanced interconnects like Myrinet.

100Gb ethernet already sounds very advanced... ;-)

Regards

Markus

#51Hannu Krosing
hannu@skype.net
In reply to: Markus Schiltknecht (#50)
Re: Replication

On Wed, 2006-08-23 at 13:09, Markus Schiltknecht wrote:

Hannu Krosing wrote:

But any sync _replication_ system will have a severe impact on
performance. My guess is that for a full sync replication, going from 1
server to 2 will actually lower performance, and some small gains would
be possible only starting from the 3rd server.

Only testing will show concrete results, but for sure such a general
answer does not make much sense. It all depends very much on the type of
your load. Especially the read/write ratio is very important.

True.

But if you have very few writes, then there seems no reason to do sync
anyway.

Another important factor is the amount of conflicting transactions.

That too, but just the need to do *any* locking on all nodes will
significantly slow down sync replication.

My guess is based on using standard 100Gb ethernet. It may be possible
to do better on some more advanced interconnects like Myrinet.

100Gb ethernet already sounds very advanced... ;-)

Yeah, I meant some mix of 100Mb and 1Gb ;)

Regards

Markus

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

#52Markus Schiltknecht
markus@bluegap.ch
In reply to: Hannu Krosing (#51)
Re: Replication

Hannu Krosing wrote:

But if you have very few writes, then there seems no reason to do sync
anyway.

I think there is one: high-availability. A standby server which can
continue if your primary fails. Of course sync is only needed if you
absolutely cannot afford to lose any committed transaction.

Another important factor is the amount of conflicting transactions.

That too, but just the need to do *any* locking on all nodes will
significantly slow down sync replication.

If you implement sync replication with locking, yes. But there are
better ways: the Postgres-R approach does not do network locking, but
aborts conflicting transactions just before committing. That results in
much less network traffic (one GCS message per writing transaction).
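
A minimal sketch of that certification idea (the names are illustrative
only; the real Postgres-R writeset handling is considerably more
involved):

    # Toy certification-based replication in the spirit described above:
    # no network locks; each writeset is totally ordered by the GCS, and
    # every node independently aborts a transaction whose writeset
    # conflicts with a concurrent one certified before it.

    def conflicts(ws_a: set, ws_b: set) -> bool:
        """Two writesets conflict if they touched any common row."""
        return not ws_a.isdisjoint(ws_b)

    class Node:
        def __init__(self):
            self.certified = []  # writesets already certified, in GCS order

        def certify(self, writeset: set, read_point: int) -> bool:
            """Deliver one ordered writeset; decide commit/abort locally.

            read_point is the length of self.certified when the
            transaction started, so only later (i.e. concurrent)
            writesets can conflict with it.
            """
            concurrent = self.certified[read_point:]
            if any(conflicts(writeset, ws) for ws in concurrent):
                return False  # abort: a concurrent conflicting txn won
            self.certified.append(writeset)
            return True       # commit

    # Every node sees the same GCS order, so every node decides alike:
    n = Node()
    print(n.certify({"row1", "row2"}, read_point=0))  # True  (commits)
    print(n.certify({"row2", "row3"}, read_point=0))  # False (conflict)

Because the decision is a deterministic function of the agreed delivery
order, no node needs to ask any other node before committing or aborting.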

Regards

Markus

#53D'Arcy J.M. Cain
darcy@druid.net
In reply to: Hannu Krosing (#49)
Re: Replication

On Wed, 23 Aug 2006 12:42:11 +0300
Hannu Krosing <hannu@skype.net> wrote:

OK, that solves your problem. How about my problem where replication
has to happen on servers in three countries on two continents and
thousands of updates a second have to happen in less than 10ms?

For this scenario you are far better off with partitioning than
replication.

That is if your data is partitionable. But geographically distributed
data often is.

I agree but in this case it wasn't.

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
#54Hannu Krosing
hannu@skype.net
In reply to: Markus Schiltknecht (#52)
Re: Replication

On a fine day, Wed, 2006-08-23 at 13:36, Markus Schiltknecht wrote:

Hannu Krosing wrote:

But if you have very few writes, then there seems no reason to do sync
anyway.

I think there is one: high-availability. A standby server which can
continue if your primary fails. Of course sync is only needed if you
absolutely cannot afford to lose any committed transaction.

Yes. I guess that for that, enhanced WAL shipping would be OK.
Probably one where you require both the sync to local disk and the copy to
the remote (without waiting for the remote's sync) to have happened in
order to COMMIT/ROLLBACK successfully.
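
A sketch of that commit rule (the helper names are hypothetical; it is
only meant to make the ordering explicit):

    import concurrent.futures
    import os

    def write_and_fsync(f, data: bytes) -> None:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    def commit(wal_record: bytes, local_wal, standby_sock) -> None:
        # Do both in parallel, and wait for BOTH before acknowledging
        # COMMIT: the local fsync gives durability, while the send only
        # gets the record onto the wire to the standby, without waiting
        # for the standby's own fsync.
        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as ex:
            local = ex.submit(write_and_fsync, local_wal, wal_record)
            remote = ex.submit(standby_sock.sendall, wal_record)
            local.result()
            remote.result()
        # Only now report success to the client.

Compared to full sync replication this avoids waiting for the remote
fsync; the price is that a transaction can still be lost if both
machines fail at the same moment.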

Another important factor is the amount of conflicting transactions.

That too, but just the need to do *any* locking on all nodes will
significantly slow down sync replication.

If you implement sync replication with locking, yes. But there are
better ways: the Postgres-R approach does not do network locking, but
aborts conflicting transactions just before committing. That results in
much less network traffic (one GCS message per writing transaction).

but it still needs to do at least one network roundtrip + any needed
testing on all nodes + WAL sync on all nodes before it can COMMIT, no?

And I'm afraid that GCS serialisation will need more than one roundtrip
or risk being out-of-date.

I'm not saying that Postgres-R (or any other sync replication) is not
doable or even useful. I just can't see right away, how it can scale
very well for any significant write load.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

#55Markus Schiltknecht
markus@bluegap.ch
In reply to: Hannu Krosing (#54)
Re: Replication

Hi,

Hannu Krosing wrote:

but it still needs to do at least one network roundtrip + any needed
testing on all nodes + WAL sync on all nodes before it can COMMIT, no?

No. It only needs the 'roundtrip' in the sense that a transaction sends
out its writeset and has to wait for the GCS to have it serialized (i.e.
the GCS sends the message back to the sender node).

Then all nodes do the testing and WAL sync independently. (As Neil
recently pointed out in [1], this opens a small risk of data loss in the
case where all nodes crash.)

And I'm afraid that GCS serialisation will need more than one roundtrip
or risk being out-of-date.

The Spread people did some tests on 20 Pentium machines connected via
100 Mbit ethernet. In [2] they state that it takes between 1.7 and 2.8 ms
(depending on the message size) to 'serialize' a message within this
group of 20 nodes.
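
Taking those figures at face value and putting them next to the fsync
cost discussed earlier in this thread (a sketch only; a GCS can batch
many writesets into one ordering round):

    # Each writing transaction waits one GCS ordering round before it can
    # commit. Compare that to the ~8 ms fsync quoted earlier in the
    # thread for disks without a writeback cache.
    FSYNC_MS = 8.0
    for order_ms in (1.7, 2.8):
        print(f"GCS round of {order_ms} ms = "
              f"{order_ms / FSYNC_MS:.0%} of one fsync")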

I'm not saying that Postgres-R (or any other sync replication) is not
doable or even useful. I just can't see right away, how it can scale
very well for any significant write load.

Sure, sync replication won't solve everybody's problems. But out of all
the sync replication algorithms, Postgres-R is my clear favorite. ;-)

Regards

Markus

[1]: http://pgfoundry.org/pipermail/postgres-r-general/2006-August/000001.html

[2]: The Spread Toolkit: Architecture and Performance, by Yair Amir,
Claudiu Danilov, Michal Miskin-Amir, John Schultz, Jonathan Stanton.
http://www.cnds.jhu.edu/pub/papers/cnds-2004-1.pdf

#56Jeff Davis
pgsql@j-davis.com
In reply to: Markus Schiltknecht (#52)
Re: Replication

On Wed, 2006-08-23 at 13:36 +0200, Markus Schiltknecht wrote:

Hannu Krosing wrote:

But if you have very few writes, then there seems no reason to do sync
anyway.

I think there is one: high-availability. A standby server which can
continue if your primary fails. Of course sync is only needed if you
absolutely cannot afford to lose any committed transaction.

I disagree about high-availability. In fact, I would say that sync
replication is trading availability and performance for synchronization
(which is a valid tradeoff, but costly).

If you have an async system, all nodes must go down for the system to go
down.

If you have a sync system, if any node goes down the system goes down.
If you plan on doing failover, consider this: what if it's not obvious
which system is still up? What if the network route between the two
systems goes down (or just becomes too slow to replicate over), but
clients can still connect to both servers? Then you have two systems
that both think that the other system went down, and both start
accepting transactions. Now you no longer have replication at all.

Regards,
Jeff Davis

#57Markus Schiltknecht
markus@bluegap.ch
In reply to: Jeff Davis (#56)
Re: Replication

Hi,

Jeff Davis wrote:

I disagree about high-availability. In fact, I would say that sync
replication is trading availability and performance for synchronization
(which is a valid tradeoff, but costly).

In a way, replication is for database systems what RAID1 is for hard
drives. Having multiple cluster nodes in sync minimizes the risk of a
complete outage due to hardware failure. Thus maximizing availability.
Of course, as you say, traded for performance.

If you have an async system, all nodes must go down for the system to go
down.

Yes. But it takes only one node to go down to potentially lose committed
transactions. In contrast to synchronous replication systems, where a
committed transaction is guaranteed to be 'committed on the cluster'. So
if at least one node of the cluster is up and running, you can be
assured to have consistent data.

Please note that the Postgres-R approach does relax some of these
constraints a little to gain performance. The most obvious result of
these relaxations is that the nodes may 'lag behind' with replaying
transactions and show a past view of the data.

If you have a sync system, if any node goes down the system goes down.

That's plain wrong.

If you plan on doing failover, consider this: what if it's not obvious
which system is still up? What if the network route between the two
systems goes down (or just becomes too slow to replicate over), but
clients can still connect to both servers? Then you have two systems
that both think that the other system went down, and both start
accepting transactions. Now you no longer have replication at all.

This problem is often called 'network partitioning', which also refers
to a more general case: a group of M nodes being split into two groups
of N and (M-N) nodes (due to network failure or whatever).

In Postgres-R a Group Communication System is used to cover all these
aspects (error detection, consistent agreement on a majority group, etc.).

Regards

Markus

#58Chris Browne
cbbrowne@acm.org
In reply to: Luke Lonergan (#8)
Re: Replication

pgsql@j-davis.com (Jeff Davis) writes:

On Wed, 2006-08-23 at 13:36 +0200, Markus Schiltknecht wrote:

Hannu Krosing wrote:

But if you have very few writes, then there seems no reason to do sync
anyway.

I think there is one: high-availability. A standby server which can
continue if your primary fails. Of course sync is only needed if you
absolutely cannot afford to lose any committed transaction.

I disagree about high-availability. In fact, I would say that sync
replication is trading availability and performance for synchronization
(which is a valid tradeoff, but costly).

If you have an async system, all nodes must go down for the system to go
down.

If you have a sync system, if any node goes down the system goes down.
If you plan on doing failover, consider this: what if it's not obvious
which system is still up? What if the network route between the two
systems goes down (or just becomes too slow to replicate over), but
clients can still connect to both servers? Then you have two systems
that both think that the other system went down, and both start
accepting transactions. Now you no longer have replication at all.

That is why for multimaster, there's a need for both automatic policy
as well as some human intervention.

- You need an automatic determination of "quorum", where, to be safe,
it is only permissible for a set of $m$ servers to believe themselves
to be active if they number more than 1/2 of the total of expected
servers.

Thus, if there are 13 servers in the cluster, then "quorum" is 7
servers.

If a set of 6 servers get cut off from the rest of the network, they
don't number at least 7, and thus know that they can't represent a
quorum.

- And if conditions change, a human may need to change the quorum
number.

If 4 new nodes get added, quorum moves up to 9.

If 5 nodes get dropped, quorum moves down to 5.

Deciding when to throw a node out of the quorum because it is
responding too slowly is still not completely trivial, but having a
quorum policy does address your issue.
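
The arithmetic is simple enough to sketch (a toy check only; real
membership services also deal with ties, node weights and configuration
epochs):

    # Minimal quorum check matching the policy described above: a
    # partition may stay active only if it holds a strict majority of
    # the expected membership.

    def quorum(expected_nodes: int) -> int:
        """Smallest strict majority of the expected membership."""
        return expected_nodes // 2 + 1

    def may_serve(visible_nodes: int, expected_nodes: int) -> bool:
        return visible_nodes >= quorum(expected_nodes)

    assert quorum(13) == 7        # the 13-server example above
    assert not may_serve(6, 13)   # the cut-off set of 6 must stand down
    assert quorum(13 + 4) == 9    # add 4 nodes: quorum moves up to 9
    assert quorum(13 - 5) == 5    # drop 5 nodes: quorum moves down to 5
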
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/linux.html
"Be humble. A lot happened before you were born." - Life's Little
Instruction Book

#59Jeff Davis
pgsql@j-davis.com
In reply to: Markus Schiltknecht (#57)
Re: Replication

On Thu, 2006-08-24 at 11:18 +0200, Markus Schiltknecht wrote:

Hi,

Jeff Davis wrote:

I disagree about high-availability. In fact, I would say that sync
replication is trading availability and performance for synchronization
(which is a valid tradeoff, but costly).

In a way, replication is for database systems what RAID1 is for hard
drives. Having multiple cluster nodes in sync minimizes the risk of a
complete outage due to hardware failure. Thus maximizing availability.
Of course, as you say, traded for performance.

If you have an async system, all nodes must go down for the system to go
down.

Yes. But it takes only one node to go down to potentially lose committed
transactions. In contrast to synchronous replication systems, where a
committed transaction is guaranteed to be 'committed on the cluster'. So
if at least one node of the cluster is up and running, you can be
assured to have consistent data.

Right, that's the cost of asynchronous replication.

Please note that the Postgres-R approach does relax some of these
constraints a little to gain performance. The most obvious result of
these relaxations is that the nodes may 'lag behind' with replaying
transactions and show a past view of the data.

If you have a sync system, if any node goes down the system goes down.

That's plain wrong.

Ok, maybe not one node, but I don't think I'm totally off base. See my
explanation below.

If you plan on doing failover, consider this: what if it's not obvious
which system is still up? What if the network route between the two
systems goes down (or just becomes too slow to replicate over), but
clients can still connect to both servers? Then you have two systems
that both think that the other system went down, and both start
accepting transactions. Now you no longer have replication at all.

This problem is often called 'network partitioning', which also refers
to a more general case: a group of M nodes being split into two groups
of N and (M-N) nodes (due to network failure or whatever).

In Postgres-R a Group Communication System is used to cover all these
aspects (error detection, consistent agreement on a majority group, etc.).

Which doesn't work very well in the case of two groups of servers set up
in two physical locations. I can see two possibilities:
(1) You require a quorum to be effective, in which case your cluster of
databases is only as reliable as the location which holds more servers.
(2) You have another central authority that determines which databases
are up, and which are down. Then your cluster is only as reliable as
that central authority.

Sure, if you have a million groups of servers spread all over the
internet, it works with a very high degree of reliability because you
can likely always form a quorum. However, you then have horrible
performance because the updates need to be spread to so many locations.
And for truly synchronous replication you probably have to serialize the
updates, which is very costly over that many nodes all over a network.

Even if you have a large number of nodes at different locations, then
you end up with strange decisions to make if the network connections are
intermittent or very slow. A temporary slowdown of many nodes could
cause them to be degraded until some kind of human intervention brought
them back. Until that time you might not be able to determine which
nodes make up an authoritative group. This kind of degradation could
happen in the case of a DDoS attack, or perhaps a worm moving around the
internet.

In practice everyone can find a solution that works for them. However,
synchronous replication is not perfect, and there are many failure
scenarios which need to be resolved in a way that fits your business. I
think synchronous replication is inherently less available than
asynchronous.

Regards,
Jeff Davis

#60Markus Schiltknecht
markus@bluegap.ch
In reply to: Jeff Davis (#59)
Re: Replication

Jeff Davis wrote:

Which doesn't work very well in the case of two groups of servers set up
in two physical locations. I can see two possibilities:
(1) You require a quorum to be effective, in which case your cluster of
databases is only as reliable as the location which holds more servers.
(2) You have another central authority that determines which databases
are up, and which are down. Then your cluster is only as reliable as
that central authority.

Right, the ideal here would be two sync clusters, one at each location,
connected via async replication :-)

Even if you have a large number of nodes at different locations, then
you end up with strange decisions to make if the network connections are
intermittent or very slow. A temporary slowdown of many nodes could
cause them to be degraded until some kind of human intervention brought
them back. Until that time you might not be able to determine which
nodes make up an authoritative group.

Side note: in such a case, I think a GCS will just choose only one node
to be the 'authoritative group', because most systems cannot afford to
have long waits for such decisions. For database replication I also
think it's better to have at least one node running than none.

This kind of degradation could
happen in the case of a DDoS attack, or perhaps a worm moving around the
internet.

Well, sync replication in general needs a good, low latency and secure
interconnect. The internet does not seem to be a good fit here.

In practice everyone can find a solution that works for them. However,
synchronous replication is not perfect, and there are many failure
scenarios which need to be resolved in a way that fits your business. I
think synchronous replication is inherently less available than
asynchronous.

This surely depends on the environment. With a dedicated (i.e. low
latency and secure) interconnect sync replication is surely more
available because your arguments above don't apply. And because sync
replication guarantees you won't lose committed transactions.

If however you want or have to replicate over the internet, it depends.
Your arguments above also apply to async replication, except that because
of the conflict resolution, async replication systems can continue to
operate on all the disconnected nodes and merge their work later, once
the network is up again. But then again, async still has the danger of
losing transactions.

So I probably agree: if you are on an unreliable network and if you have
conflict resolution correctly set up, then async replication is more
available, but less secure.

As I said above, sync replication needs a reliable interconnect (better
yet, two interconnects), because it's a SPOF for a clustered
database system.

Regards

Markus

#61Jeff Davis
pgsql@j-davis.com
In reply to: Markus Schiltknecht (#60)
Re: Replication

On Fri, 2006-08-25 at 11:23 +0200, Markus Schiltknecht wrote:

Jeff Davis wrote:

Which doesn't work very well in the case of two groups of servers set up
in two physical locations. I can see two possibilities:
(1) You require a quorum to be effective, in which case your cluster of
databases is only as reliable as the location which holds more servers.
(2) You have another central authority that determines which databases
are up, and which are down. Then your cluster is only as reliable as
that central authority.

Right, the ideal here would be two sync clusters, one at each location,
connected via async replication :-)

Even if you have a large number of nodes at different locations, then
you end up with strange decisions to make if the network connections are
intermittent or very slow. A temporary slowdown of many nodes could
cause them to be degraded until some kind of human intervention brought
them back. Until that time you might not be able to determine which
nodes make up an authoritative group.

Side note: in such a case, I think a GCS will just choose only one node
to be the 'authoritative group', because most systems cannot afford to
have long waits for such decisions. For database replication I also
think it's better to have at least one node running than none.

This kind of degradation could
happen in the case of a DDoS attack, or perhaps a worm moving around the
internet.

Well, sync replication in general needs a good, low latency and secure
interconnect. The internet does not seem to be a good fit here.

In practice everyone can find a solution that works for them. However,
synchronous replication is not perfect, and there are many failure
scenarios which need to be resolved in a way that fits your business. I
think synchronous replication is inherently less available than
asynchronous.

This surely depends on the environment. With a dedicated (i.e. low
latency and secure) interconnect sync replication is surely more
available because your arguments above don't apply. And because sync
replication guarantees you won't lose committed transactions.

If however you want or have to replicate over the internet, it depends.
Your arguments above also apply to async replication, except that because
of the conflict resolution, async replication systems can continue to
operate on all the disconnected nodes and merge their work later, once
the network is up again. But then again, async still has the danger of
losing transactions.

So I probably agree: if you are on an unreliable network and if you have
conflict resolution correctly set up, then async replication is more
available, but less secure.

As I said above, sync replication needs a reliable interconnect (better
yet, two interconnects), because it's a SPOF for a clustered
database system.

Ok, I agree with your statements. Async is convenient in many ways, but
has less durable transactions (at least for transactions committed
recently). Sync has some limitations, and is harder to get right (at
least if you want good availability as well) but provides more durable
transactions and consistency between systems.

Regards,
Jeff Davis