Reduce WAL logging of INSERT SELECT

Started by Bruce Momjianover 14 years ago23 messageshackers
Jump to latest
#1Bruce Momjian
bruce@momjian.us

We currently have reduced WAL logging for wal_level = minimum for these
commands:

CREATE TABLE AS
CREATE INDEX
CLUSTER
COPY into tables that were created or truncated in the same
transaction

One thing we don't optimize is INSERT ... SELECT when the table is
created or truncated in the same transaction. Seems we could.

We optimize CREATE TABLE AS which is effectively SELECT ... INTO using a
different syntax. Is this a TODO?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Reduce WAL logging of INSERT SELECT

Bruce Momjian <bruce@momjian.us> writes:

One thing we don't optimize is INSERT ... SELECT when the table is
created or truncated in the same transaction. Seems we could.

We optimize CREATE TABLE AS which is effectively SELECT ... INTO using a
different syntax. Is this a TODO?

Considering that SELECT INTO is deprecated, I don't think we should be
expending effort to encourage people to use it.

regards, tom lane

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#2)
Re: Reduce WAL logging of INSERT SELECT

On 08/04/2011 04:55 PM, Tom Lane wrote:

Bruce Momjian<bruce@momjian.us> writes:

One thing we don't optimize is INSERT ... SELECT when the table is
created or truncated in the same transaction. Seems we could.
We optimize CREATE TABLE AS which is effectively SELECT ... INTO using a
different syntax. Is this a TODO?

Considering that SELECT INTO is deprecated, I don't think we should be
expending effort to encourage people to use it.

Right, but the original point about INSERT ... SELECT seems reasonable, no?

cheers

andrew

#4Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#3)
Re: Reduce WAL logging of INSERT SELECT

Andrew Dunstan wrote:

On 08/04/2011 04:55 PM, Tom Lane wrote:

Bruce Momjian<bruce@momjian.us> writes:

One thing we don't optimize is INSERT ... SELECT when the table is
created or truncated in the same transaction. Seems we could.
We optimize CREATE TABLE AS which is effectively SELECT ... INTO using a
different syntax. Is this a TODO?

Considering that SELECT INTO is deprecated, I don't think we should be
expending effort to encourage people to use it.

Right, but the original point about INSERT ... SELECT seems reasonable, no?

Right. I brought up SELECT INTO because you could make the argument
that INSERT ... SELECT is not a utility command like the other ones and
therefore can't be done easily, but CREATE TABLE AS is internal SELECT
INTO and implemented in execMain.c, which I think is where INSERT ...
SELECT would also be implemented.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#4)
Re: Reduce WAL logging of INSERT SELECT

On Thu, Aug 4, 2011 at 10:46 PM, Bruce Momjian <bruce@momjian.us> wrote:

Right.  I brought up SELECT INTO because you could make the argument
that INSERT ... SELECT is not a utility command like the other ones and
therefore can't be done easily, but CREATE TABLE AS is internal SELECT
INTO and implemented in execMain.c, which I think is where INSERT ...
SELECT would also be implemented.

What you should be asking is whether the optimisation would be
effective for INSERT SELECT, or even test it.

My observation is that the optimisation is only effective for very
large loads that cause I/O. As RAM sizes get bigger, I'm inclined to
remove the optimisation and make it optional since it is ineffective
in many cases and negative benefit for smaller cases.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#6Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#5)
Re: Reduce WAL logging of INSERT SELECT

Simon Riggs wrote:

On Thu, Aug 4, 2011 at 10:46 PM, Bruce Momjian <bruce@momjian.us> wrote:

Right. ?I brought up SELECT INTO because you could make the argument
that INSERT ... SELECT is not a utility command like the other ones and
therefore can't be done easily, but CREATE TABLE AS is internal SELECT
INTO and implemented in execMain.c, which I think is where INSERT ...
SELECT would also be implemented.

What you should be asking is whether the optimisation would be
effective for INSERT SELECT, or even test it.

My observation is that the optimisation is only effective for very
large loads that cause I/O. As RAM sizes get bigger, I'm inclined to
remove the optimisation and make it optional since it is ineffective
in many cases and negative benefit for smaller cases.

I am confused how generating WAL traffic that is larger than the heap
file we are fsync'ing can possibly be slower. Are you just throwing out
an idea to try to make me prove it?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#7Jeff Davis
pgsql@j-davis.com
In reply to: Bruce Momjian (#4)
Re: Reduce WAL logging of INSERT SELECT

On Thu, 2011-08-04 at 17:46 -0400, Bruce Momjian wrote:

Right. I brought up SELECT INTO because you could make the argument
that INSERT ... SELECT is not a utility command like the other ones and
therefore can't be done easily, but CREATE TABLE AS is internal SELECT
INTO and implemented in execMain.c, which I think is where INSERT ...
SELECT would also be implemented.

The above statement is a little confusing, so let me start from the
beginning:

How could we avoid WAL logging for INSERT ... SELECT?

The way we do it for CREATE TABLE AS is because nobody would even *see*
the table if our transaction doesn't commit. Therefore we don't need to
bother logging it. Same can be said for SELECT INTO.

INSERT ... SELECT is just an insert. It needs just as much logging as
inserting tuples any other way. For instance, it will potentially share
pages with other inserts, and better properly record all such page
modifications so that they return to a consistent state.

Regards,
Jeff Davis

#8Bruce Momjian
bruce@momjian.us
In reply to: Jeff Davis (#7)
Re: Reduce WAL logging of INSERT SELECT

Jeff Davis wrote:

On Thu, 2011-08-04 at 17:46 -0400, Bruce Momjian wrote:

Right. I brought up SELECT INTO because you could make the argument
that INSERT ... SELECT is not a utility command like the other ones and
therefore can't be done easily, but CREATE TABLE AS is internal SELECT
INTO and implemented in execMain.c, which I think is where INSERT ...
SELECT would also be implemented.

The above statement is a little confusing, so let me start from the
beginning:

How could we avoid WAL logging for INSERT ... SELECT?

The way we do it for CREATE TABLE AS is because nobody would even *see*
the table if our transaction doesn't commit. Therefore we don't need to
bother logging it. Same can be said for SELECT INTO.

INSERT ... SELECT is just an insert. It needs just as much logging as
inserting tuples any other way. For instance, it will potentially share
pages with other inserts, and better properly record all such page
modifications so that they return to a consistent state.

It would act like COPY, meaning the table would have to be truncated or
created in the same transaction.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#9Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#8)
Re: Reduce WAL logging of INSERT SELECT

On Thu, Aug 4, 2011 at 8:55 PM, Bruce Momjian <bruce@momjian.us> wrote:

Jeff Davis wrote:

On Thu, 2011-08-04 at 17:46 -0400, Bruce Momjian wrote:

Right.  I brought up SELECT INTO because you could make the argument
that INSERT ... SELECT is not a utility command like the other ones and
therefore can't be done easily, but CREATE TABLE AS is internal SELECT
INTO and implemented in execMain.c, which I think is where INSERT ...
SELECT would also be implemented.

The above statement is a little confusing, so let me start from the
beginning:

How could we avoid WAL logging for INSERT ... SELECT?

The way we do it for CREATE TABLE AS is because nobody would even *see*
the table if our transaction doesn't commit. Therefore we don't need to
bother logging it. Same can be said for SELECT INTO.

INSERT ... SELECT is just an insert. It needs just as much logging as
inserting tuples any other way. For instance, it will potentially share
pages with other inserts, and better properly record all such page
modifications so that they return to a consistent state.

It would act like COPY, meaning the table would have to be truncated or
created in the same transaction.

It seems to me that, if we know the relation was created or truncated
in the current transaction, and if wal_level = minimal, then we don't
need to WAL-log *anything* until transaction commit (provided we fsync
at commit).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Jeff Davis
pgsql@j-davis.com
In reply to: Bruce Momjian (#8)
Re: Reduce WAL logging of INSERT SELECT

On Thu, 2011-08-04 at 20:55 -0400, Bruce Momjian wrote:

It would act like COPY, meaning the table would have to be truncated or
created in the same transaction.

Well, in that case it could work for any INSERT. No need for a SELECT to
be involved. For that matter, why not make it work for DELETE and
UPDATE, too?

However, I think this is all just a workaround for not having a faster
loading path. I don't object to applying this optimization to inserts,
but I think it might be more productive to figure out if we can support
loading data efficiently -- i.e. also set hint bits and frozenxid during
the load.

Regards,
Jeff Davis

#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeff Davis (#10)
Re: Reduce WAL logging of INSERT SELECT

On 05.08.2011 04:23, Jeff Davis wrote:

On Thu, 2011-08-04 at 20:55 -0400, Bruce Momjian wrote:

It would act like COPY, meaning the table would have to be truncated or
created in the same transaction.

Well, in that case it could work for any INSERT. No need for a SELECT to
be involved. For that matter, why not make it work for DELETE and
UPDATE, too?

Yep. If we are to expand it, we should make it work for any operation.

However, for small operations it's a net loss - you avoid writing a WAL
record, but you have to fsync() the heap instead. If you only modify a
few rows, the extra fsync (or fsyncs if there are indexes too) is more
expensive than writing the WAL.

We'd need a heuristic to decide whether to write WAL or fsync at the
end. For regular INSERTs, UPDATEs and DELETEs, you have the planner's
estimate of number of rows affected. Another thing we should do is move
the fsync call from the end of COPY (and other such operations) to the
end of transaction. That way if you do e.g one COPY followed by a bunch
of smaller INSERTs or UPDATEs, you only need to fsync the files once.

However, I think this is all just a workaround for not having a faster
loading path. I don't object to applying this optimization to inserts,
but I think it might be more productive to figure out if we can support
loading data efficiently -- i.e. also set hint bits and frozenxid during
the load.

Yeah, that would make a much bigger impact in practice.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#12Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#6)
Re: Reduce WAL logging of INSERT SELECT

On Thu, Aug 4, 2011 at 11:07 PM, Bruce Momjian <bruce@momjian.us> wrote:

Simon Riggs wrote:

On Thu, Aug 4, 2011 at 10:46 PM, Bruce Momjian <bruce@momjian.us> wrote:

Right. ?I brought up SELECT INTO because you could make the argument
that INSERT ... SELECT is not a utility command like the other ones and
therefore can't be done easily, but CREATE TABLE AS is internal SELECT
INTO and implemented in execMain.c, which I think is where INSERT ...
SELECT would also be implemented.

What you should be asking is whether the optimisation would be
effective for INSERT SELECT, or even test it.

My observation is that the optimisation is only effective for very
large loads that cause I/O. As RAM sizes get bigger, I'm inclined to
remove the optimisation and make it optional since it is ineffective
in many cases and negative benefit for smaller cases.

I am confused how generating WAL traffic that is larger than the heap
file we are fsync'ing can possibly be slower.

I'm telling you what I know to be true as an assistance to you.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#13Jeff Davis
pgsql@j-davis.com
In reply to: Bruce Momjian (#6)
Re: Reduce WAL logging of INSERT SELECT

On Thu, 2011-08-04 at 18:07 -0400, Bruce Momjian wrote:

I am confused how generating WAL traffic that is larger than the heap
file we are fsync'ing can possibly be slower. Are you just throwing out
an idea to try to make me prove it?

That's worded in a slightly confusing way, but here is the trade-off:

1. If you are using WAL, then regardless of what your transaction does,
only the WAL needs to be fsync'd at commit time. Conveniently, that's
being written sequentially, so it's a single fairly cheap fsync; and all
the data page changes are deferred, collected together, and fsync'd at
checkpoint time (rather than commit time). The cost is that you
double-write the data.

2. If you aren't using WAL, you need to fsync every data file the
transaction touched, which are probably not localized with other
activity. Also, the _entire_ data files needs to be sync'd, so perhaps
many other transactions have made changes to one data file all over, and
it may require _many_ seeks to accomplish the one fsync. The benefit is
that you don't double-write the data.

So, fundamentally, WAL is (in the OLTP case, where a transaction is much
shorter than a checkpoint interval) a big performance _win_, because it
allows us to do nice sequential writing in a single place for all
activities of all transactions; and defer all those random writes to
data pages until the next checkpoint. So we shouldn't treat WAL like a
cost burden that we want to avoid in every case we can.

But in the data load case (where many checkpoints may happen during a
single transaction anyway), it happens that avoiding WAL is a
performance win, because the seeks are not the dominant cost.

Regards,
Jeff Davis

#14Bruce Momjian
bruce@momjian.us
In reply to: Jeff Davis (#13)
Re: Reduce WAL logging of INSERT SELECT

Jeff Davis wrote:

On Thu, 2011-08-04 at 18:07 -0400, Bruce Momjian wrote:

I am confused how generating WAL traffic that is larger than the heap
file we are fsync'ing can possibly be slower. Are you just throwing out
an idea to try to make me prove it?

That's worded in a slightly confusing way, but here is the trade-off:

1. If you are using WAL, then regardless of what your transaction does,
only the WAL needs to be fsync'd at commit time. Conveniently, that's
being written sequentially, so it's a single fairly cheap fsync; and all
the data page changes are deferred, collected together, and fsync'd at
checkpoint time (rather than commit time). The cost is that you
double-write the data.

2. If you aren't using WAL, you need to fsync every data file the
transaction touched, which are probably not localized with other
activity. Also, the _entire_ data files needs to be sync'd, so perhaps
many other transactions have made changes to one data file all over, and
it may require _many_ seeks to accomplish the one fsync. The benefit is
that you don't double-write the data.

So, fundamentally, WAL is (in the OLTP case, where a transaction is much
shorter than a checkpoint interval) a big performance _win_, because it
allows us to do nice sequential writing in a single place for all
activities of all transactions; and defer all those random writes to
data pages until the next checkpoint. So we shouldn't treat WAL like a
cost burden that we want to avoid in every case we can.

But in the data load case (where many checkpoints may happen during a
single transaction anyway), it happens that avoiding WAL is a
performance win, because the seeks are not the dominant cost.

Well, if the table is created in the same transaction (which is the only
case under consideration), no other sessions can write to the table so
you are just writing the entire table on commit, rather than to the WAL.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

In reply to: Heikki Linnakangas (#11)
Re: Reduce WAL logging of INSERT SELECT

However, for small operations it's a net loss - you avoid writing a WAL
record, but you have to fsync() the heap instead. If you only modify a few
rows, the extra fsync (or fsyncs if there are indexes too) is more expensive
than writing the WAL.

We'd need a heuristic to decide whether to write WAL or fsync at the end.
For regular INSERTs, UPDATEs and DELETEs, you have the planner's estimate of
number of rows affected. Another thing we should do is move the fsync call
from the end of COPY (and other such operations) to the end of transaction.
That way if you do e.g one COPY followed by a bunch of smaller INSERTs or
UPDATEs, you only need to fsync the files once.

Have you thought about recovery, especially when the page size is greater
than the disk block size( > 4K ). With WAL, there is a way to restore the
pages to the original state, during recovery, if there are partial page
writes. Is it possible to do the same with direct fsync without WAL?

Gokul.

#16Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Gokulakannan Somasundaram (#15)
Re: Reduce WAL logging of INSERT SELECT

On 06.08.2011 06:32, Gokulakannan Somasundaram wrote:

However, for small operations it's a net loss - you avoid writing a WAL
record, but you have to fsync() the heap instead. If you only modify a few
rows, the extra fsync (or fsyncs if there are indexes too) is more expensive
than writing the WAL.

We'd need a heuristic to decide whether to write WAL or fsync at the end.
For regular INSERTs, UPDATEs and DELETEs, you have the planner's estimate of
number of rows affected. Another thing we should do is move the fsync call
from the end of COPY (and other such operations) to the end of transaction.
That way if you do e.g one COPY followed by a bunch of smaller INSERTs or
UPDATEs, you only need to fsync the files once.

Have you thought about recovery, especially when the page size is greater
than the disk block size(> 4K ). With WAL, there is a way to restore the
pages to the original state, during recovery, if there are partial page
writes. Is it possible to do the same with direct fsync without WAL?

The point of the optimization is that you can only skip WAL when it's
been created (or truncated) in the same transaction. In that case, if
the transaction aborts because of a crash, you don't care about the
contents of the table anyway.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#17Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#14)
Re: Reduce WAL logging of INSERT SELECT

On Sat, Aug 6, 2011 at 4:16 AM, Bruce Momjian <bruce@momjian.us> wrote:

Well, if the table is created in the same transaction (which is the only
case under consideration), no other sessions can write to the table so
you are just writing the entire table on commit, rather than to the WAL.

Below a certain point, skipping WAL is slower and over an intermediate
range there is no benefit. So small amounts of data on large servers
goes slower.

heap_fsync() requires a scan of shared buffers, which may not be cheap.

There is a difficulty because you would need to calculate the cut-off
is for a particular database, and then predict ahead of time whether
the number of rows that will be handled by the statement is low enough
to warrant using the optimisation. Both of which I call a hard
problem.

I think we should remove the COPY optimisation because of this and
definitely not extend INSERT SELECT to perform it automatically.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#18Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#17)
Re: Reduce WAL logging of INSERT SELECT

On 06.08.2011 13:13, Simon Riggs wrote:

I think we should remove the COPY optimisation because of this and
definitely not extend INSERT SELECT to perform it automatically.

It can be very helpful when loading a lot of data, so I'm not in favor
of removing it altogether. Maybe WAL-log the first 10000 rows or such
normally, and skip WAL after that. Of course, loading 10001 rows becomes
the worst case then, but something along those lines...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#19Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Heikki Linnakangas (#18)
Re: Reduce WAL logging of INSERT SELECT

On Sat, Aug 6, 2011 at 11:05 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 06.08.2011 13:13, Simon Riggs wrote:

I think we should remove the COPY optimisation because of this and
definitely not extend INSERT SELECT to perform it automatically.

It can be very helpful when loading a lot of data, so I'm not in favor of
removing it altogether. Maybe WAL-log the first 10000 rows or such normally,
and skip WAL after that. Of course, loading 10001 rows becomes the worst
case then, but something along those lines...

why 10000 rows? maybe the right solution is move towards make a normal
table unlogged and viceversa... probably that's harder to do but we
will have better control and less odd heuristics

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

#20Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jaime Casanova (#19)
Re: Reduce WAL logging of INSERT SELECT

Heikki Linnakangas wrote:

On 06.08.2011 13:13, Simon Riggs wrote:

I think we should remove the COPY optimisation because of this and
definitely not extend INSERT SELECT to perform it automatically.

It can be very helpful when loading a lot of data, so I'm not in
favor of removing it altogether.

Yeah, it can currently help a lot. Of course, if WAL-logging could
in any way facilitate hint bit and frozen xmin setting during bulk
loads, I'm sure the WAL-logged version would win easily.

-Kevin

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jaime Casanova (#19)
#22Jeff Davis
pgsql@j-davis.com
In reply to: Bruce Momjian (#14)
#23Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#17)