COPY FROM - to avoid WAL generation

Started by Ravi Krishnaover 7 years ago4 messagesgeneral
Jump to latest
#1Ravi Krishna
sravikrishna@aol.com

In a recent thread of mine I learned something very interesting.  If a table is created and data is loaded via COPY FROM within the same transaction, then PG will be smart enough to not generate WAL logs because all it needs to do is to track the status of the transaction and let the data load go to the new data file created for the table.  If committed, the table is released for other sessions, if rolledback, vaccum will delete the data file later on.
I tested it as follows for a table with 50 milllion rows.  No indexes.
Case 1  - create the table first.  - in a separate transaction load the 50 million rows.
Took 3 min 22 seconds
Case 2  - start transaction  - create table  - load 50 million rows  - commit transaction
Took: 3 min 16 seconds.
Am I missing anything?

#2Jeff Ross
jross@openvistas.net
In reply to: Ravi Krishna (#1)
Re: COPY FROM - to avoid WAL generation

On 8/21/18 9:00 AM, Ravi Krishna wrote:

In a recent thread of mine I learned something very interesting. If a
table is created and data is loaded via COPY FROM within the same
transaction, then PG will be smart enough to not generate WAL logs
because all it needs to do is to track the status of the transaction
and let the data load go to the new data file created for the table. 
If committed, the table is released for other sessions, if rolledback,
vaccum will delete the data file later on.

I tested it as follows for a table with 50 milllion rows. No indexes.

Case 1
  - create the table first.
  - in a separate transaction load the 50 million rows.

Took 3 min 22 seconds

Case 2
  - start transaction
  - create table
  - load 50 million rows
  - commit transaction

Took: 3 min 16 seconds.

Am I missing anything?

Have you looked into pg_bulkload?

https://github.com/ossc-db/pg_bulkload

Docs are here:

http://ossc-db.github.io/pg_bulkload/index.html

Jeff

#3Andres Freund
andres@anarazel.de
In reply to: Ravi Krishna (#1)
Re: COPY FROM - to avoid WAL generation

Hi,

On 2018-08-21 15:00:03 +0000, Ravi Krishna wrote:

In a recent thread of mine I learned something very interesting.� If a table is created and data is loaded via COPY FROM within the same transaction, then PG will be smart enough to not generate WAL logs because all it needs to do is to track the status of the transaction and let the data load go to the new data file created for the table.� If committed, the table is released for other sessions, if rolledback, vaccum will delete the data file later on.
I tested it as follows for a table with 50 milllion rows.� No indexes.

Please note this is only the case if wal_level = minimal. If replication
(or PITR) is supported, that mode can't be used, because the data has to
go into the WAL.

Were you using wal_level = minimal?

(FWIW, it's not VACUUM that'd unlink the data in cause of failure, but
that doesn't really matter much).

Greetings,

Andres Freund

#4Ravi Krishna
sravikrishna@aol.com
In reply to: Andres Freund (#3)
Re: COPY FROM - to avoid WAL generation

Please note this is only the case if wal_level = minimal. If replication
(or PITR) is supported, that mode can't be used, because the data has to
go into the WAL.
Were you using wal_level = minimal?

Aha. No it was not minimal. For a second I thought PG is super smart. Oh well. Thanks.