Possibilities for optimizing inserts across oracle_fdw foreign data wrapper
Hello all
We are often using the oracle_fdw to transfer data between Postgres (version 11+) and Oracle (version 18+). It works great.
However I have a task at hand that requires inserting a few billion rows in an Oracle table from a Postgres query.
insert into t_ora (a,b,c)
select a,b,c from t_pg;
This is driven from a plpgsql stored procedure, if that matters.
I want to optimize the running time of this. But I am unsure which possibilities, if any, there actually are.
Reducing the number of network roundtrips is usually a good way to increase throughput. But, how do I do that?
If I could make the Oracle insert direct load, that would usually also increase throughput. But, is that possible here? There are no constraints defined on the destination tables.
Regards Niels Jespersen
On 9/19/21 06:28, Niels Jespersen wrote:
Hello all
We are often using the oracle_fdw to transfer data between Postgres (version 11+) and Oracle (version 18+). It works great.
However I have a task at hand that requires inserting a few billion rows in an Oracle table from a Postgres query.
insert into t_ora (a,b,c)
select a,b,c from t_pg;
This is driven from a plpgsql stored procedure, if that matters.
I want to optimize the running time of this. But I am unsure which possibilities, if any, there actually are.
Reducing the number of network roundtrips is usually a good way to increase throughput. But, how do I do that?
If I could make the Oracle insert direct load, that would usually also increase throughput. But, is that possible here? There are no constraints defined on the destination tables.
Regards Niels Jespersen
The problem with oracle_fdw is that the SQL is parsed on the Postgres
side, not on the Oracle side. If it were parsed on the Oracle side, you
could use the /*+ APPEND */ hint, which is essentially a direct-path
insert. You will have to write a script in one of the scripting
languages that utilizes array insert, available with the Instant Client.
Even the Oracle ODBC driver utilizes array insert, as shown in the
following article:
https://dbwhisperer.wordpress.com/2020/11/21/pyodbc-fast_executemany-and-oracle-rdbms/
Unfortunately, the Postgres side of the equation does not handle array
fetches particularly well, which makes it harder to cut down on the
number of network trips there:
https://github.com/mkleehammer/pyodbc/wiki/Driver-support-for-fast_executemany
I would use a script on the Postgres side and then use the superior
options provided by SQL*Net. You will need some fancy programming to
avoid waiting on each operation. I would actually write two scripts: one
reading data from Postgres and converting it to CSV, piping that into a
second script that inserts the data into Oracle. That would let the
scripts work in parallel, at least partially. Situations like this are
the reason why a DBA needs to know how to script. So, this is where you start:
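To make the two-script idea concrete, here is a minimal sketch: a producer that streams the Postgres table as CSV, piped into a consumer that array-inserts each chunk into Oracle. The driver names (psycopg2, cx_Oracle) and the connection strings are assumptions, and the APPEND_VALUES hint is one way to request a direct-path insert for a bound array; adapt all of it to your environment.

```python
import csv
import itertools
import sys

def batches(rows, size):
    """Group an iterable of rows into lists of at most `size` rows,
    so each executemany() call sends one array insert over the wire."""
    it = iter(rows)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

def produce(pg_dsn):
    """Producer script: COPY ... TO STDOUT streams the table as CSV
    without fetching it row by row."""
    import psycopg2  # assumed driver
    conn = psycopg2.connect(pg_dsn)
    with conn, conn.cursor() as cur:
        cur.copy_expert("COPY t_pg (a, b, c) TO STDOUT WITH CSV", sys.stdout)

def consume(ora_dsn, batch_size=10_000):
    """Consumer script: executemany() performs an array insert; the
    APPEND_VALUES hint requests a direct-path load for the bound array."""
    import cx_Oracle  # assumed driver
    conn = cx_Oracle.connect(ora_dsn)
    cur = conn.cursor()
    for batch in batches(csv.reader(sys.stdin), batch_size):
        cur.executemany(
            "INSERT /*+ APPEND_VALUES */ INTO t_ora (a, b, c) "
            "VALUES (:1, :2, :3)", batch)
        # A direct-path insert must be committed before the next DML on
        # the same table in the same session, or Oracle raises ORA-12838.
        conn.commit()
```

Run conceptually as `producer | consumer`, so the Postgres read and the Oracle write overlap instead of running serially.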
Regards
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
On Sun, 2021-09-19 at 10:28 +0000, Niels Jespersen wrote:
We are often using the oracle_fdw to transfer data between Postgres (version 11+) and Oracle (version 18+). It works great.
However I have a task at hand that requires inserting a few billion rows in an Oracle table from a Postgres query.
insert into t_ora (a,b,c)
select a,b,c from t_pg;
This is driven from a plpgsql stored procedure, if that matters.
I want to optimize the running time of this. But I am unsure which possibilities, if any, there actually are.
Reducing the number of network roundtrips is usually a good way to increase throughput. But, how do I do that?
If I could make the Oracle insert direct load, that would usually also increase throughput.
But, is that possible here? There are no constraints defined on the destination tables.
The cause of the bad performance for bulk data modifications is that the FDW API is built
that way: each row INSERTed means a round trip between PostgreSQL and Oracle.
That could be improved by collecting rows and inserting them in bulk on the Oracle
side, but I don't feel like implementing that and complicating the code.
From my point of view, oracle_fdw is good for reading, but not for bulk writes.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
On Mon, Sep 20, 2021 at 3:18 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Sun, 2021-09-19 at 10:28 +0000, Niels Jespersen wrote:
We are often using the oracle_fdw to transfer data between Postgres (version 11+) and Oracle (version 18+). It works great.
However I have a task at hand that requires inserting a few billion rows in an Oracle table from a Postgres query.
insert into t_ora (a,b,c)
select a,b,c from t_pg;
This is driven from a plpgsql stored procedure, if that matters.
I want to optimize the running time of this. But I am unsure which possibilities, if any, there actually are.
Reducing the number of network roundtrips is usually a good way to increase throughput. But, how do I do that?
If I could make the Oracle insert direct load, that would usually also increase throughput.
But, is that possible here? There are no constraints defined on the destination tables.
The cause of the bad performance for bulk data modifications is that the FDW API is built
that way: each row INSERTed means a round trip between PostgreSQL and Oracle.
Just as a note, with PostgreSQL 14 the FDW APIs allow batch insert. It
should be possible to update oracle_fdw to take advantage of that as
well, right?
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
-----Original message-----
From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: 20 September 2021 15:18
To: Niels Jespersen <NJN@dst.dk>; pgsql-general@postgresql.org
Subject: Re: Possibilities for optimizing inserts across oracle_fdw foreign data wrapper
On Sun, 2021-09-19 at 10:28 +0000, Niels Jespersen wrote:
We are often using the oracle_fdw to transfer data between Postgres (version 11+) and Oracle (version 18+). It works great.
However I have a task at hand that requires inserting a few billion rows in an Oracle table from a Postgres query.
If I could make the Oracle insert direct load, that would usually also increase throughput. But, is that possible here? There are no constraints defined on the destination tables.
The cause of the bad performance for bulk data modifications is that the FDW API is built that way: each row INSERTed means a round trip between PostgreSQL and Oracle.
That could be improved by collecting rows and inserting them in bulk on the Oracle side, but I don't feel like implementing that and complicating the code.
From my point of view, oracle_fdw is good for reading, but not for bulk writes.
Yours,
Laurenz Albe
Thank you for clarifying. I will use a Python script for doing this, then: read large chunks of data from Postgres and insert equally large chunks into Oracle, possibly using direct load. I know this can work.
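For reference, the single-script variant of that plan could look roughly like this: a server-side cursor streams chunks out of Postgres, and executemany() array-inserts each chunk into Oracle. Driver names, DSNs, and the APPEND_VALUES hint are assumptions here, not a tested implementation.

```python
def transfer(pg_dsn, ora_dsn, chunk_size=50_000):
    """Copy t_pg (a, b, c) into t_ora in chunks of `chunk_size` rows."""
    import psycopg2    # assumed Postgres driver
    import cx_Oracle   # assumed Oracle driver
    pg = psycopg2.connect(pg_dsn)
    ora = cx_Oracle.connect(ora_dsn)
    # A named (server-side) cursor streams the result set instead of
    # materializing billions of rows in client memory.
    read = pg.cursor(name="bulk_read")
    read.itersize = chunk_size
    read.execute("SELECT a, b, c FROM t_pg")
    write = ora.cursor()
    while True:
        chunk = read.fetchmany(chunk_size)
        if not chunk:
            break
        write.executemany(
            "INSERT /*+ APPEND_VALUES */ INTO t_ora (a, b, c) "
            "VALUES (:1, :2, :3)", chunk)
        # Direct-path loads must be committed before the next insert
        # into the same table from this session.
        ora.commit()
    read.close()
    pg.close()
    ora.close()
```

Unlike the piped two-script approach, the read and the write here alternate rather than overlap, which is simpler but leaves some parallelism on the table.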
Regards Niels
On Mon, 2021-09-20 at 15:41 +0200, Magnus Hagander wrote:
Just as a note, with PostgreSQL 14 the FDW APIs allow batch insert. It
should be possible to update oracle_fdw to take advantage of that as
well, right?
Yes. The exercise is more complicated with Oracle, because it doesn't
support multi-row INSERTs.
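To illustrate the syntax gap: PostgreSQL can batch rows into a single multi-row VALUES list, while classic Oracle SQL needs one INTO clause per row via INSERT ALL (or array binding at the driver level). The helpers below only build the statement strings, to show the two shapes side by side; they are an illustration, not part of any FDW.

```python
def pg_batch_insert(table, cols, nrows):
    """PostgreSQL style: one statement, comma-separated row groups."""
    width = len(cols)
    groups = []
    for r in range(nrows):
        params = ", ".join(f"${r * width + i + 1}" for i in range(width))
        groups.append(f"({params})")
    return (f"INSERT INTO {table} ({', '.join(cols)}) "
            f"VALUES {', '.join(groups)}")

def ora_batch_insert(table, cols, nrows):
    """Oracle style: INSERT ALL with one INTO clause per row."""
    width = len(cols)
    intos = []
    for r in range(nrows):
        params = ", ".join(f":{r * width + i + 1}" for i in range(width))
        intos.append(f"INTO {table} ({', '.join(cols)}) VALUES ({params})")
    return "INSERT ALL " + " ".join(intos) + " SELECT 1 FROM dual"
```

The Oracle text grows with one full INTO clause per row, which is part of why batching there is usually done with array binding in the client library rather than in the SQL text.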
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com