Testing of a fast method to bulk insert a Pandas DataFrame into Postgres
Has anyone tested this one?
A Fast Method to Bulk Insert a Pandas DataFrame into Postgres · Ellis
Valentiner
<https://ellisvalentiner.com/post/a-fast-method-to-insert-a-pandas-dataframe-into-postgres/>
I tried psql_insert_copy method, but I got the following error message.
to_sql() got an unexpected keyword argument 'method'
Can anyone shed light on this?
Regards,
David
Has anyone tested this one?
A Fast Method to Bulk Insert a Pandas DataFrame into Postgres · Ellis
Valentiner
<https://ellisvalentiner.com/post/a-fast-method-to-insert-a-pandas-dataframe-into-postgres/>
I tried psql_insert_copy method, but I got the following error message.
to_sql() got an unexpected keyword argument 'method'
Can anyone shed light on this?
Regards,
David
On 10/4/21 8:44 AM, Shaozhong SHI wrote:
Has anyone tested this one?
A Fast Method to Bulk Insert a Pandas DataFrame into Postgres · Ellis
Valentiner
<https://ellisvalentiner.com/post/a-fast-method-to-insert-a-pandas-dataframe-into-postgres/>I tried psql_insert_copy method, but I got the following error message.
to_sql() got an unexpected keyword argument 'method'
Can anyone shed light on this?
method is a legitimate keyword:
So there must be something about how you used it.
Pandas version?
The actual code you used when the error occurred?
Regards,
David
--
Adrian Klaver
adrian.klaver@aklaver.com
Hello, Adrian Klaver,
Pandas version is 0.23.0.
I used the following code:
def psql_insert_copy(table, conn, keys, data_iter):
# gets a DBAPI connection that can provide a cursor
dbapi_conn = conn.connection
with dbapi_conn.cursor() as cur:
s_buf = StringIO()
writer = csv.writer(s_buf)
writer.writerows(data_iter)
s_buf.seek(0)
columns = ', '.join('"{}"'.format(k) for k in keys)
if table.schema:
table_name = '{}.{}'.format(table.schema, table.name)
else:
table_name = table.name
sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
table_name, columns)
cur.copy_expert(sql=sql, file=s_buf)
engine = create_engine('postgresql+psycopg2://:5432/postgres')
try:
df.to_sql('test1', engine, schema='public', if_exists='append',
index=False, method=psql_insert_copy)
I could not find obvious reasons.
Regards,
David
On Mon, 4 Oct 2021 at 17:06, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
Show quoted text
On 10/4/21 8:44 AM, Shaozhong SHI wrote:
Has anyone tested this one?
A Fast Method to Bulk Insert a Pandas DataFrame into Postgres · Ellis
Valentiner
<https://ellisvalentiner.com/post/a-fast-method-to-insert-a-pandas-dataframe-into-postgres/
I tried psql_insert_copy method, but I got the following error message.
to_sql() got an unexpected keyword argument 'method'
Can anyone shed light on this?
method is a legitimate keyword:
So there must be something about how you used it.
Pandas version?
The actual code you used when the error occurred?
Regards,
David
--
Adrian Klaver
adrian.klaver@aklaver.com
On Oct 4, 2021, at 10:20 AM, Shaozhong SHI <shishaozhong@gmail.com> wrote:
Hello, Adrian Klaver,
Pandas version is 0.23.0.
I used the following code:
def psql_insert_copy(table, conn, keys, data_iter):
# gets a DBAPI connection that can provide a cursor
dbapi_conn = conn.connection
with dbapi_conn.cursor() as cur:
s_buf = StringIO()
writer = csv.writer(s_buf)
writer.writerows(data_iter)
s_buf.seek(0)columns = ', '.join('"{}"'.format(k) for k in keys)
if table.schema:
table_name = '{}.{}'.format(table.schema, table.name <http://table.name/>)
else:
table_name = table.name <http://table.name/>sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
table_name, columns)
cur.copy_expert(sql=sql, file=s_buf)engine = create_engine('postgresql+psycopg2://:5432/postgres')
try:
df.to_sql('test1', engine, schema='public', if_exists='append', index=False, method=psql_insert_copy)
you need to quote ‘psql_insert_copy'
On 10/4/21 9:20 AM, Shaozhong SHI wrote:
Hello, Adrian Klaver,
Pandas version is 0.23.0.
The reason the below does not work is method did not show up until
pandas 0.24.0.
I used the following code:
def psql_insert_copy(table, conn, keys, data_iter):
# gets a DBAPI connection that can provide a cursor
dbapi_conn = conn.connection
with dbapi_conn.cursor() as cur:
s_buf = StringIO()
writer = csv.writer(s_buf)
writer.writerows(data_iter)
s_buf.seek(0)columns = ', '.join('"{}"'.format(k) for k in keys)
if table.schema:
table_name = '{}.{}'.format(table.schema, table.name
<http://table.name>)
else:
table_name = table.name <http://table.name>sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
table_name, columns)
cur.copy_expert(sql=sql, file=s_buf)
engine = create_engine('postgresql+psycopg2://:5432/postgres')
try:
df.to_sql('test1', engine, schema='public', if_exists='append',
index=False, method=psql_insert_copy)I could not find obvious reasons.
Regards,
David
--
Adrian Klaver
adrian.klaver@aklaver.com
Hello, Adrian Klaver,
What is the robust way to upgrade Pandas?
Regards,
David
On Monday, 4 October 2021, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
Show quoted text
On 10/4/21 9:20 AM, Shaozhong SHI wrote:
Hello, Adrian Klaver,
Pandas version is 0.23.0.
The reason the below does not work is method did not show up until pandas
0.24.0.I used the following code:
def psql_insert_copy(table, conn, keys, data_iter):
# gets a DBAPI connection that can provide a cursor
dbapi_conn = conn.connection
with dbapi_conn.cursor() as cur:
s_buf = StringIO()
writer = csv.writer(s_buf)
writer.writerows(data_iter)
s_buf.seek(0)columns = ', '.join('"{}"'.format(k) for k in keys)
if table.schema:
table_name = '{}.{}'.format(table.schema, table.name <
http://table.name>)
else:
table_name = table.name <http://table.name>sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
table_name, columns)
cur.copy_expert(sql=sql, file=s_buf)
engine = create_engine('postgresql+psycopg2://:5432/postgres')
try:
df.to_sql('test1', engine, schema='public', if_exists='append',
index=False, method=psql_insert_copy)I could not find obvious reasons.
Regards,
David
--
Adrian Klaver
adrian.klaver@aklaver.com
On 10/4/21 10:10 AM, Shaozhong SHI wrote:
Hello, Adrian Klaver,
What is the robust way to upgrade Pandas?
Carefully.
The most recent version is 1.3.3, which is approximately 5 versions
ahead of where you are now. The big jump is when Pandas went from 0.25
to 1.0. See docs here:
https://pandas.pydata.org/docs/whatsnew/v1.0.0.html?highlight=upgrade
So the process should be 0.24 -> 0.25, verify, 0.25 -> 1.0, verify. Then
on to wherever you want to end up a step at a time.
Before each step spend time here:
https://pandas.pydata.org/docs/whatsnew/
to see what the gotcha's are.
Regards,
David
--
Adrian Klaver
adrian.klaver@aklaver.com
On Mon, Oct 4, 2021 at 8:55 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:
Has anyone tested this one?
A Fast Method to Bulk Insert a Pandas DataFrame into Postgres · Ellis
Valentiner
<https://ellisvalentiner.com/post/a-fast-method-to-insert-a-pandas-dataframe-into-postgres/>I tried psql_insert_copy method, but I got the following error message.
to_sql() got an unexpected keyword argument 'method'
Can anyone shed light on this?
Interesting - I wasn't aware there was a faster method than "COPY ... FROM
... CSV"
Maybe my best input is that I've always found COPY/FROM/CSV to be insanely
fast and really easy to write from any standard console or ORM, so maybe
the fractional performance increase from other methods isn't worth it,
given how performant, reliable and easy to use this approach is? Not really
an answer but food for thought..
Steve
On 10/4/21 10:28 AM, Adrian Klaver wrote:
On 10/4/21 10:10 AM, Shaozhong SHI wrote:
Hello, Adrian Klaver,
What is the robust way to upgrade Pandas?Carefully.
The most recent version is 1.3.3, which is approximately 5 versions
ahead of where you are now. The big jump is when Pandas went from 0.25
to 1.0. See docs here:https://pandas.pydata.org/docs/whatsnew/v1.0.0.html?highlight=upgrade
So the process should be 0.24 -> 0.25, verify, 0.25 -> 1.0, verify. Then
on to wherever you want to end up a step at a time.Before each step spend time here:
https://pandas.pydata.org/docs/whatsnew/
to see what the gotcha's are.
Should have added:
If you are not already working in a virtualenv it would be a good idea
to do the above in one or more.
Regards,
David
--
Adrian Klaver
adrian.klaver@aklaver.com
Hello, Steve,
That is interesting.
Send me the link for Python doing "Copy---From....CSV".
I would like to review and test.
Regards,
David
On Mon, 4 Oct 2021 at 18:30, Steve Midgley <science@misuse.org> wrote:
Show quoted text
On Mon, Oct 4, 2021 at 8:55 AM Shaozhong SHI <shishaozhong@gmail.com>
wrote:Has anyone tested this one?
A Fast Method to Bulk Insert a Pandas DataFrame into Postgres · Ellis
Valentiner
<https://ellisvalentiner.com/post/a-fast-method-to-insert-a-pandas-dataframe-into-postgres/>I tried psql_insert_copy method, but I got the following error message.
to_sql() got an unexpected keyword argument 'method'
Can anyone shed light on this?
Interesting - I wasn't aware there was a faster method than "COPY ... FROM
... CSV"Maybe my best input is that I've always found COPY/FROM/CSV to be insanely
fast and really easy to write from any standard console or ORM, so maybe
the fractional performance increase from other methods isn't worth it,
given how performant, reliable and easy to use this approach is? Not really
an answer but food for thought..Steve
On Mon, Oct 4, 2021 at 1:19 PM Shaozhong SHI <shishaozhong@gmail.com> wrote:
Hello, Steve,
That is interesting.
Send me the link for Python doing "Copy---From....CSV".
I would like to review and test.
Regards,
David
On Mon, 4 Oct 2021 at 18:30, Steve Midgley <science@misuse.org> wrote:
On Mon, Oct 4, 2021 at 8:55 AM Shaozhong SHI <shishaozhong@gmail.com>
wrote:Has anyone tested this one?
A Fast Method to Bulk Insert a Pandas DataFrame into Postgres · Ellis
Valentiner
<https://ellisvalentiner.com/post/a-fast-method-to-insert-a-pandas-dataframe-into-postgres/>I tried psql_insert_copy method, but I got the following error message.
to_sql() got an unexpected keyword argument 'method'
Can anyone shed light on this?
Interesting - I wasn't aware there was a faster method than "COPY ...
FROM ... CSV"Maybe my best input is that I've always found COPY/FROM/CSV to be
insanely fast and really easy to write from any standard console or ORM, so
maybe the fractional performance increase from other methods isn't worth
it, given how performant, reliable and easy to use this approach is? Not
really an answer but food for thought..The way I've implemented that (using Ruby, but very similar) is to
actually copy from STDIN rather than CSV. In my implementation I wasn't
running my import code on the Postgres server, so I couldn't get the CSV
file onto the local DB machine. So I had to feed the data over the wire via
STDIN. This is slower than if you can point Postgres to a CSV file that the
Pg server itself can access on the file system, but it's more flexible and
still remarkably fast. It can be even (a bit) faster if you don't feed
STDIN line by line, but in my experience it's a nightmare to debug if you
don't get a line number when an import file barfs.
Anyway, here's the Ruby code that shows the fast import concept:
https://gist.github.com/science/393907d4123c87ed767bc81e9dd5a7da
I wrote this a LONG time ago, but I think the concepts are still relevant.
IIRC, feeding Pg from STDIN was ~400x faster than using "insert into" or
other SQL type commands to get the data in.
I hope this is helpful - I would think this code is easily portable to
Python, as you just need to get a raw connection to Postgres from your ORM
adapter and then model the commands the same way in the code above. Good
luck and write with questions..
Steve