Regarding Postgres - Insertion Time Increasing as Data Volume Grows
Hi Postgres Team,
Below are the scenarios we are dealing with.
1) There are 20 tables, each having on average 150 columns.
2) There are 20 threads handled by a ThreadPoolExecutor (we are using Python's psycopg2 module/library to fetch the data).
3) I am using the statement below to insert the data with Python's psycopg2 module, via the execute(...) command:
sql_stmt = ("INSERT INTO " + name_Table + " " + final_col_string
            + " VALUES " + str(tuple(array_of_curly_values)))
print('Sql statement', sql_stmt)
col_cursor_db = db_conn.cursor()
v = col_cursor_db.execute(sql_stmt)
But earlier the same 22 threads were running and the insertion time gradually increased from 1 second to 30-35 seconds.
Requesting the Postgres general support team to help me out on this.
How can I increase the insertion speed to minimize the insertion time taken by each thread in the thread pool?
Are there any different Python libraries other than psycopg2?
Are there any different functions in Python's psycopg2?
Or what performance tuning has to be done to increase the insertion speed?
On Wed, 10 Feb 2021, 09:26 Rajnish Vishwakarma, <
rajnish.nationfirst@gmail.com> wrote:
Hi Postgres Team,
The below are the scenarios which we are dealing with.
1) There are 20 Tables - On an average each having 150 columns.
2) There are 20 Threads Handled by Thread Pool Executor ( here we are using Python's - psycopg2 module / library to fetch the data .)
3) I am using the below statement to insert the data using Python - psycopg2 module - using the execute(...) command as:
sql_stmt = "INSERT INTO " + name_Table + final_col_string + "VALUES" + str(tuple(array_of_curly_values))
print('Sql statement', sql_stmt)
col_cursor_db = db_conn.cursor()
v = col_cursor_db.execute(sql_stmt);
This is an insecure way to do it, but that's beside the point.
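To make the point concrete: a parameterized version removes the injection risk and lets the driver handle quoting. A minimal sketch, assuming a row arrives as a dict; the helper name, table, and column names here are invented for illustration, not taken from the original code (for untrusted table or column names, psycopg2.sql.Identifier is the proper tool):

```python
# Hypothetical helper: build a parameterized INSERT from a row dict.
# psycopg2 fills the %s placeholders server-side, so values are never
# pasted into the SQL string by hand.
def build_insert(table, row):
    cols = ", ".join(row.keys())
    placeholders = ", ".join(["%s"] * len(row))
    sql = "INSERT INTO {} ({}) VALUES ({})".format(table, cols, placeholders)
    return sql, tuple(row.values())

sql, params = build_insert("sensor_data", {"ts": 1, "temp": 21.5})
# With a live connection: col_cursor_db.execute(sql, params)
```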
But earlier the same 22 threads were running and the insertion time was
gradually increased from 1 second to 30-35 seconds.
Requesting and urging the postgres general support team to help me out on this.
How can i increase the INSERTION speed to minimize the insertion time taken by each thread in the THREAD POOL.
Use a COPY statement instead of INSERT. For a more moderate change to your code, at a smaller speed increase, you can look at the batching helpers (
https://www.psycopg.org/docs/extras.html#fast-execution-helpers).
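As a rough sketch of the COPY route: build the rows into an in-memory tab-separated buffer, then stream it in one round trip. The table and connection names are placeholders, and the psycopg2 call is left commented out since it needs a live database:

```python
import io

# Build a tab-separated buffer in memory; COPY ingests it in one round
# trip instead of paying a network round trip per INSERT.
rows = [(1, 21.5), (2, 22.0)]
buf = io.StringIO()
for r in rows:
    buf.write("\t".join(str(v) for v in r) + "\n")
buf.seek(0)

# With a live psycopg2 connection (placeholder names):
# cur = db_conn.cursor()
# cur.copy_expert("COPY sensor_data (ts, temp) FROM STDIN", buf)
```

The batching helpers from the link above (e.g. execute_values from psycopg2.extras) take the same list-of-tuples shape and fold many rows into each statement, which is the smaller code change.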
Or there any different python libraries other than psycopg2 ?
Psycopg3 hasn't been released yet, so using it is on the experimental side. However, it provides better support for COPY, which would be perfect for your use case (
https://www.psycopg.org/psycopg3/docs/copy.html#writing-data-row-by-row).
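A sketch of what the row-by-row COPY from the linked psycopg3 docs looks like. The generator keeps memory flat even for large loads; the connection and table names are placeholders, and the copy calls are commented out because they need a live database:

```python
# Placeholder data source: yields rows lazily instead of materializing
# the whole batch in memory.
def make_rows(n):
    for i in range(n):
        yield (i, i * 1.5)

# With a live psycopg (3.x) connection:
# with conn.cursor() as cur:
#     with cur.copy("COPY sensor_data (ts, temp) FROM STDIN") as copy:
#         for row in make_rows(1000):
#             copy.write_row(row)

sample = list(make_rows(3))
```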
-- Daniele
On 10. 02. 21 09:14, Rajnish Vishwakarma wrote:
Is a single insert taking 30 seconds, or do you have such a large number of inserts that your thread pool can't handle them and you are waiting for a free connection?
For a single insert, one reason for slowness in large databases could be indexes, which need to be updated on every insert. For the latter, you should increase the thread pool size.
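To check whether indexes are a plausible culprit, you can count them per table from the standard pg_indexes catalog view. A sketch; the schema name and connection are placeholders:

```python
# Standard catalog query: how many indexes does each table carry?
# Every one of them must be updated on each INSERT.
INDEX_COUNT_SQL = (
    "SELECT tablename, count(*) AS n_indexes "
    "FROM pg_indexes "
    "WHERE schemaname = 'public' "
    "GROUP BY tablename "
    "ORDER BY n_indexes DESC"
)

# With a live connection:
# cur = db_conn.cursor()
# cur.execute(INDEX_COUNT_SQL)
# for table, n in cur.fetchall():
#     print(table, n)
```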