Read write performance check

Started by veem v · over 2 years ago · 12 messages · general
#1 veem v
veema0000@gmail.com

Hello All,
It's an Aurora PostgreSQL database in AWS. We have a couple of custom tables
with ~100-odd columns, plus the required indexes and constraints,
representing a certain business use case. Currently they hold no data. The
requirement is to test read and write query performance for these sample
use cases/tables. We need to evaluate both row-by-row and batch read/write
performance, as well as performance with concurrent reads/writes from
multiple threads/sessions.

As I understand it, the following needs to be done:
1) For the write performance test, rows need to be inserted from multiple
sessions at the same time, with random values appropriate to the data types
(character, number, and date columns). This needs to be tested for both
row-by-row and batched inserts.

2) For the read performance test, the table first has to be populated with
those ~100 million rows. Then that table will be queried, row by row and in
batches, from concurrent sessions.

I am new to PostgreSQL but have mostly worked with Oracle, so I wanted to
understand whether the above can be achieved by creating a simple
procedure, or whether a major effort will be needed. I also saw a few blogs
suggesting pgbench for concurrency tests. I want to know whether that will
still work on Aurora PostgreSQL from an IntelliJ client worksheet.

Regards
Veem

#2 Ilya Kosmodemiansky
ilya.kosmodemiansky@postgresql-consulting.com
In reply to: veem v (#1)
Re: Read write performance check

Hi Veem,

On Tue, Dec 19, 2023 at 7:36 AM veem v <veema0000@gmail.com> wrote:

1)For write performance , the rows needs to be inserted from multiple sessions at same time, with required random values as per the data types i.e. Character, Number, date columns. And this needs to be tested for row by row insert and batched insert.

2)For the read performance test , the table first has to be populated with those ~100million rows. Then querying will happen on that table row by row and batched way from the concurrent session.

I am new to postgresql but mostly worked with Oracle, so I wanted to understand if the above can be achieved by creating a simple procedure or will a major effort be needed? And I saw a few blogs , pgbench to be used for concurrency tests.

Yes, you are right: pgbench with a customized script is what you are looking for.

I want to know if this will still work on Aurora postgresql from intellij client worksheet.

pgbench works with Aurora just as with normal Postgres; it is basically a
command-line client that connects to the database. I am not sure the IDEA
worksheet would help you in this case, however; you can run it from any
machine with pgbench installed.
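As an illustration (the table, column, and connection names below are hypothetical placeholders, not anything from this thread), a custom pgbench script can be as simple as one file containing the statement to benchmark:

```shell
# Write a one-statement custom script for pgbench.
# Table/column names are hypothetical placeholders.
cat > insert_test.sql <<'EOF'
INSERT INTO base_table (column1, column2, column3, column4)
VALUES (md5(random()::text), floor(random() * 100)::int,
        random() * 1000, now());
EOF

# Then point pgbench at the Aurora endpoint, e.g.:
#   pgbench -h <aurora-endpoint> -U <user> -d <db> -n -c 10 -j 4 -T 60 -f insert_test.sql
# -c = concurrent client sessions, -T = duration in seconds,
# -n = skip vacuuming pgbench's own tables (they don't exist here).
echo "wrote insert_test.sql"
```

pgbench reports TPS and latency at the end of the run, which is exactly the figure being asked about later in this thread.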

Best regards,
Ilya

--
Ilya Kosmodemiansky
CEO, Founder

Data Egret GmbH
Your remote PostgreSQL DBA team
T.: +49 6821 919 3297
ik@dataegret.com

#3 veem v
veema0000@gmail.com
In reply to: Ilya Kosmodemiansky (#2)
Re: Read write performance check

Thank you for the confirmation.

So at first we need to populate the base tables with the necessary data
(say 100 million rows), with the required skewness, using random functions
to generate variation in the values of the different data types. Then, for
the row-by-row write/read test, we can traverse in a cursor loop; and for
the batch write/insert test, we can traverse in a bulk-collect loop.
Something like below; this code can then be wrapped into a procedure,
passed to pgbench, and executed from there. Please correct me if I'm wrong.

Also, can you please guide me on how a batch insert (say with a batch size
of ~1000) can be written?

-- Row by row write
FOR i IN 1..total_rows LOOP
    data_row := (SELECT
        md5(random()::TEXT),
        floor(random() * 100)::INT,
        random() * 1000::NUMERIC,
        NOW()::TIMESTAMP
    );
    INSERT INTO BASE_TABLE(column1, column2, column3, column4)
    VALUES (data_row.column1, data_row.column2, data_row.column3, data_row.column4);
END LOOP;

-- Row by row read
BEGIN
    FOR i IN 1..total_rows LOOP
        -- Row by row read
        SELECT * INTO data_row FROM BASE_TABLE WHERE limit 1;
    END LOOP;
END;

-- Batch read
BEGIN
    -- Batch read
    OPEN data_set FOR SELECT * FROM BASE_TABLE LIMIT total_rows;
    CLOSE data_set;
END;
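(For what it's worth, a batched insert in PostgreSQL is usually a single multi-row statement rather than a loop; a rough sketch, with hypothetical table/column names, using generate_series() to produce a batch of ~1000 rows per statement:)

```sql
-- One INSERT carrying a batch of 1000 rows (names are hypothetical).
INSERT INTO base_table (column1, column2, column3, column4)
SELECT md5(random()::text),
       floor(random() * 100)::int,
       random() * 1000,
       now()
FROM generate_series(1, 1000);
```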


#4 Rob Sargent
robjsargent@gmail.com
In reply to: veem v (#3)
Re: Read write performance check

On 12/19/23 12:14, veem v wrote:

Thank you for the confirmation.

 So at first, we need to populate the base tables with the necessary
data (say 100million rows) with required skewness using random
functions to generate the variation in the values of different data
types. Then in case of row by row write/read test , we can traverse in
a cursor loop. and in case of batch write/insert , we need to traverse
in a bulk collect loop. Something like below and then this code can be
wrapped into a procedure and passed to the pgbench and executed from
there. Please correct me if I'm wrong.

Also can you please guide how the batch(say batch size of ~1000)
Insert can be written ?

-- Row by row write
 FOR i IN 1..total_rows LOOP
        data_row := (SELECT
            md5(random()::TEXT),
            floor(random() * 100)::INT,
            random() * 1000::NUMERIC,
            NOW()::TIMESTAMP
        );
        INSERT INTO BASE_TABLE(column1, column2, column3, column4)
        VALUES (data_row.column1, data_row.column2, data_row.column3,
data_row.column4);
    END LOOP;

--Row by row read
BEGIN
    FOR i IN 1..total_rows LOOP
        -- Row by row read
        SELECT * INTO data_row FROM BASE_TABLE WHERE limit 1;
    END LOOP;
END;

This row-by-row approach is guaranteed to be slow if there is no index on
the 100M rows.

-- Batch read
BEGIN
    -- Batch read
    OPEN data_set FOR SELECT * FROM BASE_TABLE LIMIT total_rows;
    CLOSE data_set;
END;

Does this batch read pull in the entire 100M-row table? And there is some
suspicious syntax.
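(A batched read would normally fetch fixed-size chunks instead, e.g. keyset pagination over an indexed key; a sketch with a hypothetical id column:)

```sql
-- Fetch the next 1000 rows after the last key already seen;
-- :last_id is carried over from the previous batch (assumes an index on id).
SELECT *
FROM base_table
WHERE id > :last_id
ORDER BY id
LIMIT 1000;
```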

PS: Notice that top posting is frowned upon on this list.

#5 Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: veem v (#3)
Re: Read write performance check

On 2023-12-20 00:44:48 +0530, veem v wrote:

 So at first, we need to populate the base tables with the necessary data (say
100million rows) with required skewness using random functions to generate the
variation in the values of different data types. Then in case of row by row
write/read test , we can traverse in a cursor loop. and in case of batch write/
insert , we need to traverse in a bulk collect loop. Something like below and
then this code can be wrapped into a procedure and passed to the pgbench and
executed from there. Please correct me if I'm wrong.

One important point to consider for benchmarks is that your benchmark
has to be similar to the real application to be useful. If your real
application runs on a different node and connects to the database over
the network, a benchmark running within a stored procedure may not be
very indicative of real performance.

hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

#6 veem v
veema0000@gmail.com
In reply to: Peter J. Holzer (#5)
Re: Read write performance check

Thank you.

Yes, we are actually trying to see the maximum TPS we can reach with both
the row-by-row and the batch read/write tests. Afterwards, this figure may
be compared with other databases with similar setups.

So I wanted to understand from the experts here whether this approach is
fine, or whether some other approach is advisable.

I agree that the network will play a role in a real-world app, but here we
mainly want to see the database's capability, since the network will play a
similar role across all databases. Do you suggest some other approach to
achieve this objective?


#7 Lok P
loknath.73@gmail.com
In reply to: veem v (#6)
Re: Read write performance check

As Rob mentioned, the syntax you posted is not correct. You need to process
or read a certain batch of rows, like 1000 or 10k, at a time; not all 100M
in one shot.

But then again, your use case seems a common one, considering you want to
compare read and write performance on multiple databases with similar table
structures. In that case, you may want to use test scripts which others
must already have written, rather than reinventing the wheel.


#8 veem v
veema0000@gmail.com
In reply to: Lok P (#7)
Re: Read write performance check

Thank you.

It would really be helpful if such test scripts or similar setups are
already available. Can someone please point me to some docs, blogs, or
sample scripts on the same?


#9 veem v
veema0000@gmail.com
In reply to: veem v (#8)
Re: Read write performance check

Can someone please guide me on whether any standard scripting is available
for doing such read/write performance tests, or point me to any available
docs?


#10 Lok P
loknath.73@gmail.com
In reply to: veem v (#9)
Re: Read write performance check

As I mentioned, your scenario looks like a generic one, but I don't have
any sample scripts/docs to share, sorry about that. Other people may
suggest sample scripts, if any exist, or you could post on the performance
list in case someone has done similar work in the past.

But as I see it, the performance test scripts will look like 1) row-by-row
insert/select in a cursor loop, as you showed earlier in this thread, and
2) batch/bulk inserts; and then calling them through pgbench (or any other
driver) to create the concurrency.


#11 Kirk Wolak
wolakk@gmail.com
In reply to: veem v (#9)
Re: Read write performance check

On Thu, Dec 21, 2023 at 8:31 AM veem v <veema0000@gmail.com> wrote:

Can someone please guide me, if any standard scripting is available for
doing such read/write performance test? Or point me to any available docs?

...

Veem, first things first... "Top Posting" is when you reply at the top of
the email... Notice how I replied at the bottom (and I deleted context,
clearly).
This is the style we prefer here.

Second, since you are new to PostgreSQL... Let me recommend some reading.
Cybertec has articles on performance (Tom Kyte style).
Also, read the "Don't Do That" wiki, and finally, have a look at pgbench
and psql documentation. And specifically look at GENERATE_SERIES(),
but the Cybertec articles will touch on that. Regardless... Reading the
docs is insightful.

Links:
https://www.cybertec-postgresql.com/en/postgresql-hash-index-performance/
https://wiki.postgresql.org/wiki/Don't_Do_This
https://www.postgresql.org/docs/current/app-psql.html
https://www.postgresql.org/docs/current/pgbench.html
https://www.postgresql.org/docs/16/functions-srf.html
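(For instance, GENERATE_SERIES() can drive the initial data load in a single statement; a sketch with hypothetical names, including some skew in the numeric column as the thread asked for, scaled down to 1M rows:)

```sql
-- Bulk-load random test data; power(random(), 3) skews values toward 0
-- (table/column names are hypothetical; scale the series up as needed).
INSERT INTO base_table (column1, column2, column3, column4)
SELECT md5(random()::text),
       floor(power(random(), 3) * 100)::int,
       random() * 1000,
       now() - random() * interval '365 days'
FROM generate_series(1, 1000000);
```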

HTH,

Kirk Out!

#12 Hao Zhang
kennthhz@gmail.com
In reply to: Kirk Wolak (#11)
Re: Read write performance check

Veem,

You should also be familiar with Aurora PostgreSQL's storage architecture,
which is very different from regular Postgres (see
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html).
Aurora uses remote storage, which means that if your read workload can't
fit into PostgreSQL's shared buffers, it will perform very differently than
if the storage were a local SSD.
On write, Aurora writes six copies across three availability zones for
high durability and availability, so having enough network bandwidth is a
factor as well.
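(One common way to check whether the working set fits in shared buffers is the cache hit ratio from pg_stat_database; this diagnostic query is not Aurora-specific:)

```sql
-- Approximate shared-buffers hit ratio for the current database.
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
FROM pg_stat_database
WHERE datname = current_database();
```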

Ken
