Generate test data inserts - 1GB

Started by Shital Aover 6 years ago8 messagesgeneral
Jump to latest
#1Shital A
brightuser2019@gmail.com

Hello

Postgresql 9.6

Need to generate 1GB test data in very less time. I got some techniques
online but they take around 40mins for 400GB. Any quicker way?

Thanks.

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Shital A (#1)
Re: Generate test data inserts - 1GB

On 8/9/19 4:12 AM, Shital A wrote:

Hello

Postgresql 9.6

Need to generate 1GB test data in very less time. I got some techniques
online but they take around 40mins for 400GB. Any quicker way?

1) Postgres version?

2) Data going into single table or multiple tables?

3) What is the time frame you are trying to achieve?

4) What techniques have you tried?

5) If you need only 1GB why the 400GB number?

Thanks.

--
Adrian Klaver
adrian.klaver@aklaver.com

#3Shital A
brightuser2019@gmail.com
In reply to: Adrian Klaver (#2)
Re: Generate test data inserts - 1GB

On Fri, 9 Aug 2019, 20:08 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:

On 8/9/19 4:12 AM, Shital A wrote:

Hello

Postgresql 9.6

Need to generate 1GB test data in very less time. I got some techniques
online but they take around 40mins for 400GB. Any quicker way?

1) Postgres version?

2) Data going into single table or multiple tables?

3) What is the time frame you are trying to achieve?

4) What techniques have you tried?

5) If you need only 1GB why the 400GB number?

Thanks.

--
Adrian Klaver
adrian.klaver@aklaver.com

Hello,

Sorry 400GB was a typo. Its 400 MB Details are below:

1) Postgres version?
9.6

2) Data going into single table or multiple tables?
Single table having Multiple columns

3) What is the time frame you are trying to achieve?
As quick as possible.

4) What techniques have you tried?
Insert into with With statement, inserting 2000000 rows at a time. This
takes 40 mins.

5) If you need only 1GB why the 400GB number?
That was 400MB. M checking size using \l+ option.

Thanks.

Show quoted text
#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Shital A (#3)
Re: Generate test data inserts - 1GB

On 8/9/19 8:14 AM, Shital A wrote:

On Fri, 9 Aug 2019, 20:08 Adrian Klaver, <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

On 8/9/19 4:12 AM, Shital A wrote:

Hello

Postgresql 9.6

Need to generate 1GB test data in very less time. I got some

techniques

online but they take around 40mins for 400GB. Any quicker way?

1) Postgres version?

2) Data going into single table or multiple tables?

3) What is the time frame you are trying to achieve?

4) What techniques have you tried?

5) If you need only 1GB why the 400GB number?

Thanks.

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

Hello,

Sorry 400GB was a typo. Its 400 MB Details are below:

1) Postgres version?
9.6

2) Data going into single table or multiple tables?
Single table having Multiple columns

3) What is the time frame you are trying to achieve?
As quick as possible.

4) What techniques have you tried?
Insert into with With statement, inserting 2000000 rows at a time. This
takes 40 mins.

Might take a look at COPY:

https://www.postgresql.org/docs/11/sql-copy.html

or its psql equivalent:

\copy.

COPY runs as the server user and sees files relative to the server location.
\copy runs as the psql(client) user.

In either case suitability will depend on where you are sourcing the
test data from and what format it is in to begin with.

If you want a more sophisticated wrapper over the above then:

https://ossc-db.github.io/pg_bulkload/pg_bulkload.html

5) If you need only 1GB why the 400GB number?
That was 400MB. M checking size using \l+ option.

Thanks.

--
Adrian Klaver
adrian.klaver@aklaver.com

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Shital A (#3)
Re: Generate test data inserts - 1GB

On 8/9/19 8:14 AM, Shital A wrote:

Hello,

4) What techniques have you tried?
Insert into with With statement, inserting 2000000 rows at a time. This
takes 40 mins.

To add to my previous post. If you already have data in a Postgres
database then you could do:

pg_dump -d db -t some_table -a -f test_data.sql

That will dump the data only for the table in COPY format. Then you
could apply that to your test database(after TRUNCATE on table, assuming
you want to start fresh):

psql -d test_db -f test_data.sql

--
Adrian Klaver
adrian.klaver@aklaver.com

#6Shital A
brightuser2019@gmail.com
In reply to: Adrian Klaver (#5)
Re: Generate test data inserts - 1GB

On Fri, 9 Aug 2019, 21:25 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:

On 8/9/19 8:14 AM, Shital A wrote:

Hello,

4) What techniques have you tried?
Insert into with With statement, inserting 2000000 rows at a time. This
takes 40 mins.

To add to my previous post. If you already have data in a Postgres
database then you could do:

pg_dump -d db -t some_table -a -f test_data.sql

That will dump the data only for the table in COPY format. Then you
could apply that to your test database(after TRUNCATE on table, assuming
you want to start fresh):

psql -d test_db -f test_data.sql

--
Adrian Klaver
adrian.klaver@aklaver.com

Thanks for the reply Adrian.

Missed one requirement. Will these methods generate wal logs needed for
replication?

Actually the data is to check if replication catches up. Below is scenario :

1. Have a master slave cluster with replication setup

2. Kill master so that standby takes over. We are using pacemaker for auto
failure.
Insert 1 GB data in new master while replication is broken.

3 Start oldnode as standby and check if 1GB data gets replicated.

As such testing might be frequent we needed to spend minimum time in
generating data.
Master slave are in same network.

Thanks !

#7Rob Sargent
robjsargent@gmail.com
In reply to: Shital A (#6)
Re: Generate test data inserts - 1GB

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

Thanks for the reply Adrian.

Missed one requirement. Will these methods generate wal logs needed
for replication?

Actually the data is to check if replication catches up. Below is
scenario :

1. Have a master slave cluster with replication setup

2. Kill master so that standby takes over. We are using pacemaker for
auto failure.
Insert 1 GB data in new master while replication is broken.

3 Start oldnode as standby and check if 1GB data gets replicated.

As such testing might be frequent we needed to spend minimum time in
generating data.
Master slave are in same network.

Thanks !

How quickly does your production instance grow, in terms of
GB/per-unit-time?

#8Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Shital A (#6)
Re: Generate test data inserts - 1GB

On 8/9/19 9:51 AM, Shital A wrote:

On Fri, 9 Aug 2019, 21:25 Adrian Klaver, <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

On 8/9/19 8:14 AM, Shital A wrote:

Hello,

4) What techniques have you tried?
Insert into with With statement, inserting 2000000 rows at a

time. This

takes 40 mins.

To add to my previous post. If you already have data in a Postgres
database then you could do:

pg_dump -d db -t some_table -a -f test_data.sql

That will dump the data only for the table in COPY format. Then you
could apply that to your test database(after TRUNCATE on table,
assuming
you want to start fresh):

psql -d test_db -f test_data.sql

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

Thanks for the reply Adrian.

Missed one requirement. Will these methods generate wal logs needed for
replication?

For COPY, AFAIK yes.

To verify set up a small test table and COPY to it and sees if the data
shows up on the standby.

pg_bulkload:

https://ossc-db.github.io/pg_bulkload/pg_bulkload.html

"IMPORTANT NOTE: Under streaming replication environment, pg_bulkload
does not work properly. See here for details.

Actually the data is to check if replication catches up. Below is scenario :

1. Have a master slave cluster with replication setup

2. Kill master so that standby takes over. We are using pacemaker for
auto failure.
Insert 1 GB data in new master while replication is broken.

3 Start oldnode as standby and check if 1GB data gets replicated.

As such testing might be frequent we needed to spend minimum time in
generating data.
Master slave are in same network.

Thanks !

--
Adrian Klaver
adrian.klaver@aklaver.com