pgbench: Skipping the creating primary keys after initialization
Hi all,
I'd like to propose a new option -I for pgbench command which skips
the creating primary keys after initialized tables. This option is
useful for users who want to do bench marking with no index or indexes
other than btree primary index. If we initialize pgbench tables at a
large number scale factor the primary key index creation takes a long
time even if we're going to use other types of indexes. With this
option, the initialization time is reduced and you can create indexes
as you want.
Feedback is very welcome. I'll add this patch to the next CF.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
skip_building_pkeys_after_initialized.patchapplication/octet-stream; name=skip_building_pkeys_after_initialized.patchDownload+35-17
On Tue, Aug 1, 2017 at 9:49 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I'd like to propose a new option -I for pgbench command which skips
the creating primary keys after initialized tables. This option is
useful for users who want to do bench marking with no index or indexes
other than btree primary index. If we initialize pgbench tables at a
large number scale factor the primary key index creation takes a long
time even if we're going to use other types of indexes. With this
option, the initialization time is reduced and you can create indexes
as you want.Feedback is very welcome. I'll add this patch to the next CF.
I support adding an option for this, but I propose that we just make
it a long-form option, similar to --log-prefix or --index-tablespace.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 2, 2017 at 10:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 1, 2017 at 9:49 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I'd like to propose a new option -I for pgbench command which skips
the creating primary keys after initialized tables. This option is
useful for users who want to do bench marking with no index or indexes
other than btree primary index. If we initialize pgbench tables at a
large number scale factor the primary key index creation takes a long
time even if we're going to use other types of indexes. With this
option, the initialization time is reduced and you can create indexes
as you want.Feedback is very welcome. I'll add this patch to the next CF.
I support adding an option for this, but I propose that we just make
it a long-form option, similar to --log-prefix or --index-tablespace.
Yeah, that's better. I'll update the patch.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Aug 1, 2017 at 9:49 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I'd like to propose a new option -I for pgbench command which skips
the creating primary keys after initialized tables.
I support adding an option for this, but I propose that we just make
it a long-form option, similar to --log-prefix or --index-tablespace.
I think we could probably do without this ... if you want a non-default
test setup, why do you need to use "pgbench -i" to create it?
It's not that there's anything greatly wrong with this particular idea,
it's just that pgbench has too many switches already, and omitting random
subsets of the initialization actions doesn't seem like it contributes
fundamental new benchmarking capability.
I could get behind a proposal that generalized pgbench's "-i" behavior
in some meaningful way. I wonder whether it would be possible to convert
that behavior into a script. Some of what it does is just SQL commands
with injected parameters, which pgbench does already. There's also
data-loading actions, which could be converted to backslash commands
perhaps. Then desires like this could be addressed by invoking a
customized script instead of complicating pgbench's option set.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I think we could probably do without this ... if you want a non-default
test setup, why do you need to use "pgbench -i" to create it?It's not that there's anything greatly wrong with this particular idea,
it's just that pgbench has too many switches already, and omitting random
subsets of the initialization actions doesn't seem like it contributes
fundamental new benchmarking capability.I could get behind a proposal that generalized pgbench's "-i" behavior
in some meaningful way. I wonder whether it would be possible to convert
that behavior into a script. Some of what it does is just SQL commands
with injected parameters, which pgbench does already. There's also
data-loading actions, which could be converted to backslash commands
perhaps. Then desires like this could be addressed by invoking a
customized script instead of complicating pgbench's option set.
+1.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 2, 2017 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Aug 1, 2017 at 9:49 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I'd like to propose a new option -I for pgbench command which skips
the creating primary keys after initialized tables.I support adding an option for this, but I propose that we just make
it a long-form option, similar to --log-prefix or --index-tablespace.I think we could probably do without this ... if you want a non-default
test setup, why do you need to use "pgbench -i" to create it?It's not that there's anything greatly wrong with this particular idea,
it's just that pgbench has too many switches already, and omitting random
subsets of the initialization actions doesn't seem like it contributes
fundamental new benchmarking capability.I could get behind a proposal that generalized pgbench's "-i" behavior
in some meaningful way. I wonder whether it would be possible to convert
that behavior into a script. Some of what it does is just SQL commands
with injected parameters, which pgbench does already. There's also
data-loading actions, which could be converted to backslash commands
perhaps. Then desires like this could be addressed by invoking a
customized script instead of complicating pgbench's option set.
I've actually wanted this exact thing multiple times: most recently,
to make a non-unique btree index instead of a unique one, and to make
a hash index instead of a btree one. I don't object to a modest
effort at coming up with a more general mechanism here, but I also
think the switch as proposed is something that would have met my real
needs on multiple occasions. I've probably had 10 different occasions
when I wanted all of the standard pgbench initialization *except for*
something different about the indexes.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
I've actually wanted this exact thing multiple times: most recently,
to make a non-unique btree index instead of a unique one, and to make
a hash index instead of a btree one. I don't object to a modest
effort at coming up with a more general mechanism here, but I also
think the switch as proposed is something that would have met my real
needs on multiple occasions. I've probably had 10 different occasions
when I wanted all of the standard pgbench initialization *except for*
something different about the indexes.
Sure, but "no indexes at all" is hardly ever the real goal, is it?
So the switch as proposed is only solving part of your problem.
I'd rather see a solution that addresses a larger range of desires.
Or in other words, this looks to me quite a bit like the hackery
that resulted in pgbench's -S and -N options, before we figured out
that making it scriptable was a better answer.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 2, 2017 at 10:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Sure, but "no indexes at all" is hardly ever the real goal, is it?
Right.
So the switch as proposed is only solving part of your problem.
I'd rather see a solution that addresses a larger range of desires.
That's reasonable.
Or in other words, this looks to me quite a bit like the hackery
that resulted in pgbench's -S and -N options, before we figured out
that making it scriptable was a better answer.
But it's not very clear to me how we could make this case scriptable,
and it would probably not be much different from just using the
proposed option and then running the script afterwards yourself via
psql. The thing about -N and -S is that those scripts are being run
repeatedly, so pgbench has to be involved. If you just want to create
different/extra indexes, you can do that yourself.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Aug 2, 2017 at 10:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Or in other words, this looks to me quite a bit like the hackery
that resulted in pgbench's -S and -N options, before we figured out
that making it scriptable was a better answer.
But it's not very clear to me how we could make this case scriptable,
Well, I'm imagining that "-i" would essentially become a short form
of "-b initialize", as already happened for -S and -N, where the script
looks something like
drop table if exists pgbench_branches;
create table pgbench_branches (
bid int not null,bbalance int,filler char(88)
);
\load_data pgbench_branches [ other parameters to-be-determined ]
alter table pgbench_branches add primary key (bid);
... repeat for other tables ...
and we'd document that the same way we do for the existing built-in
scripts. Then, if there's something you don't like about it, you
just paste the script into a file and edit to taste.
I'm sure there's complexities that would only become apparent when
someone tries to write the patch, but that seems to me like a better
foundation for this class of desires than extending the option set
with various one-off options having no discernible architecture.
If you just want to create
different/extra indexes, you can do that yourself.
Sure, but there's no end to the number of small variations on this
theme that somebody might want. For example, we realized years ago
that the "filler" fields as-implemented don't really meet the intent
of the TPC-B spec (cf comment in the init() function). If someone
comes along with a patch adding a "--really-tpc-b" option to change
the table declarations and/or data loading code to fix that, will we
take that patch? What about one that wants all the id fields (not
just accounts.aid) to be bigint, or one that wants the balance fields
to be numeric?
You can say "let 'em set up the tables manually if they want that",
but I don't see why a nonstandard set of indexes is much different.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 2, 2017 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Aug 2, 2017 at 10:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Or in other words, this looks to me quite a bit like the hackery
that resulted in pgbench's -S and -N options, before we figured out
that making it scriptable was a better answer.But it's not very clear to me how we could make this case scriptable,
Well, I'm imagining that "-i" would essentially become a short form
of "-b initialize", as already happened for -S and -N, where the script
looks something likedrop table if exists pgbench_branches;
create table pgbench_branches (
bid int not null,bbalance int,filler char(88)
);
\load_data pgbench_branches [ other parameters to-be-determined ]
alter table pgbench_branches add primary key (bid);
... repeat for other tables ...and we'd document that the same way we do for the existing built-in
scripts. Then, if there's something you don't like about it, you
just paste the script into a file and edit to taste.
I imagine that would be useful for some use cases, but it's a heck of
a lot more work than just writing --no-indexes-please.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Aug 2, 2017 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Well, I'm imagining that "-i" would essentially become a short form
of "-b initialize", as already happened for -S and -N, where the script
looks something like ...
I imagine that would be useful for some use cases, but it's a heck of
a lot more work than just writing --no-indexes-please.
Of course. It's also a heck of a lot more flexible. Adding on another
ad-hoc option that does the minimum possible amount of work needed to
address one specific problem is always going to be less work; but after
we repeat that process five or ten times, we're going to have a mess.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 2, 2017 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Of course. It's also a heck of a lot more flexible. Adding on another
ad-hoc option that does the minimum possible amount of work needed to
address one specific problem is always going to be less work; but after
we repeat that process five or ten times, we're going to have a mess.
Well, I still like Masahiko-san's proposal, but I'm not prepared to
keep arguing about it right now. Maybe some other people will weigh
in with an opinion.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 2, 2017 at 5:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Aug 2, 2017 at 10:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Or in other words, this looks to me quite a bit like the hackery
that resulted in pgbench's -S and -N options, before we figured out
that making it scriptable was a better answer.But it's not very clear to me how we could make this case scriptable,
Well, I'm imagining that "-i" would essentially become a short form
of "-b initialize", as already happened for -S and -N, where the script
looks something like
Yes, I would imagine a facility where one could do pgbench $script and
issue a complete test set. Here is for example a funky idea: let's
separate each script with a set of meta-commands, \init being what is
used just for initialization, and then use \script to define a set of
commands with a custom weight. Say:
\init
CREATE TABLE foo(a int);
\script select_query [weight N]
SELECT count(*) FROM foo;
\script insert_query [weight N]
INSERT INTO foo VALUES ('1');
That may be over-engineering things, but personally I don't like much
having just a switch to remove indexes. Next time we will come with
another option that only selects a portion of the indexes created.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 3, 2017 at 2:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 2, 2017 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Of course. It's also a heck of a lot more flexible. Adding on another
ad-hoc option that does the minimum possible amount of work needed to
address one specific problem is always going to be less work; but after
we repeat that process five or ten times, we're going to have a mess.Well, I still like Masahiko-san's proposal, but I'm not prepared to
keep arguing about it right now. Maybe some other people will weigh
in with an opinion.
My motivation of this proposal is same as what Robert has. I
understand that ad-hoc option can solve only the part of big problem
and it could be cause of mess. However It seems me that the script
especially for table initialization will not be flexible than we
expected. I mean, even if we provide some meta commands for table
initialization or data loading, these meta commands work for only
pgbench tables (i.g., pgbench_accounts, pgbench_branches and so on).
If we want to create other tables and load data to them as we want we
can do that using psql -f. So an alternative ways is having a flexible
style option for example --custom-initialize = { [load, create_pkey,
create_fkey, vacuum], ... }. That would solve this in a better way.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
My motivation of this proposal is same as what Robert has. I
understand that ad-hoc option can solve only the part of big problem
and it could be cause of mess. However It seems me that the script
especially for table initialization will not be flexible than we
expected. I mean, even if we provide some meta commands for table
initialization or data loading, these meta commands work for only
pgbench tables (i.g., pgbench_accounts, pgbench_branches and so on).
If we want to create other tables and load data to them as we want we
can do that using psql -f. So an alternative ways is having a flexible
style option for example --custom-initialize = { [load, create_pkey,
create_fkey, vacuum], ... }. That would solve this in a better way.
Personnaly, I could be fine with a limited number of long options to
adjust pgbench initialization to various needs, eg --use-hash-index,
--skip-whetever-index, etc.
The flexible --custom-init idea outlined above looks nice as well.
As for a more generic solution, the easy part are the "CREATE" stuff and
the transaction script stuff (existing pgbench scripts).
For the CREATE stuff, the script language is SQL, the command to use it is
"psql"...
The real and hard part is to fill tables with meaningful pseudo-random
test data which do not violate constraints for any non trivial schema
involving foreign keys and various unique constraints.
The solution for this is SQL for trivial cases, think of:
"INSERT INTO Foo() SELECT ... FROM generate_series(...);"
For instance the pgbench initialization is really close to:
psql -Dscale=10 <<EOF
CREATE TABLE ... ;
INSERT INTO pgbench_account(...)
SELECT ... FROM generate_series(1, 100000 * :scale) AS i;
INSERT ...
CREATE INDEX ...;
VACUUM FULL ANALYZE;
EOF
And all existing options could probably be implemented easilly with the
recently added conditional (\if).
So my 0.02ᅵ is that if something is to be done, I would suggest to turn
the creation and initialization stuff into a standard "psql" script that
could be called from pgbench instead of integrating much more ad-hoc stuff
into pgbench.
Note that non trivial schema initialization requires more general
programming, so I do not believe in doing a lot at pgbench or psql levels.
The best I could come with is a data generator which takes as input the
schema with added directives on how to generate the various attributes
(tool named "datafiller", that some people use:-).
--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Fabien COELHO <coelho@cri.ensmp.fr> writes:
As for a more generic solution, the easy part are the "CREATE" stuff and
the transaction script stuff (existing pgbench scripts).
For the CREATE stuff, the script language is SQL, the command to use it is
"psql"...
The real and hard part is to fill tables with meaningful pseudo-random
test data which do not violate constraints for any non trivial schema
involving foreign keys and various unique constraints.
The solution for this is SQL for trivial cases, think of:
"INSERT INTO Foo() SELECT ... FROM generate_series(...);"
Yeah. I was also thinking that complicated data-generation requirements
could be handled with plpgsql DO blocks, avoiding the need for hard-wired
code inside pgbench.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Masahiko Sawada <sawada.mshk@gmail.com> writes:
If we want to create other tables and load data to them as we want we
can do that using psql -f. So an alternative ways is having a flexible
style option for example --custom-initialize = { [load, create_pkey,
create_fkey, vacuum], ... }. That would solve this in a better way.
FWIW, I like that significantly better than your original proposal.
It'd allow people to execute parts of pgbench's standard initialization
sequence and then do other things in between (in psql). Realistically,
that's probably about as much win as we need here --- if you're veering
far enough away from the standard scenario that that doesn't do it for
you, you might as well just write an all-custom setup script in psql.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
For the CREATE stuff, the script language is SQL, the command to use it is
"psql"...The real and hard part is to fill tables with meaningful pseudo-random
test data which do not violate constraints for any non trivial schema
involving foreign keys and various unique constraints.The solution for this is SQL for trivial cases, think of:
"INSERT INTO Foo() SELECT ... FROM generate_series(...);"Yeah. I was also thinking that complicated data-generation requirements
could be handled with plpgsql DO blocks, avoiding the need for hard-wired
code inside pgbench.
I do not thing that it is really be needed for what pgbench does, though.
See attached attempt, including a no_foreign_keys option.
The only tricky thing is to have the elapsed/remaining advancement report
on stdout, maybe with some PL/pgSQL.
Timings are very similar compared to "pgbench -i".
--
Fabien.
Attachments:
On Fri, Aug 4, 2017 at 3:24 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
For the CREATE stuff, the script language is SQL, the command to use it
is
"psql"...The real and hard part is to fill tables with meaningful pseudo-random
test data which do not violate constraints for any non trivial schema
involving foreign keys and various unique constraints.The solution for this is SQL for trivial cases, think of:
"INSERT INTO Foo() SELECT ... FROM generate_series(...);"Yeah. I was also thinking that complicated data-generation requirements
could be handled with plpgsql DO blocks, avoiding the need for hard-wired
code inside pgbench.I do not thing that it is really be needed for what pgbench does, though.
See attached attempt, including a no_foreign_keys option.The only tricky thing is to have the elapsed/remaining advancement report on
stdout, maybe with some PL/pgSQL.Timings are very similar compared to "pgbench -i".
The generating data with plpgsql DO blocks means that we do the
data-generation on sever side rather than on client side. I think it's
preferable in a sense because could speed up initialization time by
reducing the network traffic.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 3, 2017 at 11:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Masahiko Sawada <sawada.mshk@gmail.com> writes:
If we want to create other tables and load data to them as we want we
can do that using psql -f. So an alternative ways is having a flexible
style option for example --custom-initialize = { [load, create_pkey,
create_fkey, vacuum], ... }. That would solve this in a better way.FWIW, I like that significantly better than your original proposal.
It'd allow people to execute parts of pgbench's standard initialization
sequence and then do other things in between (in psql). Realistically,
that's probably about as much win as we need here --- if you're veering
far enough away from the standard scenario that that doesn't do it for
you, you might as well just write an all-custom setup script in psql.
Attached patch introduces --custom-initialize option that allows us to
specify the initialization step and its order. For example, If you
want to skip building primary keys you can specify
--custom-initialize="create_table, load_data, vacuum". Since each
custom initialization commands is invoked in specified order, for
example we also can create primary keys *before* loading data. The
data-generation is doing on client side, so progress information for
initialization is still supported.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center