Patch to fix search_path defencies with pg_bench
Hello,
I have been doing some testing with pgbench and I realized that it
forces the use of public as its search_path. This is bad if:
* You want to run multiple pgbench instances within the same database
* You don't want to use public (for whatever reason)
This patch removes that ability and thus will defer to the default
search_path for the connecting user.
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index ad20cac..1f25921 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -357,8 +357,6 @@ doConnect(void)
return NULL;
}
- executeStatement(conn, "SET search_path = public");
-
return conn;
}
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
"Joshua D. Drake" <jd@commandprompt.com> writes:
I have been doing some testing with pgbench and I realized that it
forces the use of public as its search_path. This is bad if:
* You want to run multiple pgbench instances within the same database
* You don't want to use public (for whatever reason)
This patch removes that ability and thus will defer to the default
search_path for the connecting user.
Hmm. The search_path setting seems to have been added here
http://archives.postgresql.org/pgsql-committers/2002-10/msg00118.php
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/contrib/pgbench/pgbench.c.diff?r1=1.20;r2=1.21
as part of a mass patch to make everything in contrib work in the public
schema. I agree that it probably wasn't considered carefully whether
pg_bench should do that; but does anyone see a reason not to change it?
regards, tom lane
On Tue, 5 May 2009, Tom Lane wrote:
I agree that it probably wasn't considered carefully whether pg_bench
should do that; but does anyone see a reason not to change it?
I thought of one pretty weak use-case for not making this change, but
would wager the additional flexibility here is far more likely to be
appreciated. I'd say it's a clear net improvement.
As for that case...many good database designs put all the user relations
into a schema, so that it's easier to do bulk operations on all of them
while avoiding catalog tables etc.--less work to filter out pg_class to
find them for example.
I once did some pgbench testing on a system that included a real
"accounts" table in a named schema. "pgbench -i" will execute "drop table
if exists accounts". It had already accidentally wiped out the copy of
the accounts table on the system during an earlier test, before the schema
policy was in place, leaving everyone wary of it.
I was able to defend the risk for running pgbench with the new schema
layout by saying "that can only execute against public.accounts no matter
what the user search_path is, so you're safe now". That made everybody
happy. Anyone counting on such behavior could be rudely surprised at this
change. For all I know I'm the only person to ever actually run into that
particular situation though.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith <gsmith@gregsmith.com> writes:
I once did some pgbench testing on a system that included a real
"accounts" table in a named schema. "pgbench -i" will execute "drop table
if exists accounts". It had already accidentally wiped out the copy of
the accounts table on the system during an earlier test, before the schema
policy was in place, leaving everyone wary of it.
Seems like the right policy for that is "run pgbench in its own
database". I doubt that either adding or removing the "set search_path"
command changes the risk of trouble very much.
regards, tom lane
Em Qua, 2009-05-06 às 09:37 -0400, Tom Lane escreveu:
Greg Smith <gsmith@gregsmith.com> writes:
I once did some pgbench testing on a system that included a real
"accounts" table in a named schema. "pgbench -i" will execute "drop table
if exists accounts". It had already accidentally wiped out the copy of
the accounts table on the system during an earlier test, before the schema
policy was in place, leaving everyone wary of it.Seems like the right policy for that is "run pgbench in its own
database".
A text warning about this could be shown at start of pgbench if the
target database isn't named "pgbench", for examplo, or just some text
could be added to the docs.
regards.
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br
Dickson S. Guedes wrote:
Em Qua, 2009-05-06 �s 09:37 -0400, Tom Lane escreveu:
Seems like the right policy for that is "run pgbench in its own
database".A text warning about this could be shown at start of pgbench if the
target database isn't named "pgbench", for examplo, or just some text
could be added to the docs.
I think it would be better that the schema is specified on the command
line.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
"Dickson S. Guedes" <listas@guedesoft.net> writes:
Em Qua, 2009-05-06 �s 09:37 -0400, Tom Lane escreveu:
Seems like the right policy for that is "run pgbench in its own
database".
A text warning about this could be shown at start of pgbench if the
target database isn't named "pgbench", for examplo, or just some text
could be added to the docs.
There already is a prominent warning in the pgbench docs:
Caution
pgbench -i creates four tables accounts, branches, history, and
tellers, destroying any existing tables of these names. Be very
careful to use another database if you have tables having these
names!
regards, tom lane
On Wed, 2009-05-06 at 15:13 -0400, Alvaro Herrera wrote:
Dickson S. Guedes wrote:
Em Qua, 2009-05-06 às 09:37 -0400, Tom Lane escreveu:
Seems like the right policy for that is "run pgbench in its own
database".A text warning about this could be shown at start of pgbench if the
target database isn't named "pgbench", for examplo, or just some text
could be added to the docs.I think it would be better that the schema is specified on the command
line.
I could see that as an option but applications that use a role should
adhere to the rules the DBA sets forth for that role. In this particular
case I explicitly said that role bench01 was to connect to the database
bench and that his search path was bench01 (thus all tables would be
created under the schema bench01). Public should never come into play in
that scenario.
Sincerely,
Joshua D. Drake
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
Alvaro Herrera <alvherre@commandprompt.com> writes:
I think it would be better that the schema is specified on the command
line.
Surely that's more work than the issue is worth. It's also inconvenient
to use, because you'd have to remember to give the switch both for the
-i run and the normal test runs.
regards, tom lane
Em Qua, 2009-05-06 às 16:27 -0400, Tom Lane escreveu:
Alvaro Herrera <alvherre@commandprompt.com> writes:
I think it would be better that the schema is specified on the command
line.Surely that's more work than the issue is worth. It's also inconvenient
to use, because you'd have to remember to give the switch both for the
-i run and the normal test runs.
So, in my opinion, the Joshua alternative is a good little change that
let "pgbench" runs in a more flexible way.
But, there is the possibility that someone are using an automated script
that could be broken by this change?
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br
On Wed, 2009-05-06 at 17:42 -0300, Dickson S. Guedes wrote:
Em Qua, 2009-05-06 às 16:27 -0400, Tom Lane escreveu:
Alvaro Herrera <alvherre@commandprompt.com> writes:
I think it would be better that the schema is specified on the command
line.Surely that's more work than the issue is worth. It's also inconvenient
to use, because you'd have to remember to give the switch both for the
-i run and the normal test runs.So, in my opinion, the Joshua alternative is a good little change that
let "pgbench" runs in a more flexible way.But, there is the possibility that someone are using an automated script
that could be broken by this change?
Only if the role pgbench is using as an explicit search_path set.
Sincerely,
Joshua D. Drake
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
Em Qua, 2009-05-06 às 13:49 -0700, Joshua D. Drake escreveu:
On Wed, 2009-05-06 at 17:42 -0300, Dickson S. Guedes wrote:
Em Qua, 2009-05-06 às 16:27 -0400, Tom Lane escreveu:
Alvaro Herrera <alvherre@commandprompt.com> writes:
I think it would be better that the schema is specified on the command
line.Surely that's more work than the issue is worth. It's also inconvenient
to use, because you'd have to remember to give the switch both for the
-i run and the normal test runs.So, in my opinion, the Joshua alternative is a good little change that
let "pgbench" runs in a more flexible way.But, there is the possibility that someone are using an automated script
that could be broken by this change?Only if the role pgbench is using as an explicit search_path set.
So, in a way to avoid the scenario where a ROLE has an explicit
search_path set to schemes that already have tables named same as the
pgbench's tables, doesn't makes sense also create a "pgbench_" suffix
for them?
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br
"Joshua D. Drake" <jd@commandprompt.com> writes:
On Wed, 2009-05-06 at 17:42 -0300, Dickson S. Guedes wrote:
But, there is the possibility that someone are using an automated script
that could be broken by this change?
Only if the role pgbench is using as an explicit search_path set.
Even then, it's not a problem from the point of view of pgbench ---
the tables will still get created and used correctly. The only problem
shows up if someone is ignoring the existing warning in the docs and
running pgbench in a database that has application tables named accounts
&etc. If you're doing that you're at considerable risk anyway, no
matter *what* we do or don't do with pgbench's search path.
regards, tom lane
"Dickson S. Guedes" <listas@guedesoft.net> writes:
So, in a way to avoid the scenario where a ROLE has an explicit
search_path set to schemes that already have tables named same as the
pgbench's tables, doesn't makes sense also create a "pgbench_" suffix
for them?
Hm, just rename the standard scenario's tables to pgbench_accounts
etc? Sure, but then we break custom pgbench scripts that happen
to be using the default tables for their own purposes. There's
no free lunch.
regards, tom lane
On Wed, 2009-05-06 at 15:18 -0400, Tom Lane wrote:
"Dickson S. Guedes" <listas@guedesoft.net> writes:
Em Qua, 2009-05-06 s 09:37 -0400, Tom Lane escreveu:
Seems like the right policy for that is "run pgbench in its own
database".A text warning about this could be shown at start of pgbench if the
target database isn't named "pgbench", for examplo, or just some text
could be added to the docs.There already is a prominent warning in the pgbench docs:
Caution
pgbench -i creates four tables accounts, branches, history, and
tellers, destroying any existing tables of these names. Be very
careful to use another database if you have tables having these
names!
Holy Handgrenade, what a huge footgun! It doesn't even have a
conceivable upside.
The table names "accounts" and "history" are fairly common and a caution
isn't a sufficient safeguard on production data. We know the manual
rarely gets read *after* a problem, let alone beforehand.
We should check they are the correct tables before we just drop them.
Perhaps check for the comment "Tables for pgbench application. Not
production data" on the tables, which would be nice to add anyway.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Thu, May 7, 2009 at 10:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, 2009-05-06 at 15:18 -0400, Tom Lane wrote:
"Dickson S. Guedes" <listas@guedesoft.net> writes:
Em Qua, 2009-05-06 s 09:37 -0400, Tom Lane escreveu:
Seems like the right policy for that is "run pgbench in its own
database".A text warning about this could be shown at start of pgbench if the
target database isn't named "pgbench", for examplo, or just some text
could be added to the docs.There already is a prominent warning in the pgbench docs:
Caution
pgbench -i creates four tables accounts, branches, history, and
tellers, destroying any existing tables of these names. Be very
careful to use another database if you have tables having these
names!Holy Handgrenade, what a huge footgun! It doesn't even have a
conceivable upside.The table names "accounts" and "history" are fairly common and a caution
isn't a sufficient safeguard on production data. We know the manual
rarely gets read *after* a problem, let alone beforehand.We should check they are the correct tables before we just drop them.
Perhaps check for the comment "Tables for pgbench application. Not
production data" on the tables, which would be nice to add anyway.
I bet it would be just as good and a lot simpler to do what someone
suggested upthread, namely s/^/pgbench_/
...Robert
On Thu, 2009-05-07 at 11:14 -0400, Robert Haas wrote:
We should check they are the correct tables before we just drop them.
Perhaps check for the comment "Tables for pgbench application. Not
production data" on the tables, which would be nice to add anyway.I bet it would be just as good and a lot simpler to do what someone
suggested upthread, namely s/^/pgbench_/
Running pgbench has become more popular now, with various people
recommending running it on every system to test performance. I don't
disagree with that recommendation, but I've had problems myself with a
similar issue - hence earlier patch to prevent pgbench running a
complete database VACUUM before every test run.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
* Robert Haas <robertmhaas@gmail.com> [090507 11:15]:
I bet it would be just as good and a lot simpler to do what someone
suggested upthread, namely s/^/pgbench_/
That has the "legacy compatibility" problem...
But seeing as "legacy" has a:
SET search_path TO public;
And uses plain <table> in it's queries/creates/drops, couldn't we just
make "new" pgbench refer to tables as <schema>.<table> where <schema> is
"public"? If we leave "schema" as public, and leave in the search_path,
we should be identical to what we currently have, except we've
explicliyt scoped was was searched for before.
And it leads to an easy way for people to change public (in the
search path and/or <schema>.<table>) to do other things (although I'm
not saying that's necessarily required or desired either).
a.
--
Aidan Van Dyk Create like a god,
aidan@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.
Simon Riggs <simon@2ndQuadrant.com> writes:
On Thu, 2009-05-07 at 11:14 -0400, Robert Haas wrote:
We should check they are the correct tables before we just drop them.
Perhaps check for the comment "Tables for pgbench application. Not
production data" on the tables, which would be nice to add anyway.I bet it would be just as good and a lot simpler to do what someone
suggested upthread, namely s/^/pgbench_/
Running pgbench has become more popular now, with various people
recommending running it on every system to test performance. I don't
disagree with that recommendation, but I've had problems myself with a
similar issue - hence earlier patch to prevent pgbench running a
complete database VACUUM before every test run.
Well, pgbench has been coded this way since forever and we've only seen
this one report of trouble. Still, I can't object very hard to renaming
the tables to pgbench_accounts etc --- it's a trivial change and the
only thing it could break is custom pgbench scenarios that rely on the
default scenario's tables, which there are probably not many of.
So do we have consensus on dropping the "SET search_path" and renaming
the tables?
regards, tom lane
On Thu, 2009-05-07 at 12:47 -0400, Tom Lane wrote:
Well, pgbench has been coded this way since forever and we've only seen
this one report of trouble. Still, I can't object very hard to renaming
the tables to pgbench_accounts etc --- it's a trivial change and the
only thing it could break is custom pgbench scenarios that rely on the
default scenario's tables, which there are probably not many of.So do we have consensus on dropping the "SET search_path" and renaming
the tables?
+1 (I hate prefixed table names but I get the idea)
Joshua D. Drake
regards, tom lane
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997