Patch to fix search_path defencies with pg_bench

Started by Joshua D. Drakealmost 17 years ago31 messageshackers
Jump to latest
#1Joshua D. Drake
jd@commandprompt.com

Hello,

I have been doing some testing with pgbench and I realized that it
forces the use of public as its search_path. This is bad if:

* You want to run multiple pgbench instances within the same database
* You don't want to use public (for whatever reason)

This patch removes that ability and thus will defer to the default
search_path for the connecting user.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index ad20cac..1f25921 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -357,8 +357,6 @@ doConnect(void)
                return NULL;
        }

- executeStatement(conn, "SET search_path = public");
-
return conn;
}

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#1)
Re: Patch to fix search_path defencies with pg_bench

"Joshua D. Drake" <jd@commandprompt.com> writes:

I have been doing some testing with pgbench and I realized that it
forces the use of public as its search_path. This is bad if:

* You want to run multiple pgbench instances within the same database
* You don't want to use public (for whatever reason)

This patch removes that ability and thus will defer to the default
search_path for the connecting user.

Hmm. The search_path setting seems to have been added here
http://archives.postgresql.org/pgsql-committers/2002-10/msg00118.php
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/contrib/pgbench/pgbench.c.diff?r1=1.20;r2=1.21

as part of a mass patch to make everything in contrib work in the public
schema. I agree that it probably wasn't considered carefully whether
pg_bench should do that; but does anyone see a reason not to change it?

regards, tom lane

#3Greg Smith
gsmith@gregsmith.com
In reply to: Tom Lane (#2)
Re: Patch to fix search_path defencies with pg_bench

On Tue, 5 May 2009, Tom Lane wrote:

I agree that it probably wasn't considered carefully whether pg_bench
should do that; but does anyone see a reason not to change it?

I thought of one pretty weak use-case for not making this change, but
would wager the additional flexibility here is far more likely to be
appreciated. I'd say it's a clear net improvement.

As for that case...many good database designs put all the user relations
into a schema, so that it's easier to do bulk operations on all of them
while avoiding catalog tables etc.--less work to filter out pg_class to
find them for example.

I once did some pgbench testing on a system that included a real
"accounts" table in a named schema. "pgbench -i" will execute "drop table
if exists accounts". It had already accidentally wiped out the copy of
the accounts table on the system during an earlier test, before the schema
policy was in place, leaving everyone wary of it.

I was able to defend the risk for running pgbench with the new schema
layout by saying "that can only execute against public.accounts no matter
what the user search_path is, so you're safe now". That made everybody
happy. Anyone counting on such behavior could be rudely surprised at this
change. For all I know I'm the only person to ever actually run into that
particular situation though.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Smith (#3)
Re: Patch to fix search_path defencies with pg_bench

Greg Smith <gsmith@gregsmith.com> writes:

I once did some pgbench testing on a system that included a real
"accounts" table in a named schema. "pgbench -i" will execute "drop table
if exists accounts". It had already accidentally wiped out the copy of
the accounts table on the system during an earlier test, before the schema
policy was in place, leaving everyone wary of it.

Seems like the right policy for that is "run pgbench in its own
database". I doubt that either adding or removing the "set search_path"
command changes the risk of trouble very much.

regards, tom lane

#5Dickson S. Guedes
listas@guedesoft.net
In reply to: Tom Lane (#4)
Re: Patch to fix search_path defencies with pg_bench

Em Qua, 2009-05-06 às 09:37 -0400, Tom Lane escreveu:

Greg Smith <gsmith@gregsmith.com> writes:

I once did some pgbench testing on a system that included a real
"accounts" table in a named schema. "pgbench -i" will execute "drop table
if exists accounts". It had already accidentally wiped out the copy of
the accounts table on the system during an earlier test, before the schema
policy was in place, leaving everyone wary of it.

Seems like the right policy for that is "run pgbench in its own
database".

A text warning about this could be shown at start of pgbench if the
target database isn't named "pgbench", for examplo, or just some text
could be added to the docs.

regards.
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dickson S. Guedes (#5)
Re: Patch to fix search_path defencies with pg_bench

Dickson S. Guedes wrote:

Em Qua, 2009-05-06 �s 09:37 -0400, Tom Lane escreveu:

Seems like the right policy for that is "run pgbench in its own
database".

A text warning about this could be shown at start of pgbench if the
target database isn't named "pgbench", for examplo, or just some text
could be added to the docs.

I think it would be better that the schema is specified on the command
line.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dickson S. Guedes (#5)
Re: Patch to fix search_path defencies with pg_bench

"Dickson S. Guedes" <listas@guedesoft.net> writes:

Em Qua, 2009-05-06 �s 09:37 -0400, Tom Lane escreveu:

Seems like the right policy for that is "run pgbench in its own
database".

A text warning about this could be shown at start of pgbench if the
target database isn't named "pgbench", for examplo, or just some text
could be added to the docs.

There already is a prominent warning in the pgbench docs:

Caution

pgbench -i creates four tables accounts, branches, history, and
tellers, destroying any existing tables of these names. Be very
careful to use another database if you have tables having these
names!

regards, tom lane

#8Joshua D. Drake
jd@commandprompt.com
In reply to: Alvaro Herrera (#6)
Re: Patch to fix search_path defencies with pg_bench

On Wed, 2009-05-06 at 15:13 -0400, Alvaro Herrera wrote:

Dickson S. Guedes wrote:

Em Qua, 2009-05-06 às 09:37 -0400, Tom Lane escreveu:

Seems like the right policy for that is "run pgbench in its own
database".

A text warning about this could be shown at start of pgbench if the
target database isn't named "pgbench", for examplo, or just some text
could be added to the docs.

I think it would be better that the schema is specified on the command
line.

I could see that as an option but applications that use a role should
adhere to the rules the DBA sets forth for that role. In this particular
case I explicitly said that role bench01 was to connect to the database
bench and that his search path was bench01 (thus all tables would be
created under the schema bench01). Public should never come into play in
that scenario.

Sincerely,

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: Patch to fix search_path defencies with pg_bench

Alvaro Herrera <alvherre@commandprompt.com> writes:

I think it would be better that the schema is specified on the command
line.

Surely that's more work than the issue is worth. It's also inconvenient
to use, because you'd have to remember to give the switch both for the
-i run and the normal test runs.

regards, tom lane

#10Dickson S. Guedes
listas@guedesoft.net
In reply to: Tom Lane (#9)
Re: Patch to fix search_path defencies with pg_bench

Em Qua, 2009-05-06 às 16:27 -0400, Tom Lane escreveu:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I think it would be better that the schema is specified on the command
line.

Surely that's more work than the issue is worth. It's also inconvenient
to use, because you'd have to remember to give the switch both for the
-i run and the normal test runs.

So, in my opinion, the Joshua alternative is a good little change that
let "pgbench" runs in a more flexible way.

But, there is the possibility that someone are using an automated script
that could be broken by this change?

--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br

#11Joshua D. Drake
jd@commandprompt.com
In reply to: Dickson S. Guedes (#10)
Re: Patch to fix search_path defencies with pg_bench

On Wed, 2009-05-06 at 17:42 -0300, Dickson S. Guedes wrote:

Em Qua, 2009-05-06 às 16:27 -0400, Tom Lane escreveu:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I think it would be better that the schema is specified on the command
line.

Surely that's more work than the issue is worth. It's also inconvenient
to use, because you'd have to remember to give the switch both for the
-i run and the normal test runs.

So, in my opinion, the Joshua alternative is a good little change that
let "pgbench" runs in a more flexible way.

But, there is the possibility that someone are using an automated script
that could be broken by this change?

Only if the role pgbench is using as an explicit search_path set.

Sincerely,

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

#12Dickson S. Guedes
listas@guedesoft.net
In reply to: Joshua D. Drake (#11)
Re: Patch to fix search_path defencies with pg_bench

Em Qua, 2009-05-06 às 13:49 -0700, Joshua D. Drake escreveu:

On Wed, 2009-05-06 at 17:42 -0300, Dickson S. Guedes wrote:

Em Qua, 2009-05-06 às 16:27 -0400, Tom Lane escreveu:

Alvaro Herrera <alvherre@commandprompt.com> writes:

I think it would be better that the schema is specified on the command
line.

Surely that's more work than the issue is worth. It's also inconvenient
to use, because you'd have to remember to give the switch both for the
-i run and the normal test runs.

So, in my opinion, the Joshua alternative is a good little change that
let "pgbench" runs in a more flexible way.

But, there is the possibility that someone are using an automated script
that could be broken by this change?

Only if the role pgbench is using as an explicit search_path set.

So, in a way to avoid the scenario where a ROLE has an explicit
search_path set to schemes that already have tables named same as the
pgbench's tables, doesn't makes sense also create a "pgbench_" suffix
for them?

--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://planeta.postgresql.org.br

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#11)
Re: Patch to fix search_path defencies with pg_bench

"Joshua D. Drake" <jd@commandprompt.com> writes:

On Wed, 2009-05-06 at 17:42 -0300, Dickson S. Guedes wrote:

But, there is the possibility that someone are using an automated script
that could be broken by this change?

Only if the role pgbench is using as an explicit search_path set.

Even then, it's not a problem from the point of view of pgbench ---
the tables will still get created and used correctly. The only problem
shows up if someone is ignoring the existing warning in the docs and
running pgbench in a database that has application tables named accounts
&etc. If you're doing that you're at considerable risk anyway, no
matter *what* we do or don't do with pgbench's search path.

regards, tom lane

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dickson S. Guedes (#12)
Re: Patch to fix search_path defencies with pg_bench

"Dickson S. Guedes" <listas@guedesoft.net> writes:

So, in a way to avoid the scenario where a ROLE has an explicit
search_path set to schemes that already have tables named same as the
pgbench's tables, doesn't makes sense also create a "pgbench_" suffix
for them?

Hm, just rename the standard scenario's tables to pgbench_accounts
etc? Sure, but then we break custom pgbench scripts that happen
to be using the default tables for their own purposes. There's
no free lunch.

regards, tom lane

#15Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#7)
Re: Patch to fix search_path defencies with pg_bench

On Wed, 2009-05-06 at 15:18 -0400, Tom Lane wrote:

"Dickson S. Guedes" <listas@guedesoft.net> writes:

Em Qua, 2009-05-06 s 09:37 -0400, Tom Lane escreveu:

Seems like the right policy for that is "run pgbench in its own
database".

A text warning about this could be shown at start of pgbench if the
target database isn't named "pgbench", for examplo, or just some text
could be added to the docs.

There already is a prominent warning in the pgbench docs:

Caution

pgbench -i creates four tables accounts, branches, history, and
tellers, destroying any existing tables of these names. Be very
careful to use another database if you have tables having these
names!

Holy Handgrenade, what a huge footgun! It doesn't even have a
conceivable upside.

The table names "accounts" and "history" are fairly common and a caution
isn't a sufficient safeguard on production data. We know the manual
rarely gets read *after* a problem, let alone beforehand.

We should check they are the correct tables before we just drop them.
Perhaps check for the comment "Tables for pgbench application. Not
production data" on the tables, which would be nice to add anyway.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#16Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#15)
Re: Patch to fix search_path defencies with pg_bench

On Thu, May 7, 2009 at 10:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, 2009-05-06 at 15:18 -0400, Tom Lane wrote:

"Dickson S. Guedes" <listas@guedesoft.net> writes:

Em Qua, 2009-05-06 s 09:37 -0400, Tom Lane escreveu:

Seems like the right policy for that is "run pgbench in its own
database".

A text warning about this could be shown at start of pgbench if the
target database isn't named "pgbench", for examplo, or just some text
could be added to the docs.

There already is a prominent warning in the pgbench docs:

              Caution

      pgbench -i creates four tables accounts, branches, history, and
      tellers, destroying any existing tables of these names. Be very
      careful to use another database if you have tables having these
      names!

Holy Handgrenade, what a huge footgun! It doesn't even have a
conceivable upside.

The table names "accounts" and "history" are fairly common and a caution
isn't a sufficient safeguard on production data. We know the manual
rarely gets read *after* a problem, let alone beforehand.

We should check they are the correct tables before we just drop them.
Perhaps check for the comment "Tables for pgbench application. Not
production data" on the tables, which would be nice to add anyway.

I bet it would be just as good and a lot simpler to do what someone
suggested upthread, namely s/^/pgbench_/

...Robert

#17Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#16)
Re: Patch to fix search_path defencies with pg_bench

On Thu, 2009-05-07 at 11:14 -0400, Robert Haas wrote:

We should check they are the correct tables before we just drop them.
Perhaps check for the comment "Tables for pgbench application. Not
production data" on the tables, which would be nice to add anyway.

I bet it would be just as good and a lot simpler to do what someone
suggested upthread, namely s/^/pgbench_/

Running pgbench has become more popular now, with various people
recommending running it on every system to test performance. I don't
disagree with that recommendation, but I've had problems myself with a
similar issue - hence earlier patch to prevent pgbench running a
complete database VACUUM before every test run.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#18Aidan Van Dyk
aidan@highrise.ca
In reply to: Robert Haas (#16)
Re: Patch to fix search_path defencies with pg_bench

* Robert Haas <robertmhaas@gmail.com> [090507 11:15]:

I bet it would be just as good and a lot simpler to do what someone
suggested upthread, namely s/^/pgbench_/

That has the "legacy compatibility" problem...

But seeing as "legacy" has a:
SET search_path TO public;

And uses plain <table> in it's queries/creates/drops, couldn't we just
make "new" pgbench refer to tables as <schema>.<table> where <schema> is
"public"? If we leave "schema" as public, and leave in the search_path,
we should be identical to what we currently have, except we've
explicliyt scoped was was searched for before.

And it leads to an easy way for people to change public (in the
search path and/or <schema>.<table>) to do other things (although I'm
not saying that's necessarily required or desired either).

a.

--
Aidan Van Dyk Create like a god,
aidan@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#17)
Re: Patch to fix search_path defencies with pg_bench

Simon Riggs <simon@2ndQuadrant.com> writes:

On Thu, 2009-05-07 at 11:14 -0400, Robert Haas wrote:

We should check they are the correct tables before we just drop them.
Perhaps check for the comment "Tables for pgbench application. Not
production data" on the tables, which would be nice to add anyway.

I bet it would be just as good and a lot simpler to do what someone
suggested upthread, namely s/^/pgbench_/

Running pgbench has become more popular now, with various people
recommending running it on every system to test performance. I don't
disagree with that recommendation, but I've had problems myself with a
similar issue - hence earlier patch to prevent pgbench running a
complete database VACUUM before every test run.

Well, pgbench has been coded this way since forever and we've only seen
this one report of trouble. Still, I can't object very hard to renaming
the tables to pgbench_accounts etc --- it's a trivial change and the
only thing it could break is custom pgbench scenarios that rely on the
default scenario's tables, which there are probably not many of.

So do we have consensus on dropping the "SET search_path" and renaming
the tables?

regards, tom lane

#20Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#19)
Re: Patch to fix search_path defencies with pg_bench

On Thu, 2009-05-07 at 12:47 -0400, Tom Lane wrote:

Well, pgbench has been coded this way since forever and we've only seen
this one report of trouble. Still, I can't object very hard to renaming
the tables to pgbench_accounts etc --- it's a trivial change and the
only thing it could break is custom pgbench scenarios that rely on the
default scenario's tables, which there are probably not many of.

So do we have consensus on dropping the "SET search_path" and renaming
the tables?

+1 (I hate prefixed table names but I get the idea)

Joshua D. Drake

regards, tom lane

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Aidan Van Dyk (#18)
#22Aidan Van Dyk
aidan@highrise.ca
In reply to: Tom Lane (#21)
#23Joshua D. Drake
jd@commandprompt.com
In reply to: Aidan Van Dyk (#22)
#24Aidan Van Dyk
aidan@highrise.ca
In reply to: Joshua D. Drake (#23)
#25Simon Riggs
simon@2ndQuadrant.com
In reply to: Joshua D. Drake (#20)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#23)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#1)
#28Greg Smith
gsmith@gregsmith.com
In reply to: Aidan Van Dyk (#22)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Smith (#28)
#30Greg Smith
gsmith@gregsmith.com
In reply to: Tom Lane (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Smith (#30)