plpgsql.consistent_into

Started by Marko Tiikkajaover 12 years ago45 messageshackers

marko@joh.to

over 12 years ago

Greetings fellow elephants,

I would humbly like to submit for your consideration my proposal for
alleviating pain caused by one of the most annoying footguns in
PL/PgSQL: the behaviour of SELECT .. INTO when the query returns more
than one row. Some of you might know that no exception is raised in
this case (as opposed to INSERT/UPDATE/DELETE .. INTO, all of them
yielding TOO_MANY_ROWS), which can hide subtle bugs in queries if during
testing the query always returns only one row or the "correct" one
happens to be picked up every time. Additionally, the row_count() after
execution is always going to be either 0 or 1, so even if you want to
explicitly guard against potentially broken queries, you can't do so!

So I added the following compile-time option:

set plpgsql.consistent_into to true;

create or replace function footest() returns void as $$
declare
x int;
begin
-- too many rows
select 1 from foo into x;
end$$ language plpgsql;

select footest();
ERROR: query returned more than one row

It defaults to false to preserve full backwards compatibility. Also
turning it on makes the executor try and find two rows, so it might have
an effect on performance as well. The patch, as currently written, also
changes the behaviour of EXECUTE .. INTO, but I don't feel strongly
about whether that should be affected as well or not.

Regards,
Marko Tiikkaja

Pavel Stehule

pavel.stehule@gmail.com

over 12 years ago

In reply to: Marko Tiikkaja (#1)

Re: plpgsql.consistent_into

Hello

2014/1/12 Marko Tiikkaja <marko@joh.to>

Greetings fellow elephants,

I would humbly like to submit for your consideration my proposal for
alleviating pain caused by one of the most annoying footguns in PL/PgSQL:
the behaviour of SELECT .. INTO when the query returns more than one row.
Some of you might know that no exception is raised in this case (as
opposed to INSERT/UPDATE/DELETE .. INTO, all of them yielding
TOO_MANY_ROWS), which can hide subtle bugs in queries if during testing the
query always returns only one row or the "correct" one happens to be picked
up every time. Additionally, the row_count() after execution is always
going to be either 0 or 1, so even if you want to explicitly guard against
potentially broken queries, you can't do so!

It is not bad and, sure, - it is very useful and important

but - it is a redundant to INTO STRICT clause. When you use it, then you
change a INTO behaviour. Is not better to ensure STRICT option than hidden
redefining INTO?

Option INTO (without STRICT clause) is not safe and we should to disallow.
I see a three states (not only two)

a) disallow INTO without STRICT (as preferred for new code)
b) implicit check after every INTO without STRICT
c) without check

these modes should be: "strict_required", "strict_default", "strict_legacy"

So I added the following compile-time option:

set plpgsql.consistent_into to true;

This name is not best (there is not clean with it a into should be
consistent)

Is question, if this functionality should be enabled by GUC to be used for
legacy code (as protection against some kind of hidden bugs)

This topic is interesting idea for me - some checks can be pushed to
plpgsql_check (as errors or warnings) too.

Generally I like proposed functionality, just I am not sure, so hidden
redefining INTO clause (to INTO STRICT) is what we want. We can do it (but
explicitly). I don't know any situation where INTO without STRICT is valid.
Introduction of STRICT option was wrong idea - and now is not way to back.

Regards

Pavel

Show quoted text

create or replace function footest() returns void as $$
declare
x int;
begin
-- too many rows
select 1 from foo into x;
end$$ language plpgsql;

select footest();
ERROR: query returned more than one row

It defaults to false to preserve full backwards compatibility. Also
turning it on makes the executor try and find two rows, so it might have an
effect on performance as well. The patch, as currently written, also
changes the behaviour of EXECUTE .. INTO, but I don't feel strongly about
whether that should be affected as well or not.

Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Marko Tiikkaja

marko@joh.to

over 12 years ago

In reply to: Pavel Stehule (#2)

Re: plpgsql.consistent_into

On 1/12/14, 7:47 AM, Pavel Stehule wrote:

2014/1/12 Marko Tiikkaja <marko@joh.to>

Greetings fellow elephants,

I would humbly like to submit for your consideration my proposal for
alleviating pain caused by one of the most annoying footguns in PL/PgSQL:
the behaviour of SELECT .. INTO when the query returns more than one row.
Some of you might know that no exception is raised in this case (as
opposed to INSERT/UPDATE/DELETE .. INTO, all of them yielding
TOO_MANY_ROWS), which can hide subtle bugs in queries if during testing the
query always returns only one row or the "correct" one happens to be picked
up every time. Additionally, the row_count() after execution is always
going to be either 0 or 1, so even if you want to explicitly guard against
potentially broken queries, you can't do so!

It is not bad and, sure, - it is very useful and important

but - it is a redundant to INTO STRICT clause. When you use it, then you
change a INTO behaviour. Is not better to ensure STRICT option than hidden
redefining INTO?

That only works if the query should never return 0 rows either. If you
want to allow for missing rows, STRICT is out of the question.

Option INTO (without STRICT clause) is not safe and we should to disallow.
I see a three states (not only two)

a) disallow INTO without STRICT (as preferred for new code)
b) implicit check after every INTO without STRICT
c) without check

these modes should be: "strict_required", "strict_default", "strict_legacy"

I can't get excited about this. Mostly because it doesn't solve the
problem I'm having.

It is important to be able to execute queries with INTO which might not
return a row. That's what FOUND is for.

So I added the following compile-time option:

set plpgsql.consistent_into to true;

This name is not best (there is not clean with it a into should be
consistent)

I agree, but I had to pick something. One of the three hard problems in
CS..

Is question, if this functionality should be enabled by GUC to be used for
legacy code (as protection against some kind of hidden bugs)

This topic is interesting idea for me - some checks can be pushed to
plpgsql_check (as errors or warnings) too.

Generally I like proposed functionality, just I am not sure, so hidden
redefining INTO clause (to INTO STRICT) is what we want. We can do it (but
explicitly). I don't know any situation where INTO without STRICT is valid.
Introduction of STRICT option was wrong idea - and now is not way to back.

Note that this is different from implicitly STRICTifying every INTO,
like I said above.

Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Pavel Stehule

pavel.stehule@gmail.com

over 12 years ago

In reply to: Marko Tiikkaja (#3)

Re: plpgsql.consistent_into

2014/1/12 Marko Tiikkaja <marko@joh.to>

On 1/12/14, 7:47 AM, Pavel Stehule wrote:

2014/1/12 Marko Tiikkaja <marko@joh.to>

Greetings fellow elephants,

I would humbly like to submit for your consideration my proposal for
alleviating pain caused by one of the most annoying footguns in PL/PgSQL:
the behaviour of SELECT .. INTO when the query returns more than one row.
Some of you might know that no exception is raised in this case (as
opposed to INSERT/UPDATE/DELETE .. INTO, all of them yielding
TOO_MANY_ROWS), which can hide subtle bugs in queries if during testing
the
query always returns only one row or the "correct" one happens to be
picked
up every time. Additionally, the row_count() after execution is always
going to be either 0 or 1, so even if you want to explicitly guard
against
potentially broken queries, you can't do so!

It is not bad and, sure, - it is very useful and important

but - it is a redundant to INTO STRICT clause. When you use it, then you
change a INTO behaviour. Is not better to ensure STRICT option than hidden
redefining INTO?

That only works if the query should never return 0 rows either. If you
want to allow for missing rows, STRICT is out of the question.

hmm - you have true.

try to find better name.

Other questions is using a GUC for legacy code. I am for this checked mode
be default (or can be simply activated for new code)

Regards

Pavel

Show quoted text

Option INTO (without STRICT clause) is not safe and we should to disallow.

I see a three states (not only two)

a) disallow INTO without STRICT (as preferred for new code)
b) implicit check after every INTO without STRICT
c) without check

these modes should be: "strict_required", "strict_default",
"strict_legacy"

I can't get excited about this. Mostly because it doesn't solve the
problem I'm having.

It is important to be able to execute queries with INTO which might not
return a row. That's what FOUND is for.

So I added the following compile-time option:

set plpgsql.consistent_into to true;

This name is not best (there is not clean with it a into should be
consistent)

I agree, but I had to pick something. One of the three hard problems in
CS..

Is question, if this functionality should be enabled by GUC to be used for

legacy code (as protection against some kind of hidden bugs)

This topic is interesting idea for me - some checks can be pushed to
plpgsql_check (as errors or warnings) too.

Generally I like proposed functionality, just I am not sure, so hidden
redefining INTO clause (to INTO STRICT) is what we want. We can do it (but
explicitly). I don't know any situation where INTO without STRICT is
valid.
Introduction of STRICT option was wrong idea - and now is not way to back.

Note that this is different from implicitly STRICTifying every INTO, like
I said above.

Regards,
Marko Tiikkaja

Florian Pflug

fgp@phlo.org

over 12 years ago

In reply to: Marko Tiikkaja (#1)

Re: plpgsql.consistent_into

On Jan12, 2014, at 06:51 , Marko Tiikkaja <marko@joh.to> wrote:

I would humbly like to submit for your consideration my proposal for alleviating pain caused by one of the most annoying footguns in PL/PgSQL: the behaviour of SELECT .. INTO when the query returns more than one row. Some of you might know that no exception is raised in this case (as opposed to INSERT/UPDATE/DELETE .. INTO, all of them yielding TOO_MANY_ROWS), which can hide subtle bugs in queries if during testing the query always returns only one row or the "correct" one happens to be picked up every time. Additionally, the row_count() after execution is always going to be either 0 or 1, so even if you want to explicitly guard against potentially broken queries, you can't do so!

So I added the following compile-time option:

set plpgsql.consistent_into to true;

I don't think a GUC is the best way to handle this. Handling
this via a per-function setting similar to #variable_conflict would
IMHO be better.So a function containing

#into_surplus_rows error

would complain whereas

#into_surplus_rows ignore_for_select

would leave the behaviour unchanged.

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Marko Tiikkaja

marko@joh.to

over 12 years ago

In reply to: Florian Pflug (#5)

Re: plpgsql.consistent_into

On 1/12/14, 10:19 PM, Florian Pflug wrote:

On Jan12, 2014, at 06:51 , Marko Tiikkaja <marko@joh.to> wrote:

set plpgsql.consistent_into to true;

I don't think a GUC is the best way to handle this. Handling
this via a per-function setting similar to #variable_conflict would
IMHO be better.

This is exactly what the patch does. A global default in the form of a
GUC, and then a per-function option for overriding that default.

Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Pavel Stehule

pavel.stehule@gmail.com

over 12 years ago

In reply to: Florian Pflug (#5)

Re: plpgsql.consistent_into

2014/1/12 Florian Pflug <fgp@phlo.org>

On Jan12, 2014, at 06:51 , Marko Tiikkaja <marko@joh.to> wrote:

I would humbly like to submit for your consideration my proposal for

alleviating pain caused by one of the most annoying footguns in PL/PgSQL:
the behaviour of SELECT .. INTO when the query returns more than one row.
Some of you might know that no exception is raised in this case (as
opposed to INSERT/UPDATE/DELETE .. INTO, all of them yielding
TOO_MANY_ROWS), which can hide subtle bugs in queries if during testing the
query always returns only one row or the "correct" one happens to be picked
up every time. Additionally, the row_count() after execution is always
going to be either 0 or 1, so even if you want to explicitly guard against
potentially broken queries, you can't do so!

So I added the following compile-time option:

set plpgsql.consistent_into to true;

I don't think a GUC is the best way to handle this. Handling
this via a per-function setting similar to #variable_conflict would
IMHO be better.So a function containing

#into_surplus_rows error

would complain whereas

#into_surplus_rows ignore_for_select

would leave the behaviour unchanged.

There is GUC for variable_conflict already too. In this case I would to
enable this functionality everywhere (it is tool how to simply eliminate
some kind of strange bugs) so it needs a GUC.

We have GUC for plpgsql.variable_conflict three years and I don't know
about any problem.

Regards

Pavel

Show quoted text

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Florian Pflug

fgp@phlo.org

over 12 years ago

In reply to: Pavel Stehule (#7)

Re: plpgsql.consistent_into

On Jan12, 2014, at 22:37 , Pavel Stehule <pavel.stehule@gmail.com> wrote:

There is GUC for variable_conflict already too. In this case I would to
enable this functionality everywhere (it is tool how to simply eliminate
some kind of strange bugs) so it needs a GUC.

We have GUC for plpgsql.variable_conflict three years and I don't know
about any problem.

I must say I hate behaviour-changing GUCs with quite some passion. IMHO
they tend to cause bugs, not avoid them, in the long run. The pattern
usually is

1) Code gets written, depends on some particular set of settings
to work correctly

2) Code gets reused, with little further testing since it's supposed
to be battle-proven anyway. Settings get dropped.

3) Code blows up for those corner-cases where the setting actually
matter. Debugging is hell, because you effectively have to go
over the code line-by-line and check if it might be affected by
some GUC or another.

Only a few days ago I spent more than an hour tracking down a bug
which, as it turned out, was caused by a regex which subtly changed its
meaning depending on whether standard_conforming_strings is on or off.

Some GUCs are unavoidable - standard_conforming_strings, for example
probably still was a good idea, since the alternative would have been
to stick with the historical, non-standard behaviour forever.

But in this case, my feeling is that the trouble such a GUC may cause
out-weights the potential benefits. I'm all for having a directive like
#consistent_into (though I feel that the name could convey the
meaning better). If we *really* think that this ought to be the default
from 9.4 onward, then we should

*) Change it to always complain, except if the function explictly
specifies "#consistent_into on" or whatever.

*) Have pg_dump add that to all plpgsql functions if the server
version is < 9.4 or whatever major release this ends up in

That's all just my opinion of course.

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Gavin Flower

GavinFlower@archidevsys.co.nz

over 12 years ago

In reply to: Florian Pflug (#8)

Re: plpgsql.consistent_into

On 13/01/14 11:44, Florian Pflug wrote:

On Jan12, 2014, at 22:37 , Pavel Stehule <pavel.stehule@gmail.com> wrote:

There is GUC for variable_conflict already too. In this case I would to
enable this functionality everywhere (it is tool how to simply eliminate
some kind of strange bugs) so it needs a GUC.

We have GUC for plpgsql.variable_conflict three years and I don't know
about any problem.

I must say I hate behaviour-changing GUCs with quite some passion. IMHO
they tend to cause bugs, not avoid them, in the long run. The pattern
usually is

1) Code gets written, depends on some particular set of settings
to work correctly

2) Code gets reused, with little further testing since it's supposed
to be battle-proven anyway. Settings get dropped.

3) Code blows up for those corner-cases where the setting actually
matter. Debugging is hell, because you effectively have to go
over the code line-by-line and check if it might be affected by
some GUC or another.

Only a few days ago I spent more than an hour tracking down a bug
which, as it turned out, was caused by a regex which subtly changed its
meaning depending on whether standard_conforming_strings is on or off.

Some GUCs are unavoidable - standard_conforming_strings, for example
probably still was a good idea, since the alternative would have been
to stick with the historical, non-standard behaviour forever.

But in this case, my feeling is that the trouble such a GUC may cause
out-weights the potential benefits. I'm all for having a directive like
#consistent_into (though I feel that the name could convey the
meaning better). If we *really* think that this ought to be the default
from 9.4 onward, then we should

*) Change it to always complain, except if the function explictly
specifies "#consistent_into on" or whatever.

*) Have pg_dump add that to all plpgsql functions if the server
version is < 9.4 or whatever major release this ends up in

That's all just my opinion of course.

best regards,
Florian Pflug

Possibly there should be a warning put out, whenever someone uses some
behaviour that requires a GUC set to a non-default value?

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Pavel Stehule

pavel.stehule@gmail.com

over 12 years ago

In reply to: Florian Pflug (#8)

Re: plpgsql.consistent_into

2014/1/12 Florian Pflug <fgp@phlo.org>

On Jan12, 2014, at 22:37 , Pavel Stehule <pavel.stehule@gmail.com> wrote:

There is GUC for variable_conflict already too. In this case I would to
enable this functionality everywhere (it is tool how to simply eliminate
some kind of strange bugs) so it needs a GUC.

We have GUC for plpgsql.variable_conflict three years and I don't know
about any problem.

I must say I hate behaviour-changing GUCs with quite some passion. IMHO
they tend to cause bugs, not avoid them, in the long run. The pattern
usually is

1) Code gets written, depends on some particular set of settings
to work correctly

2) Code gets reused, with little further testing since it's supposed
to be battle-proven anyway. Settings get dropped.

3) Code blows up for those corner-cases where the setting actually
matter. Debugging is hell, because you effectively have to go
over the code line-by-line and check if it might be affected by
some GUC or another.

Only a few days ago I spent more than an hour tracking down a bug
which, as it turned out, was caused by a regex which subtly changed its
meaning depending on whether standard_conforming_strings is on or off.

Some GUCs are unavoidable - standard_conforming_strings, for example
probably still was a good idea, since the alternative would have been
to stick with the historical, non-standard behaviour forever.

But in this case, my feeling is that the trouble such a GUC may cause
out-weights the potential benefits. I'm all for having a directive like
#consistent_into (though I feel that the name could convey the
meaning better). If we *really* think that this ought to be the default
from 9.4 onward, then we should

*) Change it to always complain, except if the function explictly
specifies "#consistent_into on" or whatever.

*) Have pg_dump add that to all plpgsql functions if the server
version is < 9.4 or whatever major release this ends up in

I disagree - automatic code injection can is strange. Is not problem inject
code from simple DO statement, but I dislike append lines to source code
without any specific user request.

Maybe this problem with GUC can be solved in 9.4. Some specific GUC can be
ported with database.

Pavel

Show quoted text

That's all just my opinion of course.

best regards,
Florian Pflug

#11

Pavel Stehule

pavel.stehule@gmail.com

over 12 years ago

In reply to: Gavin Flower (#9)

Re: plpgsql.consistent_into

2014/1/13 Gavin Flower <GavinFlower@archidevsys.co.nz>

On 13/01/14 11:44, Florian Pflug wrote:

On Jan12, 2014, at 22:37 , Pavel Stehule <pavel.stehule@gmail.com> wrote:

There is GUC for variable_conflict already too. In this case I would to
enable this functionality everywhere (it is tool how to simply eliminate
some kind of strange bugs) so it needs a GUC.

We have GUC for plpgsql.variable_conflict three years and I don't know
about any problem.

I must say I hate behaviour-changing GUCs with quite some passion. IMHO
they tend to cause bugs, not avoid them, in the long run. The pattern
usually is

1) Code gets written, depends on some particular set of settings
to work correctly

2) Code gets reused, with little further testing since it's supposed
to be battle-proven anyway. Settings get dropped.

3) Code blows up for those corner-cases where the setting actually
matter. Debugging is hell, because you effectively have to go
over the code line-by-line and check if it might be affected by
some GUC or another.

Only a few days ago I spent more than an hour tracking down a bug
which, as it turned out, was caused by a regex which subtly changed its
meaning depending on whether standard_conforming_strings is on or off.

Some GUCs are unavoidable - standard_conforming_strings, for example
probably still was a good idea, since the alternative would have been
to stick with the historical, non-standard behaviour forever.

But in this case, my feeling is that the trouble such a GUC may cause
out-weights the potential benefits. I'm all for having a directive like
#consistent_into (though I feel that the name could convey the
meaning better). If we *really* think that this ought to be the default
from 9.4 onward, then we should

*) Change it to always complain, except if the function explictly
specifies "#consistent_into on" or whatever.

*) Have pg_dump add that to all plpgsql functions if the server
version is < 9.4 or whatever major release this ends up in

That's all just my opinion of course.

best regards,
Florian Pflug

Possibly there should be a warning put out, whenever someone uses some

behaviour that requires a GUC set to a non-default value?

It is a good idea. I though about it. A worry about GUC are legitimate, but
we are most static and sometimes we try to design bigger creatures, than we
try to solve.

I am thinking so dump can contain a serialized GUC values, and can raises
warnings when GUC are different (not only different from default).

Similar problems are with different FROM_COLAPS_LIMIT, JOIN_COLAPS_LIMIT,
WORK_MEM, ...

Regards

Pavel

Show quoted text

Cheers,
Gavin

#12

Pavel Stehule

pavel.stehule@gmail.com

over 12 years ago

In reply to: Florian Pflug (#8)

Re: plpgsql.consistent_into

2014/1/12 Florian Pflug <fgp@phlo.org>

On Jan12, 2014, at 22:37 , Pavel Stehule <pavel.stehule@gmail.com> wrote:

There is GUC for variable_conflict already too. In this case I would to
enable this functionality everywhere (it is tool how to simply eliminate
some kind of strange bugs) so it needs a GUC.

We have GUC for plpgsql.variable_conflict three years and I don't know
about any problem.

I must say I hate behaviour-changing GUCs with quite some passion. IMHO
they tend to cause bugs, not avoid them, in the long run. The pattern
usually is

1) Code gets written, depends on some particular set of settings
to work correctly

2) Code gets reused, with little further testing since it's supposed
to be battle-proven anyway. Settings get dropped.

3) Code blows up for those corner-cases where the setting actually
matter. Debugging is hell, because you effectively have to go
over the code line-by-line and check if it might be affected by
some GUC or another.

Only a few days ago I spent more than an hour tracking down a bug
which, as it turned out, was caused by a regex which subtly changed its
meaning depending on whether standard_conforming_strings is on or off.

Some GUCs are unavoidable - standard_conforming_strings, for example
probably still was a good idea, since the alternative would have been
to stick with the historical, non-standard behaviour forever.

But in this case, my feeling is that the trouble such a GUC may cause
out-weights the potential benefits. I'm all for having a directive like
#consistent_into (though I feel that the name could convey the
meaning better). If we *really* think that this ought to be the default
from 9.4 onward, then we should

*) Change it to always complain, except if the function explictly
specifies "#consistent_into on" or whatever.

*) Have pg_dump add that to all plpgsql functions if the server
version is < 9.4 or whatever major release this ends up in

That's all just my opinion of course.

I am thinking so GUC and plpgsql option can live together. If you like to
accent a some behave, then you can use a plpgsql option. On second hand, I
would to use a some functionality, that is safe, but I don't would to dirty
source code by using repeated options. But I have to check (and calculate
with risk) a GUC settings.

One idea: required GUC? Can be nice a possibility to ensure some GUC
setting, and restore ensure these values or raises warning.

Back to main topic. Required and described feature doesn't change a behave
of INTO clause. I can enable or disable this functionality and well written
code should to work without change (and problems). When check is disabled,
then execution is just less safe. So in this case, a impact of GUC is
significantly less than by you described issues. Does know anybody a use
case where this check should be disabled?

Probably we have a different experience about GUC. I had a problem with
standard_conforming_strings and bytea format some years ago. Now I prepare
document about required setting. But I can see (from my experience from
Czech area) more often problems related to effective_cache_size or
from_collapse_limit and similar GUC. These parameters are behind knowledge
(and visibility) typical user.

Best regards

Pavel

Show quoted text

best regards,
Florian Pflug

#13

Jim Nasby

Jim.Nasby@BlueTreble.com

over 12 years ago

In reply to: Pavel Stehule (#12)

Re: plpgsql.consistent_into

On 1/13/14, 1:44 AM, Pavel Stehule wrote:

2014/1/12 Florian Pflug <fgp@phlo.org <mailto:fgp@phlo.org>>

On Jan12, 2014, at 22:37 , Pavel Stehule <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>> wrote:

There is GUC for variable_conflict already too. In this case I would to
enable this functionality everywhere (it is tool how to simply eliminate
some kind of strange bugs) so it needs a GUC.

We have GUC for plpgsql.variable_conflict three years and I don't know
about any problem.

I must say I hate behaviour-changing GUCs with quite some passion. IMHO
they tend to cause bugs, not avoid them, in the long run. The pattern
usually is

1) Code gets written, depends on some particular set of settings
to work correctly

2) Code gets reused, with little further testing since it's supposed
to be battle-proven anyway. Settings get dropped.

3) Code blows up for those corner-cases where the setting actually
matter. Debugging is hell, because you effectively have to go
over the code line-by-line and check if it might be affected by
some GUC or another.

Only a few days ago I spent more than an hour tracking down a bug
which, as it turned out, was caused by a regex which subtly changed its
meaning depending on whether standard_conforming_strings is on or off.

Some GUCs are unavoidable - standard_conforming_strings, for example
probably still was a good idea, since the alternative would have been
to stick with the historical, non-standard behaviour forever.

But in this case, my feeling is that the trouble such a GUC may cause
out-weights the potential benefits. I'm all for having a directive like
#consistent_into (though I feel that the name could convey the
meaning better). If we *really* think that this ought to be the default
from 9.4 onward, then we should

*) Change it to always complain, except if the function explictly
specifies "#consistent_into on" or whatever.

*) Have pg_dump add that to all plpgsql functions if the server
version is < 9.4 or whatever major release this ends up in

That's all just my opinion of course.

I am thinking so GUC and plpgsql option can live together. If you like to accent a some behave, then you can use a plpgsql option. On second hand, I would to use a some functionality, that is safe, but I don't would to dirty source code by using repeated options. But I have to check (and calculate with risk) a GUC settings.

One idea: required GUC? Can be nice a possibility to ensure some GUC setting, and restore ensure these values or raises warning.

Back to main topic. Required and described feature doesn't change a behave of INTO clause. I can enable or disable this functionality and well written code should to work without change (and problems). When check is disabled, then execution is just less safe. So in this case, a impact of GUC is significantly less than by you described issues. Does know anybody a use case where this check should be disabled?

Probably we have a different experience about GUC. I had a problem with standard_conforming_strings and bytea format some years ago. Now I prepare document about required setting. But I can see (from my experience from Czech area) more often problems related to effective_cache_size or from_collapse_limit and similar GUC. These parameters are behind knowledge (and visibility) typical user.

ISTM that in this case, it should be safe to make the new default behavior STRICT; if you forget to set the GUC to disable than you'll get an error that points directly at the problem, at which point you'll go "Oh, yeah... I forgot to set X..."

Outside of the GUC, I believe the default should definitely be STRICT. If your app is relying on non-strict then you need to be made aware of that. We should be able to provide a DO block that will change this setting for every function you've got if someone isn't happy with STRICT mode.
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Florian Pflug

fgp@phlo.org

over 12 years ago

In reply to: Jim Nasby (#13)

Re: plpgsql.consistent_into

On Jan13, 2014, at 22:49 , Jim Nasby <jim@nasby.net> wrote:

ISTM that in this case, it should be safe to make the new default behavior STRICT;
if you forget to set the GUC to disable than you'll get an error that points directly
at the problem, at which point you'll go "Oh, yeah... I forgot to set X..."

What do you mean by STRICT? STRICT (which we already support) complains if the
query doesn't return *exactly* one row. What Marko wants is to raise an error
for a plain SELECT ... INTO if more than one row is returned, but to still
convert zero rows to NULL.

Outside of the GUC, I believe the default should definitely be STRICT. If your app is
relying on non-strict then you need to be made aware of that. We should be able to
provide a DO block that will change this setting for every function you've got if
someone isn't happy with STRICT mode.

If you mean that we should change SELECT ... INTO to always behave as if STRICT
had been specified - why on earth would we want to do that? That would break
perfectly fine code for no good reason whatsoever.

In fact, after reading the documentation on SELECT ... INTO, I'm convinced the
the whole consistent_into thing is a bad idea. The documentation states clearly
that

For INSERT/UPDATE/DELETE with RETURNING, PL/pgSQL reports an error for more than
one returned row, even when STRICT is not specified. This is because there is no
option such as ORDER BY with which to determine which affected row should be
returned.

It therefor isn't an oversight that SELECT ... INTO allows multiple result rows
but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and
for a reason. We shouldn't be second-guessing ourselves by changing that later -
not, at least, unless we have a *very* good reason for it. Which, AFAICS, we don't.

(And yeah, personally I'd prefer if we'd complain about multiple rows. But it's
IMHO just too late for that)

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Marko Tiikkaja

marko@joh.to

over 12 years ago

In reply to: Florian Pflug (#14)

Re: plpgsql.consistent_into

On 1/14/14, 12:41 AM, Florian Pflug wrote:

In fact, after reading the documentation on SELECT ... INTO, I'm convinced the
the whole consistent_into thing is a bad idea. The documentation states clearly
that

For INSERT/UPDATE/DELETE with RETURNING, PL/pgSQL reports an error for more than
one returned row, even when STRICT is not specified. This is because there is no
option such as ORDER BY with which to determine which affected row should be
returned.

It therefor isn't an oversight that SELECT ... INTO allows multiple result rows
but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and
for a reason.

Yeah, it does state that. But it's a BS reason. In addition to ORDER
BY, SELECT also has a LIMIT which you can use to get the "first row"
behaviour. There's no way to go to the more sane behaviour from what we
have right now.

When I've worked with PL/PgSQL, this has been a source of a few bugs
that would have been noticed during testing if the behaviour of INTO
wasn't as dangerous as it is right now. Yes, it breaks backwards
compatibility, but that's why there's a nice GUC. If we're not going to
scrap PL/PgSQL and start over again, we are going to have to do changes
like this to make the language better. Also I think that out of all the
things we could do to break backwards compatibility, this is closer to
"harmless" than "a pain in the butt".

Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Josh Berkus

josh@agliodbs.com

over 12 years ago

In reply to: Marko Tiikkaja (#1)

Re: plpgsql.consistent_into

On 01/13/2014 03:41 PM, Florian Pflug wrote:

It therefor isn't an oversight that SELECT ... INTO allows multiple result rows
but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and
for a reason. We shouldn't be second-guessing ourselves by changing that later -
not, at least, unless we have a *very* good reason for it. Which, AFAICS, we don't.

(And yeah, personally I'd prefer if we'd complain about multiple rows. But it's
IMHO just too late for that)

I *really* don't want to go through all my old code to find places where
I used SELECT ... INTO just to pop off the first row, and ignored the
rest. I doubt anyone else does, either.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WMb4ff4b3533b9f17544d25d55f2fb34009d7a53054a557fa9fe03e6af7aba3a9d7563b4f195808498acd4f84796a85403@asav-2.01.com

#17

Florian Pflug

fgp@phlo.org

over 12 years ago

In reply to: Marko Tiikkaja (#15)

Re: plpgsql.consistent_into

On Jan14, 2014, at 00:52 , Marko Tiikkaja <marko@joh.to> wrote:

When I've worked with PL/PgSQL, this has been a source of a few bugs that
would have been noticed during testing if the behaviour of INTO wasn't as
dangerous as it is right now.

The question is, how many bugs stemmed from wrong SQL queries, and what
percentage of those would have been caught by this? The way I see it, there
are thousands of ways to screw up a query, and having it return multiple
rows instead of one is just one of them.

Yes, it breaks backwards compatibility, but that's why there's a nice GUC.

Which doesn't help, because the GUC isn't tied to the code. This *adds*
an error case, not remove one - now, instead of getting your code correct,
you *also* have to get the GUC correct. If you even *know* that such a GUC
exists.

If we're not going to scrap PL/PgSQL and
start over again, we are going to have to do changes like this to make the
language better. Also I think that out of all the things we could do to
break backwards compatibility, this is closer to "harmless" than "a pain
in the butt".

I very strongly believe that languages don't get better by adding a thousand
little knobs which subtly change semantics. Look at the mess that is PHP -
we absolutely, certainly don't want to go there. The most important rule in
language design is in my opinion "stick with your choices". C, C++ and JAVA
all seem to follow this, and it's one of the reasons these languages are
popular for big projects, I think.

The way I see it, the only OK way to change existing behaviour is to have
the concept of a "language version", and force code to indicate the language
version it expects. The important thing is that the language version is an
attribute of code, not some global setting that you can change without ever
looking at the code it'd affect.

So if we really want to change this, I think we need to have a
LANGUAGE_VERSION attribute on functions. Each time a major postgres release
changes the behaviour of one of the procedural languages, we'd increment
that language's version, and enable the old behaviour for all functions
tagged with an older one.

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Jim Nasby

Jim.Nasby@BlueTreble.com

over 12 years ago

In reply to: Josh Berkus (#16)

Re: plpgsql.consistent_into

On 1/13/14, 5:57 PM, Josh Berkus wrote:

On 01/13/2014 03:41 PM, Florian Pflug wrote:

It therefor isn't an oversight that SELECT ... INTO allows multiple result rows
but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and
for a reason. We shouldn't be second-guessing ourselves by changing that later -
not, at least, unless we have a *very* good reason for it. Which, AFAICS, we don't.

(And yeah, personally I'd prefer if we'd complain about multiple rows. But it's
IMHO just too late for that)

I *really* don't want to go through all my old code to find places where
I used SELECT ... INTO just to pop off the first row, and ignored the
rest. I doubt anyone else does, either.

Do you regularly have use cases where you actually want just one RANDOM row? I suspect the far more likely scenario is that people write code assuming they'll get only one row and they'll end up with extremely hard to trace bugs if that assumption is ever wrong.
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Jim Nasby

Jim.Nasby@BlueTreble.com

over 12 years ago

In reply to: Florian Pflug (#17)

Re: plpgsql.consistent_into

On 1/13/14, 6:16 PM, Florian Pflug wrote:

On Jan14, 2014, at 00:52 , Marko Tiikkaja <marko@joh.to> wrote:

When I've worked with PL/PgSQL, this has been a source of a few bugs that
would have been noticed during testing if the behaviour of INTO wasn't as
dangerous as it is right now.

The question is, how many bugs stemmed from wrong SQL queries, and what
percentage of those would have been caught by this? The way I see it, there
are thousands of ways to screw up a query, and having it return multiple
rows instead of one is just one of them.

A query that's simply wrong is more likely to fail consistently. Non-strict use of INTO is going to fail in very subtle ways (unless you actually DO want just the first row, in which case you should explicitly use LIMIT 1).

If we're not going to scrap PL/PgSQL and
start over again, we are going to have to do changes like this to make the
language better. Also I think that out of all the things we could do to
break backwards compatibility, this is closer to "harmless" than "a pain
in the butt".

I very strongly believe that languages don't get better by adding a thousand
little knobs which subtly change semantics. Look at the mess that is PHP -
we absolutely, certainly don't want to go there. The most important rule in
language design is in my opinion "stick with your choices". C, C++ and JAVA
all seem to follow this, and it's one of the reasons these languages are
popular for big projects, I think.

The way I see it, the only OK way to change existing behaviour is to have
the concept of a "language version", and force code to indicate the language
version it expects. The important thing is that the language version is an
attribute of code, not some global setting that you can change without ever
looking at the code it'd affect.

So if we really want to change this, I think we need to have a
LANGUAGE_VERSION attribute on functions. Each time a major postgres release
changes the behaviour of one of the procedural languages, we'd increment
that language's version, and enable the old behaviour for all functions
tagged with an older one.

I like that idea. It allows us to fix past decisions that were ill considered without hosing all existing code.

BTW, have we always had support for STRICT, or was it added at some point? It's in 8.4, but I don't know how far back it goes.

And if we've always had it, why on earth didn't we make STRICT the default behavior?
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Florian Pflug

fgp@phlo.org

over 12 years ago

In reply to: Jim Nasby (#18)

Re: plpgsql.consistent_into

(Responding to both of your mails here)

On Jan14, 2014, at 01:20 , Jim Nasby <jim@nasby.net> wrote:

On 1/13/14, 5:57 PM, Josh Berkus wrote:

On 01/13/2014 03:41 PM, Florian Pflug wrote:

It therefor isn't an oversight that SELECT ... INTO allows multiple result rows
but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and
for a reason. We shouldn't be second-guessing ourselves by changing that later -
not, at least, unless we have a *very* good reason for it. Which, AFAICS, we don't.

(And yeah, personally I'd prefer if we'd complain about multiple rows. But it's
IMHO just too late for that)

I *really* don't want to go through all my old code to find places where
I used SELECT ... INTO just to pop off the first row, and ignored the
rest. I doubt anyone else does, either.

Do you regularly have use cases where you actually want just one RANDOM row?
I suspect the far more likely scenario is that people write code assuming they'll
get only one row and they'll end up with extremely hard to trace bugs if that
assumption is ever wrong.

One case that immediatly comes to mind is a JOIN which sometimes returns
multiple rows, and a projection clause that only uses one of the tables
involved in the join.

Another are queries including an ORDER BY - I don't think the patch makes an
exception for those, and even if it did, it probably wouldn't catch all
cases, like e.g. an SRF which returns the rows in a deterministic order.

Or maybe you're picking a row to process next, and don't really care about
the order in which you work through them.

The question is, how many bugs stemmed from wrong SQL queries, and what
percentage of those would have been caught by this? The way I see it, there
are thousands of ways to screw up a query, and having it return multiple
rows instead of one is just one of them.

A query that's simply wrong is more likely to fail consistently. Non-strict
use of INTO is going to fail in very subtle ways (unless you actually DO want
just the first row, in which case you should explicitly use LIMIT 1).

How so? Say you expect "SELECT * FROM view WHERE c=<n>" to only ever return
one row. Then "SELECT sum(f) FROM table JOIN view ON table.c = view.c" is
just as subtly wrong as the first query is.

And if we've always had it, why on earth didn't we make STRICT the default
behavior?

Dunno, but AFAIK pl/pgsql mimics Oracle's PL/SQL, at least in some aspects,
so maybe this is one of the areas where we just do what oracle does.

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21