Doing better at HINTing an appropriate column within errorMissingColumn()

Started by Peter Geogheganabout 12 years ago147 messageshackers
Jump to latest

With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.

I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name. psql tab
completion is good, but not so good that this doesn't happen all the
time. It's good practice to consistently name columns and tables such
that it's possible to intuit the names of columns from the names of
tables and so on, but it's still pretty common to forget if a column
name from the table "orders" is "order_id", "orderid", or "ordersid",
particularly if you're someone who regularly interacts with many
databases. This problem is annoying in a low intensity kind of way.

Consider the following sample sessions of mine, made with the
dellstore2 sample database:

[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989
[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
orderid | orderdate | customerid | netamount | tax | totalamount |
orderlineid | orderid | prod_id | quantity | orderdate
---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------
1 | 2004-01-27 | 7888 | 313.24 | 25.84 | 339.08 |
1 | 1 | 9117 | 1 | 2004-01-27
(1 row)

[local]/postgres=# select ordersid from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
ERROR: 42703: column "ordersid" does not exist
LINE 1: select ordersid from orders o join orderlines ol on o.orderi...
^
HINT: Perhaps you meant to reference the column "o"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2999
[local]/postgres=# select ol.ordersid from orders o join orderlines ol
on o.orderid = ol.orderid limit 1;
ERROR: 42703: column ol.ordersid does not exist
LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989

We try to give the most useful possible HINT here, charging extra for
a non-matching alias, and going through the range table in order and
preferring the first column observed to any subsequent column whose
name is of the same distance as an earlier Var. The fuzzy string
matching works well enough that it seems possible in practice to
successfully have the parser make the right suggestion, even when the
user's original guess was fairly far off. I've found it works best to
charge half as much for a character deletion, so that's what is
charged.

I have some outstanding concerns about the proposed patch:

* It may be the case that dense logosyllabic or morphographic writing
systems, for example Kanji might consistently present, say, Japanese
users with a suggestion that just isn't very useful, to the point of
being annoying. Perhaps some Japanese hackers can comment on the
actual risks here.

* Perhaps I should have moved the Levenshtein distance functions into
core and be done with it. I thought that given the present restriction
that the implementation imposes on source and target string lengths,
it would be best to leave the user-facing SQL functions in contrib.
That restriction is not relevant to the internal use of Levenshtein
distance added here, though.

Thoughts?
--
Peter Geoghegan

Attachments:

levenshtein_column_hint.v1.2014_03_27.patch.gzapplication/x-gzip; name=levenshtein_column_hint.v1.2014_03_27.patch.gzDownload
#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Peter Geoghegan (#1)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

Hello

I see only one risk - it can do some slowdown of exception processing.

Sometimes you can have a code like

BEGIN
WHILE ..
LOOP
BEGIN
INSERT INTO ...
EXCEPTION WHEN ..
; /* ignore this error */
END;
END LOOP;

without this risks, proposed feature is nice, but should be fast

Regards

Pavel

2014-03-27 20:10 GMT+01:00 Peter Geoghegan <pg@heroku.com>:

Show quoted text

With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.

I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name. psql tab
completion is good, but not so good that this doesn't happen all the
time. It's good practice to consistently name columns and tables such
that it's possible to intuit the names of columns from the names of
tables and so on, but it's still pretty common to forget if a column
name from the table "orders" is "order_id", "orderid", or "ordersid",
particularly if you're someone who regularly interacts with many
databases. This problem is annoying in a low intensity kind of way.

Consider the following sample sessions of mine, made with the
dellstore2 sample database:

[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989
[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
orderid | orderdate | customerid | netamount | tax | totalamount |
orderlineid | orderid | prod_id | quantity | orderdate

---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------
1 | 2004-01-27 | 7888 | 313.24 | 25.84 | 339.08 |
1 | 1 | 9117 | 1 | 2004-01-27
(1 row)

[local]/postgres=# select ordersid from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
ERROR: 42703: column "ordersid" does not exist
LINE 1: select ordersid from orders o join orderlines ol on o.orderi...
^
HINT: Perhaps you meant to reference the column "o"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2999
[local]/postgres=# select ol.ordersid from orders o join orderlines ol
on o.orderid = ol.orderid limit 1;
ERROR: 42703: column ol.ordersid does not exist
LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989

We try to give the most useful possible HINT here, charging extra for
a non-matching alias, and going through the range table in order and
preferring the first column observed to any subsequent column whose
name is of the same distance as an earlier Var. The fuzzy string
matching works well enough that it seems possible in practice to
successfully have the parser make the right suggestion, even when the
user's original guess was fairly far off. I've found it works best to
charge half as much for a character deletion, so that's what is
charged.

I have some outstanding concerns about the proposed patch:

* It may be the case that dense logosyllabic or morphographic writing
systems, for example Kanji might consistently present, say, Japanese
users with a suggestion that just isn't very useful, to the point of
being annoying. Perhaps some Japanese hackers can comment on the
actual risks here.

* Perhaps I should have moved the Levenshtein distance functions into
core and be done with it. I thought that given the present restriction
that the implementation imposes on source and target string lengths,
it would be best to leave the user-facing SQL functions in contrib.
That restriction is not relevant to the internal use of Levenshtein
distance added here, though.

Thoughts?
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Pavel Stehule (#2)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On Fri, Mar 28, 2014 at 1:00 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I see only one risk - it can do some slowdown of exception processing.

I think it's unlikely that you'd see ERRCODE_UNDEFINED_COLUMN in
procedural code like that in practice. In any case it's worth noting
that I continually pass back a "max" to the Levenshtein distance
implementation, which is the current shortest distance observed. The
implementation is therefore not obliged to exhaustively find a
distance that is already known to be of no use. See commit 604ab0.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Peter Geoghegan (#3)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

2014-03-28 9:22 GMT+01:00 Peter Geoghegan <pg@heroku.com>:

On Fri, Mar 28, 2014 at 1:00 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

I see only one risk - it can do some slowdown of exception processing.

I think it's unlikely that you'd see ERRCODE_UNDEFINED_COLUMN in
procedural code like that in practice. In any case it's worth noting
that I continually pass back a "max" to the Levenshtein distance
implementation, which is the current shortest distance observed. The
implementation is therefore not obliged to exhaustively find a
distance that is already known to be of no use. See commit 604ab0.

if it is related to ERRCODE_UNDEFINED_COLUMN then it should be ok (from
performance perspective)

but second issue can be usage from plpgsql - where is mix SQL identifiers
and plpgsql variables.

Pavel

Show quoted text

--
Peter Geoghegan

#5Oleg Bartunov
oleg@sai.msu.su
In reply to: Peter Geoghegan (#1)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

Very interesting idea, I'd think about optionally add similarity hinting
support to psql tab. With, say, 80% of similarity matching, it
shouldn't be very annoying. For interactive usage there is no risk of
slowdown.
On Mar 27, 2014 11:11 PM, "Peter Geoghegan" <pg@heroku.com> wrote:

Show quoted text

With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.

I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name. psql tab
completion is good, but not so good that this doesn't happen all the
time. It's good practice to consistently name columns and tables such
that it's possible to intuit the names of columns from the names of
tables and so on, but it's still pretty common to forget if a column
name from the table "orders" is "order_id", "orderid", or "ordersid",
particularly if you're someone who regularly interacts with many
databases. This problem is annoying in a low intensity kind of way.

Consider the following sample sessions of mine, made with the
dellstore2 sample database:

[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989
[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
orderid | orderdate | customerid | netamount | tax | totalamount |
orderlineid | orderid | prod_id | quantity | orderdate

---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------
1 | 2004-01-27 | 7888 | 313.24 | 25.84 | 339.08 |
1 | 1 | 9117 | 1 | 2004-01-27
(1 row)

[local]/postgres=# select ordersid from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
ERROR: 42703: column "ordersid" does not exist
LINE 1: select ordersid from orders o join orderlines ol on o.orderi...
^
HINT: Perhaps you meant to reference the column "o"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2999
[local]/postgres=# select ol.ordersid from orders o join orderlines ol
on o.orderid = ol.orderid limit 1;
ERROR: 42703: column ol.ordersid does not exist
LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989

We try to give the most useful possible HINT here, charging extra for
a non-matching alias, and going through the range table in order and
preferring the first column observed to any subsequent column whose
name is of the same distance as an earlier Var. The fuzzy string
matching works well enough that it seems possible in practice to
successfully have the parser make the right suggestion, even when the
user's original guess was fairly far off. I've found it works best to
charge half as much for a character deletion, so that's what is
charged.

I have some outstanding concerns about the proposed patch:

* It may be the case that dense logosyllabic or morphographic writing
systems, for example Kanji might consistently present, say, Japanese
users with a suggestion that just isn't very useful, to the point of
being annoying. Perhaps some Japanese hackers can comment on the
actual risks here.

* Perhaps I should have moved the Levenshtein distance functions into
core and be done with it. I thought that given the present restriction
that the implementation imposes on source and target string lengths,
it would be best to leave the user-facing SQL functions in contrib.
That restriction is not relevant to the internal use of Levenshtein
distance added here, though.

Thoughts?
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Peter Geoghegan (#1)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

Peter Geoghegan wrote:

With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.

I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name.

[local]/postgres=# select * from orders o join orderlines ol on o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".

This sounds like a mild version of DWIM:
http://www.jargondb.org/glossary/dwim

Maybe it is just me, but I get uncomfortable when a program tries
to second-guess what I really want.

Yours,
Laurenz Albe

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Christoph Berg
myon@debian.org
In reply to: Laurenz Albe (#6)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

Re: Albe Laurenz 2014-03-28 <A737B7A37273E048B164557ADEF4A58B17CE8DEA@ntex2010i.host.magwien.gv.at>

ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".

This sounds like a mild version of DWIM:
http://www.jargondb.org/glossary/dwim

Maybe it is just me, but I get uncomfortable when a program tries
to second-guess what I really want.

I find it very annoying when zsh asks me "did you mean foo [y/n]" and
I need to confirm that, but I'd find a mere HINT that I can easily
ignore a very useful feature. +1 for the idea.

Christoph
--
cb@df7cb.de | http://www.df7cb.de/

#8Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#1)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On Fri, Mar 28, 2014 at 4:10 AM, Peter Geoghegan <pg@heroku.com> wrote:

With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.

I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name. psql tab
completion is good, but not so good that this doesn't happen all the
time. It's good practice to consistently name columns and tables such
that it's possible to intuit the names of columns from the names of
tables and so on, but it's still pretty common to forget if a column
name from the table "orders" is "order_id", "orderid", or "ordersid",
particularly if you're someone who regularly interacts with many
databases. This problem is annoying in a low intensity kind of way.

Consider the following sample sessions of mine, made with the
dellstore2 sample database:

[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989
[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
orderid | orderdate | customerid | netamount | tax | totalamount |
orderlineid | orderid | prod_id | quantity | orderdate
---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------
1 | 2004-01-27 | 7888 | 313.24 | 25.84 | 339.08 |
1 | 1 | 9117 | 1 | 2004-01-27
(1 row)

[local]/postgres=# select ordersid from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
ERROR: 42703: column "ordersid" does not exist
LINE 1: select ordersid from orders o join orderlines ol on o.orderi...
^
HINT: Perhaps you meant to reference the column "o"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2999
[local]/postgres=# select ol.ordersid from orders o join orderlines ol
on o.orderid = ol.orderid limit 1;
ERROR: 42703: column ol.ordersid does not exist
LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989

We try to give the most useful possible HINT here, charging extra for
a non-matching alias, and going through the range table in order and
preferring the first column observed to any subsequent column whose
name is of the same distance as an earlier Var. The fuzzy string
matching works well enough that it seems possible in practice to
successfully have the parser make the right suggestion, even when the
user's original guess was fairly far off. I've found it works best to
charge half as much for a character deletion, so that's what is
charged.

What about the overhead that this processing creates if error
processing needs to scan a schema with let's say hundreds of tables?

* It may be the case that dense logosyllabic or morphographic writing
systems, for example Kanji might consistently present, say, Japanese
users with a suggestion that just isn't very useful, to the point of
being annoying. Perhaps some Japanese hackers can comment on the
actual risks here.

As long as Hiragana-only words (basic alphabet for Japanese words),
and more particularly Katakana only-words (to write phonetically
foreign words) are compared (even Kanji-only things compared),
Levenstein could play its role pretty well. But once a comparison is
made with two words using different alphabet, well Levenstein is not
going to work well. A simple example is 'ramen' (Japanese noodles),
that you can find written sometimes in Hiragana, or even in Katakana,
and here Levenstein performs poorly:
=# select levenshtein('ラーメン', 'らあめん');
levenshtein
-------------
4
(1 row)
Regards,
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Michael Paquier (#8)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On Fri, Mar 28, 2014 at 5:57 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

What about the overhead that this processing creates if error
processing needs to scan a schema with let's say hundreds of tables?

It doesn't work that way. I've extended searchRangeTableForCol() so
that when it calls scanRTEForColumn(), it considers Levenshtein
distance, and not just plain string equality, which is what happens
today. The code only looks through ParseState.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Robert Haas
robertmhaas@gmail.com
In reply to: Laurenz Albe (#6)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On Fri, Mar 28, 2014 at 4:47 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

Peter Geoghegan wrote:

With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.

I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name.

[local]/postgres=# select * from orders o join orderlines ol on o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".

This sounds like a mild version of DWIM:
http://www.jargondb.org/glossary/dwim

Maybe it is just me, but I get uncomfortable when a program tries
to second-guess what I really want.

It's not really DWIM, because the backend is still throwing an error.
It's just trying to help you sort out the error, along the way.
Still, I share some of your discomfort. I see Peter's patch as an
example of a broader class of things that we could do - but I'm not
altogether sure that we want to do them. There's a risk of adding not
only CPU cycles but also clutter. If we do things that encourage
people to crank the log verbosity down, I think that's going to be bad
more often than it's good. It strains credulity to think that this
patch alone would have that effect, but there might be quite a few
similar improvements that are possible. So I think it would be good
to consider how far we want to go in this direction and where we think
we might want to stop. That's not to say, let's not ever do this,
just, let's think carefully about where we want to end up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Robert Haas (#10)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On Tue, Apr 1, 2014 at 7:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:

There's a risk of adding not
only CPU cycles but also clutter. If we do things that encourage
people to crank the log verbosity down, I think that's going to be bad
more often than it's good.

While I share your concern here, I think that this is something that
is only likely to be seen in an interactive psql session, where it is
seen quite frequently. I am reasonably confident that it's highly
unusual to see ERRCODE_UNDEFINED_COLUMN in other settings. Not having
to do a mental context switch when writing an ad-hoc query has
considerable value. Even C compilers like Clang have this kind of
feedback. This is a patch that was written out of personal
frustration with the experience of interacting with many different
databases. Things like the Python REPL don't do so much of this kind
of thing, but presumably that's because of Python's dynamic typing.
This is a HINT that can be given with fairly high confidence that
it'll be helpful - there just won't be that many things that the user
could have meant to choose from. I think it's even useful when the
suggested column is distant from the original suggestion (i.e.
errorMissingColumn() offers only what is clearly a "wild guess"),
because then the user knows that he or she has got it quite wrong.
Frequently, this will be because the wrong synonym for what should
have been written was used.

It strains credulity to think that this
patch alone would have that effect, but there might be quite a few
similar improvements that are possible. So I think it would be good
to consider how far we want to go in this direction and where we think
we might want to stop. That's not to say, let's not ever do this,
just, let's think carefully about where we want to end up.

Fair enough.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Peter Geoghegan (#11)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On 4/1/14, 1:04 PM, Peter Geoghegan wrote:

It strains credulity to think that this

patch alone would have that effect, but there might be quite a few
similar improvements that are possible. So I think it would be good
to consider how far we want to go in this direction and where we think
we might want to stop. That's not to say, let's not ever do this,
just, let's think carefully about where we want to end up.

Fair enough.

I agree with the concern, but also have to say that I can't count how many times I could have used this. A big +1, at least in this case.
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#12)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

Normally I'm not for adding gucs that just gate new features. But I think a
simple guc to turn this on or off would be fine and alleviate any concerns.
I think users would appreciate it quite a lot

It would even have a positive effect of helping raise awareness of the
feature. I often scan the list of config options to get an idea of new
features when I'm installing new software or upgrading.

--
greg
On 1 Apr 2014 17:38, "Jim Nasby" <jim@nasby.net> wrote:

Show quoted text

On 4/1/14, 1:04 PM, Peter Geoghegan wrote:

It strains credulity to think that this

patch alone would have that effect, but there might be quite a few
similar improvements that are possible. So I think it would be good
to consider how far we want to go in this direction and where we think
we might want to stop. That's not to say, let's not ever do this,
just, let's think carefully about where we want to end up.

Fair enough.

I agree with the concern, but also have to say that I can't count how many
times I could have used this. A big +1, at least in this case.
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#13)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On 2014-04-02 21:08:47 +0100, Greg Stark wrote:

Normally I'm not for adding gucs that just gate new features. But I think a
simple guc to turn this on or off would be fine and alleviate any concerns.
I think users would appreciate it quite a lot

I don't have strong feelings about the feature, but introducing a guc
for it feels entirely ridiculous to me. This is a minor detail in an
error message, not more.

It would even have a positive effect of helping raise awareness of the
feature. I often scan the list of config options to get an idea of new
features when I'm installing new software or upgrading.

Really? Should we now add GUCs for every feature then?

Greetings,

Andres Freund

PS: Could you please start to properly quote again? You seem to have
stopped doing that entirely in the last few months.

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Andres Freund (#14)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On Wed, Apr 2, 2014 at 4:16 PM, Andres Freund <andres@2ndquadrant.com> wrote:

I don't have strong feelings about the feature, but introducing a guc
for it feels entirely ridiculous to me. This is a minor detail in an
error message, not more.

I agree. It's just a HINT. It's quite helpful in certain particular
contexts, but in the grand scheme of things isn't all that important.
I am being quite conservative in trying to anticipate cases where on
balance it'll actually hurt more than it will help. I doubt that there
actually are any.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#14)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On Wed, Apr 2, 2014 at 4:16 PM, Andres Freund <andres@2ndquadrant.com>wrote:

PS: Could you please start to properly quote again? You seem to have
stopped doing that entirely in the last few months.

I've been responding a lot from the phone. Unfortunately the Gmail client
on the phone makes it nearly impossible to format messages well. I'm
beginning to think it would be better to just not quote at all any more.
I'm normally not doing a point-by-point response anyways.

--
greg

#17Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#16)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On 2014-04-03 00:48:12 -0400, Greg Stark wrote:

On Wed, Apr 2, 2014 at 4:16 PM, Andres Freund <andres@2ndquadrant.com>wrote:

PS: Could you please start to properly quote again? You seem to have
stopped doing that entirely in the last few months.

I've been responding a lot from the phone. Unfortunately the Gmail client
on the phone makes it nearly impossible to format messages well. I'm
beginning to think it would be better to just not quote at all any more.
I'm normally not doing a point-by-point response anyways.

I really don't care where you're answering from TBH. It's unreadable,
misses context and that's it. If $device doesn't work for you, don't use
it.
I don't mind an occasional quick answer that's badly formatted, but for
other things it's really annoying.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Josh Berkus
josh@agliodbs.com
In reply to: Peter Geoghegan (#1)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On 04/02/2014 01:16 PM, Andres Freund wrote:

On 2014-04-02 21:08:47 +0100, Greg Stark wrote:

Normally I'm not for adding gucs that just gate new features. But I think a
simple guc to turn this on or off would be fine and alleviate any concerns.
I think users would appreciate it quite a lot

I don't have strong feelings about the feature, but introducing a guc
for it feels entirely ridiculous to me. This is a minor detail in an
error message, not more.

It would even have a positive effect of helping raise awareness of the
feature. I often scan the list of config options to get an idea of new
features when I'm installing new software or upgrading.

Really? Should we now add GUCs for every feature then?

-1 for having a GUC for this.

+1 on the feature.

Review with functional test coming up.

Question: How should we handle the issues with East Asian languages
(i.e. Japanese, Chinese) and this Hint? Should we just avoid hinting
for a selected list of languages which don't work well with levenshtein?
If so, how do we get that list?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Josh Berkus (#18)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On Mon, Jun 16, 2014 at 4:04 PM, Josh Berkus <josh@agliodbs.com> wrote:

Question: How should we handle the issues with East Asian languages
(i.e. Japanese, Chinese) and this Hint? Should we just avoid hinting
for a selected list of languages which don't work well with levenshtein?
If so, how do we get that list?

I think that how useful Levenshtein distance is for users based in
east Asia generally, and how useful this patch is to those users are
two distinct questions. I have no idea how common it is for Japanese
users to just use Roman characters as table and attribute names. Since
they're very probably already writing application code that uses Roman
characters (except in the comments, user strings and so on), it might
make sense to do the same in the database. I would welcome further
input on that question. I don't know what the trends are in the real
world.

Also note that the patch scans the range table parse state to pick the
most probable candidate among all Vars/columns that already appear
there. The query would raise an error at an earlier point if a
non-existent relation was referenced, for example. We're only choosing
from a minimal list of possibilities, and pick one that is very
probably what was intended. Even if Levenshtein distance works badly
with Kanji (which is not obviously the case, at least to me), it might
not matter here.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Ian Lawrence Barwick
barwick@gmail.com
In reply to: Peter Geoghegan (#19)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()

On 14/06/17 8:31, Peter Geoghegan wrote:

On Mon, Jun 16, 2014 at 4:04 PM, Josh Berkus <josh@agliodbs.com> wrote:

Question: How should we handle the issues with East Asian languages
(i.e. Japanese, Chinese) and this Hint? Should we just avoid hinting
for a selected list of languages which don't work well with levenshtein?
If so, how do we get that list?

I think that how useful Levenshtein distance is for users based in
east Asia generally, and how useful this patch is to those users are
two distinct questions. I have no idea how common it is for Japanese
users to just use Roman characters as table and attribute names. Since
they're very probably already writing application code that uses Roman
characters (except in the comments, user strings and so on), it might
make sense to do the same in the database. I would welcome further
input on that question. I don't know what the trends are in the real
world.

From what I've seen in the wild in Japan, Roman/ASCII characters are
widely used for object/attribute names, as generally it's much less
hassle than switching between input methods, dealing with different
encodings etc. The only place where I've seen Japanese characters widely
used is in tutorials, examples etc. However that's only my personal
observation for one particular non-Roman language.

Regards

Ian Barwick

--
Ian Barwick http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Michael Paquier
michael@paquier.xyz
In reply to: Ian Lawrence Barwick (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#21)
#23Ian Lawrence Barwick
barwick@gmail.com
In reply to: Tom Lane (#22)
In reply to: Ian Lawrence Barwick (#23)
#25Ian Lawrence Barwick
barwick@gmail.com
In reply to: Peter Geoghegan (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#24)
In reply to: Ian Lawrence Barwick (#25)
In reply to: Tom Lane (#26)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#29)
#31Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#30)
#32Josh Berkus
josh@agliodbs.com
In reply to: Peter Geoghegan (#1)
In reply to: Tom Lane (#30)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#32)
#35Josh Berkus
josh@agliodbs.com
In reply to: Peter Geoghegan (#1)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#33)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#35)
#38Josh Berkus
josh@agliodbs.com
In reply to: Peter Geoghegan (#1)
In reply to: Tom Lane (#37)
#40Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Peter Geoghegan (#39)
#41Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#34)
In reply to: Robert Haas (#41)
#43Abhijit Menon-Sen
ams@2ndQuadrant.com
In reply to: Peter Geoghegan (#42)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Abhijit Menon-Sen (#43)
In reply to: Tom Lane (#44)
#46Abhijit Menon-Sen
ams@2ndQuadrant.com
In reply to: Peter Geoghegan (#45)
#47Michael Paquier
michael@paquier.xyz
In reply to: Abhijit Menon-Sen (#46)
In reply to: Michael Paquier (#47)
#49Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Geoghegan (#48)
In reply to: Alvaro Herrera (#49)
In reply to: Peter Geoghegan (#50)
#52Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#51)
#53Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#48)
In reply to: Michael Paquier (#52)
In reply to: Michael Paquier (#53)
#56Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Geoghegan (#55)
#57Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#52)
#58Bruce Momjian
bruce@momjian.us
In reply to: Peter Geoghegan (#54)
In reply to: Bruce Momjian (#58)
In reply to: Robert Haas (#57)
#61Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Geoghegan (#59)
In reply to: Alvaro Herrera (#61)
In reply to: Michael Paquier (#47)
#64Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#55)
In reply to: Michael Paquier (#64)
#66Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#65)
#67Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#64)
#68Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#67)
In reply to: Robert Haas (#67)
#70Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Geoghegan (#69)
In reply to: Alvaro Herrera (#70)
#72Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#68)
#73Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Geoghegan (#71)
#74Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Michael Paquier (#66)
#75Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#74)
In reply to: Robert Haas (#75)
#77Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#76)
In reply to: Michael Paquier (#77)
#79Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#76)
In reply to: Michael Paquier (#79)
In reply to: Peter Geoghegan (#80)
#82Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#76)
In reply to: Robert Haas (#82)
In reply to: Peter Geoghegan (#83)
In reply to: Peter Geoghegan (#84)
#86Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#81)
#87Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#86)
In reply to: Robert Haas (#87)
#89Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#85)
In reply to: Robert Haas (#89)
In reply to: Robert Haas (#82)
#92Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#91)
In reply to: Robert Haas (#92)
#94Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#93)
In reply to: Robert Haas (#94)
#96Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#95)
#97Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#96)
In reply to: Robert Haas (#97)
#99Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#98)
In reply to: Robert Haas (#99)
In reply to: Peter Geoghegan (#100)
In reply to: Peter Geoghegan (#101)
In reply to: Peter Geoghegan (#101)
#104Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#100)
In reply to: Robert Haas (#104)
In reply to: Robert Haas (#104)
#107Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#104)
In reply to: Alvaro Herrera (#107)
#109Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Geoghegan (#108)
In reply to: Alvaro Herrera (#109)
#111Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Geoghegan (#110)
#112Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#106)
#113Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#112)
In reply to: Tom Lane (#113)
In reply to: Robert Haas (#112)
#116Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#115)
In reply to: Robert Haas (#116)
#118Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#117)
#119Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#117)
In reply to: Robert Haas (#118)
#121Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#117)
In reply to: Tom Lane (#121)
#123Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#120)
In reply to: Robert Haas (#123)
#125David G. Johnston
david.g.johnston@gmail.com
In reply to: Andres Freund (#119)
In reply to: Peter Geoghegan (#124)
In reply to: Peter Geoghegan (#126)
#128Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#126)
In reply to: Robert Haas (#128)
#130Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#129)
In reply to: Robert Haas (#130)
#132Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#131)
In reply to: Robert Haas (#132)
In reply to: Peter Geoghegan (#133)
#135Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#134)
#136Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#135)
#137Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#136)
#138Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#137)
In reply to: Stephen Frost (#138)
In reply to: Peter Geoghegan (#139)
#141Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#140)
In reply to: Robert Haas (#141)
In reply to: Peter Geoghegan (#142)
#144Michael Paquier
michael@paquier.xyz
In reply to: Peter Geoghegan (#143)
#145Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#142)
In reply to: Robert Haas (#145)
#147Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#146)