Doing better at HINTing an appropriate column within errorMissingColumn()
With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.
I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name. psql tab
completion is good, but not so good that this doesn't happen all the
time. It's good practice to consistently name columns and tables such
that it's possible to intuit the names of columns from the names of
tables and so on, but it's still pretty common to forget if a column
name from the table "orders" is "order_id", "orderid", or "ordersid",
particularly if you're someone who regularly interacts with many
databases. This problem is annoying in a low intensity kind of way.
Consider the following sample sessions of mine, made with the
dellstore2 sample database:
[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989
[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
orderid | orderdate | customerid | netamount | tax | totalamount |
orderlineid | orderid | prod_id | quantity | orderdate
---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------
1 | 2004-01-27 | 7888 | 313.24 | 25.84 | 339.08 |
1 | 1 | 9117 | 1 | 2004-01-27
(1 row)
[local]/postgres=# select ordersid from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
ERROR: 42703: column "ordersid" does not exist
LINE 1: select ordersid from orders o join orderlines ol on o.orderi...
^
HINT: Perhaps you meant to reference the column "o"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2999
[local]/postgres=# select ol.ordersid from orders o join orderlines ol
on o.orderid = ol.orderid limit 1;
ERROR: 42703: column ol.ordersid does not exist
LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989
We try to give the most useful possible HINT here, charging extra for
a non-matching alias, and going through the range table in order and
preferring the first column observed to any subsequent column whose
name is of the same distance as an earlier Var. The fuzzy string
matching works well enough that it seems possible in practice to
successfully have the parser make the right suggestion, even when the
user's original guess was fairly far off. I've found it works best to
charge half as much for a character deletion, so that's what is
charged.
I have some outstanding concerns about the proposed patch:
* It may be the case that dense logosyllabic or morphographic writing
systems, for example Kanji might consistently present, say, Japanese
users with a suggestion that just isn't very useful, to the point of
being annoying. Perhaps some Japanese hackers can comment on the
actual risks here.
* Perhaps I should have moved the Levenshtein distance functions into
core and be done with it. I thought that given the present restriction
that the implementation imposes on source and target string lengths,
it would be best to leave the user-facing SQL functions in contrib.
That restriction is not relevant to the internal use of Levenshtein
distance added here, though.
Thoughts?
--
Peter Geoghegan
Attachments:
Hello
I see only one risk - it can do some slowdown of exception processing.
Sometimes you can have a code like
BEGIN
WHILE ..
LOOP
BEGIN
INSERT INTO ...
EXCEPTION WHEN ..
; /* ignore this error */
END;
END LOOP;
without this risks, proposed feature is nice, but should be fast
Regards
Pavel
2014-03-27 20:10 GMT+01:00 Peter Geoghegan <pg@heroku.com>:
Show quoted text
With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name. psql tab
completion is good, but not so good that this doesn't happen all the
time. It's good practice to consistently name columns and tables such
that it's possible to intuit the names of columns from the names of
tables and so on, but it's still pretty common to forget if a column
name from the table "orders" is "order_id", "orderid", or "ordersid",
particularly if you're someone who regularly interacts with many
databases. This problem is annoying in a low intensity kind of way.Consider the following sample sessions of mine, made with the
dellstore2 sample database:[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989
[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
orderid | orderdate | customerid | netamount | tax | totalamount |
orderlineid | orderid | prod_id | quantity | orderdate---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------
1 | 2004-01-27 | 7888 | 313.24 | 25.84 | 339.08 |
1 | 1 | 9117 | 1 | 2004-01-27
(1 row)[local]/postgres=# select ordersid from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
ERROR: 42703: column "ordersid" does not exist
LINE 1: select ordersid from orders o join orderlines ol on o.orderi...
^
HINT: Perhaps you meant to reference the column "o"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2999
[local]/postgres=# select ol.ordersid from orders o join orderlines ol
on o.orderid = ol.orderid limit 1;
ERROR: 42703: column ol.ordersid does not exist
LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989We try to give the most useful possible HINT here, charging extra for
a non-matching alias, and going through the range table in order and
preferring the first column observed to any subsequent column whose
name is of the same distance as an earlier Var. The fuzzy string
matching works well enough that it seems possible in practice to
successfully have the parser make the right suggestion, even when the
user's original guess was fairly far off. I've found it works best to
charge half as much for a character deletion, so that's what is
charged.I have some outstanding concerns about the proposed patch:
* It may be the case that dense logosyllabic or morphographic writing
systems, for example Kanji might consistently present, say, Japanese
users with a suggestion that just isn't very useful, to the point of
being annoying. Perhaps some Japanese hackers can comment on the
actual risks here.* Perhaps I should have moved the Levenshtein distance functions into
core and be done with it. I thought that given the present restriction
that the implementation imposes on source and target string lengths,
it would be best to leave the user-facing SQL functions in contrib.
That restriction is not relevant to the internal use of Levenshtein
distance added here, though.Thoughts?
--
Peter Geoghegan--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Mar 28, 2014 at 1:00 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I see only one risk - it can do some slowdown of exception processing.
I think it's unlikely that you'd see ERRCODE_UNDEFINED_COLUMN in
procedural code like that in practice. In any case it's worth noting
that I continually pass back a "max" to the Levenshtein distance
implementation, which is the current shortest distance observed. The
implementation is therefore not obliged to exhaustively find a
distance that is already known to be of no use. See commit 604ab0.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2014-03-28 9:22 GMT+01:00 Peter Geoghegan <pg@heroku.com>:
On Fri, Mar 28, 2014 at 1:00 AM, Pavel Stehule <pavel.stehule@gmail.com>
wrote:I see only one risk - it can do some slowdown of exception processing.
I think it's unlikely that you'd see ERRCODE_UNDEFINED_COLUMN in
procedural code like that in practice. In any case it's worth noting
that I continually pass back a "max" to the Levenshtein distance
implementation, which is the current shortest distance observed. The
implementation is therefore not obliged to exhaustively find a
distance that is already known to be of no use. See commit 604ab0.
if it is related to ERRCODE_UNDEFINED_COLUMN then it should be ok (from
performance perspective)
but second issue can be usage from plpgsql - where is mix SQL identifiers
and plpgsql variables.
Pavel
Show quoted text
--
Peter Geoghegan
Very interesting idea, I'd think about optionally add similarity hinting
support to psql tab. With, say, 80% of similarity matching, it
shouldn't be very annoying. For interactive usage there is no risk of
slowdown.
On Mar 27, 2014 11:11 PM, "Peter Geoghegan" <pg@heroku.com> wrote:
Show quoted text
With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name. psql tab
completion is good, but not so good that this doesn't happen all the
time. It's good practice to consistently name columns and tables such
that it's possible to intuit the names of columns from the names of
tables and so on, but it's still pretty common to forget if a column
name from the table "orders" is "order_id", "orderid", or "ordersid",
particularly if you're someone who regularly interacts with many
databases. This problem is annoying in a low intensity kind of way.Consider the following sample sessions of mine, made with the
dellstore2 sample database:[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989
[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
orderid | orderdate | customerid | netamount | tax | totalamount |
orderlineid | orderid | prod_id | quantity | orderdate---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------
1 | 2004-01-27 | 7888 | 313.24 | 25.84 | 339.08 |
1 | 1 | 9117 | 1 | 2004-01-27
(1 row)[local]/postgres=# select ordersid from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
ERROR: 42703: column "ordersid" does not exist
LINE 1: select ordersid from orders o join orderlines ol on o.orderi...
^
HINT: Perhaps you meant to reference the column "o"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2999
[local]/postgres=# select ol.ordersid from orders o join orderlines ol
on o.orderid = ol.orderid limit 1;
ERROR: 42703: column ol.ordersid does not exist
LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989We try to give the most useful possible HINT here, charging extra for
a non-matching alias, and going through the range table in order and
preferring the first column observed to any subsequent column whose
name is of the same distance as an earlier Var. The fuzzy string
matching works well enough that it seems possible in practice to
successfully have the parser make the right suggestion, even when the
user's original guess was fairly far off. I've found it works best to
charge half as much for a character deletion, so that's what is
charged.I have some outstanding concerns about the proposed patch:
* It may be the case that dense logosyllabic or morphographic writing
systems, for example Kanji might consistently present, say, Japanese
users with a suggestion that just isn't very useful, to the point of
being annoying. Perhaps some Japanese hackers can comment on the
actual risks here.* Perhaps I should have moved the Levenshtein distance functions into
core and be done with it. I thought that given the present restriction
that the implementation imposes on source and target string lengths,
it would be best to leave the user-facing SQL functions in contrib.
That restriction is not relevant to the internal use of Levenshtein
distance added here, though.Thoughts?
--
Peter Geoghegan--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Peter Geoghegan wrote:
With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name.
[local]/postgres=# select * from orders o join orderlines ol on o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
This sounds like a mild version of DWIM:
http://www.jargondb.org/glossary/dwim
Maybe it is just me, but I get uncomfortable when a program tries
to second-guess what I really want.
Yours,
Laurenz Albe
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: Albe Laurenz 2014-03-28 <A737B7A37273E048B164557ADEF4A58B17CE8DEA@ntex2010i.host.magwien.gv.at>
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".This sounds like a mild version of DWIM:
http://www.jargondb.org/glossary/dwimMaybe it is just me, but I get uncomfortable when a program tries
to second-guess what I really want.
I find it very annoying when zsh asks me "did you mean foo [y/n]" and
I need to confirm that, but I'd find a mere HINT that I can easily
ignore a very useful feature. +1 for the idea.
Christoph
--
cb@df7cb.de | http://www.df7cb.de/
On Fri, Mar 28, 2014 at 4:10 AM, Peter Geoghegan <pg@heroku.com> wrote:
With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name. psql tab
completion is good, but not so good that this doesn't happen all the
time. It's good practice to consistently name columns and tables such
that it's possible to intuit the names of columns from the names of
tables and so on, but it's still pretty common to forget if a column
name from the table "orders" is "order_id", "orderid", or "ordersid",
particularly if you're someone who regularly interacts with many
databases. This problem is annoying in a low intensity kind of way.Consider the following sample sessions of mine, made with the
dellstore2 sample database:[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989
[local]/postgres=# select * from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
orderid | orderdate | customerid | netamount | tax | totalamount |
orderlineid | orderid | prod_id | quantity | orderdate
---------+------------+------------+-----------+-------+-------------+-------------+---------+---------+----------+------------
1 | 2004-01-27 | 7888 | 313.24 | 25.84 | 339.08 |
1 | 1 | 9117 | 1 | 2004-01-27
(1 row)[local]/postgres=# select ordersid from orders o join orderlines ol on
o.orderid = ol.orderid limit 1;
ERROR: 42703: column "ordersid" does not exist
LINE 1: select ordersid from orders o join orderlines ol on o.orderi...
^
HINT: Perhaps you meant to reference the column "o"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2999
[local]/postgres=# select ol.ordersid from orders o join orderlines ol
on o.orderid = ol.orderid limit 1;
ERROR: 42703: column ol.ordersid does not exist
LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".
LOCATION: errorMissingColumn, parse_relation.c:2989We try to give the most useful possible HINT here, charging extra for
a non-matching alias, and going through the range table in order and
preferring the first column observed to any subsequent column whose
name is of the same distance as an earlier Var. The fuzzy string
matching works well enough that it seems possible in practice to
successfully have the parser make the right suggestion, even when the
user's original guess was fairly far off. I've found it works best to
charge half as much for a character deletion, so that's what is
charged.
What about the overhead that this processing creates if error
processing needs to scan a schema with let's say hundreds of tables?
* It may be the case that dense logosyllabic or morphographic writing
systems, for example Kanji might consistently present, say, Japanese
users with a suggestion that just isn't very useful, to the point of
being annoying. Perhaps some Japanese hackers can comment on the
actual risks here.
As long as Hiragana-only words (basic alphabet for Japanese words),
and more particularly Katakana only-words (to write phonetically
foreign words) are compared (even Kanji-only things compared),
Levenstein could play its role pretty well. But once a comparison is
made with two words using different alphabet, well Levenstein is not
going to work well. A simple example is 'ramen' (Japanese noodles),
that you can find written sometimes in Hiragana, or even in Katakana,
and here Levenstein performs poorly:
=# select levenshtein('ラーメン', 'らあめん');
levenshtein
-------------
4
(1 row)
Regards,
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Mar 28, 2014 at 5:57 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
What about the overhead that this processing creates if error
processing needs to scan a schema with let's say hundreds of tables?
It doesn't work that way. I've extended searchRangeTableForCol() so
that when it calls scanRTEForColumn(), it considers Levenshtein
distance, and not just plain string equality, which is what happens
today. The code only looks through ParseState.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Mar 28, 2014 at 4:47 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Peter Geoghegan wrote:
With the addition of LATERAL subqueries, Tom fixed up the mechanism
for keeping track of which relations are visible for column references
while the FROM clause is being scanned. That allowed
errorMissingColumn() to give a more useful error to the one produced
by the prior coding of that mechanism, with an errhint sometimes
proffering: 'There is a column named "foo" in table "bar", but it
cannot be referenced from this part of the query'.I wondered how much further this could be taken. Attached patch
modifies contrib/fuzzystrmatch, moving its Levenshtein distance code
into core without actually moving the relevant SQL functions too. That
change allowed me to modify errorMissingColumn() to make more useful
suggestions as to what might have been intended under other
circumstances, like when someone fat-fingers a column name.[local]/postgres=# select * from orders o join orderlines ol on o.orderid = ol.orderids limit 1;
ERROR: 42703: column ol.orderids does not exist
LINE 1: ...* from orders o join orderlines ol on o.orderid = ol.orderid...
^
HINT: Perhaps you meant to reference the column "ol"."orderid".This sounds like a mild version of DWIM:
http://www.jargondb.org/glossary/dwimMaybe it is just me, but I get uncomfortable when a program tries
to second-guess what I really want.
It's not really DWIM, because the backend is still throwing an error.
It's just trying to help you sort out the error, along the way.
Still, I share some of your discomfort. I see Peter's patch as an
example of a broader class of things that we could do - but I'm not
altogether sure that we want to do them. There's a risk of adding not
only CPU cycles but also clutter. If we do things that encourage
people to crank the log verbosity down, I think that's going to be bad
more often than it's good. It strains credulity to think that this
patch alone would have that effect, but there might be quite a few
similar improvements that are possible. So I think it would be good
to consider how far we want to go in this direction and where we think
we might want to stop. That's not to say, let's not ever do this,
just, let's think carefully about where we want to end up.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 1, 2014 at 7:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
There's a risk of adding not
only CPU cycles but also clutter. If we do things that encourage
people to crank the log verbosity down, I think that's going to be bad
more often than it's good.
While I share your concern here, I think that this is something that
is only likely to be seen in an interactive psql session, where it is
seen quite frequently. I am reasonably confident that it's highly
unusual to see ERRCODE_UNDEFINED_COLUMN in other settings. Not having
to do a mental context switch when writing an ad-hoc query has
considerable value. Even C compilers like Clang have this kind of
feedback. This is a patch that was written out of personal
frustration with the experience of interacting with many different
databases. Things like the Python REPL don't do so much of this kind
of thing, but presumably that's because of Python's dynamic typing.
This is a HINT that can be given with fairly high confidence that
it'll be helpful - there just won't be that many things that the user
could have meant to choose from. I think it's even useful when the
suggested column is distant from the original suggestion (i.e.
errorMissingColumn() offers only what is clearly a "wild guess"),
because then the user knows that he or she has got it quite wrong.
Frequently, this will be because the wrong synonym for what should
have been written was used.
It strains credulity to think that this
patch alone would have that effect, but there might be quite a few
similar improvements that are possible. So I think it would be good
to consider how far we want to go in this direction and where we think
we might want to stop. That's not to say, let's not ever do this,
just, let's think carefully about where we want to end up.
Fair enough.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4/1/14, 1:04 PM, Peter Geoghegan wrote:
It strains credulity to think that this
patch alone would have that effect, but there might be quite a few
similar improvements that are possible. So I think it would be good
to consider how far we want to go in this direction and where we think
we might want to stop. That's not to say, let's not ever do this,
just, let's think carefully about where we want to end up.Fair enough.
I agree with the concern, but also have to say that I can't count how many times I could have used this. A big +1, at least in this case.
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Normally I'm not for adding gucs that just gate new features. But I think a
simple guc to turn this on or off would be fine and alleviate any concerns.
I think users would appreciate it quite a lot
It would even have a positive effect of helping raise awareness of the
feature. I often scan the list of config options to get an idea of new
features when I'm installing new software or upgrading.
--
greg
On 1 Apr 2014 17:38, "Jim Nasby" <jim@nasby.net> wrote:
Show quoted text
On 4/1/14, 1:04 PM, Peter Geoghegan wrote:
It strains credulity to think that this
patch alone would have that effect, but there might be quite a few
similar improvements that are possible. So I think it would be good
to consider how far we want to go in this direction and where we think
we might want to stop. That's not to say, let's not ever do this,
just, let's think carefully about where we want to end up.Fair enough.
I agree with the concern, but also have to say that I can't count how many
times I could have used this. A big +1, at least in this case.
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2014-04-02 21:08:47 +0100, Greg Stark wrote:
Normally I'm not for adding gucs that just gate new features. But I think a
simple guc to turn this on or off would be fine and alleviate any concerns.
I think users would appreciate it quite a lot
I don't have strong feelings about the feature, but introducing a guc
for it feels entirely ridiculous to me. This is a minor detail in an
error message, not more.
It would even have a positive effect of helping raise awareness of the
feature. I often scan the list of config options to get an idea of new
features when I'm installing new software or upgrading.
Really? Should we now add GUCs for every feature then?
Greetings,
Andres Freund
PS: Could you please start to properly quote again? You seem to have
stopped doing that entirely in the last few months.
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Apr 2, 2014 at 4:16 PM, Andres Freund <andres@2ndquadrant.com> wrote:
I don't have strong feelings about the feature, but introducing a guc
for it feels entirely ridiculous to me. This is a minor detail in an
error message, not more.
I agree. It's just a HINT. It's quite helpful in certain particular
contexts, but in the grand scheme of things isn't all that important.
I am being quite conservative in trying to anticipate cases where on
balance it'll actually hurt more than it will help. I doubt that there
actually are any.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Apr 2, 2014 at 4:16 PM, Andres Freund <andres@2ndquadrant.com>wrote:
PS: Could you please start to properly quote again? You seem to have
stopped doing that entirely in the last few months.
I've been responding a lot from the phone. Unfortunately the Gmail client
on the phone makes it nearly impossible to format messages well. I'm
beginning to think it would be better to just not quote at all any more.
I'm normally not doing a point-by-point response anyways.
--
greg
On 2014-04-03 00:48:12 -0400, Greg Stark wrote:
On Wed, Apr 2, 2014 at 4:16 PM, Andres Freund <andres@2ndquadrant.com>wrote:
PS: Could you please start to properly quote again? You seem to have
stopped doing that entirely in the last few months.I've been responding a lot from the phone. Unfortunately the Gmail client
on the phone makes it nearly impossible to format messages well. I'm
beginning to think it would be better to just not quote at all any more.
I'm normally not doing a point-by-point response anyways.
I really don't care where you're answering from TBH. It's unreadable,
misses context and that's it. If $device doesn't work for you, don't use
it.
I don't mind an occasional quick answer that's badly formatted, but for
other things it's really annoying.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 04/02/2014 01:16 PM, Andres Freund wrote:
On 2014-04-02 21:08:47 +0100, Greg Stark wrote:
Normally I'm not for adding gucs that just gate new features. But I think a
simple guc to turn this on or off would be fine and alleviate any concerns.
I think users would appreciate it quite a lotI don't have strong feelings about the feature, but introducing a guc
for it feels entirely ridiculous to me. This is a minor detail in an
error message, not more.It would even have a positive effect of helping raise awareness of the
feature. I often scan the list of config options to get an idea of new
features when I'm installing new software or upgrading.Really? Should we now add GUCs for every feature then?
-1 for having a GUC for this.
+1 on the feature.
Review with functional test coming up.
Question: How should we handle the issues with East Asian languages
(i.e. Japanese, Chinese) and this Hint? Should we just avoid hinting
for a selected list of languages which don't work well with levenshtein?
If so, how do we get that list?
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Import Notes
Reply to msg id not found: WMfa5058a0cf78e105e4943455e049a6e231537de5b73a550f4e6fbb7aa0182ba1833f03f49d84ffc0ab7bb7c5a90e7f19@asav-1.01.com
On Mon, Jun 16, 2014 at 4:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
Question: How should we handle the issues with East Asian languages
(i.e. Japanese, Chinese) and this Hint? Should we just avoid hinting
for a selected list of languages which don't work well with levenshtein?
If so, how do we get that list?
I think that how useful Levenshtein distance is for users based in
east Asia generally, and how useful this patch is to those users are
two distinct questions. I have no idea how common it is for Japanese
users to just use Roman characters as table and attribute names. Since
they're very probably already writing application code that uses Roman
characters (except in the comments, user strings and so on), it might
make sense to do the same in the database. I would welcome further
input on that question. I don't know what the trends are in the real
world.
Also note that the patch scans the range table parse state to pick the
most probable candidate among all Vars/columns that already appear
there. The query would raise an error at an earlier point if a
non-existent relation was referenced, for example. We're only choosing
from a minimal list of possibilities, and pick one that is very
probably what was intended. Even if Levenshtein distance works badly
with Kanji (which is not obviously the case, at least to me), it might
not matter here.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 14/06/17 8:31, Peter Geoghegan wrote:
On Mon, Jun 16, 2014 at 4:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
Question: How should we handle the issues with East Asian languages
(i.e. Japanese, Chinese) and this Hint? Should we just avoid hinting
for a selected list of languages which don't work well with levenshtein?
If so, how do we get that list?I think that how useful Levenshtein distance is for users based in
east Asia generally, and how useful this patch is to those users are
two distinct questions. I have no idea how common it is for Japanese
users to just use Roman characters as table and attribute names. Since
they're very probably already writing application code that uses Roman
characters (except in the comments, user strings and so on), it might
make sense to do the same in the database. I would welcome further
input on that question. I don't know what the trends are in the real
world.
From what I've seen in the wild in Japan, Roman/ASCII characters are
widely used for object/attribute names, as generally it's much less
hassle than switching between input methods, dealing with different
encodings etc. The only place where I've seen Japanese characters widely
used is in tutorials, examples etc. However that's only my personal
observation for one particular non-Roman language.
Regards
Ian Barwick
--
Ian Barwick http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers