[ltree] Should `SELECT LCA('1.2', '1.2.3');` return '1.2' instead of '1'?

Started by Julien Grillotover 8 years ago4 messagesbugs
Jump to latest
#1Julien Grillot
julien.grillot@gmail.com

Operating system: Ubuntu 17.04
PostgreSQL version: 9.6.6

At the moment, both SELECT LCA('1.2', '1.2'); and SELECT LCA('1.2',
'1.2.3'); return ‘1’.

According to the Wikipedia1 definition of LCA:
“[…] the lowest common ancestor (LCA) of two nodes v and w […] is the
lowest (i.e. deepest) node that has both v and w as descendants, where we
define each node to be a descendant of itself (so if v has a direct
connection from w, w is the lowest common ancestor).”

So, in my understanding, both SELECT LCA('1.2', '1.2'); and SELECT
LCA('1.2', '1.2.3'); should return ‘1.2’. What do you think about it?

By the way, the ltree structure is really great, thank you!

Julien Grillot

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Julien Grillot (#1)
Re: [ltree] Should `SELECT LCA('1.2', '1.2.3');` return '1.2' instead of '1'?

On Fri, Dec 1, 2017 at 8:32 AM, Julien Grillot <julien.grillot@gmail.com>
wrote:

At the moment, both SELECT LCA('1.2', '1.2'); and SELECT LCA('1.2',
'1.2.3'); return ‘1’.

According to the Wikipedia1 definition of LCA:
“[…] the lowest common ancestor (LCA) of two nodes v and w […] is the
lowest (i.e. deepest) node that has both v and w as descendants, where we
define each node to be a descendant of itself (so if v has a direct
connection from w, w is the lowest common ancestor).”

So, in my understanding, both SELECT LCA('1.2', '1.2'); and SELECT
LCA('1.2', '1.2.3'); should return ‘1.2’. What do you think about it?

​If indeed one defines "each node to be a descendant of itself" then you
are correct. Unfortunately your example indicates that this implementation
does not define ancestry that way. While you may not agree with the
definition at this point changing the existing function's behavior in this
manner seems highly undesirable - too much existing code would silently
break. Adding a leading (due to varargs) boolean argument that specifies
"end node is ancestor of itself" behavior might be a workable solution.

Regardless, the docs should be clarified on this point, though, and maybe
add a code comment somewhere.

David J.

#3Julien Grillot
julien.grillot@gmail.com
In reply to: David G. Johnston (#2)
Re: [ltree] Should `SELECT LCA('1.2', '1.2.3');` return '1.2' instead of '1'?

2017-12-01 17:17 GMT+01:00 David G. Johnston <david.g.johnston@gmail.com>:

​[…] Adding a leading (due to varargs) boolean argument that specifies
"end node is ancestor of itself" behavior might be a workable solution.

Regardless, the docs should be clarified on this point, though, and maybe
add a code comment somewhere.

I understand the importance not to break existing usages. It would be
perfect to have a boolean to specify that – and to clarify the docs, since
the example given does not reveal this behavior
https://www.postgresql.org/docs/9.1/static/ltree.html#LTREE-FUNC-TABLE

I digged inside the source code and ended at
`contrib/ltree/ltree_op.c:406` but I don't feel able to do the patch.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#2)
Re: [ltree] Should `SELECT LCA('1.2', '1.2.3');` return '1.2' instead of '1'?

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Fri, Dec 1, 2017 at 8:32 AM, Julien Grillot <julien.grillot@gmail.com>
wrote:

So, in my understanding, both SELECT LCA('1.2', '1.2'); and SELECT
LCA('1.2', '1.2.3'); should return ‘1.2’. What do you think about it?

If indeed one defines "each node to be a descendant of itself" then you
are correct. Unfortunately your example indicates that this implementation
does not define ancestry that way. While you may not agree with the
definition at this point changing the existing function's behavior in this
manner seems highly undesirable - too much existing code would silently
break. Adding a leading (due to varargs) boolean argument that specifies
"end node is ancestor of itself" behavior might be a workable solution.

Regardless, the docs should be clarified on this point, though, and maybe
add a code comment somewhere.

This came up again today; I've now adjusted the docs to reflect reality.

If we were to do something about the lack of a function that behaves
the other way, I'd suggest calling it lcp() for "longest common prefix".
Trying to deal with it with an optional argument is going to create
overloading problems.

regards, tom lane