BUG #15971: Behaviour of SUBSTR function depending on its arguments

Started by PG Bug reporting formover 6 years ago6 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15971
Logged by: VIMAL VICTOR B
Email address: bvimalvictor@gmail.com
PostgreSQL version: 10.4
Operating system: Linux
Description:

Hi,

select substr('abcdef',1,2) val1, substr('abcdef',0,2) val2,
substr('abcdef',-1,2) val3;

The above query in Postgres returns expected result for val1 ('ab') but for
val2 and val3, it returns 'a' and '' respectively. Oracle and most of other
RDBMS systems would return 'ab' for val1 and val2. When the start position
is -ve, then the start position will be considered from end of input string.
The start position can be considered either as 0 or 1 and substring from
that position can be returned by considering the max length of the input
string, which is logically correct. But when -ve value given for start
position and returning empty string '', which does not seem logically
correct. Kindly consider this case and let me know your thoughts.

Regards,
Vimal

#2Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments

I don't think it's a bug. I think it's just that the SUBSTR in
PostgreSQL is more SQL compliant than any other databases that you
mentioned. As per the SQL standard, if the start position is zero or
negative number, it should be adjusted to the start of the string and
not to the end of the string and that way I feel the behaviour of
SUBSTR in PostgreSQL is more SQL compliant than other databases.
That's my opinion though, let's see what others have to say on it.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Wed, Aug 21, 2019 at 12:33 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

Show quoted text

The following bug has been logged on the website:

Bug reference: 15971
Logged by: VIMAL VICTOR B
Email address: bvimalvictor@gmail.com
PostgreSQL version: 10.4
Operating system: Linux
Description:

Hi,

select substr('abcdef',1,2) val1, substr('abcdef',0,2) val2,
substr('abcdef',-1,2) val3;

The above query in Postgres returns expected result for val1 ('ab') but for
val2 and val3, it returns 'a' and '' respectively. Oracle and most of other
RDBMS systems would return 'ab' for val1 and val2. When the start position
is -ve, then the start position will be considered from end of input string.
The start position can be considered either as 0 or 1 and substring from
that position can be returned by considering the max length of the input
string, which is logically correct. But when -ve value given for start
position and returning empty string '', which does not seem logically
correct. Kindly consider this case and let me know your thoughts.

Regards,
Vimal

#3Thomas Kellerer
spam_eater@gmx.net
In reply to: PG Bug reporting form (#1)
Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments

PG Bug reporting form schrieb am 21.08.2019 um 09:02:

select substr('abcdef',1,2) val1, substr('abcdef',0,2) val2,
substr('abcdef',-1,2) val3;

The above query in Postgres returns expected result for val1 ('ab') but for
val2 and val3, it returns 'a' and '' respectively. Oracle and most of other
RDBMS systems would return 'ab' for val1 and val2.

The only "other" DBMS that behaves the way you describe it, is Oracle.

When I ran it on different systems, this is the result:

| val1 | val2 | val3 |
-----------|------|---------|---------|
Postgres | ab | a | <empty> |
SQL Server | ab | a | <empty> |
SQLite | ab | a | f |
MySQL | ab | <empty> | f |
Oracle | ab | ab | f |

DB2 and Firebird do not allow a starting position smaller than 1

So SQL Server works the same as Postgres and Oracle is the only one that returns 'ab' for the second case

So clearly the claim "most other" is wrong here.

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments

st 21. 8. 2019 v 9:03 odesílatel PG Bug reporting form <
noreply@postgresql.org> napsal:

The following bug has been logged on the website:

Bug reference: 15971
Logged by: VIMAL VICTOR B
Email address: bvimalvictor@gmail.com
PostgreSQL version: 10.4
Operating system: Linux
Description:

Hi,

select substr('abcdef',1,2) val1, substr('abcdef',0,2) val2,
substr('abcdef',-1,2) val3;

The above query in Postgres returns expected result for val1 ('ab') but for
val2 and val3, it returns 'a' and '' respectively. Oracle and most of other
RDBMS systems would return 'ab' for val1 and val2. When the start position
is -ve, then the start position will be considered from end of input
string.
The start position can be considered either as 0 or 1 and substring from
that position can be returned by considering the max length of the input
string, which is logically correct. But when -ve value given for start
position and returning empty string '', which does not seem logically
correct. Kindly consider this case and let me know your thoughts.

Orafce support substr function with Oracle behave

https://github.com/orafce/orafce

Regards

Pavel

Show quoted text

Regards,
Vimal

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Sharma (#2)
Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments

Ashutosh Sharma <ashu.coek88@gmail.com> writes:

I don't think it's a bug. I think it's just that the SUBSTR in
PostgreSQL is more SQL compliant than any other databases that you
mentioned. As per the SQL standard, if the start position is zero or
negative number, it should be adjusted to the start of the string and
not to the end of the string and that way I feel the behaviour of
SUBSTR in PostgreSQL is more SQL compliant than other databases.

Yes, the standard provides no wiggle room here. The behavior of
substring() with integer parameters, as specified in SQL:2008
6.29 <string value function> general rule 3, is

3) If <character substring function> is specified, then:

a) If the character encoding form of <character value expression> is
UTF8, UTF16, or UTF32, then, in the remainder of this General Rule, the
term “character” shall be taken to mean “unit specified by <char length
units>”.

b) Let C be the value of the <character value expression>, let LC be the
length in characters of C, and let S be the value of the <start
position>.

c) If <string length> is specified, then let L be the value of <string
length> and let E be S+L. Otherwise, let E be the larger of LC + 1 and
S.

d) If either C, S, or L is the null value, then the result of the
<character substring function> is the null value.

e) If E is less than S, then an exception condition is raised: data
exception — substring error.

f) Case:

i) If S is greater than LC or if E is less than 1 (one), then the
result of the <character substring function> is a zero-length string.

ii) Otherwise,

1) Let S1 be the larger of S and 1 (one). Let E1 be the smaller of E
and LC+1. Let L1 be E1–S1.

2) The result of the <character substring function> is a character
string containing the L1 characters of C starting at character
number S1 in the same order that the characters appear in C.

I believe our implementation does this exactly.

Even if it were true that Oracle's behavior is more common than the
spec's definition, it's quite unlikely that we could be talked into
abandoning spec-compliant behavior to match Oracle.

regards, tom lane

#6Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Tom Lane (#5)
Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments

On Wed, Aug 21, 2019 at 7:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Sharma <ashu.coek88@gmail.com> writes:

I don't think it's a bug. I think it's just that the SUBSTR in
PostgreSQL is more SQL compliant than any other databases that you
mentioned. As per the SQL standard, if the start position is zero or
negative number, it should be adjusted to the start of the string and
not to the end of the string and that way I feel the behaviour of
SUBSTR in PostgreSQL is more SQL compliant than other databases.

Yes, the standard provides no wiggle room here. The behavior of
substring() with integer parameters, as specified in SQL:2008
6.29 <string value function> general rule 3, is

3) If <character substring function> is specified, then:

a) If the character encoding form of <character value expression> is
UTF8, UTF16, or UTF32, then, in the remainder of this General Rule, the
term “character” shall be taken to mean “unit specified by <char length
units>”.

b) Let C be the value of the <character value expression>, let LC be the
length in characters of C, and let S be the value of the <start
position>.

c) If <string length> is specified, then let L be the value of <string
length> and let E be S+L. Otherwise, let E be the larger of LC + 1 and
S.

d) If either C, S, or L is the null value, then the result of the
<character substring function> is the null value.

e) If E is less than S, then an exception condition is raised: data
exception — substring error.

f) Case:

i) If S is greater than LC or if E is less than 1 (one), then the
result of the <character substring function> is a zero-length string.

ii) Otherwise,

1) Let S1 be the larger of S and 1 (one). Let E1 be the smaller of E
and LC+1. Let L1 be E1–S1.

2) The result of the <character substring function> is a character
string containing the L1 characters of C starting at character
number S1 in the same order that the characters appear in C.

I believe our implementation does this exactly.

Even if it were true that Oracle's behavior is more common than the
spec's definition, it's quite unlikely that we could be talked into
abandoning spec-compliant behavior to match Oracle.

Absolutely. We shouldn't do that. We are more accurate than Oracle as
per the SQL standard.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com