Fault with initcap
I tried initcap and found a major problem with it.
Initcap of notemachine is NoteMachine.
Initcap of Sainsbury's Bank is Sainsbury'S bank.
This is not expected.
Anyway to get around this problem?
Regards,
David
I tried initcap and found a major problem with it.
Initcap of notemachine is NoteMachine.
Initcap of Sainsbury's Bank is Sainsbury'S bank.
This is not expected.
Anyway to get around this problem?
Regards,
David
I tried initcap and found a major problem with it.
Initcap of notemachine is NoteMachine.
Initcap of Sainsbury's Bank is Sainsbury'S bank.
This is not expected.Anyway to get around this problem?
Sure. Step one: define "expected"
Karsten
On Oct 12, 2021, at 10:30 AM, Shaozhong SHI <shishaozhong@gmail.com> wrote:
I tried initcap and found a major problem with it.
Initcap of notemachine is NoteMachine.
My version 12 system gives Notemachine
Show quoted text
Initcap of Sainsbury's Bank is Sainsbury'S bank.
This is not expected.
Anyway to get around this problem?
Regards,
David
On Tue, Oct 12, 2021 at 9:54 AM Rob Sargent <robjsargent@gmail.com> wrote:
On Oct 12, 2021, at 10:30 AM, Shaozhong SHI <shishaozhong@gmail.com>
wrote:
I tried initcap and found a major problem with it.
Initcap of notemachine is NoteMachine.
My version 12 system gives Notemachine
Initcap of Sainsbury's Bank is Sainsbury'S bank.
This is not expected.
Anyway to get around this problem?
Pg v.13 doesn't have a problem with notemachine either, but does seem to
screw up Sainsbury's Bank as Sainsbury'S bank.
https://www.db-fiddle.com/f/m2BMATwVcp6A1ZvD8TyurW/1
The only way I could think of fixing this would be to swap out the
apostrophe with an alpha string not present in the string, and then swap
back when you're done (see example above -- I'm not recommending using
"xyzxyz" - that's just an example. And ideally you'd search for your
placeholder string first to be sure it doesn't exist, or escape it).
Or just use a regexp to solve your casing problem -- maybe break up your
string by whitespace into array elements, uppercase the first letter of
each array and the concat the array elements back?
Steve
Hi David,
Expected are as follows:
Notemachine
Sainsbury's bank.
Now, step two: generalize that exemplary definition.
Karsten
Import Notes
Reply to msg id not found: CA+i5Jwauegq6UcYQ0mNN1aZM_9nnNQJbcYLxdBVxE_=sZreVrA@mail.gmail.com
On 10/12/21 09:31, Shaozhong SHI wrote:
I tried initcap and found a major problem with it.
What Postgres version?
In version 12 and 14 I get:
Initcap of notemachine is NoteMachine.
select initcap('notemachine');
initcap
-------------
Notemachine
Initcap of Sainsbury's Bank is Sainsbury'S bank.
select initcap('Sainsbury''s Bank');
initcap
------------------
Sainsbury'S Bank
Which follows the definition here:
https://www.postgresql.org/docs/14/functions-string.html
initcap ( text ) → text
Converts the first letter of each word to upper case and the rest to
lower case. Words are sequences of alphanumeric characters separated by
non-alphanumeric characters.
This is not expected.
What is the encoding, collate, ctype for the database?
Can be found in psql using:
\l db_name
Anyway to get around this problem?
Regards,
David
--
Adrian Klaver
adrian.klaver@aklaver.com
Only the first letter of each word to be capitalised/uppercased.
The next step is to not top-post.
Then to keep the list involved if you wish further help.
Then, if you are intent on using regular expressions, look at
the PostgreSQL docs for regexp_replace.
Karsten
Import Notes
Reply to msg id not found: CA+i5Jwb6R=3sh1PFdYtpCEqnjeNwxkoL4-JtTDT0mYnORvN=kA@mail.gmail.com
On Tue, 12 Oct 2021 at 20:34, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
On 10/12/21 09:31, Shaozhong SHI wrote:
I tried initcap and found a major problem with it.
What Postgres version?
In version 12 and 14 I get:
Initcap of notemachine is NoteMachine.
select initcap('notemachine');
initcap
-------------
NotemachineInitcap of Sainsbury's Bank is Sainsbury'S bank.
select initcap('Sainsbury''s Bank');
initcap
------------------
Sainsbury'S BankWhich follows the definition here:
https://www.postgresql.org/docs/14/functions-string.html
initcap ( text ) → text
Converts the first letter of each word to upper case and the rest to
lower case. Words are sequences of alphanumeric characters separated by
non-alphanumeric characters.Hi, Adrian Klaver,
It looks like that you replicated the error.
There must be a way to do the following.
a column contains a list of words. Only the first letter of each word
should be capitalised. INITCAP can not do that. How to create a function
just to capitalised each word (substring) in a list of words/strings. This
will be very useful and create great impact.
Regards,
David
Am Tue, Oct 12, 2021 at 09:50:16PM +0100 schrieb Shaozhong SHI:
There must be a way to do the following.
[...] Only the first letter of each word should be capitalised.
Indeed, there is. It is called "human brain in cultural
context". "AI" is close nowadays, but, hopefully, not quite
there yet.
Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
On 10/12/21 13:50, Shaozhong SHI wrote:
On Tue, 12 Oct 2021 at 20:34, Adrian Klaver <adrian.klaver@aklaver.com
Which follows the definition here:
https://www.postgresql.org/docs/14/functions-string.html
<https://www.postgresql.org/docs/14/functions-string.html>initcap ( text ) → text
Converts the first letter of each word to upper case and the rest to
lower case. Words are sequences of alphanumeric characters separated by
non-alphanumeric characters.Hi, Adrian Klaver,
It looks like that you replicated the error.
There is no error, initcap is doing what it is documented to.
notemachine is not two words anymore then 'online', 'bluebell',
'network' are.
There must be a way to do the following.
Maybe, but as Karsten says it would involve an AI. One that understands
the mutt language that is English.
a column contains a list of words. Only the first letter of each word
should be capitalised. INITCAP can not do that. How to create a
function just to capitalised each word (substring) in a list of
words/strings. This will be very useful and create great impact.
From here:
https://www.grammarly.com/blog/14-of-the-longest-words-in-english/
uncopyrightable
where would you split that into words?:
Some 'words' I see:
un
unc
copy
copyright
right
table
able
Regards,
David
--
Adrian Klaver
adrian.klaver@aklaver.com
On Tue, 12 Oct 2021 at 23:02, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
On 10/12/21 13:50, Shaozhong SHI wrote:
On Tue, 12 Oct 2021 at 20:34, Adrian Klaver <adrian.klaver@aklaver.com
Which follows the definition here:
https://www.postgresql.org/docs/14/functions-string.html
<https://www.postgresql.org/docs/14/functions-string.html>initcap ( text ) → text
Converts the first letter of each word to upper case and the rest to
lower case. Words are sequences of alphanumeric characters separatedby
non-alphanumeric characters.
Hi, Adrian Klaver,
It looks like that you replicated the error.
There is no error, initcap is doing what it is documented to.
notemachine is not two words anymore then 'online', 'bluebell',
'network' are.There must be a way to do the following.
Maybe, but as Karsten says it would involve an AI. One that understands
the mutt language that is English.a column contains a list of words. Only the first letter of each word
should be capitalised. INITCAP can not do that. How to create a
function just to capitalised each word (substring) in a list of
words/strings. This will be very useful and create great impact.From here:
https://www.grammarly.com/blog/14-of-the-longest-words-in-english/
uncopyrightable
where would you split that into words?:
Some 'words' I see:
un
unc
copy
copyright
right
table
ableRegards,
David
--
Adrian Klaver
adrian.klaver@aklaver.com
Hi, Adrian Klaver,
In Python, there is a capwords. Do we have an equivalent in Postgres?
Regards, David
On 10/12/21 16:03, Shaozhong SHI wrote:
Hi, Adrian Klaver,
In Python, there is a capwords. Do we have an equivalent in Postgres?
https://docs.python.org/3/library/string.html?highlight=capwords#string.capwords
string.capwords(s, sep=None)
Split the argument into words using str.split(), capitalize each
word using str.capitalize(), and join the capitalized words using
str.join(). If the optional second argument sep is absent or None, runs
of whitespace characters are replaced by a single space and leading and
trailing whitespace are removed, otherwise sep is used to split and join
the words.
That is not going to do what you are proposing either as it still comes
down to deciding where to split the 'words':
import string
string.capwords('notemachine')
'Notemachine'
There are similar functions to split strings here:
https://www.postgresql.org/docs/14/functions-string.html
Though again they depend on some delimiter or regexp to make the split.
There is no function that just 'knows' that 'notemachine' is two words
and should become 'NoteMachine'.
Regards, David
--
Adrian Klaver
adrian.klaver@aklaver.com
On Tue, Oct 12, 2021 at 8:28 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
There is no function that just 'knows' that 'notemachine' is two words
and should become 'NoteMachine'.
Any chance that 'notemachine' is stored with a zero-width space (Unicode
U+200B)? This is common for compound words where the programmer wants to be
able to "know" that they are distinct words that are smashed together.
select initcap('notemachine'), initcap('note' || U&'\200B' || 'machine');
initcap |initcap |
-----------+------------+
Notemachine|NoteMachine|
You could check the length() of the string. If it is 12, there is an
invisible character in there. This would explain the "unexpected" behavior
that no one seems to be able to replicate.
Best,
--Lee
--
Lee Hachadoorian
Assistant Professor of Instruction, Geography and Urban Studies
Assistant Director, Professional Science Master's in GIS
Temple University
I tried both on PostgreSQL 11.13 and yes "Sainsbury's bank" 's "S" is
buggy, but notemachine is not.
Will anyone try @ 14?
n s version
Notemachine Sainsbury'S Bank PostgreSQL 11.13 (Debian 11.13-0+deb10u1) on
x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
On Tue, Oct 12, 2021 at 7:30 PM Shaozhong SHI <shishaozhong@gmail.com>
wrote:
Show quoted text
I tried initcap and found a major problem with it.
Initcap of notemachine is NoteMachine.
Initcap of Sainsbury's Bank is Sainsbury'S bank.
This is not expected.
Anyway to get around this problem?
Regards,
David
st 13. 10. 2021 v 8:34 odesílatel Metin Ulusinan <
metin.ulusinan@ssicilian.net> napsal:
I tried both on PostgreSQL 11.13 and yes "Sainsbury's bank" 's "S" is
buggy, but notemachine is not.Will anyone try @ 14?
It cannot to work everywhere - the word is defined (for this function) as
sequence of alpha numeric chars. "'" is not alpha numeric char.
Theoretically it can work if you use ICU collation provider, if it is
supported by ICU.
Regards
Pavel
On Wed, 13 Oct 2021 at 07:33, Metin Ulusinan <metin.ulusinan@ssicilian.net>
wrote:
I tried both on PostgreSQL 11.13 and yes "Sainsbury's bank" 's "S" is
buggy, but notemachine is not.Will anyone try @ 14?
n s version
Notemachine Sainsbury'S Bank PostgreSQL 11.13 (Debian 11.13-0+deb10u1) on
x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bitHello, Metin Ulusinan,
I can confirm that you pin-out the issue.
There is a INITCAP2. Alternative function to PostgreSQL builtin
initcap(text) with support for accented words. · GitHub
<https://gist.github.com/pv8/8291531>
Can this be adapted to capitalise the first letter of each work in a string?
Regards,
David
Hi,
Yes, this can be adaptable, and i did simple version of this.
It just split text words with find spaces, capitalise each
pieces(word) and merge together again.
This is a quick and simple work. You can develop over it about your needs.
Try that and tell us about result.
CREATE OR REPLACE FUNCTION initcap2(text)
RETURNS text
LANGUAGE plpgsql
AS $function$
DECLARE
sentence TEXT := '';
word_array TEXT[];
word TEXT;
word_out TEXT;
BEGIN
sentence := $1;
IF sentence is NULL THEN
RETURN NULL;
END IF;
word_array := regexp_split_to_array($1, E'\\s+');
FOREACH word IN ARRAY word_array
LOOP
word_out := upper(left(word, 1)) || lower(substring(word, 2));
sentence := regexp_replace(sentence, word, word_out);
END LOOP;
RETURN trim(sentence);
END;
$function$
;
On Wed, Oct 13, 2021 at 11:48 AM Shaozhong SHI <shishaozhong@gmail.com>
wrote:
Show quoted text
On Wed, 13 Oct 2021 at 07:33, Metin Ulusinan <metin.ulusinan@ssicilian.net>
wrote:I tried both on PostgreSQL 11.13 and yes "Sainsbury's bank" 's "S" is
buggy, but notemachine is not.Will anyone try @ 14?
n s version
Notemachine Sainsbury'S Bank PostgreSQL 11.13 (Debian 11.13-0+deb10u1)
on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bitHello, Metin Ulusinan,
I can confirm that you pin-out the issue.
There is a INITCAP2. Alternative function to PostgreSQL builtin
initcap(text) with support for accented words. · GitHub
<https://gist.github.com/pv8/8291531>Can this be adapted to capitalise the first letter of each work in a
string?Regards,
David
This was reported in PG 8.1.4 15 years ago - https://pgsql-bugs.postgresql.narkive.com/R7HmTKef/bug-2579-initcap-should-not-capitalize-letter-after-apostrophe
From: Metin Ulusinan <metin.ulusinan@ssicilian.net>
Sent: Wednesday, October 13, 2021 5:39 AM
To: Shaozhong SHI <shishaozhong@gmail.com>
Cc: pgsql-sql <pgsql-sql@lists.postgresql.org>
Subject: Re: Fault with initcap
CAUTION: This email originated from outside of Snap-on. Do not click on links or open attachments unless you have validated the sender, even if it is a known contact. Contact the sender by phone to validate the contents.
Hi,
Yes, this can be adaptable, and i did simple version of this.
It just split text words with find spaces, capitalise each
pieces(word) and merge together again.
This is a quick and simple work. You can develop over it about your needs.
Try that and tell us about result.
CREATE OR REPLACE FUNCTION initcap2(text)
RETURNS text
LANGUAGE plpgsql
AS $function$
DECLARE
sentence TEXT := '';
word_array TEXT[];
word TEXT;
word_out TEXT;
BEGIN
sentence := $1;
IF sentence is NULL THEN
RETURN NULL;
END IF;
word_array := regexp_split_to_array($1, E'\\s+');
FOREACH word IN ARRAY word_array
LOOP
word_out := upper(left(word, 1)) || lower(substring(word, 2));
sentence := regexp_replace(sentence, word, word_out);
END LOOP;
RETURN trim(sentence);
END;
$function$
;
On Wed, Oct 13, 2021 at 11:48 AM Shaozhong SHI <shishaozhong@gmail.com<mailto:shishaozhong@gmail.com>> wrote:
On Wed, 13 Oct 2021 at 07:33, Metin Ulusinan <metin.ulusinan@ssicilian.net<mailto:metin.ulusinan@ssicilian.net>> wrote:
I tried both on PostgreSQL 11.13 and yes "Sainsbury's bank" 's "S" is buggy, but notemachine is not.
Will anyone try @ 14?
n
s
version
Notemachine
Sainsbury'S Bank
PostgreSQL 11.13 (Debian 11.13-0+deb10u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
Hello, Metin Ulusinan,
I can confirm that you pin-out the issue.
There is a INITCAP2. Alternative function to PostgreSQL builtin initcap(text) with support for accented words. · GitHub<https://urldefense.com/v3/__https:/gist.github.com/pv8/8291531__;!!Lf_9VycLqA!yapR_3tOx6q4nbC3ZDtCTzp8h2LwdrDNJWxNqPOj8WHb5IIUeDb_3WXMt8MOpJOJ$>
Can this be adapted to capitalise the first letter of each work in a string?
Regards,
David
On Wed, 13 Oct 2021 at 10:39, Metin Ulusinan <metin.ulusinan@ssicilian.net>
wrote:
Hi,
Yes, this can be adaptable, and i did simple version of this.
It just split text words with find spaces, capitalise each
pieces(word) and merge together again.
This is a quick and simple work. You can develop over it about your needs.Try that and tell us about result.
CREATE OR REPLACE FUNCTION initcap2(text)
RETURNS text
LANGUAGE plpgsql
AS $function$
DECLARE
sentence TEXT := '';
word_array TEXT[];
word TEXT;
word_out TEXT;
BEGIN
sentence := $1;IF sentence is NULL THEN
RETURN NULL;
END IF;word_array := regexp_split_to_array($1, E'\\s+');
FOREACH word IN ARRAY word_array
LOOP
word_out := upper(left(word, 1)) || lower(substring(word, 2));
sentence := regexp_replace(sentence, word, word_out);
END LOOP;RETURN trim(sentence);
END;
$function$
;Hello, Metin,
See the following testing response.
ERROR: invalid regular expression: parentheses () not balanced CONTEXT:
PL/pgSQL function testinitcap2(text) line 18 at assignment
SQL state: 2201B
Regards,
David