Regular expression for lower case to upper case.

Started by Eagnaover 3 years ago8 messagesgeneral
Jump to latest
#1Eagna
eagna@protonmail.com

Hi all,

I want a regex to change the case of a field from UPPER to lower.

I know about the UPPER() and LOWER() functions and they are not what I want.

I would have thought this should be very simple, but I've searched a lot and can't seem to get an answer.

Here's a fiddle with a couple of things that I've tried - I'm obviously missing something that should be starting me in the face.

Any input appreciated.

E.

#2Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Eagna (#1)
Re: Regular expression for lower case to upper case.

On 2022-12-10 09:40:22 +0000, Eagna wrote:

I want a regex to change the case of a field from UPPER to lower.

First a note about terminology: A regular expression matches a string,
it doesn't replace anything.

The regexp_replace function uses a regular expression to match parts of
a string and then uses to replacement string to replace them - but the
replacement string is not itself a regular expresssion.

Tha said, the replacement string in some editors (like Vim) and some
programming languages (like Perl) provides syntax for changing case
(both vi(m) and Perl use \u and \U...\E for uppercasing).

https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
doesn't mention any special escapes except \1 through \9 and \&. So it
is extremely likely that no such escapes exist.

I know about the UPPER() and LOWER() functions and they are not what I want.

Can you elaborate why you can't use those?

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Eagna (#1)
Re: Regular expression for lower case to upper case.

On Saturday, December 10, 2022, Eagna <eagna@protonmail.com> wrote:

Hi all,

I want a regex to change the case of a field from UPPER to lower.

I know about the UPPER() and LOWER() functions and they are not what I
want.

I would have thought this should be very simple, but I've searched a lot
and can't seem to get an answer.

Here's a fiddle with a couple of things that I've tried - I'm obviously
missing something that should be starting me in the face.

Any input appreciated.

E.

RegExp by itself cannot do this. You have to match all parts of the input
into different capturing groups, then use lower() combined with format() to
build a new string. Putting the capturing groups into an array is the most
useful option.

David J.

#4Eagna
eagna@protonmail.com
In reply to: Peter J. Holzer (#2)
Re: Regular expression for lower case to upper case.

Hi, and thanks for your input,

Tha said, the replacement string in some editors (like Vim) and some
programming languages (like Perl) provides syntax for changing case
(both vi(m) and Perl use \u and \U...\E for uppercasing).

This is probably why I was so frustrated - I thought that there should be an equivalent in PostgreSQL.

I was going mad trying all of these from my searches.

I know about the UPPER() and LOWER() functions and they are not what I want.

Can you elaborate why you can't use those?

I want to index on a REGEXP_REPLACE() - I thought using lower -> upper would be a good test.

I could always have used another REGEXP_REPLACE() for my testing, but I then became "obsessed" with the idea of using REGEXP_REPLACE() as a substitute for UPPER() - kind of an obfuscated code competition with myself! :-)

Again, thanks for your input.

E.

Show quoted text

hp

#5Eagna
eagna@protonmail.com
In reply to: David G. Johnston (#3)
Re: Regular expression for lower case to upper case.

Hi, and thanks for your input.

RegExp by itself cannot do this. You have to match all parts of the input into different capturing groups, then use lower() combined with format() to build a new string. Putting the capturing groups into an array is the most useful option.

OK - I *_kind_* of see what you're saying.

There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd care to give an outline of the solution that you propose.

Thanks again,

E.

Show quoted text

David J.

#6Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Eagna (#5)
Re: Regular expression for lower case to upper case.

On 2022-12-10 11:00:48 +0000, Eagna wrote:

RegExp by itself cannot do this. You have to match all parts of the
input into different capturing groups, then use lower() combined
with format() to build a new string. Putting the capturing groups
into an array is the most useful option.

OK - I *_kind_* of see what you're saying.

There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd
care to give an outline of the solution that you propose.

For example like this:

INSERT INTO test VALUES
('abc_def_ghi');

Let's say I want to uppercase the part between the two underscores.

First use regexp_replace to split the string into three parts: One
before the match, the match and one after the match:

SELECT
regexp_replace(x, '(.*_)(.*)(_.*)', '\1'),
regexp_replace(x, '(.*_)(.*)(_.*)', '\2'),
regexp_replace(x, '(.*_)(.*)(_.*)', '\3')
FROM test;

╔════════════════╤════════════════╤════════════════╗
║ regexp_replace │ regexp_replace │ regexp_replace ║
╟────────────────┼────────────────┼────────────────╢
║ abc_ │ def │ _ghi ║
╚════════════════╧════════════════╧════════════════╝
(1 row)

Once that works, uppercase the part you want and concatenate everything
together again:

SELECT
regexp_replace(x, '(.*_)(.*)(_.*)', '\1') ||
upper(regexp_replace(x, '(.*_)(.*)(_.*)', '\2')) ||
regexp_replace(x, '(.*_)(.*)(_.*)', '\3')
FROM test;

╔═════════════╗
║ ?column? ║
╟─────────────╢
║ abc_DEF_ghi ║
╚═════════════╝
(1 row)

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#7David G. Johnston
david.g.johnston@gmail.com
In reply to: Peter J. Holzer (#6)
Re: Regular expression for lower case to upper case.

On Sat, Dec 10, 2022 at 6:32 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

On 2022-12-10 11:00:48 +0000, Eagna wrote:

RegExp by itself cannot do this. You have to match all parts of the
input into different capturing groups, then use lower() combined
with format() to build a new string. Putting the capturing groups
into an array is the most useful option.

OK - I *_kind_* of see what you're saying.

There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd
care to give an outline of the solution that you propose.

For example like this:

INSERT INTO test VALUES
('abc_def_ghi');

Let's say I want to uppercase the part between the two underscores.

First use regexp_replace to split the string into three parts: One
before the match, the match and one after the match:

SELECT
regexp_replace(x, '(.*_)(.*)(_.*)', '\1'),
regexp_replace(x, '(.*_)(.*)(_.*)', '\2'),
regexp_replace(x, '(.*_)(.*)(_.*)', '\3')
FROM test;

A bit too inefficient for my taste.
I was describing the following:

with parts as materialized (
select regexp_match(
'abc_def_ghi',
'^([^_]*_)([^_]*_)([^_]*)$') as part_array
)
select format(
'%s%s%s',
part_array[1],
upper(part_array[2]),
part_array[3])
from parts;

David J.

#8Alban Hertroys
haramrae@gmail.com
In reply to: Eagna (#5)
Re: Regular expression for lower case to upper case.

On 10 Dec 2022, at 12:00, Eagna <eagna@protonmail.com> wrote:

Hi, and thanks for your input.

RegExp by itself cannot do this. You have to match all parts of the input into different capturing groups, then use lower() combined with format() to build a new string. Putting the capturing groups into an array is the most useful option.

OK - I *_kind_* of see what you're saying.

There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd care to give an outline of the solution that you propose.

If you put all the regexes and their replacements into a table[1]If you stick to ASCII, you could just calculate them and even omit storing them in a physical table., you could use an aggregate over them to combine all the replacements into the final string. It would need some aggregate like regex_replace_agg, which would probably be a custom aggregate.

[1]: If you stick to ASCII, you could just calculate them and even omit storing them in a physical table.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.