Regular Expression For Duplicate Words
This link is interesting.
regex - Regular Expression For Duplicate Words - Stack Overflow
<https://stackoverflow.com/questions/2823016/regular-expression-for-duplicate-words>
Is there any example in Postgres?
Regards,
David
On Wed, Feb 2, 2022 at 1:00 AM Shaozhong SHI <shishaozhong@gmail.com> wrote:
This link is interesting.
regex - Regular Expression For Duplicate Words - Stack Overflow
<https://stackoverflow.com/questions/2823016/regular-expression-for-duplicate-words>
Is there any example in Postgres?
Not that I'm immediately aware of, and I'm not going to search the internet
for you.
The regex capabilities in PostgreSQL are pretty full-featured so a solution
should be possible. You should try translating the SO post concepts into
PostgreSQL yourself and ask specific questions if you get stuck.
David J.
It's an interesting question. But I also don't know how to do it in
PostgreSQL.
But I figured out alternative solutions outside the database (both match the literal word "hello" repeated):
GNU Grep: grep -E '(hello)[[:blank:]]+\1' <<<'one hello hello world'
ripgrep: rg '(hello)[[:blank:]]+\1' --pcre2 <<<'one hello hello world'
On 2022-02-02 08:00:00 +0000, Shaozhong SHI wrote:
regex - Regular Expression For Duplicate Words - Stack Overflow
Is there any example in Postgres?
It's pretty much the same as with other regexp dialects: Use word
boundaries and a word character class to match any word and then use a
backreference to match a duplicate word. All the building blocks are
described on
https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
and except for [[:<:]] and [[:>:]] for the word boundaries, they are
also pretty standard.
So
[[:<:]]         start of word
([[:alpha:]]+)  one or more alphabetic characters in a capturing group
[[:>:]]         end of word
\W+             one or more non-word characters
[[:<:]]         start of word
\1              the content of the first (and only) capturing group
[[:>:]]         end of word
All together:
select * from t where t ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]';
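
For anyone who wants to try this without an existing table, here is a minimal sketch that runs the pattern against a few inline sample rows. The CTE name samples, the column s, and the sample strings are made up for illustration. It also assumes PostgreSQL 10 or later (for regexp_match) and the default standard_conforming_strings = on, so the backslashes in the literal reach the regex engine unchanged.

```sql
-- Hypothetical sample data: the CTE "samples" and its column "s" are
-- illustrative names, not part of any real schema.
WITH samples(s) AS (
    VALUES ('one hello hello world'),
           ('no repeats here'),
           ('this is is a test'),
           ('the theme')
)
SELECT s,
       -- regexp_match returns the capturing groups as a text array;
       -- element 1 is the word that was repeated.
       (regexp_match(s, '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]'))[1]
           AS repeated_word
FROM samples
-- Keep only rows where a word is immediately followed by itself.
WHERE s ~ '[[:<:]]([[:alpha:]]+)[[:>:]]\W+[[:<:]]\1[[:>:]]';
```

The intent is that only the rows containing "hello hello" and "is is" come back, with the repeated word in the second column. "the theme" should be rejected, because the word-boundary constraints stop \1 from matching just the beginning of "theme". Note that ~ is case-sensitive, so "The the" would not be caught by this version.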
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
Hi Peter, interesting.
Give a good example if you can.
Regards,
David