Detecting repeated phrase in a string

Started by Shaozhong SHI over 4 years ago, 5 messages, general
#1 Shaozhong SHI
shishaozhong@gmail.com

Does anyone know how to detect a repeated phrase in a string?

Is there a function for this?

Regards,

David

#2 Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Shaozhong SHI (#1)
Re: Detecting repeated phrase in a string

On 2021-12-09 12:38:15 +0000, Shaozhong SHI wrote:

Does anyone know how to detect repeated phrase in a string?

Use regular expressions with backreferences:

bayes=> select regexp_match('foo wikiwiki bar', '(.+)\1');
╔══════════════╗
║ regexp_match ║
╟──────────────╢
║ {o} ║
╚══════════════╝
(1 row)

"o" is repeated in "foo".

bayes=> select regexp_match('fo wikiwiki bar', '(.+)\1');
╔══════════════╗
║ regexp_match ║
╟──────────────╢
║ {wiki} ║
╚══════════════╝
(1 row)

"wiki" is repeated in "wikiwiki".

bayes=> select regexp_match('fo wikiwi bar', '(.+)\1');
╔══════════════╗
║ regexp_match ║
╟──────────────╢
║ (∅) ║
╚══════════════╝
(1 row)

Nothing is repeated.

Adjust the expression within parentheses if you want to match something
more specific than any sequence of one or more characters.
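
A minimal sketch of one such adjustment, under the assumption (not stated in the question) that only repeats of two or more word characters are of interest. Replacing `.+` with `\w{2,}` means the doubled "o" in "foo" no longer qualifies, so the first example above should return {wiki} instead of {o}:

```sql
-- Assumption: a "phrase" is a run of at least two word characters
-- (letters, digits, underscore). \w{2,} replaces the original .+ .
-- The backslashes are literal because standard_conforming_strings
-- is on by default.
select regexp_match('foo wikiwiki bar', '(\w{2,})\1');
```

The minimum length of 2 is arbitrary; pick whatever threshold suits the data.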

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#3 Shaozhong SHI
shishaozhong@gmail.com
In reply to: Peter J. Holzer (#2)
Re: Detecting repeated phrase in a string

Hi, Peter,

How can I define a word boundary as either ^, a space, or $,
so that the following holds?

"fox fox" is a repeat.

"foxfox" is not a repeat but just one word.

Regards,

David

#4 Andreas Joseph Krogh
andreas@visena.com
In reply to: Shaozhong SHI (#3)
Re: Detecting repeated phrase in a string

On Thursday, 9 December 2021 at 15:46:05, Shaozhong SHI
<shishaozhong@gmail.com> wrote:

Hi, Peter,

How to define word boundary as either by using
^ , space, or $

So that the following can be done

fox fox is a repeat

foxfox is not a repeat but just one word.

Do you want a repeated phrase (a list of words) or repeated words?
For repeated words (including Unicode characters) you can do:

(\b\p{L}+\b)(?:\s+\1)+

I'm not quite sure how to translate this to PG, but it works in Java.

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas@visena.com
www.visena.com <https://www.visena.com>

#5 Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Andreas Joseph Krogh (#4)
Re: Detecting repeated phrase in a string

On 2021-12-09 16:11:31 +0100, Andreas Joseph Krogh wrote:

For repeated words (including unicode-chars) you can do:
 
(\b\p{L}+\b)(?:\s+\1)+
 
I'm not quite sure how to translate this to PG, but in JAVA it works.

See https://www.postgresql.org/docs/11/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE
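
A possible translation using that table, offered as a sketch rather than a tested recipe. In PostgreSQL's regex flavor `\b` is a backspace, not a word boundary; `\m` and `\M` match only at the start and end of a word. `\p{L}` is not supported, so `[[:alpha:]]` is used here as an assumed stand-in; which non-ASCII letters it covers depends on the database encoding and locale. The sample strings are made up for illustration:

```sql
-- \m = start of word, \M = end of word (PostgreSQL constraint escapes).
-- [[:alpha:]] is an assumed substitute for Java's \p{L}.
-- The \M after \1 keeps "fox foxes" from counting as a repeat.
select s,
       regexp_match(s, '\m([[:alpha:]]+)\M(?:\s+\1\M)+') as repeated_word
from (values ('fox fox'),
             ('foxfox'),
             ('the fox fox jumps'),
             ('fox foxes')) as t(s);
```

I would expect {fox} for 'fox fox' and 'the fox fox jumps', and NULL for 'foxfox' and 'fox foxes'. Note that the Java pattern quoted above has no boundary after the backreference, so it would also accept 'fox foxes'.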

hp
