Psql regex is NFA or DFA?

Started by Joshua b. Joreover 23 years ago10 messagesgeneral
Jump to latest
#1Joshua b. Jore
josh@greentechnologist.org

So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_
and while I don't need regex in PostgreSQL I know I'll do it for something
- eventually. The book makes a distinction between DFA, POSIX NFA and
Traditional NFA and then ascribes some properties and behaviours to each.
So what sort does PostgreSQL have?

Joshua b. Jore -{ weird geeky madness }-> http://www.greentechnologist.org

#2Karel Zak
zakkr@zf.jcu.cz
In reply to: Joshua b. Jore (#1)
Re: Psql regex is NFA or DFA?

On Tue, Sep 10, 2002 at 02:16:51AM -0500, Josh Jore wrote:

So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_
and while I don't need regex in PostgreSQL I know I'll do it for something
- eventually. The book makes a distinction between DFA, POSIX NFA and
Traditional NFA and then ascribes some properties and behaviours to each.
So what sort does PostgreSQL have?

Regex in PostgreSQL code:

Copyright (c) 1992, 1993, 1994 Henry Spencer.
Copyright (c) 1992, 1993, 1994
The Regents of the University of California. All rights reserved.

I think it's:

POSIX 1003.2, section 2.8 (Regular Expression Notation)

--
Karel Zak <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/

C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua b. Jore (#1)
Re: Psql regex is NFA or DFA?

Josh Jore <josh@greentechnologist.org> writes:

So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_
and while I don't need regex in PostgreSQL I know I'll do it for something
- eventually. The book makes a distinction between DFA, POSIX NFA and
Traditional NFA and then ascribes some properties and behaviours to each.
So what sort does PostgreSQL have?

Well, you could read the code (src/backend/regex), or you could apply
the tests that Friedl suggests to distinguish the type of an unknown
engine ...

My guess is that it's an NFA, but I dunno if Spencer did the POSIX
semantics or not.

regards, tom lane

#4Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#3)
Re: Psql regex is NFA or DFA?

Tom Lane wrote:

Josh Jore <josh@greentechnologist.org> writes:

So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_
and while I don't need regex in PostgreSQL I know I'll do it for something
- eventually. The book makes a distinction between DFA, POSIX NFA and
Traditional NFA and then ascribes some properties and behaviours to each.
So what sort does PostgreSQL have?

Well, you could read the code (src/backend/regex), or you could apply
the tests that Friedl suggests to distinguish the type of an unknown
engine ...

My guess is that it's an NFA, but I dunno if Spencer did the POSIX
semantics or not.

I am continuing to talk to Henry about getting a newer version of his
regex library. He is working on it but is not yet finished.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#5Alvaro Herrera
alvherre@atentus.com
In reply to: Bruce Momjian (#4)
Re: Psql regex is NFA or DFA?

On Tue, 10 Sep 2002, Bruce Momjian wrote:

Tom Lane wrote:

Josh Jore <josh@greentechnologist.org> writes:

So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_
and while I don't need regex in PostgreSQL I know I'll do it for something
- eventually. The book makes a distinction between DFA, POSIX NFA and
Traditional NFA and then ascribes some properties and behaviours to each.
So what sort does PostgreSQL have?

Well, you could read the code (src/backend/regex), or you could apply
the tests that Friedl suggests to distinguish the type of an unknown
engine ...

My guess is that it's an NFA, but I dunno if Spencer did the POSIX
semantics or not.

I am continuing to talk to Henry about getting a newer version of his
regex library. He is working on it but is not yet finished.

Do you have some details about the implementation itself? I would like
to work on implementing a regex engine using the Shift-And algorithm
families, and that can probably give a performance improvement over the
current engine.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)

#6Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#5)
Re: Psql regex is NFA or DFA?

Alvaro Herrera wrote:

I am continuing to talk to Henry about getting a newer version of his
regex library. He is working on it but is not yet finished.

Do you have some details about the implementation itself? I would like
to work on implementing a regex engine using the Shift-And algorithm
families, and that can probably give a performance improvement over the
current engine.

[ CC'ing Henry Spencer. ]

No, I don't know any details except that his regex library is in the new
tcl code and has to repackaged as a separate library. Henry, could we
help you finish up your regex stuff?

Henry's regex work is the same code that is in *BSD regex (at least
BSD/OS, FreeBSD, NetBSD), which I have found to be pretty slow in
certain complex cases, so much so that I have moved to super-sed (ssed)
for use on my local machine where I have seen 100x speed improvements
with ssed.

I am not sure how slow our regex really is, but for the sake of
PostgreSQL and of the BSD regex library, I would love to see a newer
version.

In fact, I just had an email exchange with Henry about two weeks ago.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#6)
Re: Psql regex is NFA or DFA?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Henry's regex work is the same code that is in *BSD regex (at least
BSD/OS, FreeBSD, NetBSD), which I have found to be pretty slow in
certain complex cases,

You're speaking of his *old* package (the one we currently use), no?

Friedl seems to think that the current Tcl regex engine (Henry's new
code) is the most advanced thing on the planet.

regards, tom lane

#8Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#7)
Re: Psql regex is NFA or DFA?

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Henry's regex work is the same code that is in *BSD regex (at least
BSD/OS, FreeBSD, NetBSD), which I have found to be pretty slow in
certain complex cases,

You're speaking of his *old* package (the one we currently use), no?

Yes, that is the old stuff shipped with BSD 4.4 and included in all the
*BSD releases I have checked.

Friedl seems to think that the current Tcl regex engine (Henry's new
code) is the most advanced thing on the planet.

I have not heard that, but it is good to hear.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#9Alvaro Herrera
alvherre@atentus.com
In reply to: Tom Lane (#7)
Re: Psql regex is NFA or DFA?

Tom Lane dijo:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Henry's regex work is the same code that is in *BSD regex (at least
BSD/OS, FreeBSD, NetBSD), which I have found to be pretty slow in
certain complex cases,

You're speaking of his *old* package (the one we currently use), no?

Friedl seems to think that the current Tcl regex engine (Henry's new
code) is the most advanced thing on the planet.

Oh, so the TODO item "replace with newer code" is not just vaporware?
Well, I won't try to compete with Spencer's code in that case.

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"Linux transform� mi computadora, de una `m�quina para hacer cosas',
en un aparato realmente entretenido, sobre el cual cada d�a aprendo
algo nuevo" (Jaime Salinas)

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#9)
Re: Psql regex is NFA or DFA?

Alvaro Herrera <alvherre@atentus.com> writes:

Tom Lane dijo:

Friedl seems to think that the current Tcl regex engine (Henry's new
code) is the most advanced thing on the planet.

Oh, so the TODO item "replace with newer code" is not just vaporware?
Well, I won't try to compete with Spencer's code in that case.

The code is certainly not vaporware: Tcl's been using it for awhile.
But the code in a Tcl-independent package that we could easily use
is a different story. Perhaps you could offer Henry some help in
making it into a clean library package ...

regards, tom lane