Re: Mysql 321 - Mysql 322 - msql

Started by The Hermit Hackerabout 27 years ago11 messages
#1The Hermit Hacker
scrappy@hub.org

On Fri, 27 Nov 1998, Andrew McNaughton wrote:

It's possibly stronger on features, but it's slower than mysql. It is
speed he's emphasizing.

I've never actually installed mysql, so can't really compare the
two, but I've been using PostgreSQL for everything I need an RDBMS for
since I first took on the project 3 years ago now (wow, time flies)...each
release has gotten progressively faster, but we've pretty much hit a limit
as far as optimizations are concerned, there probably isn't a *noticeable*
difference between v6.3.2 and v6.4...

We are hoping to have the PREPARE statement put into v6.5, which
should give a performance improvement in "repeatative queries", as the
planning for the query can be done beforehand, taking out a step...

there was some discussion earlier this year on this list about adding
indexes suitable for fulltext searching to PostgreSQL. Did anything
happen in the end? It's the one feature I'd really like to have. I
suspect it would be an important one to James also.

What do you mean by "fulltext searching"?

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

In reply to: The Hermit Hacker (#1)
Re: [HACKERS] Re: Mysql 321 - Mysql 322 - msql

The Hermit Hacker <scrappy@hub.org> writes:

What do you mean by "fulltext searching"?

He's talking about inverted text indices, where text is indexed such
that a word is the key, and the index returns pointers to all the
places where that word occurs. Knowledge of word structure is usually
built in, so that "hacks", "hacker", "hackers", "hacking" and so on
are known to be derivatives of "hack", and can match it if requested.
Noise words such as "a", "the" and so forth are usually not indexed.

Inverted indexed text storage tends to take up much space, but there
are ways to reduce this, and the best implementations do it remarkably
well. A simple example: it is not really necessary to actually store
the original text; it can instead be a sequence of links to the store
of all individual words in the text database.

See http://glimpse.cs.arizona.edu/ for a powerful inverted indexing
engine and various related software.

-tih
--
Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"

#3The Hermit Hacker
scrappy@hub.org
In reply to: Tom Ivar Helbekkmo (#2)
Re: [HACKERS] Re: Mysql 321 - Mysql 322 - msql

On 27 Nov 1998, Tom Ivar Helbekkmo wrote:

The Hermit Hacker <scrappy@hub.org> writes:

What do you mean by "fulltext searching"?

He's talking about inverted text indices, where text is indexed such
that a word is the key, and the index returns pointers to all the
places where that word occurs. Knowledge of word structure is usually
built in, so that "hacks", "hacker", "hackers", "hacking" and so on
are known to be derivatives of "hack", and can match it if requested.
Noise words such as "a", "the" and so forth are usually not indexed.

Inverted indexed text storage tends to take up much space, but there
are ways to reduce this, and the best implementations do it remarkably
well. A simple example: it is not really necessary to actually store
the original text; it can instead be a sequence of links to the store
of all individual words in the text database.

See http://glimpse.cs.arizona.edu/ for a powerful inverted indexing
engine and various related software.

Just curious, but other then specialized applications like
Glimpse, does anyone actually support/do this?

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: The Hermit Hacker (#3)
Re: [HACKERS] Re: Mysql 321 - Mysql 322 - msql

The Hermit Hacker <scrappy@hub.org> writes:

On 27 Nov 1998, Tom Ivar Helbekkmo wrote:

See http://glimpse.cs.arizona.edu/ for a powerful inverted indexing
engine and various related software.

Just curious, but other then specialized applications like
Glimpse, does anyone actually support/do this?

I dearly love Glimpse. (Sample things I use it for: rooting through
nearly 10 years worth of archived email; finding all references to a
particular name in the Postgres sources, almost instantly; ditto for the
even larger Ptolemy sources; looking for files that I can't remember
where I put ... it's great. And aren't the Postgres mailing list
archive indexes Glimpse-driven?)

I don't currently have any databases that could benefit from full-text
indexes. But I can think of applications where it'd be important,
particularly after we get rid of the limit on tuple sizes so that it
becomes reasonable to put fair-size chunks of text into database
entries. For example: would it be useful to put my email archive into
a Postgres database, one message per tuple? Maybe ... but if I can't
glimpse it afterwards, forgetaboutit.

You could probably glue something like this together from existing
spare parts, say by running a nightly cron job that dumps out the
text fields of your database for indexing by Glimpse. But it wouldn't
be integrated into SQL --- you'd have to query the index separately
outside of SQL, then use the results to drive a query to fetch the
selected records.

A seamless integration would make Glimpse indexes be a new type of
index associated with a new match operator, something like
create index index1 on table using glimpse (text_field);
select * from table where glimpse(text_field, 'pattern');
I have no idea how hard that would be...

regards, tom lane

#5The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#4)
Re: [HACKERS] Re: Mysql 321 - Mysql 322 - msql

On Fri, 27 Nov 1998, Tom Lane wrote:

The Hermit Hacker <scrappy@hub.org> writes:

On 27 Nov 1998, Tom Ivar Helbekkmo wrote:

See http://glimpse.cs.arizona.edu/ for a powerful inverted indexing
engine and various related software.

Just curious, but other then specialized applications like
Glimpse, does anyone actually support/do this?

I dearly love Glimpse. (Sample things I use it for: rooting through
nearly 10 years worth of archived email; finding all references to a
particular name in the Postgres sources, almost instantly; ditto for the
even larger Ptolemy sources; looking for files that I can't remember
where I put ... it's great. And aren't the Postgres mailing list
archive indexes Glimpse-driven?)

Nope, I use ht/Dig for it...

A seamless integration would make Glimpse indexes be a new type of
index associated with a new match operator, something like
create index index1 on table using glimpse (text_field);
select * from table where glimpse(text_field, 'pattern');
I have no idea how hard that would be...

Anyone? This one I'd love to see...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#6Tom
tom@sdf.com
In reply to: The Hermit Hacker (#3)
Re: [HACKERS] Re: Mysql 321 - Mysql 322 - msql

On Fri, 27 Nov 1998, The Hermit Hacker wrote:

Just curious, but other then specialized applications like
Glimpse, does anyone actually support/do this?

Well, Oracle has their ConText option that does stuff like this.

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

Tom

In reply to: Tom Lane (#4)
Re: [HACKERS] Re: Mysql 321 - Mysql 322 - msql

Tom Lane <tgl@sss.pgh.pa.us> writes:

I don't currently have any databases that could benefit from full-text
indexes. But I can think of applications where it'd be important,
particularly after we get rid of the limit on tuple sizes so that it
becomes reasonable to put fair-size chunks of text into database
entries. For example: would it be useful to put my email archive into
a Postgres database, one message per tuple? Maybe ... but if I can't
glimpse it afterwards, forgetaboutit.

Another very important application is the keeping of structured
documents in a database system. Advanced SGML environments do this,
and Philip Greenspun of MIT, the author of the excellent book
"Database Backed Web Sites" (see http://photo.net/ for information)
recommends doing it for HTML and other data for web publishing. The
web server AOLserver is just one example of an application that can do
this -- and if I'm not mistaken, AOLserver can even use PostgreSQL.

Anyway, once the data is in the database, and much of it is text, it
becomes very interesting to be able to efficiently index and search.

-tih
--
Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"

#8The Hermit Hacker
scrappy@hub.org
In reply to: Tom Ivar Helbekkmo (#7)

On Sat, 28 Nov 1998, John Fieber wrote:

In working with the two, I've also found a couple complicated
join queries where I just couldn't get the optimizer in
PostgreSQL (6.3.2 and 6.4) to do the right thing, resulting in
several minutes of processing per query, while mySQL did the same
query in the blink of an eye.

You mention v6.4 above, so could you provide us with a way of
"reproducing" the bug?

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#9John Fieber
jfieber@indiana.edu
In reply to: The Hermit Hacker (#8)
1 attachment(s)

On Sat, 28 Nov 1998, The Hermit Hacker wrote:

On Sat, 28 Nov 1998, John Fieber wrote:

In working with the two, I've also found a couple complicated
join queries where I just couldn't get the optimizer in
PostgreSQL (6.3.2 and 6.4) to do the right thing, resulting in
several minutes of processing per query, while mySQL did the same
query in the blink of an eye.

You mention v6.4 above, so could you provide us with a way of
"reproducing" the bug?

Attached is the database scheme from pg_dump (there are a bunch
of extraneous tables in the context of this query). I make no
claims at all about the quality of the database design or the
query, but both Sybase and mySQL execute it very quickly while
PostgreSQL refuses to use the index on the codes table.

$sheet in the query is the "user input" to and is an integer.

(For the curious, this is part of
http://fallout.campusview.indiana.edu/mapfinder. I can supply
data if needed.)

-john

SELECT
sheet.sheet_id,
sheet.name,
sheet.number,
sheet.note,
cat.call,
cat.series,
cat.main_entry,
sheet.scale,
ca.name as mtype,
cb.name as prod,
cc.name as proj,
cd.name as pm,
ce.name as format,
sheet.coords
FROM
sheet,
cat,
codes ca,
codes cb,
codes cc,
codes cd,
codes ce
WHERE
sheet.sheet_id = $sheet
AND sheet.cat_id = cat.cat_id
AND sheet.mtype = ca.code_id
AND sheet.prod = cb.code_id
AND sheet.proj = cc.code_id
AND sheet.pm = cd.code_id
AND sheet.format = ce.code_id

Attachments:

mf.schematext/plain; charset=US-ASCII; name=mf.schemaDownload
#10The Hermit Hacker
scrappy@hub.org
In reply to: John Fieber (#9)

On Sat, 28 Nov 1998, The Hermit Hacker wrote:

On Sat, 28 Nov 1998, John Fieber wrote:

In working with the two, I've also found a couple complicated
join queries where I just couldn't get the optimizer in
PostgreSQL (6.3.2 and 6.4) to do the right thing, resulting in
several minutes of processing per query, while mySQL did the same
query in the blink of an eye.

You mention v6.4 above, so could you provide us with a way of
"reproducing" the bug?

Attached is the database scheme from pg_dump (there are a bunch
of extraneous tables in the context of this query). I make no
claims at all about the quality of the database design or the
query, but both Sybase and mySQL execute it very quickly while
PostgreSQL refuses to use the index on the codes table.

$sheet in the query is the "user input" to and is an integer.

(For the curious, this is part of
http://fallout.campusview.indiana.edu/mapfinder. I can supply
data if needed.)

-john

SELECT
sheet.sheet_id,
sheet.name,
sheet.number,
sheet.note,
cat.call,
cat.series,
cat.main_entry,
sheet.scale,
ca.name as mtype,
cb.name as prod,
cc.name as proj,
cd.name as pm,
ce.name as format,
sheet.coords
FROM
sheet,
cat,
codes ca,
codes cb,
codes cc,
codes cd,
codes ce
WHERE
sheet.sheet_id = $sheet
AND sheet.cat_id = cat.cat_id
AND sheet.mtype = ca.code_id
AND sheet.prod = cb.code_id
AND sheet.proj = cc.code_id
AND sheet.pm = cd.code_id
AND sheet.format = ce.code_id

--0-1134614595-912283854=:795
Content-Type: TEXT/PLAIN; charset=US-ASCII; name="mf.schema"
Content-Transfer-Encoding: BASE64
Content-ID: <Pine.BSF.4.05.9811281510540.795@fallout.campusview.indiana.edu>
Content-Description: mapfinder schema
Content-Disposition: attachment; filename="mf.schema"

Q1JFQVRFIFRBQkxFICJvcGVuZmllbGRzIiAoInNoZWV0X2lkIiAiaW50NCIs
ICJjb2RlIiAiaW50MiIsICJ2YWx1ZSIgImludDQiKTsNCkNSRUFURSBUQUJM
RSAiY29kZXMiICgiY29kZV9pZCIgImludDQiLCAibmFtZSIgInRleHQiKTsN
CkNSRUFURSBUQUJMRSAiY2F0IiAoImNhdF9pZCIgImludDQiLCAiY2FsbCIg
InRleHQiLCAibWFpbl9lbnRyeSIgInRleHQiLCAic2VyaWVzIiAidGV4dCIp
Ow0KQ1JFQVRFIFRBQkxFICJzaGVldCIgKCJmaWxfbm8iICJpbnQ0IiwgInNo
ZWV0X2lkIiAiaW50NCIsICJuYW1lIiAidGV4dCIsICJudW1iZXIiICJ0ZXh0
IiwgIm5vdGUiICJ0ZXh0IiwgImhvbGRpbmdzIiAiaW50MiIsICJjYXRfaWQi
ICJpbnQ0IiwgIm10eXBlIiAiaW50MiIsICJwcm9kIiAiaW50MiIsICJwcm9q
IiAiaW50MiIsICJwbSIgImludDIiLCAiZm9ybWF0IiAiaW50MiIsICJzY2Fs
ZSIgImludDQiLCAiY29vcmRzIiAiYm94Iik7DQpDUkVBVEUgVEFCTEUgImdu
aXNfc3RhdGUiICgiaWQiICJpbnQ0IiwgImFiYnJldiIgdmFyY2hhcig0KSwg
Im5hbWUiICJ0ZXh0Iik7DQpDUkVBVEUgVEFCTEUgImduaXNfZnR5cGUiICgi
aWQiICJpbnQ0IiwgImFiYnJldiIgY2hhcig4KSwgIm5hbWUiICJ0ZXh0Iik7
DQpDUkVBVEUgVEFCTEUgImduaXNfY291bnR5IiAoImlkIiAiaW50NCIsICJz
dGF0ZSIgImludDQiLCAibmFtZSIgInRleHQiKTsNCkNSRUFURSBUQUJMRSAi
Z25pcyIgKCJmbmFtZSIgInRleHQiLCAiZm5hbWVfbGMiICJ0ZXh0IiwgImZ0
eXBlIiAiaW50NCIsICJzdGF0ZSIgImludDQiLCAiY291bnR5IiAiaW50NCIs
ICJlbGV2YXRpb24iICJpbnQ0IiwgInBvcHVsYXRpb24iICJpbnQ0IiwgImxv
Y2F0aW9uIiAicG9pbnQiKTsNCkNSRUFURSBGVU5DVElPTiAiY29kZXRleHQi
ICgiaW50MiIgKSBSRVRVUk5TICJ0ZXh0IiBBUyAnU0VMRUNUIGNvZGVzLm5h
bWUgd2hlcmUgY29kZXMuY29kZV9pZCA9ICQxOycgTEFOR1VBR0UgJ1NRTCc7
DQpDUkVBVEUgIElOREVYICJpX29wZW5maWVsZHMiIG9uICJvcGVuZmllbGRz
IiB1c2luZyBidHJlZSAoICJzaGVldF9pZCIgImludDRfb3BzIiApOw0KQ1JF
QVRFICBJTkRFWCAiaV9jb2RlcyIgb24gImNvZGVzIiB1c2luZyBidHJlZSAo
ICJjb2RlX2lkIiAiaW50NF9vcHMiICk7DQpDUkVBVEUgIElOREVYICJpX2Nh
dCIgb24gImNhdCIgdXNpbmcgaGFzaCAoICJjYXRfaWQiICJpbnQ0X29wcyIg
KTsNCkNSRUFURSAgSU5ERVggImlfc2hlZXQiIG9uICJzaGVldCIgdXNpbmcg
YnRyZWUgKCAic2hlZXRfaWQiICJpbnQ0X29wcyIgKTsNCkNSRUFURSAgSU5E
RVggImlfc2hlZXRuYW1lIiBvbiAic2hlZXQiIHVzaW5nIGJ0cmVlICggIm5h
bWUiICJ0ZXh0X29wcyIgKTsNCkNSRUFURSAgSU5ERVggImlfc2hlZXRjb29y
ZHMiIG9uICJzaGVldCIgdXNpbmcgcnRyZWUgKCAiY29vcmRzIiAiYm94X29w
cyIgKTsNCkNSRUFURSAgSU5ERVggImduaXNfc3RhdGVfaSIgb24gImduaXNf
c3RhdGUiIHVzaW5nIGJ0cmVlICggImlkIiAiaW50NF9vcHMiICk7DQpDUkVB
VEUgIElOREVYICJnbmlzX2NvdW50eV9pIiBvbiAiZ25pc19jb3VudHkiIHVz
aW5nIGJ0cmVlICggImlkIiAiaW50NF9vcHMiICk7DQpDUkVBVEUgIElOREVY
ICJnbmlzX2kiIG9uICJnbmlzIiB1c2luZyBidHJlZSAoICJmbmFtZV9sYyIg
InRleHRfb3BzIiApOw0K
--0-1134614595-912283854=:795--

#11Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Tom Lane (#4)
Re: [HACKERS] Re: Mysql 321 - Mysql 322 - msql

I don't currently have any databases that could benefit from full-text
indexes. But I can think of applications where it'd be important,
particularly after we get rid of the limit on tuple sizes so that it
becomes reasonable to put fair-size chunks of text into database
entries. For example: would it be useful to put my email archive into
a Postgres database, one message per tuple? Maybe ... but if I can't
glimpse it afterwards, forgetaboutit.

You could probably glue something like this together from existing
spare parts, say by running a nightly cron job that dumps out the
text fields of your database for indexing by Glimpse. But it wouldn't
be integrated into SQL --- you'd have to query the index separately
outside of SQL, then use the results to drive a query to fetch the
selected records.

We do have contrib/fulltextindex.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026