GROUP BY / ORDER BY string is very slow

Started by Oleg Broytmannalmost 27 years ago3 messages
#1Oleg Broytmann
phd@sun.med.ru

Hello!

I use a modified version of contrib/apache_logginig. The table I am
using:

CREATE TABLE combinedlog (
host text,
accdate abstime,
request text,
authuser text,
cookie text,
referer text,
useragent text,
stime int2,
status int2,
bytes int4
);

Once a month I run a very simple script to put WWW logs into the table.
The very loading is not fast (I am running postmaster with -F, I use
BEGIN/END and I drop indicies before loading. I remember when I started
using BEGIN/END loading speed up a bit, but not significantly), but is not
my biggest concern. What is worse is spped of my queries.
After inserting into the table, I run this shell script:

sel_f() {
field=$1
psql -d ce_wwwlog -c "SELECT COUNT($field), $field FROM raw_combinedlog GROUP BY 2 ORDER BY 1 DESC;"
}

for i in host request referer useragent; do
sel_f $i > by-$i
done

This works very, very slow. I tried to use indicies:

CREATE INDEX host ON combinedlog (host);
CREATE INDEX request ON combinedlog (request);
CREATE INDEX referer ON combinedlog (referer);
CREATE INDEX useragent ON combinedlog (useragent);

but indicies do not help much, and indexing time is so big, that sum of
CREATE INDEX + SELECT is even bigger :(

Why is it so slow? How can I speed it up?

I am running postgres compiled with --enable-locale. that is, for every
string comparision there are 2 (two) malloc calls and one strcoll. Can I
increase speed turning strcoll off? If so, postgres need a SET command to
turn localization temporary off. I can hack in, as I already rewrote
localization stuff a year ago. The only thing I want to hear from postgres
community (hackers, actually :) is how it should be named:
SET STRCOLL=off
or such?
I remember when I submitted my locale patch there was a discussion on
how to do it the Right Way, but I didn't remember the conlusion. What
finally we decided? I want to add command (if I should to) that is
compliant with other stuff here.

Oleg.
----
Oleg Broytmann http://members.tripod.com/~phd2/ phd2@earthling.net
Programmers don't die, they just GOSUB without RETURN.

#2Thomas G. Lockhart
lockhart@alumni.caltech.edu
In reply to: Oleg Broytmann (#1)
Re: [HACKERS] GROUP BY / ORDER BY string is very slow

Once a month I run a very simple script to put WWW logs into the
table.
What is worse is spped of my queries.
Why is it so slow? How can I speed it up?

Are you running vacuum after removing your indices? If you don't then
the table storage area does not actually shrink.

I am running postgres compiled with --enable-locale. that is, for
every string comparision there are 2 (two) malloc calls and one
strcoll. Can I increase speed turning strcoll off? If so, postgres
need a SET command to turn localization temporary off.

How can you "turn localization off" if you have localized strings in
your database? If you build indices without localization, then turn
localization back on, the things are probably hopelessly out of order.

I remember when I submitted my locale patch there was a discussion on
how to do it the Right Way, but I didn't remember the conlusion. What
finally we decided? I want to add command (if I should to) that is
compliant with other stuff here.

Is the Right Way to implement the NATIONAL CHARACTER type rather than
having the CHAR type be localized? That way, you could have both types
in the same database. Or is that SQL92 feature not widely used or
useful?

- Tom

#3Oleg Broytmann
phd@sun.med.ru
In reply to: Thomas G. Lockhart (#2)
Re: [HACKERS] GROUP BY / ORDER BY string is very slow

Hello!

On Fri, 22 Jan 1999, Thomas G. Lockhart wrote:

Once a month I run a very simple script to put WWW logs into the
table.
What is worse is spped of my queries.
Why is it so slow? How can I speed it up?

Are you running vacuum after removing your indices? If you don't then
the table storage area does not actually shrink.

Yes, I did a dozen of experiments, running VACUUM with and without
indices. VACUUM helped a bit, but not much...

I am running postgres compiled with --enable-locale. that is, for
every string comparision there are 2 (two) malloc calls and one
strcoll. Can I increase speed turning strcoll off? If so, postgres
need a SET command to turn localization temporary off.

How can you "turn localization off" if you have localized strings in
your database? If you build indices without localization, then turn
localization back on, the things are probably hopelessly out of order.

What are "localized strings"?
In this particular database there are only strings from WWW-log. If I
could turn localization off, I would turn it off for this entire db
forever.

Is the Right Way to implement the NATIONAL CHARACTER type rather than
having the CHAR type be localized? That way, you could have both types
in the same database. Or is that SQL92 feature not widely used or
useful?

In this particular case NATIONAL CHARACTER (actually, non-NATIONAL
CHARACTER) is a solution. Not sure about other cases.

Oleg.
----
Oleg Broytmann National Research Surgery Centre http://sun.med.ru/~phd/
Programmers don't die, they just GOSUB without RETURN.