Massive table bloat

Started by Markus Wollnyabout 16 years ago4 messagesgeneral
Jump to latest
#1Markus Wollny
Markus.Wollny@computec.de

Hi!

I've set up some system to track slow page executions in one of our (as
yet not live) web apps. The tracking itself is handled completely within
the database using a function. Within a very short time (approx. 1 week)
and although we haven't got that much traffic on our testpages, the
table in question as grown beyond a size of 23 GB, even though a SELECT
count(*) on it will tell me that it only contains 235 rows. I'm sure I
must be missing something obvious here...

Here's the DDL for the table:

CREATE TABLE stats.slowpages
(
url text NOT NULL,
lastexecduration integer NOT NULL,
avgslowexecduration integer,
execcount integer,
lastexectime timestamp without time zone,
site_id integer NOT NULL,
slowestexecduration integer,
totaltimespent bigint,
CONSTRAINT "slowpages_pkey" PRIMARY KEY (url)
)WITHOUT OIDS;

-- Indexes
CREATE INDEX idx_slowpages_duration ON stats.slowpages USING btree
(lastexecduration);
CREATE INDEX idx_slowpages_avgduration ON stats.slowpages USING btree
(avgslowexecduration);
CREATE INDEX idx_slowpages_execcount ON stats.slowpages USING btree
(execcount);
CREATE INDEX idx_slowpages_lastexec ON stats.slowpages USING btree
(lastexectime);
CREATE INDEX idx_slowpages_site ON stats.slowpages USING btree
(site_id);
CREATE UNIQUE INDEX uidx_slowpages_url_site ON stats.slowpages USING
btree (url, site_id);
CREATE INDEX idx_slowpages_totaltimespent ON stats.slowpages USING btree
(totaltimespent);

And this here is the function we use to insert or update entries in this
table:

CREATE or REPLACE FUNCTION "stats"."iou_slowpages"(
IN "_site_id" integer,
IN "_url" text,
IN "_duration" integer)
RETURNS void AS
$BODY$
BEGIN
LOOP

UPDATE stats.slowpages
SET avgslowexecduration =
((avgslowexecduration*execcount)+_duration)/(execcount+1)
, execcount = execcount+1
, lastexectime = now()
, lastexecduration = _duration
, totaltimespent = totaltimespent + _duration
, slowestexecduration = CASE WHEN _duration >
slowestexecduration
THEN _duration ELSE slowestexecduration END
WHERE url = _url AND site_id = _site_id;
IF found THEN
RETURN;
END IF;

BEGIN
INSERT INTO
stats.slowpages(url,lastexecduration,avgslowexecduration,slowestexecdura
tion,totaltimespent,execcount,lastexectime,site_id)
VALUES (_url, _duration, _duration,_duration,_duration, 1,
now(), _site_id);
RETURN;
EXCEPTION WHEN unique_violation THEN

END;
END LOOP;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

_site_id is a small integer value, _url is a full URL string to a page
and _duration is a value in milliseconds. We're on PostgreSQL 8.3.7.

Any idea about what I may be missing here?

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Furth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jurg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

#2Thom Brown
thombrown@gmail.com
In reply to: Markus Wollny (#1)
Re: Massive table bloat

On 3 March 2010 15:33, Markus Wollny <Markus.Wollny@computec.de> wrote:

Hi!

I've set up some system to track slow page executions in one of our (as
yet not live) web apps. The tracking itself is handled completely within
the database using a function. Within a very short time (approx. 1 week)
and although we haven't got that much traffic on our testpages, the
table in question as grown beyond a size of 23 GB, even though a SELECT
count(*) on it will tell me that it only contains 235 rows. I'm sure I
must be missing something obvious here...

Here's the DDL for the table:

CREATE TABLE stats.slowpages
(
url text NOT NULL,
lastexecduration integer NOT NULL,
avgslowexecduration integer,
execcount integer,
lastexectime timestamp without time zone,
site_id integer NOT NULL,
slowestexecduration integer,
totaltimespent bigint,
CONSTRAINT "slowpages_pkey" PRIMARY KEY (url)
)WITHOUT OIDS;

-- Indexes
CREATE INDEX idx_slowpages_duration ON stats.slowpages USING btree
(lastexecduration);
CREATE INDEX idx_slowpages_avgduration ON stats.slowpages USING btree
(avgslowexecduration);
CREATE INDEX idx_slowpages_execcount ON stats.slowpages USING btree
(execcount);
CREATE INDEX idx_slowpages_lastexec ON stats.slowpages USING btree
(lastexectime);
CREATE INDEX idx_slowpages_site ON stats.slowpages USING btree
(site_id);
CREATE UNIQUE INDEX uidx_slowpages_url_site ON stats.slowpages USING
btree (url, site_id);
CREATE INDEX idx_slowpages_totaltimespent ON stats.slowpages USING btree
(totaltimespent);

And this here is the function we use to insert or update entries in this
table:

CREATE or REPLACE FUNCTION "stats"."iou_slowpages"(
IN "_site_id" integer,
IN "_url" text,
IN "_duration" integer)
RETURNS void AS
$BODY$
BEGIN
   LOOP

       UPDATE stats.slowpages
              SET  avgslowexecduration =
((avgslowexecduration*execcount)+_duration)/(execcount+1)
              ,    execcount = execcount+1
              ,    lastexectime = now()
              ,    lastexecduration = _duration
              ,    totaltimespent = totaltimespent + _duration
              ,    slowestexecduration = CASE WHEN _duration >
slowestexecduration
                   THEN _duration ELSE slowestexecduration END
              WHERE url = _url AND site_id = _site_id;
       IF found THEN
           RETURN;
       END IF;

       BEGIN
           INSERT INTO
stats.slowpages(url,lastexecduration,avgslowexecduration,slowestexecdura
tion,totaltimespent,execcount,lastexectime,site_id)
           VALUES (_url, _duration, _duration,_duration,_duration, 1,
now(), _site_id);
           RETURN;
       EXCEPTION WHEN unique_violation THEN

       END;
   END LOOP;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

_site_id is a small integer value, _url is a full URL string to a page
and _duration is a value in milliseconds. We're on PostgreSQL 8.3.7.

Any idea about what I may be missing here?

Kind regards

  Markus

If you update rows, it actually creates a new version of it. The old
one doesn't get removed until the VACUUM process cleans it up, so
maybe you need to run that against the database?

Regards

Thom

#3Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Markus Wollny (#1)
Re: Massive table bloat

do a vacuum analyze verbose on it, and see if it complains about FSM (free
space map) setting. Which it probably will be.

#4Markus Wollny
Markus.Wollny@computec.de
In reply to: Thom Brown (#2)
Re: Massive table bloat

-----Ursprüngliche Nachricht-----
Von: Thom Brown [mailto:thombrown@gmail.com]
Gesendet: Mittwoch, 3. März 2010 16:56
An: Markus Wollny
Cc: pgsql-general@postgresql.org
Betreff: Re: [GENERAL] Massive table bloat

If you update rows, it actually creates a new version of it.
The old one doesn't get removed until the VACUUM process
cleans it up, so maybe you need to run that against the database?

I already do on a nightly basis (which is probably not often enough in this case) and have got autovacuum running. I'll check into FSM settings as suggested by Grzegorz Jaśkiewicz, there's probably half a solution to the problem there, the other half being probably the autovacuum daemon not visiting this table nearly often enough.

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276