large number dead tup - Postgres 9.5

Started by Patrick B over 9 years ago · 5 messages · general
#1Patrick B
patrickbakerbr@gmail.com

Hi guys,

select schemaname,relname,n_live_tup, n_dead_tup from pg_stat_all_tables
where relname = 'parts';

 schemaname | relname | n_live_tup | n_dead_tup
------------+---------+------------+------------
 public     | parts   |  191623953 |  182477402

See the large number of dead_tup?

My autovacuum parameters are:

"autovacuum_vacuum_threshold" : "300",

"autovacuum_analyze_threshold" : "200",
"autovacuum_vacuum_scale_factor" : "0.005",
"autovacuum_analyze_scale_factor" : "0.002",

Table size: 68 GB

Why does that happen? Shouldn't autovacuum take care of the dead tuples?

Because of that the table is very slow...
When I do a select on that table it doesn't use an index, for example:

\d parts;

"index_parts_id" btree (company_id)
"index_parts_id_and_country" btree (company_id, country)

explain select * from parts WHERE company_id = 12;

Seq Scan on parts (cost=0.00..6685241.40 rows=190478997 width=223)

Filter: (company_id = 12)

Thanks
Patrick

#2Venkata B Nagothi
nag1010@gmail.com
In reply to: Patrick B (#1)
Re: large number dead tup - Postgres 9.5

On Mon, Sep 12, 2016 at 9:17 AM, Patrick B <patrickbakerbr@gmail.com> wrote:

Hi guys,

select schemaname,relname,n_live_tup, n_dead_tup from pg_stat_all_tables
where relname = 'parts';

 schemaname | relname | n_live_tup | n_dead_tup
------------+---------+------------+------------
 public     | parts   |  191623953 |  182477402

See the large number of dead_tup?

My autovacuum parameters are:

"autovacuum_vacuum_threshold" : "300",

"autovacuum_analyze_threshold" : "200",
"autovacuum_vacuum_scale_factor" : "0.005",
"autovacuum_analyze_scale_factor" : "0.002",

Table size: 68 GB

Why does that happen? Autovacuum shouldn't take care of dead_tuples?

Can you check whether the table is getting vacuumed regularly at all? When
were the last_autovacuum and last_autoanalyze times?
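
For reference, a query along these lines against pg_stat_user_tables will show those timestamps (column names as in 9.5; 'parts' taken from your output above):

```sql
SELECT relname,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze,
       vacuum_count, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'parts';
```

If last_autovacuum is NULL or very old despite the aggressive scale factors, something is preventing autovacuum from running or completing.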

Because of that the table is very slow...
When I do a select on that table it doesn't use an index, for example:

\d parts;

"index_parts_id" btree (company_id)
"index_parts_id_and_country" btree (company_id, country)

explain select * from parts WHERE company_id = 12;

Seq Scan on parts (cost=0.00..6685241.40 rows=190478997 width=223)

Filter: (company_id = 12)

That is likely because VACUUM and ANALYZE have not been running. Did you run
a manual VACUUM ANALYZE and check whether the query then picks up the index?
That will only happen if "company_id" is selective enough (close to unique
values).

Regards,
Venkata B N

Fujitsu Australia

#3Francisco Olarte
folarte@peoplecall.com
In reply to: Patrick B (#1)
Re: large number dead tup - Postgres 9.5

Hi:

On Mon, Sep 12, 2016 at 1:17 AM, Patrick B <patrickbakerbr@gmail.com> wrote:

 schemaname | relname | n_live_tup | n_dead_tup
------------+---------+------------+------------
 public     | parts   |  191623953 |  182477402

...

Because of that the table is very slow...
When I do a select on that table it doesn't use an index, for example:
\d parts;

"index_parts_id" btree (company_id)
"index_parts_id_and_country" btree (company_id, country)

explain select * from parts WHERE company_id = 12;

Seq Scan on parts (cost=0.00..6685241.40 rows=190478997 width=223)
Filter: (company_id = 12)

You've already been directed to check whether the table is really getting
vacuumed / analyzed, but I'd like to point out that if the count estimates
are nearly correct, that plan is good. It estimates the query returns more
than 99% of the table, and a seq scan easily beats an index scan when
selecting that large a fraction of the table. Even accounting for dead
tuples it's still about 50% of the table, and a seq scan is much faster PER
TUPLE than an index scan ( an index scan would likely touch every
data page anyway for that big a fraction, so reading all of them sequentially
and doing a quick filter is cheaper ).
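
One way to see this for yourself: disable seq scans for the current transaction and compare the planner's cost estimates (this only changes the cost penalty, it does not make the index plan fast):

```sql
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN SELECT * FROM parts WHERE company_id = 12;
ROLLBACK;
```

If the forced index/bitmap plan shows a much higher cost, the planner's preference for the seq scan is consistent with those row estimates.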

Francisco Olarte.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Akash Bedi
abedi0501@gmail.com
In reply to: Francisco Olarte (#3)
Re: large number dead tup - Postgres 9.5

Note that VACUUM cannot remove dead rows while there is a long-running
active query, OR any idle transaction at isolation level Repeatable Read or
higher; tracking transactions in "pg_stat_activity" should help you find and
eliminate such activity. Also, the row estimates take the size of your table
into account, so close estimates don't necessarily mean an ANALYZE has been
performed. A better check is to monitor "pg_stat_user_tables" and see when
autovacuum/autoanalyze last ran on this table.
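
A sketch of that check, using 9.5's pg_stat_activity columns: any session with backend_xmin set is holding back the horizon up to which VACUUM can remove dead tuples:

```sql
SELECT pid, state, backend_xmin,
       now() - xact_start AS xact_age,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start;
```

Old "idle in transaction" sessions near the top of that list are the usual culprits.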

Regards,
Akash


#5Melvin Davidson
melvin6925@gmail.com
In reply to: Akash Bedi (#4)
Re: large number dead tup - Postgres 9.5


Just out of curiosity, rather than relying on autovacuum, have you considered
scheduling a cron job to do a manual vacuum / analyze during off-peak hours?
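
For example, a crontab entry on the database host could invoke the vacuumdb client (the database name "mydb" here is a placeholder):

```shell
# Run at 03:00 every day; -z also runs ANALYZE, -t limits it to one table
0 3 * * * vacuumdb -z -t parts mydb
```

This doesn't replace autovacuum, but it guarantees at least one full pass per day regardless of the thresholds.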

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.