slow sql query for big items

Started by Hua W Peng3 months ago3 messagesgeneral

Jump to latest

Hua W Peng

huawaltp@gmail.com

3 months ago

Hello,

I have a common table for telemetry data. the stru is:

Column | Type | Collation | Nullable |
Default

------------------------+--------------------------+-----------+----------+---------

record_time | timestamp with time zone | | not null |

station_name | text | | |

feeder_gis_id | text | | |

switch_name | text | | |

switch_oid | text | | not null |

switch_gis_id | text | | |

switch_status | integer | | |

switch_status_quality | integer | | |

active_power | numeric(18,6) | | |

active_power_quality | integer | | |

reactive_power | numeric(18,6) | | |

reactive_power_quality | integer | | |

current_a | numeric(18,6) | | |

current_a_quality | integer | | |

current_b | numeric(18,6) | | |

current_b_quality | integer | | |

current_c | numeric(18,6) | | |

current_c_quality | integer | | |

voltage_uab | numeric(18,6) | | |

voltage_uab_quality | integer | | |

voltage_ubc | numeric(18,6) | | |

voltage_ubc_quality | integer | | |

voltage_uca | numeric(18,6) | | |

voltage_uca_quality | integer | | |

created_at | timestamp with time zone | | |
now()

Indexes:

"dms_data_gzdy_pkey" PRIMARY KEY, btree (record_time, switch_oid)

"dms_data_gzdy_record_time_idx" btree (record_time DESC)

"idx_dms_feeder_gis_id" btree (feeder_gis_id, record_time)

"idx_dms_station_name" btree (station_name, record_time)

"idx_dms_switch_oid" btree (switch_oid, record_time)

Data records are growing by about *10 million* every day, reaching *300
million* per month. In this case, even a simple COUNT(*) query becomes
extremely slow, taking about 7-8 minutes to finish.

I am running PostgreSQL 14 on Ubuntu 22.04 with a 24GB shared buffer.

And, though in our test env we have timescaledb enabled:

Triggers:

ts_insert_blocker BEFORE INSERT ON dms_data_gzdy FOR EACH ROW EXECUTE
FUNCTION _timescaledb_functions.insert_blocker()

Number of child tables: 9 (Use \d+ to list them.)

But in production env there is no timescaledb which can't be installed as
well.

Can you help me?

Thanks.

Laurenz Albe

laurenz.albe@cybertec.at

3 months ago

In reply to: Hua W Peng (#1)

Re: slow sql query for big items

On Sat, 2026-03-28 at 15:07 +0800, Hua W Peng wrote:

I have a common table for telemetry data. the stru is:
[25 columns]

Data records are growing by about 10 million every day, reaching 300 million per month.
In this case, even a simple COUNT(*) query becomes extremely slow, taking about
7-8 minutes to finish.
I am running PostgreSQL 14 on Ubuntu 22.04 with a 24GB shared buffer.
And, though in our test env we have timescaledb enabled:

Triggers:
ts_insert_blocker BEFORE INSERT ON dms_data_gzdy FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker()
Number of child tables: 9 (Use \d+ to list them.)

But in production env there is no timescaledb which can't be installed as well.

Can you help me?

First, a test environment should be as similar to production as possible,
otherwise it cannot serve its purpose.

There is little you can do about speeding up count(*), it is bound to be slow.
See https://www.cybertec-postgresql.com/en/postgresql-count-made-fast/

But if counting the rows is really your use case, you are doing something wrong.
I suspect that your real problem are other queries.

The way to get good performance with large tables it to have your queries use
an index scan. The indexes you need will depend on your queries, so without
knowing the queries, it is impossible to recommend anything.

Partitioning is not primarily a measuer for improving query performance,
but it would still be a smart idea, primarily to be able to delete old data
efficiently. It doesn't matter if you use TimescaleDB for partitioning or
use PostgreSQL's support directly.

Yours,
Laurenz Albe

Ron

ronljohnsonjr@gmail.com

3 months ago

In reply to: Hua W Peng (#1)

Re: slow sql query for big items

On Sat, Mar 28, 2026 at 3:07 AM Hua W Peng <huawaltp@gmail.com> wrote:

Hello,

I have a common table for telemetry data. the stru is:

Column | Type | Collation | Nullable
| Default

------------------------+--------------------------+-----------+----------+---------

record_time | timestamp with time zone | | not null
|

station_name | text | |
|

feeder_gis_id | text | |
|

switch_name | text | |
|

switch_oid | text | | not null
|

switch_gis_id | text | |
|

switch_status | integer | |
|

switch_status_quality | integer | |
|

active_power | numeric(18,6) | |
|

active_power_quality | integer | |
|

reactive_power | numeric(18,6) | |
|

reactive_power_quality | integer | |
|

current_a | numeric(18,6) | |
|

current_a_quality | integer | |
|

current_b | numeric(18,6) | |
|

current_b_quality | integer | |
|

current_c | numeric(18,6) | |
|

current_c_quality | integer | |
|

voltage_uab | numeric(18,6) | |
|

voltage_uab_quality | integer | |
|

voltage_ubc | numeric(18,6) | |
|

voltage_ubc_quality | integer | |
|

voltage_uca | numeric(18,6) | |
|

voltage_uca_quality | integer | |
|

created_at | timestamp with time zone | | |
now()

Indexes:

"dms_data_gzdy_pkey" PRIMARY KEY, btree (record_time, switch_oid)

"dms_data_gzdy_record_time_idx" btree (record_time DESC)

"idx_dms_feeder_gis_id" btree (feeder_gis_id, record_time)

"idx_dms_station_name" btree (station_name, record_time)

"idx_dms_switch_oid" btree (switch_oid, record_time)

Data records are growing by about *10 million* every day, reaching *300
million* per month.

How many months of data?

Is the production table partitioned? If so, by what date range?

In this case, even a simple COUNT(*) query becomes extremely slow, taking
about 7-8 minutes to finish.

I am running PostgreSQL 14

What minor version?

on Ubuntu 22.04 with a 24GB shared buffer.

Is that 25% of total RAM?

What's the effective_cache_size?

And, though in our test env we have timescaledb enabled:

Triggers:

ts_insert_blocker BEFORE INSERT ON dms_data_gzdy FOR EACH ROW EXECUTE
FUNCTION _timescaledb_functions.insert_blocker()

Number of child tables: 9 (Use \d+ to list them.)

But in production env there is no timescaledb which can't be installed as
well.

Laurenz is right: installing and using timescale in your *test* system *tests
timescale*. Why are you testing timescale when you can't install it in prod?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!