To: pgsql-bugs@postgresql.org
Subject: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)
Environment Information
* PostgreSQL Version: 14.5
* OS: Debian GNU/Linux 11 (bullseye)
* Architecture: aarch64 (8-core CPU)
* Memory: 8GB
* Storage Medium: eMMC
Problem Description
Data corruption occurred in the tb_workstation_message table (record guid: ce04bd3f-232d-4c7f-9a91-540d0e581649). Queries targeting this record fail with:
ERROR: invalid memory alloc request size 18446744073709551613
Corruption details of the record:
* Only the guid field remains valid;
* All int-type fields (e.g., upload_status) are set to 0;
* All timestamp-type fields (e.g., create_time) are reset to 2000-01-01 00:00:00;
* String-type fields (e.g., operator, content) are unreadable; the record becomes queryable again only after forcing those fields to NULL with an UPDATE.
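This pattern is consistent with the tuple's data area having been zeroed on disk: an int-type field reads as 0, a timestamp whose internal 8-byte value is 0 renders as 2000-01-01 00:00:00 (the PostgreSQL epoch), and a zeroed variable-length header yields a garbage length, presumably the source of the allocation request for 18446744073709551613 bytes (2^64 - 3, i.e. -3 read as unsigned). As a sketch of how the raw tuple can be inspected from SQL, assuming the pageinspect extension is available and using a placeholder block number:

-- A minimal sketch, assuming pageinspect can be installed; block 0 is a
-- placeholder for the block that actually holds the damaged row.
create extension if not exists pageinspect;

select lp, lp_len, t_xmin, t_xmax, t_infomask,
       encode(t_data, 'hex') as raw_tuple_bytes  -- a zeroed data area shows up here
from heap_page_items(get_raw_page('tb_workstation_message', 0));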
Timeline
* Record insertion: 2025-11-25 21:23:51
* Last modification: 2025-11-27 09:41:10
* Anomaly window: 2025-12-01 03:00:00 ~ 09:02:31 (record was intact before 03:00:00)
* First error detected: 2025-12-01 09:02:31 (triggered by a business cron job that polls for upload_status = 0 every 30 s)
Error Output (Query Example)
postgres=# select * from tb_workstation_message where guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';
ERROR: invalid memory alloc request size 18446744073709551613
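Selecting columns individually narrows down which fields trigger the failure; per the corruption details above, the fixed-width columns read back (as zeros and the epoch) while the string columns raise the error. A sketch:

-- Fixed-width columns: expected to read back (zeros / 2000-01-01).
select guid, upload_status, create_time
from tb_workstation_message
where guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';

-- Variable-length columns: expected to raise the allocation error,
-- since reading them means trusting the corrupted length headers.
select operator, content
from tb_workstation_message
where guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';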
Investigation Details
1. Database Logs: No output between 2025-12-01 05:37:05.932 ~ 09:02:31.533 (no abnormalities found).
2. Dynamic Library Check: An environment variable pointed libpq.so outside the PG install dir, but the business application does not depend on this library, so it is unlikely to have caused the corruption.
3. eMMC Storage Analysis: Used dd to extract the block holding the corrupted row (see the sketch after this list for locating it); hex analysis shows that only the guid bytes are valid and the rest are zero, confirming that the corruption exists on disk.
4. fstrim Check:
   - fstrim ran at 2025-12-01 00:27 (asynchronous, runs while the disk is idle); the anomaly occurred after this.
   - Reproduction attempt: simulated data insertion plus 10+ fstrim runs; the corruption could not be reproduced.
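For reference, a sketch of how the target file and block for the dd extraction can be located from SQL (the file path varies per cluster, and reading ctid assumes the tuple header is still intact):

-- Path of the table's main data file, relative to the data directory.
select pg_relation_filepath('tb_workstation_message');

-- Block number of the damaged row; ctid lives in the tuple header and
-- the WHERE clause only touches the intact guid column.
select (ctid::text::point)[0]::bigint as block_no
from tb_workstation_message
where guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';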
Impact
* Business failure: Query errors trigger business alerts.
* Backup failure: Database backups are affected by the corrupted data.
Additional Notes
* No system/app/PG restarts during the anomaly window.
* No writes to tb_workstation_message after 2025-12-01 05:34:00.
* No online operations/terminals between 05:34:00 ~ 09:02:31.
Request
We suspect this may relate to fstrim on eMMC (with PG 14.5) but cannot reproduce it. Could the community help analyze the root cause, or guide further troubleshooting?
[Attachment: postgresql_2025-12-01_000000(1).csv (739.5 KB)]
On Thu, 2025-12-18 at 14:42 +0800, 第108次明天 wrote:
> [...]
> * Backup failure: Database backups are affected by corrupted data.
> [...]
> We suspect this may relate to fstrim on eMMC (with PG 14.5) but cannot
> reproduce it. Could the community help analyze the root cause, or guide
> further troubleshooting?
"fstrim" should not destroy data...
I am wondering how this problem can lead to backup failure - unless you are using "pg_dump" for backups.
You probably figured out that you will have to delete the row to get rid of the problem.
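A sketch of that cleanup; the WHERE clause only needs to read the intact guid column, though even this can fail if the tuple header itself is too badly damaged:

-- Delete the damaged row by its intact key, avoiding the corrupted
-- string columns entirely.
delete from tb_workstation_message
where guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';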
In addition, you should dump and restore the database to a new cluster - who knows what
other data corruption is lurking there.
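Before migrating, a crude probe for further damage is to force a full read of every table: COPY detoasts every column of every row, so it trips over the same kind of corruption pg_dump would hit. A sketch (a server-side target file requires superuser or pg_write_server_files):

-- Full-table read probe; fails on the first corrupted tuple it meets.
copy tb_workstation_message to '/dev/null';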
I don't know what caused your problem, but I would suspect hardware failure.
Perhaps you should check or replace the disk.
Yours,
Laurenz Albe
> PostgreSQL Version: 14.5
That is way too old. Upgrade to 14.20.
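A minor-version upgrade replaces only the binaries (no dump/restore needed within 14.x); after restarting, confirm what is actually running:

-- Reports the minor release of the running server binaries.
show server_version;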
> or guide further troubleshooting?
Run the pg_checksums program to get a handle on the extent of the corruption:
https://www.postgresql.org/docs/14/app-pgchecksums.html
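Note that pg_checksums can only verify a cluster that has data checksums enabled (either at initdb time or later via pg_checksums --enable, with the server cleanly shut down). A quick check from SQL:

-- 'on' means data checksums are enabled and pg_checksums can verify them.
show data_checksums;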
> Database backups are affected by corrupted data.
I assume you are not using a good backup system then. pgBackRest, for
example, will not only check for corrupted pages during the backup, but
will report them and allow the backup to proceed.