# Reproducing ENOSPC Error in PostgreSQL

This script reproduces an `ENOSPC` ("No space left on device") error in PostgreSQL by exploiting filesystem-level extent allocation behavior under high-concurrency workloads. The issue is triggered by a combination of:

*   Mount option `allocsize=1M` on the `$PGDATA` mount point
*   Creation of many small tables (preallocating filesystem extents)
*   Parallel bulk inserts into a single large table (fragmenting free space)

This mimics real-world scenarios such as data migration or bulk ETL operations, where filesystem fragmentation leads to allocation failures even when total free space appears sufficient.

---

## Prerequisites

*   PostgreSQL 16.1 with `pgbench` installed
*   **XFS** filesystem (recommended)
*   `$PGDATA` and the WAL directory (`pg_wal`) located on different mount points
*   Mount option: `allocsize=1M` (or higher) on the `$PGDATA` mount point  
    *(This forces larger preallocation units, increasing fragmentation risk)*
*   Sufficient disk space (≥ 50 GB recommended)
*   Linux environment with `psql`, `pgbench`, `xargs`, `seq`, and `mkfs.xfs` (from xfsprogs)
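
A quick preflight check can catch missing tools before the run. The sketch below is a hypothetical helper (`require_cmds` is not part of any package). It only checks that each command is on `PATH` and does not check versions.

```bash
# Hypothetical helper: prints "missing: NAME" to stderr for every command that
# is not found on PATH, and returns non-zero if any were missing.
# It checks presence only, not versions.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, placing `require_cmds psql pgbench xargs seq mkfs.xfs || exit 1` at the top of the reproduction script aborts early if any tool is missing.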

---

## Key Factors for Reproduction

| Factor                    | Recommended Value   | Purpose                                                         |
| :------------------------ | :------------------ | :-------------------------------------------------------------- |
| `allocsize` mount option  | 1M                  | Forces large preallocations, increasing fragmentation risk      |
| Number of small tables    | 100–200             | Spreads preallocated extents across many allocation groups      |
| Parallel threads          | 50–150              | Increases concurrency and allocation contention                 |
| Total rows inserted       | 5M–10M              | Pushes insert size beyond available contiguous extents          |
| Filesystem                | XFS                 | Exhibits this behavior under high fragmentation                 |
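
Several of these factors concern contiguous free space, not total free space. One way to inspect contiguous free space is to calculate what share of free blocks sits in small free extents. The sketch below assumes that `xfs_db`'s `freesp` command prints a histogram whose data rows have five columns (`from`, `to`, `extents`, `blocks`, `pct`). `small_free_pct` is a hypothetical helper, and its threshold is given in filesystem blocks.

```bash
# Hypothetical helper: reads the free-space histogram printed by
#   xfs_db -r -c "freesp -s" DEVICE
# on stdin and prints the percentage of free blocks held in free extents whose
# upper bin bound ("to" column) is below THRESHOLD filesystem blocks.
# Assumes data rows have five columns: from, to, extents, blocks, pct;
# header and summary lines are skipped.
small_free_pct() {
  local threshold=$1
  awk -v t="$threshold" '
    NF == 5 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
      total += $4
      if ($2 + 0 < t + 0) small += $4
    }
    END {
      if (total == 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * small / total
    }
  '
}
```

For example, `xfs_db -r -c "freesp -s" /dev/pgdata_disk | small_free_pct 256` reports the share of free space in extents smaller than 256 blocks, which is 1 MiB with 4 KiB blocks. Comparing the output before Step 3 and after Step 6 can show whether free space became more fragmented. On a mounted filesystem, `xfs_db -r` reads on-disk state that may be slightly stale.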

---

## Environment Setup

Set up XFS filesystems on separate disks for PGDATA and PGWAL with appropriate mount options:

```bash
# Run as root; /dev/pgdata_disk, /dev/pgwal_disk and /dev/journal_disk are placeholders
# Format PGDATA disk with separate journal device and 128 allocation groups
mkfs.xfs -f -d agcount=128 -l logdev=/dev/journal_disk,size=64m /dev/pgdata_disk

# Format PGWAL disk
mkfs.xfs -f -d agcount=16 /dev/pgwal_disk

# Create mount points
mkdir -p /pgdata /pgwal

# Mount PGDATA with allocsize=1M
mount -t xfs -o logdev=/dev/journal_disk,allocsize=1048576 /dev/pgdata_disk /pgdata

# Mount PGWAL
mount -t xfs /dev/pgwal_disk /pgwal
```
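
After mounting, it is worth confirming that the options actually took effect. The hypothetical helper below reads `/proc/mounts`, or another file in the same format, and prints the `allocsize` option for a given mount point. The kernel may report the value in a normalized form, for example `allocsize=1024k` instead of `allocsize=1048576`.

```bash
# Hypothetical helper: succeeds if MOUNTPOINT is an XFS mount whose options
# include allocsize, and prints that option. Reads /proc/mounts unless a file
# in the same format is given as the second argument.
check_xfs_mount() {
  local mnt=$1 mounts=${2:-/proc/mounts}
  awk -v m="$mnt" '
    $2 == m {
      found = 1
      if ($3 != "xfs") { print "not xfs: " $3; exit 1 }
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++)
        if (opts[i] ~ /^allocsize=/) { print opts[i]; exit 0 }
      print "allocsize not set"; exit 1
    }
    # exit in a main rule still runs END; only report a missing mount if no line matched
    END { if (!found) { print "not mounted: " m; exit 1 } }
  ' "$mounts"
}
```

For example, `check_xfs_mount /pgdata` should print the `allocsize` option. You can also run `xfs_info /pgdata` to confirm that the filesystem reports `agcount=128`.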

Important configuration details:

* PGDATA filesystem: XFS with a separate journal (log) device, mounted with the `allocsize=1M` option
* Allocation groups: 128 AGs for PGDATA to increase fragmentation potential
* Separate mount: PGWAL on a different disk/filesystem to isolate WAL impact
* Disk sizing: the PGDATA disk should have sufficient space (≥ 50 GB recommended)
* PostgreSQL: use `/pgdata` as the data directory and place the WAL on `/pgwal`, for example with `initdb -D /pgdata --waldir=/pgwal`.
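
PostgreSQL has no `postgresql.conf` parameter for the WAL location. Instead, the location is chosen when the cluster is created with `initdb --waldir`, which makes `pg_wal` a symbolic link to that directory. The following is a minimal sketch of that setup. It assumes that `initdb` and `pg_ctl` are on the `postgres` user's `PATH` and that `/pgdata` and `/pgwal` are the empty mount points created above. `run` and `init_cluster` are hypothetical helper names, and setting `DRY_RUN=1` only prints the commands.

```bash
# Hypothetical helpers. Assumes initdb and pg_ctl are on the postgres user's
# PATH and that the target directories are empty mount points.
# With DRY_RUN=1, run() prints each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

init_cluster() {
  local datadir=${1:-/pgdata} waldir=${2:-/pgwal}
  # initdb runs as postgres, so both mount points must be owned by it
  run chown postgres:postgres "$datadir" "$waldir"
  # --waldir stores WAL in $waldir; $datadir/pg_wal becomes a symlink to it
  run sudo -u postgres initdb -D "$datadir" --waldir="$waldir"
  run sudo -u postgres pg_ctl -D "$datadir" start
}
```

Running `DRY_RUN=1; init_cluster /pgdata /pgwal` in a subshell is a cheap way to review the commands before executing them as root.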

---

## Reproduction Script

The following bash script reproduces the ENOSPC error. It assumes a running PostgreSQL server on `localhost:5432` and connects as the `postgres` user.

```bash
#!/usr/bin/env bash
# Connection settings picked up by every psql and pgbench call below
export PGUSER=postgres PGHOST=localhost PGPORT=5432 PGDATABASE=postgres

# Step 1: create the pgbench tables (no data); pgbench_accounts is the copy source
echo "preparing data.."
pgbench -i -I t postgres

# Step 2: insert 200,000 baseline rows into pgbench_accounts
psql -c "INSERT INTO pgbench_accounts(aid,bid,abalance,filler) SELECT gs.i AS aid,NULL,0,substring(md5(random()::text),0,84) from generate_series(1, 200000) gs(i)"

# Step 3: create 128 small tables, 12 at a time, as copies of pgbench_accounts
# (preallocates extents across AGs); xargs replaces {} with the table number
seq 1 128 | xargs -r -P 12 -I {} psql -c "create table pgbench_accounts{} as select * from pgbench_accounts" > /dev/null

# Step 4: drop the original pgbench tables (pgbench_accounts1..128 are kept)
pgbench -i -I d postgres

# Step 5: load parameters; THREADS concurrent psql sessions are opened,
# so max_connections must allow at least THREADS connections
echo "reproducing.."
THREADS=100
PARTS=100
TOTAL=6000000
RANGE=$((TOTAL / PARTS))

# Step 6: insert 6M rows in 100 parallel batches into pgbench_accounts1
seq 1 "$PARTS" | xargs -r -P "$THREADS" -I {} psql -c "INSERT INTO pgbench_accounts1(aid,bid,abalance,filler) SELECT ({}*$RANGE)::integer+gs.i AS aid,NULL,0,substring(md5(random()::text),0,84) from generate_series(1, $RANGE) gs(i)" > /dev/null

# Step 7: final single-session insert of 200,000 rows into pgbench_accounts1
psql -c "INSERT INTO pgbench_accounts1(aid,bid,abalance,filler) SELECT gs.i AS aid,NULL,0,substring(md5(random()::text),0,84) from generate_series(1, 200000) gs(i)" > /dev/null
```
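
You can check the batch arithmetic of Step 6 on its own. The hypothetical helper `batch_range` mirrors the SQL expression, which offsets batch `i` by `i * RANGE` and adds `gs.i` from 1 to `RANGE`. It prints the first and last `aid` that batch `i` writes.

```bash
# Hypothetical helper mirroring the Step 6 arithmetic: RANGE = TOTAL / PARTS
# (integer division), and batch i inserts aid = i*RANGE + gs.i for
# gs.i = 1..RANGE. Prints the first and last aid written by batch i.
batch_range() {
  local i=$1 total=$2 parts=$3
  local range=$(( total / parts ))
  echo "$(( i * range + 1 )) $(( i * range + range ))"
}
```

With the defaults (`TOTAL=6000000`, `PARTS=100`), batch 1 writes `aid` 60001 to 120000 and batch 100 writes 6000001 to 6060000, so Step 6 adds exactly 6,000,000 rows. If `TOTAL` is not a multiple of `PARTS`, the integer division drops the remainder. The overlap with the 200,000 rows copied in Step 3 is harmless because `CREATE TABLE AS` does not copy constraints, so `pgbench_accounts1` has no primary key.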

---

## Important Notes

1. Step 3 creates 128 tables in parallel. This spreads files across many XFS allocation groups (AGs) and triggers a large number of speculative (delayed) preallocations.
2. Step 6 triggers the actual failure, with the message `FATAL: could not extend file "base/xxxxx/xxxxxxxxx.xxxxx" with FileFallocate(): No space left on device`. The small tables have fragmented free space, so the filesystem cannot find a large enough contiguous free region. This happens even though total free space is high, because much of that space is still held by open file descriptors and is therefore unavailable.
3. Step 7 should complete successfully without the `ENOSPC` error, which proves that there is enough space for this last step.
4. After a crash or restart, the space is reclaimed because the file descriptors are released. This makes the issue appear intermittent, but the root cause is filesystem fragmentation due to speculative preallocation, not actual disk exhaustion.
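
Notes 2 and 4 attribute the missing space to allocations held by files that are still open. One rough way to watch this, assuming GNU `find` and `stat`, is to sum the bytes each file has allocated on disk beyond its logical size. On XFS this surplus can include speculative preallocation past EOF. `overalloc_bytes` is a hypothetical helper, and it does not see space held by files that were deleted but are still open.

```bash
# Hypothetical helper: sums, over all regular files under DIR, the bytes
# allocated on disk beyond each file's logical size
# (st_blocks * block unit - st_size), using GNU stat format codes:
# %b = allocated blocks, %B = bytes per %b unit, %s = size in bytes.
# Files with less allocation than size (sparse files) add nothing.
overalloc_bytes() {
  local dir=$1
  find "$dir" -type f -exec stat -c '%b %B %s' {} + |
    awk '{ extra = $1 * $2 - $3; if (extra > 0) sum += extra }
         END { printf "%.0f\n", sum + 0 }'
}
```

For example, run `overalloc_bytes /pgdata/base` while Step 6 is in progress and again after a restart. Comparing the two results can show whether the surplus is released along with the file descriptors. Comparing `du -s --apparent-size /pgdata` with `du -s /pgdata` gives a similar picture for the whole tree.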
