RFC: PostgreSQL Storage I/O Transformation Hooks
Infrastructure for a Technical Protocol Between RDBMS Core and Data
Security Experts
*Author:* Henson Choi assam258@gmail.com
*Date:* 2025-12-28
*PostgreSQL Version:* master (Development)
------------------------------
1. Summary & Motivation
This RFC proposes the introduction of minimal hooks into the PostgreSQL
storage layer and the addition of a *Transformation ID* field to the
PageHeader.
A Diplomatic Protocol Between Expert Groups
The core motivation of this proposal is *“Separation of Concerns and Mutual
Respect.”*
Historically, discussions around Transparent Data Encryption (TDE) have
often felt like putting security experts on trial in a foreign
court—specifically, the “Court of RDBMS.” It is time to treat them not as
defendants to be judged by database-specific rules, but as an *equal
neighboring community* with their own specialized sovereignty.
*The issue has never been a failure of technology, but rather a
misplacement of the focal point.* While previous discussions were mired in
the technicalities of “how to hardcode encryption into the core,” this
proposal shifts the debate toward an architectural solution: “what
interface the core should provide to external experts.”
- *RDBMS Experts* provide a trusted pipeline responsible for data I/O
paths and consistency.
- *Security Experts* take responsibility for the specialized domain of
encryption algorithms and key management.
This hook system functions as a *Technical Protocol*—a high-level agreement
that allows these two expert groups to exchange data securely without
encroaching on each other’s territory.
------------------------------
2. Design Principles
1. *Delegation of Authority:* The core remains independent of specific
encryption standards, providing a “free territory” where security experts
can respond to an ever-changing security landscape.
2. *Diplomatic Convention:* The Transformation ID acts as a
communication protocol between the engine and the extension. The engine
uses this ID to identify the state of the data and hands over control to
the appropriate expert (the extension).
3. *Minimal Interference:* Overhead is kept near zero when hooks are not
in use, ensuring the native performance of the PostgreSQL engine.
------------------------------
3. Proposal Specifications
3.1 The Interface (Hook Points)
We allow intervention by security experts through five contact points along
the I/O path:
- *Read/Write Hooks:* mdread_post, mdwrite_pre, mdextend_pre
(Transformation of the data area)
- *WAL Hooks:* xlog_insert_pre, xlog_decode_pre (Transformation of
transaction logs)
3.2 The Protocol Identifier (PageHeader Transformation ID)
We allocate 5 bits of pd_flags to define the “Security State” of a page.
This serves as a *Status Message* sent by the security expert to the
engine, utilized for key versioning and as a migration marker.
------------------------------
4. Reference Implementation: contrib/test_tde
A Standard Code of Conduct for Security Experts
This reference implementation exists not as a commercial product, but to
define the *Standards of the Diplomatic Protocol* that
encryption/decryption experts must follow when entering the PostgreSQL
domain.
1. *Deterministic IV Derivation:* Demonstrates how to achieve
cryptographic safety by trusting unique values provided by the engine
(e.g., LSN).
2. *Critical Section Safety:* Defines memory management regulations that
security logic must follow within “Critical Sections” to maintain system
stability.
3. *Hook Chaining:* Demonstrates a cooperative structure that allows
peaceful coexistence with other expert tools (e.g., compression, auditing).
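As a rough illustration of the first point, the sketch below builds a 16-byte counter-mode IV from a page LSN and a block number. The layout, the function names, and the choice to include the block number are assumptions made for this example; they are not taken from contrib/test_tde. The scheme is only safe if the same (LSN, block number) pair is never reused under the same key.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IV_LEN 16

/* Store a 64-bit value big-endian so the IV is identical on every platform. */
static void
put_be64(uint8_t *dst, uint64_t v)
{
    for (int i = 7; i >= 0; i--)
    {
        dst[i] = (uint8_t) (v & 0xFF);
        v >>= 8;
    }
}

/*
 * Hypothetical IV layout (an assumption of this sketch):
 *   bytes 0-7   page LSN
 *   bytes 8-11  block number
 *   bytes 12-15 zero, left free for the CTR block counter
 */
static void
derive_iv(uint64_t page_lsn, uint32_t block_no, uint8_t iv[IV_LEN])
{
    assert(iv != NULL);
    memset(iv, 0, IV_LEN);
    put_be64(iv, page_lsn);
    iv[8] = (uint8_t) (block_no >> 24);
    iv[9] = (uint8_t) (block_no >> 16);
    iv[10] = (uint8_t) (block_no >> 8);
    iv[11] = (uint8_t) block_no;
}
```

The derivation is deterministic, so the read path can rebuild the same IV from values stored with the page instead of keeping a separate IV on disk.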
------------------------------
5. Scope
- *In-Scope:* Backend hook infrastructure, Transformation ID field, and
reference code demonstrating diplomatic protocol compliance.
- *Out-of-Scope:* Specific Key Management Systems (KMS), selection of
specific cryptographic algorithms, and integration with external tools.
This proposal represents a strategic diplomatic choice: rather than the
PostgreSQL core assuming all security responsibilities, it grants security
experts a *sovereign territory through extensions* where they can perform
at their best.
Hello,
Following up on the RFC, I am submitting the initial patch set for the
proposed infrastructure. These patches introduce a minimal hook-based
protocol to allow extensions to handle data transformation, such as TDE,
while keeping the PostgreSQL core independent of specific cryptographic
implementations.
Implementation Details:
Hook Points in Storage I/O Path
The patch introduces five strategic hook points:
- mdread_post_hook: Called after blocks are read from disk. The extension can
reverse-transform data in place.
- mdwrite_pre_hook & mdextend_pre_hook: Called before writing or extending
blocks. These hooks return a pointer to transformed buffers.
- xlog_insert_pre_hook & xlog_decode_pre_hook: Handle transformation for WAL
records during insertion and replay.
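The difference between the read and write hooks can be sketched in standalone C. The hook signatures below are hypothetical (the real ones are defined by the patch), and XOR stands in for a cipher. The read hook rewrites the page in place, while the write hook returns a pointer to a separate transformed buffer so the caller's page stays in plain form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ 8192

/* Hypothetical signatures, assumed for this sketch only. */
typedef void (*mdread_post_hook_type) (uint8_t *page, size_t len);
typedef const uint8_t *(*mdwrite_pre_hook_type) (const uint8_t *page, size_t len);

static mdread_post_hook_type mdread_post_hook = NULL;
static mdwrite_pre_hook_type mdwrite_pre_hook = NULL;

/* Scratch buffer owned by the extension, so the caller's page is not modified. */
static uint8_t write_scratch[BLCKSZ];

/* Toy stand-in for decryption: XOR with a constant is its own inverse. */
static void
toy_read_post(uint8_t *page, size_t len)
{
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0x5A;
}

/* Toy stand-in for encryption: writes the result into the scratch buffer. */
static const uint8_t *
toy_write_pre(const uint8_t *page, size_t len)
{
    assert(len <= BLCKSZ);
    for (size_t i = 0; i < len; i++)
        write_scratch[i] = page[i] ^ 0x5A;
    return write_scratch;
}

/* Call site as the storage layer might wrap it: pick the buffer to write. */
static const uint8_t *
buffer_for_write(const uint8_t *page, size_t len)
{
    return mdwrite_pre_hook ? mdwrite_pre_hook(page, len) : page;
}

/* Call site after a read: let the extension restore the page in place. */
static void
after_read(uint8_t *page, size_t len)
{
    if (mdread_post_hook)
        mdread_post_hook(page, len);
}
```

Returning a pointer on the write side means the page held in shared buffers is never overwritten with transformed bytes, so other backends can keep reading it while the write is in flight.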
Data Integrity and Checksum Protocol
To ensure robust error detection, the hooks follow a specific verification
protocol:
- On Write: The extension transforms the page, sets the Transform ID, then
recalculates the checksum on the transformed data.
- On Read: The extension verifies the on-disk checksum of the transformed
data first. After reverse-transformation, it clears the Transform ID and
recalculates the checksum for the plaintext data. This ensures corruption
is detected regardless of the transformation state.
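The ordering of these steps can be sketched with a toy page. Everything here is an assumption for illustration: ToyPage is not PostgreSQL's page header, the additive checksum is not the real page checksum, and XOR stands in for the cipher. Only the sequence of operations follows the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_LEN 32
#define TRANSFORM_ID_MASK  0x00F8   /* bits 3-7, as described in this thread */
#define TRANSFORM_ID_SHIFT 3

/* Toy page, not PostgreSQL's PageHeaderData. */
typedef struct ToyPage
{
    uint16_t checksum;
    uint16_t flags;
    uint8_t  payload[PAYLOAD_LEN];
} ToyPage;

/* Toy additive checksum over flags and payload (excludes the checksum field). */
static uint16_t
toy_checksum(const ToyPage *p)
{
    uint16_t sum = p->flags;

    for (size_t i = 0; i < PAYLOAD_LEN; i++)
        sum = (uint16_t) (sum + p->payload[i]);
    return sum;
}

/* Toy transform: XOR is its own inverse. */
static void
toy_xor(ToyPage *p)
{
    for (size_t i = 0; i < PAYLOAD_LEN; i++)
        p->payload[i] ^= 0xA5;
}

/* Write side: transform, set the ID, then checksum the transformed bytes. */
static void
prepare_for_write(ToyPage *p, uint16_t transform_id)
{
    assert(transform_id >= 1 && transform_id <= 31);
    toy_xor(p);
    p->flags = (uint16_t) ((p->flags & ~TRANSFORM_ID_MASK) |
                           (transform_id << TRANSFORM_ID_SHIFT));
    p->checksum = toy_checksum(p);
}

/* Read side: verify first, then reverse, clear the ID, and re-checksum. */
static bool
finish_read(ToyPage *p)
{
    if (p->checksum != toy_checksum(p))
        return false;           /* corruption caught before any reverse transform */
    toy_xor(p);
    p->flags &= (uint16_t) ~TRANSFORM_ID_MASK;
    p->checksum = toy_checksum(p);
    return true;
}
```

Verifying before the reverse transform matters because a damaged ciphertext page would otherwise decrypt into plausible-looking garbage that then receives a fresh, valid checksum.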
WAL Safety via XLR_BLOCK_ID_TRANSFORMED (251)
For WAL records, I have introduced a specific block ID (251) to mark
transformed data. If the decryption extension is not loaded, the WAL reader
will encounter this unknown block ID and fail fast, preventing the system
from incorrectly interpreting encrypted data as valid WAL records.
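A minimal model of that fail-fast behaviour follows. The first five constants match PostgreSQL's xlogrecord.h; 251 is the value proposed here. The hook signature, the DecodeResult enum, and the simplification that the block ID is the first byte of the record are assumptions of this sketch, not the patch's actual decoder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from PostgreSQL's xlogrecord.h. */
#define XLR_MAX_BLOCK_ID            32
#define XLR_BLOCK_ID_DATA_SHORT     255
#define XLR_BLOCK_ID_DATA_LONG      254
#define XLR_BLOCK_ID_ORIGIN         253
#define XLR_BLOCK_ID_TOPLEVEL_XID   252
/* Value proposed in this thread. */
#define XLR_BLOCK_ID_TRANSFORMED    251

typedef enum
{
    DECODE_OK,
    DECODE_NEEDS_TRANSFORM,
    DECODE_INVALID
} DecodeResult;

/* Hypothetical hook: returns true once the record is back in plain form. */
typedef bool (*xlog_decode_pre_hook_type) (uint8_t *rec, size_t len);
static xlog_decode_pre_hook_type xlog_decode_pre_hook = NULL;

static DecodeResult
classify_block_id(uint8_t block_id)
{
    if (block_id <= XLR_MAX_BLOCK_ID)
        return DECODE_OK;               /* ordinary block reference */
    if (block_id >= XLR_BLOCK_ID_TOPLEVEL_XID)
        return DECODE_OK;               /* 252..255: existing special IDs */
    if (block_id == XLR_BLOCK_ID_TRANSFORMED)
        return DECODE_NEEDS_TRANSFORM;
    return DECODE_INVALID;
}

/* Toy hook: pretends to restore the record to a plain block 0 reference. */
static bool
toy_decode_pre(uint8_t *rec, size_t len)
{
    (void) len;
    rec[0] = 0;
    return true;
}

/* Toy decoder: rec[0] is treated as the block ID. */
static bool
decode_record(uint8_t *rec, size_t len)
{
    assert(rec != NULL && len > 0);
    switch (classify_block_id(rec[0]))
    {
        case DECODE_OK:
            return true;
        case DECODE_NEEDS_TRANSFORM:
            /* Fail fast when no extension is loaded to reverse the transform. */
            return xlog_decode_pre_hook != NULL && xlog_decode_pre_hook(rec, len);
        default:
            return false;
    }
}
```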
PageHeader Transform ID (5-bit)
I have allocated bits 3-7 of pd_flags in the PageHeader for a Transform ID.
This allows the engine and extensions to identify the transformation state
of a page (e.g., key versioning or algorithm type) without attempting
decryption. It ensures backward compatibility: pages with Transform ID 0
are treated as standard untransformed pages.
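The bit layout can be made concrete with a pair of accessors. The three existing flag values are those in PostgreSQL's bufpage.h; the PD_TRANSFORM_ID_* names and the two helper functions are hypothetical names chosen for this sketch and may differ from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Existing pd_flags bits from PostgreSQL's bufpage.h (bits 0-2). */
#define PD_HAS_FREE_LINES   0x0001
#define PD_PAGE_FULL        0x0002
#define PD_ALL_VISIBLE      0x0004

/* Hypothetical names for the proposed 5-bit field in bits 3-7. */
#define PD_TRANSFORM_ID_SHIFT   3
#define PD_TRANSFORM_ID_MASK    0x00F8
#define PD_TRANSFORM_ID_MAX     31

/* Extract the Transform ID; 0 means a standard untransformed page. */
static inline uint8_t
page_get_transform_id(uint16_t pd_flags)
{
    return (uint8_t) ((pd_flags & PD_TRANSFORM_ID_MASK) >> PD_TRANSFORM_ID_SHIFT);
}

/* Return pd_flags with the Transform ID replaced and all other bits preserved. */
static inline uint16_t
page_set_transform_id(uint16_t pd_flags, uint8_t id)
{
    assert(id <= PD_TRANSFORM_ID_MAX);
    return (uint16_t) ((pd_flags & ~PD_TRANSFORM_ID_MASK) |
                       ((uint16_t) id << PD_TRANSFORM_ID_SHIFT));
}
```

Five bits give 31 non-zero states, so whatever the ID encodes (key version, algorithm, migration marker) has to fit in that range.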
Memory and Critical Section Safety
As demonstrated in the contrib/test_tde reference implementation, cipher
contexts are pre-allocated in _PG_init to avoid memory allocation during
critical sections. For WAL transformation,
MemoryContextAllowInCriticalSection() is used to allow buffer reallocation
within critical sections; if OOM occurs during buffer growth, it results in
a controlled PANIC.
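The pre-allocation half of this rule can be modelled outside PostgreSQL. In the sketch below, malloc stands in for PostgreSQL memory contexts, in_critical_section stands in for the backend's critical-section state, and all names are hypothetical. The point is only that every allocation happens in the init function, and the hot path touches nothing but memory that already exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BLCKSZ 8192

/* Toy cipher context; a real extension would keep its cipher handle here. */
typedef struct ToyCipherCtx
{
    uint8_t  key_byte;
    uint8_t *scratch;           /* BLCKSZ bytes, allocated once at init */
} ToyCipherCtx;

static ToyCipherCtx *ctx = NULL;
static bool in_critical_section = false;    /* stand-in for the backend's state */

/* Allocation wrapper that refuses to run inside a critical section. */
static void *
toy_alloc(size_t n)
{
    assert(!in_critical_section);
    return malloc(n);
}

/* Analogue of _PG_init: all allocation happens here, where failure is recoverable. */
static bool
toy_init(uint8_t key_byte)
{
    ctx = toy_alloc(sizeof(ToyCipherCtx));
    if (ctx == NULL)
        return false;
    ctx->scratch = toy_alloc(BLCKSZ);
    if (ctx->scratch == NULL)
    {
        free(ctx);
        ctx = NULL;
        return false;
    }
    ctx->key_byte = key_byte;
    return true;
}

/* Hot path: uses only pre-allocated memory, so it never calls toy_alloc. */
static const uint8_t *
toy_transform(const uint8_t *page, size_t len)
{
    assert(ctx != NULL && len <= BLCKSZ);
    for (size_t i = 0; i < len; i++)
        ctx->scratch[i] = page[i] ^ ctx->key_byte;
    return ctx->scratch;
}
```

The WAL case described above is different because record sizes vary, which is why the reference implementation opts into allocation inside critical sections there and accepts a PANIC on out-of-memory.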
Performance Considerations
When hooks are not set (default), the overhead is limited to a single NULL
pointer comparison per I/O operation. This is architecturally consistent
with existing PostgreSQL hooks and is designed to have a negligible impact
on performance.
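The pattern referred to here is the usual one for PostgreSQL hooks: a global function pointer that is NULL by default, with each extension saving the previous value and calling it from its own hook. A standalone sketch, with a hypothetical signature and two made-up extensions (an auditing counter and a toy transform):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature, assumed for this sketch only. */
typedef void (*mdread_post_hook_type) (uint8_t *page, size_t len);

static mdread_post_hook_type mdread_post_hook = NULL;

/* Core call site: with nothing installed, the only cost is this comparison. */
static void
call_read_hook(uint8_t *page, size_t len)
{
    assert(page != NULL);
    if (mdread_post_hook != NULL)
        mdread_post_hook(page, len);
}

/* Extension A: an auditing tool that counts pages read. */
static mdread_post_hook_type audit_prev_hook = NULL;
static int audit_pages_seen = 0;

static void
audit_read_post(uint8_t *page, size_t len)
{
    if (audit_prev_hook)
        audit_prev_hook(page, len);     /* run whatever was installed before us */
    audit_pages_seen++;
}

static void
audit_init(void)
{
    audit_prev_hook = mdread_post_hook;
    mdread_post_hook = audit_read_post;
}

/* Extension B: a toy transform that inverts every byte. */
static mdread_post_hook_type xform_prev_hook = NULL;

static void
xform_read_post(uint8_t *page, size_t len)
{
    if (xform_prev_hook)
        xform_prev_hook(page, len);
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0xFF;
}

static void
xform_init(void)
{
    xform_prev_hook = mdread_post_hook;
    mdread_post_hook = xform_read_post;
}
```

With this chaining, load order decides execution order, so extensions that must see plaintext (or ciphertext) need to agree on where in the chain they sit.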
Attached Patches:
v20251228-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch: Core
infrastructure.
v20251228-0002-Add-test_tde-extension-for-TDE-testing.patch: Reference
implementation using AES-256-CTR.
I look forward to your comments and feedback.
Regards,
Henson Choi
Attachments:
- v20251228-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch (+194 -2)
- v20251228-0002-Add-test_tde-extension-for-TDE-testing.patch (+1488 -3)
Updated patches with meson build support:
v2:
- Added meson.build for test_tde extension
- Added test_tde to contrib/meson.build
Regards,
Henson Choi
Attachments:
- v20251228-v2-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch (+194 -2)
- v20251228-v2-0002-Add-test_tde-extension-for-TDE-testing.patch (+1526 -3)
I wonder whether, instead of supporting a lot of extra hooks, it would be
better to provide an extensible SMGR API:
/messages/by-id/CAPP=Hha_wV1MV9yR70QZ5pk5dtNP+bOyBiFxPmrMKqnQeKMAwQ@mail.gmail.com
It seems to be a much more straightforward, convenient, and flexible
mechanism than adding hooks, and it can be used for many purposes
besides transparent encryption.
Hello!
I am glad to see that there are multiple TDE extension proposals being
worked on. For context, I am one of the developers working on the
pg_tde extension (https://github.com/percona/pg_tde), as well as on the
extensible SMGR proposal that Konstantin already linked.
This proposal covers two distinct areas of encryption and
extensibility: WAL, and buffer manager/table data. In earlier
discussions, opinions on adding extension points to these two areas
were quite different, so I'm not sure that bundling them together is
helpful.
It also appears to be missing some extension points that would be
required for a more complete encryption solution, such as encrypting
temporary files or system tables, or handling command-line utilities
like pg_waldump. Do you have ideas or patches in mind for those areas
as well?
I have the same question as Konstantin: why did you choose custom
hooks for the buffer manager instead of the already existing smgr
interface / extensibility patch? While that patch is not part of
core (though I hope it will be), it is already used by multiple
companies because it supports other use cases, not only encryption.
We plan to focus more on that thread early next year, and we would
appreciate any feedback or suggestions that could make it better for
others.
I also noticed that you added additional flags to the page header.
Initially we considered something like this, but decided that fork
files are a better place for any extra encryption (or other
storage-related) data. These few bits try to be generic, yet they are
restrictive because of the limited amount of data they can hold. That
data is also strictly per page: if I want something per file or per
page range, I still need a custom solution.
Regarding the WAL encryption part, we took a completely different
approach, similar to how we handle normal table data (page-based). I
will need to think more about this before I can provide meaningful
feedback on that part of the patch. One initial question, however, is
whether you have run detailed benchmarks with different workloads.
That seems to be the trickiest part, since most of the code runs in a
critical section. (I mean not the unused, empty-hook path, but the
overhead of a real encryption plugin using this hook in practice.)
Hi,
Here is v3 of the Storage I/O Transform Hooks patch.
Changes from v2:
- Fix -Wincompatible-pointer-types error in bufmgr.c by casting
&bufdata to (void **) for mdread_post_hook call
v2 changes were:
- Add meson.build test configuration for test_tde extension
--
Best regards,
Sungkyun Park
2025년 12월 28일 (일) PM 7:44, Henson Choi <assam258@gmail.com>님이 작성:
Show quoted text
Updated patches with meson build support:
v2:
- Added meson.build for test_tde extension
- Added test_tde to contrib/meson.buildRegards,
Henson Choi2025년 12월 28일 (일) PM 6:47, Henson Choi <assam258@gmail.com>님이 작성:
Hello,
Following up on the RFC, I am submitting the initial patch set for the
proposed infrastructure. These patches introduce a minimal hook-based
protocol to allow extensions to handle data transformation, such as TDE,
while keeping the PostgreSQL core independent of specific cryptographic
implementations.Implementation Details:
Hook Points in Storage I/O Path
The patch introduces five strategic hook points:mdread_post_hook: Called after blocks are read from disk. The extension
can reverse-transform data in place.mdwrite_pre_hook & mdextend_pre_hook: Called before writing or extending
blocks. These hooks return a pointer to transformed buffers.xlog_insert_pre_hook & xlog_decode_pre_hook: Handle transformation for
WAL records during insertion and replay.Data Integrity and Checksum Protocol
To ensure robust error detection, the hooks follow a specific
verification protocol:On Write: The extension transforms the page, sets the Transform ID, then
recalculates the checksum on the transformed data.On Read: The extension verifies the on-disk checksum of the transformed
data first. After reverse-transformation, it clears the Transform ID and
recalculates the checksum for the plaintext data. This ensures corruption
is detected regardless of the transformation state.WAL Safety via XLR_BLOCK_ID_TRANSFORMED (251)
For WAL records, I have introduced a specific block ID (251) to mark
transformed data. If the decryption extension is not loaded, the WAL reader
will encounter this unknown block ID and fail-fast, preventing the system
from incorrectly interpreting encrypted data as valid WAL records.PageHeader Transform ID (5-bit)
I have allocated bits 3-7 of pd_flags in the PageHeader for a Transform
ID. This allows the engine and extensions to identify the transformation
state of a page (e.g., key versioning or algorithm type) without attempting
decryption. It ensures backward compatibility: pages with Transform ID 0
are treated as standard untransformed pages.Memory and Critical Section Safety
As demonstrated in the contrib/test_tde reference implementation, cipher
contexts are pre-allocated in _PG_init to avoid memory allocation during
critical sections. For WAL transformation,
MemoryContextAllowInCriticalSection() is used to allow buffer reallocation
within critical sections; if OOM occurs during buffer growth, it results in
a controlled PANIC.Performance Considerations
When hooks are not set (default), the overhead is limited to a single
NULL pointer comparison per I/O operation. This is architecturally
consistent with existing PostgreSQL hooks and is designed to have a
negligible impact on performance.Attached Patches:
v20251228-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch: Core
infrastructure.
v20251228-0002-Add-test_tde-extension-for-TDE-testing.patch: Reference
implementation using AES-256-CTR.I look forward to your comments and feedback.
Regards,
Henson Choi
Attachments:
- v20251228-v3-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch (+194, -2)
- v20251228-v3-0002-Add-test_tde-extension-for-TDE-testing.patch (+1526, -3)
Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
Hi Konstantin,
I have great respect for the work being done on the extensible SMGR API.
It is a powerful tool for use cases that require replacing the entire
storage layer (like Neon's architecture).
However, I believe we should distinguish between Storage Management
(where/how data is stored) and Data Transformation (what the data looks
like). I see a strong case for both approaches to coexist for the
following practical reasons:
1. Separation of Concerns and Safety
Is it reasonable to ask cryptography experts to clone the entire SMGR
implementation and maintain code they don't fully understand just to
insert encryption logic? If an extension developer clones md.c to add
encryption, they become responsible for the fundamental integrity of
PostgreSQL's file I/O. Any bug in their cloned storage logic could lead
to data loss unrelated to encryption itself.
2. The Maintenance Debt of "Cloning"
When md.c receives critical security patches or bug fixes in the core,
every TDE extension maintainer would need to manually backport those
changes to their specific SMGR implementation. This creates a fragmented
ecosystem where security extensions might actually introduce storage
vulnerabilities by running outdated cloned logic.
3. Minimalist Integration
The hook approach allows crypto experts to focus strictly on transform()
and reverse_transform(). The complex storage orchestration remains with
the PostgreSQL core where it is most rigorously tested. This is a cleaner
separation of responsibilities: the core provides the trusted pipeline,
and the extension provides the specialized transformation.
Conclusion:
I believe these hooks provide a "low-barrier, high-safety" path for data
transformation that the SMGR API—by its very nature of being a full
replacement—cannot easily provide. Let's provide the SMGR for those who
want to reinvent the storage, and hooks for those who simply want to
secure the data.
Best regards,
Henson Choi
On Sun, Dec 28, 2025 at 9:11 PM, Konstantin Knizhnik <knizhnik@garret.ru> wrote:
I wonder whether, instead of supporting a lot of extra hooks, it would be
better to provide an extensible SMGR API:
/messages/by-id/CAPP=Hha_wV1MV9yR70QZ5pk5dtNP+bOyBiFxPmrMKqnQeKMAwQ@mail.gmail.com
It seems to be a much more straightforward, convenient, and flexible
mechanism than adding hooks, and it can be used for many purposes
other than transparent encryption.
> Is it reasonable to ask cryptography experts to clone the entire SMGR
> implementation and maintain code they don't fully understand just to
> insert encryption logic?
You don't have to clone the md.c logic with the recent smgr extension
patch, it does the same thing your patch does: it lets you hook into
it while still keeping the original md.c implementation. The
difference is that it doesn't add additional hooks to the API, instead
it makes all of the existing smgr/md.c functions hooks.
This also means that it lets different extensions work together in a
more generic way. For example an extension that wants to retrieve data
files from cloud storage when needed (prepending the original md.c
logic), and an encryption extension that wants to decrypt data after
loading it (appending to the original md.c logic) can both work
together while keeping the original logic in place.
Or if it's about mdwritev, in this patch you added a new
mdwrite_pre_hook - but it is executed at a specific point during
mdwrite. In the generic smgr patch, mdwritev itself (or smgr_writev
more specifically) is a hook, you can change it, and then call the
previous implementation (typically mdwritev) when you want it, either
before or after your custom code.
(the latest submitted version of the smgr patch doesn't use typical
postgres-style hooks, but that's one of the things we probably should
change. The intention is the same)
There's no maintenance fee of cloning, because neither extension
cloned the original md.c logic, both extended it.
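
The wrap-and-call-previous pattern described above can be modeled in a few lines. This is a Python sketch, not PostgreSQL code: `md_readv`, `install_cloud_fetch`, `install_decrypt`, the dict-backed storage, and the XOR "cipher" are all hypothetical stand-ins, and the actual smgr patch works with C function pointers.

```python
# Toy model: the smgr entry point is replaceable, and each extension keeps
# the previous implementation and calls it before or after its own logic.
# Every name is hypothetical; XOR stands in for a real cipher.

KEY = 0x5A
local_disk = {}                                   # blocks on local storage
cloud = {7: bytes(b ^ KEY for b in b"hello")}     # encrypted block kept remotely


def md_readv(blockno):
    """Stand-in for the stock md.c read path."""
    return local_disk[blockno]


smgr = {"readv": md_readv}                        # replaceable entry point


def install_cloud_fetch():
    """Extension A: run its own logic first, then the previous readv."""
    prev = smgr["readv"]

    def readv(blockno):
        if blockno not in local_disk:             # fetch on demand
            local_disk[blockno] = cloud[blockno]
        return prev(blockno)

    smgr["readv"] = readv


def install_decrypt():
    """Extension B: run the previous readv first, then its own logic."""
    prev = smgr["readv"]

    def readv(blockno):
        data = prev(blockno)
        return bytes(b ^ KEY for b in data)

    smgr["readv"] = readv


install_cloud_fetch()
install_decrypt()
print(smgr["readv"](7))      # b'hello'
```

Neither toy extension copies `md_readv`; each only decides whether its own step runs before or after the implementation it replaced.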
Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
Hi Zsolt,
Thank you for your detailed questions. I'll address each point:
1. Bundling WAL and Buffer Manager
WAL and heap pages are simply different representations of the same
underlying data. Protecting only one side would be cryptographically
incomplete; an attacker could bypass encryption by reading the
unprotected side. Therefore, they must be treated as a single atomic
unit of protection.
2. Scope: Temporary Files, System Tables, and Frontend Tools
I intentionally kept the scope focused. Past TDE proposals often stalled
because they tried to solve everything at once, becoming too large to
review. I prefer a "divide-and-conquer" approach:
- Temporary files: Out of scope for this initial infrastructure proposal.
- System tables: While they cannot be encrypted during bootstrap (since
extensions aren't loaded), they can be transformed page-by-page during
normal operation.
- Frontend tools (pg_waldump, etc.): I am aware of this and have modified
versions. Currently, there is no standard mechanism for frontend hooks,
making this a broader challenge. For production, extensions could ship
their own modified frontend tools temporarily. Long-term, we may need
initdb-time configuration, fixed for the lifetime of the cluster, to
unify backend/frontend hook behavior.
3. Why Hooks Instead of SMGR
Please see my response to Konstantin in this thread regarding maintenance
debt and the "Separation of Concerns" between storage management and data
transformation.
4. Page Header Flags vs. Fork Files
My primary concern with using fork files for encryption metadata is crash
recovery. If a fork file and the actual data page become inconsistent
(e.g., during a crash), recovery becomes problematic because fork files
are not typically protected by WAL.
Storing the Transform ID in the header flags ensures that the metadata
travels with the page. This is essential for incremental key rotation,
where pages are gradually re-encrypted with newer keys over time. The
oldest key's pages are force-rotated, allowing continuous key rotation
without service interruption. I plan to propose a separate RFC for this
"gradual rotation" mechanism.
5. Benchmarks and Critical Section Overhead
Transformation happens inside the critical section but before acquiring
the WAL lock. On consumer-grade SSDs, the encryption latency is largely
masked by I/O wait times, so the performance impact is negligible. On
high-performance storage (production SSDs, Apple Silicon, etc.), the
reduced I/O wait exposes the encryption overhead, which is visible but
modest. Detailed benchmarks require company approval; I will follow up
later.
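
Returning to point 4, the idea of metadata that travels with the page can be sketched as bit packing in a 16-bit flags word. This is a Python illustration under stated assumptions: the three low bits are the flags PostgreSQL currently defines in bufpage.h, but placing the Transform ID in bits 3 through 7, treating ID 0 as "not transformed", and the helper names are all hypothetical, since the thread does not say which bits the patch uses.

```python
# Existing pd_flags bits in PostgreSQL's bufpage.h.
PD_HAS_FREE_LINES = 0x0001
PD_PAGE_FULL = 0x0002
PD_ALL_VISIBLE = 0x0004

# Assumed layout, for illustration only: a 5-bit Transform ID in bits 3..7.
TRANSFORM_ID_SHIFT = 3
TRANSFORM_ID_MASK = 0x1F << TRANSFORM_ID_SHIFT    # 0x00F8


def set_transform_id(pd_flags, transform_id):
    """Return pd_flags with the Transform ID replaced; other bits untouched."""
    if not 0 <= transform_id <= 0x1F:
        raise ValueError("Transform ID must fit in 5 bits (0..31)")
    cleared = pd_flags & ~TRANSFORM_ID_MASK & 0xFFFF
    return cleared | (transform_id << TRANSFORM_ID_SHIFT)


def get_transform_id(pd_flags):
    """Extract the Transform ID (assumption: 0 means 'not transformed')."""
    return (pd_flags & TRANSFORM_ID_MASK) >> TRANSFORM_ID_SHIFT


def needs_rotation(pd_flags, current_key_id):
    """A page written under a different key ID is a re-encryption candidate."""
    tid = get_transform_id(pd_flags)
    return tid != 0 and tid != current_key_id


flags = set_transform_id(PD_ALL_VISIBLE, 3)   # page written with key version 3
print(hex(flags), get_transform_id(flags), needs_rotation(flags, 4))
```

With 5 bits there are only 31 non-zero IDs, which is why a gradual rotation scheme would have to retire the oldest ID before reusing it.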
Best regards,
Henson Choi
On Sun, Dec 28, 2025 at 10:12 PM, Zsolt Parragi <zsolt.parragi@percona.com> wrote:
Hello!
I am glad to see that there are multiple TDE extension proposals being
worked on. For context, I am one of the developers working on the
pg_tde[1] extension, as well as on the extensible SMGR proposal that
Konstantin already linked.
This patch/proposal contains two distinct parts of
encryption/extensibility: WAL and buffer manager/table data. Based on
earlier discussions, opinions on adding extension points to these
two are quite different, and because of that I'm not sure if bundling
them together is helpful.
It also appears to be missing some extension points that would be
required for a more complete encryption solution, such as encrypting
temporary files or system tables, or handling command-line utilities
like pg_waldump. Do you have ideas or patches in mind for those areas
as well?
I have the same question as Konstantin: why did you choose custom
hooks for the buffer manager instead of the already existing smgr
interface / extensibility patch? While that patch is not part of the
core (but I hope it will be), it is already used by multiple companies
as it supports other use cases, not only encryption. We plan to focus
more on that thread early next year, and we would appreciate any
feedback/suggestions that could make it better for others.
I also noticed that you added additional flags to the page header.
Initially we were thinking about something like this, but decided that
fork files are better for any encryption (or other storage-related)
extra data. These few bits try to be generic, while also being
restrictive because of the limited amount of data. (And that data is
specifically per page; if I want something per file or per page range,
I still need a custom solution.)
Regarding the WAL encryption part, we took a completely different
approach, similar to how we handle normal table data (page-based). I
will need to think more about this before I can provide meaningful
feedback on that part of the patch. One initial question, however, is
whether you have run detailed benchmarks with different workloads.
That seems to be the trickiest part there, since most of the code runs
in a critical section. (Not the "unused"/"empty hook" path, but the
overhead caused by a real encryption plugin using this hook in
practice.)
On 28/12/2025 4:53 PM, Henson Choi wrote:
> However, I believe we should distinguish between Storage Management
> (where/how data is stored) and Data Transformation (what the data looks
> like).
> [...]
> Is it reasonable to ask cryptography experts to clone the entire SMGR
> implementation and maintain code they don't fully understand just to
> insert encryption logic?
I do not think that a custom SMGR API contradicts the idea of Data
Transformation.
Do you know about the decorator pattern?
If you want to implement, e.g., data encryption, you definitely do not
need to write your storage manager from scratch.
Obviously you can (and should) use the standard storage manager (md.c)
for actually performing I/O.
But your storage manager can perform some extra action prior to or after
I/O, for example encrypt data before write and decrypt it after read.
So any pre/post/instead hooks can be easily implemented using a custom SMGR.
The opposite, unfortunately, is not possible. You cannot, for example,
implement encryption+compression using hooks.
But you can easily do it using a custom SMGR: this is how the compressed
file system (CFS) was implemented in PgPro.
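
A decorating storage manager of the kind described above can be sketched as stacked wrappers around a base implementation. This is a Python model with hypothetical names (`MdStorage`, `EncryptingStorage`, `CompressingStorage`); zlib and XOR stand in for whatever compression and cipher an extension would actually use, and the sketch says nothing about how CFS itself is implemented.

```python
import zlib


class MdStorage:
    """Stand-in for md.c: performs the actual I/O."""

    def __init__(self):
        self.blocks = {}

    def write(self, blockno, data):
        self.blocks[blockno] = data

    def read(self, blockno):
        return self.blocks[blockno]


class EncryptingStorage:
    """Decorator: encrypt before write, decrypt after read."""

    def __init__(self, inner, key=0x5A):
        self.inner, self.key = inner, key

    def write(self, blockno, data):
        self.inner.write(blockno, bytes(b ^ self.key for b in data))

    def read(self, blockno):
        return bytes(b ^ self.key for b in self.inner.read(blockno))


class CompressingStorage:
    """Decorator: compress before write, decompress after read."""

    def __init__(self, inner):
        self.inner = inner

    def write(self, blockno, data):
        self.inner.write(blockno, zlib.compress(data))

    def read(self, blockno):
        return zlib.decompress(self.inner.read(blockno))


md = MdStorage()
# Compress first, then encrypt: real ciphertext does not compress well.
smgr = CompressingStorage(EncryptingStorage(md))
page = b"\x00" * 8192
smgr.write(0, page)
assert smgr.read(0) == page
print(len(md.blocks[0]))     # stored size, far smaller than 8192
```

The stored block no longer has the page's original size, which is the kind of change a transform-in-place hook cannot express but a wrapper that owns the write call can.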
Hi Konstantin,
I understand the decorator pattern, and yes, it can work for some cases.
But decorators can only intercept at the beginning and end of functions.
Looking at the actual hook locations in md.c:
- mdextend_pre_hook: after error checks, before file open → Decorator
possible
- mdwrite_pre_hook: after assertions, before I/O loop → Decorator possible
- mdread_post_hook: inside the segment loop → Decorator NOT possible
The mdreadv() function, introduced in PostgreSQL 17 as part of the
vectored I/O API, processes multiple blocks in a loop that respects
segment boundaries. The decryption hook must be called inside this loop,
after each segment's FileReadV() completes. A decorator wrapping mdreadv()
from the outside cannot access this internal loop timing.
With the SMGR decorator approach, the extension developer must:
- Track upstream md.c changes
- Replicate the internal loop logic to find the right decryption point
With hooks, the extension developer only needs to:
- Implement encrypt() and decrypt()
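
The timing argument can be illustrated with a toy model of a vectored read that crosses a segment boundary. This is a Python sketch, not md.c: `SEG_BLOCKS` is shrunk to 4 for readability (PostgreSQL's default RELSEG_SIZE is 131072 blocks), and `read_segment_run`, `mdreadv_model`, and `post_read_hook` are hypothetical names. It shows only where the hook fires, as described above: once per segment-bounded run, inside the loop.

```python
SEG_BLOCKS = 4          # toy value; stands in for RELSEG_SIZE


def read_segment_run(segno, first_block, nblocks):
    """Stand-in for one FileReadV() call against a single segment file."""
    return [f"seg{segno}:blk{first_block + i}" for i in range(nblocks)]


def mdreadv_model(blocknum, nblocks, post_read_hook=None):
    """Read nblocks starting at blocknum; no single I/O crosses a segment."""
    out = []
    while nblocks > 0:
        segno = blocknum // SEG_BLOCKS
        # Clamp the run so that it ends at the segment boundary.
        run = min(nblocks, SEG_BLOCKS - blocknum % SEG_BLOCKS)
        pages = read_segment_run(segno, blocknum, run)
        if post_read_hook is not None:
            # Hook point: right after this segment's read completes.
            pages = post_read_hook(blocknum, pages)
        out.extend(pages)
        blocknum += run
        nblocks -= run
    return out


calls = []


def post_read_hook(first_block, pages):
    calls.append((first_block, len(pages)))
    return [p.upper() for p in pages]       # placeholder "decryption"


result = mdreadv_model(2, 5, post_read_hook)
print(calls)        # [(2, 2), (4, 3)]
```

A five-block read starting at block 2 is split into two runs, so the hook is invoked twice with different starting blocks.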
Regarding encryption+compression: that's a valid use case for SMGR,
but our primary concern is different. In South Korea, government
regulations require the use of nationally-approved cryptographic
algorithms (such as ARIA, SEED). This means organizations often cannot
adopt foreign TDE solutions, regardless of their technical merit.
We need a simple, stable hook interface that allows local security
experts to integrate these required algorithms - experts who understand
cryptography but not PostgreSQL storage internals.
If both approaches can coexist, why not provide hooks for the simple
case and SMGR for the complex case?
Best regards,
Henson Choi
On 28/12/2025 5:51 PM, Henson Choi wrote:
> If both approaches can coexist, why not provide hooks for the simple
> case and SMGR for the complex case?
Hi Henson,
Thank you for the explanations.
I personally do not like hooks; I consider them a kind of crutch
needed to work around problems with existing APIs :)
But they are quite popular in Postgres and really do make it extensible.
The task of transparent data encryption is very important for
Postgres (if for some reason it cannot be done at the file system level).
If we need to add more hooks to make it possible in Postgres,
then a dozen more hooks may be acceptable...
I have not investigated it closely; maybe you are right that it is
possible to implement transparent encryption using the decorator
approach and a custom SMGR. Frankly speaking, I am quite upset about how
AIO was added to PG18. It introduces a hierarchy orthogonal to SMGR and
causes tight dependencies between these two modules, which makes
extending either of them problematic, if possible at all (e.g., if I want
to add my own storage manager and make AIO use it to access the file
system, rather than calling pread/pwrite directly). I am not sure that
AIO could not have been added through the SMGR hierarchy (certainly by
extending this interface), but that is certainly a separate story with
no relation to the topic of this discussion.
So I can assume that the current coupling of AIO with SMGR makes it
impossible to plug in transparent encryption other than by adding these
hooks.
Still, I am not quite sure that the proposed set of hooks is absolutely
necessary and sufficient...
> - mdread_post_hook: inside the segment loop → Decorator NOT possible
> The mdreadv() function, introduced in PostgreSQL 17 as part of the
> vectored I/O API, processes multiple blocks in a loop that respects
> segment boundaries. The decryption hook must be called inside this loop,
> after each segment's FileReadV() completes. A decorator wrapping mdreadv()
> from the outside cannot access this internal loop timing.
It is possible - or rather, we plan to propose a different patch for
that. There are already some discussions about the extensibility of AIO,
which is currently quite minimal, and this is another point for that.
If you look into the AIO sources, it already uses an array of
callbacks, and there's only a small missing piece there - making it
possible for extensions to add entries to that array. With that patch,
it is possible to decorate smgr_startreadv, add your own callback, and
then call the original mdstartreadv function. Since aio callbacks are
executed in the opposite order, this will work out exactly as needed,
as the AIO handler will first call the md completion handler, then
yours.
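
The ordering argument can be sketched as a small model. This is Python, not the PostgreSQL AIO code: `IoHandle`, `register_callback`, `complete`, and the two completion functions are hypothetical, and the only property taken from the description above is that completion callbacks run in the reverse of their registration order.

```python
class IoHandle:
    """Toy I/O handle that keeps an ordered list of completion callbacks."""

    def __init__(self):
        self.callbacks = []

    def register_callback(self, fn):
        self.callbacks.append(fn)

    def complete(self, data):
        # Run callbacks in the opposite order of registration.
        for fn in reversed(self.callbacks):
            data = fn(data)
        return data


log = []


def md_complete(data):
    log.append("md")
    return data                      # md-level result checks would go here


def tde_complete(data):
    log.append("tde")
    return bytes(b ^ 0x5A for b in data)     # XOR stands in for decryption


def md_startreadv(ioh):
    """Stand-in for the stock start-read path: registers its own callback."""
    ioh.register_callback(md_complete)


def tde_startreadv(ioh):
    """Decorator: register the extension callback, then call the original."""
    ioh.register_callback(tde_complete)
    md_startreadv(ioh)


ioh = IoHandle()
tde_startreadv(ioh)
plain = ioh.complete(bytes(b ^ 0x5A for b in b"page"))
print(log, plain)                    # ['md', 'tde'] b'page'
```

Because the extension registers before delegating, its callback sits earlier in the list and therefore runs after the md completion step.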
My logic here is similar to the previous argument: this AIO
extensibility for startreadv is also needed for other uses of the smgr
extension, most likely for everyone who uses the current patch. It
shouldn't be specific to encryption.
> With the SMGR decorator approach, the extension developer must:
> - Track upstream md.c changes
> - Replicate the internal loop logic to find the right decryption point
> With hooks, the extension developer only needs to:
> - Implement encrypt() and decrypt()
> We need a simple, stable hook interface that allows local security
> experts to integrate these required algorithms - experts who understand
> cryptography but not PostgreSQL storage internals.
Extension developers still have to understand the multiprocess nature
of postgres (with AIO you also have to remember that it is possible
for the completion to happen in a different process, possibly in a
worker process), or its unusual memory management patterns, critical
sections, and so on. You most likely also have to deal with shared
memory caches, locks, and so on.
(And as I said above, you don't have to replicate/track md.c, we only
need a good, generic extension point usable for many extensions)
> In South Korea, government
> regulations require the use of nationally-approved cryptographic
> algorithms (such as ARIA, SEED). This means organizations often cannot
> adopt foreign TDE solutions, regardless of their technical merit.
Have you considered contributing to existing solutions? Adding support
for multiple algorithms to an existing library is easier than
developing your own from scratch.
> WAL and heap pages are simply different representations of the same
> underlying data. Protecting only one side would be cryptographically
> incomplete; an attacker could bypass encryption by reading the
> unprotected side. Therefore, they must be treated as a single atomic
> unit of protection.
From a security point of view, I agree. From a practical one, it's a
bit more complicated. As you mentioned South Korean regulations, we
also have regulations in the European Union, and you can conform to
the current regulations by only encrypting your data files (at least
that's what I heard, I'm not a lawyer).
So from a practical point of view, for us, even getting support for
table encryption hooks into the core would be a success.
> My primary concern with using fork files for encryption metadata is crash
> recovery. If a fork file and the actual data page become inconsistent
> (e.g., during a crash), recovery becomes problematic because fork files
> are not typically protected by WAL.
Custom WAL records about encryption events (key rotation/change/etc)
should solve this problem?
> I plan to propose a separate RFC for this
> "gradual rotation" mechanism.
Would this gradual rotation mechanism be useful for anything else
other than encryption extensions? While I also had the same idea, I
don't see how it would be useful for anything else, so I didn't plan
to submit any patches related to this. This is something that can be
easily implemented as a background worker in a tde extension, and
doesn't really require core support.
On 28/12/2025 5:25 PM, Henson Choi wrote:
> 1. Bundling WAL and Buffer Manager
> WAL and heap pages are simply different representations of the same
> underlying data. Protecting only one side would be cryptographically
> incomplete; an attacker could bypass encryption by reading the
> unprotected side. Therefore, they must be treated as a single atomic
> unit of protection.
I am not an expert in cryptography; or rather, I am a complete novice in
this area.
But I have one concern about the proposed WAL encryption (record-level
encryption).
The content of some WAL records can be almost completely predicted (it
contains no user data,
just some Postgres internal data which can be easily reconstructed).
I wonder if this fact can significantly simplify the task of cracking
the cipher?
Maybe it is safer to use page-level encryption for WAL as well?
On 12/28/25 08:49, Henson Choi wrote:
3. Proposal Specifications
3.1 The Interface (Hook Points)
We allow intervention by security experts through five contact points
along the I/O path:
* *Read/Write Hooks:* |mdread_post|, |mdwrite_pre|, |mdextend_pre|
(Transformation of the data area)
* *WAL Hooks:* |xlog_insert_pre|, |xlog_decode_pre| (Transformation of
transaction logs)
3.2 The Protocol Identifier (PageHeader Transformation ID)
We allocate 5 bits of |pd_flags| to define the “Security State” of a
page. This serves as a *Status Message* sent by the security expert to
the engine, utilized for key versioning and as a migration marker.
Isn't this rather problematic?
This seems to be meant to be extensible, which means there can be
multiple extensions setting the hooks. Which we generally allow, and the
custom is to call the previous hook.
What happens if there are multiple extensions implementing the hook?
Would that be allowed or prohibited in this case? Maybe it doesn't make
sense, but then why wouldn't it be possible?
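The chaining custom referred to here can be modeled in a few lines. This is a minimal Python model, not PostgreSQL code: HookSlot, load_extension, and the extension names are hypothetical stand-ins for a global hook pointer and the init functions of two extensions. It only illustrates that each extension saves the previous hook and calls it, so the order in which the extensions see a page is decided by load order.

```python
from typing import Callable, List, Optional

Hook = Callable[[bytearray], None]


class HookSlot:
    """Stands in for a global hook pointer such as mdread_post_hook."""

    def __init__(self) -> None:
        self.hook: Optional[Hook] = None


def load_extension(slot: HookSlot, name: str, log: List[str]) -> None:
    """Model of an extension's init step: save the previous hook, install ours."""
    prev_hook = slot.hook  # remember whoever was installed before us

    def my_hook(page: bytearray) -> None:
        log.append(name)            # this extension's own work would happen here
        if prev_hook is not None:   # the custom: always call the previous hook
            prev_hook(page)

    slot.hook = my_hook


calls: List[str] = []
slot = HookSlot()
load_extension(slot, "ext_a", calls)
load_extension(slot, "ext_b", calls)
if slot.hook is not None:
    slot.hook(bytearray(8192))
print(calls)  # the last-loaded extension runs first
```

In this model both extensions receive the same page buffer, which is where the question of two transformations sharing one page (and one identifier field) arises.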
FWIW I find it very unlikely we'd allow reserving pd_flags bits for an
extension. These bits are meant to be used by core, there's very limited
number of such bits.
In general, I'm somewhat skeptical of the claim a collection of hooks is
"low-barrier, high-safety". It seems pretty fragile to me, and I can
envision a lot of maintenance difficulties in the future. Not just for
the extension developers, but for the project too - adding a bunch of
random hooks is not free for us, we'll need to keep it working in future
releases, etc.
Perhaps the current SMGR code is not extensible/flexible enough, but
then we need to improve that. I'd imagine a simple SMGR doing the
encryption, but federating most of the work to a "full" SMGR. But I
haven't thought about that too much.
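One way to picture the "simple SMGR delegating to a full SMGR" idea is a thin wrapper. The sketch below is a Python model under stated assumptions: MdSmgr, EncryptingSmgr, and toy_transform are hypothetical names, the dictionary stands in for files on disk, and the XOR transform is a placeholder, not a real cipher and not the SMGR API of any PostgreSQL release.

```python
from typing import Dict, Tuple

BlockKey = Tuple[int, int]  # (relation number, block number)


class MdSmgr:
    """Stand-in for the 'full' storage manager: stores blocks in a dict."""

    def __init__(self) -> None:
        self.blocks: Dict[BlockKey, bytes] = {}

    def write(self, rel: int, blkno: int, data: bytes) -> None:
        self.blocks[(rel, blkno)] = bytes(data)

    def read(self, rel: int, blkno: int) -> bytes:
        return self.blocks[(rel, blkno)]


def toy_transform(data: bytes) -> bytes:
    """Placeholder for encryption: XOR with a constant. NOT a real cipher."""
    return bytes(b ^ 0x5A for b in data)


class EncryptingSmgr:
    """Thin wrapper: transforms the data, delegates all storage work."""

    def __init__(self, inner: MdSmgr) -> None:
        self.inner = inner

    def write(self, rel: int, blkno: int, data: bytes) -> None:
        self.inner.write(rel, blkno, toy_transform(data))

    def read(self, rel: int, blkno: int) -> bytes:
        return toy_transform(self.inner.read(rel, blkno))


md = MdSmgr()
smgr = EncryptingSmgr(md)
page = b"hello page"
smgr.write(1, 0, page)
print(md.read(1, 0) != page, smgr.read(1, 0) == page)
```

The wrapper never touches how blocks are located or stored; it only changes the bytes on the way in and out.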
regards
--
Tomas Vondra
Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
Hi Zsolt,
Thank you for the detailed technical feedback. Let me address each point.
1. AIO Extensibility and SMGR Approach
I think the SMGR extensibility approach is equally valid. In fact, when I
realized in PG18 that buffer page reads are split between md.c (mdreadv)
and bufmgr.c (buffer_readv_complete_one), I felt some discomfort about
where to place the decryption hook. "Does this really belong in both
places?" was my first thought.
The SMGR approach could provide a cleaner, more unified integration point
for data transformation.
The main difference is timing and current availability:
- The hook approach is working today and can be used immediately
- Your SMGR extensibility work provides a more comprehensive long-term
solution
I don't see these as competing proposals. Both approaches are valid and
serve different needs. The hook infrastructure can serve as an interim
solution for organizations that need TDE now, while the community develops
the more comprehensive SMGR extensibility.
In the long term, if SMGR extensibility provides better integration points,
extensions could migrate to that approach.
2. Understanding PostgreSQL Internals
You're absolutely right that extension developers need to understand
multiprocess architecture, memory management, critical sections, and so on.
This is precisely why test_tde exists as a reference implementation. It
documents the "dance steps" with the core - showing where memory must be
pre-allocated, how to handle critical sections safely, when AIO completion
might happen in a different process, and so on.
The goal isn't to hide PostgreSQL's complexity, but to provide a working
example that shows cryptography experts exactly where and how to integrate
their algorithms within PostgreSQL's constraints.
3. Contributing to Existing Solutions vs Korean Regulations
I appreciate the suggestion about contributing to existing solutions. I
personally prefer the OpenSSL Provider approach for algorithm extensibility.
However, the reality is more complex.
Cryptography experts often have their own libraries developed over decades.
While it might look like "just encryption code" to me, I don't have the
authority to force them to adopt specific frameworks.
ARIA and SEED are already implemented in OpenSSL. However, Korean law
requires certified implementations. Specifically, companies must use
nationally-certified builds and provide the hash codes of those specific
library binaries to regulators. You cannot simply use the OpenSSL version,
even if the algorithm is identical.
This is why we need an extension mechanism rather than hardcoding specific
libraries into core. Different jurisdictions have different certification
requirements.
4. WAL vs Data File Encryption
You mentioned that EU regulations might be satisfied by encrypting only
data files. That's a valid practical consideration.
In Korea, regulations require the introduction of approved cryptographic
algorithms, but in practice most systems run AES due to lack of CPU
acceleration for ARIA/SEED. It's largely a legal compliance checkbox.
Regarding what to protect (WAL vs heap vs both), there's flexibility
depending on the organization and jurisdiction. The hook approach allows
extensions to choose - you can implement only the buffer hooks if that
satisfies your requirements, or add WAL hooks if needed.
5. Fork Files vs Page Header for Metadata
You asked whether custom WAL records about encryption events could solve
the crash recovery problem with fork files.
That's a reasonable approach for SMGR-based solutions where you control the
storage layer. However, with the hook approach, we don't have the ability
to inject custom WAL records for encryption events.
Currently, in a replication environment, the reference implementation
requires the same key to be configured in the settings on both primary and
replicas (shared key model). For future KMS integration, I'm considering
mechanisms to propagate keys to replicas through external channels rather
than WAL.
The page header approach was chosen because it keeps the encryption state
self-contained within each page, avoiding the need for separate metadata
synchronization.
6. Gradual Rotation Mechanism
I agree with you - I don't think core support is necessary for gradual
rotation either.
I mentioned it in my earlier email response only as a potential reference
implementation concept to guide encryption developers. It's something that
can and should be implemented in the extension's background worker, not in
core.
Summary
I see the hook approach and SMGR extensibility as equally valid, addressing
different timelines and use cases:
- Hooks: Available now, lighter-weight, sufficient for compliance-driven TDE
- SMGR extensibility: More comprehensive, cleaner architecture, better
long-term solution
Both should coexist. Organizations can use hooks today while SMGR
extensibility matures, then migrate if the SMGR approach better fits their
needs.
I'm very interested in your experience with pg_tde and the SMGR
extensibility work. If there are specific design considerations from that
work that would inform these hooks, I'd appreciate your input.
Best regards,
Henson
On Mon, Dec 29, 2025 at 2:55 AM, Zsolt Parragi <zsolt.parragi@percona.com> wrote:
Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
Hi Tomas,
Thank you for this critical feedback. Your concerns go to the heart of the
proposal's viability, and I appreciate your directness.
1. Multiple Extensions and Hook Chaining
You're right to question this. To be honest, I have significant doubts
about allowing multiple transformation extensions simultaneously.
The Transform ID coordination problem is real: without a registry or
protocol between extensions, they cannot cooperate safely. Hook chaining
for read/write operations might work (extension A encrypts, extension B
compresses), but the Transform ID field creates conflicts.
Perhaps I should be more direct: transformation hook chaining is not
realistically possible with the current design. TDE extensions would need
exclusive use of these hooks. This is a fundamental limitation I should
have stated clearly in the RFC.
2. pd_flags Reservation - I Hope You'll Consider This
I understand your concern about reserving pd_flags bits for extensions.
However, I'd like to ask you to consider the reasoning behind this choice.
The 5-bit Transform ID serves a critical purpose: it allows the core to
identify the page's transformation state without attempting decryption.
This is important for:
- Error reporting: "This page is encrypted with transform ID 5, but no
extension is loaded to handle it"
- Migration safety: Distinguishing between untransformed pages (ID=0) and
transformed pages during gradual encryption
- Crash recovery: The core can detect transformation state inconsistencies
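The way such an identifier could be read without decrypting anything can be sketched as plain bit arithmetic. This is a Python model, not the patch's code: it assumes the layout stated in the patch description (Transform ID in bits 3-7 of pd_flags, with bits 0-2 already used by core flags), and the function names and check_page helper are hypothetical.

```python
PD_VALID_FLAG_BITS = 0x0007                      # bits 0-2, already used by core
TRANSFORM_ID_SHIFT = 3
TRANSFORM_ID_MASK = 0x1F << TRANSFORM_ID_SHIFT   # bits 3-7 -> 0x00F8

# The proposed field must not overlap the existing core flags.
assert (TRANSFORM_ID_MASK & PD_VALID_FLAG_BITS) == 0


def get_transform_id(pd_flags: int) -> int:
    """Extract the 5-bit Transform ID from a 16-bit pd_flags value."""
    return (pd_flags & TRANSFORM_ID_MASK) >> TRANSFORM_ID_SHIFT


def set_transform_id(pd_flags: int, transform_id: int) -> int:
    """Return pd_flags with the Transform ID replaced, other bits untouched."""
    if not 0 <= transform_id <= 31:
        raise ValueError("transform ID must fit in 5 bits")
    cleared = pd_flags & ~TRANSFORM_ID_MASK & 0xFFFF
    return cleared | (transform_id << TRANSFORM_ID_SHIFT)


def check_page(pd_flags: int, handled_ids: set) -> str:
    """Classify a page by its header alone; fail if nobody can handle it."""
    tid = get_transform_id(pd_flags)
    if tid == 0:
        return "untransformed"
    if tid not in handled_ids:
        raise RuntimeError(
            f"page is encrypted with transform ID {tid}, "
            "but no extension is loaded to handle it"
        )
    return "transformed"


flags = set_transform_id(0x0004, 5)   # a page with one core flag set, ID 5
print(hex(flags), get_transform_id(flags))
```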
That said, I recognize pd_flags is precious and limited. Let me propose an
alternative approach that might better align with core principles:
Instead of extension-specific Transform IDs, what if we allow extensions to
reserve space at pd_upper (similar to how special space works at
pd_special)?
The core could manage a small flag (2-3 bits) indicating "N bytes at
pd_upper are reserved for transformation metadata". By encoding N as
multiples of 2 or 4 bytes, we maximize the flag's efficiency:
- 2 bits encoding 4-byte multiples: 0-12 bytes (sufficient for most cases)
- 3 bits encoding 4-byte multiples: 0-28 bytes (covers all reasonable needs)
- 3 bits encoding 2-byte multiples: 0-14 bytes (finer granularity)
This approach uses minimal pd_flags bits while providing substantial
metadata space. It would:
- Keep the flag in core control (not extension-specific)
- Allow extensions to store IV, authentication tags, key version, etc. in a
standardized location
- Be self-describing (the flag tells you how much space is reserved)
- Generalize beyond encryption (compression, checksums, etc. could use it)
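The size arithmetic behind these encodings is easy to check. The following Python sketch is only an illustration of the numbers above; the function names are hypothetical, and "unit" is the byte multiple (2 or 4) that one step of the flag value represents.

```python
def max_reserved_bytes(flag_bits: int, unit: int) -> int:
    """Largest reservable size: highest flag value times the unit."""
    return ((1 << flag_bits) - 1) * unit


def encode_reserved(nbytes: int, flag_bits: int, unit: int) -> int:
    """Turn a byte count into the small flag value stored in the header."""
    if nbytes < 0 or nbytes % unit != 0:
        raise ValueError("size must be a non-negative multiple of the unit")
    code = nbytes // unit
    if code >= (1 << flag_bits):
        raise ValueError("size does not fit in the available flag bits")
    return code


def decode_reserved(code: int, unit: int) -> int:
    """Recover the reserved byte count from the flag value."""
    return code * unit


for bits, unit in [(2, 4), (3, 4), (3, 2)]:
    print(bits, unit, max_reserved_bytes(bits, unit))
```

For example, a 16-byte IV would be stored as flag value 4 under the 3-bit, 4-byte-multiple encoding, and would not fit at all under the 2-bit variant.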
In our internal implementation, we actually add opaque bytes to PageHeader
for encryption metadata. This pd_upper approach could formalize that
pattern for extensions.
I believe some form of page-level metadata for transformations is
necessary. Would either approach (Transform ID or pd_upper reservation) be
acceptable with the right design, or do you see fundamental issues with
page-level transformation metadata itself?
3. Maintenance Burden and Test Coverage
I deeply appreciate this concern. Having worked across various DBMS
implementations, I've seen solution vendors ship without comprehensive
regression testing - but never a database vendor. DBMS maintenance is
extraordinarily difficult, and storage errors are catastrophic.
This is precisely why test_tde exists as a reference implementation. But
you've identified the real issue: we need much stronger test coverage for
the hooks themselves.
The test cases should:
- Detect when core changes break hook contracts
- Verify hook behavior under all I/O paths (sync, async, error cases)
- Validate critical section safety
- Test interaction with checksums, crash recovery, replication
I agree the current test coverage is insufficient for core inclusion. Would
expanding the test suite to cover these scenarios address your maintenance
concerns, or do you see fundamental fragility beyond what testing can solve?
4. Hooks vs Transform Layer - Pragmatic Timeline
You suggested improving SMGR extensibility rather than adding hooks. I
think you're architecturally right about the long-term direction.
However, I want to be pragmatic about timelines:
The hook and pd_flags approach, despite its limitations, can deliver
working TDE in the shortest time. Organizations facing regulatory deadlines
need something that works now, not in 2-3 years.
That said, your feedback has sparked a better idea: what if we think of
this not as "SMGR extension" or "hooks" but as a pluggable Transform Layer
that SMGR and WAL subsystems delegate to?
Conceptually:
Application Layer
|
Buffer Manager
|
+------------------+
| Transform Layer | <-- Encryption, etc.
+------------------+
|
SMGR / WAL
|
File I/O
This is architecturally cleaner than scattered hooks, and more focused than
full SMGR extensibility. The Transform Layer would:
- Provide a unified interface for data transformation
- Work across backend, frontend tools, and replication
- Handle metadata management in a standardized way
- Support encryption, compression, or other transformations
I think this deserves its own discussion thread rather than conflating it
with the current hook proposal. Would you be interested in starting a
separate conversation about designing a Transform Layer interface for
PostgreSQL?
In the meantime, the hook approach could serve organizations with immediate
needs, and extensions could migrate to the Transform Layer once it's
stabilized.
5. Frontend Tool Access
Both SMGR and hook approaches face a shared limitation: frontend tools
(pg_checksums, pg_basebackup, etc.) that read files directly.
I previously suggested allowing initdb to specify a shared library that
both backend and frontend can load for transformation. But as I reconsider
this, it feels like it converges toward the Transform Layer idea: a
well-defined interface that any PostgreSQL component can use.
This might be the real architectural question: not "hooks vs SMGR" but "how
should PostgreSQL provide transformation points that work across backend,
frontend, and replication boundaries?"
Summary
Your feedback has clarified three important points:
1. The current hook design has real limitations (multiple extension
conflicts, pd_flags concerns)
2. Test coverage needs to be much more comprehensive
3. A cleaner abstraction might be needed long-term
I propose a dual approach:
Short-term: Move forward with the hook proposal for organizations with
immediate regulatory needs. I commit to:
- Stating clearly that hook chaining is not supported
- Significantly expanding test coverage
- Treating this as a pragmatic solution with known limitations
Long-term: I'd like to start a separate discussion about a Transform Layer
abstraction - a unified interface that could handle data transformation
across backend, frontend tools, and replication. This would be
architecturally cleaner than scattered hooks, and could eventually
supersede this approach.
Would you be willing to review a Transform Layer proposal in a separate
thread? I think it addresses the architectural concerns you've raised,
while the hook approach serves immediate practical needs.
Best regards,
Henson
On Mon, Dec 29, 2025 at 4:24 AM, Tomas Vondra <tomas@vondra.me> wrote:
Hi hackers,
This is the fourth version of the Storage I/O Transformation Hooks patch
series for implementing Transparent Data Encryption (TDE) in PostgreSQL.
Changes in v4:
This version fixes cross-platform compatibility issues found in CI testing
that caused failures on BSD and Windows:
- Fixed BSD regression test warning about tablespace naming conventions
(renamed to "regress_tde_tblspc")
- Fixed Windows test failures caused by platform-specific shell commands
(mkdir -p)
- Replaced filesystem-based tablespace tests with
allow_in_place_tablespaces approach for cross-platform compatibility
The core hook infrastructure (patch 0001) and reference TDE implementation
(patch 0002) remain unchanged from v3. Patch 0003 contains only the test
compatibility fixes.
Patch series:
0001: Core hook infrastructure for I/O transformation
0002: Reference TDE implementation using AES-256-CTR
0003: Cross-platform test fixes for BSD and Windows
Testing:
The test_tde extension demonstrates:
- Page-level encryption/decryption with AES-256-CTR
- IV derivation using LSN, block number, and relation file number
- Tablespace-level encryption configuration
- WAL encryption support
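As an illustration of the IV derivation item, the three engine-provided values can be packed into a 16-byte block. This Python sketch shows only the packing idea; the field order and widths (8-byte LSN, 4-byte block number, 4-byte relation file number, big-endian) are assumptions for illustration and are not taken from the test_tde source. Whether such an IV is safe depends on the inputs never repeating for a given key and on how the cipher's counter is handled, which this sketch does not address.

```python
import struct


def derive_iv(lsn: int, blkno: int, relfilenumber: int) -> bytes:
    """Pack LSN, block number, and relation file number into 16 bytes.

    Assumed layout: 8-byte LSN | 4-byte block number | 4-byte relfilenumber.
    """
    return struct.pack(">QII", lsn, blkno, relfilenumber)


iv_a = derive_iv(lsn=0x16B3740, blkno=7, relfilenumber=16384)
iv_b = derive_iv(lsn=0x16B3798, blkno=7, relfilenumber=16384)  # same page, later write
print(iv_a.hex(), iv_b.hex())
```

Because the LSN advances every time a page is modified, rewriting the same block yields a different IV in this model.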
These fixes resolve the BSD and Windows test failures.
Best regards,
On Sun, Dec 28, 2025 at 11:19 PM, Henson Choi <assam258@gmail.com> wrote:
Hi,
Here is v3 of the Storage I/O Transform Hooks patch.
Changes from v2:
- Fix -Wincompatible-pointer-types error in bufmgr.c by casting
&bufdata to (void **) for mdread_post_hook call
v2 changes were:
- Add meson.build test configuration for test_tde extension
--
Best regards,
Sungkyun Park
On Sun, Dec 28, 2025 at 7:44 PM, Henson Choi <assam258@gmail.com> wrote:
Updated patches with meson build support:
v2:
- Added meson.build for test_tde extension
- Added test_tde to contrib/meson.build
Regards,
Henson Choi
On Sun, Dec 28, 2025 at 6:47 PM, Henson Choi <assam258@gmail.com> wrote:
Hello,
Following up on the RFC, I am submitting the initial patch set for the
proposed infrastructure. These patches introduce a minimal hook-based
protocol to allow extensions to handle data transformation, such as TDE,
while keeping the PostgreSQL core independent of specific cryptographic
implementations.
Implementation Details:
Hook Points in Storage I/O Path
The patch introduces five strategic hook points:
- mdread_post_hook: Called after blocks are read from disk. The extension
can reverse-transform data in place.
- mdwrite_pre_hook & mdextend_pre_hook: Called before writing or extending
blocks. These hooks return a pointer to transformed buffers.
- xlog_insert_pre_hook & xlog_decode_pre_hook: Handle transformation for
WAL records during insertion and replay.
Data Integrity and Checksum Protocol
To ensure robust error detection, the hooks follow a specific
verification protocol:
- On Write: The extension transforms the page, sets the Transform ID, then
recalculates the checksum on the transformed data.
- On Read: The extension verifies the on-disk checksum of the transformed
data first. After reverse-transformation, it clears the Transform ID and
recalculates the checksum for the plaintext data. This ensures corruption
is detected regardless of the transformation state.
WAL Safety via XLR_BLOCK_ID_TRANSFORMED (251)
For WAL records, I have introduced a specific block ID (251) to mark
transformed data. If the decryption extension is not loaded, the WAL reader
will encounter this unknown block ID and fail-fast, preventing the system
from incorrectly interpreting encrypted data as valid WAL records.
PageHeader Transform ID (5-bit)
I have allocated bits 3-7 of pd_flags in the PageHeader for a Transform
ID. This allows the engine and extensions to identify the transformation
state of a page (e.g., key versioning or algorithm type) without attempting
decryption. It ensures backward compatibility: pages with Transform ID 0
are treated as standard untransformed pages.
Memory and Critical Section Safety
As demonstrated in the contrib/test_tde reference implementation, cipher
contexts are pre-allocated in _PG_init to avoid memory allocation during
critical sections. For WAL transformation,
MemoryContextAllowInCriticalSection() is used to allow buffer reallocation
within critical sections; if OOM occurs during buffer growth, it results in
a controlled PANIC.
Performance Considerations
When hooks are not set (default), the overhead is limited to a single
NULL pointer comparison per I/O operation. This is architecturally
consistent with existing PostgreSQL hooks and is designed to have a
negligible impact on performance.
Attached Patches:
v20251228-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch:
Core infrastructure.
v20251228-0002-Add-test_tde-extension-for-TDE-testing.patch: Reference
implementation using AES-256-CTR.
I look forward to your comments and feedback.
Regards,
Henson Choi
On Sun, Dec 28, 2025 at 4:49 PM, Henson Choi <assam258@gmail.com> wrote:
Attachments:
- v20251229-v4-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch (application/octet-stream, +194 -2)
- v20251229-v4-0002-Add-test_tde-extension-for-TDE-testing.patch (application/octet-stream, +1526 -3)
- v20251229-v4-0003-Fix-test_tde-tablespace-test-for-cross-platform-compatibility.patch (application/octet-stream, +8 -11)
The content of some WAL records can be almost completely predicted (it
contains no user data,
just some Postgres internal data which can be easily reconstructed).
I wonder if this fact can significantly simplify the task of cracking the cipher?
AES is designed to resist known-plaintext attacks, so this isn't an issue
as long as the code doesn't reuse the same IV twice. The example code
uses a random IV for each WAL record, so that's unlikely.
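Why IV reuse is the thing to avoid in a counter-mode cipher can be shown with a small model. The Python sketch below does not use AES: the keystream is built from SHA-256 purely as a stand-in so the example runs with the standard library, and all names and sample record contents are hypothetical. The point it demonstrates is general to stream-style encryption: with a reused IV, one predictable record exposes another record; with distinct IVs, the same trick yields nothing useful.

```python
import hashlib


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy counter-mode keystream built from SHA-256 (stand-in for AES-CTR)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    return xor_bytes(plaintext, keystream(key, iv, len(plaintext)))


key = b"k" * 32
known = b"XLOG_CHECKPOINT!"    # predictable record content the attacker can guess
secret = b"user secret row!"   # record content the attacker wants

# Same IV for both records: XORing the ciphertexts cancels the keystream.
iv = b"\x00" * 16
leaked = xor_bytes(xor_bytes(encrypt(key, iv, known), encrypt(key, iv, secret)), known)

# Distinct IVs: the same trick no longer recovers the secret.
c1 = encrypt(key, b"\x01" * 16, known)
c2 = encrypt(key, b"\x02" * 16, secret)
noise = xor_bytes(xor_bytes(c1, c2), known)

print(leaked == secret, noise == secret)
```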
This is a quite nice solution to keep the encryption of WAL as
parallel as possible. The downside is that it increases the size of
WAL a bit, uses MemoryContextAllowInCriticalSection, and this approach
is definitely slower during recovery than full page decryption.
On the other hand, per page WAL encryption can cause performance
issues with some workloads that write huge amounts of WAL with many
parallel clients. Both have pros and cons.
One thing that seems tricky is WAL key rotation. The example code
ignores this, which is fine for a demo, but real extensions should be
able to handle it. We can't simply write a WAL record about changing
the WAL key, because without holding the write lock things could get
written out of order. The only safe solution I see is to also add the
ID of the WAL key to the additional WAL record data, increasing the
record size even more.
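The per-record key ID idea can be sketched as a small framing header. This is a Python model with assumed field widths (a 4-byte key ID followed by a 16-byte IV); the layout and the function names are hypothetical and are not taken from the example code. It shows that each record can be matched to its key even when records written under different keys interleave, at the cost of a fixed overhead per record.

```python
import struct

# Assumed layout: 4-byte key ID followed by a 16-byte IV, then the ciphertext.
HEADER = struct.Struct(">I16s")


def frame_record(key_id: int, iv: bytes, ciphertext: bytes) -> bytes:
    """Prefix the encrypted payload with the key ID and IV needed to read it."""
    return HEADER.pack(key_id, iv) + ciphertext


def parse_record(buf: bytes):
    """Split a framed record back into (key_id, iv, ciphertext)."""
    key_id, iv = HEADER.unpack_from(buf)
    return key_id, iv, buf[HEADER.size:]


# Records from concurrent writers may land in the log in any order
# around a key change; each one still names its own key.
log = [
    frame_record(1, b"\x01" * 16, b"payload-a"),
    frame_record(2, b"\x02" * 16, b"payload-b"),
    frame_record(1, b"\x03" * 16, b"payload-c"),
]
key_ids = [parse_record(rec)[0] for rec in log]
print(key_ids, HEADER.size)
```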
The main difference is timing and current availability:
- The hook approach is working today and can be used immediately
- Your SMGR extensibility work provides a more comprehensive
long-term solution
I disagree with this. The SMGR patch has been available since 2023/PG16 as a
patch, and it is already used by at least 3 companies I know of (Neon,
Nile, Percona), and probably also by others I don't know of. It is
available immediately.
Compared to that this proposal is something new, and more limited.
The actual advantage of this proposal is that it includes WAL, but I
still think the two should be separate discussions.
Regarding what to protect (WAL vs heap vs both), there's flexibility depending on the organization and jurisdiction. The hook approach allows extensions to choose - you can implement only the buffer hooks if that satisfies your requirements, or add WAL hooks if needed.
My concern is that these are two separate discussions about two
extensibility points, with different concerns raised by different
people. One part shouldn't stall the other, as for some, even getting
half of it into the core for PG19 would be useful.
You're absolutely right that extension developers need to understand multiprocess architecture, memory management, critical sections, and so on.
This is precisely why test_tde exists as a reference implementation.
The reference implementation ignores the tricky steps, like key
rotation, caching, configuration, providing a user interface, etc.,
all of which require knowledge of Postgres internals.
ARIA and SEED are already implemented in OpenSSL. However, Korean law requires certified implementations. Specifically, companies must use nationally-certified builds and provide the hash codes of those specific library binaries to regulators. You cannot simply use the OpenSSL version, even if the algorithm is identical.
That could still be solved by introducing an abstraction layer in the
encryption code of a TDE extension :) Encryption is only a small part
of an extension; the other parts (user interface, rotation, key
storage integrations, etc.) are a much bigger part. It is still
questionable to reimplement everything because of an encryption
library difference. But I see your point, that is a bit more
difficult.
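
As a rough illustration of what such an abstraction layer could look like, here is a minimal sketch of a provider table. All names (`TdeCipherProvider`, `tde_register_provider`, and so on) are hypothetical and do not come from any existing extension; the `null` provider only copies bytes and stands in for a wrapper around a real or nationally certified library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical provider interface: one instance per cipher library. */
typedef struct TdeCipherProvider
{
    const char *name;
    /* Both callbacks return 0 on success; out must hold len bytes. */
    int (*encrypt)(const uint8_t *key, const uint8_t *iv,
                   const uint8_t *in, uint8_t *out, size_t len);
    int (*decrypt)(const uint8_t *key, const uint8_t *iv,
                   const uint8_t *in, uint8_t *out, size_t len);
} TdeCipherProvider;

#define TDE_MAX_PROVIDERS 4

static const TdeCipherProvider *providers[TDE_MAX_PROVIDERS];
static int n_providers;

/* Add a provider to the table. Returns 0, or -1 if the table is full. */
int
tde_register_provider(const TdeCipherProvider *p)
{
    if (n_providers >= TDE_MAX_PROVIDERS)
        return -1;
    providers[n_providers++] = p;
    return 0;
}

/* Find a provider by name, e.g. from a configuration setting. */
const TdeCipherProvider *
tde_find_provider(const char *name)
{
    for (int i = 0; i < n_providers; i++)
    {
        if (strcmp(providers[i]->name, name) == 0)
            return providers[i];
    }
    return NULL;
}

/* Placeholder callback: copies data unchanged. This is NOT encryption. */
static int
null_copy(const uint8_t *key, const uint8_t *iv,
          const uint8_t *in, uint8_t *out, size_t len)
{
    (void) key;
    (void) iv;
    memmove(out, in, len);
    return 0;
}

/* Stand-in for a wrapper around an actual cipher library. */
const TdeCipherProvider tde_null_provider = { "null", null_copy, null_copy };
```

The rest of the extension (key storage, rotation, user interface) would call only through the provider struct, so swapping the library would mean registering a different provider rather than rewriting those parts.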
That's a reasonable approach for SMGR-based solutions where you control the storage layer. However, with the hook approach, we don't have the ability to inject custom WAL records for encryption events.
Currently, in a replication environment, the reference implementation requires the same key to be configured in the settings on both primary and replicas (shared key model). For future KMS integration, I'm considering mechanisms to propagate keys to replicas through external channels rather than WAL.
I originally wrote a long answer about how I don't think this is
related to where the hooks are, and then I realized that the problem
is probably completely different - and this also shows why adding a
few bits to the pages is not a good generic solution for all
extensions.
Our extension uses a two-level key architecture, as used by most
database servers (there's a master key, and it encodes separate
internal keys, one for each database file). The proposed sample code
in your patch uses a single key, with the IV encoding the database
file. That means you want to encode which key is used for each page
instead of for each file.
So we approach how we map data/pages to keys completely differently.
But I don't think the page header addition is a good solution, because
it is specific to your implementation, not for encryption solutions in
general.
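
To make the per-file mapping concrete, here is a minimal sketch under stated assumptions. It is not the actual layout of any extension: `FileKeyEntry`, `file_key_resolve`, and the unwrap callback are hypothetical names, `file_id` is a stand-in for however a database file is identified, and `demo_unwrap` is a demo-only XOR that merely marks where real key unwrapping would happen.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32

/* Hypothetical catalog entry: one wrapped internal key per database file. */
typedef struct FileKeyEntry
{
    uint32_t file_id;               /* stand-in for a file identifier */
    uint32_t master_key_id;         /* master key that wrapped this entry */
    uint8_t  wrapped_key[KEY_LEN];  /* internal key, protected by the master key */
} FileKeyEntry;

/* Unwrap callback supplied by the cipher or key-store layer; 0 on success. */
typedef int (*UnwrapFn)(uint32_t master_key_id,
                        const uint8_t *wrapped, uint8_t *plain);

/*
 * Resolve the internal key for a file. The key is selected by file, so
 * nothing needs to be recorded per page.
 * Returns 0 on success, -1 if the file is unknown or unwrapping fails.
 */
int
file_key_resolve(const FileKeyEntry *map, size_t n, uint32_t file_id,
                 UnwrapFn unwrap, uint8_t *out_key)
{
    for (size_t i = 0; i < n; i++)
    {
        if (map[i].file_id == file_id)
        {
            if (unwrap(map[i].master_key_id, map[i].wrapped_key, out_key) != 0)
                return -1;
            return 0;
        }
    }
    return -1;
}

/* Demo-only unwrap: XOR with the low byte of the id. NOT cryptography. */
int
demo_unwrap(uint32_t master_key_id, const uint8_t *wrapped, uint8_t *plain)
{
    for (size_t i = 0; i < KEY_LEN; i++)
        plain[i] = wrapped[i] ^ (uint8_t) master_key_id;
    return 0;
}
```

In this arrangement, rotating the master key means re-wrapping the small per-file entries, while the data pages themselves stay untouched.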
(Also, I just noticed that you forgot about timelineid in derive_iv,
you probably want to include that somehow)