Moving forward with TDE
Hi -hackers,
Working with Stephen, I am attempting to pick up some of the work that
was left off with TDE and the key management infrastructure. I have
rebased Bruce's KMS/TDE patches as they existed on the
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption wiki
page, which are enclosed in this email.
I would love to open a discussion about how to move forward and get
some of these features built out. The historical threads here are
quite long and complicated; is there a "current state" other than the
wiki that reflects the general thinking on this feature? Any major
developments in direction that would not be reflected in the code from
May 2021?
Thanks,
David
Attachments:
- 0001-cfe-01-doc_over_master-squash-commit.patch (+165/-2)
- 0002-cfe-02-internaldoc_over_cfe-01-doc-squash-commit.patch (+231/-1)
- 0003-cfe-03-scripts_over_cfe-02-internaldoc-squash-commit.patch (+325/-2)
- 0004-cfe-04-common_over_cfe-03-scripts-squash-commit.patch (+1159/-2)
- 0005-cfe-05-crypto_over_cfe-04-common-squash-commit.patch (+588/-17)
- 0006-cfe-06-backend_over_cfe-05-crypto-squash-commit.patch (+194/-9)
- 0007-cfe-07-bin_over_cfe-06-backend-squash-commit.patch (+361/-22)
- 0008-cfe-08-pg_alterckey_over_cfe-07-bin-squash-commit.patch (+1000/-1)
- 0009-cfe-09-test_over_cfe-08-pg_alterckey-squash-commit.patch (+1709/-1)
- 0010-cfe-10-hint_over_cfe-09-test-squash-commit.patch (+214/-53)
- 0011-cfe-11-gist_over_cfe-10-hint-squash-commit.patch (+48/-16)
- 0012-cfe-12-rel_over_cfe-11-gist-squash-commit.patch (+396/-49)
Hi David,

> Working with Stephen, I am attempting to pick up some of the work that
> was left off with TDE and the key management infrastructure. I have
> rebased Bruce's KMS/TDE patches as they existed on the
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption wiki
> page, which are enclosed in this email.

I'm happy to see that the TDE effort was picked up.

> I would love to open a discussion about how to move forward and get
> some of these features built out. The historical threads here are
> quite long and complicated; is there a "current state" other than the
> wiki that reflects the general thinking on this feature? Any major
> developments in direction that would not be reflected in the code from
> May 2021?
The patches seem to be well documented and decomposed into small pieces.
That's good.
Unless somebody in the community remembers open questions/issues with
TDE that were never addressed, I suggest simply iterating with our
usual testing/reviewing process. For now I'm going to change the
status of the CF entry [1] to "Waiting for Author" since the patchset
doesn't pass the CI [2].
One limitation I see in the design described on the wiki is that it
seems to rely heavily on AES:

> We will use Advanced Encryption Standard (AES) [4]. We will offer three key length options (128, 192, and 256-bits) selected at initdb time with --file-encryption-method
(there doesn't seem to be any mention of the hash/MAC algorithms,
which is odd). In the future we should be able to add support for
alternative algorithms. The reason is that algorithms can become
weak every 20 years or so, and the preferred algorithms may also
depend on the region. This should NOT be implemented in this
particular patchset, but the design shouldn't prevent us from
implementing it in the future.
[1]: https://commitfest.postgresql.org/40/3985/
[2]: http://cfbot.cputube.org/
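To make the algorithm-agility point concrete, here is a minimal sketch of what an encryption-method registry could look like; the method names and fields below are illustrative assumptions, not the patchset's actual API:

```python
# Hypothetical registry mapping an initdb --file-encryption-method name to a
# cipher spec. Recording the *name* in cluster metadata, rather than
# hard-coding AES parameters, is what leaves room for future algorithms.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EncryptionMethod:
    cipher: str            # block cipher family
    key_bits: int          # key length in bits
    mac: Optional[str]     # authentication algorithm, if any

METHODS = {
    "AES128": EncryptionMethod("AES", 128, None),
    "AES192": EncryptionMethod("AES", 192, None),
    "AES256": EncryptionMethod("AES", 256, None),
    # A future, region-specific or post-AES method could slot in here
    # without changing the on-disk framing:
    # "SM4": EncryptionMethod("SM4", 128, None),
}

def key_bytes(method_name: str) -> int:
    """Key length in bytes for a configured encryption method."""
    return METHODS[method_name].key_bits // 8
```

The design question is only that the on-disk format and key derivation be parameterized by such a spec, so adding a method later does not break existing clusters.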
--
Best regards,
Aleksander Alekseev
> Unless somebody in the community remembers open questions/issues with
> TDE that were never addressed I suggest simply iterating with our
> usual testing/reviewing process. For now I'm going to change the
> status of the CF entry [1] to "Waiting for Author" since the patchset
> doesn't pass the CI [2].
Thanks, enclosed is a new version that is rebased on HEAD and fixes a
bug that the new pg_control_init() test picked up.
A known issue (just discovered by me while testing the latest revision) is
that databases created from `template0` are not decrypting properly,
but `template1` works fine, so I'm going to dig in on that soon.
> One limitation of the design described on the wiki I see is that it
> seems to heavily rely on AES:
>
> > We will use Advanced Encryption Standard (AES) [4]. We will offer three key length options (128, 192, and 256-bits) selected at initdb time with --file-encryption-method
>
> (there doesn't seem to be any mention of the hash/MAC algorithms,
> that's odd). In the future we should be able to add the support of
> alternative algorithms. The reason is that the algorithms can become
> weak every 20 years or so, and the preferred algorithms may also
> depend on the region. This should NOT be implemented in this
> particular patchset, but the design shouldn't prevent from
> implementing this in the future.
Yes, we are definitely considering support for multiple algorithms as part
of this effort.
Best,
David
Attachments:
- v2-0001-cfe-01-doc_over_master-squash-commit.patch (+165/-2)
- v2-0005-cfe-05-crypto_over_cfe-04-common-squash-commit.patch (+588/-17)
- v2-0002-cfe-02-internaldoc_over_cfe-01-doc-squash-commit.patch (+231/-1)
- v2-0003-cfe-03-scripts_over_cfe-02-internaldoc-squash-com.patch (+325/-2)
- v2-0004-cfe-04-common_over_cfe-03-scripts-squash-commit.patch (+1159/-2)
- v2-0007-cfe-07-bin_over_cfe-06-backend-squash-commit.patch (+361/-22)
- v2-0006-cfe-06-backend_over_cfe-05-crypto-squash-commit.patch (+197/-12)
- v2-0008-cfe-08-pg_alterckey_over_cfe-07-bin-squash-commit.patch (+1000/-1)
- v2-0009-cfe-09-test_over_cfe-08-pg_alterckey-squash-commi.patch (+1709/-1)
- v2-0010-cfe-10-hint_over_cfe-09-test-squash-commit.patch (+214/-53)
- v2-0011-cfe-11-gist_over_cfe-10-hint-squash-commit.patch (+48/-16)
- v2-0012-cfe-12-rel_over_cfe-11-gist-squash-commit.patch (+396/-49)
On Fri, Nov 4, 2022 at 3:36 AM David Christensen
<david.christensen@crunchydata.com> wrote:
> > Unless somebody in the community remembers open questions/issues with
> > TDE that were never addressed I suggest simply iterating with our
> > usual testing/reviewing process. For now I'm going to change the
> > status of the CF entry [1] to "Waiting for Author" since the patchset
> > doesn't pass the CI [2].
>
> Thanks, enclosed is a new version that is rebased on HEAD and fixes a
> bug that the new pg_control_init() test picked up.
I was looking into the documentation patches 0001 and 0002; I think
the explanation is very clear. I have a few questions/comments:
+By not using the database id in the IV, CREATE DATABASE can copy the
+heap/index files from the old database to a new one without
+decryption/encryption. Both page copies are valid. Once a database
+changes its pages, it gets new LSNs, and hence new IV.
How about the WAL_LOG method for creating a database? Because in that
case we get new LSNs for the pages in the new database, so do we
re-encrypt? If yes, then this documentation needs to be updated;
otherwise we might need to add that code.
+changes its pages, it gets new LSNs, and hence new IV. Using only the
+LSN and page number also avoids requiring pg_upgrade to preserve
+database oids, tablespace oids, and relfilenodes.
I think this line needs to be changed, because now we are already
preserving the dbid/tbsid/relfilenode. So even though we are not using
those in the IV, there is no point in saying we are avoiding that
requirement.
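For reference, the IV scheme the quoted docs describe can be sketched as follows; the field widths and byte layout here are my own illustrative assumptions, not the patch's exact format:

```python
# Per-page IV derived from the page LSN and page number only -- no database,
# tablespace, or relfilenode. That is why a byte-identical page copied by
# CREATE DATABASE still decrypts: neither its LSN nor its page number changed.
import struct

def page_iv(lsn: int, page_number: int) -> bytes:
    """Build a 16-byte IV from the 8-byte LSN and 4-byte page number,
    zero-padded to a 128-bit AES counter block."""
    return struct.pack(">QI4x", lsn, page_number)

# A copied page keeps the same (LSN, page number), hence the same IV:
assert page_iv(0x01000028, 7) == page_iv(0x01000028, 7)

# Any modification bumps the LSN, yielding a fresh IV -- essential, since
# reusing an IV with a stream mode like CTR leaks the XOR of the plaintexts:
assert page_iv(0x01000060, 7) != page_iv(0x01000028, 7)
```

This also makes the WAL_LOG question above concrete: if page creation in the new database assigns new LSNs, those pages must be written encrypted under the new IVs.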
I will review the remaining patches soon.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 24, 2022 at 9:29 AM David Christensen
<david.christensen@crunchydata.com> wrote:
> I would love to open a discussion about how to move forward and get
> some of these features built out. The historical threads here are
> quite long and complicated; is there a "current state" other than the
> wiki that reflects the general thinking on this feature? Any major
> developments in direction that would not be reflected in the code from
> May 2021?
I don't think the patchset here has incorporated the results of the
discussion [1] that happened at the end of 2021. For example, it looks
like AES-CTR is still in use for the pages, which I thought was
already determined to be insufficient.
The following next steps were proposed in that thread:
1. modify temporary file I/O to use a more centralized API
2. modify the existing cluster file encryption patch to use XTS with an
IV that uses more than the LSN
3. add XTS regression test code like CTR
4. create WAL encryption code using CTR
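For anyone catching up on why step 2 moves away from plain CTR: CTR is a stream mode, so without authentication an attacker with write access to the data directory can flip chosen plaintext bits without the key. A toy sketch of that malleability, using a random fake keystream purely in place of AES-CTR (the XOR property shown is identical in real CTR):

```python
# Demonstrate CTR-style malleability: ciphertext = plaintext XOR keystream,
# so flipping ciphertext bits flips the same plaintext bits on decryption.
import secrets

def ctr_like_xor(data: bytes, keystream: bytes) -> bytes:
    return bytes(d ^ k for d, k in zip(data, keystream))

keystream = secrets.token_bytes(16)          # stand-in for AES-CTR output
plaintext = b"balance: 0000100"
ciphertext = ctr_like_xor(plaintext, keystream)

# An attacker who knows only the field position can turn "0000100" into
# "9000100" by XORing in the difference of the two digits:
delta = ord("0") ^ ord("9")
tampered = bytearray(ciphertext)
tampered[9] ^= delta
assert ctr_like_xor(bytes(tampered), keystream) == b"balance: 9000100"
```

XTS does not authenticate either, but it is not bit-malleable like this: flipping ciphertext bits garbles the whole 16-byte block rather than flipping chosen plaintext bits.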
Does this patchset need review before those steps are taken (or was
there additional conversation/work that I missed)?
[1]: https://www.postgresql.org/message-id/20211013222648.GA373@momjian.us
Thanks,
--Jacob
On Nov 15, 2022, at 1:08 PM, Jacob Champion <jchampion@timescale.com> wrote:
> On Mon, Oct 24, 2022 at 9:29 AM David Christensen
> <david.christensen@crunchydata.com> wrote:
> > I would love to open a discussion about how to move forward and get
> > some of these features built out. The historical threads here are
> > quite long and complicated; is there a "current state" other than the
> > wiki that reflects the general thinking on this feature? Any major
> > developments in direction that would not be reflected in the code from
> > May 2021?
>
> I don't think the patchset here has incorporated the results of the
> discussion [1] that happened at the end of 2021. For example, it looks
> like AES-CTR is still in use for the pages, which I thought was
> already determined to be insufficient.
Good to know about the next steps, thanks.
> The following next steps were proposed in that thread:
> 1. modify temporary file I/O to use a more centralized API
> 2. modify the existing cluster file encryption patch to use XTS with an
> IV that uses more than the LSN
> 3. add XTS regression test code like CTR
> 4. create WAL encryption code using CTR
>
> Does this patchset need review before those steps are taken (or was
> there additional conversation/work that I missed)?
This was just a refresh of the old patches on the wiki to work as written on HEAD. If there are known TODOs here, then that work still needs to be done.
I was going to take 2) and Stephen was going to work on 3); I am not sure about the other two but will review the thread you pointed to. Thanks for pointing that out.
David
On Tue, Nov 15, 2022 at 11:39 AM David Christensen
<david.christensen@crunchydata.com> wrote:
> Good to know about the next steps, thanks.

You're welcome!

> This was just a refresh of the old patches on the wiki to work as written on HEAD. If there are known TODOs here, then that work still needs to be done.
>
> I was going to take 2) and Stephen was going to work on 3); I am not sure about the other two but will review the thread you pointed to. Thanks for pointing that out.
I've attached the diffs I'm carrying to build this under meson (as
well as -Wshadow; my removal of the two variables probably needs some
scrutiny). It looks like the testcrypto executable will need
substantial changes after the common/hex.h revert.
--Jacob
Attachments:
- fix-meson.patch.txt (+35/-4)
Hi Jacob,
Thanks, I've added this patch to my tree [1]. (For now I'm just adding
fixes and the like atop the original separate patches, but will
eventually get things winnowed down into probably the same 12 parts
the originals were reviewed in.)

[1]: https://github.com/pgguru/postgres/tree/tde
Best,
David
Hi Dilip,
Thanks for the feedback here. I will review the docs changes and add to my tree.
Best,
David
On Fri, 4 Nov 2022 at 03:36, David Christensen
<david.christensen@crunchydata.com> wrote:
> > Unless somebody in the community remembers open questions/issues with
> > TDE that were never addressed I suggest simply iterating with our
> > usual testing/reviewing process. For now I'm going to change the
> > status of the CF entry [1] to "Waiting for Author" since the patchset
> > doesn't pass the CI [2].
>
> Thanks, enclosed is a new version that is rebased on HEAD and fixes a
> bug that the new pg_control_init() test picked up.
The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
b82557ecc2ebbf649142740a1c5ce8d19089f620 ===
=== applying patch
./v2-0004-cfe-04-common_over_cfe-03-scripts-squash-commit.patch
patching file src/common/Makefile
Hunk #2 FAILED at 84.
1 out of 2 hunks FAILED -- saving rejects to file src/common/Makefile.rej
[1]: http://cfbot.cputube.org/patch_41_3985.log
Regards,
Vignesh
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested
I have decided to write a review here in terms of whether we want this feature, and perhaps the way we should look at encryption as a project down the road, since I think this is only the beginning. I am hoping to run some full tests of the feature sometime in the coming weeks. Right now this review is limited to the documentation and the documented feature.
From the documentation, the primary threat model of TDE is to prevent decryption of data from archived wal segments (and data files), for example on a backup system. While there are other methods around this problem to date, I think that this feature is worth pursuing for that reason. I want to address a couple of reasons for this and then go into some reservations I have about how some of this is documented.
There are current workarounds to ensuring encryption at rest, but these have a number of problems. Encryption passphrases end up lying around the system in various places. Key rotation is often difficult. And one mistake can easily render all efforts ineffective. TDE solves these problems. The overall design from the internal docs looks solid. This definitely is something I would recommend for many users.
I have a couple small caveats though. Encryption of data is a large topic and there isn't a one-size-fits-all solution to industrial or state requirements. Having all this key management available in PostgreSQL is a very good thing. Long run it is likely to end up being extensible, and therefore both more powerful and offering a wider range of choices for solution architects. Implementing encryption is also something that is easy to mess up. For this reason I think it would be great if we had a standardized format for discussing encryption options that we could use going forward. I don't think that should be held against this patch but I think we need to start discussing it now because it will be a bigger problem later.
A second caveat I have is that key management is a topic where you really need a good overview of internals in order to implement effectively. If you don't know how an SSL handshake works or what is in a certificate, you can easily make mistakes in setting up SSL. I can see the same thing happening here. For example, I don't think it would be safe to leave the KEK on an encrypted filesystem that is decrypted at runtime (or at least I wouldn't consider that safe -- your appetite for risk may vary).
My proposal would be to build a template for encryption options in the documentation. This could include topics like SSL as well. In such a template we'd have sections like "Threat model," "How it works," "Implementation Requirements" and so forth. Again I don't think this needs to be part of the current patch but I think it is something we need to start thinking about now. Maybe after this goes in, I can present a proposed documentation patch.
I will also note that I don't consider myself to be very qualified on topics like encryption. I can reason about key management to some extent but some implementation details may be beyond me. I would hope we could get some extra review on this patch set soon.
Greetings,
* Chris Travers (chris.travers@gmail.com) wrote:
> From the documentation, the primary threat model of TDE is to prevent decryption of data from archived wal segments (and data files), for example on a backup system. While there are other methods around this problem to date, I think that this feature is worth pursuing for that reason. I want to address a couple of reasons for this and then go into some reservations I have about how some of this is documented.
Agreed, though the latest efforts include an option for *authenticated*
encryption as well as unauthenticated. That makes it much more
difficult to make undetected changes to the data that's protected by
the authenticated encryption being used.
> There are current workarounds to ensuring encryption at rest, but these have a number of problems. Encryption passphrases end up lying around the system in various places. Key rotation is often difficult. And one mistake can easily render all efforts ineffective. TDE solves these problems. The overall design from the internal docs looks solid. This definitely is something I would recommend for many users.
There's clearly user demand for it as there's a number of organizations
who have forks which are providing it in one shape or another. This
kind of splintering of the community is actually an actively bad thing
for the project and is part of what killed Unix, by at least some pretty
reputable accounts, in my view.
> I have a couple small caveats though. Encryption of data is a large topic and there isn't a one-size-fits-all solution to industrial or state requirements. Having all this key management available in PostgreSQL is a very good thing. Long run it is likely to end up being extensible, and therefore both more powerful and offering a wider range of choices for solution architects. Implementing encryption is also something that is easy to mess up. For this reason I think it would be great if we had a standardized format for discussing encryption options that we could use going forward. I don't think that should be held against this patch but I think we need to start discussing it now because it will be a bigger problem later.
Do you have a suggestion as to the format to use?
> A second caveat I have is that key management is a topic where you really need a good overview of internals in order to implement effectively. If you don't know how an SSL handshake works or what is in a certificate, you can easily make mistakes in setting up SSL. I can see the same thing happening here. For example, I don't think it would be safe to leave the KEK on an encrypted filesystem that is decrypted at runtime (or at least I wouldn't consider that safe -- your appetite for risk may vary).
Agreed that we should document this and make clear that the KEK is
necessary for server start but absolutely should be kept as safe as
possible and certainly not stored on disk somewhere nearby the encrypted
cluster.
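For readers new to the design, the KEK/DEK relationship being discussed can be sketched as follows. The XOR "wrap" below is a deliberately toy stand-in for a real key-wrap primitive (e.g. AES key wrap), used only to show why KEK rotation never touches the encrypted relations:

```python
# Key hierarchy: the KEK (cluster key) wraps the data encryption keys (DEKs);
# the DEKs encrypt the actual files. Rotating the KEK (cf. pg_alterckey)
# only re-wraps the small key files, never the data.
import secrets

def wrap(kek: bytes, dek: bytes) -> bytes:
    # Toy stand-in for a real key-wrap algorithm -- NOT real cryptography.
    return bytes(a ^ b for a, b in zip(kek, dek))

unwrap = wrap  # XOR is its own inverse

dek = secrets.token_bytes(32)        # encrypts actual pages; never rotated
old_kek = secrets.token_bytes(32)
wrapped = wrap(old_kek, dek)         # what sits in the key directory

# Rotate the cluster key: unwrap with the old KEK, re-wrap with the new one.
new_kek = secrets.token_bytes(32)
rewrapped = wrap(new_kek, unwrap(old_kek, wrapped))

assert unwrap(new_kek, rewrapped) == dek   # same DEK; no page rewrites needed
```

This split is also why keeping the KEK off the cluster's own storage matters so much: an attacker with the data directory alone holds only wrapped keys.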
> My proposal would be to build a template for encryption options in the documentation. This could include topics like SSL as well. In such a template we'd have sections like "Threat model," "How it works," "Implementation Requirements" and so forth. Again I don't think this needs to be part of the current patch but I think it is something we need to start thinking about now. Maybe after this goes in, I can present a proposed documentation patch.
I'm not entirely sure that it makes sense to lump this and TLS in the
same place as they end up being rather independent at the end of the
day. If you have ideas for how to improve the documentation, I'd
certainly encourage you to go ahead and work on that and submit it as a
patch rather than waiting for this to actually land in core. Having
good and solid documentation is something that will help this get in,
after all, and to the extent that it's covering existing topics like
TLS, those could likely be included independently and that would be of
benefit to everyone.
> I will also note that I don't consider myself to be very qualified on topics like encryption. I can reason about key management to some extent but some implementation details may be beyond me. I would hope we could get some extra review on this patch set soon.
Certainly agree with you there though there's an overall trajectory of
patches involved in all of this that's a bit deep. The plan is to
discuss that at PGCon (On the Road to TDE) and at the PGCon
Unconference after. I certainly hope those interested will be there.
I'm also happy to have a call with anyone interested in this effort
independent of that, of course.
Thanks!
Stephen
On Wed, Mar 8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
> Agreed, though the latest efforts include an option for *authenticated*
> encryption as well as unauthenticated. That makes it much more
> difficult to make undetected changes to the data that's protected by
> the authenticated encryption being used.
I thought some more about this. GCM-style authentication of encrypted
data has value because it assumes the two end points are secure but that
a malicious actor could modify data during transfer. In the Postgres
case, it seems the two end points and the transfer are all in the same
place. Therefore, it is unclear to me the value of using GCM-style
authentication because if the GCM-level can be modified, so can the end
points, and the encryption key exposed.
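For concreteness, the property under debate is offline tamper detection. GCM itself needs a crypto library, but the detection behavior can be sketched with stdlib encrypt-then-MAC (GCM computes its tag differently, yet catches modification the same way):

```python
# Authenticated encryption sketch: an HMAC tag over the ciphertext means any
# offline modification of a page is detected at decryption time instead of
# being silently accepted as garbled-but-valid plaintext.
import hashlib
import hmac
import secrets

TAG_LEN = 32  # SHA-256 digest size

def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    return ciphertext + hmac.new(mac_key, ciphertext, hashlib.sha256).digest()

def open_sealed(mac_key: bytes, sealed: bytes) -> bytes:
    ciphertext, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("page failed authentication: modified at rest?")
    return ciphertext

mac_key = secrets.token_bytes(32)
sealed = seal(mac_key, b"\x01\x02encrypted page image")

# An offline attacker (backup host, stolen disk) flips one bit:
tampered = bytearray(sealed)
tampered[0] ^= 0x80
try:
    open_sealed(mac_key, bytes(tampered))
except ValueError as e:
    print(e)
```

The scenario this protects is precisely the one where the attacker has the files but not the running server's memory, such as a backup or a powered-off system.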
> There's clearly user demand for it as there's a number of organizations
> who have forks which are providing it in one shape or another. This
> kind of splintering of the community is actually an actively bad thing
> for the project and is part of what killed Unix, by at least some pretty
> reputable accounts, in my view.
Yes, the number of commercial implementations of this is a concern. Of
course, it is also possible that those commercial implementations are
meeting checkbox requirements rather than technical ones, and the
community has been hostile to checkbox-only features.
> Certainly agree with you there though there's an overall trajectory of
> patches involved in all of this that's a bit deep. The plan is to
> discuss that at PGCon (On the Road to TDE) and at the PGCon
> Unconference after. I certainly hope those interested will be there.
> I'm also happy to have a call with anyone interested in this effort
> independent of that, of course.
I will not be attending Ottawa.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Embrace your flaws. They make you human, rather than perfect,
which you will never be.
Greetings,
On Mon, Mar 27, 2023 at 12:38 Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Mar 8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
> > Agreed, though the latest efforts include an option for *authenticated*
> > encryption as well as unauthenticated. That makes it much more
> > difficult to make undetected changes to the data that's protected by
> > the authenticated encryption being used.
>
> I thought some more about this. GCM-style authentication of encrypted
> data has value because it assumes the two end points are secure but that
> a malicious actor could modify data during transfer. In the Postgres
> case, it seems the two end points and the transfer are all in the same
> place. Therefore, it is unclear to me the value of using GCM-style
> authentication because if the GCM-level can be modified, so can the end
> points, and the encryption key exposed.
What are the two end points you are referring to and why don’t you feel
there is an opportunity between them for a malicious actor to attack the
system?
There are simpler cases to consider than an online attack on a single
independent system where an attacker having access to modify the data in
transit between PG and the storage would imply the attacker also having
access to read keys out of PG’s memory.
As specific examples, consider:
An attack against the database system where the database server is shut
down, or a backup, and the encryption key isn’t available on the system.
The backup system itself, not running as the PG user (an option supported
by PG and at least pgbackrest) being compromised, thus allowing for
injection of changes into a backup or into a restore.
The beginning of this discussion also very clearly had individuals voicing
strong opinions that unauthenticated encryption methods were not acceptable
as an end-state for PG due to the clear issue of there then being no
protection against modification of data. The approach we are working
towards provides both the unauthenticated option, which clearly has value
to a large number of our collective user base considering the number of
commercial implementations which have now arisen, and the authenticated
solution which goes further and provides the level clearly expected of the
PG community. This gets us a win-win situation.
> > There's clearly user demand for it as there's a number of organizations
> > who have forks which are providing it in one shape or another. This
> > kind of splintering of the community is actually an actively bad thing
> > for the project and is part of what killed Unix, by at least some pretty
> > reputable accounts, in my view.
>
> Yes, the number of commercial implementations of this is a concern. Of
> course, it is also possible that those commercial implementations are
> meeting checkbox requirements rather than technical ones, and the
> community has been hostile to checkbox-only features.
I’ve grown weary of this argument as the other major piece of work it was
routinely applied to was RLS and yet that has certainly been seen broadly
as a beneficial feature with users clearly leveraging it and in more than
some “checkbox” way.
Indeed, it’s similar also in that commercial implementations were done of
RLS while there were arguments made about it being a checkbox feature which
were used to discourage it from being implemented in core. Were it truly
checkbox, I don’t feel we would have the regular and ongoing discussion
about it on the lists that we do, nor see other tools built on top of PG
which specifically leverage it. Perhaps there are truly checkbox features
out there which we will never implement, but I’m (perhaps due to what my
dad would call selective listening on my part, perhaps not) having trouble
coming up with any presently. Features that exist in other systems that we
don’t want? Certainly. We don’t characterize those as simply “checkbox”
though. Perhaps that’s in part because we provide alternatives- but that’s
not the case here. We have no comparable way to have this capability as
part of the core system.
We, as a community, are clearly losing value by lack of this capability, if
by no other measure than simply the numerous users of the commercial
implementations feeling that they simply can’t use PG without this feature,
for whatever their reasoning.
Thanks,
Stephen
On Tue, Mar 28, 2023 at 12:01:56AM +0200, Stephen Frost wrote:
> Greetings,
>
> On Mon, Mar 27, 2023 at 12:38 Bruce Momjian <bruce@momjian.us> wrote:
> > On Wed, Mar 8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
> > > Agreed, though the latest efforts include an option for *authenticated*
> > > encryption as well as unauthenticated. That makes it much more
> > > difficult to make undetected changes to the data that's protected by
> > > the authenticated encryption being used.
> >
> > I thought some more about this. GCM-style authentication of encrypted
> > data has value because it assumes the two end points are secure but that
> > a malicious actor could modify data during transfer. In the Postgres
> > case, it seems the two end points and the transfer are all in the same
> > place. Therefore, it is unclear to me the value of using GCM-style
> > authentication because if the GCM-level can be modified, so can the end
> > points, and the encryption key exposed.
>
> What are the two end points you are referring to and why don’t you feel there
> is an opportunity between them for a malicious actor to attack the system?
Uh, TLS can use GCM and in this case you assume the sender and receiver
are secure, no?
> There are simpler cases to consider than an online attack on a single
> independent system where an attacker having access to modify the data in
> transit between PG and the storage would imply the attacker also having access
> to read keys out of PG’s memory.
I consider the operating system and its processes as much more of a
single entity than TLS over a network.
> As specific examples, consider:
>
> An attack against the database system where the database server is shut down,
> or a backup, and the encryption key isn’t available on the system.
>
> The backup system itself, not running as the PG user (an option supported by PG
> and at least pgbackrest) being compromised, thus allowing for injection of
> changes into a backup or into a restore.
I then question why we are not adding encryption to pg_basebackup or
pgbackrest rather than the database system.
> The beginning of this discussion also very clearly had individuals voicing
> strong opinions that unauthenticated encryption methods were not acceptable as
> an end-state for PG due to the clear issue of there then being no protection
> against modification of data. The approach we are working towards provides
What were the _technical_ reasons for those objections?
> both the unauthenticated option, which clearly has value to a large number of
> our collective user base considering the number of commercial implementations
> which have now arisen, and the authenticated solution which goes further and
> provides the level clearly expected of the PG community. This gets us a win-win
> situation.
>
> > > There's clearly user demand for it as there's a number of organizations
> > > who have forks which are providing it in one shape or another. This
> > > kind of splintering of the community is actually an actively bad thing
> > > for the project and is part of what killed Unix, by at least some pretty
> > > reputable accounts, in my view.
> >
> > Yes, the number of commercial implementations of this is a concern. Of
> > course, it is also possible that those commercial implementations are
> > meeting checkbox requirements rather than technical ones, and the
> > community has been hostile to checkbox-only features.
>
> I’ve grown weary of this argument as the other major piece of work it was
> routinely applied to was RLS and yet that has certainly been seen broadly as a
> beneficial feature with users clearly leveraging it and in more than some
> “checkbox” way.
RLS had to overcome that objection, and I think it did, and was better
for doing so.
We, as a community, are clearly losing value by lack of this capability, if by
no other measure than simply the numerous users of the commercial
implementations feeling that they simply can’t use PG without this feature, for
whatever their reasoning.
That is true, but I go back to my concern over useful feature vs. check
box.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Embrace your flaws. They make you human, rather than perfect,
which you will never be.
Greetings,
On Mon, Mar 27, 2023 at 18:17 Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar 28, 2023 at 12:01:56AM +0200, Stephen Frost wrote:
Greetings,
On Mon, Mar 27, 2023 at 12:38 Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Mar 8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
Agreed, though the latest efforts include an option for *authenticated*
encryption as well as unauthenticated. That makes it much more
difficult to make undetected changes to the data that's protected by
the authenticated encryption being used.
I thought some more about this. GCM-style authentication of encrypted
data has value because it assumes the two end points are secure but that
a malicious actor could modify data during transfer. In the Postgres
case, it seems the two end points and the transfer are all in the same
place. Therefore, it is unclear to me the value of using GCM-style
authentication because if the GCM-level can be modified, so can the end
points, and the encryption key exposed.
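To make the tamper-detection property being discussed concrete, here is a minimal sketch of how authenticated encryption fails closed when stored data is modified. AES-GCM itself is not in the Python standard library, so this uses a toy encrypt-then-MAC construction with HMAC standing in for the GCM authentication tag; the keystream "cipher" is purely illustrative and not something to use in production.

```python
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, length: bytes) -> bytes:
    # Toy keystream derived from key + nonce + counter (stand-in for AES-CTR).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_then_mac(key_enc: bytes, key_mac: bytes, plaintext: bytes) -> bytes:
    # Encrypt, then authenticate nonce + ciphertext with an HMAC tag,
    # analogous to the authentication tag a GCM mode produces.
    nonce = os.urandom(16)
    ct = bytes(p ^ s for p, s in zip(plaintext,
                                     _keystream(key_enc, nonce, len(plaintext))))
    tag = hmac.new(key_mac, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def decrypt_and_verify(key_enc: bytes, key_mac: bytes, blob: bytes) -> bytes:
    # Verify the tag in constant time before decrypting; any modification
    # of the stored blob causes a hard failure rather than silent corruption.
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key_mac, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: stored data was modified")
    return bytes(c ^ s for c, s in zip(ct, _keystream(key_enc, nonce, len(ct))))
```

Flipping a single bit of the stored blob makes decryption raise instead of returning silently corrupted plaintext, which is exactly the protection an unauthenticated mode lacks. A real implementation would use a proper AEAD such as AES-GCM from a library like OpenSSL rather than this sketch.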
What are the two end points you are referring to and why don’t you feel
there is an opportunity between them for a malicious actor to attack the
system?
Uh, TLS can use GCM and in this case you assume the sender and receiver
are secure, no?
TLS does use GCM.. pretty much exclusively as far as I can recall. So do a
lot of other things though..
There are simpler cases to consider than an online attack on a single
independent system where an attacker having access to modify the data in
transit between PG and the storage would imply the attacker also having access
to read keys out of PG’s memory.
I consider the operating system and its processes as much more of a
single entity than TLS over a network.
This may be the case sometimes but there’s absolutely no shortage of other
cases and it’s almost more the rule these days, that there is some kind of
network between the OS processes and the storage- a SAN, an iSCSI network,
NFS, are all quite common.
As specific examples, consider:
An attack against the database system where the database server is shut down,
or a backup, and the encryption key isn’t available on the system.
The backup system itself, not running as the PG user (an option supported by PG
and at least pgbackrest) being compromised, thus allowing for injection of
changes into a backup or into a restore.
I then question why we are not adding encryption to pg_basebackup or
pgbackrest rather than the database system.
Pgbackrest has encryption and authentication of it … but that doesn’t
actually address the attack vector that I outlined. If the backup user is
compromised then they can change the data before it gets to the storage.
If the backup user is compromised then they have access to whatever key is
used to encrypt and authenticate the backup and therefore can trivially
manipulate the data.
Encryption of backups by the backup tool serves to protect the data after
it leaves the backup system and is stored in cloud storage or in whatever
format the repository takes. This is beneficial, particularly when the
data itself offers no protection, but simply not the same.
The beginning of this discussion also very clearly had individuals voicing
strong opinions that unauthenticated encryption methods were not acceptable
as an end-state for PG due to the clear issue of there then being no
protection against modification of data.
What were the _technical_ reasons for those objections?
I believe largely the ones I’m bringing up here and which I outline above…
I don’t mean to pretend that any of this is of my own independent
construction. I don’t believe it is and my apologies if it came across that
way.
The approach we are working towards provides both the unauthenticated
option, which clearly has value to a large number of our collective user
base considering the number of commercial implementations which have now
arisen, and the authenticated solution which goes further and provides the
level clearly expected of the PG community. This gets us a win-win
situation.
There's clearly user demand for it as there's a number of
organizations
who have forks which are providing it in one shape or another.
This
kind of splintering of the community is actually an actively bad
thing
for the project and is part of what killed Unix, by at least some
pretty
reputable accounts, in my view.
Yes, the number of commercial implementations of this is a concern.
Of
course, it is also possible that those commercial implementations are
meeting checkbox requirements rather than technical ones, and the
community has been hostile to check box-only features.
I’ve grown weary of this argument as the other major piece of work it was
routinely applied to was RLS and yet that has certainly been seen broadly as a
beneficial feature with users clearly leveraging it and in more than some
“checkbox” way.
RLS had to overcome that objection, and I think it did, and was better
for doing so.
Beyond it being called a checkbox - what were the arguments against it? I
don’t object to being challenged to point out the use cases, but I feel
that at least some very clear and straightforward ones are outlined from
what has been said above. I also don’t believe those are the only ones but
I don’t think I could enumerate every use case for RLS either, even after
seeing it used for quite a few years. I do seriously question the level of
effort expected of features that are claimed to be “Checkbox” and tossed
almost exclusively for that reason on this list given the success of the
ones that have been accepted and are in active use by our users today.
We, as a community, are clearly losing value by lack of this capability,
if by no other measure than simply the numerous users of the commercial
implementations feeling that they simply can’t use PG without this feature, for
whatever their reasoning.
That is true, but I go back to my concern over useful feature vs. check
box.
While it’s easy to label something as checkbox, I don’t feel we have been
fair to our users in doing so as it has historically prevented features
which our users are demanding and end up getting from commercial providers
until we implement them ultimately anyway. This particular argument simply
doesn’t seem to actually hold the value that proponents of it claim, for us
at least, and we have clear counter-examples which we can point to and I
hope we learn from those.
Thanks!
Stephen
On Tue, Mar 28, 2023 at 12:57:42AM +0200, Stephen Frost wrote:
I consider the operating system and its processes as much more of a
single entity than TLS over a network.
This may be the case sometimes but there’s absolutely no shortage of other
cases and it’s almost more the rule these days, that there is some kind of
network between the OS processes and the storage- a SAN, an iSCSI network, NFS,
are all quite common.
Yes, but consider that the database cluster is having to get its data
from that remote storage --- the remote storage is not an independent
entity that can be corrupted without the database server being
compromised. If everything in PGDATA was GCM-verified, it would be
secure, but because some parts are not, I don't think it would be.
As specific examples, consider:
An attack against the database system where the database server is shut
down,
or a backup, and the encryption key isn’t available on the system.
The backup system itself, not running as the PG user (an option supported
by PG
and at least pgbackrest) being compromised, thus allowing for injection
of
changes into a backup or into a restore.
I then question why we are not adding encryption to pg_basebackup or
pgbackrest rather than the database system.
Pgbackrest has encryption and authentication of it … but that doesn’t actually
address the attack vector that I outlined. If the backup user is compromised
then they can change the data before it gets to the storage. If the backup
user is compromised then they have access to whatever key is used to encrypt
and authenticate the backup and therefore can trivially manipulate the data.
So the idea is that the backup user can be compromised without the data
being vulnerable --- makes sense, though that use-case seems narrow.
What were the _technical_ reasons for those objections?
I believe largely the ones I’m bringing up here and which I outline above… I
don’t mean to pretend that any of this is of my own independent construction. I
don’t believe it is and my apologies if it came across that way.
Yes, there is value beyond the check-box, but in most cases those
values are limited considering the complexity of the features, and the
check-box is what most people are asking for, I think.
I’ve grown weary of this argument as the other major piece of work it was
routinely applied to was RLS and yet that has certainly been seen broadly as a
beneficial feature with users clearly leveraging it and in more than some
“checkbox” way.
RLS had to overcome that objection, and I think it did, and was better
for doing so.
Beyond it being called a checkbox - what were the arguments against it? I
The RLS arguments were that queries could expose some of the underlying
data, but in summary, that was considered acceptable.
We, as a community, are clearly losing value by lack of this capability,
if by
no other measure than simply the numerous users of the commercial
implementations feeling that they simply can’t use PG without this feature, for
whatever their reasoning.
That is true, but I go back to my concern over useful feature vs. check
box.
While it’s easy to label something as checkbox, I don’t feel we have been fair
No, actually, it isn't. I am not sure why you are saying that.
to our users in doing so as it has historically prevented features which our
users are demanding and end up getting from commercial providers until we
implement them ultimately anyway. This particular argument simply doesn’t seem
to actually hold the value that proponents of it claim, for us at least, and we
have clear counter-examples which we can point to and I hope we learn from
those.
I don't think you are addressing actual issues above.
Greetings,
On Mon, Mar 27, 2023 at 19:19 Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar 28, 2023 at 12:57:42AM +0200, Stephen Frost wrote:
I consider the operating system and its processes as much more of a
single entity than TLS over a network.
This may be the case sometimes but there’s absolutely no shortage of
other
cases and it’s almost more the rule these days, that there is some kind
of
network between the OS processes and the storage- a SAN, an iSCSI
network, NFS,
are all quite common.
Yes, but consider that the database cluster is having to get its data
from that remote storage --- the remote storage is not an independent
entity that can be corrupted without the database server being
compromised. If everything in PGDATA was GCM-verified, it would be
secure, but because some parts are not, I don't think it would be.
The remote storage is certainly an independent system. Multi-mount LUNs are
entirely possible in a SAN (and absolutely with NFS, or just the NFS server
itself is compromised..), so while the attacker may not have any access to
the database server itself, they may have access to these other systems,
and that’s not even considering in-transit attacks which are also
absolutely possible, especially with iSCSI or NFS.
I don’t understand what is being claimed that the remote storage is “not an
independent system” based on my understanding of, eg, NFS. With NFS, a
directory on the NFS server is exported and the client mounts that
directory as NFS locally, all over a network which may or may not be
secured against manipulation. A user on the NFS server with root access is
absolutely able to access and modify files on the NFS server trivially,
even if they have no access to the PG server. Would you explain what you
mean?
I do agree that the ideal case would be that everything we can encrypt in
the PGDATA directory is encrypted and authenticated (not everything can
be, for various reasons, but we don’t actually need to either), just like
it would be ideal if everything was checksum’d and isn’t today. We are
progressing in that direction thanks to efforts such as reworking the
other subsystems to use shared buffers and a consistent page format, but just
like with checksums we do not need to have the perfect solution for us to
provide a lot of value here- and our users know that as the same is true of
the unauthenticated encryption approaches being offered by the commercial
solutions.
As specific examples, consider:
An attack against the database system where the database server is
shut
down,
or a backup, and the encryption key isn’t available on the system.
The backup system itself, not running as the PG user (an option
supported
by PG
and at least pgbackrest) being compromised, thus allowing for
injection
of
changes into a backup or into a restore.
I then question why we are not adding encryption to pg_basebackup or
pgbackrest rather than the database system.
Pgbackrest has encryption and authentication of it … but that doesn’t
actually
address the attack vector that I outlined. If the backup user is
compromised
then they can change the data before it gets to the storage. If the
backup
user is compromised then they have access to whatever key is used to
encrypt
and authenticate the backup and therefore can trivially manipulate the
data.
So the idea is that the backup user can be compromised without the data
being vulnerable --- makes sense, though that use-case seems narrow.
That’s perhaps a fair consideration- but it’s clearly of enough value that
many of our users are asking for it and not using PG because we don’t have
it today. Ultimately though, this clearly makes it more than a “checkbox”
feature. I hope we are able to agree on that now.
What were the _technical_ reasons for those objections?
I believe largely the ones I’m bringing up here and which I outline
above… I
don’t mean to pretend that any of this is of my own independent
construction. I
don’t believe it is and my apologies if it came across that way.
Yes, there is value beyond the check-box, but in most cases those
values are limited considering the complexity of the features, and the
check-box is what most people are asking for, I think.
For the users who ask on the lists for this feature, regularly, how many
don’t ask because they google or find prior responses on the list to the
question of if we have this capability? How do we know that their cases
are “checkbox”? Consider that there are standards groups which explicitly
consider these attack vectors and consider them important enough to require
mitigations to address those vectors. Do the end users of PG understand the
attack vectors or why they matter? Perhaps not, but just because they
can’t articulate the reasoning does NOT mean that the attack vector doesn’t
exist or that their environment is somehow immune to it- indeed, as the
standards bodies surely know, the opposite is true- they’re almost
certainly at risk of those attack vectors and therefore the standards
bodies are absolutely justified in requiring them to provide a solution.
Treating these users as unimportant because they don’t have the depth of
understanding that we do or that the standards body does is not helping
them- it’s actively driving them away from PG.
I’ve grown weary of this argument as the other major piece of work
it was
routinely applied to was RLS and yet that has certainly been seen
broadly
as a
beneficial feature with users clearly leveraging it and in more
than some
“checkbox” way.
RLS had to overcome that objection, and I think it did, and was better
for doing so.
Beyond it being called a checkbox - what were the arguments against it? I
The RLS arguments were that queries could expose some of the underlying
data, but in summary, that was considered acceptable.
This is an excellent point- and dovetails very nicely into my argument that
protecting primary data (what is provided by users and ends up in indexes
and heaps) is valuable even if we don’t (yet..) have protection for other
parts of the system. Reducing the size of the attack vector is absolutely
useful, especially when it’s such a large amount of the data in the system.
Yes, we should, and will, continue to improve- as we do with many features,
but we don’t need to wait for perfection to include this feature, just as
with RLS and numerous other features we have.
We, as a community, are clearly losing value by lack of this
capability,
if by
no other measure than simply the numerous users of the commercial
implementations feeling that they simply can’t use PG without this feature, for
whatever their reasoning.
That is true, but I go back to my concern over useful feature vs.
check
box.
While it’s easy to label something as checkbox, I don’t feel we have
been fair
No, actually, it isn't. I am not sure why you are saying that.
I’m confused as to what is required to label a feature as a “checkbox”
feature then. What did you use to make that determination of this feature?
I’m happy to be wrong here.
to our users in doing so as it has historically prevented features which
our users are demanding and end up getting from commercial providers until we
implement them ultimately anyway. This particular argument simply doesn’t seem
to actually hold the value that proponents of it claim, for us at least,
and we
have clear counter-examples which we can point to and I hope we learn
from
those.
I don't think you are addressing actual issues above.
Specifics would be really helpful. I don’t doubt that there are things I’m
missing, but I’ve tried to address each point raised clearly and concisely.
Thanks!
Stephen
On Tue, Mar 28, 2023 at 02:03:50AM +0200, Stephen Frost wrote:
The remote storage is certainly an independent system. Multi-mount LUNs are
entirely possible in a SAN (and absolutely with NFS, or just the NFS server
itself is compromised..), so while the attacker may not have any access to the
database server itself, they may have access to these other systems, and that’s
not even considering in-transit attacks which are also absolutely possible,
especially with iSCSI or NFS.
I don’t understand what is being claimed that the remote storage is “not an
independent system” based on my understanding of, eg, NFS. With NFS, a
directory on the NFS server is exported and the client mounts that directory as
NFS locally, all over a network which may or may not be secured against
manipulation. A user on the NFS server with root access is absolutely able to
access and modify files on the NFS server trivially, even if they have no
access to the PG server. Would you explain what you mean?
The point is that someone could change values in the storage, pg_xact,
encryption settings, binaries, that would allow the attacker to learn
the encryption key. This is not possible for two secure endpoints and
someone changing data in transit. Yeah, it took me a while to
understand these boundaries too.
So the idea is that the backup user can be compromised without the data
being vulnerable --- makes sense, though that use-case seems narrow.
That’s perhaps a fair consideration- but it’s clearly of enough value that many
of our users are asking for it and not using PG because we don’t have it today.
Ultimately though, this clearly makes it more than a “checkbox” feature. I hope
we are able to agree on that now.
It is more than a check box feature, yes, but I am guessing few people
are wanting this for the actual features beyond check box.
Yes, there is value beyond the check-box, but in most cases those
values are limited considering the complexity of the features, and the
check-box is what most people are asking for, I think.
For the users who ask on the lists for this feature, regularly, how many don’t
ask because they google or find prior responses on the list to the question of
if we have this capability? How do we know that their cases are “checkbox”?
Because I have rarely heard people articulate the value beyond check
box.
Consider that there are standards groups which explicitly consider these attack
vectors and consider them important enough to require mitigations to address
those vectors. Do the end users of PG understand the attack vectors or why they
matter? Perhaps not, but just because they can’t articulate the reasoning does
NOT mean that the attack vector doesn’t exist or that their environment is
somehow immune to it- indeed, as the standards bodies surely know, the opposite
is true- they’re almost certainly at risk of those attack vectors and therefore
the standards bodies are absolutely justified in requiring them to provide a
solution. Treating these users as unimportant because they don’t have the depth
of understanding that we do or that the standards body does is not helping
them- it’s actively driving them away from PG.
Well, then who is going to explain them here, because I have not heard
them yet.
The RLS arguments were that queries could expose some of the underlying
data, but in summary, that was considered acceptable.
This is an excellent point- and dovetails very nicely into my argument that
protecting primary data (what is provided by users and ends up in indexes and
heaps) is valuable even if we don’t (yet..) have protection for other parts of
the system. Reducing the size of the attack vector is absolutely useful,
especially when it’s such a large amount of the data in the system. Yes, we
should, and will, continue to improve- as we do with many features, but we
don’t need to wait for perfection to include this feature, just as with RLS and
numerous other features we have.
The issue is that you needed a certain type of user with a certain type
of access to break RLS, while for this, writing to PGDATA is the simple
case for all the breakage, and the thing we are protecting with
authentication.
We, as a community, are clearly losing value by lack of this capability,
if by no other measure than simply the numerous users of the commercial
implementations feeling that they simply can’t use PG without this feature, for
whatever their reasoning.
That is true, but I go back to my concern over useful feature vs. check
box.
While it’s easy to label something as checkbox, I don’t feel we have been
fair
No, actually, it isn't. I am not sure why you are saying that.
I’m confused as to what is required to label a feature as a “checkbox” feature
then. What did you use to make that determination of this feature? I’m happy to
be wrong here.
I don't see the point in me continuing to reply here. You just seem to
continue asking questions without actually thinking of what I am saying,
and hope I get tired or something.
Greetings,
On Mon, Mar 27, 2023 at 21:35 Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar 28, 2023 at 02:03:50AM +0200, Stephen Frost wrote:
The remote storage is certainly an independent system. Multi-mount LUNs
are
entirely possible in a SAN (and absolutely with NFS, or just the NFS
server
itself is compromised..), so while the attacker may not have any access
to the
database server itself, they may have access to these other systems, and
that’s
not even considering in-transit attacks which are also absolutely
possible,
especially with iSCSI or NFS.
I don’t understand what is being claimed that the remote storage is “not
an
independent system” based on my understanding of, eg, NFS. With NFS, a
directory on the NFS server is exported and the client mounts that directory as
NFS locally, all over a network which may or may not be secured against
manipulation. A user on the NFS server with root access is absolutely able to
access and modify files on the NFS server trivially, even if they have no
access to the PG server. Would you explain what you mean?
The point is that someone could change values in the storage, pg_xact,
encryption settings, binaries, that would allow the attacker to learn
the encryption key. This is not possible for two secure endpoints and
someone changing data in transit. Yeah, it took me a while to
understand these boundaries too.
This depends on the specific configuration of the systems, clearly. Being
able to change values in other parts of the system isn’t great and we
should work to improve on that, but clearly that isn’t so much of an issue
that people aren’t willing to accept a partial solution or existing
commercial solutions wouldn’t be accepted or considered viable. Indeed,
using GCM is objectively an improvement over what’s being offered commonly
today.
I also generally object to the idea that being able to manipulate the
PGDATA directory necessarily means being able to gain access to the KEK. In
trivial solutions, sure, it’s possible, but the NFS server should never be
asking some external KMS for the key to a given DB server and a reasonable
implementation won’t allow this, and instead would flag and log such an
attempt for someone to review, leading to a much faster realization of a
compromised system.
Certainly it’s much simpler to reason about an attacker with no knowledge
of either system and only network access to see if they can penetrate the
communications between the two end-points, but that is not the only case
where authenticated encryption is useful.
So the idea is that the backup user can be compromised without the data
being vulnerable --- makes sense, though that use-case seems narrow.
That’s perhaps a fair consideration- but it’s clearly of enough value
that many
of our users are asking for it and not using PG because we don’t have it
today.
Ultimately though, this clearly makes it more than a “checkbox” feature.
I hope
we are able to agree on that now.
It is more than a check box feature, yes, but I am guessing few people
are wanting this for the actual features beyond check box.
As I explained previously, perhaps the people asking are doing so for only
the “checkbox”, but that doesn’t mean it isn’t a useful feature or that it
isn’t valuable in its own right. Those checklists were compiled and
enforced for a reason, which the end users might not understand but is
still absolutely valuable. Sad to say, but frankly this is becoming more
and more common but we shouldn’t be faulting the users asking for it- if it
were truly useless then eventually it would be removed from the standard,
but it hasn’t and it won’t be because, while not every end user has a depth
of understanding to explain it, it is actually a useful and important
capability to have and one that is important to implement.
Yes, there is value beyond the check-box, but in most cases those
values are limited considering the complexity of the features, and
the
check-box is what most people are asking for, I think.
For the users who ask on the lists for this feature, regularly, how many
don’t
ask because they google or find prior responses on the list to the
question of
if we have this capability? How do we know that their cases are
“checkbox”?
Because I have rarely heard people articulate the value beyond check
box.
Have I done so sufficiently then that we can agree that calling it
“checkbox” is inappropriate and detrimental to our user base?
Consider that there are standards groups which explicitly consider these
attack vectors and consider them important enough to require mitigations to
address
those vectors. Do the end users of PG understand the attack vectors or
why they
matter? Perhaps not, but just because they can’t articulate the
reasoning does
NOT mean that the attack vector doesn’t exist or that their environment
is
somehow immune to it- indeed, as the standards bodies surely know, the
opposite
is true- they’re almost certainly at risk of those attack vectors and
therefore
the standards bodies are absolutely justified in requiring them to
provide a
solution. Treating these users as unimportant because they don’t have
the depth
of understanding that we do or that the standards body does is not
helping
them- it’s actively driving them away from PG.
Well, then who is going to explain them here, because I have not heard
them yet.
I thought I was doing so.
The RLS arguments were that queries could expose some of the underlying
data, but in summary, that was considered acceptable.
This is an excellent point- and dovetails very nicely into my argument
that
protecting primary data (what is provided by users and ends up in
indexes and
heaps) is valuable even if we don’t (yet..) have protection for other
parts of
the system. Reducing the size of the attack vector is absolutely useful,
especially when it’s such a large amount of the data in the system. Yes, we
should, and will, continue to improve- as we do with many features, but
we
don’t need to wait for perfection to include this feature, just as with
RLS and
numerous other features we have.
The issue is that you needed a certain type of user with a certain type
of access to break RLS, while for this, writing to PGDATA is the simple
case for all the breakage, and the thing we are protecting with
authentication.
This goes back to the “if it isn’t perfect then it’s useless” argument …
but that’s exactly the discussion which was had around RLS and ultimately
we decided that RLS was still useful even with the leaks- and our users
accepted that also and have benefitted from it ever since it was included
in core. The same exists here- yes, more needs to be done than the absolute
simplest “make install” to have the system be secure (not unlike today with
our defaults from a source build with “make install”..) but at least with
this capability included it’s possible, and we can write “securing
PostgreSQL” documentation on how to, whereas without it there is simply no
way to address the attack vectors I’ve articulated here.
We, as a community, are clearly losing value by lack of this
capability,
if by
no other measure than simply the numerous users of the
commercial
implementations feeling that they simply can’t use PG
without this
feature, for
whatever their reasoning.
That is true, but I go back to my concern over useful feature
vs.
check
box.
While it’s easy to label something as checkbox, I don’t feel we
have been
fair
No, actually, it isn't. I am not sure why you are saying that.
I’m confused as to what is required to label a feature as a “checkbox”
feature
then. What did you use to make that determination of this feature? I’m
happy to
be wrong here.
I don't see the point in me continuing to reply here. You just seem to
continue asking questions without actually thinking of what I am saying,
and hope I get tired or something.
I hope we have others who have a moment to chime in here and provide their
viewpoints as I don’t feel this is an accurate representation of the
discussion thus far.
Thanks,
Stephen