Moving forward with TDE
Hi -hackers,
Working with Stephen, I am attempting to pick up some of the work that
was left off with TDE and the key management infrastructure. I have
rebased Bruce's KMS/TDE patches as they existed on the
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption wiki
page, which are enclosed in this email.
I would love to open a discussion about how to move forward and get
some of these features built out. The historical threads here are
quite long and complicated; is there a "current state" other than the
wiki that reflects the general thinking on this feature? Any major
developments in direction that would not be reflected in the code from
May 2021?
Thanks,
David
Attachments:
- 0001-cfe-01-doc_over_master-squash-commit.patch (+165/-2)
- 0002-cfe-02-internaldoc_over_cfe-01-doc-squash-commit.patch (+231/-1)
- 0003-cfe-03-scripts_over_cfe-02-internaldoc-squash-commit.patch (+325/-2)
- 0004-cfe-04-common_over_cfe-03-scripts-squash-commit.patch (+1159/-2)
- 0005-cfe-05-crypto_over_cfe-04-common-squash-commit.patch (+588/-17)
- 0006-cfe-06-backend_over_cfe-05-crypto-squash-commit.patch (+194/-9)
- 0007-cfe-07-bin_over_cfe-06-backend-squash-commit.patch (+361/-22)
- 0008-cfe-08-pg_alterckey_over_cfe-07-bin-squash-commit.patch (+1000/-1)
- 0009-cfe-09-test_over_cfe-08-pg_alterckey-squash-commit.patch (+1709/-1)
- 0010-cfe-10-hint_over_cfe-09-test-squash-commit.patch (+214/-53)
- 0011-cfe-11-gist_over_cfe-10-hint-squash-commit.patch (+48/-16)
- 0012-cfe-12-rel_over_cfe-11-gist-squash-commit.patch (+396/-49)
Hi David,

> Working with Stephen, I am attempting to pick up some of the work that
> was left off with TDE and the key management infrastructure. I have
> rebased Bruce's KMS/TDE patches as they existed on the
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption wiki
> page, which are enclosed in this email.

I'm happy to see that the TDE effort was picked up.

> I would love to open a discussion about how to move forward and get
> some of these features built out. The historical threads here are
> quite long and complicated; is there a "current state" other than the
> wiki that reflects the general thinking on this feature? Any major
> developments in direction that would not be reflected in the code from
> May 2021?
The patches seem to be well documented and decomposed into small pieces.
That's good.
Unless somebody in the community remembers open questions/issues with
TDE that were never addressed, I suggest simply iterating with our
usual testing/reviewing process. For now I'm going to change the
status of the CF entry [1] to "Waiting for Author" since the patchset
doesn't pass the CI [2].
One limitation I see in the design described on the wiki is that it
seems to rely heavily on AES:

> We will use Advanced Encryption Standard (AES) [4]. We will offer three key length options (128, 192, and 256-bits) selected at initdb time with --file-encryption-method
(there doesn't seem to be any mention of the hash/MAC algorithms,
which is odd). In the future we should be able to add support for
alternative algorithms. The reason is that algorithms can become
weak every 20 years or so, and the preferred algorithms may also
depend on the region. This should NOT be implemented in this
particular patchset, but the design shouldn't prevent us from
implementing it in the future.
[1]: https://commitfest.postgresql.org/40/3985/
[2]: http://cfbot.cputube.org/
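To make the algorithm-agility point concrete, here is a minimal sketch of what an encryption-method registry could look like; the method names and fields below are illustrative assumptions, not the patchset's actual API:

```python
# Hypothetical registry mapping an initdb --file-encryption-method name to a
# cipher spec. Recording the *name* in cluster metadata, rather than
# hard-coding AES parameters, is what leaves room for future algorithms.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EncryptionMethod:
    cipher: str            # block cipher family
    key_bits: int          # key length in bits
    mac: Optional[str]     # authentication algorithm, if any

METHODS = {
    "AES128": EncryptionMethod("AES", 128, None),
    "AES192": EncryptionMethod("AES", 192, None),
    "AES256": EncryptionMethod("AES", 256, None),
    # A future, region-specific or post-AES method could slot in here
    # without changing the on-disk framing:
    # "SM4": EncryptionMethod("SM4", 128, None),
}

def key_bytes(method_name: str) -> int:
    """Key length in bytes for a configured encryption method."""
    return METHODS[method_name].key_bits // 8
```

The design question is only that the on-disk format and key derivation be parameterized by such a spec, so adding a method later does not break existing clusters.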
--
Best regards,
Aleksander Alekseev
> Unless somebody in the community remembers open questions/issues with
> TDE that were never addressed I suggest simply iterating with our
> usual testing/reviewing process. For now I'm going to change the
> status of the CF entry [1] to "Waiting for Author" since the patchset
> doesn't pass the CI [2].
Thanks, enclosed is a new version that is rebased on HEAD and fixes a
bug that the new pg_control_init() test picked up.
A known issue (just discovered by me while testing the latest revision) is
that databases created from `template0` are not decrypting properly,
but `template1` works fine, so I'm going to dig in on that soon.
> One limitation of the design described on the wiki I see is that it
> seems to heavily rely on AES:
>
> > We will use Advanced Encryption Standard (AES) [4]. We will offer three key length options (128, 192, and 256-bits) selected at initdb time with --file-encryption-method
>
> (there doesn't seem to be any mention of the hash/MAC algorithms,
> that's odd). In the future we should be able to add the support of
> alternative algorithms. The reason is that the algorithms can become
> weak every 20 years or so, and the preferred algorithms may also
> depend on the region. This should NOT be implemented in this
> particular patchset, but the design shouldn't prevent from
> implementing this in the future.
Yes, we are definitely considering support for multiple algorithms as part
of this effort.
Best,
David
Attachments:
- v2-0001-cfe-01-doc_over_master-squash-commit.patch (+165/-2)
- v2-0005-cfe-05-crypto_over_cfe-04-common-squash-commit.patch (+588/-17)
- v2-0002-cfe-02-internaldoc_over_cfe-01-doc-squash-commit.patch (+231/-1)
- v2-0003-cfe-03-scripts_over_cfe-02-internaldoc-squash-com.patch (+325/-2)
- v2-0004-cfe-04-common_over_cfe-03-scripts-squash-commit.patch (+1159/-2)
- v2-0007-cfe-07-bin_over_cfe-06-backend-squash-commit.patch (+361/-22)
- v2-0006-cfe-06-backend_over_cfe-05-crypto-squash-commit.patch (+197/-12)
- v2-0008-cfe-08-pg_alterckey_over_cfe-07-bin-squash-commit.patch (+1000/-1)
- v2-0009-cfe-09-test_over_cfe-08-pg_alterckey-squash-commi.patch (+1709/-1)
- v2-0010-cfe-10-hint_over_cfe-09-test-squash-commit.patch (+214/-53)
- v2-0011-cfe-11-gist_over_cfe-10-hint-squash-commit.patch (+48/-16)
- v2-0012-cfe-12-rel_over_cfe-11-gist-squash-commit.patch (+396/-49)
On Fri, Nov 4, 2022 at 3:36 AM David Christensen
<david.christensen@crunchydata.com> wrote:
> > Unless somebody in the community remembers open questions/issues with
> > TDE that were never addressed I suggest simply iterating with our
> > usual testing/reviewing process. For now I'm going to change the
> > status of the CF entry [1] to "Waiting for Author" since the patchset
> > doesn't pass the CI [2].
>
> Thanks, enclosed is a new version that is rebased on HEAD and fixes a
> bug that the new pg_control_init() test picked up.
I was looking into the documentation patches 0001 and 0002; I think
the explanation is very clear. I have a few questions/comments:
+By not using the database id in the IV, CREATE DATABASE can copy the
+heap/index files from the old database to a new one without
+decryption/encryption. Both page copies are valid. Once a database
+changes its pages, it gets new LSNs, and hence new IV.
How about the WAL_LOG method for creating a database? Because in that
case we get new LSNs for the pages in the new database, so do we
re-encrypt? If yes, then this documentation needs to be updated;
otherwise we might need to add that code.
+changes its pages, it gets new LSNs, and hence new IV. Using only the
+LSN and page number also avoids requiring pg_upgrade to preserve
+database oids, tablespace oids, and relfilenodes.
I think this line needs to be changed, because now we are already
preserving the dbid/tbsid/relfilenode. So even though we are not using
those in the IV, there is no point in saying we are avoiding that
requirement.
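For reference, the IV scheme the quoted docs describe can be sketched as follows; the field widths and byte layout here are my own illustrative assumptions, not the patch's exact format:

```python
# Per-page IV derived from the page LSN and page number only -- no database,
# tablespace, or relfilenode. That is why a byte-identical page copied by
# CREATE DATABASE still decrypts: neither its LSN nor its page number changed.
import struct

def page_iv(lsn: int, page_number: int) -> bytes:
    """Build a 16-byte IV from the 8-byte LSN and 4-byte page number,
    zero-padded to a 128-bit AES counter block."""
    return struct.pack(">QI4x", lsn, page_number)

# A copied page keeps the same (LSN, page number), hence the same IV:
assert page_iv(0x01000028, 7) == page_iv(0x01000028, 7)

# Any modification bumps the LSN, yielding a fresh IV -- essential, since
# reusing an IV with a stream mode like CTR leaks the XOR of the plaintexts:
assert page_iv(0x01000060, 7) != page_iv(0x01000028, 7)
```

This also makes the WAL_LOG question above concrete: if page creation in the new database assigns new LSNs, those pages must be written encrypted under the new IVs.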
I will review the remaining patches soon.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 24, 2022 at 9:29 AM David Christensen
<david.christensen@crunchydata.com> wrote:
> I would love to open a discussion about how to move forward and get
> some of these features built out. The historical threads here are
> quite long and complicated; is there a "current state" other than the
> wiki that reflects the general thinking on this feature? Any major
> developments in direction that would not be reflected in the code from
> May 2021?
I don't think the patchset here has incorporated the results of the
discussion [1] that happened at the end of 2021. For example, it looks
like AES-CTR is still in use for the pages, which I thought was
already determined to be insufficient.
The following next steps were proposed in that thread:
1. modify temporary file I/O to use a more centralized API
2. modify the existing cluster file encryption patch to use XTS with an
IV that uses more than the LSN
3. add XTS regression test code like CTR
4. create WAL encryption code using CTR
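For anyone catching up on why step 2 moves away from plain CTR: CTR is a stream mode, so without authentication an attacker with write access to the data directory can flip chosen plaintext bits without the key. A toy sketch of that malleability, using a random fake keystream purely in place of AES-CTR (the XOR property shown is identical in real CTR):

```python
# Demonstrate CTR-style malleability: ciphertext = plaintext XOR keystream,
# so flipping ciphertext bits flips the same plaintext bits on decryption.
import secrets

def ctr_like_xor(data: bytes, keystream: bytes) -> bytes:
    return bytes(d ^ k for d, k in zip(data, keystream))

keystream = secrets.token_bytes(16)          # stand-in for AES-CTR output
plaintext = b"balance: 0000100"
ciphertext = ctr_like_xor(plaintext, keystream)

# An attacker who knows only the field position can turn "0000100" into
# "9000100" by XORing in the difference of the two digits:
delta = ord("0") ^ ord("9")
tampered = bytearray(ciphertext)
tampered[9] ^= delta
assert ctr_like_xor(bytes(tampered), keystream) == b"balance: 9000100"
```

XTS does not authenticate either, but it is not bit-malleable like this: flipping ciphertext bits garbles the whole 16-byte block rather than flipping chosen plaintext bits.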
Does this patchset need review before those steps are taken (or was
there additional conversation/work that I missed)?
[1]: https://www.postgresql.org/message-id/20211013222648.GA373@momjian.us
Thanks,
--Jacob
On Nov 15, 2022, at 1:08 PM, Jacob Champion <jchampion@timescale.com> wrote:
> On Mon, Oct 24, 2022 at 9:29 AM David Christensen
> <david.christensen@crunchydata.com> wrote:
> > I would love to open a discussion about how to move forward and get
> > some of these features built out. The historical threads here are
> > quite long and complicated; is there a "current state" other than the
> > wiki that reflects the general thinking on this feature? Any major
> > developments in direction that would not be reflected in the code from
> > May 2021?
>
> I don't think the patchset here has incorporated the results of the
> discussion [1] that happened at the end of 2021. For example, it looks
> like AES-CTR is still in use for the pages, which I thought was
> already determined to be insufficient.
Good to know about the next steps, thanks.
> The following next steps were proposed in that thread:
> 1. modify temporary file I/O to use a more centralized API
> 2. modify the existing cluster file encryption patch to use XTS with an
> IV that uses more than the LSN
> 3. add XTS regression test code like CTR
> 4. create WAL encryption code using CTR
>
> Does this patchset need review before those steps are taken (or was
> there additional conversation/work that I missed)?
This was just a refresh of the old patches on the wiki to work as written on HEAD. If there are known TODOs here, then that work still needs to be done.
I was going to take 2) and Stephen was going to work on 3); I am not sure about the other two but will review the thread you pointed to. Thanks for pointing that out.
David
On Tue, Nov 15, 2022 at 11:39 AM David Christensen
<david.christensen@crunchydata.com> wrote:
> Good to know about the next steps, thanks.

You're welcome!

> This was just a refresh of the old patches on the wiki to work as written on HEAD. If there are known TODOs here, then that work still needs to be done.
>
> I was going to take 2) and Stephen was going to work on 3); I am not sure about the other two but will review the thread you pointed to. Thanks for pointing that out.
I've attached the diffs I'm carrying to build this under meson (as
well as -Wshadow; my removal of the two variables probably needs some
scrutiny). It looks like the testcrypto executable will need
substantial changes after the common/hex.h revert.
--Jacob
Attachments:
- fix-meson.patch.txt (+35/-4)
Hi Jacob,
Thanks, I've added this patch to my tree [1]. (For now I'm just adding
fixes and the like atop the original separate patches, but will
eventually get things winnowed down into probably the same 12 parts
the originals were reviewed in.)

[1]: https://github.com/pgguru/postgres/tree/tde
Best,
David
Hi Dilip,
Thanks for the feedback here. I will review the docs changes and add to my tree.
Best,
David
On Fri, 4 Nov 2022 at 03:36, David Christensen
<david.christensen@crunchydata.com> wrote:
> > Unless somebody in the community remembers open questions/issues with
> > TDE that were never addressed I suggest simply iterating with our
> > usual testing/reviewing process. For now I'm going to change the
> > status of the CF entry [1] to "Waiting for Author" since the patchset
> > doesn't pass the CI [2].
>
> Thanks, enclosed is a new version that is rebased on HEAD and fixes a
> bug that the new pg_control_init() test picked up.
The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
b82557ecc2ebbf649142740a1c5ce8d19089f620 ===
=== applying patch
./v2-0004-cfe-04-common_over_cfe-03-scripts-squash-commit.patch
patching file src/common/Makefile
Hunk #2 FAILED at 84.
1 out of 2 hunks FAILED -- saving rejects to file src/common/Makefile.rej
[1]: http://cfbot.cputube.org/patch_41_3985.log
Regards,
Vignesh
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested
I have decided to write a review here in terms of whether we want this feature, and perhaps the way we should look at encryption as a project down the road, since I think this is only the beginning. I am hoping to run some full tests of the feature sometime in the coming weeks. Right now this review is limited to the documentation and the documented feature.
From the documentation, the primary threat model of TDE is to prevent decryption of data from archived wal segments (and data files), for example on a backup system. While there are other methods around this problem to date, I think that this feature is worth pursuing for that reason. I want to address a couple of reasons for this and then go into some reservations I have about how some of this is documented.
There are current workarounds to ensuring encryption at rest, but these have a number of problems. Encryption passphrases end up lying around the system in various places. Key rotation is often difficult. And one mistake can easily render all efforts ineffective. TDE solves these problems. The overall design from the internal docs looks solid. This definitely is something I would recommend for many users.
I have a couple small caveats though. Encryption of data is a large topic and there isn't a one-size-fits-all solution to industrial or state requirements. Having all this key management available in PostgreSQL is a very good thing. Long run it is likely to end up being extensible, and therefore both more powerful and offering a wider range of choices for solution architects. Implementing encryption is also something that is easy to mess up. For this reason I think it would be great if we had a standardized format for discussing encryption options that we could use going forward. I don't think that should be held against this patch but I think we need to start discussing it now because it will be a bigger problem later.
A second caveat I have is that key management is a topic where you really need a good overview of internals in order to implement effectively. If you don't know how an SSL handshake works or what is in a certificate, you can easily make mistakes in setting up SSL. I can see the same thing happening here. For example, I don't think it would be safe to leave the KEK on an encrypted filesystem that is decrypted at runtime (or at least I wouldn't consider that safe -- your appetite for risk may vary).
My proposal would be to build a template for encryption options in the documentation. This could include topics like SSL as well. In such a template we'd have sections like "Threat model," "How it works," "Implementation Requirements" and so forth. Again I don't think this needs to be part of the current patch but I think it is something we need to start thinking about now. Maybe after this goes in, I can present a proposed documentation patch.
I will also note that I don't consider myself to be very qualified on topics like encryption. I can reason about key management to some extent but some implementation details may be beyond me. I would hope we could get some extra review on this patch set soon.
Greetings,
* Chris Travers (chris.travers@gmail.com) wrote:
> From the documentation, the primary threat model of TDE is to prevent decryption of data from archived wal segments (and data files), for example on a backup system. While there are other methods around this problem to date, I think that this feature is worth pursuing for that reason. I want to address a couple of reasons for this and then go into some reservations I have about how some of this is documented.
Agreed, though the latest efforts include an option for *authenticated*
encryption as well as unauthenticated. That makes it much more
difficult to make undetected changes to the data that's protected by
the authenticated encryption being used.
> There are current workarounds to ensuring encryption at rest, but these have a number of problems. Encryption passphrases end up lying around the system in various places. Key rotation is often difficult. And one mistake can easily render all efforts ineffective. TDE solves these problems. The overall design from the internal docs looks solid. This definitely is something I would recommend for many users.
There's clearly user demand for it as there's a number of organizations
who have forks which are providing it in one shape or another. This
kind of splintering of the community is actually an actively bad thing
for the project and is part of what killed Unix, by at least some pretty
reputable accounts, in my view.
> I have a couple small caveats though. Encryption of data is a large topic and there isn't a one-size-fits-all solution to industrial or state requirements. Having all this key management available in PostgreSQL is a very good thing. Long run it is likely to end up being extensible, and therefore both more powerful and offering a wider range of choices for solution architects. Implementing encryption is also something that is easy to mess up. For this reason I think it would be great if we had a standardized format for discussing encryption options that we could use going forward. I don't think that should be held against this patch but I think we need to start discussing it now because it will be a bigger problem later.
Do you have a suggestion as to the format to use?
> A second caveat I have is that key management is a topic where you really need a good overview of internals in order to implement effectively. If you don't know how an SSL handshake works or what is in a certificate, you can easily make mistakes in setting up SSL. I can see the same thing happening here. For example, I don't think it would be safe to leave the KEK on an encrypted filesystem that is decrypted at runtime (or at least I wouldn't consider that safe -- your appetite for risk may vary).
Agreed that we should document this and make clear that the KEK is
necessary for server start but absolutely should be kept as safe as
possible and certainly not stored on disk somewhere nearby the encrypted
cluster.
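For readers new to the design, the KEK/DEK relationship being discussed can be sketched as follows. The XOR "wrap" below is a deliberately toy stand-in for a real key-wrap primitive (e.g. AES key wrap), used only to show why KEK rotation never touches the encrypted relations:

```python
# Key hierarchy: the KEK (cluster key) wraps the data encryption keys (DEKs);
# the DEKs encrypt the actual files. Rotating the KEK (cf. pg_alterckey)
# only re-wraps the small key files, never the data.
import secrets

def wrap(kek: bytes, dek: bytes) -> bytes:
    # Toy stand-in for a real key-wrap algorithm -- NOT real cryptography.
    return bytes(a ^ b for a, b in zip(kek, dek))

unwrap = wrap  # XOR is its own inverse

dek = secrets.token_bytes(32)        # encrypts actual pages; never rotated
old_kek = secrets.token_bytes(32)
wrapped = wrap(old_kek, dek)         # what sits in the key directory

# Rotate the cluster key: unwrap with the old KEK, re-wrap with the new one.
new_kek = secrets.token_bytes(32)
rewrapped = wrap(new_kek, unwrap(old_kek, wrapped))

assert unwrap(new_kek, rewrapped) == dek   # same DEK; no page rewrites needed
```

This split is also why keeping the KEK off the cluster's own storage matters so much: an attacker with the data directory alone holds only wrapped keys.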
> My proposal would be to build a template for encryption options in the documentation. This could include topics like SSL as well. In such a template we'd have sections like "Threat model," "How it works," "Implementation Requirements" and so forth. Again I don't think this needs to be part of the current patch but I think it is something we need to start thinking about now. Maybe after this goes in, I can present a proposed documentation patch.
I'm not entirely sure that it makes sense to lump this and TLS in the
same place as they end up being rather independent at the end of the
day. If you have ideas for how to improve the documentation, I'd
certainly encourage you to go ahead and work on that and submit it as a
patch rather than waiting for this to actually land in core. Having
good and solid documentation is something that will help this get in,
after all, and to the extent that it's covering existing topics like
TLS, those could likely be included independently and that would be of
benefit to everyone.
> I will also note that I don't consider myself to be very qualified on topics like encryption. I can reason about key management to some extent but some implementation details may be beyond me. I would hope we could get some extra review on this patch set soon.
Certainly agree with you there though there's an overall trajectory of
patches involved in all of this that's a bit deep. The plan is to
discuss that at PGCon (On the Road to TDE) and at the PGCon
Unconference after. I certainly hope those interested will be there.
I'm also happy to have a call with anyone interested in this effort
independent of that, of course.
Thanks!
Stephen
On Wed, Mar 8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
> Agreed, though the latest efforts include an option for *authenticated*
> encryption as well as unauthenticated. That makes it much more
> difficult to make undetected changes to the data that's protected by
> the authenticated encryption being used.
I thought some more about this. GCM-style authentication of encrypted
data has value because it assumes the two end points are secure but that
a malicious actor could modify data during transfer. In the Postgres
case, it seems the two end points and the transfer are all in the same
place. Therefore, it is unclear to me the value of using GCM-style
authentication because if the GCM-level can be modified, so can the end
points, and the encryption key exposed.
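For concreteness, the property under debate is offline tamper detection. GCM itself needs a crypto library, but the detection behavior can be sketched with stdlib encrypt-then-MAC (GCM computes its tag differently, yet catches modification the same way):

```python
# Authenticated encryption sketch: an HMAC tag over the ciphertext means any
# offline modification of a page is detected at decryption time instead of
# being silently accepted as garbled-but-valid plaintext.
import hashlib
import hmac
import secrets

TAG_LEN = 32  # SHA-256 digest size

def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    return ciphertext + hmac.new(mac_key, ciphertext, hashlib.sha256).digest()

def open_sealed(mac_key: bytes, sealed: bytes) -> bytes:
    ciphertext, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("page failed authentication: modified at rest?")
    return ciphertext

mac_key = secrets.token_bytes(32)
sealed = seal(mac_key, b"\x01\x02encrypted page image")

# An offline attacker (backup host, stolen disk) flips one bit:
tampered = bytearray(sealed)
tampered[0] ^= 0x80
try:
    open_sealed(mac_key, bytes(tampered))
except ValueError as e:
    print(e)
```

The scenario this protects is precisely the one where the attacker has the files but not the running server's memory, such as a backup or a powered-off system.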
> There's clearly user demand for it as there's a number of organizations
> who have forks which are providing it in one shape or another. This
> kind of splintering of the community is actually an actively bad thing
> for the project and is part of what killed Unix, by at least some pretty
> reputable accounts, in my view.
Yes, the number of commercial implementations of this is a concern. Of
course, it is also possible that those commercial implementations are
meeting checkbox requirements rather than technical ones, and the
community has been hostile to checkbox-only features.
> Certainly agree with you there though there's an overall trajectory of
> patches involved in all of this that's a bit deep. The plan is to
> discuss that at PGCon (On the Road to TDE) and at the PGCon
> Unconference after. I certainly hope those interested will be there.
> I'm also happy to have a call with anyone interested in this effort
> independent of that, of course.
I will not be attending Ottawa.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Embrace your flaws. They make you human, rather than perfect,
which you will never be.
Greetings,
On Mon, Mar 27, 2023 at 12:38 Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Mar 8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
> > Agreed, though the latest efforts include an option for *authenticated*
> > encryption as well as unauthenticated. That makes it much more
> > difficult to make undetected changes to the data that's protected by
> > the authenticated encryption being used.
>
> I thought some more about this. GCM-style authentication of encrypted
> data has value because it assumes the two end points are secure but that
> a malicious actor could modify data during transfer. In the Postgres
> case, it seems the two end points and the transfer are all in the same
> place. Therefore, it is unclear to me the value of using GCM-style
> authentication because if the GCM-level can be modified, so can the end
> points, and the encryption key exposed.
What are the two end points you are referring to and why don’t you feel
there is an opportunity between them for a malicious actor to attack the
system?
There are simpler cases to consider than an online attack on a single
independent system where an attacker having access to modify the data in
transit between PG and the storage would imply the attacker also having
access to read keys out of PG’s memory.
As specific examples, consider:
An attack against the database system where the database server is shut
down, or a backup, and the encryption key isn’t available on the system.
The backup system itself, not running as the PG user (an option supported
by PG and at least pgbackrest) being compromised, thus allowing for
injection of changes into a backup or into a restore.
The beginning of this discussion also very clearly had individuals voicing
strong opinions that unauthenticated encryption methods were not acceptable
as an end-state for PG due to the clear issue of there then being no
protection against modification of data. The approach we are working
towards provides both the unauthenticated option, which clearly has value
to a large number of our collective user base considering the number of
commercial implementations which have now arisen, and the authenticated
solution which goes further and provides the level clearly expected of the
PG community. This gets us a win-win situation.
> > There's clearly user demand for it as there's a number of organizations
> > who have forks which are providing it in one shape or another. This
> > kind of splintering of the community is actually an actively bad thing
> > for the project and is part of what killed Unix, by at least some pretty
> > reputable accounts, in my view.
>
> Yes, the number of commercial implementations of this is a concern. Of
> course, it is also possible that those commercial implementations are
> meeting checkbox requirements rather than technical ones, and the
> community has been hostile to checkbox-only features.
I’ve grown weary of this argument as the other major piece of work it was
routinely applied to was RLS and yet that has certainly been seen broadly
as a beneficial feature with users clearly leveraging it and in more than
some “checkbox” way.
Indeed, it’s similar also in that commercial implementations were done of
RLS while there were arguments made about it being a checkbox feature which
were used to discourage it from being implemented in core. Were it truly
checkbox, I don’t feel we would have the regular and ongoing discussion
about it on the lists that we do, nor see other tools built on top of PG
which specifically leverage it. Perhaps there are truly checkbox features
out there which we will never implement, but I’m (perhaps due to what my
dad would call selective listening on my part, perhaps not) having trouble
coming up with any presently. Features that exist in other systems that we
don’t want? Certainly. We don’t characterize those as simply “checkbox”
though. Perhaps that’s in part because we provide alternatives- but that’s
not the case here. We have no comparable way to have this capability as
part of the core system.
We, as a community, are clearly losing value by lack of this capability, if
by no other measure than simply the numerous users of the commercial
implementations feeling that they simply can’t use PG without this feature,
for whatever their reasoning.
Thanks,
Stephen
On Tue, Mar 28, 2023 at 12:01:56AM +0200, Stephen Frost wrote:
> Greetings,
>
> On Mon, Mar 27, 2023 at 12:38 Bruce Momjian <bruce@momjian.us> wrote:
> > On Wed, Mar 8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
> > > Agreed, though the latest efforts include an option for *authenticated*
> > > encryption as well as unauthenticated. That makes it much more
> > > difficult to make undetected changes to the data that's protected by
> > > the authenticated encryption being used.
> >
> > I thought some more about this. GCM-style authentication of encrypted
> > data has value because it assumes the two end points are secure but that
> > a malicious actor could modify data during transfer. In the Postgres
> > case, it seems the two end points and the transfer are all in the same
> > place. Therefore, it is unclear to me the value of using GCM-style
> > authentication because if the GCM-level can be modified, so can the end
> > points, and the encryption key exposed.
>
> What are the two end points you are referring to and why don’t you feel there
> is an opportunity between them for a malicious actor to attack the system?
Uh, TLS can use GCM and in this case you assume the sender and receiver
are secure, no?
> There are simpler cases to consider than an online attack on a single
> independent system where an attacker having access to modify the data in
> transit between PG and the storage would imply the attacker also having access
> to read keys out of PG’s memory.
I consider the operating system and its processes as much more of a
single entity than TLS over a network.
> As specific examples, consider:
>
> An attack against the database system where the database server is shut down,
> or a backup, and the encryption key isn’t available on the system.
>
> The backup system itself, not running as the PG user (an option supported by PG
> and at least pgbackrest) being compromised, thus allowing for injection of
> changes into a backup or into a restore.
I then question why we are not adding encryption to pg_basebackup or
pgbackrest rather than the database system.
> The beginning of this discussion also very clearly had individuals voicing
> strong opinions that unauthenticated encryption methods were not acceptable as
> an end-state for PG due to the clear issue of there then being no protection
> against modification of data. The approach we are working towards provides
What were the _technical_ reasons for those objections?
> both the unauthenticated option, which clearly has value to a large number of
> our collective user base considering the number of commercial implementations
> which have now arisen, and the authenticated solution which goes further and
> provides the level clearly expected of the PG community. This gets us a win-win
> situation.
>
> > > There's clearly user demand for it as there's a number of organizations
> > > who have forks which are providing it in one shape or another. This
> > > kind of splintering of the community is actually an actively bad thing
> > > for the project and is part of what killed Unix, by at least some pretty
> > > reputable accounts, in my view.
> >
> > Yes, the number of commercial implementations of this is a concern. Of
> > course, it is also possible that those commercial implementations are
> > meeting checkbox requirements rather than technical ones, and the
> > community has been hostile to checkbox-only features.
>
> I’ve grown weary of this argument as the other major piece of work it was
> routinely applied to was RLS and yet that has certainly been seen broadly as a
> beneficial feature with users clearly leveraging it and in more than some
> “checkbox” way.
RLS had to overcome that objection, and I think it did, and was better
for doing so.
We, as a community, are clearly losing value by lack of this capability, if by
no other measure than simply the numerous users of the commercial
implementations feeling that they simply can’t use PG without this feature, for
whatever their reasoning.
That is true, but I go back to my concern over useful feature vs. check
box.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Embrace your flaws. They make you human, rather than perfect,
which you will never be.
Greetings,
On Mon, Mar 27, 2023 at 18:17 Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar 28, 2023 at 12:01:56AM +0200, Stephen Frost wrote:
Greetings,
On Mon, Mar 27, 2023 at 12:38 Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Mar 8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
Agreed, though the latest efforts include an option for *authenticated*
encryption as well as unauthenticated. That makes it much more
difficult to make undetected changes to the data that's protected by
the authenticated encryption being used.
I thought some more about this. GCM-style authentication of encrypted
data has value because it assumes the two end points are secure but that
a malicious actor could modify data during transfer. In the Postgres
case, it seems the two end points and the transfer are all in the same
place. Therefore, it is unclear to me the value of using GCM-style
authentication because if the GCM-level can be modified, so can the end
points, and the encryption key exposed.
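To make the tamper-detection property being discussed concrete, here is a minimal sketch of how authenticated encryption fails closed when stored data is modified. AES-GCM itself is not in the Python standard library, so this uses a toy encrypt-then-MAC construction with HMAC standing in for the GCM authentication tag; the keystream "cipher" is purely illustrative and not something to use in production.

```python
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, length: bytes) -> bytes:
    # Toy keystream derived from key + nonce + counter (stand-in for AES-CTR).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_then_mac(key_enc: bytes, key_mac: bytes, plaintext: bytes) -> bytes:
    # Encrypt, then authenticate nonce + ciphertext with an HMAC tag,
    # analogous to the authentication tag a GCM mode produces.
    nonce = os.urandom(16)
    ct = bytes(p ^ s for p, s in zip(plaintext,
                                     _keystream(key_enc, nonce, len(plaintext))))
    tag = hmac.new(key_mac, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def decrypt_and_verify(key_enc: bytes, key_mac: bytes, blob: bytes) -> bytes:
    # Verify the tag in constant time before decrypting; any modification
    # of the stored blob causes a hard failure rather than silent corruption.
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key_mac, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: stored data was modified")
    return bytes(c ^ s for c, s in zip(ct, _keystream(key_enc, nonce, len(ct))))
```

Flipping a single bit of the stored blob makes decryption raise instead of returning silently corrupted plaintext, which is exactly the protection an unauthenticated mode lacks. A real implementation would use a proper AEAD such as AES-GCM from a library like OpenSSL rather than this sketch.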
What are the two end points you are referring to and why don’t you feel
there is an opportunity between them for a malicious actor to attack the
system?
Uh, TLS can use GCM and in this case you assume the sender and receiver
are secure, no?
TLS does use GCM.. pretty much exclusively as far as I can recall. So do a
lot of other things though..
There are simpler cases to consider than an online attack on a single
independent system where an attacker having access to modify the data in
transit between PG and the storage would imply the attacker also having access
to read keys out of PG’s memory.
I consider the operating system and its processes as much more of a
single entity than TLS over a network.
This may be the case sometimes but there’s absolutely no shortage of other
cases and it’s almost more the rule these days, that there is some kind of
network between the OS processes and the storage- a SAN, an iSCSI network,
NFS, are all quite common.
As specific examples, consider:
An attack against the database system where the database server is shut down,
or a backup, and the encryption key isn’t available on the system.
The backup system itself, not running as the PG user (an option supported by PG
and at least pgbackrest) being compromised, thus allowing for injection of
changes into a backup or into a restore.
I then question why we are not adding encryption to pg_basebackup or
pgbackrest rather than the database system.
Pgbackrest has encryption and authentication of it … but that doesn’t
actually address the attack vector that I outlined. If the backup user is
compromised then they can change the data before it gets to the storage.
If the backup user is compromised then they have access to whatever key is
used to encrypt and authenticate the backup and therefore can trivially
manipulate the data.
Encryption of backups by the backup tool serves to protect the data after
it leaves the backup system and is stored in cloud storage or in whatever
format the repository takes. This is beneficial, particularly when the
data itself offers no protection, but simply not the same.
The beginning of this discussion also very clearly had individuals voicing
strong opinions that unauthenticated encryption methods were not acceptable
as an end-state for PG due to the clear issue of there then being no
protection against modification of data.
What were the _technical_ reasons for those objections?
I believe largely the ones I’m bringing up here and which I outline above…
I don’t mean to pretend that any of this is of my own independent
construction. I don’t believe it is and my apologies if it came across that
way.
The approach we are working towards provides both the unauthenticated
option, which clearly has value to a large number of our collective user
base considering the number of commercial implementations which have now
arisen, and the authenticated solution which goes further and provides the
level clearly expected of the PG community. This gets us a win-win
situation.
There's clearly user demand for it as there's a number of
organizations
who have forks which are providing it in one shape or another.
This
kind of splintering of the community is actually an actively bad
thing
for the project and is part of what killed Unix, by at least some
pretty
reputable accounts, in my view.
Yes, the number of commercial implementations of this is a concern.
Of
course, it is also possible that those commercial implementations are
meeting checkbox requirements rather than technical ones, and the
community has been hostile to check box-only features.
I’ve grown weary of this argument as the other major piece of work it was
routinely applied to was RLS and yet that has certainly been seen broadly as a
beneficial feature with users clearly leveraging it and in more than some
“checkbox” way.
RLS had to overcome that objection, and I think it did, and was better
for doing so.
Beyond it being called a checkbox - what were the arguments against it? I
don’t object to being challenged to point out the use cases, but I feel
that at least some very clear and straightforward ones are outlined from
what has been said above. I also don’t believe those are the only ones but
I don’t think I could enumerate every use case for RLS either, even after
seeing it used for quite a few years. I do seriously question the level of
effort expected of features that are claimed to be “Checkbox” and tossed
almost exclusively for that reason on this list given the success of the
ones that have been accepted and are in active use by our users today.
We, as a community, are clearly losing value by lack of this capability,
if by no other measure than simply the numerous users of the commercial
implementations feeling that they simply can’t use PG without this feature, for
whatever their reasoning.
That is true, but I go back to my concern over useful feature vs. check
box.
While it’s easy to label something as checkbox, I don’t feel we have been
fair to our users in doing so as it has historically prevented features
which our users are demanding and end up getting from commercial providers
until we implement them ultimately anyway. This particular argument simply
doesn’t seem to actually hold the value that proponents of it claim, for us
at least, and we have clear counter-examples which we can point to and I
hope we learn from those.
Thanks!
Stephen
On Tue, Mar 28, 2023 at 12:57:42AM +0200, Stephen Frost wrote:
I consider the operating system and its processes as much more of a
single entity than TLS over a network.
This may be the case sometimes but there’s absolutely no shortage of other
cases and it’s almost more the rule these days, that there is some kind of
network between the OS processes and the storage- a SAN, an iSCSI network, NFS,
are all quite common.
Yes, but consider that the database cluster is having to get its data
from that remote storage --- the remote storage is not an independent
entity that can be corrupted without the database server being
compromised. If everything in PGDATA was GCM-verified, it would be
secure, but because some parts are not, I don't think it would be.
As specific examples, consider:
An attack against the database system where the database server is shut
down,
or a backup, and the encryption key isn’t available on the system.
The backup system itself, not running as the PG user (an option supported
by PG
and at least pgbackrest) being compromised, thus allowing for injection
of
changes into a backup or into a restore.
I then question why we are not adding encryption to pg_basebackup or
pgbackrest rather than the database system.
Pgbackrest has encryption and authentication of it … but that doesn’t actually
address the attack vector that I outlined. If the backup user is compromised
then they can change the data before it gets to the storage. If the backup
user is compromised then they have access to whatever key is used to encrypt
and authenticate the backup and therefore can trivially manipulate the data.
So the idea is that the backup user can be compromised without the data
being vulnerable --- makes sense, though that use-case seems narrow.
What were the _technical_ reasons for those objections?
I believe largely the ones I’m bringing up here and which I outline above… I
don’t mean to pretend that any of this is of my own independent construction. I
don’t believe it is and my apologies if it came across that way.
Yes, there is value beyond the check-box, but in most cases those
values are limited considering the complexity of the features, and the
check-box is what most people are asking for, I think.
I’ve grown weary of this argument as the other major piece of work it was
routinely applied to was RLS and yet that has certainly been seen broadly as a
beneficial feature with users clearly leveraging it and in more than some
“checkbox” way.
RLS had to overcome that objection, and I think it did, and was better
for doing so.
Beyond it being called a checkbox - what were the arguments against it? I
The RLS arguments were that queries could expose some of the underlying
data, but in summary, that was considered acceptable.
We, as a community, are clearly losing value by lack of this capability,
if by
no other measure than simply the numerous users of the commercial
implementations feeling that they simply can’t use PG without this feature, for
whatever their reasoning.
That is true, but I go back to my concern over useful feature vs. check
box.
While it’s easy to label something as checkbox, I don’t feel we have been fair
No, actually, it isn't. I am not sure why you are saying that.
to our users in doing so as it has historically prevented features which our
users are demanding and end up getting from commercial providers until we
implement them ultimately anyway. This particular argument simply doesn’t seem
to actually hold the value that proponents of it claim, for us at least, and we
have clear counter-examples which we can point to and I hope we learn from
those.
I don't think you are addressing actual issues above.
Greetings,
On Mon, Mar 27, 2023 at 19:19 Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar 28, 2023 at 12:57:42AM +0200, Stephen Frost wrote:
I consider the operating system and its processes as much more of a
single entity than TLS over a network.
This may be the case sometimes but there’s absolutely no shortage of
other
cases and it’s almost more the rule these days, that there is some kind
of
network between the OS processes and the storage- a SAN, an iSCSI
network, NFS,
are all quite common.
Yes, but consider that the database cluster is having to get its data
from that remote storage --- the remote storage is not an independent
entity that can be corrupted without the database server being
compromised. If everything in PGDATA was GCM-verified, it would be
secure, but because some parts are not, I don't think it would be.
The remote storage is certainly an independent system. Multi-mount LUNs are
entirely possible in a SAN (and absolutely with NFS, or just the NFS server
itself is compromised..), so while the attacker may not have any access to
the database server itself, they may have access to these other systems,
and that’s not even considering in-transit attacks which are also
absolutely possible, especially with iSCSI or NFS.
I don’t understand what is being claimed that the remote storage is “not an
independent system” based on my understanding of, eg, NFS. With NFS, a
directory on the NFS server is exported and the client mounts that
directory as NFS locally, all over a network which may or may not be
secured against manipulation. A user on the NFS server with root access is
absolutely able to access and modify files on the NFS server trivially,
even if they have no access to the PG server. Would you explain what you
mean?
I do agree that the ideal case would be that everything we can encrypt in
the PGDATA directory is encrypted and authenticated (not everything can
be, for various reasons, but we don’t actually need to either), just like
it would be ideal if everything was checksum’d and isn’t today. We are
progressing in that direction thanks to efforts such as reworking the
other subsystems to use shared buffers and a consistent page format, but just
like with checksums we do not need to have the perfect solution for us to
provide a lot of value here- and our users know that as the same is true of
the unauthenticated encryption approaches being offered by the commercial
solutions.
As specific examples, consider:
An attack against the database system where the database server is
shut
down,
or a backup, and the encryption key isn’t available on the system.
The backup system itself, not running as the PG user (an option
supported
by PG
and at least pgbackrest) being compromised, thus allowing for
injection
of
changes into a backup or into a restore.
I then question why we are not adding encryption to pg_basebackup or
pgbackrest rather than the database system.
Pgbackrest has encryption and authentication of it … but that doesn’t
actually
address the attack vector that I outlined. If the backup user is
compromised
then they can change the data before it gets to the storage. If the
backup
user is compromised then they have access to whatever key is used to
encrypt
and authenticate the backup and therefore can trivially manipulate the
data.
So the idea is that the backup user can be compromised without the data
being vulnerable --- makes sense, though that use-case seems narrow.
That’s perhaps a fair consideration- but it’s clearly of enough value that
many of our users are asking for it and not using PG because we don’t have
it today. Ultimately though, this clearly makes it more than a “checkbox”
feature. I hope we are able to agree on that now.
What were the _technical_ reasons for those objections?
I believe largely the ones I’m bringing up here and which I outline
above… I
don’t mean to pretend that any of this is of my own independent
construction. I
don’t believe it is and my apologies if it came across that way.
Yes, there is value beyond the check-box, but in most cases those
values are limited considering the complexity of the features, and the
check-box is what most people are asking for, I think.
For the users who ask on the lists for this feature, regularly, how many
don’t ask because they google or find prior responses on the list to the
question of if we have this capability? How do we know that their cases
are “checkbox”? Consider that there are standards groups which explicitly
consider these attack vectors and consider them important enough to require
mitigations to address those vectors. Do the end users of PG understand the
attack vectors or why they matter? Perhaps not, but just because they
can’t articulate the reasoning does NOT mean that the attack vector doesn’t
exist or that their environment is somehow immune to it- indeed, as the
standards bodies surely know, the opposite is true- they’re almost
certainly at risk of those attack vectors and therefore the standards
bodies are absolutely justified in requiring them to provide a solution.
Treating these users as unimportant because they don’t have the depth of
understanding that we do or that the standards body does is not helping
them- it’s actively driving them away from PG.
I’ve grown weary of this argument as the other major piece of work
it was
routinely applied to was RLS and yet that has certainly been seen
broadly
as a
beneficial feature with users clearly leveraging it and in more
than some
“checkbox” way.
RLS had to overcome that objection, and I think it did, and was better
for doing so.
Beyond it being called a checkbox - what were the arguments against it? I
The RLS arguments were that queries could expose some of the underlying
data, but in summary, that was considered acceptable.
This is an excellent point- and dovetails very nicely into my argument that
protecting primary data (what is provided by users and ends up in indexes
and heaps) is valuable even if we don’t (yet..) have protection for other
parts of the system. Reducing the size of the attack vector is absolutely
useful, especially when it’s such a large amount of the data in the system.
Yes, we should, and will, continue to improve- as we do with many features,
but we don’t need to wait for perfection to include this feature, just as
with RLS and numerous other features we have.
We, as a community, are clearly losing value by lack of this
capability,
if by
no other measure than simply the numerous users of the commercial
implementations feeling that they simply can’t use PG without this feature, for
whatever their reasoning.
That is true, but I go back to my concern over useful feature vs.
check
box.
While it’s easy to label something as checkbox, I don’t feel we have
been fair
No, actually, it isn't. I am not sure why you are saying that.
I’m confused as to what is required to label a feature as a “checkbox”
feature then. What did you use to make that determination of this feature?
I’m happy to be wrong here.
to our users in doing so as it has historically prevented features which
our users are demanding and end up getting from commercial providers until we
implement them ultimately anyway. This particular argument simply doesn’t seem
to actually hold the value that proponents of it claim, for us at least,
and we
have clear counter-examples which we can point to and I hope we learn
from
those.
I don't think you are addressing actual issues above.
Specifics would be really helpful. I don’t doubt that there are things I’m
missing, but I’ve tried to address each point raised clearly and concisely.
Thanks!
Stephen
On Tue, Mar 28, 2023 at 02:03:50AM +0200, Stephen Frost wrote:
The remote storage is certainly an independent system. Multi-mount LUNs are
entirely possible in a SAN (and absolutely with NFS, or just the NFS server
itself is compromised..), so while the attacker may not have any access to the
database server itself, they may have access to these other systems, and that’s
not even considering in-transit attacks which are also absolutely possible,
especially with iSCSI or NFS.
I don’t understand what is being claimed that the remote storage is “not an
independent system” based on my understanding of, eg, NFS. With NFS, a
directory on the NFS server is exported and the client mounts that directory as
NFS locally, all over a network which may or may not be secured against
manipulation. A user on the NFS server with root access is absolutely able to
access and modify files on the NFS server trivially, even if they have no
access to the PG server. Would you explain what you mean?
The point is that someone could change values in the storage, pg_xact,
encryption settings, binaries, that would allow the attacker to learn
the encryption key. This is not possible for two secure endpoints and
someone changing data in transit. Yeah, it took me a while to
understand these boundaries too.
So the idea is that the backup user can be compromised without the data
being vulnerable --- makes sense, though that use-case seems narrow.
That’s perhaps a fair consideration- but it’s clearly of enough value that many
of our users are asking for it and not using PG because we don’t have it today.
Ultimately though, this clearly makes it more than a “checkbox” feature. I hope
we are able to agree on that now.
It is more than a check box feature, yes, but I am guessing few people
are wanting this for the actual features beyond check box.
Yes, there is value beyond the check-box, but in most cases those
values are limited considering the complexity of the features, and the
check-box is what most people are asking for, I think.
For the users who ask on the lists for this feature, regularly, how many don’t
ask because they google or find prior responses on the list to the question of
if we have this capability? How do we know that their cases are “checkbox”?
Because I have rarely heard people articulate the value beyond check
box.
Consider that there are standards groups which explicitly consider these attack
vectors and consider them important enough to require mitigations to address
those vectors. Do the end users of PG understand the attack vectors or why they
matter? Perhaps not, but just because they can’t articulate the reasoning does
NOT mean that the attack vector doesn’t exist or that their environment is
somehow immune to it- indeed, as the standards bodies surely know, the opposite
is true- they’re almost certainly at risk of those attack vectors and therefore
the standards bodies are absolutely justified in requiring them to provide a
solution. Treating these users as unimportant because they don’t have the depth
of understanding that we do or that the standards body does is not helping
them- it’s actively driving them away from PG.
Well, then who is going to explain them here, because I have not heard
them yet.
The RLS arguments were that queries could expose some of the underlying
data, but in summary, that was considered acceptable.
This is an excellent point- and dovetails very nicely into my argument that
protecting primary data (what is provided by users and ends up in indexes and
heaps) is valuable even if we don’t (yet..) have protection for other parts of
the system. Reducing the size of the attack vector is absolutely useful,
especially when it’s such a large amount of the data in the system. Yes, we
should, and will, continue to improve- as we do with many features, but we
don’t need to wait for perfection to include this feature, just as with RLS and
numerous other features we have.
The issue is that you needed a certain type of user with a certain type
of access to break RLS, while for this, writing to PGDATA is the simple
case for all the breakage, and the thing we are protecting with
authentication.
We, as a community, are clearly losing value by lack of this capability,
if by no other measure than simply the numerous users of the commercial
implementations feeling that they simply can’t use PG without this feature, for
whatever their reasoning.
That is true, but I go back to my concern over useful feature vs. check
box.
While it’s easy to label something as checkbox, I don’t feel we have been
fair
No, actually, it isn't. I am not sure why you are saying that.
I’m confused as to what is required to label a feature as a “checkbox” feature
then. What did you use to make that determination of this feature? I’m happy to
be wrong here.
I don't see the point in me continuing to reply here. You just seem to
continue asking questions without actually thinking of what I am saying,
and hope I get tired or something.
Greetings,
On Mon, Mar 27, 2023 at 21:35 Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar 28, 2023 at 02:03:50AM +0200, Stephen Frost wrote:
The remote storage is certainly an independent system. Multi-mount LUNs
are
entirely possible in a SAN (and absolutely with NFS, or just the NFS
server
itself is compromised..), so while the attacker may not have any access
to the
database server itself, they may have access to these other systems, and
that’s
not even considering in-transit attacks which are also absolutely
possible,
especially with iSCSI or NFS.
I don’t understand what is being claimed that the remote storage is “not
an
independent system” based on my understanding of, eg, NFS. With NFS, a
directory on the NFS server is exported and the client mounts that directory as
NFS locally, all over a network which may or may not be secured against
manipulation. A user on the NFS server with root access is absolutely able to
access and modify files on the NFS server trivially, even if they have no
access to the PG server. Would you explain what you mean?
The point is that someone could change values in the storage, pg_xact,
encryption settings, binaries, that would allow the attacker to learn
the encryption key. This is not possible for two secure endpoints and
someone changing data in transit. Yeah, it took me a while to
understand these boundaries too.
This depends on the specific configuration of the systems, clearly. Being
able to change values in other parts of the system isn’t great and we
should work to improve on that, but clearly that isn’t so much of an issue
that people aren’t willing to accept a partial solution or existing
commercial solutions wouldn’t be accepted or considered viable. Indeed,
using GCM is objectively an improvement over what’s being offered commonly
today.
I also generally object to the idea that being able to manipulate the
PGDATA directory necessarily means being able to gain access to the KEK. In
trivial solutions, sure, it’s possible, but the NFS server should never be
asking some external KMS for the key to a given DB server and a reasonable
implementation won’t allow this, and instead would flag and log such an
attempt for someone to review, leading to a much faster realization of a
compromised system.
Certainly it’s much simpler to reason about an attacker with no knowledge
of either system and only network access to see if they can penetrate the
communications between the two end-points, but that is not the only case
where authenticated encryption is useful.
So the idea is that the backup user can be compromised without the data
being vulnerable --- makes sense, though that use-case seems narrow.
That’s perhaps a fair consideration- but it’s clearly of enough value
that many
of our users are asking for it and not using PG because we don’t have it
today.
Ultimately though, this clearly makes it more than a “checkbox” feature.
I hope
we are able to agree on that now.
It is more than a check box feature, yes, but I am guessing few people
are wanting this for the actual features beyond check box.
As I explained previously, perhaps the people asking are doing so for only
the “checkbox”, but that doesn’t mean it isn’t a useful feature or that it
isn’t valuable in its own right. Those checklists were compiled and
enforced for a reason, which the end users might not understand but is
still absolutely valuable. Sad to say, but frankly this is becoming more
and more common but we shouldn’t be faulting the users asking for it- if it
were truly useless then eventually it would be removed from the standard,
but it hasn’t and it won’t be because, while not every end user has a depth
of understanding to explain it, it is actually a useful and important
capability to have and one that is important to implement.
Yes, there is value beyond the check-box, but in most cases those
values are limited considering the complexity of the features, and
the
check-box is what most people are asking for, I think.
For the users who ask on the lists for this feature, regularly, how many
don’t
ask because they google or find prior responses on the list to the
question of
if we have this capability? How do we know that their cases are
“checkbox”?
Because I have rarely heard people articulate the value beyond check
box.
Have I done so sufficiently then that we can agree that calling it
“checkbox” is inappropriate and detrimental to our user base?
Consider that there are standards groups which explicitly consider these
attack vectors and consider them important enough to require mitigations to
address
those vectors. Do the end users of PG understand the attack vectors or
why they
matter? Perhaps not, but just because they can’t articulate the
reasoning does
NOT mean that the attack vector doesn’t exist or that their environment
is
somehow immune to it- indeed, as the standards bodies surely know, the
opposite
is true- they’re almost certainly at risk of those attack vectors and
therefore
the standards bodies are absolutely justified in requiring them to
provide a
solution. Treating these users as unimportant because they don’t have
the depth
of understanding that we do or that the standards body does is not
helping
them- it’s actively driving them away from PG.
Well, then who is going to explain them here, because I have not heard
them yet.
I thought I was doing so.
The RLS arguments were that queries could expose some of the underlying
data, but in summary, that was considered acceptable.
This is an excellent point- and dovetails very nicely into my argument
that
protecting primary data (what is provided by users and ends up in
indexes and
heaps) is valuable even if we don’t (yet..) have protection for other
parts of
the system. Reducing the size of the attack vector is absolutely useful,
especially when it’s such a large amount of the data in the system. Yes, we
should, and will, continue to improve- as we do with many features, but
we
don’t need to wait for perfection to include this feature, just as with
RLS and
numerous other features we have.
The issue is that you needed a certain type of user with a certain type
of access to break RLS, while for this, writing to PGDATA is the simple
case for all the breakage, and the thing we are protecting with
authentication.
This goes back to the “if it isn’t perfect then it’s useless” argument …
but that’s exactly the discussion which was had around RLS and ultimately
we decided that RLS was still useful even with the leaks- and our users
accepted that also and have benefitted from it ever since it was included
in core. The same exists here- yes, more needs to be done than the absolute
simplest “make install” to have the system be secure (not unlike today with
our defaults from a source build with “make install”..) but at least with
this capability included it’s possible, and we can write “securing
PostgreSQL” documentation on how to, whereas without it there is simply no
way to address the attack vectors I’ve articulated here.
We, as a community, are clearly losing value by lack of this
capability,
if by
no other measure than simply the numerous users of the
commercial
implementations feeling that they simply can’t use PG
without this
feature, for
whatever their reasoning.
That is true, but I go back to my concern over useful feature
vs.
check
box.
While it’s easy to label something as checkbox, I don’t feel we
have been
fair
No, actually, it isn't. I am not sure why you are saying that.
I’m confused as to what is required to label a feature as a “checkbox”
feature
then. What did you use to make that determination of this feature? I’m
happy to
be wrong here.
I don't see the point in me continuing to reply here. You just seem to
continue asking questions without actually thinking of what I am saying,
and hope I get tired or something.
I hope we have others who have a moment to chime in here and provide their
viewpoints as I don’t feel this is an accurate representation of the
discussion thus far.
Thanks,
Stephen