[PoC] run SQL over ciphertext
Hi all,
We have developed an extension, allowing PostgreSQL to run queries over
encrypted data. This functionality is achieved via user-defined functions
that extend encrypted data types and support commonly used expression
operations. Our tests validated its effectiveness with TPC-C and TPC-H
benchmarks. You may find the code here: https://github.com/SJTU-IPADS/HEDB.
This PoC is a reimplementation, forked during a collaboration with a cloud
database company; the aim is to enable their DBAs to manage databases
without the risk of data leaks, *meeting the requirements of laws such as
GDPR.*
I am wondering if anyone thinks this is a nice feature. If so, I am curious
about the steps needed to mature it further and potentially have it
incorporated as a part of PostgreSQL contrib.
Best regards,
Mingyu Li
Hello,
I think this is a very interesting topic, especially for European companies
where data sovereignty in the cloud has become critical.
If I understand correctly, the idea is to split users into 'client users'
who can see data unencrypted, and 'server users', who are administrators
unable to decrypt data.
A few questions:
- how are secrets managed? Do you use a sort of vault to keep encryption
keys? Is there a master key to encrypt session keys?
- what about performance? Is it possible to use indexes on encrypted
columns?
--
best regards
Giampaolo Capelli
On 10.10.23 08:42, Mingyu Li wrote:
We have developed an extension, allowing PostgreSQL to run queries over
encrypted data. This functionality is achieved via user-defined
functions that extend encrypted data types and support commonly used
expression operations. Our tests validated its effectiveness with TPC-C
and TPC-H benchmarks. You may find the code here:
https://github.com/SJTU-IPADS/HEDB.
This PoC is a reimplementation fork while collaborating with a cloud
database company; the aim is to enable their DBAs to manage databases
without the risk of data leaks, /meeting the requirements of laws such
as GDPR./
I am wondering if anyone thinks this is a nice feature. If so, I am
curious about the steps to further it mature and potentially have it
incorporated as a part of PostgreSQL contrib.
FYI, see also
</messages/by-id/89157929-c2b6-817b-6025-8e4b2d89d88f@enterprisedb.com>
for a similar project.
Hi,
the idea is to split users into 'client users' who can see data
unencrypted, and 'server users', who are administrators unable to decrypt
data.
Exactly!
how are secrets managed? Do you use a sort of vault to keep encryption
keys?
Good question. The client holds the key and uses a proxy for transparent
encryption. The implementation also assumes secure storage of encryption
keys in hardware-protected memory called "enclaves". Only client users and
server enclaves have access to the plaintext. Please see page 5 of the
slides: www.usenix.org/system/files/osdi23_slides_li_mingyu_v2.pdf.
Modern clouds like OVH and Azure now offer hardware enclaves. If enclaves
are not available, a rich client-side proxy can be used, with extra
round-trip costs.
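To make the trust split above concrete, here is a minimal Python sketch. It is not HEDB's code: the `Enclave` class, the function names, and the toy HMAC-based cipher are all assumptions made for illustration, and the cipher is not suitable for real use. It only shows the shape of the idea: the client (or its proxy) and the enclave share the key, while the server merely passes opaque ciphertexts to the enclave for evaluation.

```python
import hashlib
import hmac
import secrets
import struct


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream derived with HMAC-SHA256. NOT production cryptography."""
    return hmac.new(key, nonce, hashlib.sha256).digest()[:length]


def encrypt_int(key: bytes, value: int) -> bytes:
    """Client/proxy side: encrypt a signed 64-bit integer under a fresh nonce."""
    nonce = secrets.token_bytes(16)
    plain = struct.pack(">q", value)
    body = bytes(p ^ k for p, k in zip(plain, _keystream(key, nonce, 8)))
    return nonce + body


def decrypt_int(key: bytes, ciphertext: bytes) -> int:
    """Inverse of encrypt_int; only key holders can call this meaningfully."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    plain = bytes(c ^ k for c, k in zip(body, _keystream(key, nonce, 8)))
    return struct.unpack(">q", plain)[0]


class Enclave:
    """Hypothetical stand-in for code running in hardware-protected memory."""

    def __init__(self, key: bytes) -> None:
        self._key = key  # never exposed to the untrusted server code

    def add(self, a: bytes, b: bytes) -> bytes:
        """Decrypt both operands inside the enclave, return an encrypted sum."""
        total = decrypt_int(self._key, a) + decrypt_int(self._key, b)
        return encrypt_int(self._key, total)

    def less_than(self, a: bytes, b: bytes) -> bool:
        """Comparison result is revealed, as a WHERE filter would require."""
        return decrypt_int(self._key, a) < decrypt_int(self._key, b)


# The client encrypts; the server only forwards ciphertexts to the enclave.
key = secrets.token_bytes(32)
enclave = Enclave(key)
c1, c2 = encrypt_int(key, 40), encrypt_int(key, 2)
total_ct = enclave.add(c1, c2)
```

In this sketch, only a holder of `key` can turn `total_ct` back into a number; the server sees nothing but random-looking bytes and the boolean outcome of comparisons.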
Is there a master key to encrypt session keys?
There should be.
what about performance?
TPC-C overhead is under 50%. TPC-H overhead ranges from 5x to 20x the baseline;
there is room for improvement on TPC-H, and we are working on it.
Is it possible to use indexes on encrypted columns?
Yes. The extension allows client users to intentionally reveal the ordering
of encrypted columns for indexing purposes.
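As a rough illustration of indexing by revealed order, the Python sketch below sorts opaque ciphertexts through a trusted comparator and then answers a range query by binary search. How the extension actually reveals ordering is not described in this thread, so everything here is an assumption: `OrderOracle`, `build_index`, `range_scan`, and the toy cipher are hypothetical names and are not taken from HEDB.

```python
import functools
import hashlib
import hmac
import secrets


def _pad(key: bytes, nonce: bytes) -> int:
    """Toy 64-bit pad from HMAC-SHA256. NOT production cryptography."""
    return int.from_bytes(hmac.new(key, nonce, hashlib.sha256).digest()[:8], "big")


def encrypt(key: bytes, value: int) -> bytes:
    """Client side: encrypt an unsigned 64-bit integer under a fresh nonce."""
    nonce = secrets.token_bytes(16)
    return nonce + (value ^ _pad(key, nonce)).to_bytes(8, "big")


def decrypt(key: bytes, ct: bytes) -> int:
    return int.from_bytes(ct[16:], "big") ^ _pad(key, ct[:16])


class OrderOracle:
    """Hypothetical trusted comparator; reveals only the order of two values."""

    def __init__(self, key: bytes) -> None:
        self._key = key

    def compare(self, a: bytes, b: bytes) -> int:
        x, y = decrypt(self._key, a), decrypt(self._key, b)
        return (x > y) - (x < y)  # -1, 0, or 1


def build_index(ciphertexts, oracle):
    """Server side: sort ciphertexts without ever seeing plaintext."""
    return sorted(ciphertexts, key=functools.cmp_to_key(oracle.compare))


def _bound(index, probe, oracle, inclusive):
    """Binary search; with inclusive=True, skips entries equal to the probe."""
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        c = oracle.compare(index[mid], probe)
        if c < 0 or (inclusive and c == 0):
            lo = mid + 1
        else:
            hi = mid
    return lo


def range_scan(index, low_ct, high_ct, oracle):
    """Return ciphertexts whose plaintext lies in [low, high]."""
    start = _bound(index, low_ct, oracle, inclusive=False)
    stop = _bound(index, high_ct, oracle, inclusive=True)
    return index[start:stop]


key = secrets.token_bytes(32)
oracle = OrderOracle(key)
column = [encrypt(key, v) for v in [30, 5, 17, 42, 8]]
index = build_index(column, oracle)
hits = range_scan(index, encrypt(key, 8), encrypt(key, 30), oracle)
```

The trade-off is the one stated above: once the order is revealed, the server learns the relative ranking of the column's values, which is why this is something the client opts into per column.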
--
Best,
Mingyu
On Tue, Oct 10, 2023 at 16:18, Giampaolo Capelli <giampow@gmail.com> wrote:
Hello,
I think this is a very interesting topic, especially for European
companies where data sovereignty in the cloud has become critical.
If I understand correctly, the idea is to split users into 'client users'
who can see data unencrypted, and 'server users', who are administrators
unable to decrypt data.
A few questions:
- how are secrets managed? Do you use a sort of vault to keep encryption
keys? Is there a master key to encrypt session keys?
- what about performance? Is it possible to use indexes on encrypted
columns?
--
best regards
Giampaolo Capelli
Hello Peter,
/messages/by-id/89157929-c2b6-817b-6025-8e4b2d89d88f@enterprisedb.com
Thanks for referring me to your TCE project, nice work! It will take me some
time to go through the long discussion thread and the patch.
A quick question: what operations do pg_encrypted_* support? Are
(in)equality checks sufficient to fulfill real-world queries?
--
Best,
Mingyu
On Wed, Oct 11, 2023 at 14:43, Peter Eisentraut <peter@eisentraut.org> wrote:
FYI, see also
</messages/by-id/89157929-c2b6-817b-6025-8e4b2d89d88f@enterprisedb.com>
for a similar project.