[PoC] run SQL over ciphertext

Started by Mingyu Li over 2 years ago, 5 messages
#1 Mingyu Li
lmy2010lmy@gmail.com

Hi all,

We have developed an extension that allows PostgreSQL to run queries over
encrypted data. It does this through user-defined functions that add
encrypted data types and support commonly used expression operations. We
validated it with the TPC-C and TPC-H benchmarks. You can find the code
here: https://github.com/SJTU-IPADS/HEDB.

This PoC is a reimplementation fork, developed in collaboration with a cloud
database company; the aim is to let their DBAs manage databases without the
risk of data leaks, *meeting the requirements of laws such as GDPR.*

I am wondering whether anyone thinks this is a nice feature. If so, I am
curious about the steps needed to mature it further and potentially have it
incorporated into PostgreSQL contrib.

Best regards,
Mingyu Li

#2 Giampaolo Capelli
giampow@gmail.com
In reply to: Mingyu Li (#1)
Re: [PoC] run SQL over ciphertext

Hello,
I think this is a very interesting topic, especially for European companies
where data sovereignty in the cloud has become critical.

If I understand correctly, the idea is to split users into 'client users'
who can see data unencrypted, and 'server users', who are administrators
unable to decrypt data.

A few questions:
- How are secrets managed? Do you use some sort of vault to keep encryption
keys? Is there a master key to encrypt session keys?
- What about performance? Is it possible to use indexes on encrypted
columns?


--
best regards
Giampaolo Capelli

#3 Peter Eisentraut
peter@eisentraut.org
In reply to: Mingyu Li (#1)
Re: [PoC] run SQL over ciphertext

On 10.10.23 08:42, Mingyu Li wrote:

> We have developed an extension that allows PostgreSQL to run queries over
> encrypted data. It does this through user-defined functions that add
> encrypted data types and support commonly used expression operations. We
> validated it with the TPC-C and TPC-H benchmarks. You can find the code
> here: https://github.com/SJTU-IPADS/HEDB.
>
> This PoC is a reimplementation fork, developed in collaboration with a cloud
> database company; the aim is to let their DBAs manage databases without the
> risk of data leaks, /meeting the requirements of laws such as GDPR./
>
> I am wondering whether anyone thinks this is a nice feature. If so, I am
> curious about the steps needed to mature it further and potentially have it
> incorporated into PostgreSQL contrib.

FYI, see also
</messages/by-id/89157929-c2b6-817b-6025-8e4b2d89d88f@enterprisedb.com>
for a similar project.

#4 Mingyu Li
lmy2010lmy@gmail.com
In reply to: Giampaolo Capelli (#2)
Re: [PoC] run SQL over ciphertext

Hi,

> the idea is to split users into 'client users' who can see data
> unencrypted, and 'server users', who are administrators unable to decrypt
> data.

Exactly!

> How are secrets managed? Do you use some sort of vault to keep encryption
> keys?

Good question. The client holds the key and uses a proxy for transparent
encryption. The implementation also assumes that encryption keys are stored
securely in hardware-protected memory called "enclaves". Only client users
and server enclaves have access to the plaintext. Please take a look at page
5 of the slides: www.usenix.org/system/files/osdi23_slides_li_mingyu_v2.pdf.
Modern clouds such as OVH and Azure now offer hardware enclaves. If enclaves
are not available, a rich client-side proxy can be used instead, at the cost
of extra round trips.
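
The trust split described above can be illustrated with a toy sketch. This
is not HEDB's code: the cipher is a throwaway SHA-256 keystream with an HMAC
tag (reusing one key for both, which real code should not do), and the names
ToyEnclave, encrypt_int and decrypt_int are hypothetical. It only shows the
shape of the idea: the server stores opaque blobs, and only the key holders
(the client proxy and the enclave) ever see plaintext.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a toy keystream by hashing key, nonce and a counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt_int(key: bytes, value: int) -> bytes:
    """Client-proxy side: turn an integer into an opaque, authenticated blob."""
    nonce = os.urandom(16)
    plain = value.to_bytes(8, "big", signed=True)
    body = bytes(a ^ b for a, b in zip(plain, _keystream(key, nonce, 8)))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def decrypt_int(key: bytes, blob: bytes) -> int:
    """Only callable by whoever holds the key (client or enclave)."""
    nonce, body, tag = blob[:16], blob[16:24], blob[24:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext failed authentication")
    plain = bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, 8)))
    return int.from_bytes(plain, "big", signed=True)


class ToyEnclave:
    """Stand-in for the trusted component that evaluates operators."""

    def __init__(self, key: bytes) -> None:
        self._key = key  # never exposed to the untrusted server code

    def less_than(self, a: bytes, b: bytes) -> bool:
        # Decrypt inside the trusted boundary, return only the boolean.
        return decrypt_int(self._key, a) < decrypt_int(self._key, b)

    def add(self, a: bytes, b: bytes) -> bytes:
        # The result is re-encrypted before it leaves the trusted boundary.
        total = decrypt_int(self._key, a) + decrypt_int(self._key, b)
        return encrypt_int(self._key, total)


if __name__ == "__main__":
    key = os.urandom(32)
    enclave = ToyEnclave(key)
    x, y = encrypt_int(key, 5), encrypt_int(key, 7)
    print(enclave.less_than(x, y))
    print(decrypt_int(key, enclave.add(x, y)))
```

In this sketch an untrusted caller (a DBA session, say) can pass blobs to
less_than and add but cannot recover the integers without the key.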

> Is there a master key to encrypt session keys?

There should be.

> What about performance?

TPC-C overhead is below 50%. TPC-H overhead ranges from 5 to 20 times the
baseline; there is room for improvement on TPC-H and we are working on it.

> Is it possible to use indexes on encrypted columns?

Yes. The extension allows client users to intentionally reveal the ordering
of encrypted columns for indexing purposes.
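
To make the trade-off concrete, here is a minimal sketch of an ordered index
over opaque values, assuming the server may ask a trusted comparison oracle
"is a before b?". The names OrderOracle, build_index and lower_bound are
hypothetical, the oracle is a plain dictionary rather than real encryption,
and none of this is taken from the HEDB code. The point is that once ordering
is revealed, the server can sort and binary-search tokens without learning
the values themselves.

```python
import functools
import os


class OrderOracle:
    """Stand-in for the trusted side: maps opaque tokens to plaintext."""

    def __init__(self) -> None:
        self._plain = {}

    def seal(self, value: int) -> bytes:
        # Hand out a random token; the value stays inside the oracle.
        token = os.urandom(16)
        self._plain[token] = value
        return token

    def compare(self, a: bytes, b: bytes) -> int:
        # Reveals only the relative order: -1, 0 or 1.
        x, y = self._plain[a], self._plain[b]
        return (x > y) - (x < y)


def build_index(tokens, oracle):
    """Untrusted side: sort tokens using nothing but order comparisons."""
    return sorted(tokens, key=functools.cmp_to_key(oracle.compare))


def lower_bound(index, probe, oracle):
    """Binary search: position of the first token not ordered before probe."""
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle.compare(index[mid], probe) < 0:
            lo = mid + 1
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    oracle = OrderOracle()
    tokens = [oracle.seal(v) for v in (30, 10, 20)]
    index = build_index(tokens, oracle)
    # A range scan for "value >= 15" starts at this position in the index.
    print(lower_bound(index, oracle.seal(15), oracle))
```

The sorted index is exactly what leaks: anyone who can read it learns the
relative order of the rows, which is why the reveal is described as an
intentional choice by the client user.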

--
Best,
Mingyu

On Tue, Oct 10, 2023 at 16:18, Giampaolo Capelli <giampow@gmail.com> wrote:

> Hello,
> I think this is a very interesting topic, especially for European
> companies where data sovereignty in the cloud has become critical.
>
> If I understand correctly, the idea is to split users into 'client users'
> who can see data unencrypted, and 'server users', who are administrators
> unable to decrypt data.
>
> A few questions:
> - How are secrets managed? Do you use some sort of vault to keep encryption
> keys? Is there a master key to encrypt session keys?
> - What about performance? Is it possible to use indexes on encrypted
> columns?

#5 Mingyu Li
lmy2010lmy@gmail.com
In reply to: Peter Eisentraut (#3)
Re: [PoC] run SQL over ciphertext

Hello Peter,

> /messages/by-id/89157929-c2b6-817b-6025-8e4b2d89d88f@enterprisedb.com

Thanks for referring me to your TCE project, nice work! It will take me some
time to go through the long discussion thread and the patch.

A quick question: what operations do pg_encrypted_* support? Are
(in)equality checks sufficient to fulfill real-world queries?

--
Best,
Mingyu

On Wed, Oct 11, 2023 at 14:43, Peter Eisentraut <peter@eisentraut.org> wrote:

> FYI, see also
> </messages/by-id/89157929-c2b6-817b-6025-8e4b2d89d88f@enterprisedb.com>
> for a similar project.