Why the index is not used ?
Hi
I would like to submit the following problem to the PostgreSQL community. In my company, we have data encryption needs.
So I decided to use the following procedure :
(1) Creating a table with a bytea type column to store the encrypted data
CREATE TABLE cartedecredit(card_id SERIAL PRIMARY KEY, username VARCHAR(100), cc bytea);
(2) inserting encrypted data
INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id, pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, cipher-algo=aes256') FROM generate_series(1,100000) AS x(id);
(3) Querying the table
SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32';
pgp_sym_decrypt
-----------------
test value 32
(1 row)
Time: 115735.035 ms (01:55.735)
-> the execution time is very long. So, I decide to create an index
(4) Creating an index on encrypted data
CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(cc);
(5) Querying the table again
SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32';
pgp_sym_decrypt
-----------------
test value 32
(1 row)
Time: 118558.485 ms (01:58.558) -> almost 2 minutes !!
postgres=# explain analyze SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on cartedecredit (cost=0.00..3647.25 rows=500 width=32) (actual time=60711.787..102920.509 rows=1 loops=1)
Filter: (pgp_sym_decrypt(cc, 'motdepasse'::text) = 'test value 32'::text)
Rows Removed by Filter: 99999
Planning time: 0.112 ms
Execution time: 102920.585 ms
(5 rows)
? the index is not used in the execution plan. maybe because of the use of a function in the WHERE clause. I decide to modify the SQL query
(6) Querying the table
SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE cc=pgp_sym_encrypt('test value 32', 'motdepasse');
pgp_sym_decrypt
-----------------
(0 rows)
Time: 52659.571 ms (00:52.660)
? The execution time is very long and I get no result (!?)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on cartedecredit (cost=0.00..3646.00 rows=1 width=32) (actual time=61219.989..61219.989 rows=0 loops=1)
Filter: (cc = pgp_sym_encrypt('test value 32'::text, 'motdepasse'::text))
Rows Removed by Filter: 100000
Planning time: 0.157 ms
Execution time: 61220.035 ms
(5 rows)
? My index is not used.
QUESTIONS :
- why I get no result ?
- why the index is not used?
Thanks in advance
Best Regards
Didier
[cid:image002.png@01D14E0E.8515EB90]
Didier ROS
Expertise SGBD
DS IT/IT DMA/Solutions Groupe EDF/Expertise Applicative - SGBD
Attachments:
so 6. 10. 2018 v 11:57 odesílatel ROS Didier <didier.ros@edf.fr> napsal:
Hi
I would like to submit the following problem to the PostgreSQL community.
In my company, we have data encryption needs.
So I decided to use the following procedure :(1) Creating a table with a bytea type column to store the encrypted
data
CREATE TABLE cartedecredit(card_id SERIAL PRIMARY KEY, username
VARCHAR(100), cc bytea);(2) inserting encrypted data
INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id,
pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2,
cipher-algo=aes256') FROM generate_series(1,100000) AS x(id);(3) Querying the table
SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE
pgp_sym_decrypt(cc, 'motdepasse')='test value 32';pgp_sym_decrypt
-----------------
test value 32
(1 row)
Time: 115735.035 ms (01:55.735)
-> the execution time is very long. So, I decide to create an index(4) Creating an index on encrypted data
CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(cc);
this index cannot to help.
but functional index can cartedecredit(pgp_sym_decrypt(cc, 'motdepasse').
Unfortunately index file will be decrypted in this case.
CREATE INDEX ON
(5) Querying the table again
SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE
pgp_sym_decrypt(cc, 'motdepasse')='test value 32';
pgp_sym_decrypt-----------------
test value 32
(1 row)
Time: 118558.485 ms (01:58.558) -> almost 2 minutes !!
postgres=# explain analyze SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM
cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32';QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on cartedecredit (cost=0.00..3647.25 rows=500 width=32) (actual
time=60711.787..102920.509 rows=1 loops=1)Filter: (pgp_sym_decrypt(cc, 'motdepasse'::text) = 'test value
32'::text)Rows Removed by Filter: 99999
Planning time: 0.112 ms
Execution time: 102920.585 ms
(5 rows)
è the index is not used in the execution plan. maybe because of the use
of a function in the WHERE clause. I decide to modify the SQL query(6) Querying the table
SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE *cc*=pgp_sym_encrypt('test
value 32', 'motdepasse');
it is strange - this should to use index, when there is usual index over cc
column.
What is result of explain analyze when you penalize seq scan by
set enable_seqscan to off
Show quoted text
pgp_sym_decrypt
-----------------
(0 rows)
Time: 52659.571 ms (00:52.660)
è The execution time is very long and I get no result (!?)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on cartedecredit (cost=0.00..3646.00 rows=1 width=32) (actual
time=61219.989..61219.989 rows=0 loops=1)Filter: (cc = pgp_sym_encrypt('test value 32'::text,
'motdepasse'::text))Rows Removed by Filter: 100000
Planning time: 0.157 ms
Execution time: 61220.035 ms
(5 rows)
è My index is not used.
QUESTIONS :
- why I get no result ?- why the index is not used?
Thanks in advance
Best Regards
Didier[image: cid:image002.png@01D14E0E.8515EB90]
* Didier ROS*
* Expertise SGBD**DS IT/IT DMA/Solutions Groupe EDF/Expertise Applicative - SGBD *
Ce message et toutes les pièces jointes (ci-après le 'Message') sont
établis à l'intention exclusive des destinataires et les informations qui y
figurent sont strictement confidentielles. Toute utilisation de ce Message
non conforme à sa destination, toute diffusion ou toute publication totale
ou partielle, est interdite sauf autorisation expresse.Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de
le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou
partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de
votre système, ainsi que toutes ses copies, et de n'en garder aucune trace
sur quelque support que ce soit. Nous vous remercions également d'en
avertir immédiatement l'expéditeur par retour du message.Il est impossible de garantir que les communications par messagerie
électronique arrivent en temps utile, sont sécurisées ou dénuées de toute
erreur ou virus.
____________________________________________________This message and any attachments (the 'Message') are intended solely for
the addressees. The information contained in this Message is confidential.
Any use of information contained in this Message not in accord with its
purpose, any dissemination or disclosure, either whole or partial, is
prohibited except formal approval.If you are not the addressee, you may not copy, forward, disclose or use
any part of it. If you have received this message in error, please delete
it and all copies from your system and notify the sender immediately by
return message.E-mail communication cannot be guaranteed to be timely secure, error or
virus-free.
Hello Didier,
(3), (5) to find the match, you decrypt the whole table, apparently this
take quite a long time.
Index cannot help here because indexes work on exact match of type and
value, but you compare mapped value, not indexed. Functional index should
help, but like it was said, it against the idea of encrypted storage.
(6) I never used pgp_sym_encrypt() but I see that in INSERT INTO you
supplied additional parameter 'compress-algo=2, cipher-algo=aes256' while
in (6) you did not. Probably this is the reason.
In general matching indexed bytea column should use index, you can ensure
in this populating the column unencrypted and using 'test value 32'::bytea
for match.
In you case I believe pgp_sym_encrypt() is not marked as STABLE or
IMMUTABLE that's why it will be evaluated for each row (very inefficient)
and cannot use index. From documentation:
"Since an index scan will evaluate the comparison value only once, not once
at each row, it is not valid to use a VOLATILE function in an index scan
condition."
https://www.postgresql.org/docs/10/static/xfunc-volatility.html
If you cannot add STABLE/IMMUTABLE to pgp_sym_encrypt() (which apparently
should be there), you can encrypt searched value as a separate operation
and then search in the table using basic value match.
Vlad
Attachments:
I haven’t looked up what pgp_sym_encrypt() does but assuming it does encryption the way you should be for credit card data then it will be using a random salt and the same input value won’t encrypt to the same output value so
====
WHERE cc=pgp_sym_encrypt('test value 32', 'motdepasse');
====
wouldn’t work because the value generated by the function when you are searching on isn’t the same value as when you stored it.
Paul
Show quoted text
On 6 Oct 2018, at 19:57, ROS Didier <didier.ros@edf.fr> wrote:
WHERE cc=pgp_sym_encrypt('test value 32', 'motdepasse');
Hi Pavel
Thanks you for your answer. here is a procedure that works :
- CREATE TABLE cartedecredit(card_id SERIAL PRIMARY KEY, username VARCHAR(100), cc bytea);
- INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id, pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, cipher-algo=aes256') FROM generate_series(1,100000) AS x(id);
- CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2, cipher-algo=aes256'));
- SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2, cipher-algo=aes256')='test value 32';
pgp_sym_decrypt
-----------------
test value 32
(1 row)
Time: 2.237 ms
- explain analyze SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2, cipher-algo=aes256')='test value 32';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_cartedecredit_cc02 on cartedecredit (cost=0.42..8.44 rows=1 width=32) (actual time=1.545..1.546 rows=1 loops=1)
Index Cond: (pgp_sym_decrypt(cc, 'motdepasse'::text, 'compress-algo=2, cipher-algo=aes256'::text) = 'test value 32'::text)
Planning time: 0.330 ms
Execution time: 1.580 ms
(4 rows)
OK that works great.
Thank you for the recommendation
Best Regards
[cid:image002.png@01D14E0E.8515EB90]
Didier ROS
Expertise SGBD
DS IT/IT DMA/Solutions Groupe EDF/Expertise Applicative - SGBD
Nanterre Picasso - E2 565D (aile nord-est)
32 Avenue Pablo Picasso
92000 Nanterre
didier.ros@edf.fr<mailto:didier.ros@edf.fr>
support-postgres-niveau3@edf.fr<mailto:support-postgres-niveau3@edf.fr>
support-oracle-niveau3@edf.fr<mailto:support-oracle-niveau3@edf.fr>
Tél. : 01 78 66 61 14
Tél. mobile : 06 49 51 11 88
Lync : ros.didier@edf.fr<sip:ros.didier@edf.fr>
De : pavel.stehule@gmail.com [mailto:pavel.stehule@gmail.com]
Envoyé : samedi 6 octobre 2018 12:14
À : ROS Didier <didier.ros@edf.fr>
Cc : pgsql-sql@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Objet : Re: Why the index is not used ?
so 6. 10. 2018 v 11:57 odesílatel ROS Didier <didier.ros@edf.fr<mailto:didier.ros@edf.fr>> napsal:
Hi
I would like to submit the following problem to the PostgreSQL community. In my company, we have data encryption needs.
So I decided to use the following procedure :
(1) Creating a table with a bytea type column to store the encrypted data
CREATE TABLE cartedecredit(card_id SERIAL PRIMARY KEY, username VARCHAR(100), cc bytea);
(2) inserting encrypted data
INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id<http://x.id>, pgp_sym_encrypt('test value ' || x.id<http://x.id>, 'motdepasse','compress-algo=2, cipher-algo=aes256') FROM generate_series(1,100000) AS x(id);
(3) Querying the table
SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32';
pgp_sym_decrypt
-----------------
test value 32
(1 row)
Time: 115735.035 ms (01:55.735)
-> the execution time is very long. So, I decide to create an index
(4) Creating an index on encrypted data
CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(cc);
this index cannot to help.
but functional index can cartedecredit(pgp_sym_decrypt(cc, 'motdepasse'). Unfortunately index file will be decrypted in this case.
CREATE INDEX ON
(5) Querying the table again
SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32';
pgp_sym_decrypt
-----------------
test value 32
(1 row)
Time: 118558.485 ms (01:58.558) -> almost 2 minutes !!
postgres=# explain analyze SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on cartedecredit (cost=0.00..3647.25 rows=500 width=32) (actual time=60711.787..102920.509 rows=1 loops=1)
Filter: (pgp_sym_decrypt(cc, 'motdepasse'::text) = 'test value 32'::text)
Rows Removed by Filter: 99999
Planning time: 0.112 ms
Execution time: 102920.585 ms
(5 rows)
==> the index is not used in the execution plan. maybe because of the use of a function in the WHERE clause. I decide to modify the SQL query
(6) Querying the table
SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE cc=pgp_sym_encrypt('test value 32', 'motdepasse');
it is strange - this should to use index, when there is usual index over cc column.
What is result of explain analyze when you penalize seq scan by
set enable_seqscan to off
pgp_sym_decrypt
-----------------
(0 rows)
Time: 52659.571 ms (00:52.660)
==> The execution time is very long and I get no result (!?)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on cartedecredit (cost=0.00..3646.00 rows=1 width=32) (actual time=61219.989..61219.989 rows=0 loops=1)
Filter: (cc = pgp_sym_encrypt('test value 32'::text, 'motdepasse'::text))
Rows Removed by Filter: 100000
Planning time: 0.157 ms
Execution time: 61220.035 ms
(5 rows)
==> My index is not used.
QUESTIONS :
- why I get no result ?
- why the index is not used?
Thanks in advance
Best Regards
Didier
[cid:image002.png@01D14E0E.8515EB90]
Didier ROS
Expertise SGBD
DS IT/IT DMA/Solutions Groupe EDF/Expertise Applicative - SGBD
Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.
E-mail communication cannot be guaranteed to be timely secure, error or virus-free.
Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.
E-mail communication cannot be guaranteed to be timely secure, error or virus-free.
Attachments:
Hi Paul
Thanks for the explanation. I think you are right.
I understand why the WHERE clause “cc=pgp_sym_encrypt('test value 32', 'motdepasse');” does not bring anything back.
Best Regards
Didier ROS
De : paul@paulmcgarry.com [mailto:paul@paulmcgarry.com]
Envoyé : dimanche 7 octobre 2018 04:21
À : ROS Didier <didier.ros@edf.fr>
Cc : pgsql-sql@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Objet : Re: Why the index is not used ?
I haven’t looked up what pgp_sym_encrypt() does but assuming it does encryption the way you should be for credit card data then it will be using a random salt and the same input value won’t encrypt to the same output value so
====
WHERE cc=pgp_sym_encrypt('test value 32', 'motdepasse');
====
wouldn’t work because the value generated by the function when you are searching on isn’t the same value as when you stored it.
Paul
On 6 Oct 2018, at 19:57, ROS Didier <didier.ros@edf.fr<mailto:didier.ros@edf.fr>> wrote:
WHERE cc=pgp_sym_encrypt('test value 32', 'motdepasse');
Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.
E-mail communication cannot be guaranteed to be timely secure, error or virus-free.
ROS:
On Sun, Oct 7, 2018 at 3:13 PM, ROS Didier <didier.ros@edf.fr> wrote:
....
- INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id, pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, cipher-algo=aes256') FROM generate_series(1,100000) AS x(id);
- CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2, cipher-algo=aes256'));
If my french is not too rusty you are encrypting a credit-card, and
then storing an UNENCRYPTED copy in the index. So, getting it from the
server is trivial for anyone with filesystem access.
Francisco Olarte.
Hi Francisco
Thank you for your remark.
You're right, but it's the only procedure I found to make search on encrypted fields with good response times (using index) !
Regarding access to the file system, our servers are in protected network areas. few people can connect to it.
it's not the best solution, but we have data encryption needs and good performance needs too. I do not know how to do it except the specified procedure..
if anyone has any proposals to put this in place, I'm interested.
Thanks in advance
Best Regards
Didier ROS
-----Message d'origine-----
De : folarte@peoplecall.com [mailto:folarte@peoplecall.com]
Envoyé : dimanche 7 octobre 2018 17:58
À : ROS Didier <didier.ros@edf.fr>
Cc : pavel.stehule@gmail.com; pgsql-sql@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Objet : Re: Why the index is not used ?
ROS:
On Sun, Oct 7, 2018 at 3:13 PM, ROS Didier <didier.ros@edf.fr> wrote:
....
- INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id, pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, cipher-algo=aes256') FROM generate_series(1,100000) AS x(id);
- CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2, cipher-algo=aes256'));
If my french is not too rusty you are encrypting a credit-card, and then storing an UNENCRYPTED copy in the index. So, getting it from the server is trivial for anyone with filesystem access.
Francisco Olarte.
Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.
E-mail communication cannot be guaranteed to be timely secure, error or virus-free.
You can consider outside DB encryption which is less of worry for performance and data at rest will be encrypted.
Regards,
Virendra
-----Original Message-----
From: ROS Didier [mailto:didier.ros@edf.fr]
Sent: Sunday, October 07, 2018 2:33 PM
To: folarte@peoplecall.com
Cc: pavel.stehule@gmail.com; pgsql-sql@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Subject: RE: Why the index is not used ?
Hi Francisco
Thank you for your remark.
You're right, but it's the only procedure I found to make search on encrypted fields with good response times (using index) !
Regarding access to the file system, our servers are in protected network areas. few people can connect to it.
it's not the best solution, but we have data encryption needs and good performance needs too. I do not know how to do it except the specified procedure..
if anyone has any proposals to put this in place, I'm interested.
Thanks in advance
Best Regards
Didier ROS
-----Message d'origine-----
De : folarte@peoplecall.com [mailto:folarte@peoplecall.com]
Envoyé : dimanche 7 octobre 2018 17:58
À : ROS Didier <didier.ros@edf.fr>
Cc : pavel.stehule@gmail.com; pgsql-sql@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Objet : Re: Why the index is not used ?
ROS:
On Sun, Oct 7, 2018 at 3:13 PM, ROS Didier <didier.ros@edf.fr> wrote:
....
- INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id, pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, cipher-algo=aes256') FROM generate_series(1,100000) AS x(id);
- CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2, cipher-algo=aes256'));
If my french is not too rusty you are encrypting a credit-card, and then storing an UNENCRYPTED copy in the index. So, getting it from the server is trivial for anyone with filesystem access.
Francisco Olarte.
Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.
E-mail communication cannot be guaranteed to be timely secure, error or virus-free.
________________________________
This message is intended only for the use of the addressee and may contain
information that is PRIVILEGED AND CONFIDENTIAL.
If you are not the intended recipient, you are hereby notified that any
dissemination of this communication is strictly prohibited. If you have
received this communication in error, please erase all copies of the message
and its attachments and notify the sender immediately. Thank you.
Didier,
you was given a few things to check in another my message on the same day.
You have not provided any feedback.
It is up to you how to implement your system, but you can with no doubt
consider your database as not encrypted with your approach. You (or
probably your management) have no understanding from which risks you
protect your data.
Regards,
Vlad
вс, 7 окт. 2018 г. в 11:33, ROS Didier <didier.ros@edf.fr>:
Show quoted text
Hi Francisco
Thank you for your remark.
You're right, but it's the only procedure I found to make search
on encrypted fields with good response times (using index) !Regarding access to the file system, our servers are in protected
network areas. few people can connect to it.it's not the best solution, but we have data encryption needs and
good performance needs too. I do not know how to do it except the specified
procedure..
if anyone has any proposals to put this in place, I'm interested.Thanks in advance
Best Regards
Didier ROS-----Message d'origine-----
De : folarte@peoplecall.com [mailto:folarte@peoplecall.com]
Envoyé : dimanche 7 octobre 2018 17:58
À : ROS Didier <didier.ros@edf.fr>
Cc : pavel.stehule@gmail.com; pgsql-sql@lists.postgresql.org;
pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Objet : Re: Why the index is not used ?ROS:
On Sun, Oct 7, 2018 at 3:13 PM, ROS Didier <didier.ros@edf.fr> wrote:
....- INSERT INTO cartedecredit(username,cc) SELECT 'individu ' ||
x.id, pgp_sym_encrypt('test value ' || x.id,
'motdepasse','compress-algo=2, cipher-algo=aes256') FROM
generate_series(1,100000) AS x(id);- CREATE INDEX idx_cartedecredit_cc02 ON
cartedecredit(pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2,
cipher-algo=aes256'));If my french is not too rusty you are encrypting a credit-card, and then
storing an UNENCRYPTED copy in the index. So, getting it from the server is
trivial for anyone with filesystem access.Francisco Olarte.
Ce message et toutes les pièces jointes (ci-après le 'Message') sont
établis à l'intention exclusive des destinataires et les informations qui y
figurent sont strictement confidentielles. Toute utilisation de ce Message
non conforme à sa destination, toute diffusion ou toute publication totale
ou partielle, est interdite sauf autorisation expresse.Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de
le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou
partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de
votre système, ainsi que toutes ses copies, et de n'en garder aucune trace
sur quelque support que ce soit. Nous vous remercions également d'en
avertir immédiatement l'expéditeur par retour du message.Il est impossible de garantir que les communications par messagerie
électronique arrivent en temps utile, sont sécurisées ou dénuées de toute
erreur ou virus.
____________________________________________________This message and any attachments (the 'Message') are intended solely for
the addressees. The information contained in this Message is confidential.
Any use of information contained in this Message not in accord with its
purpose, any dissemination or disclosure, either whole or partial, is
prohibited except formal approval.If you are not the addressee, you may not copy, forward, disclose or use
any part of it. If you have received this message in error, please delete
it and all copies from your system and notify the sender immediately by
return message.E-mail communication cannot be guaranteed to be timely secure, error or
virus-free.
Hello Didier,
Your email is didier.ros@edf.fr. Are you working at Electricite de France,
and storing actual customers' credit card details? How many millions of
them?
Note that this mailing list is public; people looking for targets with poor
security from which they can harvest credit card numbers might be
reading it.
And after you are hacked and all your customers' credit card details
are made
public, someone will find this thread.
it's not the best solution, but we have data encryption needs and good
performance needs too. I do not know how to do it except the specified
procedure..
You should probably employ someone who knows what they are doing.
Sorry for being so direct, but really... storing large quantities of credit
card details is the text book example of something that has to be done
correctly.
if anyone has any proposals to put this in place, I'm interested.
Why do you need to search by credit card number?
If you really really need to do that, then one option is to use an
encryption function that doesn't salt the data. Or you could store part
of the number (last 4 digits?), or an unsalted hash of the number,
unencrypted and indexed, and then you need only to sequentially decrypt
(using the salted encryption) e.g. 1/10000 of the card numbers. But there
are complex security issues and tradeoffs involved here. You probably
need to comply with regulations (e.g. "PCI standards") which will specify
what is allowed and what isn't. And if you didn't already know that,
you shouldn't be doing this.
Good luck, I suppose.
Phil.
P.S. It seems that you were asking about this a year ago, and got the
same answers...
Import Notes
Resolved by subject fallback
Additionally it is not clear why you want to search in table on encrypted
data. Usually you match user with it's unpersonalized data (such as login,
user ID) and then decrypt personalized data. If you need to store user
identifying data encrypted as well (e.g. bank account number) you can use a
deterministic algorithm for it (without salt) because it is guaranteed to
be unique and you don't need to have different encrypted data for two same
input strings.
Vlad
Hi,
On 10/07/2018 08:32 PM, ROS Didier wrote:
Hi Francisco
Thank you for your remark.
You're right, but it's the only procedure I found to make search on
encrypted fields with good response times (using index) !
Unfortunately, that kinda invalidates the whole purpose of in-database
encryption - you'll have encrypted on-disk data in one place, and then
plaintext right next to it. If you're dealing with credit card numbers,
then you presumably care about PCI DSS, and this is likely a direct
violation of that.
Regarding access to the file system, our servers are in protected
network areas. few people can connect to it.
Then why do you need encryption at all? If you assume access to the
filesystem / storage is protected, why do you bother with encryption?
What is your threat model?
it's not the best solution, but we have data encryption needs and
good performance needs too. I do not know how to do it except the
specified procedure..if anyone has any proposals to put this in place, I'm interested.
One thing you could do is hashing the value and then searching by the
hash. So aside from having the encrypted column you'll also have a short
hash, and you may use it in the query *together* with the original
condition. It does not need to be unique (in fact it should not be to
make it impossible to reverse the hash), but it needs to have enough
distinct values to make the index efficient. Say, 10k values should be
enough, because that means 0.01% selectivity.
So the function might look like this, for example:
CREATE FUNCTION cchash(text) RETURNS int AS $$
SELECT abs(hashtext($1)) % 10000;
$$ LANGUAGE sql;
and then be used like this:
CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(cchash(cc));
and in the query
SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit
WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32'
AND cchash(cc) = cchash('test value 32');
Obviously, this does not really solve the issues with having to pass the
password to the query, making it visible in pg_stat_activity, various
logs etc.
Which is why people generally use FDE for the whole disk, which is
transparent and provides the same level of protection.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Didier,
I’m sorry to tell you that you are probably doing something (ie handling/storing credit cards) which would mean you have to comply with PCI DSS requirements.
As such you should probably have a QSA (auditor) who you can run any proposed solution by (so you know they will be comfortable with it when they do their audit).
I think your current solution would be frowned upon because:
- cards are effectively stored in plaintext in the index.
- your encryption/decryption is being done in database, rather than by something with that as its sole role.
People have already mentioned the former so I won’t go into it further
But for the second part if someone can do a
Select pgp_sym_decrypt(cc)
then you are one sql injection away from having your card data stolen. You do have encryption, but in practice everything is available unencrypted so in practice the encryption is more of a tick in a box than an actual defence against bad things happening. In a properly segmented system even your DBA should not be able to access decrypted card data.
You probably should look into doing something like:
- store the first 6 and last 4 digits of the card unencrypted.
- store the remaining card digits encrypted
- have the encryption/decryption done by a seperate service called by your application code outside the db.
You haven’t gone into what your requirements re search are (or I missed them) but while the above won’t give you a fast exact cc lookup in practice being able to search using the first 6 and last 4 can get you a small enough subset than can then be filtered after decrypting the middle.
We are straying a little off PostgreSQL topic here but if you and/or your management aren’t already looking at PCI DSS compliance I’d strongly recommend you do so. It can seem like a pain but it is much better to take that pain up front rather than having to reengineer everything later. There are important security aspects it helps make sure you cover but maybe some business aspects (ie possible partners who won’t be able to deal with you without your compliance sign off documentation).
The alternative, if storing cc data isn’t a core requirement, is to not store the credit card data at all. That is generally the best solution if it meets your needs, ie if you just want to accept payments then use a third party who is PCI compliant to handle the cc part.
I hope that helps a little.
Paul
Sent from my iPhone
Show quoted text
On 8 Oct 2018, at 05:32, ROS Didier <didier.ros@edf.fr> wrote:
Hi Francisco
Thank you for your remark.
You're right, but it's the only procedure I found to make search on encrypted fields with good response times (using index) !Regarding access to the file system, our servers are in protected network areas. few people can connect to it.
it's not the best solution, but we have data encryption needs and good performance needs too. I do not know how to do it except the specified procedure..
if anyone has any proposals to put this in place, I'm interested.Thanks in advance
Best Regards
Didier ROS-----Message d'origine-----
De : folarte@peoplecall.com [mailto:folarte@peoplecall.com]
Envoyé : dimanche 7 octobre 2018 17:58
À : ROS Didier <didier.ros@edf.fr>
Cc : pavel.stehule@gmail.com; pgsql-sql@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Objet : Re: Why the index is not used ?ROS:
On Sun, Oct 7, 2018 at 3:13 PM, ROS Didier <didier.ros@edf.fr> wrote:
....
- INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id, pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, cipher-algo=aes256') FROM generate_series(1,100000) AS x(id);
- CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2, cipher-algo=aes256'));If my french is not too rusty you are encrypting a credit-card, and then storing an UNENCRYPTED copy in the index. So, getting it from the server is trivial for anyone with filesystem access.
Francisco Olarte.
Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.
E-mail communication cannot be guaranteed to be timely secure, error or virus-free.
Hi Vlad
Your remark is very interesting. You want to say that it's better to run SQL queries on unpersonalized data, and then retrieve the encrypted data for those records.
OK, I take this recommendation into account and I will forward it to my company's projects.
Nevertheless, you say that it is possible, in spite of everything, to use indexes on the encrypted data by using deterministic algorithms.
Can you tell me some examples of these algorithms?
Thanks in advance
Best Regards
[cid:image002.png@01D14E0E.8515EB90]
Didier ROS
Expertise SGBD
De : greatvovan@gmail.com [mailto:greatvovan@gmail.com]
Envoyé : dimanche 7 octobre 2018 21:33
À : ROS Didier <didier.ros@edf.fr>
Cc : folarte@peoplecall.com; pavel.stehule@gmail.com; pgsql-sql@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Objet : Re: Why the index is not used ?
Additionally it is not clear why you want to search in table on encrypted data. Usually you match user with it's unpersonalized data (such as login, user ID) and then decrypt personalized data. If you need to store user identifying data encrypted as well (e.g. bank account number) you can use a deterministic algorithm for it (without salt) because it is guaranteed to be unique and you don't need to have different encrypted data for two same input strings.
Vlad
Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.
E-mail communication cannot be guaranteed to be timely secure, error or virus-free.
Attachments:
Hi Virendra
You think that outside encryption of the database is the best solution ?
How do you manage the encryption key ?
Can you give me some examples of this kind of solution.
Best Regards
Didier ROS
-----Message d'origine-----
De : Virendra.Kumar@guycarp.com [mailto:Virendra.Kumar@guycarp.com]
Envoyé : dimanche 7 octobre 2018 20:41
À : ROS Didier <didier.ros@edf.fr>; folarte@peoplecall.com
Cc : pavel.stehule@gmail.com; pgsql-sql@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Objet : RE: Why the index is not used ?
You can consider outside DB encryption which is less of worry for performance and data at rest will be encrypted.
Regards,
Virendra
-----Original Message-----
From: ROS Didier [mailto:didier.ros@edf.fr]
Sent: Sunday, October 07, 2018 2:33 PM
To: folarte@peoplecall.com
Cc: pavel.stehule@gmail.com; pgsql-sql@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Subject: RE: Why the index is not used ?
Hi Francisco
Thank you for your remark.
You're right, but it's the only procedure I found to make search on encrypted fields with good response times (using index) !
Regarding access to the file system, our servers are in protected network areas. few people can connect to it.
it's not the best solution, but we have data encryption needs and good performance needs too. I do not know how to do it except the specified procedure..
if anyone has any proposals to put this in place, I'm interested.
Thanks in advance
Best Regards
Didier ROS
-----Message d'origine-----
De : folarte@peoplecall.com [mailto:folarte@peoplecall.com] Envoyé : dimanche 7 octobre 2018 17:58 À : ROS Didier <didier.ros@edf.fr> Cc : pavel.stehule@gmail.com; pgsql-sql@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Objet : Re: Why the index is not used ?
ROS:
On Sun, Oct 7, 2018 at 3:13 PM, ROS Didier <didier.ros@edf.fr> wrote:
....
- INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id, pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, cipher-algo=aes256') FROM generate_series(1,100000) AS x(id);
- CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2, cipher-algo=aes256'));
If my french is not too rusty you are encrypting a credit-card, and then storing an UNENCRYPTED copy in the index. So, getting it from the server is trivial for anyone with filesystem access.
Francisco Olarte.
Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.
E-mail communication cannot be guaranteed to be timely secure, error or virus-free.
________________________________
This message is intended only for the use of the addressee and may contain information that is PRIVILEGED AND CONFIDENTIAL.
If you are not the intended recipient, you are hereby notified that any dissemination of this communication is strictly prohibited. If you have received this communication in error, please erase all copies of the message and its attachments and notify the sender immediately. Thank you.
Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.
E-mail communication cannot be guaranteed to be timely secure, error or virus-free.
Dear Didier,
Le lundi 08 octobre 2018 ᅵ 08:32 +0000, ROS Didier a ᅵcrit :
Hi Virendra
You think that outside encryption of the database is the best solution ?
How do you manage the encryption key ?
Can you give me some examples of this kind of solution.
Best Regards
Didier ROS
If I understand your need well, you need to store credit card information into your database.
This is ruled by the Payment Card Industry Data Security Standard (aka PCI DSS).
I attend some years ago a real good presentation from Denish Patel, a well known community member.
I saw this talk in pgconf 2015 in Moscow: https://pgconf.ru/media2015c/patel.pdf
I recommend you read it, if you had not already? It shows code examples, etc.
My 2 cents...
--
Jean-Paul Argudo
Hi Vlad
Sorry for this delay, but apparently the subject is of interest to many people in the community. I received a lot of comments and answers.
I wrote my answers in the body of your message below
Best Regards
Didier
De : greatvovan@gmail.com [mailto:greatvovan@gmail.com]
Envoyé : samedi 6 octobre 2018 18:51
À : ROS Didier <didier.ros@edf.fr>
Cc : pgsql-sql@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Objet : Re: Why the index is not used ?
Hello Didier,
(3), (5) to find the match, you decrypt the whole table, apparently this take quite a long time.
Index cannot help here because indexes work on exact match of type and value, but you compare mapped value, not indexed. Functional index should help, but like it was said, it against the idea of encrypted storage.
<<
I tested the solution of the functional index. It works very well, but the data is no longer encrypted. This is not the right solution
(6) I never used pgp_sym_encrypt() but I see that in INSERT INTO you supplied additional parameter 'compress-algo=2, cipher-algo=aes256' while in (6) you did not. Probably this is the reason.
In general matching indexed bytea column should use index, you can ensure in this populating the column unencrypted and using 'test value 32'::bytea for match.
In you case I believe pgp_sym_encrypt() is not marked as STABLE or IMMUTABLE that's why it will be evaluated for each row (very inefficient) and cannot use index. From documentation:
"Since an index scan will evaluate the comparison value only once, not once at each row, it is not valid to use a VOLATILE function in an index scan condition."
https://www.postgresql.org/docs/10/static/xfunc-volatility.html
If you cannot add STABLE/IMMUTABLE to pgp_sym_encrypt() (which apparently should be there), you can encrypt searched value as a separate operation and then search in the table using basic value match.
you're right about the missing parameter 'compress-algo=2, cipher-algo=aes256'. I agree with you.
(1) I have tested your proposal :
DROP TABLE cartedecredit;
CREATE TABLE cartedecredit(card_id SERIAL PRIMARY KEY, username VARCHAR(100), cc bytea);
INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id, decode('test value ' || x.id,'escape') FROM generate_series(1,100000) AS x(id);
è I inserted unencrypted data into the bytea column
postgres=# select * from cartedecredit limit 5 ;
card_id | username | cc
---------+-------------+------------------------------
1 | individu 1 | \x746573742076616c75652031
2 | individu 2 | \x746573742076616c75652032
3 | individu 3 | \x746573742076616c75652033
4 | individu 4 | \x746573742076616c75652034
5 | individu 5 | \x746573742076616c75652035
CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(cc);
SELECT encode(cc,'escape') FROM cartedecredit WHERE cc=decode('test value 32','escape');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using idx_cartedecredit_cc02 on cartedecredit (cost=0.42..8.44 rows=1 width=32) (actual time=0.033..0.034 rows=1 loops=1)
Index Cond: (cc = '\x746573742076616c7565203332'::bytea)
Heap Fetches: 1
Planning time: 0.130 ms
Execution time: 0.059 ms
(5 rows)
è It works but the data is not encrypted. everyone can have access to the data
(2) 2nd test :
DROP TABLE cartedecredit;
CREATE TABLE cartedecredit(card_id SERIAL PRIMARY KEY, username VARCHAR(100), cc bytea);
INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id, pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, cipher-algo=aes256') FROM generate_series(1,100000) AS x(id);
postgres=# select * from cartedecredit limit 5 ;
card_id | username | cc
---------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------
1 | individu 1 | \xc30d0409030296304d007bf50ed768d2480153cd4a4e2d240249f94b31ec168391515ea80947f97970f7a4e058bff648f752df194498dd480c3b8a5c0d2942f90c6dde21a6b9bf4e9fd7986c6f986e3783
647e7a6205b48c03
2 | individu 2 | \xc30d0409030257b50bc0e6bcd8d270d248010984b60126af01ba922da27e2e78c33110f223f0210cf34da77243277305254cba374708d447fc7d653dd9e00ff9a96803a2c47ee95269534f2c24fab1c9dc
31f7909ca7adeaf0
3 | individu 3 | \xc30d040903023c5f8cb688c7945275d24801a518d70c6cc2d4a31f99f3738e736c5312f78bb9c3cc187a65d0cf7f893dbc9448825d39b79df5d0460508fc93336c2bec7794893bb08a290afd649ae15fe2
2b0433eff89222f7
4 | individu 4 | \xc30d04090302dcc3bb49a41b297578d2480167f17b09004e7dacc0891fc0cc7276dd551273eec72644520f8d0543abe8e795af7c1b84fc8e5b4adc33994c479d5ff17988e60bf446dc8c77caf3f3b008c1
c06bf0a3c4df41ae
5 | individu 5 | \xc30d04090302a8c3552fb4b297b567d24801c060fb9241355b49717479107ff59d2928b3c0d9001dabd0035a0419b1a54c0b15f1907a981f08a4227784ac5cf3994b32ba594eff35933825730ac42af8ca
76bd497c5079b127
CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(cc);
SELECT pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2, cipher-algo=aes256') FROM cartedecredit WHERE cc=pgp_sym_encrypt('test value 32', 'motdepasse'::text,'compress-algo=2, cipher-algo=aes256');
pgp_sym_decrypt
-----------------
(0 rows)
è No row returned !
Time: 116185.300 ms (01:56.185)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Seq Scan on cartedecredit (cost=0.00..3309.00 rows=1 width=32) (actual time=105969.099..105969.099 rows=0 loops=1)
Filter: (cc = pgp_sym_encrypt('test value 32'::text, 'motdepasse'::text, 'compress-algo=2, cipher-algo=aes256'::text))
Rows Removed by Filter: 100000
Planning time: 0.150 ms
Execution time: 105969.166 ms
(5 rows)
Time: 105969.912 ms (01:45.970)
-> Index is not used .
Best Regards
Vlad
Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.
E-mail communication cannot be guaranteed to be timely secure, error or virus-free.
Hi Phil
Thank you for this recommendation, but I posted on this public list only generic examples that have nothing to do with the works done in my company.
These examples serve me only to discuss about the subject of data encryption and performance
My answers to your remarks :
Why do you need to search by credit card number?
<<
Again, this is just an example. I just want to find a solution to query a column containing encrypted data with good performance.
one option is to use an encryption function that doesn't salt the data
<<
I am interested. Can you give some examples of these encryption function that doesn't salt the data.
Best Regards
Didier ROS
-----Message d'origine-----
De : phil_tnlcz_endecott@chezphil.org [mailto:phil_tnlcz_endecott@chezphil.org]
Envoyé : dimanche 7 octobre 2018 21:17
À : ROS Didier <didier.ros@edf.fr>; pgsql-general@lists.postgresql.org
Objet : RE: Why the index is not used ?
Hello Didier,
Your email is didier.ros@edf.fr. Are you working at Electricite de France, and storing actual customers' credit card details? How many millions of them?
Note that this mailing list is public; people looking for targets with poor security from which they can harvest credit card numbers might be reading it.
And after you are hacked and all your customers' credit card details are made public, someone will find this thread.
it's not the best solution, but we have data encryption needs and good
performance needs too. I do not know how to do it except the specified
procedure..
You should probably employ someone who knows what they are doing.
Sorry for being so direct, but really... storing large quantities of credit card details is the text book example of something that has to be done correctly.
if anyone has any proposals to put this in place, I'm interested.
Why do you need to search by credit card number?
If you really really need to do that, then one option is to use an encryption function that doesn't salt the data. Or you could store part of the number (last 4 digits?), or an unsalted hash of the number, unencrypted and indexed, and then you need only to sequentially decrypt (using the salted encryption) e.g. 1/10000 of the card numbers. But there are complex security issues and tradeoffs involved here. You probably need to comply with regulations (e.g. "PCI standards") which will specify what is allowed and what isn't. And if you didn't already know that, you shouldn't be doing this.
Good luck, I suppose.
Phil.
P.S. It seems that you were asking about this a year ago, and got the same answers...
Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.
E-mail communication cannot be guaranteed to be timely secure, error or virus-free.
Import Notes
Reply to msg id not found: 1538939797640@dmwebmail.dmwebmail.chezphil.org
Hi Vlad
OK, I take into account your remark about the need to do research on encrypted data.
My answers to your remarks :
you can use a deterministic algorithm for it (without salt)
<<
Can you give me on of these deterministic algorithms(without salt) ?
Best Regards
Didier
De : greatvovan@gmail.com [mailto:greatvovan@gmail.com]
Envoyé : dimanche 7 octobre 2018 21:33
À : ROS Didier <didier.ros@edf.fr>
Cc : folarte@peoplecall.com; pavel.stehule@gmail.com; pgsql-sql@lists.postgresql.org; pgsql-performance@lists.postgresql.org; pgsql-general@lists.postgresql.org
Objet : Re: Why the index is not used ?
Additionally it is not clear why you want to search in table on encrypted data. Usually you match user with it's unpersonalized data (such as login, user ID) and then decrypt personalized data. If you need to store user identifying data encrypted as well (e.g. bank account number) you can use a deterministic algorithm for it (without salt) because it is guaranteed to be unique and you don't need to have different encrypted data for two same input strings.
Vlad
Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.
Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.
Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________
This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.
E-mail communication cannot be guaranteed to be timely secure, error or virus-free.