[PATCH] by request: base64 for bytea
Attached is a patch (including documentation this time :) containing two
functions, base64_encode(bytea) and base64_decode(text) with obvious
functionality.
Code was initially taken from public-domain base64.c by John Walker but
much simplified (such as, breaking up long string into multiple lines is
not done, EBCDIC support removed).
--
Alex Pilosov | http://www.acedsl.com/home.html
CTO - Acecape, Inc. | AceDSL:The best ADSL in the world
325 W 38 St. Suite 1005 | (Stealth Marketing Works! :)
New York, NY 10018 |
Attachments:
base64bytea.patchtext/plain; charset=US-ASCII; name=base64bytea.patchDownload
Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /home/cvs/pgsql/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.55
diff --unified -r1.55 datatype.sgml
--- doc/src/sgml/datatype.sgml 2001/05/22 16:37:15 1.55
+++ doc/src/sgml/datatype.sgml 2001/06/22 20:09:08
@@ -48,6 +48,12 @@
<tbody>
<row>
+ <entry><type>bytea</type></entry>
+ <entry><entry>
+ <entry>a variable-length raw string, could contain nulls</entry>
+ </row>
+
+ <row>
<entry><type>bigint</type></entry>
<entry><type>int8</type></entry>
<entry>signed eight-byte integer</entry>
Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /home/cvs/pgsql/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.61
diff --unified -r1.61 func.sgml
--- doc/src/sgml/func.sgml 2001/06/15 21:03:07 1.61
+++ doc/src/sgml/func.sgml 2001/06/22 20:19:41
@@ -1037,6 +1037,32 @@
<entry>a23x5</entry>
</row>
+ <row>
+ <entry>
+ base64_encode(<parameter>string</parameter> <type>bytea</type>)
+ </entry>
+ <entry><type>text</type></entry>
+ <entry>
+ Uses base64 algorithm to encode binary data in <parameter>string</parameter>.
+ Returns text value guaranteed to contain only printable string
+ </entry>
+ <entry>base64_encode('123\\000\\001')</entry>
+ <entry>MTIzAAE=</entry>
+ </row>
+
+ <row>
+ <entry>
+ base64_decode(<parameter>string</parameter> <type>text</type>)
+ </entry>
+ <entry><type>bytea</type></entry>
+ <entry>
+ Decodes binary data from <parameter>string</parameter> previously
+ encoded with base64 algorithm.
+ </entry>
+ <entry>base64_decode('MTIzAAE=')</entry>
+ <entry>123\000\001</entry>
+ </row>
+
</tbody>
</tgroup>
</table>
Index: src/backend/utils/adt/varlena.c
===================================================================
RCS file: /home/cvs/pgsql/pgsql/src/backend/utils/adt/varlena.c,v
retrieving revision 1.70
diff --unified -r1.70 varlena.c
--- src/backend/utils/adt/varlena.c 2001/05/03 19:00:36 1.70
+++ src/backend/utils/adt/varlena.c 2001/06/22 21:12:51
@@ -20,7 +20,14 @@
#include "miscadmin.h"
#include "utils/builtins.h"
+static char base64_dtable[256];
+static char base64_etable[256];
+static bool base64_tables_set=false;
+
static int text_cmp(text *arg1, text *arg2);
+static void init_base64_tables(void);
+static int do_base64_encode(char *in, int maxlen, char *out);
+static int do_base64_decode(char *in, int maxlen, char *out);
/*****************************************************************************
@@ -875,3 +882,172 @@
PG_RETURN_TEXT_P(result);
}
+
+/* base64_encode()
+ * Converts a bytea value to base-64 encoded 'text' string
+ */
+Datum
+base64_encode(PG_FUNCTION_ARGS)
+{
+ bytea *s = PG_GETARG_BYTEA_P(0);
+ text *result;
+ int lenin, lenout;
+
+ lenin = VARSIZE(s) - VARHDRSZ;
+
+ result = (text *) palloc(lenin*4/3+10+VARHDRSZ);
+
+ lenout=do_base64_encode(VARDATA(s), lenin, VARDATA(result));
+
+ VARATT_SIZEP(result) = lenout+VARHDRSZ;
+
+ PG_RETURN_TEXT_P(result);
+}
+
+/* base64_decode()
+ * Converts base-64 encoded 'text' string to bytea value
+ */
+Datum
+base64_decode(PG_FUNCTION_ARGS)
+{
+ text * s = PG_GETARG_TEXT_P(0);
+ bytea *result;
+ int lenin, lenout;
+
+ lenin = VARSIZE(s) - VARHDRSZ;
+
+ result = (bytea *) palloc((int) lenin*3/4+10+VARHDRSZ);
+
+ lenout = do_base64_decode(VARDATA(s), lenin, VARDATA(result));
+
+ VARATT_SIZEP(result) = lenout+VARHDRSZ;
+
+ PG_RETURN_BYTEA_P(result);
+}
+
+/* most of code lifted from public-domain base64 by John Walker */
+
+void init_base64_tables(void) {
+ int i;
+ if (base64_tables_set) return;
+
+ for (i = 0; i < 26; i++) {
+ base64_etable[i] = 'A' + i;
+ base64_etable[26 + i] = 'a' + i;
+ }
+ for (i = 0; i < 10; i++) {
+ base64_etable[52 + i] = '0' + i;
+ }
+ base64_etable[62] = '+';
+ base64_etable[63] = '/';
+
+ for (i = 0; i < 255; i++) {
+ base64_dtable[i] = 0x80;
+ }
+ for (i = 'A'; i <= 'Z'; i++) {
+ base64_dtable[i] = 0 + (i - 'A');
+ }
+ for (i = 'a'; i <= 'z'; i++) {
+ base64_dtable[i] = 26 + (i - 'a');
+ }
+ for (i = '0'; i <= '9'; i++) {
+ base64_dtable[i] = 52 + (i - '0');
+ }
+
+ base64_dtable['+'] = 62;
+ base64_dtable['/'] = 63;
+ base64_dtable['='] = 0;
+
+ base64_tables_set=true;
+}
+
+
+/* this function requires out to be already allocated,
+ it must have (maxlen*4/3)+3 bytes
+ returns number of characters in out
+*/
+int do_base64_encode(char *in, int maxlen, char *out)
+{
+ int i;
+ bool hiteof = false; /* read the last character */
+ int p; /* index of current character */
+ char *s=out;
+ char igroup[3];
+ char ogroup[4];
+ int n;
+
+ init_base64_tables();
+
+ for(p=0;p<maxlen;) {
+ igroup[0] = igroup[1] = igroup[2] = 0;
+ for (n = 0; n < 3; n++) {
+ if (p==maxlen) {
+ hiteof = TRUE;
+ break;
+ }
+ igroup[n] = in[p++];
+ }
+ if (n > 0) {
+ ogroup[0] = base64_etable[igroup[0] >> 2];
+ ogroup[1] = base64_etable[((igroup[0] & 3) << 4) | (igroup[1] >> 4)];
+ ogroup[2] = base64_etable[((igroup[1] & 0xF) << 2) | (igroup[2] >> 6)];
+ ogroup[3] = base64_etable[igroup[2] & 0x3F];
+
+ /* Replace characters in output stream with "=" pad
+ characters if fewer than three characters were
+ read from the end of the input stream. */
+
+ if (n < 3) {
+ ogroup[3] = '=';
+ if (n < 2) {
+ ogroup[2] = '=';
+ }
+ }
+ for (i = 0; i < 4; i++) {
+ *s++=ogroup[i];
+ }
+ }
+ }
+ return s-out;
+}
+
+
+int do_base64_decode(char *in, int maxlen, char *out)
+{
+ int p=0;
+ int i,j;
+ char *s=out;
+ init_base64_tables();
+ while (true) {
+ char a[4], b[4], o[3];
+ int c;
+
+ for (i = 0; i < 4; i++) {
+ while (true) { /* get a printable character*/
+ if (p==maxlen) { /* end of string */
+ if (i > 0)
+ elog(ERROR, "base64_decode: Input string incomplete");
+ return s-out; /* done */
+ }
+ c = in[p++];
+ if ( c > ' ' ) break;
+ }
+
+ if ( (base64_dtable[c] & 0x80) || (!isprint(c)) )
+ elog(ERROR, "base64_decode: Illegal character '%c' in input string.\n", c);
+ a[i] = c;
+ b[i] = base64_dtable[c];
+ }
+ o[0] = (b[0] << 2) | (b[1] >> 4);
+ o[1] = (b[1] << 4) | (b[2] >> 2);
+ o[2] = (b[2] << 6) | b[3];
+ i = a[2] == '=' ? 1 : (a[3] == '=' ? 2 : 3);
+
+ for (j=0;j<i;j++) *s++=o[j]; /* write i bytes */
+ if (i < 3) { /*alternative exit*/
+ return s-out; /* done */
+ }
+ }
+}
+
+
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /home/cvs/pgsql/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.194
diff --unified -r1.194 pg_proc.h
--- src/include/catalog/pg_proc.h 2001/06/14 01:09:22 1.194
+++ src/include/catalog/pg_proc.h 2001/06/22 18:06:09
@@ -2646,6 +2646,11 @@
DATA(insert OID = 1927 ( has_table_privilege PGUID 12 f t f t 2 f 16 "26 25" 100 0 0 100 has_table_privilege_id - ));
DESCR("current user privilege on relation by rel oid");
+DATA(insert OID = 1928 ( base64_encode PGUID 12 f t t t 1 f 25 "17" 100 0 0 100 base64_encode - ));
+DESCR("Convert bytea value into base64-encoded text string");
+DATA(insert OID = 1929 ( base64_decode PGUID 12 f t t t 1 f 17 "25" 100 0 0 100 base64_decode - ));
+DESCR("Convert base64-encoded text string into bytea value");
+
/*
* prototypes for functions pg_proc.c
Index: src/include/utils/builtins.h
===================================================================
RCS file: /home/cvs/pgsql/pgsql/src/include/utils/builtins.h,v
retrieving revision 1.154.1000.1
diff --unified -r1.154.1000.1 builtins.h
--- src/include/utils/builtins.h 2001/06/17 00:21:34 1.154.1000.1
+++ src/include/utils/builtins.h 2001/06/22 18:07:23
@@ -461,6 +461,8 @@
extern Datum byteaGetBit(PG_FUNCTION_ARGS);
extern Datum byteaSetByte(PG_FUNCTION_ARGS);
extern Datum byteaSetBit(PG_FUNCTION_ARGS);
+extern Datum base64_encode(PG_FUNCTION_ARGS);
+extern Datum base64_decode(PG_FUNCTION_ARGS);
/* version.c */
extern Datum pgsql_version(PG_FUNCTION_ARGS);
Your patch has been added to the PostgreSQL unapplied patches list at:
http://candle.pha.pa.us/cgi-bin/pgpatches
I will try to apply it within the next 48 hours.
Attached is a patch (including documentation this time :) containing two
functions, base64_encode(bytea) and base64_decode(text) with obvious
functionality.Code was initially taken from public-domain base64.c by John Walker but
much simplified (such as, breaking up long string into multiple lines is
not done, EBCDIC support removed).--
Alex Pilosov | http://www.acedsl.com/home.html
CTO - Acecape, Inc. | AceDSL:The best ADSL in the world
325 W 38 St. Suite 1005 | (Stealth Marketing Works! :)
New York, NY 10018 |
Content-Description:
[ Attachment, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Fri, Jun 22, 2001 at 09:55:46PM -0400, Bruce Momjian wrote:
Your patch has been added to the PostgreSQL unapplied patches list at:
Attached is a patch (including documentation this time :) containing two
functions, base64_encode(bytea) and base64_decode(text) with obvious
functionality.
Btw, there are functions in form encode(data, 'base64'),
decode(data, 'base64') in contrib/pgcrypto. They do also
encode(data, 'hex'). In the future I need to do probably
encode(data, 'pgp-armor') too...
I agree those functionality should be in core code, and if
the Alex ones get there, maybe he could use same interface?
Or I can extract it out from pgcrypto and submit to core ;)
I simply had not a need for it because I used those with
pgcrypto, but Alex seems to hint that there would be bit of
interest otherwise too.
--
marko
On Sat, 23 Jun 2001, Marko Kreen wrote:
On Fri, Jun 22, 2001 at 09:55:46PM -0400, Bruce Momjian wrote:
Your patch has been added to the PostgreSQL unapplied patches list at:
Attached is a patch (including documentation this time :) containing two
functions, base64_encode(bytea) and base64_decode(text) with obvious
functionality.Btw, there are functions in form encode(data, 'base64'),
decode(data, 'base64') in contrib/pgcrypto. They do also
encode(data, 'hex'). In the future I need to do probably
encode(data, 'pgp-armor') too...I agree those functionality should be in core code, and if
the Alex ones get there, maybe he could use same interface?
Oy, I didn't notice them in contrib/pgcrypt.
Bruce, you can take my patch out of queue, stuff in pgcrypt is far more
comprehensive than what I done.
Or I can extract it out from pgcrypto and submit to core ;)
I simply had not a need for it because I used those with
pgcrypto, but Alex seems to hint that there would be bit of
interest otherwise too.
I think encode/decode should be part of core, as they are basic functions
to manipulate bytea data...
-alex
On Sat, Jun 23, 2001 at 08:42:46AM -0400, Alex Pilosov wrote:
On Sat, 23 Jun 2001, Marko Kreen wrote:
Or I can extract it out from pgcrypto and submit to core ;)
I simply had not a need for it because I used those with
pgcrypto, but Alex seems to hint that there would be bit of
interest otherwise too.I think encode/decode should be part of core, as they are basic functions
to manipulate bytea data...
Ok, I think I look into it. I am anyway preparing a big update
to pgcrypto.
Question to -hackers: currently there is not possible to cast
bytea to text and vice-versa. Is this intentional or bug?
It is weird because internal representation is exactly the same.
As I want my funtions to operate on both, do I need to create
separate funtion entries to every combination of parameters?
It gets crazy on encrypt_iv(data, key, iv, type) which has 3
parameters that can be both bytea or text...
--
marko
On Sat, 23 Jun 2001, Marko Kreen wrote:
On Fri, Jun 22, 2001 at 09:55:46PM -0400, Bruce Momjian wrote:
Your patch has been added to the PostgreSQL unapplied patches list at:
Attached is a patch (including documentation this time :) containing two
functions, base64_encode(bytea) and base64_decode(text) with obvious
functionality.Btw, there are functions in form encode(data, 'base64'),
decode(data, 'base64') in contrib/pgcrypto. They do also
encode(data, 'hex'). In the future I need to do probably
encode(data, 'pgp-armor') too...I agree those functionality should be in core code, and if
the Alex ones get there, maybe he could use same interface?Oy, I didn't notice them in contrib/pgcrypt.
Bruce, you can take my patch out of queue, stuff in pgcrypt is far more
comprehensive than what I done.
Sure. Done. Funny we didn't need them as much for crypto but we do
need them for binary insertion into the database.
Or I can extract it out from pgcrypto and submit to core ;)
I simply had not a need for it because I used those with
pgcrypto, but Alex seems to hint that there would be bit of
interest otherwise too.I think encode/decode should be part of core, as they are basic functions
to manipulate bytea data...
Agreed.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Sat, Jun 23, 2001 at 08:42:46AM -0400, Alex Pilosov wrote:
On Sat, 23 Jun 2001, Marko Kreen wrote:
Or I can extract it out from pgcrypto and submit to core ;)
I simply had not a need for it because I used those with
pgcrypto, but Alex seems to hint that there would be bit of
interest otherwise too.I think encode/decode should be part of core, as they are basic functions
to manipulate bytea data...Ok, I think I look into it. I am anyway preparing a big update
to pgcrypto.Question to -hackers: currently there is not possible to cast
bytea to text and vice-versa. Is this intentional or bug?It is weird because internal representation is exactly the same.
As I want my funtions to operate on both, do I need to create
separate funtion entries to every combination of parameters?
It gets crazy on encrypt_iv(data, key, iv, type) which has 3
parameters that can be both bytea or text...
We just need to mark them as binary compatible. I will do that now and
commit. We really weren't sure what bytea was for in the past (or
forgot) so I am sure it was an oversight.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Patch removed.
Attached is a patch (including documentation this time :) containing two
functions, base64_encode(bytea) and base64_decode(text) with obvious
functionality.Code was initially taken from public-domain base64.c by John Walker but
much simplified (such as, breaking up long string into multiple lines is
not done, EBCDIC support removed).--
Alex Pilosov | http://www.acedsl.com/home.html
CTO - Acecape, Inc. | AceDSL:The best ADSL in the world
325 W 38 St. Suite 1005 | (Stealth Marketing Works! :)
New York, NY 10018 |
Content-Description:
[ Attachment, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Sat, Jun 23, 2001 at 08:42:46AM -0400, Alex Pilosov wrote:
On Sat, 23 Jun 2001, Marko Kreen wrote:
Or I can extract it out from pgcrypto and submit to core ;)
I simply had not a need for it because I used those with
pgcrypto, but Alex seems to hint that there would be bit of
interest otherwise too.I think encode/decode should be part of core, as they are basic functions
to manipulate bytea data...Ok, I think I look into it. I am anyway preparing a big update
to pgcrypto.Question to -hackers: currently there is not possible to cast
bytea to text and vice-versa. Is this intentional or bug?It is weird because internal representation is exactly the same.
As I want my funtions to operate on both, do I need to create
separate funtion entries to every combination of parameters?
It gets crazy on encrypt_iv(data, key, iv, type) which has 3
parameters that can be both bytea or text...
I have commited code to CVS to make bytea binary compatible with text.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Sat, Jun 23, 2001 at 08:42:46AM -0400, Alex Pilosov wrote:
On Sat, 23 Jun 2001, Marko Kreen wrote:
Or I can extract it out from pgcrypto and submit to core ;)
I simply had not a need for it because I used those with
pgcrypto, but Alex seems to hint that there would be bit of
interest otherwise too.I think encode/decode should be part of core, as they are basic functions
to manipulate bytea data...Ok, I think I look into it. I am anyway preparing a big update
to pgcrypto.Question to -hackers: currently there is not possible to cast
bytea to text and vice-versa. Is this intentional or bug?It is weird because internal representation is exactly the same.
As I want my funtions to operate on both, do I need to create
separate funtion entries to every combination of parameters?
It gets crazy on encrypt_iv(data, key, iv, type) which has 3
parameters that can be both bytea or text...
Sorry, backed out bytea binary compatibility code. Tom says it will not
work.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Marko Kreen <marko@l-t.ee> writes:
Question to -hackers: currently there is not possible to cast
bytea to text and vice-versa. Is this intentional or bug?
Intentional. text and friends do not like embedded nulls.
If there were a cast it would have to be one that implies
an I/O conversion, just like any other type that contains
non-textual data.
regards, tom lane
On Sat, Jun 23, 2001 at 10:46:46PM -0400, Tom Lane wrote:
Marko Kreen <marko@l-t.ee> writes:
Question to -hackers: currently there is not possible to cast
bytea to text and vice-versa. Is this intentional or bug?Intentional. text and friends do not like embedded nulls.
If there were a cast it would have to be one that implies
an I/O conversion, just like any other type that contains
non-textual data.
Well, I have functions that should work on both - encode(),
digest(), hmac(). Probably should do then several entries. Ok.
But what should be return type of decrypt()? I imagine well
situations where user wants to crypt both bytea and text data.
When there is even not a way to cast them to each other, then
he is stuck for no good reason.
--
marko
On Sun, 24 Jun 2001, Marko Kreen wrote:
On Sat, Jun 23, 2001 at 10:46:46PM -0400, Tom Lane wrote:
Marko Kreen <marko@l-t.ee> writes:
Question to -hackers: currently there is not possible to cast
bytea to text and vice-versa. Is this intentional or bug?Intentional. text and friends do not like embedded nulls.
If there were a cast it would have to be one that implies
an I/O conversion, just like any other type that contains
non-textual data.Well, I have functions that should work on both - encode(),
digest(), hmac(). Probably should do then several entries. Ok.But what should be return type of decrypt()? I imagine well
situations where user wants to crypt both bytea and text data.
When there is even not a way to cast them to each other, then
he is stuck for no good reason.
There SHOULD be a text_bytea function to cast a text as bytea, as it is
always safe. (It doesn't exist yet, but its a trivial patch)
Function to cast bytea as text, I think, should do proper checking that
input did not contain nulls, and return text data back.
Your encrypt/decrypt should take bytea and return bytea. Its user's
responsibility to cast the things to bytea when needed.
Alex Pilosov <alex@pilosoft.com> writes:
Function to cast bytea as text, I think, should do proper checking that
input did not contain nulls, and return text data back.
That is most definitely not good enough. In MULTIBYTE installations
you'd have to also check that there were no illegal multibyte sequences.
The whole approach seems misguided to me anyway. bytea isn't equivalent
to text and conversion functions based on providing incomplete binary
equivalence are fundamentally wrong. hex or base64 encode/decode
functions seem like reasonable conversion paths, or you could provide
a function that mimics the existing I/O conversions for bytea, ugly as
they are.
In the case that Marko is describing, it seems to me he is providing
two independent sets of encryption functions, one for text and one
for bytea. That they happen to share code under the hood is an
implementation detail of his code, not a reason to contort the type
system. If someone wanted to add functions to encrypt, say, polygons,
would you start looking for ways to create a binary equivalence between
polygon and text? I sure hope not.
regards, tom lane
Marko Kreen <marko@l-t.ee> writes:
But what should be return type of decrypt()?
You'll need more than one name: decrypt to text, decrypt to bytea, etc.
Think about what happens when you need to support additional types.
Relying on implicit conversions or binary equivalence will not scale.
regards, tom lane
Alex Pilosov <alex@pilosoft.com> writes:
Function to cast bytea as text, I think, should do proper checking that
input did not contain nulls, and return text data back.That is most definitely not good enough. In MULTIBYTE installations
you'd have to also check that there were no illegal multibyte sequences.The whole approach seems misguided to me anyway. bytea isn't equivalent
to text and conversion functions based on providing incomplete binary
equivalence are fundamentally wrong. hex or base64 encode/decode
functions seem like reasonable conversion paths, or you could provide
a function that mimics the existing I/O conversions for bytea, ugly as
they are.
He can create an output function just to text, and varchar, etc will work
OK, right?
I think the main issue is that char(), varchar(), text all input/output
strings of the same format while bytea has special backslash handling
for binary/null values.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Sun, 24 Jun 2001, Tom Lane wrote:
Alex Pilosov <alex@pilosoft.com> writes:
Function to cast bytea as text, I think, should do proper checking that
input did not contain nulls, and return text data back.That is most definitely not good enough. In MULTIBYTE installations
you'd have to also check that there were no illegal multibyte sequences.
True, but see below.
The whole approach seems misguided to me anyway. bytea isn't equivalent
to text and conversion functions based on providing incomplete binary
equivalence are fundamentally wrong. hex or base64 encode/decode
functions seem like reasonable conversion paths, or you could provide
a function that mimics the existing I/O conversions for bytea, ugly as
they are.In the case that Marko is describing, it seems to me he is providing
two independent sets of encryption functions, one for text and one
for bytea. That they happen to share code under the hood is an
implementation detail of his code, not a reason to contort the type
system. If someone wanted to add functions to encrypt, say, polygons,
would you start looking for ways to create a binary equivalence between
polygon and text? I sure hope not.
Well, encrypt/decrypt are special kinds of functions. When the data is
decrypted, its type is not known, as it is not stored anywhere in the
data. Caller is responsible to casting the result to whatever he needs to,
thus, there must be some way to cast output of decrypted data to any type.
I may be going a bit too far, but, if you think about it, if one wanted to
encrypt a generic type t, these ar e the alternatives:
a) to encrypt, caller must use encrypt(t_out(val)) and to decrypt
t_in(decrypt(val)).
Problem with that is non-existance of CSTRING datatype as of yet, and a
possible inefficiency of it compared to b).
b) make encrypt operate on 'opaque' type, and just encrypt raw data in
memory, as many as there are, and store the original varlen separately.
(most encrypt-decrypt algorithms do not preserve data length anyway, they
operate in blocks of n bytes). Question in this situation what to do with
decrypt, options are:
b1) make decrypt return opaque and to allow conversion from opaque to any
datatype, (by blindly setting the oid of return type), I'm not sure how
hard is this one to do with current type system, and do not like safety of
this since an ordinary user would be able to put garbage data into type
that may not be prepared to handle it.
b2) make encrypt store the name of original type in encrypted data. make
decrypt return opaque which would contain (type,data,length) triple, and
allow to cast opaque into any type but _checking_ that opaque has correct
format and that type stored in opaque matches type its being cast to.
This has additional benefit of being able to serialize/deserialize data,
preserving type, which may be used by something else...
In my opinion, a) is probably the easiest option to implement. b2) is
(IMHO) the most correct one, but it may be a bit too much work for not
that much of benefit?
This may be going a bit too far, since original question only dealt with
text-bytea conversions, but maybe its time to look at 'generic' functions
which return generic types.
On Sun, Jun 24, 2001 at 06:20:39PM -0400, Alex Pilosov wrote:
On Sun, 24 Jun 2001, Tom Lane wrote:
In the case that Marko is describing, it seems to me he is providing
two independent sets of encryption functions, one for text and one
for bytea. That they happen to share code under the hood is an
implementation detail of his code, not a reason to contort the type
system. If someone wanted to add functions to encrypt, say, polygons,
would you start looking for ways to create a binary equivalence between
polygon and text? I sure hope not.Well, encrypt/decrypt are special kinds of functions. When the data is
decrypted, its type is not known, as it is not stored anywhere in the
data. Caller is responsible to casting the result to whatever he needs to,
thus, there must be some way to cast output of decrypted data to any type.I may be going a bit too far, but, if you think about it, if one wanted to
encrypt a generic type t, these ar e the alternatives:
[ ... bunch of good ideas ... ]
I do not want to go that far and imagine current encrypt() as
something low-level, that encrypts a unstructured array of 8bit
values. That makes bytea as 'natural' type to use for it.
I now took the Tom suggestion that all functions do not operate
well on 8bit values - so now I declared that all my funtions
that _do_ operate on 8bit values, get data as bytea.
Btw, the length is preserved - I use padding if needed. But no
additional info is preserved.
Now, if you want to do something higher-level, in POV of
PostgreSQL - to attach type data or something else, you can
very well build some higher-level functions on encrypt() that
add some additional structure for it. This is easy - you can
do it in SQL level if you want, but I also tried to make
all crypto stuff accesible from C level too. I do not think it
belongs to current encrypt() - this is 'next level'. So I do
not worry about encrypting polygons yet.
Tho' current encrypt() has some 'negative' points on crypto POV.
As it does basically pure cipher, and has no structure I cant
use some higher features as key generation, attaching algorithm
info to data and checksums. (Actually it _does_ support
attaching a MD or HMAC to encrypted data, but I consider it as
too hackish). So, ee, someday, when I have more time I would like
to use current code as building block and do a minimal OpenPGP
implementation that does support all of it.
This again does not offer anything for 'generic types', but
again I do not consider it job for that level.
This may be going a bit too far, since original question only dealt with
text-bytea conversions, but maybe its time to look at 'generic' functions
which return generic types.
I did want to encrypt() etc. to operate on 'text' too, as it
would be _very_ convinient, and they really are similar on POV
of encrypt().
Hmm, on the other hand -
Idea for 'generic types', taking account of PostgreSQL current
type system - functions:
pack(data::whatever)::bytea,
unpack_text(data::bytea)::text,
unpack_polygon(data::bytea)::polygon
...
pack() does a compact representation of data, with type attached
unpack*() checks if it is of correct type and sane. It may be
textual but this takes much room, binary is probably not
portable. Eg. it could be done using *in(), *out() functions,
maybe even keep the '\0', and prepends type info (oid/name).
So later it could be given to encrypt()... ?
--
marko