BUG #13442: ISBN doesn't always roundtrip with text
The following bug has been logged on the website:
Bug reference: 13442
Logged by: B.Z
Email address: bz@mailinator.com
PostgreSQL version: 9.4.4
Operating system: Linux think 4.0.0-1-amd64 #1 SMP Debian 4.0.2-1 (2
Description:
I have a table containing ISBNs stored as the isbn type provided by the isn
extension. In certain cases an ISBN in ISBN10 form is considered unequal to
the same ISBN in ISBN13 form.
Assuming the isn extension is installed:
CREATE TEMP TABLE t(i isbn);
INSERT INTO t VALUES('9791020902573');
SELECT * FROM t WHERE i = '9791020902573'; -- Returns 10-209-0257-4
SELECT * FROM t WHERE i = '10-209-0257-4'; -- Returns 0 rows
SELECT '9791020902573'::isbn = '10-209-0257-4'::isbn; -- Returns f
Other examples of ISBNs with this behavior:
select isbn from isbns where isbn::isbn != isbn::isbn::text::isbn;
isbn
10-91986-01-0
10-91414-02-5
975891300-X
598687008-5
904072556-X
10-209-0257-4
10-209-0014-8
960838204-1
10-91447-03-9
963208937-5
10-209-0011-3
975297291-8
957849470-X
157607283-5
10-93509-00-7
10-209-0014-8
10-209-0273-6
975574374-X
979984511-4
953173635-9
853390230-1
963930301-1
10-223-0179-9
10-209-0141-1
10-209-0041-5
960218202-4
598340069-X
10-91600-01-5
598340069-X
(29 rows)
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
bz@mailinator.com writes:
I have a table containing ISBNs stored as the isbn type provided by the isn
extension. In certain cases an ISBN in ISBN10 form is considered unequal to
the same ISBN in ISBN13 form.
I poked at this a little bit. I think the issue may be that this bit
at line 832 in isn.c:
case ISBN:
memcpy(buf, "978", 3);
supposes that all ISBNs should have prefix 978, whereas your example is
using prefix 979, which seems to be also valid according to code a few
lines above. I don't know enough about the whole ISBN/EAN mess to
understand what this should be doing instead, though.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
I have a table containing ISBNs stored as the isbn type provided by the isn
extension. In certain cases an ISBN in ISBN10 form is considered unequal to
the same ISBN in ISBN13 form.I poked at this a little bit. I think the issue may be that this bit
at line 832 in isn.c:case ISBN:
memcpy(buf, "978", 3);supposes that all ISBNs should have prefix 978, whereas your example is
using prefix 979, which seems to be also valid according to code a few
lines above. I don't know enough about the whole ISBN/EAN mess to
understand what this should be doing instead, though.
Hmmm. ISBN-10 -> ISBN-13 is really just adding "978" in front, so there is
no casted ISBN-10 number starting with 979. ISTM that "979" is just a
recorded prefix for books, just as "978", for ISBN-13. AFAICR the prefix
was chosen so that the checksum digit can be kept in the conversion.
SELECT '9791020902573'::isbn = '10-209-0257-4'::isbn; -- Returns f
Indeed, but this one should return true:
SELECT '9781020902574'::isbn = '10-209-0257-4'::isbn;
So I would say there is no bug as the number are indeed different, or that
I missed something.
--
Fabien.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
(I'm the original reporter).
I think this gets to the crux of the matter, although I disagree that no
bug exists.
ISBN10s, i.e. 10-digit ISBNs, can be converted to ISBN13s, and back again.
i.e. they round-trip. ISBN13s with the 979 prefix can't be converted to
ISBN10s. (Evidence includes http://pcn.loc.gov/isbncnvt.html failing on
ISBN13s with 979- prefixes with "This 13-digit ISBN does not begin with
"978"
It cannot be converted to a 10-digit ISBN.", and
https://247pearsoned.custhelp.com/app/answers/detail/a_id/6922/~/isbn-converter
stating "The revised ISBN standard does not provide rules for converting or
assigning 10-digit ISBNs to 13-digit ISBNs prefixed with "979". There is no
industry approved method for converting ISBN-13s prefixed with "979" to an
ISBN-10. Thirteen-digit ISBNs prefixed with "979" should not be represented
to trading partners as 10-digit ISBNs. ").
The key bug, then, is that the isn extension attempts to make this illegal
conversion:
select '9791020902573'::isbn; --gives 10-209-0257-4
I suspect the correct behaviour is raising an exception in these
circumstances.
Import Notes
Resolved by subject fallback
The key bug, then, is that the isn extension attempts to make this illegal
conversion:select '9791020902573'::isbn; --gives 10-209-0257-4
Indeed, this one looks like a bug, it should really be an error.
--
Fabien.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Mon, Jun 15, 2015 at 10:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I poked at this a little bit. I think the issue may be that this bit
at line 832 in isn.c:case ISBN:
memcpy(buf, "978", 3);supposes that all ISBNs should have prefix 978, whereas your example is
using prefix 979, which seems to be also valid according to code a few
lines above.
I actually didn't realize that the ISBN type assumed the "bookland"
country code, which is 978. That's terrible, and is really a distinct
way that the type is broken (distinct from the range enforcement
misfeature that I always go on about).
I was planning on suggesting completely removing any enforcement of
ISBN ranges by the ISBN type. We could then rely entirely on the check
digit, which has virtually no disadvantage relative to "more complete"
enforcement. But even that's not going to help here, because the ISBN
output function doesn't show the country code or the check digit:
postgres=# select '978-0-393-04002-9'::isbn;
isbn
---------------
0-393-04002-X
(1 row)
--
Peter Geoghegan
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Fabien COELHO <coelho@cri.ensmp.fr> writes:
I have a table containing ISBNs stored as the isbn type provided by the isn
extension. In certain cases an ISBN in ISBN10 form is considered unequal to
the same ISBN in ISBN13 form.
So I would say there is no bug as the number are indeed different, or that
I missed something.
I think this definitely indicates a bug:
regression=# select '9791020902573'::isbn;
isbn
---------------
10-209-0257-4
(1 row)
regression=# select '10-209-0257-4'::isbn;
isbn
---------------
1-02-090257-4
(1 row)
I don't mind if '9791020902573' gets converted to something canonical,
but surely the canonical value ought to reproduce without further change.
Furthermore,
regression=# select '9791020902573'::isbn = '10-209-0257-4'::isbn;
?column?
----------
f
(1 row)
so apparently the canonicalization conversion wasn't right to begin with.
I have no idea which of these forms should be considered valid or
canonical, but surely the two behaviors exhibited above are wrong.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
I think this definitely indicates a bug:
regression=# select '9791020902573'::isbn;
isbn
---------------
10-209-0257-4
(1 row)
Yes, indeed, this one is definitely a bug.
regression=# select '9791020902573'::isbn = '10-209-0257-4'::isbn;
?column?
----------
f
(1 row)so apparently the canonicalization conversion wasn't right to begin with.
I still do not get it, "False" is expected on this last one, as this is
not the same book.
I have no idea which of these forms should be considered valid or
canonical, but surely the two behaviors exhibited above are wrong.
I think that it is only the first one, i.e. converting the ISBN13 code
beginning with 979 to a ISBN10 should have raised an exception.
--
Fabien.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Tuesday, June 16, 2015, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
I think this definitely indicates a bug:
regression=# select '9791020902573'::isbn;
isbn
---------------
10-209-0257-4
(1 row)Yes, indeed, this one is definitely a bug.
regression=# select '9791020902573'::isbn = '10-209-0257-4'::isbn;
?column?
----------
f
(1 row)so apparently the canonicalization conversion wasn't right to begin with.
I still do not get it, "False" is expected on this last one, as this is
not the same book.
I have no idea which of these forms should be considered valid or
canonical, but surely the two behaviors exhibited above are wrong.
I think that it is only the first one, i.e. converting the ISBN13 code
beginning with 979 to a ISBN10 should have raised an exception.
There is only one problem/bug (hopefully) but it is evidenced by the fact
that the two queries provided contradict each other. It is basically a
different form of proof. You know the first query is wrong using external
knowledge but Tom has shown that a bug exists using strictly knowledge
local to the system. Without the external knowledge one cannot say which
alternative is correct but one must be incorrect.
Fixing the first query to become an error brings the system basic into
internal consistency since that result is congruous with the second query.
And is externally affirmed to be the correct outcome (I trust...).
Making the second query return true makes the system internally consistent
too...but doesn't actually provide the correct answer (again, I trust)
David J.
regression=# select '9791020902573'::isbn = '10-209-0257-4'::isbn;
f
beginning with 979 to a ISBN10 should have raised an exception.
Making the second query return true makes the system internally consistent
too...but doesn't actually provide the correct answer (again, I trust)
Yes.
On third thoughs, the above query should raise an exception because the
cast is illegal, 979* is not an ISBN, so the answer is neither true nor
false. The query should be simply written to use the larger type and not
cast to the subtype.
SELECT EAN13 '9791234567896' = ISBN '123456789X'; -- False
SELECT EAN13 '9781234567897' = ISBN '123456789X'; -- True
--
Fabien.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
I think this definitely indicates a bug:
regression=# select '9791020902573'::isbn;
isbn
---------------
10-209-0257-4
(1 row)Yes, indeed, this one is definitely a bug.
Attached patch makes isn to distinguish when needed between ISBN and
ISBN13. ISBN is the legacy ISBN10 number which is part of ISBN13, but not
all ISBN13 numbers are ISBN.
The initial version was confusing both types, leading to accept valid
ISBN13 number as valid ISBN(10) number although they use the 979 prefix,
hence the reported issues.
The patch also adds some tests.
There is no clear upgrade path from 1.0 to 2.0, as the ISBN "in" function
is changed, as well as some casts. I'm not sure of a safe way to upgrade,
because the initial version was accepting invalid ISBN which would now be
rejected, thus some store data may be considered corrupted in a database.
On the other hand, probably few people would use a legacy ISBN type and
put ISBN13 number in them, hopefully? Hmmm...
--
Fabien.
Attachments:
isn-fix-1.patchtext/x-diff; name=isn-fix-1.patchDownload+3580-13
On 06/20/2015 10:04 AM, Fabien COELHO wrote:
I think this definitely indicates a bug:
regression=# select '9791020902573'::isbn;
isbn
---------------
10-209-0257-4
(1 row)Yes, indeed, this one is definitely a bug.
Attached patch makes isn to distinguish when needed between ISBN and
ISBN13. ISBN is the legacy ISBN10 number which is part of ISBN13, but not
all ISBN13 numbers are ISBN.The initial version was confusing both types, leading to accept valid
ISBN13 number as valid ISBN(10) number although they use the 979 prefix,
hence the reported issues.The patch also adds some tests.
There is no clear upgrade path from 1.0 to 2.0, as the ISBN "in" function
is changed, as well as some casts. I'm not sure of a safe way to upgrade,
because the initial version was accepting invalid ISBN which would now be
rejected, thus some store data may be considered corrupted in a database.
On the other hand, probably few people would use a legacy ISBN type and
put ISBN13 number in them, hopefully? Hmmm...
10-digit ISBNs are indeed legacy, I don't have much sympathy for an
application that still uses the 10-digit isbn datatype in a table. The
cast might well be used for display purposes, though. You might do
something like "SELECT isbn_column as bisbn13, isbn_column::isbn as isbn
FROM ...". So I think your patch is OK, and we should provide an upgrade
script, and put a warning in the release notes that you should not be
using the 10-digit isbn datatype as a column type anymore.
As long as you don't use the isbn datatype in any tables or views, the
upgrade script could just DROP the datatype, and recreate it. It's
unfortunate that it won't work if you've used it in a view, but it seems
acceptable. Could you write an upgrade script like that, please?
In the long term, I think we should deprecate all the legacy "short"
datatypes, isbn, issn, ismn and upc. As a replacement for display
purposes, provide functions to print an ean13 in those legacy forms. I'm
not even sure we need the isbn13, issn13 etc. datatypes, to be honest. A
single ean13 datatype and a function to check if an EAN is an ISBN, ISSN
etc. should be enough. You could then use that in a CHECK constraint or
create a domain if you really need to. Oh, and get rid of the "weak
input mode" while we're at it - that's just horrible.
Perhaps we should add those functions now, so that you can rewrite your
application to use them now, and we could remove the legacy datatypes in
a future release.
- Heikki
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Hello Heikki,
There is no clear upgrade path from 1.0 to 2.0, as the ISBN "in" function
is changed, as well as some casts. I'm not sure of a safe way to upgrade,
because the initial version was accepting invalid ISBN which would now be
rejected, thus some store data may be considered corrupted in a database.
On the other hand, probably few people would use a legacy ISBN type and
put ISBN13 number in them, hopefully? Hmmm...10-digit ISBNs are indeed legacy, I don't have much sympathy for an
application that still uses the 10-digit isbn datatype in a table. The cast
might well be used for display purposes, though. You might do something like
"SELECT isbn_column as bisbn13, isbn_column::isbn as isbn FROM ...". So I
think your patch is OK, and we should provide an upgrade script, and put a
warning in the release notes that you should not be using the 10-digit isbn
datatype as a column type anymore.As long as you don't use the isbn datatype in any tables or views, the
upgrade script could just DROP the datatype, and recreate it. It's
unfortunate that it won't work if you've used it in a view, but it seems
acceptable. Could you write an upgrade script like that, please?
Probably I could. I'll have a look. It has to drop the type *and* some
casts I think. Should it cascade? Probably not, because people will no
like loosing their library data:-)
In the long term, I think we should deprecate all the legacy "short"
datatypes, isbn, issn, ismn and upc. As a replacement for display purposes,
provide functions to print an ean13 in those legacy forms. I'm not even sure
we need the isbn13, issn13 etc. datatypes, to be honest. A single ean13
datatype and a function to check if an EAN is an ISBN, ISSN etc. should be
enough. You could then use that in a CHECK constraint or create a domain if
you really need to. Oh, and get rid of the "weak input mode" while we're at
it - that's just horrible.
Hmmm. I like the principle of using a type to embed the constraint check,
and I*N13 types are fine, only the short types are indeed legacy and have
an issue.
Perhaps we should add those functions now, so that you can rewrite your
application to use them now, and we could remove the legacy datatypes in a
future release.
I do not have an application, I just wrote the patch because I used these
types some time ago. Also, I'm not sure how to deprecate a type.
--
Fabien.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Mon, 27 Jul 2015 11:15:17 +0200 (CEST)
Hello Heikki,
Attached patch makes isn to distinguish when needed between ISBN and
ISBN13. ISBN is the legacy ISBN10 number which is part of ISBN13, but not
all ISBN13 numbers are ISBN.The initial version was confusing both types, leading to accept valid
ISBN13 number as valid ISBN(10) number although they use the 979 prefix,
hence the reported issues.The patch also adds some tests.
There is no clear upgrade path from 1.0 to 2.0, as the ISBN "in" function
is changed, as well as some casts. I'm not sure of a safe way to upgrade,
because the initial version was accepting invalid ISBN which would now be
rejected, thus some store data may be considered corrupted in a database.
On the other hand, probably few people would use a legacy ISBN type and
put ISBN13 number in them, hopefully? Hmmm...10-digit ISBNs are indeed legacy, I don't have much sympathy for an
application that still uses the 10-digit isbn datatype in a table. The cast
might well be used for display purposes, though. You might do something like
"SELECT isbn_column as bisbn13, isbn_column::isbn as isbn FROM ...". So I
think your patch is OK, and we should provide an upgrade script,
After some tests, it seems that an upgrade script cannot be provided,
or I missed something.
Basically version 1.0 was confusing ISBN & ISBN13, and for upgrading the
ISBN type definition must be modified. However, there is no such thing as
ALTER TYPE, and trying DROP TYPE/ALTER TYPE is rejected because the type
belongs to an extension (hey I know that, this is what is being done!).
Probably this is some kind of feature/bug of the extension mecanism, that
I'm not planing to fix for isn...
So the v2 attached provides the two versions (1.0 more or less bug
compatible, 2.0 without the confusion bug), without path for an upgrade
between the two, can't help it. Maybe providing a 1.0 buggy version is a
bad idea, though.
and put a warning in the release notes that you should not be using the
10-digit isbn datatype as a column type anymore.
A warning would be a good think.
--
Fabien.
Attachments:
isn-fix-2.patchtext/x-diff; name=isn-fix-2.patchDownload+3612-19
On Mon, Jul 27, 2015 at 2:15 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
After some tests, it seems that an upgrade script cannot be provided,
or I missed something.Basically version 1.0 was confusing ISBN & ISBN13, and for upgrading the
ISBN type definition must be modified. However, there is no such thing as
ALTER TYPE, and trying DROP TYPE/ALTER TYPE is rejected because the type
belongs to an extension (hey I know that, this is what is being done!).
You can't test extension upgrade scripts by running the component
statements at the command line. You have to make an actual dummy upgrade
script and test it via "alter extension ... update ...". When executed in
that context, they are exempted from the rule you are running into. (Or
maybe there is another way that I don't know of).
But I wonder what will happen when you try to drop a type which is in use?
Cheers,
Jeff
Hello,
After some tests, it seems that an upgrade script cannot be provided,
or I missed something.Basically version 1.0 was confusing ISBN & ISBN13, and for upgrading the
ISBN type definition must be modified. However, there is no such thing as
ALTER TYPE, and trying DROP TYPE/ALTER TYPE is rejected because the type
belongs to an extension (hey I know that, this is what is being done!).You can't test extension upgrade scripts by running the component
statements at the command line. You have to make an actual dummy upgrade
script and test it via "alter extension ... update ...".
Yep... but I actually tried that.
When executed in that context, they are exempted from the rule you are
running into.
With DROP TYPE isbn:
# ALTER EXTENSION isn UPDATE;
ERROR: cannot drop type isbn because other objects depend on it
DETAIL: extension isn depends on type isbn
HINT: Use DROP ... CASCADE to drop the dependent objects too.
With DROP TYPE isbn CASCADE:
# ALTER EXTENSION isn UPDATE;
ERROR: cannot drop extension "isn" because it is being modified
(Or maybe there is another way that I don't know of).
But I wonder what will happen when you try to drop a type which is in use?
Probably it should drop the dependent objects recursively, but as isn
depends on isbn we do not want to drop the extension while upgrading it.
--
Fabien.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Fabien COELHO wrote:
After some tests, it seems that an upgrade script cannot be provided,
or I missed something.Basically version 1.0 was confusing ISBN & ISBN13, and for upgrading the
ISBN type definition must be modified. However, there is no such thing as
ALTER TYPE, and trying DROP TYPE/ALTER TYPE is rejected because the type
belongs to an extension (hey I know that, this is what is being done!).
If I understand this correctly, you need to ALTER EXTENSION .. DROP
(which removes the type from the extension), then drop the type, create
it fresh, then ALTER EXTENSION ADD to put it back.
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Basically version 1.0 was confusing ISBN & ISBN13, and for upgrading the
ISBN type definition must be modified. However, there is no such thing as
ALTER TYPE, and trying DROP TYPE/ALTER TYPE is rejected because the type
belongs to an extension (hey I know that, this is what is being done!).If I understand this correctly, you need to ALTER EXTENSION .. DROP
(which removes the type from the extension), then drop the type, create
it fresh, then ALTER EXTENSION ADD to put it back.
Ok, so the upgrade script must do that manually, that is what I was
missing. I'll try that. Thanks.
--
Fabien.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
If I understand this correctly, you need to ALTER EXTENSION .. DROP
(which removes the type from the extension), then drop the type, create
it fresh, then ALTER EXTENSION ADD to put it back.
After some testing with that, I have a conundrum: I would like the update
to *fail* if the user actually uses the isbn type so as not to harm data.
However, when the script is at:
DROP TYPE isbn;
It fails, because the isbn_in & out functions depends on it.
If you try:
DROP FUNCTION isbn_in(cstring);
It fails, because the isbn type depends on it. Sure.
The usual exit strategy is:
DROP TYPE isbn CASCADE;
Which works fine... but as a side effect silently remove ISBN columns that
may exist in user table. I do not think it is a good idea for an extension
update script to do that to a librarian:-)
Bar any idea around this issue, I think that an extension upgrade script
is really unadvisable, and the user should do a dump data, drop extension,
create extension, restore data.
--
Fabien.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Fabien COELHO <coelho@cri.ensmp.fr> writes:
The usual exit strategy is:
DROP TYPE isbn CASCADE;
Which works fine... but as a side effect silently remove ISBN columns that
may exist in user table. I do not think it is a good idea for an extension
update script to do that to a librarian:-)
Certainly.
Bar any idea around this issue, I think that an extension upgrade script
is really unadvisable, and the user should do a dump data, drop extension,
create extension, restore data.
AFAICS, that isn't exactly an improvement. You're telling the user
"if you have an ISBN column, you're screwed, and we are going to make it
as painful as it can possibly be to get out of that situation".
I think we'd be better off trying to migrate to a situation where these
type names all still exist but they all act like ISBN13. I'm not sure
what the stages on that journey are.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs