How to remove non-UTF values from a table?
An easy question for some I hope.
I have a DB from 8.2 days that when I now dump and try to take into
the 8.3.7, it gives me errors about utf-8 stuff.
I tried searching this list's archives but could not come up with an answer.
Google returns some sites like these:
http://sniptools.com/databases/finding-non-utf8-values-in-postgresql -
but I'm not clear on how to use them.
Following the SQL on this site I could identify some columns that
contain text like this:
"Évolution générale de la situation démographique"
So my guess is that the non-English characters were originally not
getting written in proper utf-8 variants.
Is there any SQL possibility to find these columns and replace them
with utf-8 equivalents using some postgresql commands? Couldn't find
anything in the "Strings functions" (chapter 9 of manual).
We're on CentOS.
Thanks!
Actually the title of my email should have been "how to **replace**
utf-8 values".
Thanks.
Show quoted text
On Mon, Dec 14, 2009 at 7:03 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
An easy question for some I hope.
I have a DB from 8.2 days that when I now dump and try to take into
the 8.3.7, it gives me errors about utf-8 stuff.I tried searching this list's archives but could not come up with an answer.
Google returns some sites like these:
http://sniptools.com/databases/finding-non-utf8-values-in-postgresql -
but I'm not clear on how to use them.Following the SQL on this site I could identify some columns that
contain text like this:"Évolution générale de la situation démographique"
So my guess is that the non-English characters were originally not
getting written in proper utf-8 variants.Is there any SQL possibility to find these columns and replace them
with utf-8 equivalents using some postgresql commands? Couldn't find
anything in the "Strings functions" (chapter 9 of manual).We're on CentOS.
Thanks!
Actually I just realized that the SQL below will also pick up on these
perfectly valid-looking columns:
Which part of this is non-UTF8? Why is this going into a UTF8 table
with corrupted values? The lc_collate etc and all settings I can
imagine are already utf-8!
Thanks for any pointers.
Show quoted text
On Mon, Dec 14, 2009 at 7:04 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
Actually the title of my email should have been "how to **replace**
utf-8 values".Thanks.
On Mon, Dec 14, 2009 at 7:03 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
An easy question for some I hope.
I have a DB from 8.2 days that when I now dump and try to take into
the 8.3.7, it gives me errors about utf-8 stuff.I tried searching this list's archives but could not come up with an answer.
Google returns some sites like these:
http://sniptools.com/databases/finding-non-utf8-values-in-postgresql -
but I'm not clear on how to use them.Following the SQL on this site I could identify some columns that
contain text like this:"Évolution générale de la situation démographique"
So my guess is that the non-English characters were originally not
getting written in proper utf-8 variants.Is there any SQL possibility to find these columns and replace them
with utf-8 equivalents using some postgresql commands? Couldn't find
anything in the "Strings functions" (chapter 9 of manual).We're on CentOS.
Thanks!
On 14 Dec 2009, at 13:21, Phoenix Kiula wrote:
Actually I just realized that the SQL below will also pick up on these
perfectly valid-looking columns:
I think you accidentally posted your holiday plans?
Which part of this is non-UTF8? Why is this going into a UTF8 table
with corrupted values? The lc_collate etc and all settings I can
imagine are already utf-8!Thanks for any pointers.
Alban Hertroys
--
Screwing up is the best way to attach something to the ceiling.
!DSPAM:737,4b264f49228051716694515!
Phoenix Kiula <phoenix.kiula@gmail.com> writes:
Is there any SQL possibility to find these columns and replace them
with utf-8 equivalents using some postgresql commands? Couldn't find
anything in the "Strings functions" (chapter 9 of manual).
I've bookmarked this for later:
http://archives.postgresql.org/pgsql-general/2009-07/msg00904.php
Regards,
--
dim
Phoenix Kiula wrote:
An easy question for some I hope.
I have a DB from 8.2 days that when I now dump and try to take into
the 8.3.7, it gives me errors about utf-8 stuff.I tried searching this list's archives but could not come up with an answer.
Google returns some sites like these:
http://sniptools.com/databases/finding-non-utf8-values-in-postgresql -
but I'm not clear on how to use them.Following the SQL on this site I could identify some columns that
contain text like this:"�volution g�n�rale de la situation d�mographique"
So my guess is that the non-English characters were originally not
getting written in proper utf-8 variants.Is there any SQL possibility to find these columns and replace them
with utf-8 equivalents using some postgresql commands? Couldn't find
anything in the "Strings functions" (chapter 9 of manual).We're on CentOS.
Thanks!
My recommendation would be to install the iconv utility and run it on a
plain text (pg_dump -Fp) backup as suggested in the google article - and
then reimport the clean UTF-8.
I am surprised that you managed to install the original backup on 8.3
because it seems to be much more strict on encoding - Unless your
database is not in UTF-8?
Howard
www.selestial.com
On Tue, Dec 15, 2009 at 9:26 PM, Howard Cole <howardnews@selestial.com> wrote:
Phoenix Kiula wrote:
An easy question for some I hope.
I have a DB from 8.2 days that when I now dump and try to take into
the 8.3.7, it gives me errors about utf-8 stuff.I tried searching this list's archives but could not come up with an
answer.Google returns some sites like these:
http://sniptools.com/databases/finding-non-utf8-values-in-postgresql -
but I'm not clear on how to use them.Following the SQL on this site I could identify some columns that
contain text like this:"Évolution générale de la situation démographique"
So my guess is that the non-English characters were originally not
getting written in proper utf-8 variants.Is there any SQL possibility to find these columns and replace them
with utf-8 equivalents using some postgresql commands? Couldn't find
anything in the "Strings functions" (chapter 9 of manual).We're on CentOS.
Thanks!
My recommendation would be to install the iconv utility and run it on a
plain text (pg_dump -Fp) backup as suggested in the google article - and
then reimport the clean UTF-8.I am surprised that you managed to install the original backup on 8.3
because it seems to be much more strict on encoding - Unless your database
is not in UTF-8?
Thanks Howard.
I ran the SQL and it finds anything that has non-English characters.
For example:
Part of this URL is actually in Japanese, but when I paste it in this
email it comes up with all these percentage signs. I suppose this is
"url encoded".
Shouldn't this be valid UTF-8? How does PG calculate if something is
not valid UTF-8?
Thanks.
Phoenix Kiula wrote:
On Tue, Dec 15, 2009 at 9:26 PM, Howard Cole <howardnews@selestial.com> wrote:
Phoenix Kiula wrote:
An easy question for some I hope.
I have a DB from 8.2 days that when I now dump and try to take into
the 8.3.7, it gives me errors about utf-8 stuff.I tried searching this list's archives but could not come up with an
answer.Google returns some sites like these:
http://sniptools.com/databases/finding-non-utf8-values-in-postgresql -
but I'm not clear on how to use them.Following the SQL on this site I could identify some columns that
contain text like this:"�volution g�n�rale de la situation d�mographique"
So my guess is that the non-English characters were originally not
getting written in proper utf-8 variants.Is there any SQL possibility to find these columns and replace them
with utf-8 equivalents using some postgresql commands? Couldn't find
anything in the "Strings functions" (chapter 9 of manual).We're on CentOS.
Thanks!
My recommendation would be to install the iconv utility and run it on a
plain text (pg_dump -Fp) backup as suggested in the google article - and
then reimport the clean UTF-8.I am surprised that you managed to install the original backup on 8.3
because it seems to be much more strict on encoding - Unless your database
is not in UTF-8?Thanks Howard.
I ran the SQL and it finds anything that has non-English characters.
For example:Part of this URL is actually in Japanese, but when I paste it in this
email it comes up with all these percentage signs. I suppose this is
"url encoded".Shouldn't this be valid UTF-8? How does PG calculate if something is
not valid UTF-8?Thanks.
I am sure that postgres uses standard test to see if text is valid UTF8,
however I guess you will only get a warning when you try to read or
write the data, and the warning only appears in the log as far as I am
aware. Encoding checking has become more strict with each upgrade to
postgres, this is why you are seeing errors as you import into 8.3 from
an 8.2 backup.
I still maintain that you should check and fix using iconv rather than
SQL. Read your distro notes on how to install iconv (if it isn't already
installed) and run it on your plain text backup. You can then pinpoint
individual changes using diff if you want to find it in your exising 8.2
database. Better still, just fix using Iconv then import the clean data
into 8.3 or 8.4
Howard Cole
www.selestial.com
On Wed, Dec 16, 2009 at 6:05 AM, Howard Cole <howardnews@selestial.com> wrote:
Better still, just fix using Iconv then import the clean data into
8.3 or 8.4
FWIW, If I was going to make a choice between 8.3 and 8.4, I would
choose 8.4. It looks like the future 8.5 release will be able to
preform an in-place upgrade on 8.4. This *should* eliminate to future
need to dump and reload for future upgrades.
--
Regards,
Richard Broersma Jr.
Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug
On 16/12/2009 15:01, Richard Broersma wrote:
It looks like the future 8.5 release will be able to
preform an in-place upgrade on 8.4.
Really? That would be *wonderful*. I know it's impossible to be
definitive, but how likely would you say this is?
Ray.
--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie
"Raymond O'Donnell" <rod@iol.ie> writes:
On 16/12/2009 15:01, Richard Broersma wrote:
It looks like the future 8.5 release will be able to
preform an in-place upgrade on 8.4.
Really? That would be *wonderful*. I know it's impossible to be
definitive, but how likely would you say this is?
It's wishful thinking. We tried to have that for 8.4, and failed: there
are so many constraints on pg_upgrade that it's not a usable mainstream
solution. So far as I've seen there has been no work done that would
make it any more reliable for 8.5.
regards, tom lane
Tom Lane wrote:
"Raymond O'Donnell" <rod@iol.ie> writes:
On 16/12/2009 15:01, Richard Broersma wrote:
It looks like the future 8.5 release will be able to
preform an in-place upgrade on 8.4.Really? That would be *wonderful*. I know it's impossible to be
definitive, but how likely would you say this is?It's wishful thinking. We tried to have that for 8.4, and failed: there
are so many constraints on pg_upgrade that it's not a usable mainstream
solution. So far as I've seen there has been no work done that would
make it any more reliable for 8.5.
I do not share Tom's conclusions.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote:
"Raymond O'Donnell" <rod@iol.ie> writes:
On 16/12/2009 15:01, Richard Broersma wrote:
It looks like the future 8.5 release will be able to
preform an in-place upgrade on 8.4.Really? That would be *wonderful*. I know it's impossible to be
definitive, but how likely would you say this is?It's wishful thinking. We tried to have that for 8.4, and failed: there
are so many constraints on pg_upgrade that it's not a usable mainstream
solution. So far as I've seen there has been no work done that would
make it any more reliable for 8.5.
Some people just enjoy discouraging people.
pg_migrator (not pg_upgrade) has been used by many people to migrate
from 8.3 to 8.4. I just helped someone yesterday with a migration.
pg_migrator threw an error because they had reindexed pg_largeobject,
and pg_migrator was not prepared to handle that. They also had an old
cluster with a higher autovacuum_freeze_max_age than the new cluster,
causing autovacuum freeze to run during the upgrade and remove clog
files. These are unusual cases, so I assume there are many people who
successfully use pg_migrator that I never hear from.
Both of these issues are fixed in pg_migrator 8.4.6, released yesterday.
I also talked to Alvaro about using fixed oids to remove these migration
restrictions in 8.5:
o a user-defined composite data type
o a user-defined array data type
o a user-defined enum data type
So, pg_migrator is a work in progress. It current requires migration
testing and should be used by experienced Postgres users. Someday
pg_migrator will be easier to use and more reliable.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
pg_migrator (not pg_upgrade) has been used by many people to migrate
from 8.3 to 8.4. I just helped someone yesterday with a migration.
pg_migrator threw an error because they had reindexed pg_largeobject,
and pg_migrator was not prepared to handle that. They also had an old
cluster with a higher autovacuum_freeze_max_age than the new cluster,
causing autovacuum freeze to run during the upgrade and remove clog
files. These are unusual cases, so I assume there are many people who
successfully use pg_migrator that I never hear from.Both of these issues are fixed in pg_migrator 8.4.6, released yesterday.
I also talked to Alvaro about using fixed oids to remove these migration
restrictions in 8.5:o a user-defined composite data type
o a user-defined array data type
o a user-defined enum data typeSo, pg_migrator is a work in progress. It current requires migration
testing and should be used by experienced Postgres users. Someday
pg_migrator will be easier to use and more reliable.
Oh, and pg_migrator 8.4.6 works for migrations from 8.4 to 8.5 CVS.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Fri, Dec 18, 2009 at 3:54 AM, Bruce Momjian <bruce@momjian.us> wrote:
Bruce Momjian wrote:
pg_migrator (not pg_upgrade) has been used by many people to migrate
from 8.3 to 8.4. I just helped someone yesterday with a migration.
pg_migrator threw an error because they had reindexed pg_largeobject,
and pg_migrator was not prepared to handle that. They also had an old
cluster with a higher autovacuum_freeze_max_age than the new cluster,
causing autovacuum freeze to run during the upgrade and remove clog
files. These are unusual cases, so I assume there are many people who
successfully use pg_migrator that I never hear from.Both of these issues are fixed in pg_migrator 8.4.6, released yesterday.
I also talked to Alvaro about using fixed oids to remove these migration
restrictions in 8.5:o a user-defined composite data type
o a user-defined array data type
o a user-defined enum data typeSo, pg_migrator is a work in progress. It current requires migration
testing and should be used by experienced Postgres users. Someday
pg_migrator will be easier to use and more reliable.Oh, and pg_migrator 8.4.6 works for migrations from 8.4 to 8.5 CVS.
Is pg_migrator worth trying between 8.2.9 and 8.4.x?
Thanks
Phoenix Kiula wrote:
On Fri, Dec 18, 2009 at 3:54 AM, Bruce Momjian <bruce@momjian.us> wrote:
Bruce Momjian wrote:
pg_migrator (not pg_upgrade) has been used by many people to migrate
from 8.3 to 8.4. ?I just helped someone yesterday with a migration.
pg_migrator threw an error because they had reindexed pg_largeobject,
and pg_migrator was not prepared to handle that. ?They also had an old
cluster with a higher autovacuum_freeze_max_age than the new cluster,
causing autovacuum freeze to run during the upgrade and remove clog
files. ?These are unusual cases, so I assume there are many people who
successfully use pg_migrator that I never hear from.Both of these issues are fixed in pg_migrator 8.4.6, released yesterday.
I also talked to Alvaro about using fixed oids to remove these migration
restrictions in 8.5:? ? ? ? o ?a user-defined composite data type
? ? ? ? o ?a user-defined array data type
? ? ? ? o ?a user-defined enum data typeSo, pg_migrator is a work in progress. ?It current requires migration
testing and should be used by experienced Postgres users. ?Someday
pg_migrator will be easier to use and more reliable.Oh, and pg_migrator 8.4.6 works for migrations from 8.4 to 8.5 CVS.
Is pg_migrator worth trying between 8.2.9 and 8.4.x?
No, it only handles 8.3 and later.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +