selecting for type cast failures

Started by Natalie Wenzabout 13 years ago5 messagesgeneral
Jump to latest
#1Natalie Wenz
nataliewenz@ebureau.com

Hi!

I am working on updating some of our tables to use appropriate native data types; they were all defined as text when they were created years ago.

What I am running into, though, is there are some records that have bad data in them, where they can't be successfully converted to int, or float, or boolean, for example.

Is there a straightforward way to identify offending records?

I've been able to identify some with things like "...not similar to '(0|1)'..." for the boolean fields, and "...not similar to '[0-9]{1,}'..." for int.
Are regular expressions the best approach here or is there a better way?

Thoughts?

I've poked around on the internet and have found some people suggesting user-defined functions. I'd prefer to just use a query, since it's a one-time clean-up.

(I'm using postgres 9.2)

Thanks!
Natalie

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Guy Rouillier
guy.rouillier@gmail.com
In reply to: Natalie Wenz (#1)
Re: selecting for type cast failures

On 3/7/2013 8:08 PM, Natalie Wenz wrote:

I am working on updating some of our tables to use appropriate native
data types; they were all defined as text when they were created
years ago.

What I am running into, though, is there are some records that have
bad data in them, where they can't be successfully converted to int,
or float, or boolean, for example.

Is there a straightforward way to identify offending records?

I've been able to identify some with things like "...not similar to
'(0|1)'..." for the boolean fields, and "...not similar to
'[0-9]{1,}'..." for int. Are regular expressions the best approach
here or is there a better way?

I did some quick searching also, looks like regular expressions are your
way to go. Here is one for isInteger, for example:

varchar ~ '^[0-9]+$'

--
Guy Rouillier

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Natalie Wenz (#1)
Re: selecting for type cast failures

On 03/07/2013 05:08 PM, Natalie Wenz wrote:

Hi!

I am working on updating some of our tables to use appropriate native data types; they were all defined as text when they were created years ago.

What I am running into, though, is there are some records that have bad data in them, where they can't be successfully converted to int, or float, or boolean, for example.

Is there a straightforward way to identify offending records?

I've been able to identify some with things like "...not similar to '(0|1)'..." for the boolean fields, and "...not similar to '[0-9]{1,}'..." for int.
Are regular expressions the best approach here or is there a better way?

Thoughts?

My opinion, it would take more time to concoct regexes that cover all
the corner cases than to write a script that walks the through the data
, finds the problem data and flags them.

I've poked around on the internet and have found some people suggesting user-defined functions. I'd prefer to just use a query, since it's a one-time clean-up.

Again, most 'one time' things I have done turned out not to be:)

(I'm using postgres 9.2)

Thanks!
Natalie

--
Adrian Klaver
adrian.klaver@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Adrian Klaver (#3)
Re: selecting for type cast failures

Adrian Klaver-3 wrote

My opinion, it would take more time to concoct regexes that cover all
the corner cases than to write a script that walks the through the data
, finds the problem data and flags them.

ISTM that using regular expressions is necessary regardless of whether you
put them into a function/script or otherwise use them interactively via
queries...

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/selecting-for-type-cast-failures-tp5747875p5747890.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: David G. Johnston (#4)
Re: selecting for type cast failures

On 03/07/2013 08:56 PM, David Johnston wrote:

Adrian Klaver-3 wrote

My opinion, it would take more time to concoct regexes that cover all
the corner cases than to write a script that walks the through the data
, finds the problem data and flags them.

ISTM that using regular expressions is necessary regardless of whether you
put them into a function/script or otherwise use them interactively via
queries...

Not necessarily. I have done this sort of thing in Python by 'pre'
casting, using Python casting to weed out the problem children.

David J.

--
Adrian Klaver
adrian.klaver@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general