Defining character sets for individual fields

Started by Ram Ravichandran, almost 18 years ago · 3 messages · general
#1 Ram Ravichandran
ramkaka@gmail.com

Hi,

By default, my PostgreSQL server is set to use the UTF8 character set. I was
wondering if there is any way to make sure that certain fields, such as URLs,
only make use of ASCII. My main aim is to save space by using only 1 byte per
character for URLs (some of the URLs are over 200 characters long). Is this
possible? Or are all characters eventually converted to UTF8 during storage?

Thanks,

Ram

#2 Steve Atkins
steve@blighty.com
In reply to: Ram Ravichandran (#1)
Re: Defining character sets for individual fields

On May 31, 2008, at 6:22 PM, Ram Ravichandran wrote:

> Hi,
>
> By default, my PostgreSQL server is set to use the UTF8 character set. I
> was wondering if there is any way to make sure that certain fields,
> such as URLs, only make use of ASCII. My main aim is to save space
> by using only 1 byte per character for URLs (some of the URLs are
> over 200 characters long). Is this possible? Or are all characters
> eventually converted to UTF8 during storage?

An ASCII string and its UTF-8 representation take exactly the same
number of bytes, so if the space used is your concern, it's not an
issue.
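
A quick way to check this outside the database is the minimal Python sketch below. The sample URL is made up for illustration; the point is only that UTF-8 encodes every ASCII character as a single byte, so the two encodings should come out byte-for-byte identical.

```python
# Hypothetical sample URL containing ASCII characters only.
url = "http://www.example.com/some/long/path?query=value"

ascii_bytes = url.encode("ascii")
utf8_bytes = url.encode("utf-8")

# One byte per character in both encodings, and the bytes themselves match.
print(len(url), len(ascii_bytes), len(utf8_bytes))
print(ascii_bytes == utf8_bytes)
```

Inside PostgreSQL, comparing octet_length(col) with char_length(col) on the URL column should show the same thing for ASCII-only values.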

Cheers,
Steve

#3 Tino Wildenhain
tino@wildenhain.de
In reply to: Steve Atkins (#2)
Re: Defining character sets for individual fields

Hi,

Steve Atkins wrote:

> On May 31, 2008, at 6:22 PM, Ram Ravichandran wrote:
>
>> Hi,
>>
>> By default, my PostgreSQL server is set to use the UTF8 character set. I
>> was wondering if there is any way to make sure that certain fields,
>> such as URLs, only make use of ASCII. My main aim is to save space by
>> using only 1 byte per character for URLs (some of the URLs are over 200
>> characters long). Is this possible? Or are all characters eventually
>> converted to UTF8 during storage?
>
> An ASCII string and its UTF-8 representation take exactly the same
> number of bytes, so if the space used is your concern, it's not an issue.

Even more, if you convert URLs from their URL-encoded form to clear text,
you can quickly leave the ASCII character range (think punycode for the
FQDN, think UTF-8 for the path).
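
To illustrate, here is a small Python sketch using made-up examples (the host name and path are not from this thread). It decodes a punycode host name and a percent-encoded path, both pure ASCII in their encoded form, and checks whether the clear-text results are still ASCII.

```python
from urllib.parse import unquote

# Hypothetical examples, chosen only for illustration.
encoded_host = "xn--bcher-kva.example"   # punycode (IDNA) form, pure ASCII
encoded_path = "/caf%C3%A9"              # percent-encoded UTF-8, pure ASCII

# Decode to clear text using Python's built-in "idna" codec and unquote().
decoded_host = encoded_host.encode("ascii").decode("idna")
decoded_path = unquote(encoded_path)

print(decoded_host, decoded_path)
# Both encoded forms are ASCII; the decoded forms are not.
print(encoded_host.isascii(), encoded_path.isascii())
print(decoded_host.isascii(), decoded_path.isascii())
```

So a column restricted to ASCII would only work as long as the URLs are kept in their encoded form.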

Cheers
Tino