CHAR vs NVARCHAR vs TEXT performance
Hi Guys,
I wanted to get some thoughts about a type-specific performance problem
we hit
through our application tier.
The full conversation is here:
https://github.com/npgsql/npgsql/issues/2283
Basically, if a table exists with a PK which is CHAR(n) and a query is
sent with
VARCHAR or CHAR then it uses an Index Scan. If the query is sent with
TEXT as the
type then postgresql casts the column to TEXT (rather than the value to
CHAR) and
it does a Seq Scan.
So far this has only showed itself on npgsql (I've been unable to
reproduce on
other drivers), I think it's because npgsql only sends TEXT whereas
other drivers
tend to send VARCHAR (other drivers includes the official JDBC driver).
I guess the root question is: is TEXT supposed to be identical to
VARCHAR in all scenarios?
Thanks,
Rob
Rob <postgresql@mintsoft.net> writes:
Basically, if a table exists with a PK which is CHAR(n) and a query is
sent with VARCHAR or CHAR then it uses an Index Scan. If the query is
sent with TEXT as the type then postgresql casts the column to TEXT
(rather than the value to CHAR) and it does a Seq Scan.
Yeah, this is an artifact of the fact that text is considered a
"preferred type" so it wins out in the parser's choice of which
type to promote to. See
https://www.postgresql.org/docs/current/typeconv-oper.html
I guess the root question is: is TEXT supposed to be identical to
VARCHAR in all scenarios?
It's not for this purpose, because varchar isn't a preferred type.
FWIW, my recommendation for this sort of thing is almost always
to not use CHAR(n). The use-case for that datatype pretty much
disappeared with the last IBM Model 029 card punch.
regards, tom lane
I agree in principle, however in this particular scenario it's not
our schema so we're a little reluctant to migrate the types etc.
We're in a bit of a bad place because the combination of NHibernate
+ npgsql3/4 + this table = seqScans everywhere. Basically when npgsql
changed their default type for strings from VARCHAR to TEXT it caused
this behaviour.
I suppose the follow up question is: should drivers
default to sending types that are preferred by postgres (i.e. TEXT)
rather than compatible types (VARCHAR). If so, is there a reason why
the JDBC driver doesn't send TEXT (possibly a question for the JDBC
guys rather than here)?
Thanks,
Rob
Show quoted text
On 2019-04-30 00:16, Thomas Munro wrote:
On Tue, Apr 30, 2019 at 5:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
FWIW, my recommendation for this sort of thing is almost always
to not use CHAR(n). The use-case for that datatype pretty much
disappeared with the last IBM Model 029 card punch.+1 on the recommendation for PostgreSQL.
I do think it's useful on slightly more recent IBM technology than the
029 though. It's been a few years since I touched it, but DB2 manuals
and experts in this decade recommended fixed size types in some
circumstances, and they might in theory be useful on any
in-place-update system (and maybe us in some future table AM?). For
example, you can completely exclude the possibility of having to spill
to another page when updating (DB2 DBAs measure and complain about
rate of 'overflow' page usage which they consider failure and we
consider SOP), you can avoid wasting space on the length (at the cost
of wasting space on trailing spaces, if the contents vary in length),
you can get O(1) access to fixed sized attributes (perhaps even
updating single attributes). These aren't nothing, and I've seen DB2
DBAs get TPS improvements from that kind of stuff. (From memory this
type of thing was also a reason to think carefully about which tables
should use compression, because the fixed size space guarantees went
out the window.).
Import Notes
Reply to msg id not found: CA+hUKGK3gXyCufbFaULg7fKrxeky5umH3hv1FYP_Ebu2MeBsQ@mail.gmail.com
On Tue, Apr 30, 2019 at 5:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
FWIW, my recommendation for this sort of thing is almost always
to not use CHAR(n). The use-case for that datatype pretty much
disappeared with the last IBM Model 029 card punch....
Perhaps the "tip" on the character datatype page (
https://www.postgresql.org/docs/11/datatype-character.html) should be
updated as the statement "There is no performance difference among these
three types..." could easily lead a reader down the wrong path. The
statement may be true if one assumes the planner is able to make an optimal
choice but clearly there are cases that prevent that. If the situation is
better explained elsewhere in the documentation then just a link to that
explanation may be all that is needed.
Cheers,
Steve