Physical column size
Hi,
I've created a table like this :
CREATE TABLE tmp_A (
c "char",
i int4
);
And another one
CREATE TABLE tmp_B (
i int4,
ii int4
);
I then inerted a bit more than 19 million rows in each table (exactly the
same number of rows in each).
The end result is that the physical size on disk used by table tmp_A is
exactly the same as table tmp_B (as revealed by the pg_relation_size
function) ! Given that a "char" field is supposed to be 1 byte in size and a
int4 4 bytes, shouldn't the tmp_A use a smaller disk space ? Or is it that
any value, whatever the type, requires at least 4 bytes to be stored ?
Thanks,
Paul
Am Donnerstag, 26. Januar 2006 11:06 schrieb Paul Mackay:
Hi,
I've created a table like this :
CREATE TABLE tmp_A (
c "char",
i int4
);And another one
CREATE TABLE tmp_B (
i int4,
ii int4
);I then inerted a bit more than 19 million rows in each table (exactly the
same number of rows in each).The end result is that the physical size on disk used by table tmp_A is
exactly the same as table tmp_B (as revealed by the pg_relation_size
function) ! Given that a "char" field is supposed to be 1 byte in size and
a int4 4 bytes, shouldn't the tmp_A use a smaller disk space ? Or is it
that any value, whatever the type, requires at least 4 bytes to be stored ?
I think this is caused by alignment.
Hey guys, how u been. This is quite a newbie question, but I need to ask it. I'm trying to wrap my mind around the syntax of join and why and when to use it. I understand the concept of making a query go faster by creating indexes, but it seems that when I want data from multiple tables that link together the query goes slow. The slow is typically due to expensive nested loops. The reason is, all my brain understands is:
select
tablea.data
tableb.data
tablec.data
from
tablea
tableb
tablec
where
tablea.pri_key = tableb.foreign_key AND
tableb.pri_key = tablec.foreign_key AND...
From what I read, it seems you can use inner/outer right/left join on (bla) but when I see syntax examples I see that sometimes tables are omitted from the 'from' section of the query and other times, no. Sometimes I see that the join commands are nested and others, no and sometimes I see joins syntax that only applies to one table. From what I understand join can be used to tell the database the fast way to murge table data together to get results by specifiying the table that has the primary keys and the table that has the foreign keys.
I've read all through the postgres docs on this command and I'm still left lost. Can someone please explain to me in simple language how to use these commands or provide me with a link. I need it to live right now. Thanx.
First, this isn't really the right place to ask -- this forum is about performance, not SQL syntax.
Second, this isn't a question anyone can answer in a reasonable length of time. What you're asking for usually is taught in a class on relational database theory, which is typically a semester or two in college.
If you really need a crash course, dig around on the web for terms like "SQL Tutorial".
Good luck,
Craig
J@Planeti.Biz wrote:
Show quoted text
Hey guys, how u been. This is quite a newbie question, but I need to ask
it. I'm trying to wrap my mind around the syntax of join and why and
when to use it. I understand the concept of making a query go faster by
creating indexes, but it seems that when I want data from multiple
tables that link together the query goes slow. The slow is typically due
to expensive nested loops. The reason is, all my brain understands is:select
tablea.data
tableb.data
tablec.data
from
tablea
tableb
tablec
where
tablea.pri_key = tableb.foreign_key AND
tableb.pri_key = tablec.foreign_key AND...From what I read, it seems you can use inner/outer right/left join on
(bla) but when I see syntax examples I see that sometimes tables are
omitted from the 'from' section of the query and other times, no.
Sometimes I see that the join commands are nested and others, no and
sometimes I see joins syntax that only applies to one table. From what I
understand join can be used to tell the database the fast way to murge
table data together to get results by specifiying the table that has the
primary keys and the table that has the foreign keys.I've read all through the postgres docs on this command and I'm still
left lost. Can someone please explain to me in simple language how to
use these commands or provide me with a link. I need it to live right
now. Thanx.
If I want my database to go faster, due to X then I would think that the
issue is about performance. I wasn't aware of a paticular constraint on X.
I have more that a rudementary understanding of what's going on here, I was
just hoping that someone could shed some light on the basic principal of
this JOIN command and its syntax. Most people I ask, don't give me straight
answers and what I have already read on the web is not very helpful thus
far.
----- Original Message -----
From: "Craig A. James" <cjames@modgraph-usa.com>
To: <pgsql-performance@postgresql.org>
Sent: Thursday, January 26, 2006 11:12 AM
Subject: Re: [PERFORM] Query optimization with X Y JOIN
Show quoted text
First, this isn't really the right place to ask -- this forum is about
performance, not SQL syntax.Second, this isn't a question anyone can answer in a reasonable length of
time. What you're asking for usually is taught in a class on relational
database theory, which is typically a semester or two in college.If you really need a crash course, dig around on the web for terms like
"SQL Tutorial".Good luck,
CraigJ@Planeti.Biz wrote:
Hey guys, how u been. This is quite a newbie question, but I need to ask
it. I'm trying to wrap my mind around the syntax of join and why and when
to use it. I understand the concept of making a query go faster by
creating indexes, but it seems that when I want data from multiple tables
that link together the query goes slow. The slow is typically due to
expensive nested loops. The reason is, all my brain understands is:
select
tablea.data
tableb.data
tablec.data
from
tablea
tableb
tablec
where
tablea.pri_key = tableb.foreign_key AND
tableb.pri_key = tablec.foreign_key AND...
From what I read, it seems you can use inner/outer right/left join on
(bla) but when I see syntax examples I see that sometimes tables are
omitted from the 'from' section of the query and other times, no.
Sometimes I see that the join commands are nested and others, no and
sometimes I see joins syntax that only applies to one table. From what I
understand join can be used to tell the database the fast way to murge
table data together to get results by specifiying the table that has the
primary keys and the table that has the foreign keys.
I've read all through the postgres docs on this command and I'm still
left lost. Can someone please explain to me in simple language how to use
these commands or provide me with a link. I need it to live right now.
Thanx.---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
On 1/26/06, J@planeti.biz <J@planeti.biz> wrote:
If I want my database to go faster, due to X then I would think that the
issue is about performance. I wasn't aware of a paticular constraint on X.I have more that a rudementary understanding of what's going on here, I was
just hoping that someone could shed some light on the basic principal of
this JOIN command and its syntax. Most people I ask, don't give me straight
answers and what I have already read on the web is not very helpful thus
far.
http://www.postgresql.org/docs/current/static/sql-select.html
--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)
J@Planeti.Biz wrote:
If I want my database to go faster, due to X then I would think that
the issue is about performance. I wasn't aware of a paticular
constraint on X.I have more that a rudementary understanding of what's going on here,
I was just hoping that someone could shed some light on the basic
principal of this JOIN command and its syntax. Most people I ask,
don't give me straight answers and what I have already read on the web
is not very helpful thus far.
What you are looking for is here:
It is an excellent website that discusses in depth but at a tutorial
style level how and what SQL is and how to use it. Including JOINS.
FYI, a JOIN is basically a FROM with an integrated WHERE clause. That is
a very simplified description and isn't 100% accurate
but it is close. I strongly suggest the website I mentioned above as it
will resolve your question.
Joshua D. Drake
--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/
J@Planeti.Biz wrote:
If I want my database to go faster, due to X then I would think that the
issue is about performance. I wasn't aware of a paticular constraint on X.
You haven't asked a performance question yet though.
I have more that a rudementary understanding of what's going on here, I
was just hoping that someone could shed some light on the basic
principal of this JOIN command and its syntax. Most people I ask, don't
give me straight answers and what I have already read on the web is not
very helpful thus far.
OK - firstly it's not a JOIN command. It's a SELECT query that happens
to join (in your example) three tables together. The syntax is specified
in the SQL reference section of the manuals, and I don't think it's
different from the standard SQL spec here.
A query that joins two or more tables (be they real base-tables, views
or sub-query result-sets) produces the product of both. Normally you
don't want this so you apply constraints to that join (table_a.col1 =
table_b.col2).
In some cases you want all the rows from one side of a join, whether or
not you get a match on the other side of the join. This is called an
outer join and results in NULLs for all the columns on the "outside" of
the join. A left-join returns all rows from the table on the left of the
join, a right-join from the table on the right of it.
When planning a join, the planner will try to estimate how many matches
it will see on each side, taking into account any extra constraints (you
might want only some of the rows in table_a anyway). It then decides
whether to use any indexes on the relevant column(s).
Now, if you think the planner is making a mistake we'll need to see the
output of EXPLAIN ANALYSE for the query and will want to know that
you've vacuumed and analysed the tables in question.
Does that help at all?
--
Richard Huxton
Archonet Ltd
Yes, that helps a great deal. Thank you so much.
----- Original Message -----
From: "Richard Huxton" <dev@archonet.com>
To: <J@planeti.biz>
Cc: <pgsql-performance@postgresql.org>
Sent: Thursday, January 26, 2006 11:47 AM
Subject: Re: [PERFORM] Query optimization with X Y JOIN
Show quoted text
J@Planeti.Biz wrote:
If I want my database to go faster, due to X then I would think that the
issue is about performance. I wasn't aware of a paticular constraint on
X.You haven't asked a performance question yet though.
I have more that a rudementary understanding of what's going on here, I
was just hoping that someone could shed some light on the basic principal
of this JOIN command and its syntax. Most people I ask, don't give me
straight answers and what I have already read on the web is not very
helpful thus far.OK - firstly it's not a JOIN command. It's a SELECT query that happens to
join (in your example) three tables together. The syntax is specified in
the SQL reference section of the manuals, and I don't think it's different
from the standard SQL spec here.A query that joins two or more tables (be they real base-tables, views or
sub-query result-sets) produces the product of both. Normally you don't
want this so you apply constraints to that join (table_a.col1 =
table_b.col2).In some cases you want all the rows from one side of a join, whether or
not you get a match on the other side of the join. This is called an outer
join and results in NULLs for all the columns on the "outside" of the
join. A left-join returns all rows from the table on the left of the join,
a right-join from the table on the right of it.When planning a join, the planner will try to estimate how many matches it
will see on each side, taking into account any extra constraints (you
might want only some of the rows in table_a anyway). It then decides
whether to use any indexes on the relevant column(s).Now, if you think the planner is making a mistake we'll need to see the
output of EXPLAIN ANALYSE for the query and will want to know that you've
vacuumed and analysed the tables in question.Does that help at all?
--
Richard Huxton
Archonet Ltd---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Hi,
I've created a table like this :
CREATE TABLE tmp_A (
c "char",
i int4
);
And another one
CREATE TABLE tmp_B (
i int4,
ii int4
);
I then inserted a bit more than 19 million rows in each table (exactly the
same number of rows in each).
The end result is that the physical size on disk used by table tmp_A is
exactly the same as table tmp_B (as revealed by the pg_relation_size
function) ! Given that a "char" field is supposed to be 1 byte in size and a
int4 4 bytes, shouldn't the tmp_A use a smaller disk space ? Or is it that
any value, whatever the type, requires at least 4 bytes to be stored ?
Thanks,
Paul
Am Freitag, 3. M�rz 2006 11:03 schrieb Paul Mackay:
I've created a table like this :
CREATE TABLE tmp_A (
c "char",
i int4
);And another one
CREATE TABLE tmp_B (
i int4,
ii int4
);
The end result is that the physical size on disk used by table tmp_A is
exactly the same as table tmp_B (as revealed by the pg_relation_size
function) !
An int4 field is required to be aligned at a 4-byte boundary internally, so
there are 3 bytes wasted between tmp_A.c and tmp_A.i. If you switch the
order of the fields you should see space savings. (Note, however, that the
per-row overhead is about 32 bytes, so you'll probably only save about 10%
overall, rather than the 37.5% that one might expect.)
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Fri, Mar 03, 2006 at 11:03:24AM +0100, Paul Mackay wrote:
The end result is that the physical size on disk used by table tmp_A is
exactly the same as table tmp_B (as revealed by the pg_relation_size
function) ! Given that a "char" field is supposed to be 1 byte in size and a
int4 4 bytes, shouldn't the tmp_A use a smaller disk space ? Or is it that
any value, whatever the type, requires at least 4 bytes to be stored ?
Alignment. An int4 value must start on a multiple of 4 offset, so you
get three bytes of padding. If you put the int4, then the char it
should work better. Although whole rows have alignment requirements
too...
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
On f�s, 2006-03-03 at 11:03 +0100, Paul Mackay wrote:
Hi,
I've created a table like this :
CREATE TABLE tmp_A (
c "char",
i int4
);And another one
CREATE TABLE tmp_B (
i int4,
ii int4
);I then inserted a bit more than 19 million rows in each table (exactly
the same number of rows in each).The end result is that the physical size on disk used by table tmp_A
is exactly the same as table tmp_B (as revealed by the
pg_relation_size function) ! Given that a "char" field is supposed to
be 1 byte in size and a int4 4 bytes, shouldn't the tmp_A use a
smaller disk space ? Or is it that any value, whatever the type,
requires at least 4 bytes to be stored ?
the int4 needs to be aligned at 4 bytes boundaries,
making wasted space after the char.
this would probably be the same size:
CREATE TABLE tmp_C (
c "char",
cc "char",
i int4
);
and this would be smaller:
CREATE TABLE tmp_D (
c "char",
cc "char",
ccc "char",
);
P.S.: I did not actually check to
see if the "char" type needs to be aligned,
by I assumed not.
Peter Eisentraut <peter_e@gmx.net> writes:
An int4 field is required to be aligned at a 4-byte boundary internally, so
there are 3 bytes wasted between tmp_A.c and tmp_A.i. If you switch the
order of the fields you should see space savings.
Probably not, because the row-as-a-whole has alignment requirements too.
In this example you'll just move the pad bytes from one place to
another.
regards, tom lane