quick question abt pg_dump and restore
Hi,
When restoring the pg_dumped data thro psql does the rows of the table are
restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and restore
it to another database, will it have the rows in the same order
r1,r2,r3,r4,r5? Does this apply to big tables also?
Thanks
josh
On 09/01/2008 14:02, Josh Harrison wrote:
When restoring the pg_dumped data thro psql does the rows of the table
are restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
restore it to another database, will it have the rows in the same order
r1,r2,r3,r4,r5? Does this apply to big tables also?
If you use the text dump format, you can see the order in which the rows
are restored.
Ray.
---------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
---------------------------------------------------------------
am Wed, dem 09.01.2008, um 9:02:23 -0500 mailte Josh Harrison folgendes:
Hi,
When restoring the pg_dumped data thro psql does the rows of the table are
restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and restore it
to another database, will it have the rows in the same order r1,r2,r3,r4,r5?
No. If you need an order than you need an ORDER BY in the
SELECT-Statement.
Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
am Wed, dem 09.01.2008, um 14:07:13 +0000 mailte Raymond O'Donnell folgendes:
On 09/01/2008 14:02, Josh Harrison wrote:
When restoring the pg_dumped data thro psql does the rows of the table
are restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
restore it to another database, will it have the rows in the same order
r1,r2,r3,r4,r5? Does this apply to big tables also?If you use the text dump format, you can see the order in which the rows
are restored.
Right, but within the table the rows hasn't an order. You can see this
when you select the ctid-column:
test=# create table order_test (i int);
CREATE TABLE
test=*# insert into order_test values (1);
INSERT 0 1
test=*# insert into order_test values (2);
INSERT 0 1
test=*# insert into order_test values (3);
INSERT 0 1
test=*# select ctid, i from order_test ;
ctid | i
-------+---
(0,1) | 1
(0,2) | 2
(0,3) | 3
(3 rows)
test=*# update order_test set i=20 where i=2;
UPDATE 1
test=*# update order_test set i=2 where i=20;
UPDATE 1
test=*# select ctid, i from order_test ;
ctid | i
-------+---
(0,1) | 1
(0,3) | 3
(0,5) | 2
(3 rows)
Now a pg_dump:
ALTER TABLE public.order_test OWNER TO webmaster;
--
-- Data for Name: order_test; Type: TABLE DATA; Schema: public; Owner:
webmaster
--
COPY order_test (i) FROM stdin;
1
3
2
\.
Now the question: what is the correct order?
Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
On Jan 9, 2008 9:12 AM, A. Kretschmer <andreas.kretschmer@schollglas.com>
wrote:
am Wed, dem 09.01.2008, um 9:02:23 -0500 mailte Josh Harrison folgendes:
Hi,
When restoring the pg_dumped data thro psql does the rows of the tableare
restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump andrestore it
to another database, will it have the rows in the same order
r1,r2,r3,r4,r5?
No. If you need an order than you need an ORDER BY in the
SELECT-Statement.
Fine. I can use order by when I want to order it in terms of some columns.
But What if I want to maintain the same order as in the database1? ie., I
want my rows of TableABC in Database2 to be the same order as the rows in
TableABC in Database 1 ???
Thanks
josh
Josh Harrison escribi�:
Fine. I can use order by when I want to order it in terms of some columns.
But What if I want to maintain the same order as in the database1? ie., I
want my rows of TableABC in Database2 to be the same order as the rows in
TableABC in Database 1 ???
You can't.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Jan 9, 2008 9:35 AM, A. Kretschmer <andreas.kretschmer@schollglas.com>
wrote:
am Wed, dem 09.01.2008, um 14:07:13 +0000 mailte Raymond O'Donnell
folgendes:On 09/01/2008 14:02, Josh Harrison wrote:
When restoring the pg_dumped data thro psql does the rows of the table
are restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
restore it to another database, will it have the rows in the same orderr1,r2,r3,r4,r5? Does this apply to big tables also?
If you use the text dump format, you can see the order in which the rows
are restored.Right, but within the table the rows hasn't an order. You can see this
when you select the ctid-column:test=# create table order_test (i int);
CREATE TABLE
test=*# insert into order_test values (1);
INSERT 0 1
test=*# insert into order_test values (2);
INSERT 0 1
test=*# insert into order_test values (3);
INSERT 0 1
test=*# select ctid, i from order_test ;
ctid | i
-------+---
(0,1) | 1
(0,2) | 2
(0,3) | 3
(3 rows)test=*# update order_test set i=20 where i=2;
UPDATE 1
test=*# update order_test set i=2 where i=20;
UPDATE 1
test=*# select ctid, i from order_test ;
ctid | i
-------+---
(0,1) | 1
(0,3) | 3
(0,5) | 2
(3 rows)Now a pg_dump:
ALTER TABLE public.order_test OWNER TO webmaster;
--
-- Data for Name: order_test; Type: TABLE DATA; Schema: public; Owner:
webmaster
--COPY order_test (i) FROM stdin;
1
3
2
\.Now the question: what is the correct order?
All my requirement is that the dumped table in database2 should be in the
same order as the original table(at the time of dump) in database1 .
Thanks
josh
am Wed, dem 09.01.2008, um 9:45:11 -0500 mailte Josh Harrison folgendes:
What if I want to maintain the same order as in the database1? ie., I want my
rows of TableABC in Database2 to be the same order as the rows in TableABC in
Database 1 ???
For what reason?
Again: there is no order within the database.
Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
On Jan 9, 2008 9:59 AM, A. Kretschmer <andreas.kretschmer@schollglas.com>
wrote:
am Wed, dem 09.01.2008, um 9:45:11 -0500 mailte Josh Harrison folgendes:
What if I want to maintain the same order as in the database1? ie., I
want my
rows of TableABC in Database2 to be the same order as the rows in
TableABC in
Database 1 ???
For what reason?
Again: there is no order within the database.
Thanks...
Another quick question...When you issue a query like this
select * from dummy limit 10
What 10 rows are fetched? like first 10 or last 10 or the first 10 from
first block or.... ?
And this query always returns the same 10 rows (when there are no
updates/deletes)
josh
Josh Harrison escribi�:
Another quick question...When you issue a query like this
select * from dummy limit 10
What 10 rows are fetched? like first 10 or last 10 or the first 10 from
first block or.... ?
Any 10. (First 10 in the physical table _if_ a seqscan is used).
And this query always returns the same 10 rows (when there are no
updates/deletes)
Yes, assuming there is no VACUUM or CLUSTER either. And no inserts
either if there is free space on the first pages.
You really shouldn't trust it.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
Josh Harrison escribi�:
Fine. I can use order by when I want to order it in terms of some columns.
But What if I want to maintain the same order as in the database1? ie., I
want my rows of TableABC in Database2 to be the same order as the rows in
TableABC in Database 1 ???
You can't.
According to the SQL standard, a table is an *unordered* collection of
rows, and the results of any query are produced in an unspecified order
(unless you use ORDER BY). The ambiguity about row ordering is
intentional and is exploited by most DBMSes including Postgres to
improve implementation efficiency. If you assume there is such a thing
as a specific ordering within a table, you'll live to regret it
eventually.
regards, tom lane
On Jan 9, 2008 10:27 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Josh Harrison escribió:
Another quick question...When you issue a query like this
select * from dummy limit 10
What 10 rows are fetched? like first 10 or last 10 or the first 10from
first block or.... ?
Any 10. (First 10 in the physical table _if_ a seqscan is used).
Okay. Here is another scenario where Im confused.
I have a a table with around 30,000,000 recs. This is not a production
system but a test system. So in the test system generally we upload the rows
in some order say rows corresponding to a particular patient or something
like that. But in the production system, it generally doesn't happen like
that. The rows of 1 particular patient can be shuffled anywhere (ie.,
inserted in any order). We r trying to duplicate the same by shuffling te
data in the table so that the rows are not in any order and also not stored
in contiguous blocks
So now I have a table Dummy with 30,000,000 recs and a table Shuffled_Dummy
(Create table Shuffled_Dummy as select * from Dummy order by random() ) with
the same shuffled rows of dummy.
My questions
1. I pg_dumped dummy and Shuffled_dummy (from database1) to another
database(database2)
When I issued the query in both database (database1 and database2)
select * from dummy limit 1000 ( the planner chooses seq scan for this
query)
----- the output results from dummy are different in the 2 databases
But
select * from shuffled_dummy limit 1000 (planner chooses seq scan)
----- the outputs from shuffled_dummy are same from both the database
Why?
2. Also when does the planner switch from choosing index scan to bitmap
index scan? Is it dependent on the number of rows to be retrieved or the
position of the relevant data in the blocks or something else?
Thanks
josh
On Wed, 09 Jan 2008 10:54:21 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Josh Harrison escribió:
Fine. I can use order by when I want to order it in terms of
some columns. But What if I want to maintain the same order as
in the database1? ie., I want my rows of TableABC in Database2
to be the same order as the rows in TableABC in Database 1 ???You can't.
According to the SQL standard, a table is an *unordered* collection
of rows, and the results of any query are produced in an
unspecified order (unless you use ORDER BY). The ambiguity about
row ordering is intentional and is exploited by most DBMSes
including Postgres to improve implementation efficiency. If you
assume there is such a thing as a specific ordering within a table,
you'll live to regret it eventually.
Does it make any sense *knowing* how the implementation works to load
records in a table in a specific order to improve performances?
And yeah I know that once you start deleting/updating row you may
lose the advantage you gained betting on some peculiarity of the
implementation... but in case you're dealing with a mostly static
table?
eg. if I'm importing a table does it make any sense to pre-sort it
before importing it in postgres?
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Wed, Jan 09, 2008 at 05:28:15PM +0100, Ivan Sergio Borgonovo wrote:
Does it make any sense *knowing* how the implementation works to load
records in a table in a specific order to improve performances?
Well, this is more or less what CLUSTER does. There are some cases where
happening to know about the order the table is in will yield happy effects,
yes.
A
On Jan 9, 2008 11:28 AM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Wed, 09 Jan 2008 10:54:21 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:Alvaro Herrera <alvherre@commandprompt.com> writes:
Josh Harrison escribió:
Fine. I can use order by when I want to order it in terms of
some columns. But What if I want to maintain the same order as
in the database1? ie., I want my rows of TableABC in Database2
to be the same order as the rows in TableABC in Database 1 ???You can't.
According to the SQL standard, a table is an *unordered* collection
of rows, and the results of any query are produced in an
unspecified order (unless you use ORDER BY). The ambiguity about
row ordering is intentional and is exploited by most DBMSes
including Postgres to improve implementation efficiency. If you
assume there is such a thing as a specific ordering within a table,
you'll live to regret it eventually.Does it make any sense *knowing* how the implementation works to load
records in a table in a specific order to improve performances?And yeah I know that once you start deleting/updating row you may
lose the advantage you gained betting on some peculiarity of the
implementation... but in case you're dealing with a mostly static
table?eg. if I'm importing a table does it make any sense to pre-sort it
before importing it in postgres?
Okay. Let me explain this again
Lets say you load the data related to a particular person sequentially into
a table in the test database. This results in all of the data for any one
person being located one or a very few sequential data blocks. Testing
access to the person's data then reveals access to be very fast. However,
the data is never loaded in that way in the production database. It is
almost always spread out across many data blocks within the database,
roughly organized by the date-time in which the data arrived.In this case
access to a particular person's data is not as fast as compared to the
previous one where the data are located close to each other.
we have this problem when we compare Oracle's performance with postgres
since Oracle has index-only scan where it can access the data just from the
index when the query involves only indexed columns. But since postgres
currently doesn't implement index-only scan and goes to the heap for
fetching the tuples it becomes very slow when the data are shuffled
Let me know if it makes sense now
Thanks
josh
On Jan 9, 2008 11:39 AM, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
On Wed, Jan 09, 2008 at 05:28:15PM +0100, Ivan Sergio Borgonovo wrote:
Does it make any sense *knowing* how the implementation works to load
records in a table in a specific order to improve performances?Well, this is more or less what CLUSTER does. There are some cases where
happening to know about the order the table is in will yield happy
effects,
yes.
You are right. Sometimes when i cluster the table according to the
frequently accessed indexes then it makes queries pretty fast. But its not a
feasible solution always since some tables have more indexes which are
accessed frequently. So clustering the table according to one index will
yield poor performance to queries involving other indexes. Index-only scan
is a good solution for this I guess for queries involving indexed columns
(like in oracle) !!!
josh
Josh Harrison <joshques@gmail.com> schrieb:
My questions
1. I pg_dumped dummy and Shuffled_dummy (from database1) to another database
(database2)
When I issued the query in both database (database1 and database2)select * from dummy limit 1000 ( the planner chooses seq scan for this query)
select * from shuffled_dummy limit 1000 (planner chooses seq scan)2. Also when does the planner switch from choosing index scan to bitmap index
scan? Is it dependent on the number of rows to be retrieved or the position of
the relevant data in the blocks or something else?
For a select * ... without a WHERE the db can't use an index, this query
forced a seq-scan.
A index is used when:
- a index are created
- a propper WHERE or ORDER BY in the query
- this index is useful
(a index isn't useful, for instance, for a small table or when almost
all rows are in the result set)
A bitmap index scan performed when:
- 2 or more propper indexes available
- see above
Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknow)
Kaufbach, Saxony, Germany, Europe. N 51.05082�, E 13.56889�
On Wed, Jan 09, 2008 at 11:51:16AM -0500, Josh Harrison wrote:
accessed frequently. So clustering the table according to one index will
yield poor performance to queries involving other indexes.
Maybe not poor, but certainly not optimised.
Index-only scan is a good solution for this I guess for queries involving
indexed columns (like in oracle) !!!
I think I don't know what you mean by "index-only scan". Oracle can't be
returning you data just by scanning an index, can it? It needs to get the
tuple if you need it back.
My bet is that you haven't tuned your vacuums correctly, or you aren't doing
ANALYSE often enough on the affected tables, or you need to SET STATISTICS
higher on some of the columns in order to get better estimates (and
therefore better plans).
A
On Jan 9, 2008 11:56 AM, Andreas Kretschmer <akretschmer@spamfence.net>
wrote:
Josh Harrison <joshques@gmail.com> schrieb:
My questions
1. I pg_dumped dummy and Shuffled_dummy (from database1) to anotherdatabase
(database2)
When I issued the query in both database (database1 and database2)select * from dummy limit 1000 ( the planner chooses seq scan for this
query)
select * from shuffled_dummy limit 1000 (planner chooses seq scan)
2. Also when does the planner switch from choosing index scan to bitmap
index
scan? Is it dependent on the number of rows to be retrieved or the
position of
the relevant data in the blocks or something else?
For a select * ... without a WHERE the db can't use an index, this query
forced a seq-scan.A index is used when:
- a index are created
- a propper WHERE or ORDER BY in the query
- this index is useful
(a index isn't useful, for instance, for a small table or when almost
all rows are in the result set)A bitmap index scan performed when:
- 2 or more propper indexes available
- see aboveThanks
In my database, I have a table 'person' containing roughly 30,000,000
records
explain select count(*) from person where person_id > 1147000000
QUERY
PLAN
------------------------------------------------------------------------------------------------------------
Aggregate (cost=307708.20..307708.21 rows=1
width=0)
-> Index Scan using person_pk on person
(cost=0.00..307379.79rows=131364 width=0)
Index Cond: (person_id >
1147000000::numeric)
3 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms]
This returns the result
count
--------
78718
But for this query where the condition is slightly different the query plan
is different. The planner goes for bitmap index
explain select count(*) from person where person_id > 1146000000
QUERY
PLAN
-----------------------------------------------------------------------------------------------
Aggregate (cost=342178.51..342178.52 rows=1
width=0)
-> Bitmap Heap Scan on person (cost=3120.72..341806.71 rows=148721
width=0)
Recheck Cond: (person_id >
1146000000::numeric)
-> Bitmap Index Scan on person_pk (cost=0.00..3083.53 rows=148721
width=0)
Index Cond: (person_id >
1146000000::numeric)
5 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms]
and the result is
count
--------
90625
How does the planner choose the plan?
josh
On Jan 9, 2008 12:11 PM, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
On Wed, Jan 09, 2008 at 11:51:16AM -0500, Josh Harrison wrote:
accessed frequently. So clustering the table according to one index will
yield poor performance to queries involving other indexes.Maybe not poor, but certainly not optimised.
Index-only scan is a good solution for this I guess for queries
involving
indexed columns (like in oracle) !!!
I think I don't know what you mean by "index-only scan". Oracle can't be
returning you data just by scanning an index, can it? It needs to get the
tuple if you need it back.
For example if I have a query like
select column2 from ABC where column1 > 20
and table ABC is indexed on (column1,column2) then Oracle will not goto the
heap to fetch the tuples. It will return them from the index itself since
the column information is available in the index. But postgres always goes
to the heap even if it has all the queried columns in the index.
For queries that involves all the columns(like select * from ABC where
column1>20) ... of course oracle goes to the heap to fetch them
My bet is that you haven't tuned your vacuums correctly, or you aren't
doing
ANALYSE often enough on the affected tables, or you need to SET STATISTICS
higher on some of the columns in order to get better estimates (and
therefore better plans).
I had vacuumed and analysed the tables ...tried increasing the statistics
too. But the performance compared to oracle for these types of queries(that
i had mentioned above) is pretty slow
josh