quick question abt pg_dump and restore

Started by Josh Harrisonabout 18 years ago22 messagesgeneral
Jump to latest
#1Josh Harrison
joshques@gmail.com

Hi,
When restoring the pg_dumped data thro psql does the rows of the table are
restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and restore
it to another database, will it have the rows in the same order
r1,r2,r3,r4,r5? Does this apply to big tables also?
Thanks
josh

In reply to: Josh Harrison (#1)
Re: quick question abt pg_dump and restore

On 09/01/2008 14:02, Josh Harrison wrote:

When restoring the pg_dumped data thro psql does the rows of the table
are restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
restore it to another database, will it have the rows in the same order
r1,r2,r3,r4,r5? Does this apply to big tables also?

If you use the text dump format, you can see the order in which the rows
are restored.

Ray.

---------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
---------------------------------------------------------------

#3A. Kretschmer
andreas.kretschmer@schollglas.com
In reply to: Josh Harrison (#1)
Re: quick question abt pg_dump and restore

am Wed, dem 09.01.2008, um 9:02:23 -0500 mailte Josh Harrison folgendes:

Hi,
When restoring the pg_dumped data thro psql does the rows of the table are
restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and restore it
to another database, will it have the rows in the same order r1,r2,r3,r4,r5?

No. If you need an order than you need an ORDER BY in the
SELECT-Statement.

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

#4A. Kretschmer
andreas.kretschmer@schollglas.com
In reply to: Raymond O'Donnell (#2)
Re: quick question abt pg_dump and restore

am Wed, dem 09.01.2008, um 14:07:13 +0000 mailte Raymond O'Donnell folgendes:

On 09/01/2008 14:02, Josh Harrison wrote:

When restoring the pg_dumped data thro psql does the rows of the table
are restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
restore it to another database, will it have the rows in the same order
r1,r2,r3,r4,r5? Does this apply to big tables also?

If you use the text dump format, you can see the order in which the rows
are restored.

Right, but within the table the rows hasn't an order. You can see this
when you select the ctid-column:

test=# create table order_test (i int);
CREATE TABLE
test=*# insert into order_test values (1);
INSERT 0 1
test=*# insert into order_test values (2);
INSERT 0 1
test=*# insert into order_test values (3);
INSERT 0 1
test=*# select ctid, i from order_test ;
ctid | i
-------+---
(0,1) | 1
(0,2) | 2
(0,3) | 3
(3 rows)

test=*# update order_test set i=20 where i=2;
UPDATE 1
test=*# update order_test set i=2 where i=20;
UPDATE 1
test=*# select ctid, i from order_test ;
ctid | i
-------+---
(0,1) | 1
(0,3) | 3
(0,5) | 2
(3 rows)

Now a pg_dump:

ALTER TABLE public.order_test OWNER TO webmaster;

--
-- Data for Name: order_test; Type: TABLE DATA; Schema: public; Owner:
webmaster
--

COPY order_test (i) FROM stdin;
1
3
2
\.

Now the question: what is the correct order?

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

#5Josh Harrison
joshques@gmail.com
In reply to: A. Kretschmer (#3)
Re: quick question abt pg_dump and restore

On Jan 9, 2008 9:12 AM, A. Kretschmer <andreas.kretschmer@schollglas.com>
wrote:

am Wed, dem 09.01.2008, um 9:02:23 -0500 mailte Josh Harrison folgendes:

Hi,
When restoring the pg_dumped data thro psql does the rows of the table

are

restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and

restore it

to another database, will it have the rows in the same order

r1,r2,r3,r4,r5?

No. If you need an order than you need an ORDER BY in the
SELECT-Statement.

Fine. I can use order by when I want to order it in terms of some columns.
But What if I want to maintain the same order as in the database1? ie., I
want my rows of TableABC in Database2 to be the same order as the rows in
TableABC in Database 1 ???

Thanks
josh

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Josh Harrison (#5)
Re: quick question abt pg_dump and restore

Josh Harrison escribi�:

Fine. I can use order by when I want to order it in terms of some columns.
But What if I want to maintain the same order as in the database1? ie., I
want my rows of TableABC in Database2 to be the same order as the rows in
TableABC in Database 1 ???

You can't.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7Josh Harrison
joshques@gmail.com
In reply to: A. Kretschmer (#4)
Re: quick question abt pg_dump and restore

On Jan 9, 2008 9:35 AM, A. Kretschmer <andreas.kretschmer@schollglas.com>
wrote:

am Wed, dem 09.01.2008, um 14:07:13 +0000 mailte Raymond O'Donnell
folgendes:

On 09/01/2008 14:02, Josh Harrison wrote:

When restoring the pg_dumped data thro psql does the rows of the table
are restored in the same order? ie for example if
Table A has rows r1,r2,r3,r4,r5 in this order, then if I pg_dump and
restore it to another database, will it have the rows in the same order

r1,r2,r3,r4,r5? Does this apply to big tables also?

If you use the text dump format, you can see the order in which the rows
are restored.

Right, but within the table the rows hasn't an order. You can see this
when you select the ctid-column:

test=# create table order_test (i int);
CREATE TABLE
test=*# insert into order_test values (1);
INSERT 0 1
test=*# insert into order_test values (2);
INSERT 0 1
test=*# insert into order_test values (3);
INSERT 0 1
test=*# select ctid, i from order_test ;
ctid | i
-------+---
(0,1) | 1
(0,2) | 2
(0,3) | 3
(3 rows)

test=*# update order_test set i=20 where i=2;
UPDATE 1
test=*# update order_test set i=2 where i=20;
UPDATE 1
test=*# select ctid, i from order_test ;
ctid | i
-------+---
(0,1) | 1
(0,3) | 3
(0,5) | 2
(3 rows)

Now a pg_dump:

ALTER TABLE public.order_test OWNER TO webmaster;

--
-- Data for Name: order_test; Type: TABLE DATA; Schema: public; Owner:
webmaster
--

COPY order_test (i) FROM stdin;
1
3
2
\.

Now the question: what is the correct order?

All my requirement is that the dumped table in database2 should be in the

same order as the original table(at the time of dump) in database1 .
Thanks
josh

#8A. Kretschmer
andreas.kretschmer@schollglas.com
In reply to: Josh Harrison (#5)
Re: quick question abt pg_dump and restore

am Wed, dem 09.01.2008, um 9:45:11 -0500 mailte Josh Harrison folgendes:

What if I want to maintain the same order as in the database1? ie., I want my
rows of TableABC in Database2 to be the same order as the rows in TableABC in
Database 1 ???

For what reason?

Again: there is no order within the database.

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

#9Josh Harrison
joshques@gmail.com
In reply to: A. Kretschmer (#8)
Re: quick question abt pg_dump and restore

On Jan 9, 2008 9:59 AM, A. Kretschmer <andreas.kretschmer@schollglas.com>
wrote:

am Wed, dem 09.01.2008, um 9:45:11 -0500 mailte Josh Harrison folgendes:

What if I want to maintain the same order as in the database1? ie., I

want my

rows of TableABC in Database2 to be the same order as the rows in

TableABC in

Database 1 ???

For what reason?

Again: there is no order within the database.

Thanks...

Another quick question...When you issue a query like this
select * from dummy limit 10
What 10 rows are fetched? like first 10 or last 10 or the first 10 from
first block or.... ?
And this query always returns the same 10 rows (when there are no
updates/deletes)

josh

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Josh Harrison (#9)
Re: quick question abt pg_dump and restore

Josh Harrison escribi�:

Another quick question...When you issue a query like this
select * from dummy limit 10
What 10 rows are fetched? like first 10 or last 10 or the first 10 from
first block or.... ?

Any 10. (First 10 in the physical table _if_ a seqscan is used).

And this query always returns the same 10 rows (when there are no
updates/deletes)

Yes, assuming there is no VACUUM or CLUSTER either. And no inserts
either if there is free space on the first pages.

You really shouldn't trust it.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: quick question abt pg_dump and restore

Alvaro Herrera <alvherre@commandprompt.com> writes:

Josh Harrison escribi�:

Fine. I can use order by when I want to order it in terms of some columns.
But What if I want to maintain the same order as in the database1? ie., I
want my rows of TableABC in Database2 to be the same order as the rows in
TableABC in Database 1 ???

You can't.

According to the SQL standard, a table is an *unordered* collection of
rows, and the results of any query are produced in an unspecified order
(unless you use ORDER BY). The ambiguity about row ordering is
intentional and is exploited by most DBMSes including Postgres to
improve implementation efficiency. If you assume there is such a thing
as a specific ordering within a table, you'll live to regret it
eventually.

regards, tom lane

#12Josh Harrison
joshques@gmail.com
In reply to: Alvaro Herrera (#10)
Re: quick question abt pg_dump and restore

On Jan 9, 2008 10:27 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:

Josh Harrison escribió:

Another quick question...When you issue a query like this
select * from dummy limit 10
What 10 rows are fetched? like first 10 or last 10 or the first 10

from

first block or.... ?

Any 10. (First 10 in the physical table _if_ a seqscan is used).

Okay. Here is another scenario where Im confused.
I have a a table with around 30,000,000 recs. This is not a production
system but a test system. So in the test system generally we upload the rows
in some order say rows corresponding to a particular patient or something
like that. But in the production system, it generally doesn't happen like
that. The rows of 1 particular patient can be shuffled anywhere (ie.,
inserted in any order). We r trying to duplicate the same by shuffling te
data in the table so that the rows are not in any order and also not stored
in contiguous blocks

So now I have a table Dummy with 30,000,000 recs and a table Shuffled_Dummy
(Create table Shuffled_Dummy as select * from Dummy order by random() ) with
the same shuffled rows of dummy.

My questions
1. I pg_dumped dummy and Shuffled_dummy (from database1) to another
database(database2)
When I issued the query in both database (database1 and database2)

select * from dummy limit 1000 ( the planner chooses seq scan for this
query)
----- the output results from dummy are different in the 2 databases
But
select * from shuffled_dummy limit 1000 (planner chooses seq scan)
----- the outputs from shuffled_dummy are same from both the database

Why?

2. Also when does the planner switch from choosing index scan to bitmap
index scan? Is it dependent on the number of rows to be retrieved or the
position of the relevant data in the blocks or something else?

Thanks
josh

#13Ivan Sergio Borgonovo
mail@webthatworks.it
In reply to: Tom Lane (#11)
Re: quick question abt pg_dump and restore

On Wed, 09 Jan 2008 10:54:21 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Josh Harrison escribió:

Fine. I can use order by when I want to order it in terms of
some columns. But What if I want to maintain the same order as
in the database1? ie., I want my rows of TableABC in Database2
to be the same order as the rows in TableABC in Database 1 ???

You can't.

According to the SQL standard, a table is an *unordered* collection
of rows, and the results of any query are produced in an
unspecified order (unless you use ORDER BY). The ambiguity about
row ordering is intentional and is exploited by most DBMSes
including Postgres to improve implementation efficiency. If you
assume there is such a thing as a specific ordering within a table,
you'll live to regret it eventually.

Does it make any sense *knowing* how the implementation works to load
records in a table in a specific order to improve performances?

And yeah I know that once you start deleting/updating row you may
lose the advantage you gained betting on some peculiarity of the
implementation... but in case you're dealing with a mostly static
table?

eg. if I'm importing a table does it make any sense to pre-sort it
before importing it in postgres?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

#14Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Ivan Sergio Borgonovo (#13)
Re: quick question abt pg_dump and restore

On Wed, Jan 09, 2008 at 05:28:15PM +0100, Ivan Sergio Borgonovo wrote:

Does it make any sense *knowing* how the implementation works to load
records in a table in a specific order to improve performances?

Well, this is more or less what CLUSTER does. There are some cases where
happening to know about the order the table is in will yield happy effects,
yes.

A

#15Josh Harrison
joshques@gmail.com
In reply to: Ivan Sergio Borgonovo (#13)
Re: quick question abt pg_dump and restore

On Jan 9, 2008 11:28 AM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:

On Wed, 09 Jan 2008 10:54:21 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Josh Harrison escribió:

Fine. I can use order by when I want to order it in terms of
some columns. But What if I want to maintain the same order as
in the database1? ie., I want my rows of TableABC in Database2
to be the same order as the rows in TableABC in Database 1 ???

You can't.

According to the SQL standard, a table is an *unordered* collection
of rows, and the results of any query are produced in an
unspecified order (unless you use ORDER BY). The ambiguity about
row ordering is intentional and is exploited by most DBMSes
including Postgres to improve implementation efficiency. If you
assume there is such a thing as a specific ordering within a table,
you'll live to regret it eventually.

Does it make any sense *knowing* how the implementation works to load
records in a table in a specific order to improve performances?

And yeah I know that once you start deleting/updating row you may
lose the advantage you gained betting on some peculiarity of the
implementation... but in case you're dealing with a mostly static
table?

eg. if I'm importing a table does it make any sense to pre-sort it
before importing it in postgres?

Okay. Let me explain this again
Lets say you load the data related to a particular person sequentially into
a table in the test database. This results in all of the data for any one
person being located one or a very few sequential data blocks. Testing
access to the person's data then reveals access to be very fast. However,
the data is never loaded in that way in the production database. It is
almost always spread out across many data blocks within the database,
roughly organized by the date-time in which the data arrived.In this case
access to a particular person's data is not as fast as compared to the
previous one where the data are located close to each other.

we have this problem when we compare Oracle's performance with postgres
since Oracle has index-only scan where it can access the data just from the
index when the query involves only indexed columns. But since postgres
currently doesn't implement index-only scan and goes to the heap for
fetching the tuples it becomes very slow when the data are shuffled

Let me know if it makes sense now
Thanks
josh

#16Josh Harrison
joshques@gmail.com
In reply to: Andrew Sullivan (#14)
Re: quick question abt pg_dump and restore

On Jan 9, 2008 11:39 AM, Andrew Sullivan <ajs@crankycanuck.ca> wrote:

On Wed, Jan 09, 2008 at 05:28:15PM +0100, Ivan Sergio Borgonovo wrote:

Does it make any sense *knowing* how the implementation works to load
records in a table in a specific order to improve performances?

Well, this is more or less what CLUSTER does. There are some cases where
happening to know about the order the table is in will yield happy
effects,
yes.

You are right. Sometimes when i cluster the table according to the
frequently accessed indexes then it makes queries pretty fast. But its not a
feasible solution always since some tables have more indexes which are
accessed frequently. So clustering the table according to one index will
yield poor performance to queries involving other indexes. Index-only scan
is a good solution for this I guess for queries involving indexed columns
(like in oracle) !!!

josh

#17Andreas Kretschmer
akretschmer@spamfence.net
In reply to: Josh Harrison (#12)
Re: quick question abt pg_dump and restore

Josh Harrison <joshques@gmail.com> schrieb:

My questions
1. I pg_dumped dummy and Shuffled_dummy (from database1) to another database
(database2)
When I issued the query in both database (database1 and database2)

select * from dummy limit 1000 ( the planner chooses seq scan for this query)
select * from shuffled_dummy limit 1000 (planner chooses seq scan)

2. Also when does the planner switch from choosing index scan to bitmap index
scan? Is it dependent on the number of rows to be retrieved or the position of
the relevant data in the blocks or something else?

For a select * ... without a WHERE the db can't use an index, this query
forced a seq-scan.

A index is used when:
- a index are created
- a propper WHERE or ORDER BY in the query
- this index is useful
(a index isn't useful, for instance, for a small table or when almost
all rows are in the result set)

A bitmap index scan performed when:
- 2 or more propper indexes available
- see above

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknow)
Kaufbach, Saxony, Germany, Europe. N 51.05082�, E 13.56889�

#18Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Josh Harrison (#16)
Re: quick question abt pg_dump and restore

On Wed, Jan 09, 2008 at 11:51:16AM -0500, Josh Harrison wrote:

accessed frequently. So clustering the table according to one index will
yield poor performance to queries involving other indexes.

Maybe not poor, but certainly not optimised.

Index-only scan is a good solution for this I guess for queries involving
indexed columns (like in oracle) !!!

I think I don't know what you mean by "index-only scan". Oracle can't be
returning you data just by scanning an index, can it? It needs to get the
tuple if you need it back.

My bet is that you haven't tuned your vacuums correctly, or you aren't doing
ANALYSE often enough on the affected tables, or you need to SET STATISTICS
higher on some of the columns in order to get better estimates (and
therefore better plans).

A

#19Josh Harrison
joshques@gmail.com
In reply to: Andreas Kretschmer (#17)
Re: quick question abt pg_dump and restore

On Jan 9, 2008 11:56 AM, Andreas Kretschmer <akretschmer@spamfence.net>
wrote:

Josh Harrison <joshques@gmail.com> schrieb:

My questions
1. I pg_dumped dummy and Shuffled_dummy (from database1) to another

database

(database2)
When I issued the query in both database (database1 and database2)

select * from dummy limit 1000 ( the planner chooses seq scan for this

query)

select * from shuffled_dummy limit 1000 (planner chooses seq scan)

2. Also when does the planner switch from choosing index scan to bitmap

index

scan? Is it dependent on the number of rows to be retrieved or the

position of

the relevant data in the blocks or something else?

For a select * ... without a WHERE the db can't use an index, this query
forced a seq-scan.

A index is used when:
- a index are created
- a propper WHERE or ORDER BY in the query
- this index is useful
(a index isn't useful, for instance, for a small table or when almost
all rows are in the result set)

A bitmap index scan performed when:
- 2 or more propper indexes available
- see above

Thanks

In my database, I have a table 'person' containing roughly 30,000,000
records

explain select count(*) from person where person_id > 1147000000
QUERY
PLAN

------------------------------------------------------------------------------------------------------------

Aggregate (cost=307708.20..307708.21 rows=1
width=0)
-> Index Scan using person_pk on person
(cost=0.00..307379.79rows=131364 width=0)
Index Cond: (person_id >
1147000000::numeric)

3 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms]
This returns the result
count
--------
78718

But for this query where the condition is slightly different the query plan
is different. The planner goes for bitmap index

explain select count(*) from person where person_id > 1146000000
QUERY
PLAN

-----------------------------------------------------------------------------------------------

Aggregate (cost=342178.51..342178.52 rows=1
width=0)
-> Bitmap Heap Scan on person (cost=3120.72..341806.71 rows=148721
width=0)
Recheck Cond: (person_id >
1146000000::numeric)
-> Bitmap Index Scan on person_pk (cost=0.00..3083.53 rows=148721
width=0)
Index Cond: (person_id >
1146000000::numeric)

5 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms]
and the result is
count
--------
90625

How does the planner choose the plan?
josh

#20Josh Harrison
joshques@gmail.com
In reply to: Andrew Sullivan (#18)
Re: quick question abt pg_dump and restore

On Jan 9, 2008 12:11 PM, Andrew Sullivan <ajs@crankycanuck.ca> wrote:

On Wed, Jan 09, 2008 at 11:51:16AM -0500, Josh Harrison wrote:

accessed frequently. So clustering the table according to one index will
yield poor performance to queries involving other indexes.

Maybe not poor, but certainly not optimised.

Index-only scan is a good solution for this I guess for queries

involving

indexed columns (like in oracle) !!!

I think I don't know what you mean by "index-only scan". Oracle can't be
returning you data just by scanning an index, can it? It needs to get the
tuple if you need it back.

For example if I have a query like
select column2 from ABC where column1 > 20
and table ABC is indexed on (column1,column2) then Oracle will not goto the
heap to fetch the tuples. It will return them from the index itself since
the column information is available in the index. But postgres always goes
to the heap even if it has all the queried columns in the index.
For queries that involves all the columns(like select * from ABC where
column1>20) ... of course oracle goes to the heap to fetch them

My bet is that you haven't tuned your vacuums correctly, or you aren't
doing
ANALYSE often enough on the affected tables, or you need to SET STATISTICS
higher on some of the columns in order to get better estimates (and
therefore better plans).

I had vacuumed and analysed the tables ...tried increasing the statistics
too. But the performance compared to oracle for these types of queries(that
i had mentioned above) is pretty slow

josh

#21Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Josh Harrison (#20)
#22Bruce Momjian
bruce@momjian.us
In reply to: Josh Harrison (#19)