Using the right tool

Started by Nathan Hopperalmost 24 years ago5 messagesgeneral

nathanh@broszengineering.com

almost 24 years ago

I spent about 12 hours crawling over Google searches, Usenet archives, and a
great deal of GIS matter without finding much in the way of solid answers.
Perhaps someone on this list can provide a bit of insight.

I'm in the need of cataloging a growing collection of USGS satellite
imagery. That in itself has been handled rather handily using flat files,
but with a recent decision to acquire non-USGS data (potentially tens of
thousands of images) covering geographical regions, the need for a reliable
catalog is paramount. You can imaging how easily this would become a
management nightmare. This data will be tagged, cataloged, and filed by me,
rather than the easy-to-handle USGS data that has a common packaging and
naming style.

Most people who create an image library are doing so for web-based
applications, usually, it seems, using MySQL. While I have no convictions to
either database (it's just a tool, so the end is more important than the
means), I've decided to use Postgres since it has strong ties to the GIS
community.

But for this task, I've encountered little in the way of stories from users
who have used Postgres to catalog image data, despite fairly intense
digging. The thing is, these are not small images by any means. So what
works well for a few thousand < 250k JPEGS might not work so well for a few
thousand 25-125 meg TIFFs and SID files.

Can anyone suggest some references for storage of binary objects of this
magnitude (100 gig)? While it isn't particularly important to store the file
in the database itself, it sure would simplify things -- that way if the
file is moved, the db link isn't broken. Rather than reinvent the wheel and
build a database and interface solution, is any information available on
implementations such as this?

Regards,
Nathan Hopper

Tom Lane

tgl@sss.pgh.pa.us

almost 24 years ago

In reply to: Nathan Hopper (#1)

Re: Using the right tool

"Nathan Hopper" <nathanh@broszengineering.com> writes:

But for this task, I've encountered little in the way of stories from users
who have used Postgres to catalog image data, despite fairly intense
digging. The thing is, these are not small images by any means. So what
works well for a few thousand < 250k JPEGS might not work so well for a few
thousand 25-125 meg TIFFs and SID files.
Can anyone suggest some references for storage of binary objects of this
magnitude (100 gig)?

Nathan, I'd suggest talking to the guys at Refractions Inc (Dave Blasby
and Paul Ramsey). I'm pretty sure they have direct experience with
exactly this sort of thing ... but I dunno how closely they follow the
PG mailing lists, so they might not see your request.

regards, tom lane

Joe Murphy

joe.murphy@aersoft.ie

almost 24 years ago

In reply to: Nathan Hopper (#1)

Vacuum and indexes problem

I'm running a simple query on a simple table (see create syntax below).

before running vacuum on the table explain tells me that the index
"mytable_id_name_idx" is being used
after running vacuum on the table explain tells me that a sequential scan is
being used.
If I run reindex, I'm back to the index being used.

Any ideas why this is happening?

PG 7.2.1 on Solaris 2.6 (and HP-UX 11.00)

Output of the commands below

CREATE TABLE mytable (
id INT NOT NULL,
name TEXT NOT NULL,
num INT NOT NULL,
answer INT NOT NULL,
field1 INT,
field2 INT,
field3 TEXT,
field4 TEXT
);

CREATE INDEX mytable_id_name_idx ON mytable (id,name);
CREATE INDEX mytable_num_idx ON mytable (num);
CREATE UNIQUE INDEX mytable_id_num_idx ON mytable (id,num);

aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john';
NOTICE: QUERY PLAN:

Index Scan using mytable_id_name_idx on mytable (cost=0.00..4.83 rows=1
width=116)

EXPLAIN

aw_db_joe_1=# vacuum mytable;
VACUUM

aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john';
NOTICE: QUERY PLAN:

Seq Scan on mytable (cost=0.00..0.00 rows=1 width=116)

EXPLAIN

aw_db_joe_1=# reindex table mytable;
REINDEX

aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john';
NOTICE: QUERY PLAN:

Index Scan using mytable_id_name_idx on mytable (cost=0.00..4.83 rows=1
width=116)

EXPLAIN

Martijn van Oosterhout

kleptog@svana.org

almost 24 years ago

In reply to: Joe Murphy (#3)

Re: Vacuum and indexes problem

Firstly, how many rows in the table? If it's less than a few dozen, a seq
scan is the right answer. Secondly, vacuum analyse is usually recommended
reasonably often.

HTH,

On Mon, Sep 09, 2002 at 03:04:30PM +0100, Joe Murphy wrote:

I'm running a simple query on a simple table (see create syntax below).

before running vacuum on the table explain tells me that the index
"mytable_id_name_idx" is being used
after running vacuum on the table explain tells me that a sequential scan is
being used.
If I run reindex, I'm back to the index being used.

Any ideas why this is happening?

PG 7.2.1 on Solaris 2.6 (and HP-UX 11.00)

Output of the commands below

CREATE TABLE mytable (
id INT NOT NULL,
name TEXT NOT NULL,
num INT NOT NULL,
answer INT NOT NULL,
field1 INT,
field2 INT,
field3 TEXT,
field4 TEXT
);

CREATE INDEX mytable_id_name_idx ON mytable (id,name);
CREATE INDEX mytable_num_idx ON mytable (num);
CREATE UNIQUE INDEX mytable_id_num_idx ON mytable (id,num);

aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john';
NOTICE: QUERY PLAN:

Index Scan using mytable_id_name_idx on mytable (cost=0.00..4.83 rows=1
width=116)

EXPLAIN

aw_db_joe_1=# vacuum mytable;
VACUUM

aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john';
NOTICE: QUERY PLAN:

Seq Scan on mytable (cost=0.00..0.00 rows=1 width=116)

EXPLAIN

aw_db_joe_1=# reindex table mytable;
REINDEX

aw_db_joe_1=# explain select * from mytable where id = 1 and name = 'john';
NOTICE: QUERY PLAN:

Index Scan using mytable_id_name_idx on mytable (cost=0.00..4.83 rows=1
width=116)

EXPLAIN

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

There are 10 kinds of people in the world, those that can do binary
arithmetic and those that can't.

Joe Murphy

joe.murphy@aersoft.ie

almost 24 years ago

In reply to: Nathan Hopper (#1)

Re: Vacuum and indexes problem

<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
There were fewer than a dozen rows OK, I've tested with ~1000 and indexes
are used.
 Thanks.
Martijn van Oosterhout wrote:
<blockquote TYPE=CITE>Firstly, how many rows in the table? If it's less
than a few dozen, a seq
 scan is the right answer. Secondly, vacuum analyse is usually recommended
 reasonably often.
HTH,
On Mon, Sep 09, 2002 at 03:04:30PM +0100, Joe Murphy wrote:
 > I'm running a simple query on a simple table (see create syntax below).
 >
 > before running vacuum on the table explain tells me that the index
 > "mytable_id_name_idx" is being used
 > after running vacuum on the table explain tells me that a sequential
scan is
 > being used.
 > If I run reindex, I'm back to the index being used.
 >
 > Any ideas why this is happening?
 >
 > PG 7.2.1 on Solaris 2.6 (and HP-UX 11.00)
 >
 > Output of the commands below
 >
 >
 > CREATE TABLE mytable (
 >         id         
INT NOT NULL,
 >         name       
TEXT NOT NULL,
 >         num    
INT NOT NULL,
 >         answer     
INT NOT NULL,
 >         field1    
INT,
 >         field2    
INT,
 >         field3    
TEXT,
 >         field4    
TEXT
 >         );
 >
 > CREATE INDEX mytable_id_name_idx ON mytable (id,name);
 > CREATE INDEX mytable_num_idx ON mytable (num);
 > CREATE UNIQUE INDEX mytable_id_num_idx ON mytable (id,num);
 >
 >
 > aw_db_joe_1=# explain select * from mytable where id = 1 and name
= 'john';
 > NOTICE:  QUERY PLAN:
 >
 > Index Scan using mytable_id_name_idx on mytable  (cost=0.00..4.83
rows=1
 > width=116)
 >
 > EXPLAIN
 >
 > aw_db_joe_1=# vacuum mytable;
 > VACUUM
 >
 > aw_db_joe_1=# explain select * from mytable where id = 1 and name
= 'john';
 > NOTICE:  QUERY PLAN:
 >
 > Seq Scan on mytable  (cost=0.00..0.00 rows=1 width=116)
 >
 > EXPLAIN
 >
 > aw_db_joe_1=# reindex table mytable;
 > REINDEX
 >
 > aw_db_joe_1=# explain select * from mytable where id = 1 and name
= 'john';
 > NOTICE:  QUERY PLAN:
 >
 > Index Scan using mytable_id_name_idx on mytable  (cost=0.00..4.83
rows=1
 > width=116)
 >
 > EXPLAIN
 >
 >
 > ---------------------------(end of broadcast)---------------------------
 > TIP 3: if posting/reading through Usenet, please send an appropriate
 > subscribe-nomail command to majordomo@postgresql.org so that your
 > message can get through to the mailing list cleanly
--
 Martijn van Oosterhout   <kleptog@svana.org> 
<a href="http://svana.org/kleptog/">http://svana.org/kleptog/</a>
 > There are 10 kinds of people in the world, those that can do binary
 > arithmetic and those that can't.</blockquote>

<pre>-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                        Joe Murphy - AerSoft Limited
             2 Northumberland Avenue, Dun Laoghaire, Co. Dublin.            
  phone: +353-1-2301166     direct: +353-1-2145953     fax: +353-1-2301167  
  <A HREF="mailto:joe@aersoft.com">mailto:joe@aersoft.com</A>   mobile: +353-86-8526181  <A HREF="http://www.aersoft.com">http://www.aersoft.com</A>&nbsp;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~</pre>
 </html>