US Census database (Tiger 2004FE)

Started by Mark Woodward over 20 years ago, 17 messages
#1Mark Woodward
pgsql@mohawksoft.com

I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?

It's a *LOT* of data (about 40+ Gig in PostgreSQL)

#2David Fetter
david@fetter.org
In reply to: Mark Woodward (#1)
Re: US Census database (Tiger 2004FE)

On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:

I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?

It's a *LOT* of data (about 40+ Gig in PostgreSQL)

Sure. Got a torrent?

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!

#3Andrew Dunstan
andrew@dunslane.net
In reply to: David Fetter (#2)
Re: US Census database (Tiger 2004FE)

David Fetter wrote:

On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:

I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?

It's a *LOT* of data (about 40+ Gig in PostgreSQL)

Sure. Got a torrent?

How big is it when dumped and compressed?

cheers

andrew

#4Mark Woodward
pgsql@mohawksoft.com
In reply to: Andrew Dunstan (#3)
Re: US Census database (Tiger 2004FE)

Wow! A lot of people seem to want it!
I am dumping it out with pg_dump right now; it may take a few hours.
It is in PostgreSQL 8.0.3.
Does anyone have access to a high-bandwidth server? I could mail it on a
DVD to someone who would host it.

David Fetter wrote:

On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:

I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?

It's a *LOT* of data (about 40+ Gig in PostgreSQL)

Sure. Got a torrent?

How big is it when dumped and compressed?

cheers

andrew

#5Stephen Frost
sfrost@snowman.net
In reply to: Mark Woodward (#1)
Re: US Census database (Tiger 2004FE)

* Mark Woodward (pgsql@mohawksoft.com) wrote:

I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?

It's a *LOT* of data (about 40+ Gig in PostgreSQL)

How big dumped & compressed? I may be able to host it depending on how
big it ends up being...

Stephen

#6Mark Woodward
pgsql@mohawksoft.com
In reply to: Stephen Frost (#5)
Re: US Census database (Tiger 2004FE)

* Mark Woodward (pgsql@mohawksoft.com) wrote:

I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?

It's a *LOT* of data (about 40+ Gig in PostgreSQL)

How big dumped & compressed? I may be able to host it depending on how
big it ends up being...

It's been running for about an hour now, and it is up to 3.3G.

pg_dump tiger | gzip > tiger.pgz

I'll let you know. Hopefully, it will fit on DVD.

You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
wonder?)

#7Stephen Frost
sfrost@snowman.net
In reply to: Mark Woodward (#6)
Re: US Census database (Tiger 2004FE)

* Mark Woodward (pgsql@mohawksoft.com) wrote:

How big dumped & compressed? I may be able to host it depending on how
big it ends up being...

It's been running for about an hour now, and it is up to 3.3G.

Not too bad. I had 2003 (iirc) loaded into 7.4 at one point.

pg_dump tiger | gzip > tiger.pgz

What db version are you using, how did you load it (ogr2ogr?), is it in
postgis form? Fun questions, all of them. :)

I'll let you know. Hopefully, it will fit on DVD.

I guess your upload pipe isn't very big? snail-mail is slow... :)

You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
wonder?)

Using the new functions in 8.1 which provide size-on-disk of things,
hopefully there's also a function to give a tuple-size or similar as
well. It'd be a high estimate due to dead tuples but should be
sufficient for a progress bar.
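
A rough sketch of that idea, in Python rather than inside pg_dump itself: a filter that sits in the dump pipeline and reports how far along it is against a size estimate. The estimate is assumed to come from something like `SELECT pg_database_size('tiger')` on 8.1; the function name `copy_with_progress` and its parameters are made up for illustration. The on-disk size includes indexes and dead tuples and is not in dump format, so treat the fraction as approximate.

```python
import io
from typing import BinaryIO, Callable


def copy_with_progress(
    src: BinaryIO,
    dst: BinaryIO,
    total_estimate: int,
    report: Callable[[float], None],
    chunk_size: int = 1 << 20,
) -> int:
    """Copy src to dst, calling report() with the fraction done so far.

    total_estimate is an assumed size in bytes (for example the database
    size on disk); the fraction is clamped to 1.0 because the estimate
    may turn out to be too low.
    """
    done = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        fraction = done / total_estimate if total_estimate > 0 else 0.0
        report(min(fraction, 1.0))
    return done
```

Wired up with `sys.stdin.buffer` and `sys.stdout.buffer`, and a `report` callback that writes to stderr, this could sit between `pg_dump tiger` and `gzip` in the pipeline quoted above.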

Thanks,

Stephen

#8Tino Wildenhain
tino@wildenhain.de
In reply to: Mark Woodward (#6)
Re: US Census database (Tiger 2004FE)

Am Donnerstag, den 04.08.2005, 08:40 -0400 schrieb Mark Woodward:

* Mark Woodward (pgsql@mohawksoft.com) wrote:

I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?

It's a *LOT* of data (about 40+ Gig in PostgreSQL)

How big dumped & compressed? I may be able to host it depending on how
big it ends up being...

It's been running for about an hour now, and it is up to 3.3G.

pg_dump tiger | gzip > tiger.pgz

I'll let you know. Hopefully, it will fit on DVD.

You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
wonder?)

pg_dump -v maybe? ;) *hint hint*

--
Tino Wildenhain <tino@wildenhain.de>

#9Mark Woodward
pgsql@mohawksoft.com
In reply to: Stephen Frost (#7)
Re: US Census database (Tiger 2004FE)

* Mark Woodward (pgsql@mohawksoft.com) wrote:

How big dumped & compressed? I may be able to host it depending on how
big it ends up being...

It's been running for about an hour now, and it is up to 3.3G.

Not too bad. I had 2003 (iirc) loaded into 7.4 at one point.

Cool.

pg_dump tiger | gzip > tiger.pgz

What db version are you using, how did you load it (ogr2ogr?), is it in
postgis form? Fun questions, all of them. :)

8.0.3, in simple pg_dump form.

I loaded it with a utility I wrote a long time ago for tigerua. It is a
fixed-width text file to PG utility. It takes a "control" file that
describes the fields, field widths, and field names. It creates a SQL
"create table" statement, and also converts all the records from the data
file into a PostgreSQL COPY command. A control file looks something like:

# Zip+4 codes
# Tiger 2003 Record Conversion File
# Copyright (c) 2004 Mark L. Woodward, Mohawk Software
TABLE RTZ
1:I RT
4:I VERSION
10:T TLID
3:S RTSQ
4:Z ZIP4L
4:Z ZIP4R

The first number is the field width in chars; the second is an optional
type (there are a few: 'I' means ignore, 'Z' means zipcode, etc.). If no
type is given, varchar is assumed. Last is the column name.
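
A minimal sketch of how a control file in that format might be interpreted, written in Python and not taken from the original utility (whose source is not shown in this thread). Only the 'I' and 'Z' codes are explained above, so mapping 'Z' to integer and treating every other code as varchar are assumptions, as is accepting a bare `10 NAME` line for the "no type given" case. The names `parse_control`, `create_table_sql`, and `split_record` are made up for illustration.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Field(NamedTuple):
    width: int       # field width in characters
    type_code: str   # "" when the control line gives no type
    name: str        # column name


# Matches "10:T TLID", "10: TLID" and "10 TLID" (the last two are assumed forms).
FIELD_RE = re.compile(r"^(\d+)(?::(\w?))?\s+(\w+)$")


def parse_control(text: str) -> Tuple[Optional[str], List[Field]]:
    """Return (table name, fields) from control-file text."""
    table = None
    fields = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.upper().startswith("TABLE "):
            table = line.split(None, 1)[1]
            continue
        m = FIELD_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognized control line: {raw!r}")
        fields.append(Field(int(m.group(1)), m.group(2) or "", m.group(3)))
    return table, fields


def create_table_sql(table: str, fields: List[Field]) -> str:
    """Build a CREATE TABLE statement, leaving out ignored ('I') fields."""
    cols = []
    for f in fields:
        if f.type_code == "I":
            continue
        # Assumption: 'Z' becomes integer; all other codes fall back to varchar.
        sql_type = "integer" if f.type_code == "Z" else f"varchar({f.width})"
        cols.append(f"    {f.name} {sql_type}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n);"


def split_record(line: str, fields: List[Field]) -> List[str]:
    """Cut one fixed-width record into values, dropping ignored fields."""
    values = []
    pos = 0
    for f in fields:
        chunk = line[pos:pos + f.width]
        pos += f.width
        if f.type_code != "I":
            values.append(chunk.strip())
    return values
```

Each list returned by `split_record` would then be written out as one tab-separated line of COPY input.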

I'll let you know. Hopefully, it will fit on DVD.

I guess your upload pipe isn't very big? snail-mail is slow... :)

Never underestimate the bandwidth of a few DVDs and FedEx. Do the math, it
is embarrassing.
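
For anyone who wants to do that math, a small Python sketch follows. The 4.7 GB figure is the nominal capacity of a single-layer DVD; the 24-hour delivery time and the 384 kbit/s uplink used in the note after the code are assumptions for illustration, not numbers from this thread.

```python
DVD_BYTES = 4.7e9  # nominal single-layer DVD capacity


def effective_mbit_per_s(payload_bytes: float, transit_hours: float) -> float:
    """Average throughput of shipping a payload that takes transit_hours to arrive."""
    return payload_bytes * 8 / (transit_hours * 3600) / 1e6


def upload_hours(payload_bytes: float, link_mbit_per_s: float) -> float:
    """Hours needed to push the same payload over a network link."""
    return payload_bytes * 8 / (link_mbit_per_s * 1e6) / 3600
```

Under those assumptions, `effective_mbit_per_s(DVD_BYTES, 24)` comes to about 0.44 Mbit/s for a single disc, while `upload_hours(DVD_BYTES, 0.384)` is a little over 27 hours, and the shipped figure scales with the number of discs in the envelope.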

You know, ... maybe pg_dump needs a progress bar? (How would it do that,
I wonder?)

Using the new functions in 8.1 which provide size-on-disk of things,
hopefully there's also a function to give a tuple-size or similar as
well. It'd be a high estimate due to dead tuples but should be
sufficient for a progress bar.

Thanks,

Stephen

#10Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Mark Woodward (#6)
Re: US Census database (Tiger 2004FE)

It's been running for about an hour now, and it is up to 3.3G.

pg_dump tiger | gzip > tiger.pgz

| bzip2 > tiger.sql.bz2 :)

Chris

#11Mark Woodward
pgsql@mohawksoft.com
In reply to: Christopher Kings-Lynne (#10)
Re: US Census database (Tiger 2004FE)

It's been running for about an hour now, and it is up to 3.3G.

pg_dump tiger | gzip > tiger.pgz

| bzip2 > tiger.sql.bz2 :)

I find bzip2 FAR too slow for the gain in compression.

#12Mark Woodward
pgsql@mohawksoft.com
In reply to: Stephen Frost (#7)
US Census database (Tiger 2004FE) - 4.4G

It is 4.4G in space in a gzip package.

I'll mail a DVD to two people who promise to host it for Hackers.

#13Gavin M. Roy
gmr@ehpg.net
In reply to: Mark Woodward (#12)
Re: US Census database (Tiger 2004FE) - 4.4G

You can send it to me, and ehpg will host it. I'll send you a
private email with my info.

Gavin

On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:

It is 4.4G in space in a gzip package.

I'll mail a DVD to two people who promise to host it for Hackers.

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Gavin M. Roy
800 Pound Gorilla
gmr@ehpg.net

#14Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Mark Woodward (#12)
Re: US Census database (Tiger 2004FE) - 4.4G

Mark Woodward wrote:

It is 4.4G in space in a gzip package.

I'll mail a DVD to two people who promise to host it for Hackers.

Would it be easier to release the program you did to do
this conversion?

I use this pretty short (274 line) C program:
http://www.forensiclogic.com/tmp/tgr2sql.c
to convert the raw tiger files
from http://www.census.gov/geo/www/tiger/index.html
into SQL statements that can be loaded by postgresql.

The #define SQL line controls if it makes data
with INSERT statements or for COPY statements.
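
For readers unfamiliar with the two output forms, here is a hedged Python sketch of what each looks like. It is not derived from tgr2sql.c; the function names are hypothetical, and the escaping covers only the common cases of PostgreSQL's text COPY format and single-quoted literals.

```python
from typing import Iterable, Optional, Sequence

Row = Sequence[Optional[str]]


def copy_escape(value: Optional[str]) -> str:
    """Escape one value for text-format COPY; None becomes \\N."""
    if value is None:
        return r"\N"
    return (
        value.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def as_copy(table: str, columns: Sequence[str], rows: Iterable[Row]) -> str:
    """Emit a COPY ... FROM stdin block terminated by a backslash-period line."""
    lines = [f"COPY {table} ({', '.join(columns)}) FROM stdin;"]
    for row in rows:
        lines.append("\t".join(copy_escape(v) for v in row))
    lines.append("\\.")
    return "\n".join(lines) + "\n"


def sql_literal(value: Optional[str]) -> str:
    """Quote a value for INSERT by doubling single quotes.

    Backslash handling in literals depends on the server's settings and
    is not addressed here.
    """
    if value is None:
        return "NULL"
    return "'" + value.replace("'", "''") + "'"


def as_inserts(table: str, columns: Sequence[str], rows: Iterable[Row]) -> str:
    """Emit one INSERT statement per row."""
    cols = ", ".join(columns)
    out = []
    for row in rows:
        vals = ", ".join(sql_literal(v) for v in row)
        out.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
    return "\n".join(out) + "\n"
```

For a data set this size the COPY form is generally the faster one to load, since the server parses one statement instead of one per row.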

#15Mark Woodward
pgsql@mohawksoft.com
In reply to: Ron Mayer (#14)
Re: US Census database (Tiger 2004FE) - 4.4G

I thought about it, but it isn't the best program around, though it does work.
My program also reformats numbers, i.e. long/lat become properly
decimal-ed numerics, zips become integers, etc.
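
A sketch of the kind of reformatting meant here, in Python and not taken from the actual program. It assumes the raw TIGER coordinate fields are signed integers with six implied decimal places; the function names are hypothetical.

```python
from decimal import Decimal
from typing import Optional


def implied_decimal(raw: str, places: int = 6) -> Optional[Decimal]:
    """Turn a fixed-width integer such as '-071123456' into -71.123456.

    Assumes `places` implied decimal digits; blank fields become None.
    """
    text = raw.strip()
    if not text:
        return None
    # scaleb shifts the decimal point without going through binary floats.
    return Decimal(int(text)).scaleb(-places)


def zip_to_int(raw: str) -> Optional[int]:
    """Convert a zip field to an integer; blank fields become None.

    Note that an integer column drops leading zeros ('02134' -> 2134),
    so they have to be restored when formatting for display.
    """
    text = raw.strip()
    return int(text) if text else None
```

Using Decimal keeps every digit of the source value, which matters if the column is declared numeric rather than float.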

The question is...

Do you download the raw data and convert it into a database, or do you
download the pre-formatted database?

I would say the preformatted database is easier to manage. There are
hundreds of individual zip files, each containing 10 or so data
files.

Mark Woodward wrote:

It is 4.4G in space in a gzip package.

I'll mail a DVD to two people who promise to host it for Hackers.

Would it be easier to release the program you did to do
this conversion?

I use this pretty short (274 line) C program:
http://www.forensiclogic.com/tmp/tgr2sql.c
to convert the raw tiger files
from http://www.census.gov/geo/www/tiger/index.html
into SQL statements that can be loaded by postgresql.

The #define SQL line controls if it makes data
with INSERT statements or for COPY statements.

#16Darcy Buskermolen
darcy@wavefire.com
In reply to: Gavin M. Roy (#13)
Re: US Census database (Tiger 2004FE) - 4.4G

On Thursday 04 August 2005 09:37, Gavin M. Roy wrote:

You can send it to me, and ehpg will host it. I'll send you a
private email with my info.

Gavin

On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:

It is 4.4G in space in a gzip package.

I'll mail a DVD to two people who promise to host it for Hackers.

I'm wondering if this is now available for consumption by the rest of us??

(ie what's the link)

--
Darcy Buskermolen
Wavefire Technologies Corp.

http://www.wavefire.com
ph: 250.717.0200
fx: 250.763.1759

#17Joshua D. Drake
jd@commandprompt.com
In reply to: Darcy Buskermolen (#16)
Re: US Census database (Tiger 2004FE) - 4.4G

Darcy Buskermolen wrote:

On Thursday 04 August 2005 09:37, Gavin M. Roy wrote:

You can send it to me, and ehpg will host it. I'll send you a
private email with my info.

Gavin

On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:

It is 4.4G in space in a gzip package.

I'll mail a DVD to two people who promise to host it for Hackers.

Command Prompt would be willing to host it.

Sincerely,

Joshua D. Drake

I'm wondering if this is now available for consumption by the rest of us??

(ie what's the link)

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/