US Census database (Tiger 2004FE)
I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?
It's a *LOT* of data (about 40+ Gig in PostgreSQL)
On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?It's a *LOT* of data (about 40+ Gig in PostgreSQL)
Sure. Got a torrent?
Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778
Remember to vote!
David Fetter wrote:
On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?It's a *LOT* of data (about 40+ Gig in PostgreSQL)
Sure. Got a torrent?
How big is it when dumped and compressed?
cheers
andrew
Wow! a lot of people seem to want it!
I am dumping out with pg_dump right now, it may take a few hours.
It is in PostgreSQL 8.0.3
Does anyone have access to a high bandwidth server? I could mail it on a
DVD to someone who would host it.
Show quoted text
David Fetter wrote:
On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
I just finished converting and loading the US census data into
PostgreSQL
would anyone be interested in it for testing purposes?It's a *LOT* of data (about 40+ Gig in PostgreSQL)
Sure. Got a torrent?
How big is it when dumped and compressed?
cheers
andrew
* Mark Woodward (pgsql@mohawksoft.com) wrote:
I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?It's a *LOT* of data (about 40+ Gig in PostgreSQL)
How big dumped & compressed? I may be able to host it depending on how
big it ends up being...
Stephen
* Mark Woodward (pgsql@mohawksoft.com) wrote:
I just finished converting and loading the US census data into
PostgreSQL
would anyone be interested in it for testing purposes?It's a *LOT* of data (about 40+ Gig in PostgreSQL)
How big dumped & compressed? I may be able to host it depending on how
big it ends up being...
It's been running for about an hour now, and it is up to 3.3G.
pg_dump tiger | gzip > tiger.pgz
I'll let you know. Hopefully, it will fit on DVD.
You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
wonder?)
* Mark Woodward (pgsql@mohawksoft.com) wrote:
How big dumped & compressed? I may be able to host it depending on how
big it ends up being...It's been running for about an hour now, and it is up to 3.3G.
Not too bad. I had 2003 (iirc) loaded into 7.4 at one point.
pg_dump tiger | gzip > tiger.pgz
What db version are you using, how did you load it (ogr2ogr?), is it in
postgis form? Fun questions, all of them. :)
I'll let you know. Hopefully, it will fit on DVD.
I guess your upload pipe isn't very big? snail-mail is slow... :)
You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
wonder?)
Using the new functions in 8.1 which provide size-on-disk of things,
hopefully there's also a function to give a tuple-size or similar as
well. It'd be a high estimate due to dead tuples but should be
sufficient for a progress bar.
Thanks,
Stephen
Am Donnerstag, den 04.08.2005, 08:40 -0400 schrieb Mark Woodward:
* Mark Woodward (pgsql@mohawksoft.com) wrote:
I just finished converting and loading the US census data into
PostgreSQL
would anyone be interested in it for testing purposes?It's a *LOT* of data (about 40+ Gig in PostgreSQL)
How big dumped & compressed? I may be able to host it depending on how
big it ends up being...It's been running for about an hour now, and it is up to 3.3G.
pg_dump tiger | gzip > tiger.pgz
I'll let you know. Hopefully, it will fit on DVD.
You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
wonder?)
pg_dump -v maybe? ;) *hint hint*
--
Tino Wildenhain <tino@wildenhain.de>
* Mark Woodward (pgsql@mohawksoft.com) wrote:
How big dumped & compressed? I may be able to host it depending on
how
big it ends up being...
It's been running for about an hour now, and it is up to 3.3G.
Not too bad. I had 2003 (iirc) loaded into 7.4 at one point.
Cool.
pg_dump tiger | gzip > tiger.pgz
What db version are you using, how did you load it (ogr2ogr?), is it in
postgis form? Fun questions, all of them. :)
8.0.3, in simple pg_dump form.
I loaded it with a utility I wrote a long time ago for tigerua. It is a
fixed width text file to PG utility. It takes a "control" file that
describes the fields, field widths, and field name. It creates a SQL
"create table" statement, and also reads all the records from a control
file into a PostgreSQL copy command. A control file looks something like:
# Zip+4 codes
# Tiger 2003 Record Conversion File
# Copyright (c) 2004 Mark L. Woodward, Mohawk Software
TABLE RTZ
1:I RT
4:I VERSION
10:T TLID
3:S RTSQ
4:Z ZIP4L
4:Z ZIP4R
The first number is the field width in chars, second is an optional type
(there are a few, 'I' means ignore, 'Z' means zipcode, etc.) if no type is
given, then varchar is assumed. Last is the column name.
I'll let you know. Hopefully, it will fit on DVD.
I guess your upload pipe isn't very big? snail-mail is slow... :)
Never underestimate the bandwidth of a few DVDs and FedEx. Do the math, it
is embarrasing.
Show quoted text
You know, ... maybe pg_dump needs a progress bar? (How would it do that,
I
wonder?)Using the new functions in 8.1 which provide size-on-disk of things,
hopefully there's also a function to give a tuple-size or similar as
well. It'd be a high estimate due to dead tuples but should be
sufficient for a progress bar.Thanks,
Stephen
It's been running for about an hour now, and it is up to 3.3G.
pg_dump tiger | gzip > tiger.pgz
| bzip2 > tiger.sql.bz2 :)
Chris
It's been running for about an hour now, and it is up to 3.3G.
pg_dump tiger | gzip > tiger.pgz
| bzip2 > tiger.sql.bz2 :)
I find bzip2 FAR SLOWER than the gain in compression.
It is 4.4G in space in a gzip package.
I'll mail a DVD to two people who promise to host it for Hackers.
You can send it to me, and ehpg will host it. I'll send you a
private email with my info.
Gavin
On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:
It is 4.4G in space in a gzip package.
I'll mail a DVD to two people who promise to host it for Hackers.
---------------------------(end of
broadcast)---------------------------
TIP 4: Have you searched our list archives?
Gavin M. Roy
800 Pound Gorilla
gmr@ehpg.net
Mark Woodward wrote:
It is 4.4G in space in a gzip package.
I'll mail a DVD to two people who promise to host it for Hackers.
Would it be easier to release the program you did to do
this conversion?
I use this pretty short (274 line) C program:
http://www.forensiclogic.com/tmp/tgr2sql.c
to convert the raw tiger files
from http://www.census.gov/geo/www/tiger/index.html
into SQL statements that can be loaded by postgresql.
The #define SQL line controls if it makes data
with INSERT statements or for COPY statements.
I thought bout it, but it isn't the best program around, but it does work.
My program also reformats numbers, i.e. long/lat become properly
decimal-ed numerics, zips become integers, etc.
The question is...
Do you download the raw data and convert it into a database, or do you
download the pre-formatted database?
I would say the preformated database is easier to manage. There are
hundreds of individual zips files, in each of those files 10 or so data
files.
Show quoted text
Mark Woodward wrote:
It is 4.4G in space in a gzip package.
I'll mail a DVD to two people who promise to host it for Hackers.
Would it be easier to release the program you did to do
this conversion?I use this pretty short (274 line) C program:
http://www.forensiclogic.com/tmp/tgr2sql.c
to convert the raw tiger files
from http://www.census.gov/geo/www/tiger/index.html
into SQL statements that can be loaded by postgresql.The #define SQL line controls if it makes data
with INSERT statements or for COPY statements.---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
On Thursday 04 August 2005 09:37, Gavin M. Roy wrote:
You can send it to me, and ehpg will host it. I'll send you a
private email with my info.Gavin
On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:
It is 4.4G in space in a gzip package.
I'll mail a DVD to two people who promise to host it for Hackers.
I'm wondering if this is now available for consumption by the rest of us??
(ie what's the link)
--
Darcy Buskermolen
Wavefire Technologies Corp.
http://www.wavefire.com
ph: 250.717.0200
fx: 250.763.1759
Darcy Buskermolen wrote:
On Thursday 04 August 2005 09:37, Gavin M. Roy wrote:
You can send it to me, and ehpg will host it. I'll send you a
private email with my info.Gavin
On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:
It is 4.4G in space in a gzip package.
I'll mail a DVD to two people who promise to host it for Hackers.
Command Prompt would be willing to host it.
Sincerely,
Joshua D. Drake
I'm wondering if this is now available for consumption by the rest of us??
(ie what's the link)
--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/