Generating sample data

Started by Rich Shepard over 9 years ago | 15 messages | general

#1 Rich Shepard
rshepard@appl-ecosys.com

My previous databases used real client (or my own) data; now I want to
generate sample data for the tables in the two applications I'm developing.
My web search finds a bunch of pricey (IMO) commercial products.

Are there any open source data generators that can provide sample data
based on each table's schema?

TIA,

Rich

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Greg Navis
contact@gregnavis.com
In reply to: Rich Shepard (#1)
Re: Generating sample data

In the Ruby land there's a gem called faker
<https://github.com/stympy/faker> that allows you to generate fake data.
However, I'm not sure it can generate data based on a schema, so a little
bit of scripting may be necessary. Would this approach work for you?

Yours
Greg

#3 Steve Crawford
scrawford@pinpointresearch.com
In reply to: Rich Shepard (#1)
Re: Generating sample data

You could start here:
http://www.softwaretestingmagazine.com/tools/open-source-test-data-generators/

I have rolled my own on occasion by just pulling some public lists of most
common given names and family names and toing a full-join. Same for city,
streets, etc.

-Steve

On Tue, Dec 27, 2016 at 11:23 AM, Rich Shepard <rshepard@appl-ecosys.com>
wrote:

> My previous databases used real client (or my own) data; now I want to
> generate sample data for the tables in the two applications I'm developing.
> My web search finds a bunch of pricey (IMO) commercial products.
>
> Are there any open source data generators that can provide sample data
> based on each table's schema?
>
> TIA,
>
> Rich

#4 Rich Shepard
rshepard@appl-ecosys.com
In reply to: Greg Navis (#2)
Re: Generating sample data

On Tue, 27 Dec 2016, Greg Navis wrote:

> In the Ruby land there's a gem called faker
> <https://github.com/stympy/faker> that allows you to generate fake data.
> However, I'm not sure it can generate data based on a schema so a little
> bit of scripting may be necessary. Would this approach work for you?

Greg,

I work in Python, not Ruby, so this might be too big of a hurdle.

Thanks,

Rich


#5 Steve Crawford
scrawford@pinpointresearch.com
In reply to: Steve Crawford (#3)
Re: Generating sample data

On Tue, Dec 27, 2016 at 12:01 PM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:

> You could start here:
> http://www.softwaretestingmagazine.com/tools/open-source-test-data-generators/
>
> I have rolled my own on occasion by just pulling some public lists of most
> common given names and family names and toing a full-join. Same for city,
> streets, etc.
>
> -Steve

Sorry, "doing" a full-join. Which also leads to lots of fun cross-cultural
names like "Muhammad Wang" and "Santiago O'Leary".

Cheers,
Steve
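
The list-pairing approach Steve describes can be sketched in Python with
only the standard library. Pairing every given name with every family name
is a Cartesian product (what SQL calls a CROSS JOIN). This is only a rough
sketch: the two name lists are tiny placeholders rather than real public
data sets, and the function names are made up for illustration.

```python
import itertools
import random

# Placeholder lists; in practice these would be loaded from public name lists.
GIVEN_NAMES = ["Muhammad", "Santiago", "Mary", "Wei"]
FAMILY_NAMES = ["Wang", "O'Leary", "Garcia", "Smith"]


def all_full_names(given, family):
    """Return every given/family combination (a Cartesian product)."""
    return [f"{g} {f}" for g, f in itertools.product(given, family)]


def sample_full_names(given, family, count, seed=None):
    """Pick `count` distinct full names at random from the product."""
    rng = random.Random(seed)
    return rng.sample(all_full_names(given, family), count)


if __name__ == "__main__":
    for name in sample_full_names(GIVEN_NAMES, FAMILY_NAMES, 5, seed=1):
        print(name)
```

With lists of a few hundred names each, the product grows to tens of
thousands of combinations, so sampling from it is usually enough to fill a
test table.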

#6 Rich Shepard
rshepard@appl-ecosys.com
In reply to: Steve Crawford (#3)
Re: Generating sample data

On Tue, 27 Dec 2016, Steve Crawford wrote:

> You could start here:
> http://www.softwaretestingmagazine.com/tools/open-source-test-data-generators/
>
> I have rolled my own on occasion by just pulling some public lists of most
> common given names and family names and toing a full-join. Same for city,
> streets, etc.

Steve,

Thanks very much for the URL. One application is small (7 tables), the
other is three times that size (23 tables). If I need to find public domain
data on the Web, I'll do that.

Much appreciated,

Rich


#7 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Rich Shepard (#4)
Re: Generating sample data

On 12/27/2016 12:03 PM, Rich Shepard wrote:

> On Tue, 27 Dec 2016, Greg Navis wrote:
>
> > In the Ruby land there's a gem called faker
> > <https://github.com/stympy/faker> that allows you to generate fake data.
> > However, I'm not sure it can generate data based on a schema so a little
> > bit of scripting may be necessary. Would this approach work for you?
>
> Greg,
>
> I work in Python, not Ruby, so this might be too big of a hurdle.

As it happens there is a Python version of the aforementioned faker:

https://pypi.python.org/pypi/Faker/0.7.7

It was I use to generate fake/sample data.

> Thanks,
>
> Rich

--
Adrian Klaver
adrian.klaver@aklaver.com


#8 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Adrian Klaver (#7)
Re: Generating sample data

On 12/27/2016 02:23 PM, Adrian Klaver wrote:

> On 12/27/2016 12:03 PM, Rich Shepard wrote:
>
> > Greg,
> >
> > I work in Python, not Ruby, so this might be too big of a hurdle.
>
> As it happens there is a Python version of the aforementioned faker:
>
> https://pypi.python.org/pypi/Faker/0.7.7
>
> It was I use to generate fake/sample data.

Ugh.

It is what I use to generate fake/sample data.

Memo to self: Do one thing at a time!

> > Thanks,
> >
> > Rich

--
Adrian Klaver
adrian.klaver@aklaver.com


#9 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Rich Shepard (#6)
Re: Generating sample data

On 12/27/2016 12:06 PM, Rich Shepard wrote:

> On Tue, 27 Dec 2016, Steve Crawford wrote:
>
> > You could start here:
> > http://www.softwaretestingmagazine.com/tools/open-source-test-data-generators/
> >
> > I have rolled my own on occasion by just pulling some public lists of
> > most common given names and family names and toing a full-join. Same for
> > city, streets, etc.
>
> Steve,
>
> Thanks very much for the URL. One application is small (7 tables), the
> other is three times that size (23 tables). If I need to find public domain
> data on the Web, I'll do that.

What sort of data do you want to create?

If it is the standard contact information then the previously mentioned
tools are sufficient.

If it is data specific to a field of study then things might get trickier.

> Much appreciated,
>
> Rich

--
Adrian Klaver
adrian.klaver@aklaver.com


#10 Rich Shepard
rshepard@appl-ecosys.com
In reply to: Adrian Klaver (#9)
Re: Generating sample data

On Tue, 27 Dec 2016, Adrian Klaver wrote:

> What sort of data do you want to create?

Adrian,

Various text, date, and numeric values.

> If it is data specific to a field of study then things might get trickier.

It's not a common database. I'll probably need to cobble together generic
data of the appropriate types myself.

Thanks,

Rich
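
Cobbling together generic text, date, and numeric values keyed to a table's
column types might look roughly like the sketch below. It uses only the
Python standard library. The SCHEMA mapping, its column names, and the value
ranges are hypothetical stand-ins, not taken from either application in this
thread.

```python
import datetime
import random
import string

# Hypothetical column layout; real tables would supply their own names and types.
SCHEMA = {"site_name": "text", "sampled_on": "date", "reading": "numeric"}


def random_text(rng, length=8):
    """Random lowercase string of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def random_date(rng, start=datetime.date(2016, 1, 1), end=datetime.date(2016, 12, 31)):
    """Random date between start and end, inclusive."""
    span_days = (end - start).days
    return start + datetime.timedelta(days=rng.randint(0, span_days))


def random_numeric(rng, low=0.0, high=1000.0, places=2):
    """Random float between low and high, rounded to `places` decimals."""
    return round(rng.uniform(low, high), places)


# One generator per (assumed) column type.
GENERATORS = {"text": random_text, "date": random_date, "numeric": random_numeric}


def make_rows(schema, count, seed=None):
    """Build `count` dicts, one value per column, chosen by column type."""
    rng = random.Random(seed)
    return [
        {column: GENERATORS[kind](rng) for column, kind in schema.items()}
        for _ in range(count)
    ]


if __name__ == "__main__":
    for row in make_rows(SCHEMA, 3, seed=42):
        print(row)
```

Passing a seed makes the output repeatable, which helps when the same sample
data has to be regenerated for tests.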


#11 Rich Shepard
rshepard@appl-ecosys.com
In reply to: Adrian Klaver (#7)
Re: Generating sample data

On Tue, 27 Dec 2016, Adrian Klaver wrote:

> As it happens there is a Python version of the aforementioned faker:
> https://pypi.python.org/pypi/Faker/0.7.7
> It was I use to generate fake/sample data.

Adrian,

Aha! That's a great start for me.

Many thanks,

Rich


#12 Rich Shepard
rshepard@appl-ecosys.com
In reply to: Adrian Klaver (#7)
Re: Generating sample data

On Tue, 27 Dec 2016, Adrian Klaver wrote:

> As it happens there is a Python version of the aforementioned faker:
> https://pypi.python.org/pypi/Faker/0.7.7

Adrian,

Impressive and complete. It will generate all the data I need.

Many thanks,

Rich


#13 Berend Tober
btober@broadstripe.net
In reply to: Rich Shepard (#12)
Re: Generating sample data

----- Original Message -----

From: "Rich Shepard" <rshepard@appl-ecosys.com>
To: pgsql-general@postgresql.org
Sent: Tuesday, December 27, 2016 7:23:46 PM
Subject: Re: [GENERAL] Generating sample data

> On Tue, 27 Dec 2016, Adrian Klaver wrote:
>
> > As it happens there is a Python version of the aforementioned faker:
> > https://pypi.python.org/pypi/Faker/0.7.7
>
> Adrian,
>
> Impressive and complete. It will generate all the data I need.

This is kind of fun:

https://github.com/bmtober/groan

I had to hunt down the original author from the 1990s, which was when I originally downloaded it from his personal web site at

http://raingod.com/raingod/resources/Programming/Perl/Software/Groan/

The initial commit on that github page is the original source as provided by Mr. McIntyre.

In a subsequent commit, I removed some of the original code that formatted for HTML output, leaving just plain text, and also posted an example grammar for generating fake names and strings that look like social security numbers (i.e., a U.S. taxpayer identification).

The script will generate duplicates, but you can do something like

    # run the generator 20 times; sort -u drops duplicate lines
    for n in {1..20}
    do
        groan.pl ssn.gn
    done | sort -u

to get unique source data.

By defining other custom grammars, you could potentially generate all kinds of data.

-- B
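
The same de-duplication idea can be sketched in Python by collecting results
in a set. Note the difference from the shell loop above: `sort -u` yields at
most 20 unique lines from 20 runs, whereas this sketch keeps generating until
it has exactly the requested number. The fake_ssn function is a made-up
stand-in that only mimics the NNN-NN-NNNN shape; it is not groan's grammar
and does not follow real SSN rules.

```python
import random


def fake_ssn(rng):
    """Return a random string shaped like NNN-NN-NNNN (shape only)."""
    return (
        f"{rng.randint(0, 999):03d}-"
        f"{rng.randint(0, 99):02d}-"
        f"{rng.randint(0, 9999):04d}"
    )


def unique_values(generate, count, rng, max_attempts=10_000):
    """Call generate(rng) until `count` distinct values are collected."""
    seen = set()
    attempts = 0
    while len(seen) < count:
        if attempts >= max_attempts:
            raise RuntimeError("could not produce enough unique values")
        seen.add(generate(rng))
        attempts += 1
    return sorted(seen)


if __name__ == "__main__":
    for value in unique_values(fake_ssn, 20, random.Random(0)):
        print(value)
```

The attempt limit is there so that a generator with too few possible
outputs raises an error instead of looping forever.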


#14 Martijn Tonies (Upscene Productions)
In reply to: Rich Shepard (#1)
Re: Generating sample data

Hi,

Not open source, but also not pricey (IMO): Advanced Data Generator.
http://www.upscene.com/advanced_data_generator/

Generates e-mail addresses, street names, first & last names, company names,
complex relationships etc.

And yes, this is our product. ;)

With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com


#15 Rich Shepard
rshepard@appl-ecosys.com
In reply to: Martijn Tonies (Upscene Productions) (#14)
Re: Generating sample data

On Wed, 28 Dec 2016, Martijn Tonies (Upscene Productions) wrote:

> Not open source, but also not pricey (IMO): Advanced Data Generator.
> http://www.upscene.com/advanced_data_generator/
>
> Generates e-mail addresses, street names, first & last names, company names,
> complex relationships etc.
>
> And yes, this is our product. ;)

Martijn,

Thank you for making me aware of your company and product. However, after
20 years of using only F/OSS for my business (and personal computing)
needs, and contributing to several open source projects along the way, my
preference is to use such tools. When I get the large database application
up and running I'll post it on GitHub and turn it loose into the F/OSS world
under the GPL.

Regards,

Rich
