Creating files with testdata

Started by H · about 1 year ago · 13 messages · general
#1 H
agents@meddatainc.com

I am developing a complex multi-tenant application in PostgreSQL 16 on Linux. During development I would like to be able to enter test data into various related tables at any given time for testing purposes.

While this can certainly be accomplished by predefined CTE INSERT statements in an SQL file, I would prefer something more flexible. I am thinking of using a markdown file as a source for these CTE INSERT statements. Specifically, having a certain structure in the markdown file where a given markdown heading level, bullet level etc. would correspond to specific tables and columns.

After entering my test data into the markdown file for the given test scenario, I would then run an awk script or similar to create a SQL file with the various CTE INSERT statements. However, it gets complex since I need to handle 1:N relationships between tables in the markdown file...
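
For concreteness, the kind of statement I would want the script to emit is a data-modifying CTE that carries the generated parent key into the child rows within a single statement. A minimal sketch, with purely illustrative table and column names:

WITH new_article AS (
    INSERT INTO article (id, title)
    VALUES (gen_random_uuid(), 'Test article')
    RETURNING id
)
INSERT INTO reference (id, article_id, citation)
SELECT gen_random_uuid(), new_article.id, c.citation
FROM new_article,
     (VALUES ('First reference'), ('Second reference')) AS c(citation);

Here the two rows in the VALUES list become the N side of a 1:N relationship, all keyed to the single generated parent id.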

I hope the above outline is understandable and am interested in comments and thoughts on my above approach.

Thanks.

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: H (#1)
Re: Creating files with testdata

On 3/10/25 11:09 AM, H wrote:

I am developing a complex multi-tenant application in PostgreSQL 16 on Linux. During development I would like to be able to enter test data into various related tables at any given time for testing purposes.

While this can certainly be accomplished by predefined CTE INSERT statements in an SQL file, I would prefer something more flexible. I am thinking of using a markdown file as a source for these CTE INSERT statements. Specifically, having a certain structure in the markdown file where a given markdown heading level, bullet level etc. would correspond to specific tables and columns.

Why CTE INSERTs?

After entering my test data into the markdown file for the given test scenario, I would then run an awk script or similar to create a SQL file with the various CTE INSERT statements. However, it gets complex since I need to handle 1:N relationships between tables in the markdown file...

I hope the above outline is understandable and am interested in comments and thoughts on my above approach.

Thanks.

--
Adrian Klaver
adrian.klaver@aklaver.com

#3 H
agents@meddatainc.com
In reply to: Adrian Klaver (#2)
Re: Creating files with testdata

On March 10, 2025 2:26:48 PM GMT-04:00, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 3/10/25 11:09 AM, H wrote:

I am developing a complex multi-tenant application in PostgreSQL 16 on Linux. During development I would like to be able to enter test data into various related tables at any given time for testing purposes.

While this can certainly be accomplished by predefined CTE INSERT statements in an SQL file, I would prefer something more flexible. I am thinking of using a markdown file as a source for these CTE INSERT statements. Specifically, having a certain structure in the markdown file where a given markdown heading level, bullet level etc. would correspond to specific tables and columns.

Why CTE INSERTs?

After entering my test data into the markdown file for the given test scenario, I would then run an awk script or similar to create a SQL file with the various CTE INSERT statements. However, it gets complex since I need to handle 1:N relationships between tables in the markdown file...

I hope the above outline is understandable and am interested in comments and thoughts on my above approach.

Thanks.

There are tables referencing each other using randomly generated IDs, i.e. those IDs are not known until after the parent table row is inserted.

#4 David G. Johnston
david.g.johnston@gmail.com
In reply to: H (#3)
Re: Creating files with testdata

On Mon, Mar 10, 2025 at 12:17 PM H <agents@meddatainc.com> wrote:

There are tables referencing each other using randomly generated IDs, i.e. those IDs are not known until after the parent table row is inserted.

I just reserve half of the number space of bigint, the negatives, for test data and assign known IDs as part of the test data specification.
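
A sketch of what that looks like, assuming bigint surrogate keys and purely illustrative table names:

-- Test rows use known negative IDs; application-generated IDs stay positive.
INSERT INTO invoice (id, customer) VALUES (-1, 'Test customer');
INSERT INTO invoice_line (id, invoice_id, amount)
VALUES (-10, -1, 9.99),
       (-11, -1, 19.99);

Since the IDs are fixed in the specification, the parent key is known up front and no RETURNING plumbing is needed.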

David J.

#5 Francisco Olarte
folarte@peoplecall.com
In reply to: H (#1)
Re: Creating files with testdata

On Mon, 10 Mar 2025 at 19:17, H <agents@meddatainc.com> wrote:
...

After entering my test data into the markdown file for the given test scenario, I would then run an awk script or similar to create a SQL file with the various CTE INSERT statements. However, it gets complex since I need to handle 1:N relationships between tables in the markdown file...
I hope the above outline is understandable and am interested in comments and thoughts on my above approach.

I do not think MD would be a good source format. Being a developer, I would recommend taking your favorite scripting language (perl / python / lua / js, whatever), building a big object (which can be easily helped by some mini-functions to build a little DSL) and then spitting SQL from it (for development this is normally better than using whatever db connection your language has, as it leaves an understandable SQL script).

I have done this with perl for some projects: I built a driver which defined several helper functions, then dofile("xx.dat") which returned a big hash, and then ran a series of loops on the result to write the SQL in whatever order was necessary.

Francisco Olarte.

#6 Francisco Olarte
folarte@peoplecall.com
In reply to: H (#3)
Re: Creating files with testdata

On Mon, 10 Mar 2025 at 20:18, H <agents@meddatainc.com> wrote:

There are tables referencing each other using randomly generated IDs, i.e. those IDs are not known until after the parent table row is inserted.

Random? Then they CAN collide. ;->

Do the IDs need to be random or are they just the default value?

I've encountered a similar problem with sequences and have bypassed it by inserting explicit values into the test data, i.e. an invoice with id 1000 followed by its lines with IDs 100010, 100011, ... and then setting start_values for the sequences at 99999999. This helped me a lot in debugging. You could do the same depending on what domain your random ids are on.
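
As a sketch, with illustrative names (invoice_id_seq stands for whatever sequence backs the key):

INSERT INTO invoice (id, customer) VALUES (1000, 'Test customer');
INSERT INTO invoice_line (id, invoice_id, amount)
VALUES (100010, 1000, 9.99),
       (100011, 1000, 19.99);
-- Keep later application-generated IDs clear of the explicit test range:
ALTER SEQUENCE invoice_id_seq RESTART WITH 99999999;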

Francisco Olarte.

#7 H
agents@meddatainc.com
In reply to: Francisco Olarte (#6)
Re: Creating files with testdata

On March 10, 2025 3:27:02 PM GMT-04:00, Francisco Olarte <folarte@peoplecall.com> wrote:

On Mon, 10 Mar 2025 at 20:18, H <agents@meddatainc.com> wrote:

There are tables referencing each other using randomly generated IDs, i.e. those IDs are not known until after the parent table row is inserted.

Random? Then they CAN collide. ;->

Do the IDs need to be random or are they just the default value?

I've encountered a similar problem with sequences and have bypassed it by inserting explicit values into the test data, i.e. an invoice with id 1000 followed by its lines with IDs 100010, 100011, ... and then setting start_values for the sequences at 99999999. This helped me a lot in debugging. You could do the same depending on what domain your random ids are on.

Francisco Olarte.

I am using UUIDs, which the application will use as well.
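
That is, assuming the values come from a column default, the keys are declared along these lines (illustrative):

CREATE TABLE article (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid()
    -- remaining columns omitted
);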

#8 H
agents@meddatainc.com
In reply to: Francisco Olarte (#5)
Re: Creating files with testdata

On March 10, 2025 3:22:41 PM GMT-04:00, Francisco Olarte <folarte@peoplecall.com> wrote:

On Mon, 10 Mar 2025 at 19:17, H <agents@meddatainc.com> wrote:
...

After entering my test data into the markdown file for the given test scenario, I would then run an awk script or similar to create a SQL file with the various CTE INSERT statements. However, it gets complex since I need to handle 1:N relationships between tables in the markdown file...

I hope the above outline is understandable and am interested in comments and thoughts on my above approach.

I do not think MD would be a good source format. Being a developer, I would recommend taking your favorite scripting language (perl / python / lua / js, whatever), building a big object (which can be easily helped by some mini-functions to build a little DSL) and then spitting SQL from it (for development this is normally better than using whatever db connection your language has, as it leaves an understandable SQL script).

I have done this with perl for some projects: I built a driver which defined several helper functions, then dofile("xx.dat") which returned a big hash, and then ran a series of loops on the result to write the SQL in whatever order was necessary.

Francisco Olarte.

The plan is to use the same format later for adding information to the finished application. By the way, all the information is text: paragraphs, sentences or single words.

#9 H
agents@meddatainc.com
In reply to: David G. Johnston (#4)
Re: Creating files with testdata

On March 10, 2025 3:21:57 PM GMT-04:00, "David G. Johnston" <david.g.johnston@gmail.com> wrote:

On Mon, Mar 10, 2025 at 12:17 PM H <agents@meddatainc.com> wrote:

There are tables referencing each other using randomly generated IDs, i.e. those IDs are not known until after the parent table row is inserted.

I just reserve half of the number space of bigint, the negatives, for test data and assign known IDs as part of the test data specification.

David J.

The finished application will also use UUIDs, which is why I am using them now.

#10 H
agents@meddatainc.com
In reply to: H (#8)
Re: Creating files with testdata

On March 10, 2025 5:09:46 PM GMT-04:00, H <agents@meddatainc.com> wrote:

On March 10, 2025 3:22:41 PM GMT-04:00, Francisco Olarte <folarte@peoplecall.com> wrote:

On Mon, 10 Mar 2025 at 19:17, H <agents@meddatainc.com> wrote:
...

After entering my test data into the markdown file for the given test scenario, I would then run an awk script or similar to create a SQL file with the various CTE INSERT statements. However, it gets complex since I need to handle 1:N relationships between tables in the markdown file...

I hope the above outline is understandable and am interested in comments and thoughts on my above approach.

I do not think MD would be a good source format. Being a developer, I would recommend taking your favorite scripting language (perl / python / lua / js, whatever), building a big object (which can be easily helped by some mini-functions to build a little DSL) and then spitting SQL from it (for development this is normally better than using whatever db connection your language has, as it leaves an understandable SQL script).

I have done this with perl for some projects: I built a driver which defined several helper functions, then dofile("xx.dat") which returned a big hash, and then ran a series of loops on the result to write the SQL in whatever order was necessary.

Francisco Olarte.

The plan is to use the same format later for adding information to the finished application. By the way, all the information is text: paragraphs, sentences or single words.

I am also contemplating using Pandoc to process the markdown file. Pandoc builds an AST which can then be processed as desired, whereas awk processes a file line by line, which is not ideal.

#11 H
agents@meddatainc.com
In reply to: H (#3)
Re: Creating files with testdata

On March 10, 2025 3:15:10 PM EDT, H <agents@meddatainc.com> wrote:

On March 10, 2025 2:26:48 PM GMT-04:00, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 3/10/25 11:09 AM, H wrote:

I am developing a complex multi-tenant application in PostgreSQL 16 on Linux. During development I would like to be able to enter test data into various related tables at any given time for testing purposes.

While this can certainly be accomplished by predefined CTE INSERT statements in an SQL file, I would prefer something more flexible. I am thinking of using a markdown file as a source for these CTE INSERT statements. Specifically, having a certain structure in the markdown file where a given markdown heading level, bullet level etc. would correspond to specific tables and columns.

Why CTE INSERTs?

After entering my test data into the markdown file for the given test scenario, I would then run an awk script or similar to create a SQL file with the various CTE INSERT statements. However, it gets complex since I need to handle 1:N relationships between tables in the markdown file...

I hope the above outline is understandable and am interested in comments and thoughts on my above approach.

Thanks.

There are tables referencing each other using randomly generated IDs, i.e. those IDs are not known until after the parent table row is inserted.

I thought I should give an example of why I need this. This is /not/ the application but might be a useful example:

Imagine a very simple Wikipedia clone. You have a couple of DB tables for article subjects at a couple of levels, a table for body text, and a couple of different tables for references and for images related to the article in question. References and images are not referenced inline in the body text but at the end.

Everything is generated manually, i.e. the body text is written, various references added, and a suitable subject area at different levels chosen, all by an author.

I want the writing of this "article", with the appropriate references, images etc., to be done outside this database application. My vision is that it is done in a markdown document where a strict use of heading levels, bullet lists etc. would correlate with subject areas, body text, references, images and what-not.

A script of some kind that can parse markdown then generates the necessary SQL statements to insert all of this into the appropriate tables, all according to how the information was laid out in the markdown document.
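
To make the target output concrete: for one article the generated statement might look roughly like the following, with all table and column names purely illustrative:

WITH subj AS (
    INSERT INTO subject (id, name)
    VALUES (gen_random_uuid(), 'History')
    RETURNING id
), art AS (
    INSERT INTO article (id, subject_id, body)
    SELECT gen_random_uuid(), subj.id, 'Body text of the article.'
    FROM subj
    RETURNING id
)
INSERT INTO reference (id, article_id, citation)
SELECT gen_random_uuid(), art.id, c.citation
FROM art,
     (VALUES ('Reference one'), ('Reference two')) AS c(citation);

Each nesting level in the markdown would contribute one more CTE to the chain, with the generated keys flowing downwards via RETURNING.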

My initial thought was to use awk but it's (awk)ward since awk is strictly line/paragraph oriented. Pandoc, on the other hand, parses markdown very well and builds an AST internally, and an output filter should then be able to generate complex SQL statements such as CTEs.

Hope that might shed some more light on my thought process.

#13 H
agents@meddatainc.com
In reply to: Thiemo Kellner (#12)