Process: From yuml to ddl and test data
I wanted to automate the process from data modelling to testing it with sample data in actual database.
Table of Contents
To describe the data models, I use a very simple domain specific language (DSL) called as yuml
. yuml
is developed and used by some UML tool called yuml.me.
Starting from yuml
the flow of information is as follows:
| yuml -> uml | yuml_to_uml.sh | yuml_to_uml.Rmd |
| yuml -> rdb | yuml_to_rdb.R | yuml_to_rdb.Rmd |
| rdb -> ddl | rdb_to_ddl.R | rdb_to_ddl.Rmd |
| ddl -> data | ddl_to_data.sh | ddl_to_data.Rmd | ex_sql_data_generator_datafiller.Rmd |
| rdb -> data | rdb_to_data.R | rdb_to_data.Rmd |
rdb
is short for “requirements database”. The name is actually a little misleading. Maybe, I should call it as “structural meta database”. rdb
files are some tsv/excel files that store meta data about the final product. I prefer to use excel
instead of tsv/csv
because it is easier to edit and view manually.
yuml -> uml
First step is converting yuml
text to uml
graphs. This is actually not required for data generation. But it is very easy to do and helps me visualize and understand the data model better.
yuml -i datamodel_enum.md -o datamodel_enum.png
This step is explained in yuml_to_uml.
yuml -> rdb
Second step is converting yuml
text to RDB
data.
RDB is an abbreviation that I use for requirements database. I try to store all software specifications as structured data in a relational database format.
Conversion of yuml
to RDB
is done by an R function called yuml_to_rdb
:
rdb = yuml_to_rdb(data_model_dir)
This step is explained in yuml_to_rdb.
rdb -> ddl
Third step is converting RDB
meta data to database schema defined as DDL
statements.
Conversion of RDB
to DDL
is done by an R function called rdb_to_ddl
:
ddl = rdb_to_ddl( data_entity = den, data_field = dfl)
This step is explained in rdb_to_ddl.
Generating Data: ddl -> data
and rdb -> data
Fourth step is generating sample data. This consists of two steps that are independent of each other: ddl -> data
and rdb -> data
In ddl -> data
step, sample data is generated using datafiller
directives embedded as comments inside DDL
statements. This step is explained in ddl_to_data.
In rdb -> data
step, sample data is generated from RDB
data. In general, I generate sample data from DDL
. But some data must be predefined in order to make the system work. This data is mostly special knowledge such as enum values. For example, different types of package, different types of weight units are of this kind. I define this kind of knowledge in RDB
database inside tsv files. Then I generate their SQL
statements automatically using rdb_to_ddl
function.
This step is explained in rdb_to_data