Categorical Databases
Transparent Denormalization
Many data integration tasks require denormalizing a schema (adding redundant attributes) to increase query performance. The different copies of a piece of data must then be kept in sync, an error-prone task. In CQL, equational constraints can transparently enforce that the redundant copies of a piece of data are all the same.
In the CQL example below (built into the IDE under the name Denormalize), the normalized schema contains information about males and their mothers. The denormalized schema contains an additional redundant attribute, the name of each male's mother, as well as an equation specifying how the redundant attribute is derived. When the normalized data is loaded into the denormalized schema, the value of the redundant attribute is automatically computed. The equation linking the redundant data to the master data will be respected by every CQL operation on the denormalized schema, ensuring that the redundant attribute can never fall out of sync.
We begin by specifying a normalized source schema containing males and females, a foreign key indicating the mother of each male, and a string attribute for the name of each male and female:
    typeside Ty = literal {
        java_types
            String = "java.lang.String"
        java_constants
            String = "return input[0]"
    }

    schema NormalizedSchema = literal : Ty {
        entities
            Male
            Female
        foreign_keys
            mother : Male -> Female
        attributes
            female_name : Female -> String
            male_name : Male -> String
    }
Here is some sample data and its view in the IDE:
    instance NormalizedData = literal : NormalizedSchema {
        generators
            Al Bob Charlie : Male
            Ellie Fran : Female
        equations
            Al.male_name = Albert
            Al.mother = Ellie
            Bob.male_name = George
            Bob.mother = Ellie
            Charlie.male_name = Charles
            Charlie.mother = Fran
            Ellie.female_name = Elaine
            Fran.female_name = Francine
    }
Next, we specify the denormalized schema by importing the normalized schema, adding an attribute for each male's mother's name, and an equation stating that the attribute must equal each male's mother's name:
    schema DeNormalizedSchema = literal : Ty {
        imports
            NormalizedSchema
        attributes
            mother_name : Male -> String
        observation_equations
            forall m:Male. mother_name(m) = female_name(mother(m))
    }
Finally, we import the normalized data onto the denormalized schema and view it in the IDE:
    instance DeNormalizedData = literal : DeNormalizedSchema {
        imports
            NormalizedData
    }
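To make the effect of the equation concrete, the following Python sketch models the sample data as plain dictionaries and derives mother_name for each male by following the mother foreign key. It only illustrates the values the equation implies and does not describe how CQL evaluates it internally. The function names denormalize and equation_holds and the dictionary layout are assumptions introduced here; only the data values come from the example above.

```python
# Illustrative model only: plain dictionaries stand in for the CQL instance.
# The helper names below are hypothetical and are not part of CQL.

# Rows of NormalizedData, copied from the example.
males = {
    "Al": {"male_name": "Albert", "mother": "Ellie"},
    "Bob": {"male_name": "George", "mother": "Ellie"},
    "Charlie": {"male_name": "Charles", "mother": "Fran"},
}
females = {
    "Ellie": {"female_name": "Elaine"},
    "Fran": {"female_name": "Francine"},
}


def denormalize(males, females):
    """Return Male rows extended with mother_name(m) = female_name(mother(m))."""
    return {
        key: {**row, "mother_name": females[row["mother"]]["female_name"]}
        for key, row in males.items()
    }


def equation_holds(denormalized, females):
    """Check that the observation equation holds on every Male row."""
    return all(
        row["mother_name"] == females[row["mother"]]["female_name"]
        for row in denormalized.values()
    )


denormalized_males = denormalize(males, females)
for key, row in denormalized_males.items():
    print(key, row["male_name"], row["mother"], row["mother_name"])
```

Under this model, Al and Bob both receive the mother name Elaine, and Charlie receives Francine. A hand-maintained copy of mother_name could drift from female_name, in which case equation_holds would return False. In CQL the equation is part of the schema, so no separate check of this kind is needed.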
A screen shot of the entire development is shown below: