Categorical Databases

Getting Started

This page describes how to get started with the open-source CQL IDE.

The suggested introductory course of study includes an introductory paper about CQL for computer scientists, introductory slides and video about CQL from a functional programming perspective, introductory slides and video about CQL from a mathematical perspective, introductory slides about category theory from a knowledge management perspective, a paper from NIST arguing that category theory is critical for IT interoperability, and an introductory textbook on category theory, as well as a data science case study.

See the CQL Wiki for links to JDBC drivers, known limitations, etc, and the CQL Manual for a description of CQL's syntax and an index of the IDE's built-in examples. Once downloaded, double click cql.jar to open, or run java -jar cql.jar at the command line. To get started with CQL once it is running, see the examples below.

Basics

Tutorial
This example (built-in to the IDE with name Tutorial) covers the basics of CQL: typesides, schemas, instances, mappings, queries, and data migration / data exchange / ETL. It also illustrates "literate programming" in CQL, where the CQL file itself contains markdown/HTML that generates a web page describing the CQL file and its execution. This example is recommended for all CQL users.
Importing and exporting CSV files
This example is recommended for anyone dealing with CSV data and is built-in to the IDE as QuickCSV. It imports CSV files about land and water animals over the internet, demonstrates basic CQL operations, and then exports the data as local CSV files.
Importing and exporting SQL data over JDBC
This example is recommended for anyone dealing with SQL data and is built-in to the IDE as QuickSQL. It imports a cloud-based MySQL database about employees and departments into CQL, demonstrates basic CQL operations, and then exports the data into a CQL's in-memory SQL database, H2. Accessing any external SQL database management system in CQL requires adding the vendor's JDBC driver to the java classpath when running CQL; this example requires downloading MySQL connector-j version 5.1.47 and running CQL from the terminal in a directory containing both jar files (on Windows, replace : by ;):
```
java -cp "cql.jar:mysql-connector-java-5.1.47.jar" catdata.ide.IDE
```

SQL vs CQL

Difficult Queries
This example (built-in to the IDE with name vsSQL1) defines a schema about employees and departments, with foreign keys taking each employee to the department they work in, each department to the department's secretary, and each employee to their manager. The problem is to obtain a sub-database (including foreign keys), where every employee works in the same department as their manager and every department's secretary works in that department. This problem is difficult without an automated theorem prover because the programmer must invent from scratch what is essentially an inductive hypothesis sufficient to guarantee closure under the business rules, a problem which is notoriously difficult to solve. In practice on this problem a novice CQL user can try one different query per minute and arrive at a solution in an hour with 100% confidence, a development process similar to interactive theorem proving in Coq.
The dual of join: quotients, tuple merges, co-limits, pushouts, and more
This example (built-in to the IDE with name vsSQL2) defines a schema about animals, land animals, water animals, and amphibians, and the problem is to compute the animals without double-counting the amphibians. The problem is solved using CQL's sigma operation from a co-span into a commutative square. This operation cannot be expressed in relational algebra and is awkward in SQL, leading to the development of specialized chase engines such as CQL.
The Chase

This example (built-in to the IDE with name vsSQL3) defines source table:
```
DeptEmp(deptId int not null, mgrName varchar, empId int not null)
```
and the problem is to populate these target tables:
```
Dept(deptId int not null, mgrId  int not null, mgrName varchar) 
Emp (empId  int not null, wrksIn int not null)
```
such that if (d,m,n) is in Dept, then (m,d) is in Emp; if (e,d) is in Emp, then (d,M,N) is in Dept for some M and N; and if (d,n,e) is in DeptEmp, then (e,d) is in Emp and (d,M,n) is in Dept for some M. This specification is written using a formalism called tuple-generating dependencies, the formal counterpart to many informal visual notations in existing ETL and schema mapping tools.

Further Examples

Path Equations
This example (built in to the IDE with name Employees) defines a schema about employees and departments, with foreign keys taking each employee to the department they work in, each department to the department's secretary, and each employee to their manager. Two path equations enforce that every secretary works in the department they are the secretary for, and that every employee works in the same department as their manager.
Transparent Denormalization
This example defines a normalized CQL schema containing information about males and their mothers. The denormalized CQL schema contains an additional redundant attribute, the name of each male's mother, as well as an equation specifying how the redundant attribute is derived. When the normalized data is loaded into the denormalized schema, the value of the redundant attribute is automatically computed. The equation linking the redundant data to the master data will be respected by every CQL operation on the denormalized schema, ensuring that the redundant attribute can never become out of sync.
Transparent Joins
This example defines a source schema about schools, faculty, and departments, and a CQL query to find everyone who works in a school whose largest department is mathematics. The CQL query does not require any joins; in SQL, the query would require two joins.
Reflexive, Symmetric, Transitive Closure
This example defines a source schema about people and who likes whom and a target schema with a single entity representing groups connected by liking. There exists a single schema mapping from the source to the target, and CQL's sigma operation along this mapping computes the connected groups.
High-assurance User-defined Functions
This example defines a source schema about American airplanes, which have wing spans in inches, and a target schema about European airplanes, which have wing spans in centimeters. A user-defined function specifies how to convert inches to centimeters, and a query which does not convert inches to centimeters is rejected. Although this example uses a relatively simply type conversion, CQL supports user-defined types defined by arbitrary equations.
Compile-time Foreign-key Checking
This example defines a source schema about departments, professors, and students, and a query to find all possible same-department professor-student advising matches. The target schema relates matches, professors, students, and departments through four foreign-keys and an equation enforcing that in every match, the department the professor works in is the same as the department the student is majoring in. A query that incorrectly populates the target schema is rejected at compile time.
More on SQL and CQL

This example defines a source schema about employees and populates a CQL instance on that schema by querying a database using JDBC. To be self-contained, this example creates an in-memory SQL database to import from using CQL's built-in SQL execution engine, H2.

Categorical Databases

Getting Started

Basics

SQL vs CQL

The dual of join: quotients, tuple merges, co-limits, pushouts, and more

Further Examples