What is Data Modeling?
Data modeling is the exploration of the structuring of your data. The exercise is astonishingly similar to designing and constructing buildings;
Depending on what type of utility you want to serve (libraries vs opera houses vs gyms) and what interactions you want to optimize (company HQ for employees, residential buildings such as high rises or single family homes), you plan your walls, optimize flows within the structure and finally, you choose the material that can turn the design into reality!
All this applies directly to data modeling as well! The exercise involves understanding both the application requirements and the underlying database platform. Application logic and its data flow leads to the entities and relationships in the data model. The database platform provides the “material” to represent the logical entities with methods to store and retrieve instances of the entities. Materials can make a huge amount of difference in how fast you can construct your design and how easy it can be to reconfigure parts of the builds to fit the tenants! For the rest of the paper, we’ll focus on a new material: document databases and the flexibilities it brings to constructing and evolving your data model.
I will assume most of you are familiar with relational databases and the general properties of the “material” there. RDBMSs provide tables with columns to represent entities and require advance declaration of schema before storing data. The “tenants” that want their own customizations find it hard to evolve this pre-declared schema. A “product catalogue” table can be a nightmare for relational databases. Products share properties like “price” but representing properties of a sports car in one row and vacation package in another row require difficult acrobatics in RDBMS world. On the other hand, document databases use JSON formatted documents and allow great flexibility. Each document carries its own schema and can nest complex structures like “embedded documents” and “array of documents”. You can easily represent the attributes like colour for the car in one document and represent valid travel dates for the vacation package in another document and let the application drive the schema evolution without requiring in-advance declarations.
One last important point before I get started: Some relational and NoSQL databases retrofit JSON type as a data type into their type system (see SQL Server JSON support, Oracle JSON support or late addition of JSON Cassandra JSON support) and shred it to fit into the native data model of the underlying system. Instead, I will be focusing on native document databases like MongoDB, Couchbase and alike.
Data Modeling with Document Databases
Data modeling exercise in document databases isn’t all that different from what you do with relational schema based databases. One goes through the same 2 phases in either case:
Logical data modeling: This phase focuses on describing your entities, their attributes and relationships. Logical data modeling phase is independent of the underlying containers your database platform provides.
Physical data modeling: In this phase you take the logical design and apply the entities and relationships to the containers provided to you by the document database. Based on the access patterns, performance requirements and atomicity and consistency requirements, you choose which type of containers to use to represent your logical data model.
Logical Data Modeling
The logical data modeling phase focuses on describing your entities and relationships. Logical data modeling is done independently of the requirements and facilities of the underlying database platform. You can find various methods of data modeling here: https://en.wikipedia.org/wiki/Data_modeling.
At a high level, the outcome of this phase is a set of entities (objects) and their attributes that are central to your application’s objectives, as well as a description of the relationships between these entities. Imagine an aerospace application: entities may be “satellite”, “module” and “instrument,” where their relationships might be “satellites carry many modules, which in turn are made up of many instruments”. Relationships can vary in type: one-to-many or many-to-many or one-to-one.