Ready to learn Data Science? Browse Data Science Training and Certification courses developed by industry thought leaders and Experfy in Harvard Innovation Lab.
What is Data Modeling?
Data Modeling refers to the practice of documenting software and business system design. A Data Model is used to document, define, organize, and show how the data structures within a given database, architecture, application, or platform are connected, stored, accessed, and processed within the given system and between other systems. ~ Wikipedia
- Define and analyze data requirements.
- Define logical and physical structures that support these requirements.”
Why Data Modeling is required?
There are following benefits of using data modeling to store your business data:
- To Manage Data as a Resource: without a good data model, you can find yourself in the possession of a great deal of data, and with no efficient way – or no way at all – to make use of it
- To Integrate Existing Information Systems: by modeling the data in variety of systems, you can see relationships and redundancies, resolve discrepancies, and integrate disparate systems so they can work together
- To Design Databases and Repositories: by modeling your data, you can also drive better decisions about data warehousing and repositories
- Understanding the Business: process of data modeling requires you and your teams to understand detail how the business works in order to define the data that drives it
- Business Intelligence: using proper modeling and reporting, you can spot business trends, spending patterns, and make predictions that will help your business navigate challenges and opportunities
- Knowledge Transfer: data modeling is a form of documentation, both for business stakeholders and technical experts
How to do Data Modeling?
Lets understand how to model your business data. There are three primary types of data models: conceptual, logical and physical.
- A conceptual data model is a model of the things in the business and the relationships among them.
- A logical data model is a fully attributed data model that is fully normalized.
- A physical data model represents the actual structure of a database— mainly tables and columns.
From logical data modeling perspective, there are majorly three types of model:
- ER model: entity relationship modeling is done for operational data stores.
- Star/Snowflake model: star/snowflake schema is built for data warehosing puspose.
- ETL/ELT model: extract, transform & load model is built to transfer data from one model to another model
Others are:
- Object-relational model: this model is relational but with objects, classes & inheritance properties
- Generic model: this is generalization of conventional data models
- Semantic model: this model is the abstraction of real world entities and their relationship
Case Study: Rathi Pizza Inc
Lets say we need to build data models for our pizza business, what kind of data models do we need to build? First, an operational data model, which will keep track of operational activities of our business, it will be based on ER modeling. A typical operation is a customer visits one of our store and places an order of a pizza and get it. After this, lets say management wants to know how good our business is running, which of our stores are performing good, which requires improvements? Which products are having good sales and which aren't? Which age-group of customer is preferring what kind of pizza? To answer these questions we need a data warehouse on top of which we can build an analytics platform. The data model that we would use to build this data warehouse would be of star or snowflake model. Now, how would operational data would be transferred to our data warehouse? And both models are different so how our data will transform from ER to star/snowflake model? ETL or ELT model will take care of this problem.