What is Data?
Data is a collection of facts (numbers, words, measurements, observations, etc.) that has been translated into a form that computers can process.
Data is a set of values of qualitative or quantitative variables. Pieces of data are individual pieces of information. While the concept of data is commonly associated with scientific research, data is collected by a huge range of organizations and institutions. ~ Wikipedia
Every object has attributes that can be described in a qualitative or quantitative way. For example, if you go to a grocery store and look at chocolates, each one has a type, brand, color, shape, weight and packaging. If we collect these attributes for all the chocolates in the store, that set of items with their attributes is data.
Data is most often represented in tabular form, but it can take other structures, and some data has no fixed structure at all. We will discuss this in detail in coming posts.
Why is Data important?
Data has become more important than ever because it lets us make informed decisions and improve operations. We can only improve the things and activities we can measure, and whatever we measure is described in the form of data. Once data about things and activities is collected, processed and analysed, it can generate new insights that give a competitive advantage, which is why data is becoming a game changer in almost every business.
How to exploit data?
Variables are measurements, characteristics or attributes of an item. Some are quantitative: we can measure the height of a person, or the amount of time a person stays on a website. Others are qualitative: the pages the person looks at on the website, or whether we think the visitor is a man or a woman.
Data can be exploited by using appropriate mechanisms for collecting relevant data, processing it and analyzing it further to generate insights. Thanks to ever-evolving storage and computing techniques, it is now possible to use data to visualize, analyze and predict outcomes. These activities can give an immense competitive advantage in today's business world.
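To make the distinction concrete, here is a minimal sketch in Python. The visit records, the field names (seconds_on_site, landing_page) and the helper functions are made up for illustration. It only shows that a quantitative variable is usually summarized with a number such as the mean, while a qualitative one is summarized by counting categories.

```python
from collections import Counter
from statistics import mean

# Hypothetical website visits; field names and values are illustrative only.
visits = [
    {"seconds_on_site": 120, "landing_page": "home"},
    {"seconds_on_site": 45, "landing_page": "pricing"},
    {"seconds_on_site": 300, "landing_page": "home"},
]


def is_quantitative(values):
    """Treat a variable as quantitative when every value is a number."""
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


def summarize(records, variable):
    """Summarize one variable: mean if quantitative, category counts otherwise."""
    values = [record[variable] for record in records]
    if is_quantitative(values):
        return {"kind": "quantitative", "mean": mean(values)}
    return {"kind": "qualitative", "counts": dict(Counter(values))}


time_summary = summarize(visits, "seconds_on_site")
page_summary = summarize(visits, "landing_page")
print(time_summary)
print(page_summary)
```

Checking the value type is a deliberately simple rule; real data sets often store categories as numbers (for example, zip codes), so in practice the kind of each variable is usually declared rather than guessed.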
Case Study: Rathi Pizza Inc
Let's take a hypothetical case study to understand data basics throughout this series. Rathi Pizza Inc is a big restaurant chain, known around the world for making and selling delicious pizzas. What data is involved in a pizza? A pizza has attributes like base type, toppings, size, price, etc.
pizza id | pizza type | base type | toppings | size | price
1 | Pan | Cheese Burst | Pepperoni | Small | 10
2 | Greek | Hand Tossed | Mushrooms | Medium | 20
3 | Italian | Thin Crust | Onions | Large | 30
4 | Veg | Cheese Burst | Bacon | Small | 10
In the above data set, we can see measurements of different attributes of 4 pizzas. We can analyse this data to understand the distribution of these attributes and the relationships between them, which can help us generate further insights.
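As a rough sketch of what such an analysis could look like, the Python snippet below loads the same four pizzas as a list of dictionaries and computes one distribution (how often each base type occurs) and one relationship (average price per size). The underscore-style field names are an assumption made for the code; the values come from the data set above.

```python
from collections import Counter
from statistics import mean

# The four pizzas from the case study; field names are adapted for Python.
pizzas = [
    {"pizza_id": 1, "pizza_type": "Pan", "base_type": "Cheese Burst",
     "toppings": "Pepperoni", "size": "Small", "price": 10},
    {"pizza_id": 2, "pizza_type": "Greek", "base_type": "Hand Tossed",
     "toppings": "Mushrooms", "size": "Medium", "price": 20},
    {"pizza_id": 3, "pizza_type": "Italian", "base_type": "Thin Crust",
     "toppings": "Onions", "size": "Large", "price": 30},
    {"pizza_id": 4, "pizza_type": "Veg", "base_type": "Cheese Burst",
     "toppings": "Bacon", "size": "Small", "price": 10},
]

# Distribution of a qualitative attribute: how many pizzas use each base type.
base_counts = Counter(p["base_type"] for p in pizzas)

# Summary of a quantitative attribute: the average price across all pizzas.
average_price = mean(p["price"] for p in pizzas)

# Relationship between two attributes: average price for each size.
prices_by_size = {}
for p in pizzas:
    prices_by_size.setdefault(p["size"], []).append(p["price"])
mean_price_by_size = {size: mean(prices) for size, prices in prices_by_size.items()}

print(base_counts)
print(average_price)
print(mean_price_by_size)
```

With only four rows the numbers are trivial, but the same three questions (how is one attribute distributed, what is its typical value, how does it vary with another attribute) carry over to much larger data sets.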
What is Data Architecture?
Data architecture is a set of rules, policies, standards and models that govern and define the type of data collected and how it is used, stored, managed and integrated within an organization and its database systems.
Data Architecture (the thing) is the way in which information flows around the organisation: what is plumbed from where to where. The picture of pipes at the top of the page is there for a reason.
Data Architecture (the discipline) is the effort to control it – the design, the models, policies, rules, standards, etc. Anything that designs the pipework and tries to get the contents (the data) to the right place at the right time.
Data Architecture is as much a business decision as it is a technical one, as new business models and entirely new ways of working are driven by data and information.
The Data Architect is the person who practises the discipline in order to control the thing.
Why is Data Architecture required?
If you want to leverage and operationalize data proactively, you need to invest in your underlying data architecture and compile the information map for your organization. Data quality is more important now than ever before, and data should be categorized and correlated to validate that it is meaningful to the business.
A solid information architecture will also set up your foundation for a data governance program. You have to know what the data is and assign business meaning to it, with the proper terminology. You can define what information is considered sensitive, and run audits against it.
In the age of Big Data, the ability to visually model and map out all of the data from your various sources, and to track data lineage between them, can help you understand the information in the organization and build quality into the data process. To effectively assemble and utilize the information, you need a business-driven data architecture design.
How to define & build Data Architecture?
To create the data architecture, one first has to define the business's information needs and then build four views of the data.
Building Contextual View: The contextual view describes graphically the interaction of the system with the various entities in its environment. The interactions consist of data flows from and to such entities. The contextual view clarifies the boundary of the system.
Building Conceptual View: The conceptual view is a high-level description of a business's informational needs. It typically includes only the main concepts and the main relationships among them.
Building Logical View: The logical view describes a specific problem domain independently of any particular technology or product, but in terms of data structures such as relational tables and columns, object-oriented classes, or XML tags.
Building Physical View: The physical view is how and where the information resides. It is a technical description of the implementation of the logical view.
Case Study: Rathi Pizza Inc
Let's get back to our case study and apply what we have learnt. We need to manage our company's data, and data architecture is an integral part of data management. First, we will build the contextual view to identify our architecture's boundary and the external systems it will interact with. Then we will build the conceptual view of our data to identify the major entities and their relationships. Once this is done, we will move to the logical view of our data architecture to identify the attributes of each entity, the relationships between them, and how to group or regroup these attributes into entities to serve the purpose. The last step is to build the physical view, where we look at the technology and infrastructure we need to support our data architecture.
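As a small preview of the last two steps, here is a hedged sketch in Python. It is not the design this series will arrive at, only an illustration of the difference between the views. The Pizza class stands in for a logical view (an entity and its attributes, with no storage technology implied). The SQLite table is one possible physical view of it. The table name, column types and the choice of SQLite are all assumptions made for the example.

```python
import sqlite3
from dataclasses import astuple, dataclass


# Logical view: the entity and its attributes, independent of any product.
@dataclass
class Pizza:
    pizza_id: int
    pizza_type: str
    base_type: str
    toppings: str
    size: str
    price: float


# Physical view: one possible implementation, here an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pizza (
        pizza_id   INTEGER PRIMARY KEY,
        pizza_type TEXT NOT NULL,
        base_type  TEXT NOT NULL,
        toppings   TEXT NOT NULL,
        size       TEXT NOT NULL,
        price      REAL NOT NULL
    )
    """
)

menu = [
    Pizza(1, "Pan", "Cheese Burst", "Pepperoni", "Small", 10),
    Pizza(2, "Greek", "Hand Tossed", "Mushrooms", "Medium", 20),
    Pizza(3, "Italian", "Thin Crust", "Onions", "Large", 30),
    Pizza(4, "Veg", "Cheese Burst", "Bacon", "Small", 10),
]

# Store the logical records in the physical table.
conn.executemany(
    "INSERT INTO pizza VALUES (?, ?, ?, ?, ?, ?)",
    [astuple(p) for p in menu],
)
conn.commit()

# Query the physical store: which pizza types are available in size Small?
small_pizzas = conn.execute(
    "SELECT pizza_type FROM pizza WHERE size = ? ORDER BY pizza_id",
    ("Small",),
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM pizza").fetchone()[0]
print(small_pizzas, row_count)
```

The point of keeping the two apart is that the same logical Pizza entity could just as well be stored in a different database or file format without changing what the business means by a pizza.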