Do you understand and trust the data that you have?
Organizations that can’t confidently answer this question likely have a business case for DataOps. The term has been making the rounds in the industry recently, but the concept is often misunderstood.
According to Gartner, “DataOps is a collaborative data management practice focused on improving the communication, integration, and automation of data flows between data managers and data consumers across an organization.” It is a practice that combines technology with people and processes, not a technology or product in itself.
For today’s organizations, DataOps is laying the groundwork for AI and analytics opportunities and alleviating the associated data management and process optimization bottlenecks, because a data bottleneck is an AI bottleneck.
Ronald van Loon is an IBM partner and has been at the forefront of emerging industry developments for over 25 years, which gives him a unique perspective on the latest advancements in AI presented at IBM Think 2020.
IBM Think 2020 is an event that brings together collaborators, partners and clients to provide them with the technical education they need. IBM Think has previously been a physical, face-to-face event, but the 2020 in-person event was cancelled because of the global pandemic and the resulting regulations. It was transformed into IBM Think Digital, an immersive two-day digital event focused on essential recovery and transformation through in-depth education and targeted collaborative discussion.
AI thrives on high-quality, trustworthy data, but data collection and preparation are the most challenging and time-consuming parts of AI.
An IBM study found that 95% of organizations see negative impacts from poor data quality, resulting in wasted resources and additional costs. There is also an 80/80 problem to contend with: in 80% of data pipeline projects, 80% of users’ time is spent on data preparation or data operations.
Data readiness is crucial to optimized AI outcomes. Enter DataOps.
What is DataOps?
In the simplest terms, DataOps fast-tracks the delivery of high-quality, business-ready data to operations, applications, data citizens and AI by orchestrating people, processes and technology. It provides a way to infuse data collection, organization, and analysis into business processes, opening the door to a more natural acceleration of AI initiatives.
Many organizations start a DataOps project by identifying and isolating the bottlenecks in their data pipeline so they can eliminate them and iterate more quickly. This serves the main objective of DataOps: enabling faster iterations, ensuring teams have adequate time to examine data, and generating insights that can lead to positive change or innovation.
DataOps accomplishes this in a few key ways:
- Implements methodologies that help streamline data pipeline processes.
- Automates traditionally manual core operations.
- Integrates agile and workflow processes to shorten time to iteration.
- Offers a view of the data pipeline from source to target by delving into data sources and consumers.
- Incorporates automated test data generation and management.
- Facilitates communication and collaboration between stakeholders.
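To make “automates traditionally manual core operations” and automated testing more concrete, here is a minimal sketch of rule-based data quality checks that a pipeline could run on every batch before publishing it. It also shows one simple way to express data quality as a percentage (the share of records passing every rule); the article does not say how the bank figures below were measured. The rule names, sample records, and the 0.9 threshold are hypothetical illustrations, not part of any IBM product or of the Gartner definition.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A named data quality rule evaluated against a single record (dict)."""
    name: str
    check: Callable[[dict], bool]


def valid_age(record: dict) -> bool:
    age = record.get("age")
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(age, int) and not isinstance(age, bool) and 0 <= age <= 120


# Hypothetical rules for a customer table; real rules would come from your own data contracts.
RULES = [
    Rule("customer_id present", lambda r: bool(r.get("customer_id"))),
    Rule("email contains @", lambda r: "@" in str(r.get("email", ""))),
    Rule("age between 0 and 120", valid_age),
]


def quality_score(records: list, rules: list) -> float:
    """Share of records passing every rule; an empty batch scores 1.0."""
    if not records:
        return 1.0
    passed = sum(all(rule.check(r) for rule in rules) for r in records)
    return passed / len(records)


def failures_by_rule(records: list, rules: list) -> dict:
    """Count failing records per rule, to show why a batch scored low."""
    return {rule.name: sum(not rule.check(r) for r in records) for rule in rules}


def gate(records: list, rules: list, threshold: float = 0.9) -> bool:
    """Automated check a pipeline could run before publishing a batch downstream."""
    return quality_score(records, rules) >= threshold


# Hypothetical incoming batch: two clean records, two with problems.
batch = [
    {"customer_id": "C1", "email": "a@example.com", "age": 34},
    {"customer_id": "C2", "email": "b@example.com", "age": 51},
    {"customer_id": "", "email": "c@example.com", "age": 29},
    {"customer_id": "C4", "email": "not-an-email", "age": 200},
]

print(quality_score(batch, RULES))     # 0.5
print(failures_by_rule(batch, RULES))
print(gate(batch, RULES))              # False
```

Keeping each rule as a named object means the same list can drive both the pass/fail gate and a per-rule failure report, which is what lets a team see where to fix data rather than just that it failed.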
In action, DataOps has delivered drastic results. One bank that had no formal data quality program adopted DataOps and saw its data quality rise from 6% to 93%, improving its Net Promoter Score by 230 times. In a competitive banking market, a data quality improvement of this extent has a powerful business impact.
In another example, consider a business that uses its data for customer affinity analysis or inventory stock positions. If it takes more than 3 weeks to change the underlying data model or data asset, the marketing and stock teams face the same delay before they have the information they need. A major European retailer under pressure from huge competitors such as Amazon used a DataOps program to automate this process: data changes now occur in under 2 minutes, customer affinity analysis is completed within the same day, and inventory stock positions are available in under 4 hours.
The DataOps Role in AI and Automation
Every business has a unique vision or goal for AI, whether it’s improving predictions, automating mundane tasks, freeing up employees to do more fulfilling work, or optimizing processes. But in many cases, there’s no better purpose for AI than understanding your environment, interpreting what your systems are saying through their data, and discovering issues before they snowball into full-blown outages.
Organizations lose about $26.5 billion in revenue because of IT system outages. IBM’s Watson AIOps learns the systems, their normal behaviors, and their acceptable ranges, and provides alerts when a problem arises. In effect, it’s a nervous system that allows CIOs to effectively manage all of their systems.
Because data scientists often lament limited data access and a lack of shared visibility into data across the team, a solution such as this enables faster, more proactive responses.
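The idea of learning a system’s normal behavior and acceptable range, then alerting when a reading falls outside it, can be illustrated with a simple statistical baseline. This is a minimal sketch assuming a single metric and a mean ± k standard deviations band; it is not how Watson AIOps works internally (the article does not describe that), and the response-time values and k = 3 are hypothetical.

```python
import statistics


def learn_range(history: list, k: float = 3.0) -> tuple:
    """Learn an acceptable range as mean +/- k population standard deviations."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    return mean - k * std, mean + k * std


def alerts(readings: list, low: float, high: float) -> list:
    """Return (index, value) for every reading outside the learned range."""
    return [(i, v) for i, v in enumerate(readings) if not (low <= v <= high)]


# Hypothetical response times (ms) observed during normal operation.
baseline = [120, 118, 125, 122, 119, 121, 124, 120, 123, 118]
low, high = learn_range(baseline)

# New readings arriving from the monitored system.
incoming = [121, 119, 310, 122]
print(alerts(incoming, low, high))  # [(2, 310)]
```

A fixed band like this is the simplest possible baseline; it ignores seasonality and correlations between metrics, which is exactly the kind of context a dedicated AIOps tool aims to learn.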
AI-enabled automation is integral to DataOps for more than replacing manual steps: it also supports governance processes, data curation, metadata assignment, and making data available for self-service. This helps to operationalize consistent, high-quality data throughout the entire enterprise.
Organizations that have adopted DataOps technology report proven benefits from automation in areas such as data inventory:
- An 85% reduction in the time required to build a business glossary.
- A 90% reduction in the time required to identify metadata and assign terms.
- 200,000 assets across multiple clouds discovered in less than 5 minutes.
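Automated metadata identification and term assignment can be pictured as classifying columns by the values they contain. The sketch below assigns hypothetical business glossary terms to a column when most of its sampled values match a regular expression. The term names, patterns, and the 0.8 match threshold are assumptions for illustration; real catalog tools may use very different techniques.

```python
import re
from typing import Optional

# Hypothetical business glossary terms and the value patterns that identify them.
GLOSSARY_PATTERNS = {
    "Email Address": re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.IGNORECASE),
    "US ZIP Code": re.compile(r"\d{5}(-\d{4})?"),
    "ISO Date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def assign_terms(columns: dict, min_match: float = 0.8) -> dict:
    """Map each column to the first glossary term whose pattern fully matches
    at least `min_match` of its non-empty sampled values, or to None."""
    result: dict = {}
    for name, values in columns.items():
        sample = [v for v in values if v]
        term_found: Optional[str] = None
        for term, pattern in GLOSSARY_PATTERNS.items():
            if not sample:
                break
            share = sum(bool(pattern.fullmatch(v)) for v in sample) / len(sample)
            if share >= min_match:
                term_found = term
                break
        result[name] = term_found
    return result


# Hypothetical sampled values from an unlabeled table.
sample_table = {
    "contact": ["ann@example.com", "bo@example.org", "cy@example.net"],
    "postcode": ["10001", "94105-1234", "60601"],
    "signup": ["2020-05-05", "2020-05-06", "n/a"],
    "notes": ["call back", "", "vip"],
}

print(assign_terms(sample_table))
# {'contact': 'Email Address', 'postcode': 'US ZIP Code', 'signup': None, 'notes': None}
```

Here `signup` stays unclassified because only two of its three sampled values look like dates; lowering `min_match` would tag it, which is the usual trade-off between coverage and false positives in automated classification.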
But keep in mind that all of this AI and automation success hinges on the right Information Architecture (IA), which is about building trust in your data and enhancing the capabilities needed to organize, structure, and label your information. Those capabilities include:
- Data governance
- Data quality and master data management
- Data integration, replication, and virtualization
- Self-service interaction for data prep and testing
AI pioneers are 8 times more likely to have a robust data architecture, which shows why IA is the prime ingredient for AI. Organizations can simplify the path to mastering AI through a prescriptive approach that IBM calls the “AI ladder”:
- Collect: Ensure data is simple and accessible.
- Organize: Prepare your data for analytics with trusted, high-quality data that can be put to use.
- Analyze: Build AI-ready capabilities.
- Infuse: Integrate AI into daily processes. You need the skills and capabilities to scale, train, analyze and test data and models.
- Modernize: Support modern digital business processes by using data with cloud capabilities across your AI journey.
IBM’s AI ladder turns the often overwhelming and complex challenge of deriving real business value from AI into an approachable, step-by-step process. Just as the term ladder implies, the organization climbs one rung at a time: gathering, organizing, analyzing, and infusing AI across the entire organization.
Though an AI journey is not a linear path and differs for every company, the AI ladder is a blueprint that makes it simpler for an organization to evaluate its AI readiness, identify capability gaps or weak points, and see which areas are particularly strong. You can also find more information on the AI ladder here.
How to Start and How to Evolve DataOps Maturity
Organizations can start by introducing DataOps practices in a focused way, which lets data and analytics leaders move toward faster, more agile, and more dependable data pipeline delivery.
For businesses starting from the basics without any DataOps:
Establish a catalogue to lay the groundwork for DataOps, for example by creating a data quality program that uses spreadsheets to monitor metadata, implementing data visualization, increasing communication, and hand-coding data integration.
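For teams starting with spreadsheets, one way to begin monitoring metadata is to profile each table and export the results as CSV, which opens in any spreadsheet tool. This is a minimal, hypothetical sketch using only the Python standard library; the `orders` table, its columns, and the profile fields (row, null, and distinct counts) are illustrative assumptions rather than a prescribed catalogue format.

```python
import csv
import io

FIELDS = ["table", "column", "rows", "nulls", "distinct"]


def profile_columns(table_name: str, rows: list) -> list:
    """Build one catalogue entry per column with row, null, and distinct counts."""
    columns = list(rows[0].keys()) if rows else []
    catalogue = []
    for col in columns:
        values = [r.get(col) for r in rows]
        # Treat missing values and empty strings as nulls.
        non_null = [v for v in values if v not in (None, "")]
        catalogue.append({
            "table": table_name,
            "column": col,
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        })
    return catalogue


def to_csv(catalogue: list) -> str:
    """Render catalogue entries as CSV text for a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(catalogue)
    return buf.getvalue()


# Hypothetical extract of an orders table.
orders = [
    {"order_id": "1", "customer_id": "C1", "amount": "19.99"},
    {"order_id": "2", "customer_id": "C1", "amount": ""},
    {"order_id": "3", "customer_id": "C2", "amount": "5.00"},
]

catalogue = profile_columns("orders", orders)
print(to_csv(catalogue))
```

To keep a history, the CSV text can be written to a dated file (opened with `newline=""`, as the `csv` module recommends), so that a rising null count in a column shows up as a visible change between snapshots.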
For businesses with some developed skills:
Explore an enterprise catalogue deployment to continue accelerating DataOps maturity. Those with a data governance program can look to data stewardship and a monitored, governed business glossary. Data usage then becomes more self-service, leading to advanced DataOps, where everything becomes turnkey.
At this stage of DataOps maturity, organizations can enhance their catalogues with third-party data and business terminology, and have a formal compliance program in place with well-developed automated classifications that are part of their Software Development Life Cycle (SDLC) processes.
Every organization will start differently, but DataOps progression will always lead to more business value by delivering business-ready data to users faster.
Future-Ready Your Data
As organizations prepare for an AI-enabled future, they must master the basics of data and create a smoother, faster path for delivering trusted, organized, high-quality data through the data pipeline.
In a career first, Ronald binge-watched all of the IBM Think event videos in one sitting, drawn in by the many interesting sessions that kept him riveted and eager to keep watching. The DataOps introductory video and the Rob Thomas interview were especially thought-provoking. All of the IBM Think event videos are available on demand in their entirety here.