Creating A Repeatable Data Library Process

Creating A Repeatable Data Library Process

In this final article in a series on how small analytics teams can build a self-managed data library for effective data management, I will summarize the previous articles and show how to put it all together into a repeatable process.

A Data Library is Built on a Set of Principles for Data Management, not a Technology Stack 

A data library does not require a specific technology or skill set. Rather, it is built on principles of good data management that are geared toward helping you to work more effectively. These principles are theoretical, but also actionable. The process you build should include whatever steps are necessary to implement your team’s data library principles.

A Data Library has an Informal Data Architecture, with “Ponds” and “Reservoirs”

It is easy to get caught up in how to name things. Naming conventions, including what I suggest, are less important than a purposeful architecture that supports collecting data in the ponds and preparing that data for analysis and reporting in the reservoirs. Your process should include, at a minimum, the work that is required to build automated data ponds, with the construction of the analytics and reporting reservoirs done on an as-needed basis.

A Data Library Balances Speed, Agility, and Cost, and is Built According to Priority

Creating a process with many steps can slow things down, but it ensures you do not cut any corners. The data library concept should help your team do things quickly in the future, but it takes time to build up the library infrastructure at the beginning. Prioritization is important so that if you are able to catalogue only one data source a quarter, you know you are working on the most important sources first.

Building a Repeatable Process with a Checklist

To operationalize your data library, you will need to execute many small steps for each data source. Do your best to think of as many possible steps before you get started and then refine that list as you go. At TechSmith, our current list includes 39 steps. In a given project, some of these are unnecessary and others will be added. Starting with this list ensures that we will think about all of the important aspects and address each data library principle.

TechSmith’s checklist includes some final items that I have not mentioned previously. When you finish cataloguing a data source, you should celebrate! It is an accomplishment that will make your team more effective and enable you to better serve your stakeholders. Tell those stakeholders about it, give them examples of what can be done that was not possible before, and allow them to ideate. This process, along with a quarterly check-in with these stakeholders, leads to exciting new ideas.

Take the time to share knowledge with your team and others as well as doing project clean-up. This may include things such as committing all changes to a GitHub repo or standardizing a folder structure. If you are creating or contributing to open-source development, you should make that available to that community and publicize that it is available.

Creating A Repeatable Data Library Process
  • Top articles, research, podcasts, webinars and more delivered to you monthly.

  • Leave a Comment
    Next Post

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    Big Data, Cloud & DevOps
    Cognitive Load Of Being On Call: 6 Tips To Address It

    If you’ve ever been on call, you’ve probably experienced the pain of being woken up at 4 a.m., unactionable alerts, alerts going to the wrong team, and other unfortunate events. But, there’s an aspect of being on call that is less talked about, but even more ubiquitous – the cognitive load. “Cognitive load” has perhaps

    5 MINUTES READ Continue Reading »
    Big Data, Cloud & DevOps
    How To Refine 360 Customer View With Next Generation Data Matching

    Knowing your customer in the digital age Want to know more about your customers? About their demographics, personal choices, and preferable buying journey? Who do you think is the best source for such insights? You’re right. The customer. But, in a fast-paced world, it is almost impossible to extract all relevant information about a customer

    4 MINUTES READ Continue Reading »
    Big Data, Cloud & DevOps
    3 Ways Businesses Can Use Cloud Computing To The Fullest

    Cloud computing is the anytime, anywhere delivery of IT services like compute, storage, networking, and application software over the internet to end-users. The underlying physical resources, as well as processes, are masked to the end-user, who accesses only the files and apps they want. Companies (usually) pay for only the cloud computing services they use,

    7 MINUTES READ Continue Reading »