As enterprises scale, they develop data inefficiencies. Some of these inefficiencies arise internally, between departments or branches; others emerge as the parent company acquires smaller companies, each with its own databases and IT department. In either case, communication issues ensue as siloed databases and IT redundancies begin costing the enterprise non-trivial sums of money over time.
When these inefficiencies reach a breaking point, enterprises can proceed in one of three ways: 1) Maintain a distributed network of specialized databases; 2) Shift to a centralized database; or 3) Transition gradually to a federated system.
For medium-to-large-scale enterprises, distributed systems are unsustainable and centralized databases expensive to implement. Gradually transitioning to a federated system is, therefore, an efficient solution. Downsizing the satellite databases while building a central database allows large firms to eliminate unnecessary personnel, equipment, and facilities gradually and strategically, increasing the odds of success.
The Cost of Revolution
The problem with a swift, complete switch to a new centralized database is that core systems and processes will have to go offline or be placed on hold while preparations are underway. Such cutovers are risky and can lead to disaster if not managed well. Smaller companies can make these kinds of sudden switches from one system to another fairly quickly and without much risk to productivity. Large-scale enterprises, however, move more slowly and could require weeks, months, or even years to overhaul a database. Slowed productivity over an extended period can cost the parent entity dearly while causing personnel to lose faith in the project. Companies that attempt this approach often find themselves reverting to the old distributed system in an attempt to protect their bottom line.
Evolution Playbook
A federated system, by contrast, offers all the efficiency of a centralized database with limited risk to productivity. The incremental shift supports essential processes while gradually reducing redundancy and increasing overall efficiency. Here are some best practices companies should consider to ensure the process goes smoothly.
- Start small and low stakes. Each subsidiary database is going to have some combination of highly specialized data specific to its branch or company as well as redundant data. Start with this general, low-stakes subset and move that to a centralized location first. This way, operational costs will stay low even if the federated system encounters issues.
Data identifiers are typically a good place to start in this regard. Coming up with standards for data identifiers and porting those identifiers into the centralized database makes for a productive initial project because it lays the groundwork for a basic inventory and cross reference common to all constituent databases.
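As a rough illustration, identifier standardization can be sketched as a small mapping exercise. Everything below is hypothetical: the prefix scheme, the subsidiary names, and the legacy IDs are invented for the example, not drawn from any particular enterprise system.

```python
# A minimal sketch of identifier standardization across constituent
# databases. Prefix scheme, subsidiary names, and legacy IDs are
# hypothetical.

def standardize_id(source_system: str, legacy_id: str) -> str:
    """Build a namespaced, enterprise-wide identifier from a legacy one."""
    return f"{source_system.upper()}-{legacy_id.strip().zfill(8)}"

def build_cross_reference(legacy_records: dict) -> dict:
    """Map every standardized ID back to its legacy form, giving the
    central database a basic inventory and cross reference."""
    xref = {}
    for system, ids in legacy_records.items():
        for legacy_id in ids:
            xref[standardize_id(system, legacy_id)] = legacy_id
    return xref

# Hypothetical legacy IDs from two acquired subsidiaries:
legacy = {"acme": ["1042", "77"], "globex": ["C-9"]}
xref = build_cross_reference(legacy)
# The central database now holds a uniform inventory,
# e.g. "ACME-00001042" maps back to legacy ID "1042".
```

The point of the cross reference is exactly the zip-code lesson below: old identifiers keep working where they fit the new scheme, and the mapping records where new ones had to be assigned.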
The United States’ implementation of zip codes in the early 1960s illustrates how universal identification standards improve efficiency. The earliest incarnation of zip codes began as a means of dealing with a shortage of trained personnel in the wake of World War II. When business mail began to flood the system in later years, the Post Office Department unified the codes to make mail sorting more efficient with limited staff. Some zip codes from the legacy system remained while “in cases where the old zones failed to fit within the delivery areas, new numbers had to be assigned.” Enterprises can realize these same advantages by establishing and enforcing universal naming and identification conventions.
- Back up legacy data. Keep backups of any legacy data you move or modify as it's transferred to the central system. These backups will help your team audit the transfer for errors and provide a failsafe in the event you need to access or revert to the old data sets.
The Information Systems Audit and Control Association (ISACA) recommends in its best practices that businesses develop a backup retention policy, which would include a retention period for legacy records. Eventually, the federated system will be stable enough for the legacy records to be completely expunged (in accordance with any applicable laws and/or regulations) and the space freed up for backups of the new system.
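A retention policy ultimately reduces to a simple check: has this backup outlived its mandated retention period? Here is a minimal sketch of that check; the seven-year period is a hypothetical placeholder, and a real policy must follow whatever laws and regulations apply.

```python
# A minimal sketch of a backup retention check. The seven-year
# retention period is hypothetical; real periods are dictated by
# applicable laws, regulations, and the organization's own policy.
from datetime import date, timedelta

RETENTION_DAYS = 365 * 7  # hypothetical seven-year retention period

def eligible_for_expunge(backup_date: date, today: date,
                         retention_days: int = RETENTION_DAYS) -> bool:
    """A legacy backup may be expunged only after its retention
    period has fully lapsed."""
    return today - backup_date > timedelta(days=retention_days)

# A backup taken in 2015 has outlived a seven-year policy by mid-2023:
old = eligible_for_expunge(date(2015, 1, 1), date(2023, 6, 1))      # True
recent = eligible_for_expunge(date(2022, 1, 1), date(2023, 6, 1))   # False
```

Automating the check this way keeps expungement consistent and auditable, rather than leaving it to ad hoc cleanup.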
- Train and transition your IT staff. Don’t underestimate your distributed database administrators. Their specialized knowledge can be an asset to the federated system, provided you offer them adequate training. Pull talent from within your constituent companies for best results with the transition, and make an effort to reassign rather than release existing employees whenever possible.
- Downsize, don’t eliminate. Specialized data specific to constituent companies should remain in a distributed system. This, ultimately, is the most stable option, as no one will understand that data better than the company or constituent to which it pertains. Redundant data should be made federated, but the goal isn’t for the federated database to become the only database. The goal is to establish a central, single source of truth for generally pertinent data.
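The division of labor described above amounts to a routing rule: shared keys resolve against the central single source of truth, while specialized keys stay with the constituent that owns them. The sketch below illustrates that rule with in-memory dictionaries standing in for real databases; the store names and keys are invented for the example.

```python
# A minimal sketch of federated routing: general, shared data resolves
# against the central store, while specialized data stays with the
# constituent that owns it. Store names and keys are hypothetical.

central_store = {"customer:1001": {"name": "Acme Corp"}}       # single source of truth
local_stores = {
    "manufacturing": {"machine:7": {"calibration": 0.98}},     # specialized data
}

def federated_get(key, constituent=None):
    """Serve shared keys from the central store; fall back to the
    owning constituent's local store for specialized keys."""
    if key in central_store:
        return central_store[key]
    if constituent is not None:
        return local_stores.get(constituent, {}).get(key)
    return None

shared = federated_get("customer:1001")                  # served centrally
special = federated_get("machine:7", "manufacturing")    # served locally
```

Note that the central store never needs to absorb the specialized records for the lookup to work; it only has to own the data every constituent shares.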
Those in software development see federated databases in action with Git, a version-control application for tracking changes to code repositories. Each engineer maintains their own files, committing changes to the main project and adhering to company-wide coding standards in the process. This gives each developer the freedom to organize and format their files in the manner of their choosing while also keeping the main codebase clean and easy to navigate.
Even if it takes years to complete, a federated database created carefully and managed meticulously will pay dividends, in part by helping enterprises improve their Data Management Maturity. Data Stewards working in tandem with a more senior Data Manager or Chief Data Officer mirror the federated data model by balancing data management roles, responsibilities, and expertise across the entire enterprise. Increased efficiency, reduced costs, and greater company cohesion are all worth the slower, more methodical transition.