Distributed version control systems (DVCS) such as Git (introduced in 2005 by Linus Torvalds, the creator of Linux) rapidly gained acceptance across the global software development community. DVCS offer very fast, lightweight, local change control that supports many parallel code branches across large project code bases. These are compelling benefits for individual developers and development teams. At their heart, DVCS accomplish this by tracking software changes as a series of file system snapshots in local repositories rather than by managing individual file changes in a central database. “A Short History of Git” explains this in more detail.
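To make the snapshot idea concrete, here is a minimal sketch in Python. It is a toy model, not how Git is actually implemented: the `SnapshotStore` class, its method names and the choice of SHA-256 are all assumptions made for illustration, and Git's real object model (blobs, trees, commits, packfiles) is considerably more involved. The point it shows is that every commit records the whole tree, while unchanged file content is stored only once.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Content address: identical content always maps to the same ID."""
    return hashlib.sha256(content).hexdigest()


class SnapshotStore:
    """Hypothetical local repository: each commit is a full snapshot of the tree."""

    def __init__(self) -> None:
        self.blobs = {}    # blob ID -> file content
        self.commits = []  # one dict per commit: path -> blob ID

    def commit(self, files: dict) -> int:
        """Record a snapshot of all files and return its commit number."""
        snapshot = {}
        for path, content in files.items():
            oid = blob_id(content)
            # Content that already exists is not stored again.
            self.blobs.setdefault(oid, content)
            snapshot[path] = oid
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def checkout(self, commit_no: int) -> dict:
        """Rebuild the full tree for a commit from local data only."""
        snapshot = self.commits[commit_no]
        return {path: self.blobs[oid] for path, oid in snapshot.items()}


store = SnapshotStore()
main_c = b"int main(void) { return 0; }\n"
first = store.commit({"README": b"hello\n", "main.c": main_c})
second = store.commit({"README": b"hello, world\n", "main.c": main_c})
```

In this sketch both commits list `main.c`, but they point at the same stored content, and `store.checkout(first)` rebuilds the earlier tree without contacting any central server.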
The DVCS approach is ideal for individuals and small agile teams because it puts control directly in their hands without the need to transact all file changes with burdensome systems that were designed using a central management paradigm.
However, the distributed approach taken by DVCS has historically presented challenges for operations, especially when large-scale enterprise software systems consisting of components from many separate repositories need to be managed and controlled. Enterprises with many developers, large source or binary files (e.g. graphics, audio or video), many artifacts or many large containers need granular end-to-end version traceability. Enterprise-class support requirements exceed the original design intentions of DVCS.
Enterprises with large version management needs therefore require end-to-end offerings that keep the efficiencies of distributed systems for developers while still supporting scalable operations, so that both Dev and Ops are served. These complementary requirements have driven the evolution and best practices of large-scale hybrid version control systems, which aim to:
- Simplify builds and manage complex projects by organizing multiple repositories
- Move code and build assets into the DevOps pipeline quickly
- Improve performance for remote sites by using a central system as a proxy/cache
- Speed up many activities at the system level, such as cloning and copying code
- Implement a seamless workflow that incorporates assets such as large graphics and binaries, in addition to source code
Historically, these problems could only be addressed by a centralized version control offering. DVCS alone could not handle the continuous integration (CI) and operations requirements of large-scale software development communities. While there are still inherent issues in the way DVCS manage large binary assets, select vendors have begun to tackle the problem by mirroring DVCS repos in real time, which marries the developers’ need for easy branching and local control with the operations team’s need for faster and more efficient builds.
For enterprises developing complex software that incorporates both binary and text-based assets, large-scale hybrid version control systems that support both distributed and centralized structures offer a best-practice solution. They provide a global single source of truth, replicate all types of artifacts and manage versions of everything from planning to release. By resolving the operational management problems that result from the sprawl of individual Git repositories while preserving the benefits of distributed systems, today’s enterprise-class centralized and/or hybrid version control systems can meet the needs of both Dev and Ops without sacrificing efficiency or operational control.
Summary
What do you think? Do you agree that the benefits of hybrid version control systems listed above are important to achieving large-scale version control? Are there other solutions you recommend?