In financial services, the dangers associated with monetizing big data are nearly as great as the rewards. The promises of machine learning, data science and Hadoop are tempered by the realities of regulatory penalties, operational efficiency and profit margins that must quickly justify any such expenditure.
Additionally, there is often another layer of complexity best summarized by Carl Reed, who spent more than 20 years overseeing data-driven applications at Goldman Sachs and Credit Suisse. According to Reed, who currently serves as an adviser to the Enterprise Data Management Council, PwC and Cambridge Semantics, the sheer size of the customer base and infrastructure of larger organizations compounds the issues. He says, “Credit Suisse [had] 45,000 databases; Goldman Sachs [had] 90,000 databases. If you let the financial industry continue to implement technology vertically versus horizontally—because you’ve got no C-suite saying data isn’t a by-product of processes, it’s an asset that needs to be invested in and governed accordingly—you’ll end up with today 10s, tomorrow 100s, and in time 1000s of instances of Hadoop all over your organization.”
The central lesson from Reed’s tenure in financial services is to avoid such issues with a single investment in data architecture that pays for itself across many applications throughout the enterprise. In doing so, organizations can justify the expenditure with a staggering number of use cases built on variations of the same investment.
Link analysis
The crux of Reed’s approach was to focus on semantic graph technology that linked the numerous silos that existed across the organizations for which he worked. “One of the first problems we addressed at Credit Suisse was to harmonize its different infrastructure,” Reed says. “We had Cloudera, we had Hortonworks, we had open source, we had open source Hadoop.” Such infrastructure is readily connected on a semantic graph with standardized models and taxonomies. The graph-based approach pinpoints the relationships between the nodes, giving the enterprise several use cases for what is essentially the same dataset. The most prominent is arguably an improved means of determining data lineage for regulatory compliance, which may be the biggest challenge financial entities have faced since the last decade’s financial crisis.
“The new world of big data and the evolving world of regulatory and operational reducing margins is about having the data associated with our business at the main sector being first-class citizens,” Reed says. “Nothing’s going to change that. But the relationships between them are now first-class citizens too.”
The graph approach to managing relationships proved equally valuable for improving operations and creating business opportunities. By determining how even seemingly unrelated nodes can contribute to a certain business problem, organizations can transcend regulatory compliance and further enterprise objectives. “For the type of causal reasoning you need to do for this style of link analysis—whether you’re understanding client social circles, how a market is behaving, how a potential change in your environment is going to have positive or negative ramifications, how to triage something that’s gone wrong—it’s all about the linkage between objects,” Reed observes.
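To make the approach concrete, the sketch below (not Reed’s actual architecture) shows how records from two hypothetical silos can be expressed as RDF triples against a shared vocabulary and then queried together. The ex: namespace, the Trade class and the bookedBy/locatedIn properties are invented for illustration, and rdflib stands in for whatever triple store a firm would actually run.

```python
# A minimal, illustrative sketch: two silos mapped onto one shared RDF model.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/model/")   # assumed shared enterprise vocabulary

g = Graph()
g.bind("ex", EX)

# Silo A (e.g. one Hadoop cluster) exposes trades keyed by desk
g.add((EX.trade42, RDF.type, EX.Trade))
g.add((EX.trade42, EX.bookedBy, EX.deskNY))

# Silo B (e.g. a separate reference-data store) describes the same desk
g.add((EX.deskNY, EX.locatedIn, Literal("New York")))

# One query now spans both silos because they share the same model
results = g.query("""
    PREFIX ex: <http://example.org/model/>
    SELECT ?trade ?city WHERE {
        ?trade a ex:Trade ;
               ex:bookedBy ?desk .
        ?desk ex:locatedIn ?city .
    }
""")
for trade, city in results:
    print(trade, city)
```

The point of the sketch is simply that once the silos speak the same vocabulary, the relationships between objects, not just the objects themselves, become queryable assets.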
Employee entity relationship graphs
Such linkage offers additional utility for monitoring the enterprise and its employees for multiple use cases. For instance, the underlying semantic graph approach is ideal for insider trading surveillance, a task predicated on illustrating the relationships between people who may have knowledge of a trade or business development. “If I’ve got a person who’s an insider and a person who’s a trader, how do I link those two people together to understand whether there has been a chance that the wrong information’s traveled from one to another? I have to start thinking about people relationships,” Reed explains. The ensuing conceptual modeling can contain appropriate organizational structure, electronic communication, employee information (including geographic location and scheduling) and other aspects of the people modeled.
By contextualizing the information with temporal data that indicates points in time people could have exchanged information, that form of link analysis can demonstrate the likelihood of insider trading. Moreover, it’s based on the same graph framework used to demonstrate regulatory compliance—and can involve some of the same data. After creating an exhaustive model for insider trading surveillance centered on those working for a company, “all of a sudden, you’ve got an employee entity relationship graph,” Reed says. “Now you can say these are the traders that are exhibiting potential insider trader activity. These are the people who’ve got the information that, if I can show a path to those traders, something needs to be investigated.” According to Reed, the same graph-based link analysis is used by the intelligence community to determine the movement of money among terrorist organizations.
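The following is a hedged sketch of that style of link analysis, not the surveillance system Reed describes: employees and their communications become nodes and dated edges, and the question of whether information could have traveled from insider to trader becomes a path query restricted to contacts that predate the trade. All names and dates are invented.

```python
# Illustrative link analysis with a temporal filter on relationship edges.
import networkx as nx
from datetime import date

g = nx.Graph()
g.add_edge("insider_A", "analyst_B", when=date(2024, 3, 1))   # shared meeting
g.add_edge("analyst_B", "trader_C", when=date(2024, 3, 4))    # email thread
g.add_edge("trader_C", "trader_D", when=date(2024, 3, 20))    # same desk

trade_date = date(2024, 3, 10)

# Keep only contacts that happened before the trade in question
pre_trade = nx.Graph()
pre_trade.add_edges_from(
    (u, v, d) for u, v, d in g.edges(data=True) if d["when"] < trade_date
)

if nx.has_path(pre_trade, "insider_A", "trader_C"):
    # A plausible information path exists and is worth investigating
    print(nx.shortest_path(pre_trade, "insider_A", "trader_C"))
    # -> ['insider_A', 'analyst_B', 'trader_C']
```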
The true merit of link analysis graphs is that they are reusable for additional organizational functions. Employee entity relationship graphs are useful for more than just monitoring insider trading. “The graph I talked about for insider trading, we built that at Goldman,” Reed adds. “We actually had the security division come to us because they knew we had people relationships.” Internal enterprise knowledge graphs are extremely similar to the employee entity relationship graph described by Reed, but also include knowledge, skills and experiences alongside relationships between parties. The graphs can strengthen security, monitor insider trading and determine which employees are most appropriate for new tasks or client interactions. Their visual representation of which workers have relationships, knowledge and experience relevant to strategic objectives is influential in selecting the best candidate for a project.
Reed says, “Whenever I had a new client or strategy that I wanted to present to an existing client and I went to my salespeople, a bunch of hands went up and they all said ‘me’ because they wanted the revenue recognition. I can use the [employee relationship] graph to understand, before I go into that conference room, who’s had the most contact with the client and who’s the strongest candidate based on records of calls, conversations and interactions with the client to disambiguate the hands in the room.”
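A small illustrative example of that reuse, with hypothetical names and weights: if each salesperson’s edge to a client is weighted by logged calls and meetings, ranking the client’s neighbors answers the question before anyone walks into the conference room.

```python
# Sketch only: rank salespeople by recorded interaction strength with a client.
import networkx as nx

g = nx.Graph()
g.add_edge("alice", "acme_corp", weight=7)   # calls, meetings, emails on record
g.add_edge("bob", "acme_corp", weight=2)
g.add_edge("carol", "acme_corp", weight=4)

ranked = sorted(g["acme_corp"].items(), key=lambda kv: kv[1]["weight"], reverse=True)
print([person for person, _ in ranked])      # ['alice', 'carol', 'bob']
```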
Operational change management
There are also several impact analysis use cases predicated on an initial investment in semantic graph technology. Impact analysis is critical to optimizing operational efficiency, particularly when change management is involved. By modeling all the different functions, infrastructure, applications and departments involved in a datacenter in a relationship graph, organizations can understand how each of those objects affects the others and streamline operations accordingly. Such a graph becomes vital to the proper implementation of change management, which should be done as painlessly and quickly as possible to improve workflows. Otherwise, companies face reactive situations in which there is no definite knowledge of a proposed change’s impact, which simply leads to delays in which “no one is sure enough to say ‘no’ or ‘yes’,” Reed explains. “So you thought the decision was no and you build up more and more technical debt as things fall more and more behind in terms of patch levels and homogeneity in your dataset, to the point where you were forced to do something. When you did that something, it was macro and you needed a small army of people to negate the risk.”
Conversely, by graphing all the aspects of the datacenter down to individual switches, companies see not only what a change affects but also the best time to make it in order to minimize downtime. The same datacenter relationship graph can be deployed for triage analysis, which further demonstrates the compounding benefits of the approach. In that case, organizations can work out how to mitigate any issue involving the datacenter and the various objects contained in its graph. “If something breaks, you can now use the same graph to say this is the collateral damage and these are the people I need to get involved to get the business back up and running,” Reed says. “Then I can start doing my root cause analysis to prevent it from happening again. And by the way, I’m going to use the same graph to do that too.”
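A minimal sketch of the idea, assuming a simple dependency model rather than a production-scale graph: datacenter objects become nodes in a directed graph, impact analysis is a downstream traversal, and triage is the same traversal run upstream. Node names are invented.

```python
# Illustrative impact and triage analysis over a dependency graph.
import networkx as nx

deps = nx.DiGraph()
deps.add_edges_from([
    ("switch_7", "db_cluster_risk"),      # the risk database depends on this switch
    ("db_cluster_risk", "risk_reports"),  # nightly reports depend on the database
    ("db_cluster_risk", "trade_capture"),
    ("switch_7", "backup_service"),
])

# Impact analysis: everything downstream of patching (or losing) switch_7
print(sorted(nx.descendants(deps, "switch_7")))
# ['backup_service', 'db_cluster_risk', 'risk_reports', 'trade_capture']

# Triage: walk upstream from a broken report to find candidate root causes
print(sorted(nx.ancestors(deps, "risk_reports")))
# ['db_cluster_risk', 'switch_7']
```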
Market intelligence categories
According to Reed, business intelligence for finance involves four categories: client intelligence, market intelligence, operational intelligence, and risk and reputational intelligence. “They all share common things and common relationships between things; they just use those in a different way,” he says. The insider trading graph addresses risk and reputational intelligence and can be repurposed as an employee entity relationship graph to provide client intelligence. The datacenter graph delivers operational intelligence, while a similar graph yields understanding of the market forces influencing a given vertical. The Global Industry Classification Standard (GICS) is one of the most commonly used tools for analyzing those forces. Reed explains that modeling its data on a semantic graph is valuable for ascertaining “how you can traverse it. A lot of companies spend a lot of time trying to figure out, for example, how do I look at supply and demand to understand credit risk across GICS categories. Again, that’s all about relationships.” Once organizations have invested in semantic graph architecture, they can also facilitate cluster analysis to illustrate the density of relevant financial concepts. For instance, cluster analysis can depict collateral concentrations of specific customers or financial organizations according to factors such as entities and region.
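As a simple illustration of the concentration idea (figures and labels invented, and using a flat roll-up rather than a full traversal of the GICS hierarchy), exposures can be aggregated by sector and region to surface where collateral is concentrated:

```python
# Sketch: roll up collateral exposure by (GICS sector, region) to find concentrations.
from collections import defaultdict

positions = [
    {"counterparty": "bank_x", "gics_sector": "Financials", "region": "EMEA", "collateral": 120.0},
    {"counterparty": "fund_y", "gics_sector": "Financials", "region": "EMEA", "collateral": 80.0},
    {"counterparty": "corp_z", "gics_sector": "Energy", "region": "APAC", "collateral": 40.0},
]

concentration = defaultdict(float)
for p in positions:
    concentration[(p["gics_sector"], p["region"])] += p["collateral"]

for (sector, region), total in sorted(concentration.items(), key=lambda kv: -kv[1]):
    print(sector, region, total)   # Financials EMEA 200.0, then Energy APAC 40.0
```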
Centralized governance
Due to the numerous regulatory entities and punitive measures for non-compliance, financial companies all but require the uniform consistency of centralized data governance. By harmonizing their data (regardless of source or location) in an RDF graph, they can readily implement such governance with a top-down approach. Governance dictated by business unit, application or enterprise function simply increases the number of siloed Hadoop implementations. When attempting to demonstrate provenance for multiple regulatory entities and purposes, local governance aggravates what is already a difficult task. In contrast, the holistic governance of Reed’s method heightens regulatory compliance capabilities while demonstrating data lineage across the enterprise. Thus, organizations save money on potential non-compliance penalties and get more value from the plethora of use cases graphs facilitate. “The beauty of [centralized] governance in place is that you can start applying this new technology to any existing silo problems and improve it incrementally over time while you deal with these new data fronts with the right underlying architectural approach,” Reed says.
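A minimal sketch of how such lineage questions become queries once data sits in an RDF graph. It uses the W3C PROV vocabulary’s wasDerivedFrom property and a SPARQL property path; the dataset names are hypothetical, and this is an illustration rather than Reed’s governance implementation.

```python
# Illustrative lineage query over a provenance chain expressed in RDF.
from rdflib import Graph, Namespace

PROV = Namespace("http://www.w3.org/ns/prov#")   # W3C PROV vocabulary
EX = Namespace("http://example.org/data/")        # hypothetical dataset names

g = Graph()
g.add((EX.regulatory_report, PROV.wasDerivedFrom, EX.risk_aggregates))
g.add((EX.risk_aggregates, PROV.wasDerivedFrom, EX.trade_store))

# Walk the derivation chain all the way back to its sources
lineage = g.query("""
    PREFIX prov: <http://www.w3.org/ns/prov#>
    SELECT ?source WHERE {
        <http://example.org/data/regulatory_report> prov:wasDerivedFrom+ ?source .
    }
""")
for (source,) in lineage:
    print(source)   # ...risk_aggregates, then ...trade_store
```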
Data as an enterprise asset
Every organization looking to profit from data-driven practices must invest accordingly. The nature of that investment, however, can determine how successfully and efficiently a company monetizes its big data. In finance, organizations have a number of ways in which they can justify what is essentially the same investment across a nearly limitless number of use cases spanning Reed’s four dimensions of BI. The use cases explained in this article are all variations of the enterprise knowledge graph concept, which is partly so named because of the sheer range of applications it supports. Still, the way to avoid non-compliance fines, inefficient operations and unjustifiable returns on data expenses is to spend wisely.
Reed says, “For me, there’s an intuitive sense once you get into the space where you start realizing data is an enterprise asset, a concrete asset. And, I think a lot of people are there. When you start thinking about assets, it’s far more intuitive to say why wouldn’t I invest in that asset once and leverage it as many times as I could, versus having my organization create it for themselves by process on demand. That makes no sense whatsoever.”
Originally published at KMWorld