A few weeks ago, I was in Moscow with one of our customers in the financial services industry, discussing the decommissioning of their more than 1,400 legacy applications. They were looking for a modern, lightweight architecture that could preserve the chain of custody and compliance of their data and content, allowing them to fully decommission their old legacy systems once and for all. The ROI for this use case is relatively easy to calculate and is on the order of millions of dollars, so the value of decommissioning is clear.
But as so often happens with old legacy applications, the enterprise knows very little about the actual data and content its infrastructure is serving, which prevents it from applying the advanced optimization and governance strategies that could generate additional benefits.
This lack of knowledge about what is actually in the archive also prevents the company from taking full advantage of the often huge, unexplored goldmine that lies in the archived information, and from using it as fuel to accelerate and spur on new services and innovations.
As an example, a bank could offer its customers a new service providing the details of all their expenses over the past 20 years and analyzing how their spending behavior has evolved over time. Combined with customers’ real-time expenses, this could let the bank predict its customers’ future investments, help them understand where they could optimize and reduce expenses, and point them toward the best deals and supplier offers.
Businesses’ ability to ‘re-envision’ their data can have a strong impact on their success in the digital transformation race. But how do they get there? How can businesses bring millions or billions of pieces of data and content out of the dark, and start classifying and categorizing them to identify the key information that will unlock value?
Artificial Intelligence may be a long way off in most organizations’ digital transformation journeys, but there are natural language processing techniques and neural network algorithms that, used in the right way and in combination, can produce significant results for any organization. The examples highlighted below are far from a comprehensive list of the technologies out there, but they should provide some direction on how to minimize human interaction and leverage the power of AI.
‘Bag-of-words’ approach – By creating a dictionary containing all of the words relevant for categorizing documents, organizations can compare each document with the dictionary to produce a sequence of values representing the number of occurrences of each dictionary word in that specific document. Documents or sentences that deal with very similar topics will also have very similar sequences of numbers. By comparing the sequences derived from the documents, organizations could categorize all of the documents in a legacy repository at a fraction of the cost and time required to achieve similar results with a human workforce.
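As a minimal sketch of this idea in Python, using scikit-learn’s CountVectorizer to build the dictionary and count occurrences (the sample document snippets here are invented purely for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented sample documents; a real repository would hold millions.
docs = [
    "credit card purchase at retail store",
    "card purchase declined, retry credit card payment",
    "mortgage application approved for customer",
]

# Build the dictionary and count word occurrences per document.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)  # rows = documents, columns = dictionary words

print(vectorizer.get_feature_names_out())  # the learned dictionary
print(cosine_similarity(counts))           # the first two docs score much closer together
```

Comparing the resulting count vectors (here with cosine similarity) is what lets similar documents be grouped without anyone reading them.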
Continuous Bag of Words (CBOW) and skip-gram – CBOW predicts the current word from a window of surrounding context words; skip-gram does the reverse, predicting the surrounding window of context words from the current word. These models allow similarities between words to be found, adding a degree of ‘intelligence’ when searching for or associating related information. For example, when searching for all financial transactions related to a customer, a first step would be to discover and search for all terms that apply, like ‘Purchase’ and ‘Credit Card’… But by adding a little ‘intelligence’ and allowing the system to ‘learn’ from the repository, it could discover by itself that the word ‘Transaction’ is close to each of the words above, plus ‘Loan’, ‘Mortgage’ and more, extending the query to include all relevant categories of information.
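A hedged sketch of both architectures using gensim’s Word2Vec implementation, where the sg flag switches between CBOW and skip-gram (the tiny corpus is invented, and real use would require far more text):

```python
from gensim.models import Word2Vec

# Toy pre-tokenized sentences, repeated so the model has enough occurrences to train on.
sentences = [
    ["customer", "credit", "card", "purchase", "transaction"],
    ["loan", "transaction", "approved", "for", "customer"],
    ["mortgage", "transaction", "recorded", "in", "ledger"],
] * 100

cbow = Word2Vec(sentences, sg=0, vector_size=50, window=3, min_count=1)       # CBOW
skipgram = Word2Vec(sentences, sg=1, vector_size=50, window=3, min_count=1)   # skip-gram

# Expand a query term with its learned neighbours, e.g. widening a search
# for 'transaction' to related categories such as 'loan' or 'mortgage'.
print(skipgram.wv.most_similar("transaction", topn=5))
```

The most_similar call is what would drive the query expansion described above: the system proposes related terms it has learned from the repository itself.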
‘Word2Vec’ model – Adding the Word2Vec model to the previous techniques makes it possible to build a high-dimensional vector space representation of a large corpus of text, where each unique word is assigned a corresponding vector in the space. In simple terms, words that share common contexts in the corpus sit close to one another in that space. Given enough data, Word2Vec can make highly accurate guesses about a word’s meaning based on its past appearances. Those guesses can be used to cluster documents and classify them by topic, forming the basis of search, sentiment analysis and recommendations in fields as diverse as scientific research, legal discovery, e-commerce and customer relationship management.
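One simple way to turn word vectors into document clusters is to average each document’s word vectors and group the results with k-means. This is a sketch under those assumptions, not the only approach (the corpus is again invented):

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# Invented toy corpus: two loose topics, card purchases vs. loans/mortgages.
docs = [
    ["credit", "card", "purchase"],
    ["card", "payment", "purchase"],
    ["mortgage", "loan", "approved"],
    ["loan", "mortgage", "application"],
] * 50

model = Word2Vec(docs, vector_size=50, window=3, min_count=1, sg=1)

# Represent each document as the average of its word vectors.
doc_vectors = np.array([
    np.mean([model.wv[w] for w in doc], axis=0) for doc in docs
])

# Cluster the document vectors by topic.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(doc_vectors)
print(labels[:8])  # card/purchase docs and loan/mortgage docs fall into separate clusters
```

Averaging is a deliberately crude heuristic; it is enough to show how word-level vectors become the basis for document-level classification, search and recommendation.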
There are many ways to process information with speed, accuracy and minimal human intervention. Depending on the techniques used and the context to which they are applied, they can offer immense value to an overall decommissioning strategy.
As in the case of our Russian customer, adopting a combination of the above techniques can help enterprises re-envision their legacy information and turn it into fuel for new, innovative services and businesses for their customers.
After all, information is the new currency!