THE INSIDE STORY OF THE PARADISE PAPERS LEAK
- With more than 1.4 terabytes of data, the Paradise Papers illustrate the new possibilities open to fraud investigation.
- Conducting investigations is a challenge in the age of big data. Massive volumes of unstructured, siloed data make it harder for non-experts to find fraudulent activity.
- ICIJ and its reporters relied on graph technology to connect the dots, highlighting the connections in the data to reveal wrongdoing.
- This approach is used by financial organizations to unveil complex fraud cases and money laundering.
Linkurious has been a partner of the International Consortium of Investigative Journalists (ICIJ) since the Swiss Leaks and Panama Papers scandals. ICIJ’s network of 380 journalists used Linkurious’ data analytics software to investigate the Paradise Papers.
The leak
On November 5, 2017, ICIJ and 96 media organizations around the world shared revelations based on a massive new leak. The Paradise Papers contained 13.4 million documents from offshore service providers Appleby and Asiaciti Trust and from 19 registries in offshore tax havens.
German newspaper Süddeutsche Zeitung obtained 1.4 terabytes of confidential files and shared them with ICIJ. Together with 380 journalists worldwide, they processed and scrutinized the information over several months before releasing their findings.
The Paradise Papers represents the second biggest leak in history after last year’s Panama Papers (2.6 terabytes). It is also another successful large-scale, data-driven investigation that illustrates the recent shift in fraud investigation. Graph technology is changing the scale and the possibilities of this field by giving investigators, reporters and analysts new tools to handle the complexity of data-driven investigations.
The complexity of working with big data
When it comes to big data, the challenges derive from the nature and the volume of the data. Whether it’s a data leak or a financial company’s internal data, the amount of data involved is considerable. While journalists working on the Paradise Papers leak dealt with about 1.4 TB of data, some organizations gather dozens of terabytes every month.
To complicate things, investigations usually start from raw, unstructured data, and it’s impossible to automate or scale an investigation without a predefined data model or some kind of organizational logic. The files obtained by Süddeutsche Zeitung included millions of loan agreements, financial statements, emails, trust deeds and other paperwork dating back nearly 50 years.
The large volume of data and its unstructured form pose the first difficulty. Organizations have to process these large volumes of raw data into computable information that can be organized, stored and analyzed.
“Depending on the source, we had different formats and many of those were not machine-readable,” said Pierre Romera, ICIJ’s Chief Technology Officer.
The second obstacle is related to the way we store data. The success of a fraud investigation depends on finding connections between entities. Yet in many investigations, data is kept in silos that make it difficult to cross-reference records and highlight connections. For the Paradise Papers, ICIJ’s reporters worked with data stemming from the leak but also from public databases. To make siloed data talk, it’s essential to bring everything together.
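Cross-referencing two silos often comes down to matching the same entity across sources that spell it differently. The sketch below is only an illustration of that idea, not ICIJ’s actual method: the records, field names and the `normalize` and `cross_reference` helpers are all hypothetical, and real investigations would likely need fuzzier matching than exact comparison of normalized names.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation so name variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(kept.split())


def cross_reference(leak_rows, registry_rows):
    """Return (leak_row, registry_row) pairs whose normalized names match."""
    by_name = {}
    for row in registry_rows:
        by_name.setdefault(normalize(row["name"]), []).append(row)
    return [
        (leak, reg)
        for leak in leak_rows
        for reg in by_name.get(normalize(leak["officer"]), [])
    ]


# Hypothetical records, invented for illustration only.
leak_rows = [
    {"officer": "José  Example-Smith", "company": "Blue Harbor Ltd"},
    {"officer": "A. N. Other", "company": "Grey Sands Inc"},
]
registry_rows = [
    {"name": "Jose Example Smith", "role": "Director", "jurisdiction": "Example Islands"},
]

matches = cross_reference(leak_rows, registry_rows)
for leak, reg in matches:
    print(leak["company"], "<->", reg["name"], "(", reg["role"], ")")
```

Here the leaked officer name and the registry entry differ in accent, spacing and hyphenation, yet both reduce to the same normalized key, so the two silos can be joined on it.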
Finally, data-driven investigations raise a problem of accessibility. As ICIJ found, making data exploration accessible to non-tech-savvy reporters is both a challenge and a necessity. Otherwise, without an army of data analysts and database specialists, data-driven investigations would be nearly impossible to conduct.
According to Romera, “one of the key challenges is to make our technology user-friendly for the journalists so that everyone around the world is able to use it.”
The ICIJ’s method: an efficient approach to fraud investigation
As with the Panama Papers, ICIJ proceeded in several phases to make the documents usable by its network of 380 journalists.
The Data & Research unit was in charge of processing the documents into a machine-readable format, indexing them and connecting them through their metadata. ICIJ used optical character recognition (OCR) and the content-extraction technology Extract to transform the unstructured data, and Apache Solr to index it into a searchable knowledge center.
“The knowledge center was essential to let our partners access and explore all the information,” stated Romera.
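To give a feel for what indexing extracted text into something searchable means, here is a minimal sketch of an inverted index in plain Python. It is a toy stand-in, not Apache Solr and not ICIJ’s pipeline: the document ids, texts and the `build_index` and `search` helpers are hypothetical, and a real search engine adds ranking, stemming, metadata fields and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical text, as it might look after OCR and content extraction.
documents = {
    "doc-1": "Loan agreement between Blue Harbor Ltd and Grey Sands Inc",
    "doc-2": "Trust deed naming Blue Harbor Ltd as trustee",
    "doc-3": "Email about a financial statement",
}

index = build_index(documents)
print(search(index, "blue harbor trust"))
```

Once every document has passed through the same extraction step, a reporter’s query only has to consult the index rather than reread millions of files.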
Additionally, they used graph technology to bring all the sources and data together. The team used Talend ETL (Extract, Transform, Load) tools to load the data into Neo4j, a graph database platform, creating a network of nodes and edges. On top of that, they provided the reporters with visual investigation and analytics software: Linkurious Enterprise let them explore the data, connect the dots and share visualizations of their stories.
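The load step can be pictured as turning flat rows into nodes and edges, creating each entity once and reusing it whenever another row mentions it. The following sketch keeps everything in memory and is an assumption-laden illustration only: it does not use Talend or Neo4j, and the `Graph` class, the row fields and the relationship naming are hypothetical. In a graph database, this "create if missing" behavior is typically what a merge-style write provides.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory property graph (hypothetical, for illustration)."""

    def __init__(self):
        self.nodes = {}                    # node id -> {"label": ..., "name": ...}
        self.edges = set()                 # (source id, relationship type, target id)
        self.neighbors = defaultdict(set)  # node id -> adjacent node ids

    def merge_node(self, label, name):
        """Create the node once; later rows that mention it reuse it."""
        node_id = f"{label}:{name}"
        self.nodes.setdefault(node_id, {"label": label, "name": name})
        return node_id

    def merge_edge(self, source, rel_type, target):
        """Record a typed edge and keep the adjacency sets in sync."""
        self.edges.add((source, rel_type, target))
        self.neighbors[source].add(target)
        self.neighbors[target].add(source)


def load_rows(rows):
    """Transform flat officer/company rows into a graph."""
    graph = Graph()
    for row in rows:
        officer = graph.merge_node("Officer", row["officer"])
        company = graph.merge_node("Company", row["company"])
        graph.merge_edge(officer, row["role"].upper() + "_OF", company)
    return graph


# Hypothetical rows, invented for illustration only.
rows = [
    {"officer": "Jane Example", "role": "director", "company": "Blue Harbor Ltd"},
    {"officer": "Jane Example", "role": "shareholder", "company": "Grey Sands Inc"},
    {"officer": "John Sample", "role": "director", "company": "Grey Sands Inc"},
]

graph = load_rows(rows)
print(len(graph.nodes), "nodes,", len(graph.edges), "edges")
print(sorted(graph.neighbors["Officer:Jane Example"]))
```

Three rows produce four nodes rather than six, because the repeated officer and company are merged. That deduplication is what lets connections between otherwise separate records surface.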
The result provides unique insights into the offshore interests and tax activities of more than 120 politicians and world leaders. Reporters highlighted the relationships between politics, offshore companies and their lawyers.
According to Romera, “graph visualization technologies like Linkurious are a great asset. It’s intuitive for the non-tech-savvy reporters. They just need to click on dots to expand the connections and uncover persons of interest and potential stories in a short time-frame.”
Linkurious Enterprise visualization interface displaying former Icelandic prime minister’s indirect connections to an offshore account.
With this approach, analysts and investigators can deal with the growing complexity and heterogeneity of data and get around the problem of multiple data sources and siloed resources. They can uncover hidden networks by focusing on the relationships in complex data. More generally, this method gives significant results when applied to fraud investigation, anti-money laundering or first- and third-party bank fraud.
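Uncovering an indirect connection amounts to finding a path between two entities in the graph. A minimal way to sketch this is a breadth-first search, shown below on a made-up network. The entity names and the `shortest_path` helper are hypothetical, and this is not a description of how Linkurious Enterprise or Neo4j work internally.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search: shortest chain of nodes from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = current
            if neighbor == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical network, invented for illustration only.
adjacency = {
    "Politician A": ["Holding X"],
    "Holding X": ["Politician A", "Law Firm Y"],
    "Law Firm Y": ["Holding X", "Offshore Account Z"],
    "Offshore Account Z": ["Law Firm Y"],
    "Unrelated Co": [],
}

path = shortest_path(adjacency, "Politician A", "Offshore Account Z")
print(" -> ".join(path))
```

Clicking on a dot to expand its connections is the interactive counterpart of one step of this search: each expansion reveals the neighbors of a node, and a chain of expansions traces the path.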
The future of fraud investigation
With only 20 people, ICIJ was able to organize an efficient and reproducible process for 380 journalists to investigate millions of documents for the Paradise Papers. The breakthrough revelations were made possible by Linkurious’ software for data analytics and visualization. Today this technology is used to fight tax evasion by public authorities, such as the French Ministry of Finance and its counterparts in other European countries, and by private organizations like banks.
With the investigation tools used for the Paradise Papers now available, banks, payment providers and money transfer companies can block more fraud and comply with anti-money laundering regulations.
Linkurious’ CEO Sébastien Heymann believes that it is the right time for companies in the financial sector to improve their investigation units with modern software. Technology will help them dramatically increase their efficiency, control the cost of compliance, and meet regulatory expectations.