With an almost endless list of sources – including map and satellite data, catchment areas, service points, building and customer locations, land use data, urban data, and communication pathways – spatial data is a valuable global commodity which comes in many forms. Spatial data also comes in different representations and scales: the New York subway, for example, is presented as a schematic route map rather than as a geographically accurate physical map.
So why do businesses need to process spatial data and what are some of the challenges they face in doing so at scale?
Spatial data has always played a role in logistics planning and location determination, but in recent years – due to an explosion of human mobility data, produced by smartphones, wearables, and the Internet of Things – its use has become multifaceted. Businesses are keen to analyze human mobility data, including spend, ecommerce, traffic, search, and event data, to better understand human activity and inform a wide variety of applications. And as these human activities happen in the real world, businesses also need to process spatial data to provide a contextual framework within which to analyze human actions, and to link multiple data streams.
Spatial data attributes associated with every type of data in an enterprise are key to linking data and establishing context around different data points. As the data deluge continues, understanding spatio-temporal relationships between different data points is essential to interpret what is happening in reality – to understand a customer, to understand where your supplier parts are, or what your competitor is doing.
To truly act on human mobility data, businesses must analyze it at the appropriate spatial scale, which allows them to spot patterns in activity across different geographic areas and build strategies accordingly. By incorporating spatial data into their analysis, businesses can identify location-related aspects that influence consumer trends, and can anticipate future changes that may occur due to changing spatial conditions. Spatial data also enables analytics insights to be visualized within the context of maps, making them easier to understand and more engaging.
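To make the idea of spatial scale concrete, here is a minimal Python sketch that counts the same set of points at two grid resolutions. The coordinates, the cell sizes, and the function names `grid_cell` and `count_by_cell` are illustrative assumptions, not part of any particular product or dataset.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_deg):
    """Map a coordinate to the integer index of a square cell cell_deg degrees wide."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def count_by_cell(points, cell_deg):
    """Count how many points fall into each grid cell at the given resolution."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)

# Hypothetical visit locations (latitude, longitude) in Manhattan.
visits = [
    (40.7580, -73.9855),
    (40.7585, -73.9850),
    (40.7484, -73.9857),
    (40.7061, -74.0087),
]

coarse = count_by_cell(visits, 0.1)   # cells roughly 11 km tall
fine = count_by_cell(visits, 0.01)    # cells roughly 1.1 km tall

print("coarse:", dict(coarse))
print("fine:  ", dict(fine))
```

At the coarse resolution three of the four visits fall into a single cell, while the finer grid separates them into distinct neighborhoods. Which view is appropriate depends on the business question being asked.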
There are a number of challenges to be faced in the large-scale processing of spatial data:
Variance in coverage and quality
As with most data types, spatial data coverage varies greatly by country and area, with metro, suburban, and rural areas represented at different density levels. The quality of data surrounding the coverage of place categories and the spatial distribution of places of interest is highly variable – with multiple different standards in effect – which can easily lead businesses to make erroneous inferences. It is vital to understand which types of physical world data to source and how to merge different data streams – such as ping data and trajectories – along both spatial and temporal dimensions to effectively inform business decisions.
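As a rough illustration of merging data streams along both spatial and temporal dimensions, the sketch below joins device pings to places of interest by distance and groups the matches by hour of day. The place names, coordinates, timestamps, and the 100 m radius are all hypothetical; a production pipeline would typically use a spatial index rather than this brute-force loop.

```python
import math
from collections import defaultdict
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def visits_by_place_and_hour(pings, places, radius_m=100):
    """Count pings within radius_m of each place, keyed by (place name, hour of day)."""
    counts = defaultdict(int)
    for lat, lon, ts in pings:
        for name, plat, plon in places:
            if haversine_m(lat, lon, plat, plon) <= radius_m:
                counts[(name, ts.hour)] += 1
    return dict(counts)

# Hypothetical reference data: (name, latitude, longitude).
places = [("store_a", 40.7580, -73.9855), ("store_b", 40.7061, -74.0087)]

# Hypothetical ping stream: (latitude, longitude, timestamp).
pings = [
    (40.7581, -73.9856, datetime(2024, 5, 1, 9, 15)),
    (40.7579, -73.9854, datetime(2024, 5, 1, 9, 40)),
    (40.7062, -74.0086, datetime(2024, 5, 1, 18, 5)),
    (40.7300, -73.9950, datetime(2024, 5, 1, 12, 0)),  # near neither place
]

result = visits_by_place_and_hour(pings, places)
print(result)
```

The spatial condition (distance to a place) and the temporal condition (hour bucket) are applied together, so the output describes both where and when activity occurred.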
Spatial data errors and gaps
There is plenty of room for error with spatial data – from GPS information gathered at varying levels of precision to intermediary manipulation of data – which can distort analysis. To limit the impact of spatial data errors, and support reasonable data-driven decision-making, businesses need to understand the data's provenance – where it came from and the purpose for which it was originally collected – and must draw on enough different sources to identify gaps or inconsistencies.
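One simple way to use multiple sources to surface inconsistencies is to compare each source's reported location for the same place against the median of all reports. The sketch below assumes hypothetical vendor names and coordinates and an arbitrary 50 m tolerance. It uses a flat-earth approximation that is reasonable only over short distances.

```python
import math
from statistics import median

METERS_PER_DEG_LAT = 111_320  # approximate length of one degree of latitude

def offset_m(lat, lon, ref_lat, ref_lon):
    """Approximate distance in metres using a local equirectangular projection."""
    dy = (lat - ref_lat) * METERS_PER_DEG_LAT
    dx = (lon - ref_lon) * METERS_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    return math.hypot(dx, dy)

def flag_inconsistent(reports, tolerance_m=50):
    """Return the sources whose reported location is more than tolerance_m from the median."""
    ref_lat = median(lat for _, lat, _ in reports)
    ref_lon = median(lon for _, _, lon in reports)
    return [
        source
        for source, lat, lon in reports
        if offset_m(lat, lon, ref_lat, ref_lon) > tolerance_m
    ]

# Hypothetical reports of the same store's location: (source, latitude, longitude).
reports = [
    ("vendor_a", 40.75800, -73.98550),
    ("vendor_b", 40.75803, -73.98547),
    ("vendor_c", 40.75900, -73.98400),
]

flagged = flag_inconsistent(reports)
print(flagged)
```

The median is used as the reference because a single bad report pulls it less than it would pull a mean. With only two sources a disagreement can be detected but not attributed, which is one reason to gather several.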
Big Data support for processing spatial data
Spatial data processing requires large-scale usage of Big Data technology stacks. Human mobility data can be processed alongside reference spatial data in both real-time and batch modes. Each mode supports different use cases within a business, depending on the nature of the requirement. For example, a business wanting to locate its next retail store will require very different spatial and human mobility data to a business looking for a cab route to collect a customer in real time. The variety of spatial data available is enormous, and figuring out how to use it for better decision-making is a key ongoing R&D activity in many organizations. Investing in the right technology stack and working with key partners is essential to bootstrap this effort successfully.
Availability of data and talent
Spatial data is not always readily available in the variety and quantity necessary to inform business decisions. Although a recent survey by the Open European Location Services (ELS) Project reveals that the majority of European National Mapping, Cadastral and Land Registry Authorities (NMCAs) provide at least some geospatial data free of charge, this data is likely to be relatively limited in density and quality, and would need to be fused with a wide variety of other sources. There is also a general lack of knowledge and expertise, meaning businesses need to train their teams in multiple aspects of spatial data representations and use.
Spatial data provides endless opportunities for businesses to understand human activities in a real-world context, driving smarter business decisions. As long as issues of coverage and quality, data error, and the availability of data and talent can be overcome, spatial data really is the next frontier.