By the end of 2019, digital transformation spending will reach $1.7 trillion worldwide, a 42 percent increase from 2017, according to IDC. Annual spending on big data and business analytics continues to grow as well and now exceeds $166 billion, the research firm predicts.
Analytics is inextricably linked to digital transformation efforts. It’s reasonable to say that without analytics, digital transformation is unlikely to be successful.
The issue, though, is that many analytics efforts have a poor track record of producing value for businesses. Typically, this is because the organization didn’t determine its desired business outcomes at the outset.
The same could be said for Internet of Things (IoT) initiatives – many have been driven by expedience and technology and not by a clear understanding of the end goals. With IoT-generated data rapidly increasing, businesses must have a clear picture of their desired outcomes in order to ensure that the analytics technology used to gain insight from that data is aligned with business needs.
It’s clear that any complete implementation of IoT analytics will require hundreds of decisions, but there are three vital ones that profoundly shape the optimal architecture for a business. They include:
- Time sensitivity – how quickly must data be analyzed?
- Volume – can data volumes be reduced, where should the data be transferred, and is transferring it even necessary?
- Visibility – does your IoT analytics require federation?
Let’s explore these three key considerations in further detail.
It’s All a Matter of Time
Mere fractions of a second can make a significant difference in the choice and cost of a technology implementation, depending upon the specific time sensitivity required. While companies may need real-time IoT analytics, that phrase doesn’t necessarily mean the same thing to everyone.
For example, some control decisions in autonomous vehicles and drones require sub-microsecond response times. Industrial control systems might require responses within tens of microseconds to avoid damage and ensure safety. Other devices, such as climate and temperature sensors, might only need to collect data once every few minutes and respond within a second.
Achieving sub-second response times requires that analytics be conducted at the edge, close to the source of the data, whether within the connected devices themselves or within a local gateway device.
The arrival of high-speed 5G wireless networks and infrastructure edge data centers will also enable response times measured in milliseconds, so some of this processing can be offloaded to the infrastructure at the edge. It’s critical for a company to understand its end goals for the IoT data it will generate and the level of responsiveness needed in order to implement the appropriate analytics architecture.
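To make that placement decision concrete, here is a small illustrative sketch in Python; the tier names and latency cut-offs simply compress the examples above and are not a formal taxonomy.

```python
def placement_for(latency_budget_s: float) -> str:
    """Illustrative mapping of response-time needs to where analytics should run."""
    if latency_budget_s < 1e-6:       # sub-microsecond: autonomous vehicle and drone control
        return "on the device itself"
    if latency_budget_s < 1e-3:       # tens of microseconds: industrial control via a local gateway
        return "device edge (connected device or local gateway)"
    if latency_budget_s < 1.0:        # milliseconds: 5G plus infrastructure edge data centers
        return "infrastructure edge"
    return "centralized data warehouse or cloud"   # a second or more: climate and temperature sensors

for budget in (5e-7, 5e-5, 0.02, 2.0):
    print(f"{budget:>8} s -> {placement_for(budget)}")
```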
Do You Need to Turn Down the Volume?
The volume of data generated by different types of IoT devices can vary greatly. In total, an estimated 30 billion connected IoT devices will generate 600 ZB of new data every year. On average, that’s more than 50 GB of data per device, every day.
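That per-device figure is just arithmetic on the numbers above, as this quick back-of-envelope check shows (decimal units assumed: 1 ZB = 10^12 GB).

```python
# Rough check of the per-device figure: 600 ZB of new data per year
# spread across an estimated 30 billion connected devices.
ZB_IN_GB = 1e12                      # 1 zettabyte = 10^12 gigabytes (decimal units)

total_gb_per_year = 600 * ZB_IN_GB
devices = 30e9

gb_per_device_per_day = total_gb_per_year / devices / 365
print(f"~{gb_per_device_per_day:.0f} GB per device per day")   # ~55 GB
```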
There are strategies that can be used to help manage large volumes of data for analytics purposes – thinning, transferring and extending to the edge.
Data Thinning: It’s essential to understand what data must be retained after initial processing so you know what you can discard. In some cases, it’s clear what to get rid of, because some data provides no subsequent value. For example, if time-stamped data is sent continuously from an engine simply to relay the message “working properly,” this data can be deleted shortly after it’s processed. Thinning then means saving records only when a questionable operation, communication fault or failure is detected.
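As a minimal sketch of that kind of thinning, a filter might keep only readings that indicate something other than normal operation; the record format and status codes here are hypothetical, not taken from any particular platform.

```python
from dataclasses import dataclass, field

@dataclass
class EngineReading:
    timestamp: float        # epoch seconds
    status: str             # e.g. "OK", "FAULT", "COMM_ERROR"
    payload: dict = field(default_factory=dict)

def thin(readings: list[EngineReading]) -> list[EngineReading]:
    """Discard routine 'working properly' heartbeats; keep anything questionable."""
    return [r for r in readings if r.status != "OK"]

readings = [
    EngineReading(1700000000, "OK", {"rpm": 2100}),
    EngineReading(1700000060, "FAULT", {"rpm": 0, "code": "E42"}),
    EngineReading(1700000120, "OK", {"rpm": 2095}),
]
print(thin(readings))   # only the FAULT record is retained
```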
Other times, saving data for further analysis is critical. What about atmospheric and telemetry data that provides insight when analyzed at a later time? Is it sufficient to only send a daily summary of the data without any of the underlying detail? Predictive maintenance, forecasting, digital twin modeling and machine learning algorithms often benefit from having as large a dataset as possible—which means retaining a large percentage of the data.
Data Transferring: Once you determine which data should be retained, it’s straightforward to estimate how long shipping that data will take before it’s available for analysis. Traditionally, data is shipped to a centralized location such as a private data warehouse or a cloud provider’s storage. With escalating data volumes, however, this becomes problematic. Google provides estimates of how long it takes to transfer data to its cloud platform: 10 TB of data takes approximately 30 hours when 1 Gbps of network bandwidth is available, and roughly 12 days over a 100 Mbps Internet connection. And in some cases, data privacy and compliance requirements mandate that data not be transferred outside of physical locations or across country borders.
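Those figures are easy to sanity-check with a raw-bandwidth calculation; published estimates such as Google’s come out somewhat higher because they allow for real-world protocol overhead.

```python
def transfer_days(terabytes: float, bandwidth_mbps: float) -> float:
    """Ideal transfer time in days, ignoring protocol overhead and contention."""
    bits = terabytes * 1e12 * 8              # decimal TB -> bits
    seconds = bits / (bandwidth_mbps * 1e6)
    return seconds / 86_400

print(f"10 TB @ 1 Gbps:   {transfer_days(10, 1000):.1f} days")   # ~0.9 days (~22 hours)
print(f"10 TB @ 100 Mbps: {transfer_days(10, 100):.1f} days")    # ~9.3 days
```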
Extending to the Edge: If latency, data privacy and/or transfer costs present an issue for your business, you should consider an edge architecture. The edge makes it possible to analyze petabytes of data aggregated from millions of IoT devices without having to wait to transfer the data to a central location. Analytics can be distributed to the device edge by deploying servers in close proximity to IoT gateways, in locations such as factories, hospitals, oil rigs, banks and retail stores. Analytics can also be distributed at the infrastructure edge, across thousands of locations such as cell towers and distributed antenna system hubs.
By extending analytics to the edge, newly generated data is available for analysis almost immediately. It can be processed within a matter of seconds and can be retained efficiently and cost-effectively for weeks, months or years.
Are Your Needs Singular or Cohesive?
Not all IoT analytics use cases require the ability to combine data from many physical locations into one data set for analysis, otherwise known as federation. For instance, alerting when a storage facility’s humidity level deviates by more than 5% from the trailing 48-hour average would not need federation. The ability to analyze data within this facility doesn’t depend upon combining it with data from another.
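A minimal sketch of that single-site rule might look like the following, assuming hypothetical humidity readings arriving as (timestamp, value) pairs; everything the check needs lives at one facility.

```python
from collections import deque

WINDOW_SECONDS = 48 * 3600     # trailing 48-hour window
THRESHOLD = 0.05               # alert on more than 5% deviation

window: deque[tuple[float, float]] = deque()   # (timestamp, humidity) pairs

def check_humidity(timestamp: float, humidity: float) -> bool:
    """Return True if this reading deviates more than 5% from the trailing
    48-hour average at this one facility; no other site's data is needed."""
    # Drop readings that have aged out of the 48-hour window.
    while window and window[0][0] < timestamp - WINDOW_SECONDS:
        window.popleft()
    alert = False
    if window:
        avg = sum(h for _, h in window) / len(window)
        alert = abs(humidity - avg) > THRESHOLD * avg
    window.append((timestamp, humidity))
    return alert
```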
However, other use cases do require federation. One example is to reduce manufacturing defects by analyzing process variations across assembly lines in geographically disparate factories. This analysis requires the data from multiple locations to be analyzed collectively.
One approach is to federate by aggregating analytics using resources that reside very close to the devices and gateways, at either the device edge or the infrastructure edge — or both. Rather than shipping data to a central location, data is kept locally and the queries and algorithms are pushed out to access the data at the edge.
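As a toy illustration of that pattern, each factory computes a small partial result locally and only those partials travel to the central aggregator; the site names and inspection results below are invented for the example.

```python
# Each edge site holds its own inspection results locally (1 = defective unit).
edge_sites = {
    "factory_us": [0, 1, 0, 0, 1],
    "factory_de": [1, 0, 0, 0],
    "factory_cn": [0, 0, 0, 0, 0, 1],
}

def site_partial(results: list[int]) -> tuple[int, int]:
    """Runs at each edge site: return only (defective units, units inspected)."""
    return sum(results), len(results)

# Only the tiny partial results cross the network, not the raw inspection data.
partials = [site_partial(results) for results in edge_sites.values()]
defective = sum(d for d, _ in partials)
inspected = sum(n for _, n in partials)
print(f"Global defect rate across factories: {defective / inspected:.1%}")
```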
Understanding your company’s distinct requirements around time, volume and visibility will fundamentally influence the proper architecture for your IoT analytics. Pursuing the answers to these three considerations, while not an exhaustive list of requirements, will go a long way toward linking business needs with technical implementations, minimizing wasted effort and increasing the likelihood of success.