A new generation of streaming analytics for a connected world

We live in a world of connected devices. From the watches on our wrists to the doorbells outside our homes – IoT devices are everywhere. According to recent studies, it’s estimated there will be well over 24 billion IoT devices within the next four years, and with an abundance of these devices comes an abundance of data. How can analytics systems that need to manage these devices possibly track and analyze their billions of telemetry messages and respond quickly enough to emerging issues?

For example, imagine a healthcare application that monitors thousands of patients via smartwatches. This application needs to analyze data flowing in from each smartwatch and match it to the corresponding patient’s medical condition and history for analysis. If it is going to detect emerging medical issues, classify their urgency, and respond in seconds, it must be able to continuously analyze telemetry as soon as it arrives. Current software technologies that just log incoming data for query or offline analysis can’t react fast enough to make proactive decisions in the moment for patients.
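The kind of in-the-moment correlation described above can be sketched in a few lines. This is only an illustration (in Python, for brevity): the patient fields and the heart-rate thresholds are invented for the example, not taken from any real monitoring system.

```python
# Hypothetical per-patient context, held in fast storage for instant lookup.
patient_context = {
    "patient-17": {"resting_hr": 62, "condition": "arrhythmia history"},
}

def classify_urgency(patient_id, telemetry):
    """Combine an incoming smartwatch reading with the patient's stored
    context and classify its urgency immediately, as the message arrives."""
    ctx = patient_context[patient_id]
    hr = telemetry["heart_rate"]
    # Invented rule: compare the reading to this patient's own baseline.
    if hr > ctx["resting_hr"] * 2:
        return "urgent"
    if hr > ctx["resting_hr"] * 1.5:
        return "watch"
    return "normal"

print(classify_urgency("patient-17", {"heart_rate": 130}))  # prints "urgent"
```

The key point is that the decision uses both the live telemetry and the patient’s stored state, with no detour through a log file or offline query.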

Further complicating this challenge is the difficulty in writing analytics algorithms for streaming data that track dynamic measurements of physical systems. Sensor data, such as EKG data from cardiac monitors or temperature/pressure/RPM data from engines and air compressors, often has complex waveforms that hide patterns describing emerging issues needing attention.

New software technology promises to address these challenges and enable streaming analytics to track large populations of IoT devices quickly and effectively. This technology combines the power of in-memory computing with the digital twin software model to enable incoming telemetry from millions of devices to be analyzed immediately – as it flows in – instead of requiring log files or historian databases and offline analytics. It can immediately signal abnormal events and send alerts to personnel.

In-memory computing platforms leverage the combined computing power of many cloud-based or on-premises servers working together to host information about each IoT device in memory for fast access. They combine this information with incoming telemetry to update their knowledge about each device’s condition. In-memory computing can do all of this in milliseconds using the digital twin model, a software technique originally created for building and evaluating new devices. When used for streaming analytics, a digital twin for each device holds information about the device and processes incoming telemetry. This both simplifies the design of analytics code and enables it to run fast.
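As a concrete, if simplified, illustration of the digital twin model described above: each device is represented by an in-memory object that holds the device’s state and processes each telemetry message as it arrives. The class, field names, and overheating threshold below are assumptions made for this sketch (shown in Python for brevity), not the API of any particular in-memory computing platform.

```python
class DeviceTwin:
    """Minimal digital twin: holds per-device state in memory and
    updates it as each telemetry message streams in."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.message_count = 0
        self.max_temp = float("-inf")
        self.alerts = []

    def process_telemetry(self, message):
        # Update the twin's knowledge of the device from this message.
        self.message_count += 1
        temp = message["temperature"]
        self.max_temp = max(self.max_temp, temp)
        # Invented threshold: immediately signal an abnormal event.
        if temp > 100.0:
            self.alerts.append(f"{self.device_id}: overheating at {temp}")

# One twin per device, keyed by device ID for fast in-memory lookup.
twins = {}

def dispatch(message):
    twin = twins.setdefault(message["device_id"],
                            DeviceTwin(message["device_id"]))
    twin.process_telemetry(message)

dispatch({"device_id": "compressor-9", "temperature": 104.2})
```

Because each message is routed to its device’s twin, the analytics code only ever reasons about one device’s state at a time, which is what keeps the per-message logic simple and fast.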

Although analytics code is typically crafted in popular programming languages, such as Java and C#, creating algorithms that uncover emerging issues hidden within a stream of telemetry can be daunting, or at a minimum complex. In many cases, the algorithm itself may be unknown because the underlying processes which lead to anomalies and, ultimately, device failures are not well understood. Machine learning (ML) algorithms can help tackle this problem by automatically recognizing abnormal patterns in a device’s telemetry messages and associated state information that would otherwise be difficult for humans to detect. After training on historical data that has been classified as normal and abnormal, followed by testing and refinement, an ML algorithm can then monitor this dynamic information and alert personnel when it observes suspected abnormal behavior. No manual analytics coding is required.
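To make the train-then-monitor idea concrete, the sketch below fits a very simple statistical detector to historical readings classified as normal, then applies it to new telemetry. A real deployment would use a proper ML library and labeled training data; the mean-and-deviation rule here (again in Python, with invented readings) is only a stand-in for a trained model.

```python
import statistics

def train(normal_history):
    """'Train' on historical readings classified as normal:
    learn their mean and standard deviation."""
    return statistics.mean(normal_history), statistics.stdev(normal_history)

def is_abnormal(model, reading, threshold=3.0):
    """Flag any reading more than `threshold` standard deviations
    away from the learned normal behavior."""
    mean, stdev = model
    return abs(reading - mean) > threshold * stdev

# Invented historical heart-rate readings labeled as normal.
model = train([71, 69, 70, 72, 68, 70, 71, 69])
print(is_abnormal(model, 95))  # prints True
```

Once trained and validated, a detector like this can run inside each digital twin, scoring every incoming message the moment it arrives with no hand-written detection rules.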