Machine Learning With Dynamic Data

April 23, 2019
Ricky Costa

Machine Learning (abbreviated ML):

1. [mass noun] Technologies that enable the use of statistical techniques and models to give computer systems the ability to learn from data without being explicitly programmed.

Machine Learning is Hard

For machine learning developers, building a model is a challenging undertaking. Achieving a state-of-the-art ML model for enterprise-ready applications depends on obtaining reliable, high-quality data sources. However, one area of ML data management deserves more attention: dealing with dynamic data. What do we mean by “dynamic data”?

There are two data intake methods used in machine learning: online learning and batch learning. Online (dynamic) learning calibrates the ML model over time as new data arrives, while batch (static) learning does not learn in increments but instead generates a static model from all of the available data (if new data arrives later, the model needs to be retrained). For most businesses, tracking the latest trends is paramount for executive decision-making, which makes business a natural setting for using data dynamically.
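To make the contrast concrete, here is a minimal sketch in plain Python. It assumes a toy model (a least-squares slope through the origin) and made-up data; the names `fit_batch` and `OnlineSlope` are hypothetical and chosen only for illustration. The batch function needs the full dataset every time, while the online class folds each new observation into running sums.

```python
import math

def fit_batch(xs, ys):
    """Batch learning: compute the slope from the full dataset in one pass."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

class OnlineSlope:
    """Online learning: keep running sums so each new point updates the model."""

    def __init__(self):
        self.sxy = 0.0  # running sum of x * y
        self.sxx = 0.0  # running sum of x * x

    def update(self, x, y):
        self.sxy += x * y
        self.sxx += x * x

    @property
    def slope(self):
        return self.sxy / self.sxx

# Hypothetical data in which y is roughly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

batch_slope = fit_batch(xs, ys)

online = OnlineSlope()
for x, y in zip(xs, ys):
    online.update(x, y)

# A new observation arrives later.
new_x, new_y = 5.0, 10.1
online.update(new_x, new_y)                        # online: one cheap update
retrained = fit_batch(xs + [new_x], ys + [new_y])  # batch: refit on everything

print(batch_slope, online.slope, retrained)
```

In this toy case the online model and the retrained batch model arrive at the same slope. The difference is the cost of getting there: the batch approach has to revisit all of the stored data, while the online approach only touches the new point.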

According to a recent Refinitiv survey, 39% of data scientists in Finance said that managing the size and frequency of data is one of their top challenges when dealing with new data for ML modeling. This result follows naturally from the kinds of data found in the Finance industry, although such data is not limited to Finance.

A crucial parameter when modeling dynamic data is the learning rate. This variable defines how sensitively an ML model adapts to new data. If set too high, the ML model adapts very quickly but tends to forget what it learned from older data. If set too low, the model is less sensitive to noise, but it adapts to new data more slowly. Because the model keeps updating on whatever arrives, poor-quality data feeds directly into its behavior, which is why data quality is crucial in a dynamic environment. According to Refinitiv, 56% of data scientists claim that “accurate information about the coverage, history and population of the data” is the biggest challenge when handling new data.
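The trade-off can be seen in a small sketch. It assumes the simplest possible online model, a running estimate nudged toward each new observation by a fraction equal to the learning rate, and a made-up data stream that jumps from a level of 0 to a level of 10. The function names `track` and `spread` are hypothetical.

```python
def track(stream, learning_rate, start=0.0):
    """Return the running estimate after each observation in the stream."""
    estimate = start
    history = []
    for x in stream:
        # Move the estimate toward the new observation by a fraction
        # equal to the learning rate.
        estimate += learning_rate * (x - estimate)
        history.append(estimate)
    return history

def spread(values):
    """Range of the values, used here as a rough measure of jitter."""
    return max(values) - min(values)

# Hypothetical stream: 40 noisy points around 0, then 40 noisy points around 10.
noise = [1.0, -1.0] * 20
stream = [0.0 + n for n in noise] + [10.0 + n for n in noise]

fast = track(stream, learning_rate=0.9)   # high learning rate
slow = track(stream, learning_rate=0.05)  # low learning rate

# Five observations after the level shift (the shift happens at index 40).
print("after shift:", fast[44], slow[44])

# Jitter while the level is still stable (indices 20 to 39).
print("jitter:", spread(fast[20:40]), spread(slow[20:40]))
```

With the high learning rate, the estimate reaches the new level almost immediately but bounces around with every noisy point. With the low learning rate, the estimate is much smoother but is still far from the new level several observations after the shift.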

Whether it's Finance or any other business-related industry, decision-makers who choose to adopt machine learning must understand the kind of data required for a successful product or service. Dynamic data is time-consuming to work with because of the data curation it requires, and executives must decide very early on whether the benefits of ML justify the risks given their unique data profile.