When Machine Learning Is Not Really Machine Learning
If you already have "anomaly detection" systems in place, are wondering about those ML features embedded into products, or want to deploy an Oracle Database anomalous activity detection system, then you need to read this article.
I'm very excited that machine learning (ML) capable products are not only accepted but now desired and in many cases expected. However, knowing this, many companies are touting their "machine learning" enhanced products and services.
Even if they have no true modern ML capabilities.
I Received An Email...
I received an email from a distressed student from one of my Machine Learning For Oracle Professionals LVC classes. Here's the email, pretty much word for word.
I presented my automated anomaly detection system proposal to my team and received pushback. Currently we use Product ABC which has a lot of functionality, including anomaly detection.
We have rules that can alert us based on SQL execution time, active sessions, loads and many other parameters.
I was asked about the practicality of my proposed system and what it can do that Product ABC does not do.
I'm not sure how to respond. Please help.
I am not going to show Product ABC's machine learning centric screen shot, but it uses words like Average Response Time, Errors Per Minute, Business Transactions, Ranges and Deviations.
This all seems harmless. But what it demonstrates is that IT does not understand machine learning fundamentals and how ML differs from typical anomaly detection and rule-based systems.
This article is the essence of my response to my student.
Way Beyond Our Human Perception
Machine learning is about detecting useful patterns. Its value is detecting patterns that our experience, rules, statistics and human perception simply cannot recognize.
Anomaly detection has been around forever.
What makes modern ML anomaly detection different is the depth and breadth of this perception, along with more powerful computers, truly free and mind-blowing ML software and lots of super good data (more on that later).
If you have worked on setting alerting thresholds, you know there is an inherent tension between false positives (rules too tight) and false negatives, that is, missing true anomalies (rules too loose).
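To make that tension concrete, here is a minimal Python sketch. The sample values, the labels and the count_errors function are all invented for illustration; they do not come from any real system or from Product ABC.

```python
# Hypothetical labeled history: (active_sessions, is_true_anomaly).
# Every number here is made up purely to illustrate the trade-off.
samples = [
    (40, False), (55, False), (60, False), (75, False), (90, False),
    (110, False), (130, True), (150, False), (180, True), (240, True),
]

def count_errors(threshold, data):
    """Alert when value > threshold. Return (false_positives, false_negatives)."""
    false_pos = sum(1 for value, bad in data if value > threshold and not bad)
    false_neg = sum(1 for value, bad in data if value <= threshold and bad)
    return false_pos, false_neg

# A tight rule floods us with false alarms, a loose rule misses real anomalies.
for threshold in (50, 100, 140, 200):
    fp, fn = count_errors(threshold, samples)
    print(f"threshold={threshold:>3}  false positives={fp}  false negatives={fn}")
```

With a single metric and a single threshold, you can only trade one kind of error for the other. No setting removes both.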
True ML models can easily integrate hundreds or thousands of independent metrics. Not human rules but raw metrics. The results can produce jaw-dropping anomaly detection capabilities.
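Here is a small sketch of why looking at metrics together matters. It uses only two metrics and a Mahalanobis distance, which is a technique I picked for illustration and not a full ML model. The history values and the mahalanobis_2d function are hypothetical.

```python
import statistics

# Hypothetical history of two metrics that normally move together:
# (active_sessions, cpu_pct). All values are invented for illustration.
history = [
    (10, 12), (20, 19), (30, 33), (40, 38), (50, 52),
    (60, 61), (70, 68), (80, 83), (90, 88), (100, 101),
]

def mahalanobis_2d(point, data):
    """Distance of point from the data's center, scaled by the data's covariance."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    n = len(data)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] * inverse(covariance) * [dx dy]^T, with the 2x2 inverse written out
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 ** 0.5

typical = (50, 50)  # sessions and CPU agree with each other
odd = (90, 20)      # each value alone is within its historical range

print("typical:", round(mahalanobis_2d(typical, history), 2))
print("odd:    ", round(mahalanobis_2d(odd, history), 2))
```

A per-metric rule would pass the second point, because 90 sessions and 20 percent CPU have each been seen before. Only the combination is unusual. Now picture the same idea across hundreds of metrics.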
Every Product Will Soon Claim Machine Learning
Every product will soon claim "machine learning" and "artificial intelligence." Companies offering products and services will be forced by marketing to use these buzz words.
So, the words will lose their meaning and will be used to mean anything. Therefore, we must understand what a product's true ML capabilities are.
It only takes a few questions to understand if modern ML is being used. I'll share some of these questions below.
What Does Product ABC's Anomaly Detection Mean?
It only takes a few hours of ML study to learn there are an overwhelming number of mathematical algorithms that have been encapsulated into models that are contained within ML libraries.
There are many anomaly detection algorithms. So, it is a fair question to ask, "Which anomaly detection algorithms do you use and why?" Then wait for the answer.
If the reply is focused on words like rules, deviation and variance, then modern ML is not the core of the product. What you want to hear are words like cluster analysis, k-means, SVM, BIRCH, spectral clustering, mean shift and thresholds. And also words like Python, training, testing, optimizing and hyperparameter tuning.
If you don't hear true modern ML words, then it's likely the product's anomaly detection is nothing more than rules and statistics... old school.
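To give a feel for what "cluster analysis" and "k-means" mean in this context, here is a deliberately tiny sketch in plain Python. It is an illustration only, not how any particular product works. The workload points, the kmeans and anomaly_score functions and the choice of starting centroids are all my assumptions, and a real project would use an ML library rather than hand-written code.

```python
import math

# Hypothetical workload samples: (cpu_pct, io_wait_pct). Two normal modes,
# say daytime OLTP and a nightly batch. All values are invented.
points = [
    (20, 5), (80, 60), (22, 6), (78, 62),
    (19, 4), (82, 58), (21, 7), (79, 61),
]

def kmeans(data, k, iterations=20):
    """Tiny k-means. Starts from the first k points, so results depend on their order."""
    centroids = [data[i] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

def anomaly_score(point, centroids):
    """Distance to the nearest cluster center. Larger means more unusual."""
    return min(math.dist(point, c) for c in centroids)

centroids = kmeans(points, k=2)
print("centroids:", centroids)
print("familiar point (21, 6):  ", round(anomaly_score((21, 6), centroids), 2))
print("unfamiliar point (50, 30):", round(anomaly_score((50, 30), centroids), 2))
```

Notice there are no rules here. The model learns what "normal" looks like from the data, and anything far from every learned cluster gets a high score.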
What Kind Of Anomalies Are Being Detected?
Oracle, Application, OS, user activity, sunspot activity, company stock price?
No product or service will reliably detect all kinds of anomalies. It's not going to happen. So, ask the question, "What kind of anomalies does your system detect?"
My student is interested in anomalous Oracle database activity. That is very specific, which allows for a higher precision system. Product ABC made absolutely no mention of anomalous Oracle database activity.
So, there really is no conflict in what my student was proposing and what Product ABC does.
Is An Anomaly Bad?
It's easy to forget that an anomaly is inherently neither bad nor good... it's simply highly unusual. So unusual, it warrants further attention.
That's the beauty! It allows us to scale the number of systems we are monitoring, by drawing our attention only to highly unusual and complicated situations.
So, it's a fair question to ask, "Do you consider an anomaly a bad thing or a good thing? And why?" and "How do you differentiate between a bad anomaly and a good anomaly?"
How Is The Model Optimized?
All ML models can be optimized, and anomaly detection systems in particular can have their thresholds adjusted. While this can occur manually or automatically, you are going to want some kind of manual override... just in case.
Anomaly detection systems can be optimized to highlight different anomalous activity. The "anomaly" threshold can also be adjusted to keep the number of anomalies detected at your comfort level. Every implementation is different, therefore any product will need a way to be optimized.
So, it's a fair question to ask, "How are your models optimized?" and "How can we adjust the detection thresholds?"
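As a sketch of what an adjustable threshold with a manual override could look like, consider the following. The scores, the pick_threshold function and its parameters are hypothetical names and values I made up for illustration. The idea is simply to choose the threshold so that no more than a chosen number of observations get flagged.

```python
# Hypothetical anomaly scores from some already-trained model (invented values).
scores = [0.2, 0.4, 0.3, 3.1, 0.5, 0.1, 7.8, 0.6, 1.9, 0.3]

def pick_threshold(all_scores, max_alerts, manual_override=None):
    """Return a threshold so that at most max_alerts scores are strictly above it.

    If manual_override is given, it wins... just in case.
    """
    if manual_override is not None:
        return manual_override
    ranked = sorted(all_scores, reverse=True)
    if max_alerts >= len(ranked):
        return float("-inf")  # the budget covers everything, so flag everything
    return ranked[max(max_alerts, 0)]

def flagged(all_scores, threshold):
    return [s for s in all_scores if s > threshold]

auto = pick_threshold(scores, max_alerts=2)
print("automatic threshold:", auto, "->", flagged(scores, auto))

manual = pick_threshold(scores, max_alerts=2, manual_override=0.55)
print("manual threshold:   ", manual, "->", flagged(scores, manual))
```

Lowering the threshold surfaces more anomalies and raising it surfaces fewer. Either way, you are the one deciding what your comfort level is.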
Can Our Rules Complement The ML?
The best implementations I have seen combine basic "stupid rules" with machine learning "bad performance" and "anomalous activity" detection systems.
For example, having over 200 active sessions on Friday afternoons may not be considered an anomaly or a bad situation. But perhaps over 2000 active sessions is a problem and everyone in your department knows this. So, to be sure you are alerted, integrate the 2000 active session rule into the automated system.
Integrating simple experiential rules with machine learning produces the best systems.
But it's tricky, because most IT professionals will start trying to ratchet down the rules and come up with a lot of them. This is a sure sign of a lack of ML understanding... and trust. Let ML do what it's good at and let us humans do what we are good at.
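The 2000 active session example above could be sketched like this. The alert_reasons function, the score threshold of 3.0 and the sample inputs are hypothetical. The anomaly score is assumed to come from whatever ML model you have trained.

```python
def alert_reasons(active_sessions, anomaly_score,
                  session_limit=2000, score_threshold=3.0):
    """Combine one simple experiential rule with an ML anomaly score.

    Returns a list of reasons to alert. An empty list means stay quiet.
    """
    reasons = []
    # The "everyone in the department knows this" rule.
    if active_sessions > session_limit:
        reasons.append("rule: active sessions over limit")
    # Everything else is left to the model.
    if anomaly_score > score_threshold:
        reasons.append("ml: anomalous activity")
    return reasons

print(alert_reasons(250, 0.8))    # busy Friday afternoon, nothing unusual
print(alert_reasons(2100, 0.8))   # the rule fires even if the model is calm
print(alert_reasons(250, 5.2))    # the model fires even though no rule matched
```

Keep the rule list short. One or two rules that everyone agrees on, and let the model handle the rest.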
What Data Is Used?
Product ABC did not specify their data source. In fact, no one knew exactly what data was being looked at.
The best anyone could infer was "application focused" activity. Yet, no one really knew what that meant.
As Oracle professionals, we have an incredible source of clean and reliable performance data in Oracle's Automatic Workload Repository. It contains ASH data, which provides details about SQL statements, modules, actions, waits, and CPU consumption. We have sysmetric data, which is unbelievably valuable in ML projects. We have Oracle time model data. And on and on it goes.
If an Oracle anomaly detection system does not incorporate Oracle data... there is a problem. It just doesn't make sense.
So, it's a fair question to ask, "What is the product's data source?" and "Are you looking at Oracle database activity? If so, what data?"
If Product ABC is focused on application anomalies, then there is really no conflict with a system that focuses on Oracle activity/performance anomalies.
This is important: They do not conflict, they complement.
True ML With Full Control
What my student was proposing is true modern machine learning. He will be using industry standard algorithms, models and processes. He will be using Oracle AWR data, because he is focused on detecting anomalous Oracle activity. And, he will have full control over optimizing the system and its implementation.
Clearly Product ABC was not designed to do this. Again, there is no conflict because they complement each other.
The Real Message
I think the real message is this: what my student was proposing was not competing with or trying to replace Product ABC. It was truly complementary.
The core problem in this situation was that IT did not have fundamental ML knowledge. With just basic knowledge, they could have asked a few questions and the "they complement, not compete" truth would have been realized.
If you would like some help creating an automated Oracle focused anomaly detection system or just want to learn more about how IT can leverage ML, feel free to email me.
All the best in your machine learning work,
Start my FREE 18 lesson Machine Learning For Oracle Professionals E-Course here.
Craig Shallahamer is a long time Oracle DBA who specializes in predictive analytics, machine learning and Oracle performance tuning. Craig is a performance researcher and blogger, consultant, author of two books, an enthusiastic conference speaker, a passionate teacher and an Oracle ACE Director. More about Craig Shallahamer...
If you have any questions or comments, feel free to email me directly at craig at orapub.com.