Why "Rare Events" Can Be A Problem With Machine Learning

In the Oracle performance monitoring and alerting world, we all hope performance is rarely an issue. That is, we hope poor performance is a rare event!

While this is good news for those of us who optimize Oracle systems, for those of us who build machine learning predictive models it births all sorts of challenges that simply cannot be ignored.

In this article, I am going to demonstrate why rare events are a problem in machine learning.

Ready? Read on.

Wood And Metal Coins

My objective is to determine if a coin is made of wood or metal. Of course, I want the accuracy to be 100%, but short of that, higher is better: 80% is better than 70%.

Suppose I have 100 coins in a bag. I place all 100 coins on a table. You and I can clearly see there are 2 coins made of wood and the remaining 98 are metal. But this is not about us; this is about machine learning!

First, I think you would agree that the 2 wood coins are very rare. And, the 98 metal coins are not.

There are many machine learning models we could train to determine if a chosen coin is wood or metal. But I want to challenge you with a question.

Maximizing Accuracy Paradox

Why would I develop a model when I can simply pick a coin at random and ALWAYS declare it "metal"? This "always metal" strategy results in a 98% accuracy score!
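
To make the arithmetic concrete, here is a minimal Python sketch of the "always metal" strategy scored against the 100 coins on the table. The names (coins, always_metal) are my own for illustration and not from any library.

```python
# 100 coins: 2 wood and 98 metal, the counts from the example above.
coins = ["wood"] * 2 + ["metal"] * 98


def always_metal(coin):
    """Ignore the coin entirely and declare it metal."""
    return "metal"


predictions = [always_metal(c) for c in coins]
correct = sum(p == c for p, c in zip(predictions, coins))
accuracy = correct / len(coins)

print(f"correct: {correct} of {len(coins)}")  # correct: 98 of 100
print(f"accuracy: {accuracy:.0%}")            # accuracy: 98%
```

No learning is involved at all, yet the score is 98%.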

To develop a machine learning model that has an accuracy greater than 98% is not a simple task. In fact, it may not be possible.

So, let's just always proclaim the coin is metal, be correct 98% of the time, and be done with this exercise.

Does this sound too good to be true? Read on.

But That's Not Fair!

You may be thinking, "But the ML model doesn't know there are only 2 wood coins." Actually, this is not a problem, because most ML models will quickly learn from the training data that "choosing wood" more than about 2% of the time lowers overall accuracy.

You and I would do the same thing. If my strategy was "always wood," after ten attempts, and probably getting all ten wrong, I might decide to switch to an "always metal" strategy.

Once I switched to an "always metal" strategy, I would then be correct around 98% of the time!
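
How likely is it that an "always wood" guesser gets all ten wrong? Here is a rough check in Python, assuming the ten coins are picked at random from the 100. I show both cases, picking without putting coins back and picking with replacement, since the text does not say which; the variable names are my own.

```python
from math import comb

total_coins, wood_coins, picks = 100, 2, 10
metal_coins = total_coins - wood_coins

# Without replacement: all ten picks must come from the 98 metal coins.
p_all_wrong_no_replacement = comb(metal_coins, picks) / comb(total_coins, picks)

# With replacement: each pick is metal with probability 0.98.
p_all_wrong_replacement = (metal_coins / total_coins) ** picks

print(f"without replacement: {p_all_wrong_no_replacement:.3f}")
print(f"with replacement:    {p_all_wrong_replacement:.3f}")
```

Either way, the chance of ten straight misses comes out at a little over 80%, so "probably all ten wrong" is a fair description.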

While different ML algorithms will approach this differently, by default many machine learning algorithms, when left to optimize plain accuracy, will eventually settle on or near the "always metal" strategy.

To understand this behaviour, it's important to remember that I set the goal of "maximize accuracy." This simple accuracy score is the goal and nothing else matters! Machine learning models are super good at crunching towards their goal, the best they can.
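
Here is a toy illustration of that "crunching towards the goal" behaviour. It is only a sketch under made-up assumptions: each coin has a hypothetical integer weight, the two wood coins overlap with the lighter metal coins, and "training" is nothing more than trying every weight threshold and keeping the one with the best accuracy. Real algorithms are more sophisticated, but the pull of the accuracy goal is the same.

```python
# Hypothetical data: 98 metal coins with weights 50..147, plus two wood
# coins (weights 55 and 60) that overlap with the lightest metal coins.
coins = [(w, "metal") for w in range(50, 148)] + [(55, "wood"), (60, "wood")]


def accuracy_at(threshold):
    """Accuracy of the rule: weight < threshold means wood, else metal."""
    correct = sum(
        ("wood" if weight < threshold else "metal") == label
        for weight, label in coins
    )
    return correct / len(coins)


# "Training": try every threshold and keep the one with the best accuracy.
best_threshold = max(range(50, 149), key=accuracy_at)
best_accuracy = accuracy_at(best_threshold)
wood_predictions = sum(weight < best_threshold for weight, _ in coins)

print(best_threshold, best_accuracy, wood_predictions)
```

With this data, the winning threshold sits at the lightest coin, so the rule never predicts wood. Catching a wood coin would mean mislabeling several metal coins, and that costs more accuracy than it gains.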

So, why is this "rare event" issue such a potentially big problem? Read on.

Not Wood, But Cancer

In many machine learning projects, detecting the "rare event" or the "wood coin" or "poor system performance" is MORE important than detecting the common event.

A sobering example is cancer. We all hope that detecting cancer is an extremely rare event, and we all hope who or whatever is scanning the "sample" understands that this "rare event" is incredibly important.

Furthermore, being correct 98% of the time is of absolutely no value if the 2% we miss are the cancer cases, and could potentially lead to death.

So, properly detecting a rare event can be incredibly important. Enough said.

How Do We Detect The Important Yet Rare Event?

That's really the question, isn't it? As Oracle professionals, we must be able to properly detect "poor performance", or whatever the rare event is. In this blog series, I will cover the topic from a variety of perspectives.

Most people think the solution is to develop a better model. Yes, a better model may be a solution or part of the solution, but there are other options, including more appropriate accuracy metrics.
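
As a small taste of what "more appropriate accuracy metrics" can mean, here is a Python sketch that scores the "always metal" strategy on the 100 coins using per-class recall and balanced accuracy (the average of the per-class recalls). These are two common choices, not necessarily the ones a later post will settle on, and the helper below is my own rather than a library function.

```python
actual = ["wood"] * 2 + ["metal"] * 98
predicted = ["metal"] * 100  # the "always metal" strategy


def recall(label):
    """Of the coins that truly are `label`, the fraction predicted as `label`."""
    hits = sum(a == label and p == label for a, p in zip(actual, predicted))
    return hits / actual.count(label)


accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
wood_recall = recall("wood")
metal_recall = recall("metal")
balanced_accuracy = (wood_recall + metal_recall) / 2

print(f"accuracy:          {accuracy:.2f}")           # 0.98
print(f"wood recall:       {wood_recall:.2f}")        # 0.00
print(f"metal recall:      {metal_recall:.2f}")       # 1.00
print(f"balanced accuracy: {balanced_accuracy:.2f}")  # 0.50
```

The same strategy that scores 98% on plain accuracy finds none of the wood coins, and a metric that looks at the rare class exposes that immediately.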

As I mentioned above, this post is just the introduction to this important topic.

Hopefully, you realize the importance and are ready to dig deep into this topic!

All the best in your machine learning work,

Craig.