Posted on July 24, 2018 by Chris Richards
Featuring an interview with Erik Benner, Mythics VP of Enterprise Transformation
For anyone managing IT ops, setting performance thresholds has been a big, and tedious, part of the job. I mean, if you don’t tell the system that 8,000 server calls in an hour is way too many, how’s it going to know when to send an alert that a problem’s brewing?
The new answer: machine learning, which can help free IT workers for higher-level decision-making and supervisory roles.
Here’s how. Using machine learning, an IT monitoring system now can watch an organization’s hardware, databases, applications, operating systems, and all the connections among them. And the monitoring system learns what is “normal,” and therefore sets its own thresholds. Beyond that mark, it sends an alert for a person to investigate a potential problem. This structure can eliminate several hundred thresholds an IT pro might need to set up and maintain to monitor a complex technology system.
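To make the idea of a learned threshold concrete, here is a minimal Python sketch. It is not how Oracle Management Cloud Service works internally (the article does not describe its algorithm). It assumes a simple "mean plus k standard deviations" rule, and the function names and sample counts are hypothetical.

```python
from statistics import mean, stdev


def learn_threshold(history, k=3.0):
    """Derive an upper alert threshold from past observations.

    Assumption: "normal" is modeled as mean + k standard deviations.
    """
    return mean(history) + k * stdev(history)


def is_abnormal(value, history, k=3.0):
    """Return True when value exceeds the threshold learned from history."""
    return value > learn_threshold(history, k)


# Hypothetical hourly server-call counts observed during a normal week.
calls_per_hour = [1200, 1350, 1100, 1280, 1420, 1190, 1310, 1250]
threshold = learn_threshold(calls_per_hour)

print(f"learned threshold: {threshold:.0f} calls/hour")
print("8,000 calls abnormal?", is_abnormal(8000, calls_per_hour))
print("1,300 calls abnormal?", is_abnormal(1300, calls_per_hour))
```

Nobody typed in a limit here: the threshold comes from the observed history, and it shifts as the history changes.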
The hard part for IT pros, says Erik Benner, who’s been implementing machine learning-based monitoring systems for consulting and systems integration firm Mythics, is breaking administrators’ old habits.
“Time after time, they want to get in there and set up thresholds,” says Benner, Vice President of Enterprise Transformation at Mythics. “Let the machine learning run for a week, and then set alerts that say ‘Show me the abnormalities.’ The technology works. Don’t try to outsmart it when you get started.”
Embracing machine learning and autonomous IT monitoring requires not only a new mindset, but also new processes. Here are six other lessons Benner has learned from helping organizations implement Oracle Management Cloud Service, which provides machine learning-based, autonomous monitoring of a variety of IT assets from any supplier.
Machine learning isn’t following orders to watch only the 200 things you told it to watch. Instead, it’s monitoring what the normal state of operation is, and alerting you when that gets out of whack. “It catches things you don’t think to monitor,” Benner says.
For example, one Mythics client’s Oracle fast recovery area—used for short-term, backup-related storage and critical archive logs—was running at more than 80% capacity and climbing. And this was at 2 p.m., a time no one should be running the kind of backups you’d usually associate with FRA bursts. “If the FRA fills up, really bad things happen,” Benner says.
No one had set up monitoring for that scenario, but the team got the alert from Oracle Management Cloud Service that this abnormal activity was happening. So a database administrator went into Oracle Enterprise Manager to check it out, and she quickly found a developer running a load outside of the usual hours to try something new he’d written. “She caught it before it caused a problem,” Benner says.
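The FRA story turns on context: 80% usage may be routine during a nightly backup window but odd at 2 p.m. A rough sketch of that idea, assuming a separate baseline per hour of day, follows. The sample readings, function names, and the `min_spread` floor are all hypothetical, and this is not the service's actual method.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_hourly_baseline(samples):
    """Group (hour_of_day, percent_used) samples and summarize each hour.

    Returns {hour: (mean, stdev)} for hours with at least two samples.
    """
    by_hour = defaultdict(list)
    for hour, used in samples:
        by_hour[hour].append(used)
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_abnormal_for_hour(baseline, hour, used, k=3.0, min_spread=1.0):
    """Flag usage that is far above what is normal for this hour of day."""
    if hour not in baseline:
        return False  # no baseline for this hour yet, so do not alert
    avg, spread = baseline[hour]
    # min_spread keeps a very steady history from producing a hair-trigger.
    return used > avg + k * max(spread, min_spread)


# Hypothetical history: FRA is busy at 2 a.m. (backups), quiet at 2 p.m.
history = [
    (2, 78.0), (2, 82.0), (2, 85.0), (2, 80.0),
    (14, 30.0), (14, 33.0), (14, 31.0), (14, 29.0),
]
baseline = build_hourly_baseline(history)

print("81% at 2 p.m. abnormal?", is_abnormal_for_hour(baseline, 14, 81.0))
print("81% at 2 a.m. abnormal?", is_abnormal_for_hour(baseline, 2, 81.0))
```

The same reading is flagged in the afternoon and ignored overnight, which is the behavior a single static threshold cannot give you.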
Again, machine learning focuses on what’s abnormal—and spotting odd behavior by systems or people is a huge part of modern security monitoring.
The Oracle Management Cloud system at one of Benner’s clients spotted someone granting nonstandard access to a database table, and it turned out to be a compromised account. Knowing what a person or system is doing inside an app or database, as opposed to just seeing that it has been accessed, can make a big security difference. Without the service, the access might have been missed.
Mythics has systems running millions of transactions a day, and machine learning thrives on that kind of big data. But Benner has also worked with systems doing only 20 transactions a day, and even there the Oracle Management Cloud system had started to build a good baseline after a week. “It just needs to know what’s normal,” he says. Exactly how long that takes (a week, a month) depends on the transaction volume and business process involved. Benner’s rule of thumb: at least one business cycle, whatever that is for the system or business.
To do all this machine learning, Oracle Management Cloud Service pulls in all of your log data. So if you’re an ecommerce company, and the dollar amount of each sale is a data point contained in your logs, Oracle Management Cloud can be tweaked to track that, Benner says. Mythics helps IT teams pull that kind of running data into a dashboard. “If the data is in your logs, you can put it on your dashboard,” he says. “You can use it for content you never thought of for an IT management system.”
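As an illustration of turning a value buried in logs into a dashboard number, the sketch below totals sale amounts per hour. The log format, the `sale_amount=` field, and the function name are invented for this example. It does not use any Oracle Management Cloud API, and a real deployment would rely on the service's own log parsing.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Hypothetical log format: ISO timestamp, level, then key=value pairs.
SALE_PATTERN = re.compile(
    r"^(?P<hour>\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2}\s+.*"
    r"\bsale_amount=(?P<amount>\d+(?:\.\d+)?)"
)


def sales_per_hour(lines):
    """Sum sale amounts per hour, ignoring lines that carry no sale."""
    totals = defaultdict(Decimal)  # Decimal avoids float rounding on money
    for line in lines:
        match = SALE_PATTERN.search(line)
        if match:
            totals[match.group("hour")] += Decimal(match.group("amount"))
    return dict(totals)


log_lines = [
    "2018-07-24T09:15:02 INFO order=1001 sale_amount=19.99",
    "2018-07-24T09:47:10 INFO order=1002 sale_amount=5.01",
    "2018-07-24T10:03:44 WARN cache miss for sku=A12",
    "2018-07-24T10:20:31 INFO order=1003 sale_amount=42.50",
]
totals = sales_per_hour(log_lines)

for hour, total in sorted(totals.items()):
    print(hour, total)
```

Once the per-hour totals exist, they are just another time series, so the same "learn what is normal" approach can be applied to revenue as well as to server metrics.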
What makes machine learning special is that it can sort through a complex fog of data to find insights people couldn’t. In the IT monitoring scenario, machine learning pulls data from every part of the IT organization in order to learn what’s normal and abnormal. Some teams won’t like that construct, though, and they’re often reluctant to let something new monitor their fiefdom, be it databases, apps, or infrastructure.
“You may have individual teams resist, but then the database team sees the visibility they get into apps, and can see whether a performance problem really is a database problem or an application issue,” Benner says. “And then, they see this is a tool that everyone can use to support the user.”
Consider this the flip side of the resistance noted above. In DevOps and Agile shops, where teams already are working across functions, machine learning tends to fit right in, because it puts actionable data in front of teams built to solve problems quickly regardless of the technology pillars—networking, databases, apps—they’re dealing with. “There, you don’t have people fighting it, because they’ve already collapsed the pillars,” Benner says.
To learn more about how Mythics can help transform your organization with Oracle Cloud Solutions: https://www.mythics.com/products-and-training/oracle-cloud/