@samuel.kounev

Monitorless: Predicting Performance Degradation in Cloud Applications with Machine Learning

, , , , and . Proceedings of the 20th International Middleware Conference, page 149--162. New York, NY, USA, Association for Computing Machinery (ACM), (2019)
DOI: 10.1145/3361525.3361543

Abstract

Today, software operation engineers rely on application key performance indicators (KPIs) for sizing and orchestrating cloud resources dynamically. KPIs are monitored to assess the achievable performance and to configure various cloud-specific parameters such as flavors of instances and autoscaling rules, among others. Usually, keeping KPIs within acceptable levels requires application expertise which is expensive and can slow down the continuous delivery of software. Expertise is required because KPIs are normally based on application-specific quality-of-service metrics, like service response time and processing rate, instead of generic platform metrics, like those typical across various environments (e.g., CPU and memory utilization, I/O rate, etc.)In this paper, we investigate the feasibility of outsourcing the management of application performance from developers to cloud operators. In the same way that the serverless paradigm allows the execution environment to be fully managed by a third party, we discuss a monitorless model to streamline application deployment by delegating performance management. We show that training a machine learning model with platform-level data, collected from the execution of representative containerized services, allows inferring application KPI degradation. This is an opportunity to simplify operations as engineers can rely solely on platform metrics -- while still fulfilling application KPIs -- to configure portable and application agnostic rules and other cloud-specific parameters to automatically trigger actions such as autoscaling, instance migration, network slicing, etc.Results show that monitorless infers KPI degradation with an accuracy of 97% and, notably, it performs similarly to typical autoscaling solutions, even when autoscaling rules are optimally tuned with knowledge of the expected workload.

Links and resources

Tags

community