Adaptive Profiling for Root-cause Analysis of Performance Anomalies in Web-based Applications
Authors
Abstract
The most important factor in the assessment of the availability of a system is the mean-time to repair (MTTR). The lower the MTTR the higher the availability. A significant portion of the MTTR is spent in the detection and localization of the cause of the failure. One possible method that may provide good results in the root-cause analysis of application failures is run-time profiling. The major drawback of run-time profiling is the performance impact.In this paper we describe two algorithms for selective and adaptive profiling of web-based applications. The algorithms make use of a dynamic profiling interval and are mainly triggered when some of the transactions start presenting some symptoms of performance anomaly. The algorithms were tested under different types of degradation scenarios and compared to static sampling strategies. We observed through experimentation that the pinpoint of performance anomalies, supported by the data collected using the adaptive profiling algorithms, stills timely as with full-profiling while the response time overhead is reduced in almost 60%. When compared to a non-profiled version the response time overhead is less than 1.5%. These results show the viability of using run-time profiling to support quickly detection and pinpointing of performance anomalies and enable timely recovery.