Dependable Systems and Networks Workshops
Download PDF

Abstract

It is quite a headache for developers to online detect performance problems in large-scale cloud computing systems. The behavior and the hidden connections among the huge amount of runtime request execution paths in cloud computing systems usually contain useful information for performance problem detection. In this paper, we propose an approach to rapidly diagnose the source of performance degradation in large-scale non-stop cloud computing systems. The approach first groups the user requests into categories with a fast clustering algorithm; then applies the principal components analysis to extract the primary methods; finally compares the normal and abnormal behaviors of the primary methods to localize the main cause of performance problems. We conduct extensive experiments over a real-world enterprise system providing services for the public. The results show that our approach can locate the prime causes of performance problems accurately and efficiently.1
Like what you’re reading?
Already a member?
Get this article FREE with a new membership!

Related Articles