Abstract
Volunteer computing systems are large-scale distributed systems made up of a large number of heterogeneous, unreliable Internet-connected hosts. Because of their high unavailability rates and frequent churn, volunteer computing resources are mainly suited to running High-Throughput Computing (HTC) applications. Although they provide petascale computing power to many scientific projects around the globe, how to use this platform efficiently for other types of applications has not yet been investigated in depth. Characterizing, analyzing, and modeling the availability of these resources is therefore essential for efficient application scheduling. In this paper, we focus on the statistical modeling of volunteer resources whose availability times exhibit non-random patterns. The proposed models take into account the autocorrelation structure of the subset of hosts whose availability shows short- or long-range dependence. We apply our methodology to real traces from the SETI@home project covering more than 230,000 hosts. We show that a Markovian arrival process can model the availability and unavailability intervals of volunteer resources with a reasonable to excellent level of accuracy.
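As a rough illustration of the autocorrelation structure referred to above, the following sketch computes the standard sample autocorrelation function (ACF) of a host's successive availability interval lengths. An ACF that decays slowly across many lags would hint at long-range dependence, while one that vanishes after a few lags suggests short-range or no dependence. The function name `sample_acf` and the interval values are hypothetical. This is a generic diagnostic, not the paper's methodology, and it does not fit a Markovian arrival process.

```python
from statistics import mean


def sample_acf(intervals, max_lag):
    """Sample autocorrelation of a sequence of interval lengths for lags 0..max_lag.

    Hypothetical helper: uses the usual estimator
    r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2.
    """
    n = len(intervals)
    if n < 2 or max_lag >= n:
        raise ValueError("need at least max_lag + 1 observations (and at least 2)")
    mu = mean(intervals)
    dev = [x - mu for x in intervals]
    var = sum(d * d for d in dev)  # lag-0 sum of squares, shared denominator
    if var == 0:
        raise ValueError("constant series has undefined autocorrelation")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / var
        for k in range(max_lag + 1)
    ]


# Hypothetical availability interval lengths (hours) for one host, not real trace data.
print(sample_acf([1, 5, 1, 5, 1, 5], 2))
```

For this alternating toy series, the lag-1 autocorrelation is strongly negative (about -0.83) and the lag-2 value is positive (about 0.67), i.e. the intervals are clearly not independent. Applied to real traces, such a check would typically be run per host over many more lags.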