Abstract
A major challenge of running applications in clouds is determining the right number of resources (virtual machines, or VMs) to rent with respect to both performance and cost. The challenge grows when an application must run across multiple resources. In this paper, we address the problem of scheduling scientific workflow applications. The structure of workflows, dictated by precedence/data dependencies, and the diversity of cloud resources, both at large scale, make resource provisioning and task scheduling very complex. To this end, we design the Resource Demand Aware Scheduling (RDAS) algorithm, which schedules workflows based on their resource demands and priorities while taking workflow structure into account. RDAS partitions workflows and allocates resources of possibly different capacities/types to the partitions in a “fair” manner, such that the partitions' execution times do not vary significantly. RDAS thereby turns resource and application heterogeneity (a major hindering factor in clouds) into an opportunity for optimizing resource provisioning for scientific workflows. Our experimental results demonstrate the ability of RDAS to minimize the overall workflow completion time (makespan) and, in turn, the cost of execution. In particular, RDAS outperforms three existing algorithms by 22%, 13% and 33%, on average, in terms of makespan, cost and the number of resources used, respectively.
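
To make the notion of “fair” allocation concrete, the following minimal sketch pairs workflow partitions with heterogeneous VM types so that their estimated execution times stay close to one another. It is an illustration only, not the RDAS algorithm itself. The names `Partition`, `VMType`, `balanced_assignment` and `spread`, the demand and speed figures, and the greedy largest-demand-to-fastest-VM rule are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    demand: float  # hypothetical estimate of total work, in abstract units


@dataclass
class VMType:
    name: str
    speed: float  # hypothetical capacity: work units processed per second


def balanced_assignment(partitions, vms):
    """Pair the most demanding partition with the fastest VM, and so on.

    Returns a list of (partition, vm, estimated_time) tuples. This greedy
    rule is an assumption for illustration, not the method of the paper.
    """
    if len(partitions) > len(vms):
        raise ValueError("this sketch needs at least one VM per partition")
    by_demand = sorted(partitions, key=lambda p: p.demand, reverse=True)
    by_speed = sorted(vms, key=lambda v: v.speed, reverse=True)
    return [(p, v, p.demand / v.speed) for p, v in zip(by_demand, by_speed)]


def spread(assignment):
    """Gap between the slowest and fastest partition's estimated time."""
    times = [t for _, _, t in assignment]
    return max(times) - min(times)


# Made-up example data: three partitions and three VM types.
partitions = [Partition("p1", 400.0), Partition("p2", 100.0), Partition("p3", 200.0)]
vms = [VMType("small", 1.0), VMType("large", 4.0), VMType("medium", 2.0)]

plan = balanced_assignment(partitions, vms)
for p, v, t in plan:
    print(f"{p.name} -> {v.name}: {t:.1f} s")
print("spread:", spread(plan))
```

In this toy instance the demands are proportional to the VM speeds, so every partition has the same estimated time and the spread is zero. With real workflows the match is rarely exact, and a scheduler would have to weigh the remaining imbalance against rental cost.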