Abstract
We derive estimates of mean time to failure and mean time to recover/repair for both hardware and software in a large wireless telecommunications system, based on six months of manually recorded outage data. The observed failure and recovery distributions are not consistent with simple exponential processes; they can instead be described by Weibull or two-stage hyperexponential distributions. The duration distributions for scheduled and unscheduled software outages have very different characteristics. The complex distributions observed may be compositions of simple independent processes that cannot be separated in this data set, because it lacks adequately detailed information on, or proper characterization of, outage causes. In this system we found a coverage of approximately 98% for auto-recovery from unscheduled software failures, with an auto-repair fraction of approximately 36%.