Abstract
High-quality (or photo-realistic) rendering is computationally intensive: farms of hundreds of servers take months to render movies with extensive special effects. However, rendering has considerable inherent parallelism, so rendering time may be reduced by implementing key routines in hardware. Profiling of Pixie, an open-source renderer, showed that about 95% of CPU cycles were spent calculating ray-triangle intersections. Implementing this routine on an FPGA gave speedups of about 100, provided that data could be fed to the ray-triangle pipeline fast enough. Available buses lack sufficient bandwidth, so we developed an architecture that places most of the rendering pipeline on the FPGA. A key component of this architecture is a cache for object data, which allows the system to render scenes of very high complexity (> 10⁶ basic elements) using a conventional memory hierarchy of bulk memory plus a paging disc. The object cache retains commonly used objects, reducing the load on the system bus (e.g., PCI).
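To make the profiled hot spot concrete, the following is a minimal sketch of a ray-triangle intersection test in C. It uses the well-known Möller–Trumbore algorithm as a stand-in; the abstract does not state which test Pixie or the FPGA pipeline actually implements, and the names `vec3` and `ray_triangle_intersect`, the use of `double`, and the tolerance `eps` are illustrative assumptions.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical 3-component vector type used only for this sketch. */
typedef struct { double x, y, z; } vec3;

static vec3 sub(vec3 a, vec3 b) { return (vec3){a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static vec3 cross(vec3 a, vec3 b) {
    return (vec3){a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

/* Moller-Trumbore test (assumed stand-in, not necessarily Pixie's routine).
 * Returns true and stores the hit distance in *t_out if the ray
 * orig + t*dir hits triangle (v0, v1, v2) at some t > eps. */
bool ray_triangle_intersect(vec3 orig, vec3 dir, vec3 v0, vec3 v1, vec3 v2,
                            double *t_out) {
    const double eps = 1e-9;          /* assumed tolerance */
    vec3 e1 = sub(v1, v0);            /* triangle edge vectors */
    vec3 e2 = sub(v2, v0);
    vec3 p = cross(dir, e2);
    double det = dot(e1, p);
    if (fabs(det) < eps) return false;    /* ray parallel to triangle plane */
    double inv_det = 1.0 / det;

    vec3 s = sub(orig, v0);
    double u = dot(s, p) * inv_det;       /* first barycentric coordinate */
    if (u < 0.0 || u > 1.0) return false;

    vec3 q = cross(s, e1);
    double v = dot(dir, q) * inv_det;     /* second barycentric coordinate */
    if (v < 0.0 || u + v > 1.0) return false;

    double t = dot(e2, q) * inv_det;      /* distance along the ray */
    if (t <= eps) return false;           /* hit lies behind the origin */
    *t_out = t;
    return true;
}
```

For example, a ray from (0.25, 0.25, -1) in direction (0, 0, 1) hits the triangle (0,0,0), (1,0,0), (0,1,0) at t = 1. The test is a fixed sequence of cross products, dot products and a few comparisons with no loops, which is the kind of regular arithmetic that can plausibly be laid out as a deep hardware pipeline accepting one ray-triangle pair per clock, provided the data arrives fast enough.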