Leading CGI Production House
Render Farm Grid Data Optimization
EFFICIENCY & PRODUCTIVITY IN RENDER FARMS
CGI companies deploy CPUs in clusters or grids, often numbering in the hundreds of nodes, to provide the computing power required to render the computer generated imagery (CGI) used in film and television projects. Even with sophisticated hardware, rendering 3D or 2D models into finished images is a time-consuming, compute-intensive activity that poses several concerns for administrators and IT leaders. Efficiency, or getting the best performance from the available resources, is certainly a big one among them. Other concerns are driven by industry trends: data sets are becoming larger, jobs more complex, projects more collaborative, delivery deadlines tighter and, of course, budgets smaller. Small wonder, then, that there has been growing demand to make render farms more efficient and productive.
THE COMPANY'S PROBLEM
At the animation studio, a group of animators collaborated on a movie project, each designing different frames and scenes. Over the course of days and weeks, hundreds of jobs were deployed across the servers in the grid cluster. The same files were read repeatedly, creating a large read I/O load on the network and the network storage. At crunch times, the heavy volume of NAS read operations slowed the NAS down. This affected not only the running jobs but created a ripple effect across the grid: NAS throughput fluctuated from a typical 70 MBps down to 20 MBps. And when any of the caching appliances or NAS machines went down, it brought the whole grid to a standstill, causing job slowdowns and productivity losses.
The studio had tried the usual strategy for overcoming the performance bottleneck: over-provisioning millions of dollars’ worth of NAS hardware. Besides being prohibitively expensive, this approach introduced a new problem, that of load balancing. Data management remained a challenge for the studio. Data was kept far from the jobs and had to be retrieved over the network, which increased I/O latency, raised the cost per IOPS and slowed applications down considerably. Faster local caches could be used, but the payoff was not always consistent and the cost of IOPS remained a question. Optimisation was certainly the answer. However, for administrators to optimise their data delivery system for higher speeds, they would need to know their data better: for instance, which files were read most frequently, and so could be cached for speedy access, and which jobs or applications required frequent storage access and which did not. Without this knowledge, optimising the data delivery system for efficiency and productivity was never going to be easy.
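The kind of knowledge described above, knowing which files are read most often, can be gathered by mining access logs. The following is a minimal, hypothetical sketch in Python (the log format and file paths are assumptions, not anything from the PerfAccel product):

```python
from collections import Counter

def hot_files(log_lines, top_n=5):
    """Count read accesses per file path and return the most frequent.

    Each log line is assumed (hypothetically) to look like:
        "READ /renders/scene01/frame0042.exr"
    """
    counts = Counter()
    for line in log_lines:
        op, _, path = line.partition(" ")
        if op == "READ" and path:
            counts[path.strip()] += 1
    return counts.most_common(top_n)

# Toy example: the texture file read three times is the caching candidate.
log = [
    "READ /textures/wood.tex",
    "READ /textures/wood.tex",
    "READ /scenes/shot12.ma",
    "READ /textures/wood.tex",
]
print(hot_files(log, top_n=2))
# [('/textures/wood.tex', 3), ('/scenes/shot12.ma', 1)]
```

Files at the top of such a ranking are the ones worth pinning in a local cache; files read once can safely stay on the NAS.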
THE DATAGRES PERFACCEL SOLUTION
PerfAccel gave administrators at the studio, for the first time, a view of the active data and its dynamics in their grid environment. It not only made active data visible but, through its analytics, gave administrators the power to understand those dynamics and to control them, increasing the efficiency and acceleration of their applications.
VALUE TO THE COMPANY
The studio's render farm consisted of 400 servers, 2 NAS servers and a 1G network interconnect. PerfAccel created cache devices on file system partitions on local SATA and SAS drives, sized at 24 GB and 31 GB respectively. PerfAccel automatically discovered the NFS mount points, and the installation configured them as the “Data Source” for the caching devices.
Initially, data was read from the source and stored in the cache devices. Over a period of three days, two-thirds of the data was serviced from the cache, translating into 1 TB of network data savings within just three days of observation. More importantly, the IOPS freed up on the back-end NAS server increased the number of servers each NAS could support.
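The savings figure follows directly from the hit ratio. A quick sketch of the arithmetic (the ~1.5 TB total read volume is an assumption back-derived from the reported figures, not stated in the case study):

```python
def network_savings(total_read_gb, cache_hit_ratio):
    """Data served from the local cache instead of over the network (GB)."""
    return total_read_gb * cache_hit_ratio

# Case-study figures: roughly two-thirds of reads hit the cache.
# Assumed total read volume over the three-day window: ~1.5 TB.
total_gb = 1536
saved_gb = network_savings(total_gb, 2 / 3)
print(f"{saved_gb:.0f} GB served from cache")
# 1024 GB served from cache  (~1 TB, matching the reported saving)
```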
“Once the performance bottleneck is alleviated from the back-end NAS storage, the performance gains are measured in terms of price of local disk capacity. The need to continually upgrade expensive back-end NAS storage for performance gains goes away.”
Delivering 5 TB of high-speed I/O per day on the studio's infrastructure was estimated to cost ~$200,000 per year in Tier-1 vendor storage, including maintenance and management. Acceleration from the local SATA drives in this case provided a gain of 40 servers per year, estimated at ~$200,000 every year.
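The per-server value implied by these figures can be worked out with simple division (the ~$5,000-per-server figure is derived arithmetic, not a number stated in the case study):

```python
def annual_saving_per_server(total_saving_usd, servers_gained):
    """Average annual saving attributable to each server gained."""
    return total_saving_usd / servers_gained

# Case-study figures: ~$200,000/year saving, equivalent to 40 servers.
per_server = annual_saving_per_server(200_000, 40)
print(f"~${per_server:,.0f} per server per year")
# ~$5,000 per server per year
```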
DATAGRES’ PerfAccel gave administrators full command and control of the data in the grid through a single console. A single pane combined analytics and insight through performance dashboards with a simple command line for routine operations: creating and deleting caches and sources, adjusting sizes, and so on. For server maintenance, the PerfAccel software could be disabled with a single command without affecting the rest of the operation.
Its flexible interface let users configure their own policies for persistent caching, pre-fetching, predictive caching, real-time cache size configuration and auto-caching of hundreds of NFS mount points.
PerfAccel commands were easy to use, and the studio's system administrators were able to learn them in a few minutes.
One of the big advantages of the PerfAccel solution was its hardware-agnostic cache devices, which gave system administrators the option to upgrade the grid cautiously to faster local storage as budgets allowed. PerfAccel provided flexible data management options and worked seamlessly with cache devices regardless of their type and location.