
Facebook has figured out a new way to train computer vision models that will massively accelerate the company’s work with artificial intelligence. Using the new technique, the company can train an image classification model in an hour while maintaining its accuracy.
At peak performance, the new system Facebook laid out in a paper today can train on 40,000 images per second using 256 GPUs, without sacrificing the quality of the resulting model. It’s an achievement that should improve the quality of future research by letting data scientists test their hypotheses faster.
Accelerated machine vision training is incredibly important for Facebook, which sees augmented reality and machine learning as key to its future business. Speeding up model creation means that the company’s data scientists can run through multiple model permutations per day, rather than having to take a day to run a single test, according to Pieter Noordhuis, a software engineer on Facebook’s Applied Machine Learning team.
“They can say ‘OK, let’s start my day, start one of my training runs, have a cup of coffee, figure out how it did,’” he said. “And using the performance that [they] get out of that, form a new hypothesis, run a new experiment, and do that until the day ends. And using that, [they] can probably do six sequenced experiments in a day, whereas otherwise that would set them back a week.”
Facebook’s acceleration works by expanding the mini-batch size of images processed in the training, which makes it possible to accelerate the learning process by running computations across a large number of GPUs. However, increasing the mini-batch size also requires an increase in the learning rate, which has led to a decrease in accuracy in the past.
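To see why a larger mini-batch lends itself to more GPUs, consider a minimal sketch of data-parallel training: each worker computes a gradient on its own equal-sized shard of the mini-batch, and the averaged result matches the gradient computed on the whole mini-batch at once. The toy one-parameter model, the function names and the worker count below are illustrative assumptions, not details from Facebook’s paper or from Caffe2.

```python
def gradient(w, batch):
    """Mean gradient of the squared error 0.5 * (w*x - y)**2 over (x, y) pairs."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_gradient(w, batch, num_workers):
    """Split the batch into equal shards, compute one gradient per shard, then average.

    Each shard stands in for the work one GPU would do (hypothetical setup).
    """
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by the number of workers")
    shard_size = len(batch) // num_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    per_worker = [gradient(w, shard) for shard in shards]
    return sum(per_worker) / num_workers


# Toy data: 64 examples drawn from the line y = 2x + 1.
batch = [(float(i % 7), 2.0 * float(i % 7) + 1.0) for i in range(64)]
full = gradient(0.5, batch)
split = data_parallel_gradient(0.5, batch, num_workers=8)
```

Here `full` and `split` agree up to floating-point rounding. By the same arithmetic, an evenly divided mini-batch of 8,192 images across 256 GPUs would leave each GPU with 32 images per step.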
What the team at Facebook came up with was a new warm-up phase, which slowly ramps up the learning rate over time to help maintain the accuracy found in smaller batches. Using that, they were able to maintain roughly the same error rate for a mini-batch of 8,192 images as they did with a mini-batch of 256 images.
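A warm-up schedule of this kind might look like the sketch below. It assumes the learning rate grows in proportion to the mini-batch size and that the warm-up climbs in a straight line from the small-batch rate to that larger target while the batch size stays fixed. The base rate of 0.1, the reference batch of 256, the step counts and the function names are illustrative assumptions; the article itself does not give the formula.

```python
def scaled_learning_rate(batch_size, base_lr=0.1, base_batch_size=256):
    """Scale the learning rate in proportion to the mini-batch size (assumed rule)."""
    return base_lr * batch_size / base_batch_size


def warmup_learning_rate(step, warmup_steps, batch_size,
                         base_lr=0.1, base_batch_size=256):
    """Ramp linearly from base_lr to the scaled target, then hold the target."""
    target = scaled_learning_rate(batch_size, base_lr, base_batch_size)
    if step >= warmup_steps:
        return target
    return base_lr + (target - base_lr) * step / warmup_steps


# Learning rate at a few points of a hypothetical 100-step warm-up
# for a mini-batch of 8,192 images.
schedule = [warmup_learning_rate(s, 100, 8192) for s in (0, 25, 50, 75, 100, 150)]
```

Under these assumptions the rate starts at the small-batch value, rises steadily, and stays at 32 times that value once the warm-up ends, since 8,192 is 32 times 256.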
The benefits of this research aren’t limited to Facebook. The company did all of its work on servers with designs published through the Open Compute Project using the open source Caffe2 framework. People using other servers and other frameworks can follow the techniques set out in the paper and should be able to see similar benefits.
That said, it’s not clear if this technique will produce similar results for different problems. However, the Facebook research makes it possible for other data scientists, including those working at the social networking company, to pursue additional lines of questioning.
Another key benefit of this research, according to Noordhuis, was that it proved the value of Facebook’s AI Research team, better known as FAIR. The results reported today came from a collaboration between FAIR and the company’s Applied Machine Learning team. FAIR contributed its insights on expanding batch sizes and creating a warm-up phase, while the Applied Machine Learning team applied its expertise to make the resulting system work in a data center.