Apple releases forked version of TensorFlow optimized for macOS Big Sur

Apple today released a forked version of TensorFlow, Google’s AI and machine learning framework, optimized for Intel Macs and Macs powered by Apple’s new M1 chip. By taking advantage of the ML Compute framework on macOS Big Sur, Apple says, the Mac-optimized release of TensorFlow 2.4 lets developers use accelerated CPU and GPU training on hardware like the M1’s 8-core CPU and 8-core GPU.

Training sophisticated AI models can be prohibitively expensive for developers. For instance, Google spent an estimated $6,912 training BERT, a bidirectional transformer model that redefined the state of the art for 11 natural language processing tasks. While training models like BERT likely remains beyond the reach of commodity hardware like MacBooks, the new Mac-optimized TensorFlow package promises to lower the barrier to entry, enabling enterprises to train and deploy models more easily and cheaply than before.

According to Apple, the new macOS fork of TensorFlow 2.4 starts by applying higher-level optimizations such as fusing layers of the neural network, selecting the appropriate device type, and compiling and executing the graph as primitives that are accelerated by BNNS on the CPU and Metal Performance Shaders on the GPU. TensorFlow users can get up to 7 times faster training on the 13-inch MacBook Pro with M1, Apple claims.

Above: Training impact on common models using ML Compute on M1- and Intel-powered 13-inch MacBook Pros is shown in seconds per batch, with lower numbers indicating faster training time.

Apple’s internal benchmarks show that popular models like MobileNetV3 train in as little as 1 second per batch on a 13-inch MacBook Pro with M1 and the new TensorFlow release, compared with over 2 seconds per batch on the Intel-powered 13-inch MacBook Pro running an older TensorFlow package. Moreover, the company claims that training a style transfer algorithm on an Intel-powered 2019 Mac Pro with the TensorFlow optimizations takes around 2 seconds per batch, versus 6 seconds on unoptimized TensorFlow releases.
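For readers who want to check figures like these on their own machines, a seconds-per-batch measurement can be sketched in a few lines of Python. This is a minimal illustration, not Apple’s benchmark methodology: the helper names `seconds_per_batch` and `speedup`, the warm-up handling, and the stand-in training step are all assumptions made for the example, and a real test would pass in an actual model’s training step instead.

```python
import time
from statistics import median


def seconds_per_batch(train_step, batches, warmup=1):
    """Return the median wall-clock seconds that train_step takes per batch.

    train_step is any callable that processes one batch (hypothetical
    stand-in for a real framework's training step).
    """
    timings = []
    for i, batch in enumerate(batches):
        start = time.perf_counter()
        train_step(batch)
        elapsed = time.perf_counter() - start
        if i >= warmup:  # skip warm-up batches (one-time setup, caches)
            timings.append(elapsed)
    if not timings:
        raise ValueError("need more batches than warm-up iterations")
    return median(timings)


def speedup(baseline_seconds, optimized_seconds):
    """How many times faster the optimized run is than the baseline."""
    return baseline_seconds / optimized_seconds


if __name__ == "__main__":
    # Placeholder workload; swap in a real training step to measure a model.
    def fake_step(batch):
        sum(x * x for x in batch)

    data = [list(range(10_000)) for _ in range(5)]
    print(f"{seconds_per_batch(fake_step, data):.6f} s/batch")

    # Style transfer figures quoted above: about 6 s unoptimized vs. 2 s optimized.
    print(f"{speedup(6.0, 2.0):.1f}x faster")
```

The median is used rather than the mean so that a single slow batch does not skew the result, and the first batch is discarded because frameworks often do one-time setup work on it.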

“With TensorFlow 2, best-in-class training performance on a variety of different platforms, devices and hardware enables developers, engineers, and researchers to work on their preferred platform,” Google technical program manager Pankaj Kanwar and product marketing lead Fred Alcober wrote in a blog post. “These improvements, combined with the ability of Apple developers being able to execute TensorFlow on iOS through TensorFlow Lite, continue to showcase TensorFlow’s breadth and depth in supporting high-performance ML execution on Apple hardware.”

Above: Training impact on common models using ML Compute on the Intel-powered 2019 Mac Pro is shown in seconds per batch, with lower numbers indicating faster training time.

Apple and Google say that users don’t need to make changes to existing TensorFlow scripts to use ML Compute as a backend for TensorFlow. In the near future, the company plans to begin integrating the forked version of TensorFlow 2.4 into the TensorFlow master branch.
