The PyTorch community has made remarkable strides in recent times. Last year, contributors to PyTorch introduced BetterTransformer inference optimizations for transformer models such as GPT, which have significantly improved the performance of these models. This collection of highly optimized code is designed specifically to accelerate transformer models in production workloads, allowing for faster and more efficient inference.
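As an illustration, enabling these fastpath kernels for a pretrained model is a small change. Here is a minimal sketch, assuming the Hugging Face transformers and optimum packages are installed (the gpt2 checkpoint is illustrative):

```python
# A minimal sketch: swapping eligible transformer layers for BetterTransformer
# fastpath kernels via Hugging Face Optimum. Model choice is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.bettertransformer import BetterTransformer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Replace supported layers with fused, optimized implementations.
model = BetterTransformer.transform(model)

inputs = tokenizer("PyTorch is", return_tensors="pt")
with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```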
The transformative potential of generative AI, which creates novel data from existing sources, has been widely acknowledged, and recent breakthroughs have sparked a growing interest in the underlying mechanisms driving these advancements.
To gain further insight for this piece, I sought out leading experts and AI research scientists, who shed light on how PyTorch is paving the way for a torrent of advancements in AI.
PyTorch Enables Hardware Acceleration
PyTorch is already fast by default, but its performance has been further enhanced with the introduction of compiler technology. This technology enables faster training and serving of models by fusing operations, auto-tuning, and optimizing programs to run as quickly as possible on the hardware available, resulting in significant performance gains compared to previous versions of the software.
Dynamo and Inductor, the core of the PyTorch 2.0 stack, respectively capture a program and optimize it to run as fast as possible on the hardware at hand. “This is achieved through fusing operations, so that compute can be saturated without being bottlenecked by memory access, and through auto-tuning, so that dedicated kernels can be optimized as they run to achieve maximum performance. Gains can be as high as 40%, both for training and inference, so that’s a very big deal,” commented Luca Antiga, CTO of Lightning AI and contributor to PyTorch.
“Previously, PyTorch had the technology to optimize programs, but it required users to tweak their code for it to work and disallowed certain operations, such as calling into other Python libraries. PyTorch 2.0, on the other hand, will work in all these cases, reporting what it could and couldn’t optimize along the way,” Antiga mentioned.
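In practice, opting into the 2.0 compiler stack is a one-line change. A minimal sketch (the model and shapes are illustrative):

```python
# A minimal sketch of the PyTorch 2.0 compiler stack: torch.compile uses
# Dynamo to capture the Python program and Inductor to generate fused,
# auto-tuned kernels for the target hardware.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 10))

# One-line opt-in; on graph breaks (e.g., calls into arbitrary Python
# libraries) it falls back to eager execution instead of failing.
compiled_model = torch.compile(model)

x = torch.randn(64, 512)
y = compiled_model(x)  # first call compiles; later calls reuse the kernels
```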
PyTorch now supports a multitude of backends and compute devices, making it one of the most versatile deep learning frameworks available. This also makes it easier than ever to deploy models built with PyTorch into production, including on AMD GPUs via ROCm.
“It is excellent for model development,” says Pieter Luitjens, CTO of Private AI, “but it is best to use a different framework for running in production.” He pointed out that this approach is recommended by the PyTorch developers themselves, and as a result, PyTorch offers great support for packages like FasterTransformer, an inference engine created by Nvidia that is used by most of the big tech companies to run models such as GPT.
Researchers Consider PyTorch for Generative AI
PyTorch has shown its flexibility since bursting onto the scene and dethroning TensorFlow circa 2018. Back then, it was all about convolutional neural networks; now PyTorch is being used for completely different types of models, such as Stable Diffusion, which didn’t exist back then.
“In my opinion,” Luitjens shares, “PyTorch has become the tool of choice for generative AI due to its focus on dynamic execution, its ease of use for researchers to prototype with, and its ability to easily scale to thousands of GPUs. There’s no better example than the recent open-source language models GPT-Neo and BLOOM – they would never have been possible without PyTorch. The team behind GPT-Neo specifically cited their move to PyTorch as a key enabler.”
There’s also a growing preference for PyTorch among researchers. However, it is also apparent that TensorFlow, unlike PyTorch, is tailored for industrial use, boasting a vast array of customizable features and supporting use cases such as JVM compatibility and online serving. “This makes it easier for companies to use TensorFlow in production and scale TensorFlow use cases up to billions of users. However, this power makes TensorFlow more rigid, more difficult to learn, and harder to adapt to completely new applications,” says Dan Shiebler, Head of Machine Learning at Abnormal Security.
According to Shiebler, TensorFlow’s reliance on static graphs makes variable length sequences (a core component of generative AI!) awkward to manage. PyTorch is, therefore, more widely used by the research community. “This creates a flywheel effect. New models are released in PyTorch first, which causes researchers to start with PyTorch when expanding prior research,” he pointed out.
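A minimal sketch of that contrast: in eager PyTorch, each forward pass can consume a differently sized input without padding to a fixed shape or rebuilding a graph (the dimensions are illustrative):

```python
# A minimal sketch of why eager execution suits variable-length sequences:
# every forward pass may see a different sequence length, with no static
# graph to re-trace. Vocabulary and hidden sizes are illustrative.
import torch
import torch.nn as nn

embed = nn.Embedding(num_embeddings=1000, embedding_dim=32)
rnn = nn.GRU(input_size=32, hidden_size=64, batch_first=True)

# Token sequences of different lengths, as a tokenizer might produce.
sequences = [torch.randint(0, 1000, (n,)) for n in (5, 12, 7)]

for seq in sequences:
    x = embed(seq).unsqueeze(0)  # shape (1, seq_len, 32) varies per sequence
    _, h = rnn(x)                # no padding or graph rebuilding required
    print(seq.shape[0], h.shape)
```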
Aggressively Developed for Ease of Use
Writing PyTorch feels a lot more like writing plain Python than other frameworks. Control flow, loops, and other operations are fully supported, making the code both readable and expressive. Moreover, the debugging experience with PyTorch is top-notch; pdb, the Python debugger, works seamlessly, allowing you to step through a program and have operations eagerly executed as you go. “This experience is much less painful than with other frameworks, enabling you to quickly iterate towards a working model,” Antiga remarked.
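A minimal sketch of that workflow: ordinary Python control flow inside a model’s forward pass, executed eagerly, so a breakpoint() call drops you straight into pdb mid-model (the model itself is illustrative):

```python
# A minimal sketch: plain Python loops and branches inside forward(),
# executed eagerly, so standard debugging tools work unmodified.
import torch
import torch.nn as nn

class LoopyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(16, 16)

    def forward(self, x, n_steps: int):
        for _ in range(n_steps):       # a plain Python loop
            x = torch.relu(self.layer(x))
            if x.abs().mean() < 1e-3:  # data-dependent branching
                break
        # breakpoint()  # uncomment to inspect live tensors in pdb
        return x

out = LoopyNet()(torch.randn(4, 16), n_steps=3)
```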
PyTorch really shines when coupled with projects like PyTorch Lightning or Lightning Fabric, which complement it by abstracting engineering details and allowing AI engineers to scale their models to billions of parameters and clusters of machines without changing their code. “I don’t think there are particular disadvantages to PyTorch. Maybe higher order derivatives and program transforms like vmap, which are provided in functorch but not at the level they are in other projects like JAX, can be relevant limitations for certain domains, although not so much for deep learning today,” Antiga added.
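As a rough sketch of the Fabric pattern mentioned above, assuming the lightning package is installed (the model, data, and device settings are illustrative):

```python
# A minimal sketch of Lightning Fabric scaling a plain training loop
# across devices with only a few added lines around the model code.
import torch
import torch.nn as nn
from lightning.fabric import Fabric

fabric = Fabric(accelerator="auto", devices="auto")  # e.g., strategy="fsdp" at scale
fabric.launch()

model = nn.Linear(32, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = fabric.setup(model, optimizer)  # moves to device, wraps for DDP/FSDP

x, y = torch.randn(8, 32), torch.randn(8, 1)
x, y = fabric.to_device((x, y))

loss = nn.functional.mse_loss(model(x), y)
fabric.backward(loss)  # replaces loss.backward() so Fabric controls scaling
optimizer.step()
```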
Through his experience contributing to PyTorch, Antiga also noted that most of the research conducted today, both in AI and in applications of AI, is implemented in PyTorch, with the implementations often shared as open source. The ability to build on each other’s ideas is an incredibly powerful dynamic, creating an exponential phenomenon.
References
- Luca Antiga is the CTO of Lightning AI and a core contributor to PyTorch. He is the founder of multiple AI companies, including Tensorwerk, which was acquired by Lightning in 2022. Luca co-hosts The AI Buzz podcast, where he discusses the latest trends in AI.
- Pieter Luitjens is the Co-Founder and CTO of Private AI, a Microsoft-backed company that uses machine learning to identify, remove, and replace personally identifiable information from text, audio, and video.
- Dan Shiebler is the Head of Machine Learning at Abnormal Security, where he leads a team of detection engineers to build AI systems that fight cybercrime. Combining foundational data engineering and advanced ML, their technology protects many of the world’s largest companies from cyberattacks.