
NetApp made some announcements last week at NetApp INSIGHT® that will supercharge your organization’s AI investments. We announced a new architecture for AI workloads in the NetApp® AFX 1K; an end-to-end AI data service for GenAI, RAG, and agentic AI in the NetApp AI Data Engine; and much more.
In this blog post, I’ll dive deep into validation of this powerful, AI-ready architecture: NetApp AFX. AFX is certified to bring disaggregated ONTAP® to NVIDIA DGX SuperPOD™ systems using DGX™ GB300, making it an established AI workhorse. Additionally, we completed extensive testing with NVIDIA’s Magnum IO GPUDirect Storage (GDS) capabilities. GDS testing measures how quickly data can move between storage systems and NVIDIA accelerated computing using Magnum IO GDS, which allows GPUs to access data from storage without using the CPU or system memory. Instead of data traveling through multiple layers (CPU, RAM, etc.), it moves directly from storage to GPU memory, reducing latency and increasing throughput.
For enterprises running AI workloads, GDS testing helps answer these critical questions:
Can GPU demand be met by the storage system?
Will performance scale predictably as workloads grow?
Is it possible to achieve this level of performance without interfering with operations or requiring specialized infrastructure?
The 2025 GDS testing validated how well NetApp AFX supports high-performance AI workloads without compromising enterprise-grade reliability, security, and scalability.
NetApp AFX testing results
For NVIDIA Magnum IO GPUDirect workloads, NetApp achieved 457GiB/s of sustained throughput across 8 AFX nodes in our most recent testing—a 33% increase from last year’s results. We no longer need to overprovision capacity to meet performance requirements because of AFX’s disaggregated architecture, which decouples capacity from performance. Compared to previous results, only one eighth of the storage capacity was used to achieve the 33% increase in performance. We demonstrated the linear scaling of performance with AFX nodes without diminishing returns. Bottom line: NetApp AFX brings the best of enterprise-ready ONTAP, enabling multi-tenancy, nondisruptive scaling, and IT-friendly NFS over RDMA, making it the ideal platform for today’s most demanding AI workloads.
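To put the headline number in perspective, here is a small back-of-the-envelope sketch. It only restates the 457GiB/s and 8-node figures quoted above, in per-node and decimal units. It assumes the load is spread evenly across the nodes, which is a simplification for illustration rather than something the test report states.

```python
# Unit conversion for the quoted result; no new measurements here.
GIB = 2**30  # bytes in one gibibyte (binary unit used in the results)
GB = 10**9   # bytes in one gigabyte (decimal unit used on many spec sheets)

aggregate_gib_s = 457  # sustained throughput quoted in this post
nodes = 8              # AFX nodes in the tested cluster

# Assumption: the aggregate is shared evenly by all nodes.
per_node_gib_s = aggregate_gib_s / nodes
aggregate_gb_s = aggregate_gib_s * GIB / GB

print(f"per node:  {per_node_gib_s:.3f} GiB/s")
print(f"aggregate: {aggregate_gb_s:.1f} GB/s")
```

Under that even-split assumption, each node carries a little over 57GiB/s, and the aggregate is roughly 490GB/s when expressed in decimal units.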
AFX: Built for the AI-powered enterprise
AI for businesses is no longer a science experiment. Now that AI workloads are at the heart of business operations, outages, disruptions, and fragile infrastructure have real financial consequences. Enterprise infrastructure teams cannot afford to hire specialized staff to care for bespoke architectures, nor can they live with brittle, siloed architectures that necessitate forklift upgrades whenever performance requirements increase. That’s why this year’s results matter: NetApp tested the new AFX disaggregated storage systems, which are built for the AI-powered enterprise. These systems provide independently scalable performance and capacity with linear gains through disaggregation, allowing you to predictably scale throughput. Customers get high-performance computing capabilities without having to sacrifice security or uptime. In addition, because it is ONTAP, AFX integrates seamlessly with the customer’s enterprise data estate and provides granular, policy-based security so that AI can access only the specified data.
The AI advantages of AFX in the enterprise
AFX empowers enterprises to accelerate AI adoption with simplicity, scalability, and seamless integration—without disrupting existing workflows or infrastructure.
Simple. With pNFS and session trunking, AFX provides full-cluster performance from a single mount point, making use of the same IT-friendly NFS over RDMA stack that you are already familiar with. No exotic file systems, no specialized training. Grow your AI capabilities without growing your headcount.
Scalable. Add compute nodes or storage enclosures independently to meet performance and capacity demands. Scale linearly without changing your environment’s architecture or overprovisioning.
Silo free. AFX uses the same ONTAP data management operating system that is trusted with more than 100 exabytes of enterprise data. It offers hybrid cloud integration, multi-tenant isolation, and the ability to quickly onboard AI for each business function on a single platform.
Continuous innovation
2023. With 171GiB/s on an AFF A800 cluster, we demonstrated that ONTAP could achieve HPC-class performance with NVIDIA GPUDirect Storage.
2024. We doubled that on AFF A90, reaching 351GiB/s on a 4-system cluster, while simplifying RDMA/NFS operations and enabling nondisruptive upgrades.
2025. With NetApp AFX disaggregated storage, we achieved 457 GiB/s, a 33% improvement in performance for just one eighth of the capacity used.
How we tested
We pushed our new disaggregated AFX system to its limits to deliver this result. GPUs are data hungry, and AFX is built to feed them, so it takes a lot of GPUs to saturate the system. The NetApp Performance Team built a disaggregated AFX storage cluster using 8 AFX nodes and a single disk shelf. Compared with our A90 publication, we used 87% fewer NVMe SSDs to achieve even higher performance.
The AFX controllers used 3 high-performance I/O expansion cards running at 200GbE, with support for up to 400GbE (NetApp P/N X50131A, NVIDIA ConnectX™-7). The team stayed with a pair of NVIDIA Spectrum-3 SN4600 switches for an apples-to-apples comparison with our previous GDSIO results.
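As a rough sanity check on the network side, the sketch below estimates how many links each node would need at minimum to carry its share of the 457GiB/s reported in this post. The helper name `min_links_per_node` is hypothetical. The calculation assumes an even split across the 8 nodes and compares against raw line rate only, ignoring Ethernet, RDMA, and NFS protocol overhead. It says nothing about how many ports each card actually exposes.

```python
import math

GIB = 2**30  # bytes in one gibibyte


def min_links_per_node(aggregate_gib_s, nodes, link_gbit_s):
    """Smallest link count whose raw line rate covers one node's share.

    Hypothetical helper: assumes an even split across nodes and ignores
    all protocol overhead, so real deployments need more headroom.
    """
    per_node_gbit_s = aggregate_gib_s * GIB * 8 / nodes / 1e9
    return math.ceil(per_node_gbit_s / link_gbit_s)


links_at_200 = min_links_per_node(457, 8, 200)  # links as run in this test
links_at_400 = min_links_per_node(457, 8, 400)  # links at the cards' maximum
print(links_at_200, links_at_400)
```

Each node's share works out to roughly 490Gb/s, so the estimate is a floor rather than a sizing recommendation.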
We continued using a mix of NVIDIA DGX A100 systems and Lenovo SR675v3 servers with NVIDIA L40S GPUs to drive I/O. Clients operated at 200GbE for consistency across the network. Clients leveraged CUDA 12.6, GDS 1.11.0.15, and cuFile tuned for NUMA locality.
The AFX cluster presented a single FlexGroup volume spanning all 8 AFX nodes, harnessing the full performance of the entire cluster in a single, simple-to-manage namespace. The FlexGroup volume was mounted over NFSv4.1 with pNFS enabled, 1M rsize/wsize, session trunking with 16 sessions, and, of course, Magnum IO GPUDirect Storage enabled for GDSIO to leverage. To enable more than 2 subnets/network links to work as anticipated, we used two ARP configuration values to eliminate cross-subnet responses:

 
         
                                 
         
         
         
        