by NVIDIA (dominant ecosystem vendor; GPUs also produced by AMD, Intel, others) • Santa Clara, California, USA (NVIDIA) • Founded 1993
Graphics processing units (GPUs) are massively parallel processors originally designed for rendering graphics that are now widely used to accelerate AI and machine learning inference workloads. For inference, GPUs execute large numbers of matrix and tensor operations concurrently, dramatically reducing latency and increasing throughput versus general‑purpose CPUs. They matter because they underpin most production-scale deep learning services, from recommendation systems to generative AI, enabling cost-effective, high-performance deployment of trained models.
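To make the "large numbers of matrix operations" point concrete, here is a minimal inference sketch in PyTorch, assuming a CUDA-capable GPU; the model architecture, batch size, and tensor shapes are hypothetical stand-ins for a real trained model.

```python
import torch

# Pick the GPU if one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for a trained model: a small feed-forward network (hypothetical).
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 10),
).to(device)
model.eval()  # inference mode: disables dropout, uses running batch-norm stats

# A batch of 256 inputs; batching keeps the GPU's parallel units busy.
x = torch.randn(256, 1024, device=device)

with torch.no_grad():  # skip autograd bookkeeping for pure inference
    logits = model(x)  # the Linear layers execute as large matrix multiplies

print(logits.shape)  # torch.Size([256, 10])
```

The batch dimension is what the GPU exploits: each Linear layer becomes one large matrix multiply over all 256 inputs at once, which is exactly the workload GPUs parallelize well and CPUs do not.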
Alternatives:
• Google Cloud TPUs: custom ASICs optimized for TensorFlow and large-scale Google Cloud workloads, offering high performance per watt for specific deep learning operations.
• AWS Trainium and Inferentia: Amazon-designed ASICs for cost-optimized training and inference on AWS, tightly integrated with AWS ML services.
• AMD Instinct GPUs: accelerators built on the ROCm software stack, positioned as an open alternative to NVIDIA for AI training and inference; ROCm builds of PyTorch reuse the CUDA-style device API, so much inference code ports with little change (see the first sketch after this list).
• Intel Gaudi accelerators and GPUs: Intel's AI hardware targeting data center training and inference with the oneAPI and Habana (SynapseAI) software stacks.
• Edge AI accelerators (NPUs): specialized low-power chips for on-device inference in embedded and IoT scenarios, typically driven by a lightweight runtime such as TensorFlow Lite (see the second sketch after this list).
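On the portability point above: because ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda namespace used for NVIDIA hardware, device selection can stay vendor-agnostic. A minimal sketch; the fallback order and the build check are illustrative, not a complete production pattern.

```python
import torch

def pick_device() -> torch.device:
    # On both CUDA (NVIDIA) and ROCm (AMD) builds of PyTorch,
    # GPUs are exposed through the torch.cuda namespace.
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
print(f"Running inference on: {device}")

# torch.version.hip is set on ROCm builds; it is None on CUDA/CPU builds.
print("ROCm build" if getattr(torch.version, "hip", None) else "CUDA/CPU build")
```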
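And for the on-device case, a minimal sketch of embedded inference with the TensorFlow Lite interpreter; the model file path is hypothetical, and a real edge NPU deployment would additionally attach a hardware delegate for its chip.

```python
import numpy as np
import tensorflow as tf

# Load a converted TensorFlow Lite model (hypothetical file path).
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Build one input tensor shaped and typed to whatever the model expects.
x = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], x)

interpreter.invoke()  # run inference on-device

y = interpreter.get_tensor(output_details[0]["index"])
print(y.shape)
```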