Hardware · Inference

Graphics processing units (GPUs)

Graphics processing units (GPUs) are massively parallel processors, originally designed for rendering graphics, that are now widely used to accelerate AI and machine-learning inference workloads. For inference, a GPU executes large numbers of matrix and tensor operations concurrently, dramatically reducing latency and increasing throughput relative to general-purpose CPUs. They matter because they underpin most production-scale deep learning services, from recommendation systems to generative AI, enabling cost-effective, high-performance deployment of trained models.
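
As a minimal illustration, and assuming a PyTorch environment (the two-layer model and tensor shapes below are placeholders, not any particular production model), the core pattern is to move both the model and its inputs onto the GPU and run a forward pass with gradients disabled:

    import torch

    # Use the GPU when present; fall back to CPU so the sketch stays runnable.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Toy two-layer network standing in for a trained model.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 10),
    ).to(device).eval()

    batch = torch.randn(64, 1024, device=device)  # a batch of 64 inputs

    with torch.no_grad():        # inference only: no gradient bookkeeping
        logits = model(batch)    # one call launches thousands of GPU threads
    print(logits.shape)          # torch.Size([64, 10])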

by NVIDIA (dominant ecosystem vendor; GPUs also produced by AMD, Intel, and others)

Key Features

  • Massively parallel architecture with thousands of cores optimized for matrix/tensor math
  • Specialized units (e.g., Tensor Cores) for mixed-precision deep learning inference (FP16, INT8, INT4); see the sketch after this list
  • High memory bandwidth (HBM/GDDR) for fast access to model parameters and activations
  • Mature software stack (CUDA, cuDNN, TensorRT, ROCm, oneAPI) and framework integrations (PyTorch, TensorFlow, JAX)
  • Scalability from single-GPU servers to multi-GPU nodes and large clusters with NVLink/NVSwitch/PCIe
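
To illustrate the mixed-precision bullet above, the sketch below uses PyTorch's torch.autocast (a real PyTorch API; the layer size and batch are arbitrary placeholders) to run eligible operations in FP16, which recent NVIDIA GPUs dispatch to Tensor Cores:

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(2048, 2048).to(device).eval()
    x = torch.randn(32, 2048, device=device)

    # Autocast runs matmuls in FP16 on CUDA; it is disabled on CPU so the
    # fallback path still works unchanged.
    with torch.no_grad(), torch.autocast(device_type=device.type,
                                         dtype=torch.float16,
                                         enabled=(device.type == "cuda")):
        y = model(x)

    print(y.dtype)  # torch.float16 on GPU, torch.float32 on the CPU fallback

Lower precisions such as INT8 and INT4 typically require an explicit quantization step (for example via TensorRT or PyTorch's quantization tooling) rather than a context manager.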

Pricing

Paid

GPUs are sold as hardware devices (cards, servers, appliances) with pricing varying by model and vendor; cloud providers offer GPU instances billed per second/hour. Additional software (e.g., enterprise support, management suites) may be licensed separately. Pricing is not standardized and depends on configuration, volume, and contracts.

Alternatives

  • TPUs (Tensor Processing Units)
  • AWS Inferentia / Trainium
  • AMD Instinct / Radeon GPUs
  • Intel Gaudi / GPU Max Series
  • Edge AI ASICs (e.g., Google Edge TPU, Hailo, Mythic)

Quick Stats

  • Total use cases: 1
  • Industries: 1
  • Related technologies: 2
