The landscape of AI deployment at the edge has seen significant advancements, particularly with the rise of powerful platforms such as NVIDIA Jetson for edge AI and robotics. These platforms, known for their versatility and computational power, are widely adopted across industries for use cases ranging from autonomous machines to smart cities. To measure and showcase the capabilities of such platforms, benchmarks like MLPerf® Inference are critical.

Connect Tech is a global leader in embedded computing, with a focus on developing the solutions needed for high-compute capabilities at the AI edge. In this article, we delve into our latest submission for the MLPerf Inference v4.1 benchmark, highlighting the exceptional performance of the Anvil Embedded System paired with the NVIDIA Jetson AGX Orin 64GB system-on-module (SOM). Connect Tech is an Elite member of the NVIDIA Partner Network.

Understanding MLPerf Inference: Edge Benchmarks

MLPerf Inference is a set of standardized benchmarks developed by MLCommons® to measure the performance of machine learning models in real-world inference tasks. These benchmarks are designed to evaluate how well a system can handle AI workloads, particularly in terms of latency and throughput, across various scenarios such as single-stream inference, multi-stream inference, and offline processing.

For edge devices like the NVIDIA Jetson AGX Orin SOM, the MLPerf Inference: Edge benchmarks are crucial, as they simulate the types of workloads these devices would encounter in real-life applications. These applications range from image classification and object detection to more complex tasks like natural language processing (NLP), recommendation systems, and large language models (LLMs).

The Anvil Embedded System, when paired with the NVIDIA Jetson AGX Orin 64GB SOM, delivered exceptional performance across both the Single Stream Latency (ms) and Offline (Samples/s) categories.

The Single Stream Latency (ms) metric measures the time taken to process a single input, reflecting the device’s responsiveness. In contrast, the Offline (Samples/s) metric measures throughput: how many samples the device can process per second when there is no constraint on latency.
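To make the difference between the two metrics concrete, here is a minimal sketch of how each could be computed from raw measurements. It is not MLPerf LoadGen code, which issues the queries and applies its own statistical rules. Reporting a 90th-percentile (nearest-rank) latency rather than a mean is an assumption chosen to reflect that MLPerf reports a tail statistic, and the function names and sample numbers are hypothetical.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]

def single_stream_latency_ms(latencies_ms, pct=90):
    """Tail latency of queries issued one at a time (assumed 90th percentile)."""
    return percentile(latencies_ms, pct)

def offline_throughput(num_samples, wall_clock_s):
    """Samples completed per second when every sample is available up front."""
    return num_samples / wall_clock_s

if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    latencies = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
    print(single_stream_latency_ms(latencies))  # 180
    print(offline_throughput(500, 10.0))        # 50.0
```

Single Stream rewards a fast response to each individual query, while Offline rewards keeping the hardware fully busy, so a system can score very differently on the two.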

This benchmark suite helps developers evaluate how well edge devices handle tasks like image classification, object detection, NLP, and generative AI, so they can push the boundaries of what AI at the edge can achieve. The results from these benchmarks show that the Anvil Embedded System with the NVIDIA Jetson AGX Orin 64GB SOM is capable of running transformer models, including LLMs and vision transformers, locally.
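One reason a 64GB module can host an LLM locally is simple memory arithmetic. The sketch below estimates how much memory the weights of a roughly 6-billion-parameter model, such as the GPT-J 6B model used in our submission, would occupy at a few common precisions. The nominal 6e9 parameter count and the precision options are assumptions for illustration (the precision used in our submission is not stated here), and the estimate deliberately ignores the KV cache, activations, the inference runtime, and the operating system.

```python
# Rough weight-memory estimate for a ~6B-parameter transformer.
# Assumptions: nominal 6e9 parameters, weights only (no KV cache,
# activations, runtime, or OS overhead), decimal gigabytes.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
MODULE_MEMORY_GB = 64  # Jetson AGX Orin 64GB module

def weight_memory_gb(num_params, precision):
    """Gigabytes needed just to hold the weights at the given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(6e9, precision)
        share = gb / MODULE_MEMORY_GB * 100
        print(f"{precision}: {gb:.1f} GB of weights ({share:.0f}% of {MODULE_MEMORY_GB} GB)")
```

Even at FP16 the weights take about 12 GB, under a fifth of the module's memory, which leaves headroom for the runtime buffers this sketch leaves out.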

Our Journey with MLPerf Inference: From Hadron Carrier and Module to Anvil with Full Enclosure

Hadron carrier board and Anvil system side by side, representing the years we participated in MLPerf: 2023 and 2024.

Connect Tech is committed to innovation, and we have been actively involved in the MLPerf community. 2023 marked a significant milestone for us with our first MLPerf submission, in which we tested our Hadron carrier board paired with the NVIDIA Jetson Orin NX 16GB SOM. This submission underscored our dedication to expanding the capabilities of AI at the edge, and it provided valuable insights that shaped our development strategy.

Building on that experience, our latest submission features the Anvil Embedded System coupled with the NVIDIA Jetson AGX Orin 64GB SOM, a combination engineered to push performance even further. This submission represents our commitment to continuous improvement and our belief in the transformative power of AI at the edge.

Importance of Full-Stack Optimization

One of the key factors behind our successful MLPerf results is our focus on full-stack optimization. This approach involves fine-tuning both the hardware and software layers to work seamlessly together. The NVIDIA Jetson platform, with its integrated NVIDIA CUDA-X AI software stack, provides a solid foundation for these optimizations, helping us get the most out of the Jetson AGX Orin SOM.

Initial Results: GPT-J 6B on the Anvil Embedded System

Connect Tech MLPerf results: 4,145.57 ms Single Stream latency and 64.01 samples per second Offline throughput.

As part of the MLPerf Inference v4.1 benchmark, our Anvil system, paired with the NVIDIA Jetson AGX Orin 64GB SOM, was benchmarked using the GPT-J 6B model, an LLM designed for text generation and summarization tasks. Initial results demonstrated promising performance in both the Single Stream Latency (ms) and Offline (Samples/s) categories. The Anvil platform’s ability to process complex LLM workloads at the edge reflects both its computational power and its optimized software stack.

Single Stream Latency (ms): 4,145.57

Offline Throughput (Samples/s): 64.01
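A quick back-of-the-envelope comparison helps put these two numbers side by side. The sketch below, using only the results reported above, computes how many samples per second the system would complete if each one had to wait for the previous one to finish at the Single Stream latency, and compares that with the Offline figure. Treating the Single Stream latency as a per-sample time is a simplification, since MLPerf reports a tail statistic; the gap is expected because the Offline scenario lets the system work on many samples concurrently (for example, by batching).

```python
# Reported Connect Tech MLPerf Inference v4.1 results for GPT-J 6B.
SINGLE_STREAM_LATENCY_MS = 4145.57
OFFLINE_SAMPLES_PER_S = 64.01

def sequential_rate(latency_ms):
    """Samples/s if each sample started only after the previous one finished."""
    return 1000.0 / latency_ms

seq = sequential_rate(SINGLE_STREAM_LATENCY_MS)
speedup = OFFLINE_SAMPLES_PER_S / seq
print(f"one-at-a-time rate: {seq:.3f} samples/s")  # about 0.241
print(f"offline throughput is ~{speedup:.0f}x higher")  # about 265x
```

In other words, the Offline result shows roughly 265 times the work completed per second compared with strictly sequential processing, which is why throughput-oriented deployments and latency-sensitive interactive use cases should each look at the metric that matches their workload.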

These results indicate the Anvil platform’s capacity to meet the growing demand for running generative AI at the edge. Rather than developing a model for a specific use case, a user can now use the general-purpose GPT-J 6B model to interface with human language directly at the edge.

MLPerf Inference benchmarks provide clear metrics for evaluating the performance of AI models in real-world edge applications. Our latest submission to MLPerf Inference v4.1 with the Anvil Embedded System is a strong demonstration of our commitment to optimizing AI platforms for edge environments. As a member of MLCommons, we are dedicated to advancing AI at the edge, providing our customers with innovative solutions that improve application performance.

To read more about how our results and generative AI applications can improve your operational workflows, see our extended article on large language models and their role in edge AI.