Arrcus triples bookings & unveils AI inference fabric
Arrcus has reported that bookings tripled in 2025 and has unveiled a new networking product aimed at routing AI inference traffic across distributed infrastructure.
The bookings increase came from data centre, telecoms and enterprise customers. Arrcus said its switching and routing software is in production across thousands of network nodes worldwide.
Arrcus sells the ArcOS network operating system and the ACE platform, positioning them as alternatives to integrated networking stacks from established vendors. The company also emphasises support for a range of open networking hardware.
Inference focus
Alongside the bookings update, Arrcus introduced Arrcus Inference Network Fabric (AINF), which it describes as an AI policy-aware network fabric for inference workloads running across multiple sites.
AINF routes inference traffic between inference nodes, caches and data centres. Arrcus said it targets higher throughput, faster time to first token and lower end-to-end latency for inference requests.
Arrcus linked the product to growth in agentic and physical AI, where systems generate outputs and take actions based on model responses. It said adoption is constrained by response times, the range of models in use and the need to move inference decision-making closer to edge locations.
Inference deployments often span distributed clusters, and operators face limits tied to latency, availability, power grid capacity, data sovereignty rules and cost, the company said.
Arrcus said AINF lets operators apply policy controls that influence how inference traffic moves across the network. It listed latency targets, data-sovereignty boundaries, model preferences and power constraints among the inputs.
AINF includes query-based inference routing with policy management, interconnect routers and edge networking, according to Arrcus. The company also said it built a policy abstraction layer that maps application intent to infrastructure performance while masking underlying complexity from operators.
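Arrcus has not published the interface for this policy layer, but the idea of mapping application intent (latency targets, sovereignty boundaries, model preferences, power constraints) to endpoint selection can be sketched in general terms. The names and fields below are illustrative assumptions, not Arrcus APIs:

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    # Hypothetical view of one inference site as the fabric might see it.
    name: str
    region: str          # jurisdiction, for data-sovereignty checks
    latency_ms: float    # measured latency to this node
    power_ok: bool       # whether the site is within its power budget
    models: tuple        # models served at this endpoint

@dataclass
class Intent:
    # Hypothetical application intent, expressed as policy inputs.
    max_latency_ms: float
    allowed_regions: set
    preferred_model: str

def select_endpoint(intent, endpoints):
    """Filter endpoints by policy constraints, then pick the lowest-latency match."""
    candidates = [
        e for e in endpoints
        if e.region in intent.allowed_regions       # sovereignty boundary
        and e.latency_ms <= intent.max_latency_ms   # latency target
        and e.power_ok                              # power constraint
        and intent.preferred_model in e.models      # model preference
    ]
    return min(candidates, key=lambda e: e.latency_ms, default=None)
```

In this simplified model, a faster endpoint in a disallowed region is filtered out before latency is even compared, which is the sense in which policy "influences how inference traffic moves".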
Integration claims
Arrcus said AINF is designed to integrate with inference frameworks, including vLLM, Nvidia Dynamo, SGLang and Triton, and can be composed and deployed using Kubernetes-based orchestration.
Arrcus also referenced techniques such as prefix awareness to optimise key-value cache usage. It said the approach can support service-level objectives tied to throughput, token retrieval time, latency, sovereignty, power and cost.
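Prefix awareness generally means steering requests that share a prompt prefix to the same node, so that node's key-value cache for the shared tokens can be reused rather than recomputed. A minimal sketch of that routing idea, with hypothetical function names and a deliberately naive modulo placement:

```python
import hashlib

def prefix_key(prompt, prefix_tokens=32):
    """Key a request by its leading tokens so requests sharing a prefix co-locate."""
    prefix = " ".join(prompt.split()[:prefix_tokens])
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()

def route(prompt, nodes):
    """Deterministically map a prefix key to one inference node.

    A real fabric would use consistent hashing and fold in the policy
    inputs described above; this only shows the cache-affinity idea.
    """
    key = prefix_key(prompt)
    return nodes[int(key, 16) % len(nodes)]
```

Because the key depends only on the prefix, two requests with the same system prompt but different user turns land on the same node, where the prefix's KV cache is likely already warm.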
Arrcus cited research indicating that AI-aware routing could cut time to first token by more than 60%, improve tokens per second by 15% and reduce end-to-end latency by 40%. It also cited an estimate of up to 30% cost reduction for inference workloads, referencing sources including Anyscale Ray Serve, Red Hat material on vLLM semantic routing and an AWS machine learning blog.
"To enhance agentic AI adoption by improving response times, networks need to become AI-aware," said Shekar Ayyar, Chairman and CEO of Arrcus.
"AINF extends Arrcus' leadership in distributed networking by delivering the first fabric designed to meet the latency, sovereignty, and power constraints of large-scale AI inferencing," Ayyar added.
Market context
Industry analysts used the announcement to place network fabrics in a wider revenue context as AI deployments shift from training to inference. Alan Weckel, Founder and Technology Analyst at 650 Group, said: "AI fabrics (scale-up, scale-out and scale-across) are poised to approach $200B in revenue by 2030, with Ethernet being the major contributor."
Weckel added: "Network fabrics can significantly improve AI fabric performance and help customers scale the network with the rapid growth in accelerators as the market moves from foundational model training to inference being the dominant use case."
Roy Chua, Founder and Principal at AvidThink, said: "Traditional network fabrics weren't designed with AI inference workloads in mind. Arrcus' Inference Network Fabric changes that with a policy-aware, intent-driven approach that understands inference-specific demands (latency sensitivity, model selection, cache optimisation) and dynamically routes traffic accordingly."
"As inferencing scales across distributed environments, this kind of workload-aware networking will be essential to maximising AI-enabled application performance," Chua added.
Scott Raynovich, Founder and Principal Analyst at Futuriom, said: "With its efficient distributed cloud networking platform and newly announced Arrcus Inferencing Network Fabric (AINF), Arrcus is well-positioned to serve diverse networking needs across industries, providing scalable and high-performance connectivity for any application ranging from communications services to AI inference."
Arrcus said AINF is designed to work with a range of inference accelerators and network silicon across hardware providers. It added that partners can integrate components such as load balancers, firewalls and power management policies into deployments.