Independent labs network eight NVIDIA GB10 nodes for local inference as commodity switches bypass the cloud
By routing 400GbE RDMA traffic through off-the-shelf MikroTik hardware, operators are running massive frontier models on 1TB of pooled local memory.
The QSFP56-DD cables route out the back of eight NVIDIA GB10 units on a standard studio rack, bypassing the hyperscaler data centre entirely. The assembly of a local, 8-node cluster capable of running massive frontier models marks a structural shift in AI infrastructure: enterprise-scale inference no longer strictly requires enterprise-scale facilities. Until recently, pooling enough compute to run 500-gigabyte models demanded proprietary fabric and dedicated cooling. Now, it requires only the right configuration of off-the-shelf networking gear.
The decoupling relies on commodity hardware intercepting what was previously a closed ecosystem. By routing NVIDIA’s onboard ConnectX-7 200GbE networking through a single MikroTik CRS804 DDQ switch, operators can establish the RDMA connections necessary for multi-node scaling without a traditional spine-and-leaf architecture. This topology allows the cluster to pool 1TB of memory and 160 Arm cores across the eight Grace Blackwell nodes, generating sufficient bandwidth to run heavy models like Kimi K2.5 entirely locally.
Storage economics further isolate the setup from cloud dependencies. By provisioning the GB10 units with baseline 1TB local NVMe drives instead of 4TB variants, operators save roughly $1,000 per node—a $8,000–$10,000 capital reduction across a standard rack. The massive model weights and agent directories are instead offloaded to a shared QNAP network-attached storage array over a secondary 10GbE management layer powered by Marvell silicon. This shared storage architecture allows operators to isolate agent workspaces, using ZFS snapshots to roll back rogue processes without rebuilding the entire cluster.
The winners in this configuration are mid-market hardware vendors like MikroTik and QNAP, whose standard routing and storage equipment is suddenly capturing AI capital expenditure. The losers are the tier-one cloud providers, whose managed inference margins assume a monopoly on the high-bandwidth networking required to run Tensor Parallel workloads across multiple GPUs. When a single switch can coordinate eight Blackwell chips, the cloud premium becomes harder to justify for sovereign data.
What this hardware baseline forecloses is the assumption that secure, air-gapped AI requires a bespoke facility build-out. What it opens is a highly distributed model for enterprise software, where proprietary data never leaves the building and heavy inference happens quietly outside the perimeter of the major clouds.
