
Data Center Modernization & AI Fabrics

PalC designs and delivers open, AI-ready data center fabrics for high-throughput GPU communication, predictable latency, and operational stability across training and inference environments.

KEY CAPABILITIES

Built for AI workloads from day one

Comprehensive fabric engineering -- from leaf-spine design to storage access and operational telemetry.

DC Fabric Architecture

Scalable leaf-spine designs aligned with GPU workload behaviour, east-west traffic patterns, and long-term growth requirements.

Leaf-Spine · ECMP · MLAG
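
For a flavour of the underlay, here is a minimal leaf-side sketch in SONiC config_db form: two eBGP sessions toward the spines, across which SONiC's FRR routing stack load-balances with ECMP. ASNs, names, and addresses are illustrative only.

Illustrative SONiC config_db BGP underlay (leaf)
{
  "DEVICE_METADATA": {
    "localhost": {
      "bgp_asn": "65101",
      "type": "LeafRouter"
    }
  },
  "BGP_NEIGHBOR": {
    "10.0.0.0": {
      "asn": "65200",
      "name": "spine1",
      "local_addr": "10.0.0.1"
    },
    "10.0.1.0": {
      "asn": "65201",
      "name": "spine2",
      "local_addr": "10.0.1.1"
    }
  }
}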

AI-Aware Network Design

Network designs engineered for the east-west traffic patterns, heavy data movement, and performance sensitivity of training and inference platforms.

RoCE v2 · PFC · DCQCN

Open & Disaggregated Networking

SONiC-based open networking on multi-vendor hardware -- full operational control, zero proprietary lock-in.

SONiC · Open HW · EVPN-VXLAN
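
A minimal sketch of the overlay side in SONiC config_db form: a VTEP with one VLAN-to-VNI mapping. Names, addresses, and IDs are illustrative; EVPN route exchange itself is handled by BGP in FRR.

Illustrative SONiC config_db VXLAN mapping
{
  "VXLAN_TUNNEL": {
    "vtep1": {
      "src_ip": "10.1.0.1"
    }
  },
  "VXLAN_TUNNEL_MAP": {
    "vtep1|map_10100_Vlan100": {
      "vni": "10100",
      "vlan": "Vlan100"
    }
  }
}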

High-Performance Storage Access

NVMe-oF over RoCE transport for low-latency, high-IOPS tiered storage architectures serving AI workloads.

NVMe-oF · RDMA · Tiered

Observability-First Design

Telemetry, monitoring, and diagnostics embedded at fabric design time -- gNMI streaming, Grafana dashboards, and real-time visibility from day zero.

gNMI · Grafana · Day-0
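
As an example of day-zero telemetry, streaming interface counters from every switch can be as simple as a subscription file for an open gNMI collector such as gnmic. The sketch below is illustrative: targets, credentials, paths, and intervals must be adapted to the fabric.

Illustrative gnmic subscription config (YAML)
# Targets and paths are illustrative placeholders.
targets:
  leaf1:57400:
    username: admin
    password: admin
    insecure: true
subscriptions:
  port-counters:
    paths:
      - /interfaces/interface/state/counters
    stream-mode: sample
    sample-interval: 10s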

Production Validation & Readiness

IntelliSuite-driven validation against real traffic, scale limits, and failure scenarios before any production cutover.

SONiC-based validation · Failover <200ms · 100G proven

Modular, scalable fabric architecture

Leaf-spine at the core, RoCE transport for GPU communication, NVMe-oF for storage, and full observability from Day 0.


  • Spine Switch 1: 100GbE/400GbE spine
  • Spine Switch 2: 100GbE/400GbE spine
  • Leaf Switch 1: 25GbE/100GbE leaf
  • Leaf Switch 2: 25GbE/100GbE leaf
  • Leaf Switch 3: 25GbE/100GbE leaf
  • GPU Pod: NVIDIA A100/H100 cluster
  • NVMe-oF Cluster: High-speed storage fabric
  • Observability Layer: Prometheus, Grafana, ELK
  • DPU Offload: SmartNIC/DPU acceleration
  • RoCE Pipeline: RDMA over Converged Ethernet

Components

SONiC Open Networking

Open-source network operating system for disaggregated infrastructure.

  • Vendor-agnostic switches with standardized NOS
  • 40–60% cost reduction vs. proprietary stacks
  • Full control for AI workload customization
  • Scales without lock-in

Used across spine, leaf, and fabric switches.

Network Speeds
  • Spine Interconnect: 400GbE
  • Leaf Uplinks: 100 / 200GbE
  • Server / GPU NIC: 100 / 200GbE
Protocols
  • GPU Transport: RoCE v2
  • Congestion Ctrl: PFC · ECN · DCQCN
  • Overlay: EVPN-VXLAN
Storage
  • Protocol: NVMe-oF (TCP/RDMA)
  • Architecture: Tiered · Distributed
  • Per Pod IOPS: >1 Million
GPU Support
  • GPU Class: NVIDIA A100 / H100
  • Intra-node: NVLink
  • Inter-node: RoCE v2
TECHNICAL CONFIGURATION

RoCE + NVMe-oF — configured for production

Lossless RoCE transport and high-performance NVMe-oF storage, tuned to meet AI traffic patterns and production-scale workloads.

SONiC — PRIORITY FLOW CONTROL

PFC for Lossless RoCE Transport

Zero packet loss for RoCE transport across configured queues and lossless buffer pools.

SONiC PFC configuration for RoCE
{
  "PORT_QOS_MAP": {
    "Ethernet0": {
      "pfc_enable": "3,4"
    }
  },
  "BUFFER_POOL": {
    "ingress_lossless_pool": {
      "type": "ingress",
      "size": "139458560",
      "xoff": "20971520"
    }
  }
}
PFC Queues: 3 & 4 (RoCE)
ECN Mark: DCQCN
Packet Loss: Zero
GPU Latency: <1µs
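
PFC is normally paired with ECN marking so DCQCN can slow senders before pause frames trigger. A hedged sketch of the companion config_db entries is shown below; the profile name and thresholds are illustrative, and the field layout follows common SONiC conventions.

Illustrative SONiC WRED/ECN profile for DCQCN
{
  "WRED_PROFILE": {
    "ROCE_LOSSLESS": {
      "wred_green_enable": "true",
      "green_min_threshold": "1048576",
      "green_max_threshold": "2097152",
      "green_drop_probability": "5",
      "ecn": "ecn_all"
    }
  },
  "QUEUE": {
    "Ethernet0|3-4": {
      "wred_profile": "ROCE_LOSSLESS"
    }
  }
}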
KUBERNETES — NVMe-oF STORAGECLASS

NVMe-oF for GPU-pod Storage

High-IOPS distributed storage tuned for AI workloads with striped pools.

Kubernetes StorageClass for AI workloads
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nvmeof
provisioner: nvmeof.csi.openebs.io
parameters:
  replicas: "3"
  poolType: "striped"
allowVolumeExpansion: true
Protocol: RDMA / TCP
IOPS per Pod: >1 Million
Replicas / Stripe: 3-way stripe
Latency Class: Memory-class
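
Consuming the class from a GPU pod is then standard Kubernetes. A minimal PVC sketch against the nvmeof StorageClass above (name and size are illustrative):

Example PVC bound to the nvmeof StorageClass
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-scratch
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nvmeof
  resources:
    requests:
      storage: 2Ti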
PERFORMANCE

SLOs, Latency & Scale Benchmarks

Designed for AI fabrics, cloud interconnects, and enterprise data centres under real production load.

<1µs

Inter-GPU Latency

Network latency between GPU nodes via RoCE v2 lossless fabric.

>80%

Bandwidth Utilisation

Sustained during large model training workloads at full scale.

>1M

Storage IOPS

Per GPU pod via NVMe-oF with RDMA transport and tiered storage.

>90%

GPU Utilisation

Achieved during training — network no longer the bottleneck.

USE CASES

Who deploys this solution

Purpose-designed infrastructure for the environments where AI performance, reliability, and scale are non-negotiable.

AI & Machine Learning Platforms

Data center networks supporting large-scale distributed training, inference pipelines, and GPU-dense environments—where network latency directly impacts model iteration speed.

BFSI & Regulated Enterprises

Highly available, observable data center networks for transaction systems, analytics platforms, and compliance-driven environments—where audit trails and predictable performance are required.

Cloud-Adjacent & Hybrid Data Centers

Modern data center networks designed to integrate cleanly with public cloud—consistent networking policies, automation-first design, and DC interconnect for hybrid workload placement.

High-Growth Digital Platforms

Environments where rapid scale and frequent infrastructure change demand stable, predictable network behavior—open disaggregated architectures that grow without re-architecture.

HOW WE WORK

Design → Build → Validate → Operate

A proven engineering methodology that delivers production-grade results with operational excellence from day one.

STEP 1

Design

Business goals, workload profiles, and scale requirements translated into architecture and fabric designs.

STEP 2

Build

Engineering open fabric configurations, integration, and deployment tooling for production environments.

STEP 3

Validate

IntelliSuite-driven testing against real traffic, scale limits, and failure scenarios before cutover.

STEP 4

Operate

Ongoing support, telemetry monitoring, and continuous optimization so the fabric remains healthy long term.

INTEGRATED CAPABILITIES
SONiC Integration · Cloud Integration · IaC/GitOps · Observability · RoCE Optimisation · NVMe-oF Storage

Faster AI Readiness

Deploy and scale AI workloads without infrastructure bottlenecks.

Predictable Performance

Consistent latency and throughput under varying load conditions.

Full Visibility

Real-time telemetry for proactive issue detection and resolution.

No Vendor Lock-in

Open architectures and multi-vendor hardware preserve long-term flexibility.

Scales Operationally

Observability-first design keeps operations manageable as networks grow.

Proven outcomes from the field

Deployments across AI fabrics, multi-cloud, automation, and security.

ODM PARTNERS

TRUSTED BY LEADING TECHNOLOGY PARTNERS

Planning your AI-ready data center fabric?

Share your SLO targets for latency, bandwidth utilization, storage IOPS, and visibility. PalC will help design an open, production-ready RoCE + NVMe-oF architecture.

Get in touch

Discuss your infrastructure goals with our experts.

View Case Studies