DC Fabric Architecture
Scalable leaf-spine designs aligned with GPU workload behaviour, east-west traffic patterns, and long-term growth requirements.
PalC designs and delivers open, AI-ready data center fabrics for high-throughput GPU communication, predictable latency, and operational stability across training and inference environments.
Comprehensive fabric engineering -- from leaf-spine design to storage access and operational telemetry.
Network designs engineered for the east-west traffic patterns, heavy data movement, and performance sensitivity of training and inference platforms.
SONiC-based open networking on multi-vendor hardware -- full operational control, zero proprietary lock-in.
NVMe-oF over RoCE transport: low-latency, high-IOPS tiered storage architectures for AI workloads.
Telemetry, monitoring, and diagnostics embedded at fabric design time -- gNMI streaming, Grafana dashboards, and real-time visibility from day zero.
IntelliSuite-driven validation against real traffic, scale limits, and failure scenarios before any production cutover.
Leaf-spine at the core, RoCE transport for GPU communication, NVMe-oF for storage, and full observability from Day 0.
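A quick back-of-envelope check on any leaf-spine design is the oversubscription ratio of a leaf: host-facing bandwidth divided by fabric-facing bandwidth. AI training fabrics typically target 1:1 (non-blocking). The sketch below uses illustrative port counts and speeds, not a PalC reference design:

```python
def oversubscription_ratio(downlinks: int, downlink_gbps: int,
                           uplinks: int, uplink_gbps: int) -> float:
    """Leaf oversubscription = host-facing bandwidth / fabric-facing bandwidth."""
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)

# Example: 32 x 100G GPU-facing ports, 8 x 400G uplinks to the spines.
ratio = oversubscription_ratio(32, 100, 8, 400)
print(ratio)  # 1.0 -> non-blocking, as training fabrics usually require
```

A ratio above 1.0 means uplinks can saturate under all-to-all collective traffic, which is exactly the pattern distributed training generates.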
Components
Open-source network operating system for disaggregated infrastructure.
Used across spine, leaf, and fabric switches.
RDMA over Converged Ethernet for lossless GPU-to-GPU communication.
Critical for GPU pods and AI clusters.
High-performance storage fabric with NVMe-oF for AI workloads.
Used across GPU pods, NVMe-oF clusters, and fabric switches.
High-performance network fabrics optimized for AI/ML workloads.
Spans GPU pods, DPU offload, and fabric.
Flexible, vendor-neutral network architectures that scale with your needs.
Used across spine and leaf layers.
Real-time observability across compute, network, and storage.
Used across GPU pods, NVMe-oF clusters, and fabric switches.
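To make the observability component concrete, here is a hedged sketch of the kind of check a streaming-telemetry pipeline can run continuously: flag any port that drops packets on a queue configured as lossless. The counter field names are hypothetical placeholders, not a specific gNMI or vendor schema:

```python
# PFC-enabled queues carrying RoCE traffic (matches the 3,4 convention
# used in the QoS configuration later in this page).
LOSSLESS_QUEUES = {3, 4}

def lossless_violations(counters: dict) -> list:
    """Return (port, queue) pairs where a lossless queue dropped packets."""
    bad = []
    for port, queues in counters.items():
        for q, stats in queues.items():
            if q in LOSSLESS_QUEUES and stats.get("dropped_pkts", 0) > 0:
                bad.append((port, q))
    return bad

# Illustrative counter snapshot, e.g. decoded from a telemetry stream.
snapshot = {
    "Ethernet0": {3: {"dropped_pkts": 0}, 4: {"dropped_pkts": 17}},
    "Ethernet4": {3: {"dropped_pkts": 0}, 4: {"dropped_pkts": 0}},
}
print(lossless_violations(snapshot))  # [('Ethernet0', 4)]
```

Any hit on a lossless queue indicates the PFC/buffer configuration is not holding, which should page an operator before it surfaces as a stalled training job.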
Lossless RoCE transport and high-performance NVMe-oF storage, tuned to meet AI traffic patterns and production-scale workloads.
Zero packet loss for RoCE transport across configured queues and lossless buffer pools.
```json
{
  "PORT_QOS_MAP": {
    "Ethernet0": {
      "pfc_enable": "3,4"
    }
  },
  "BUFFER_POOL": {
    "ingress_lossless_pool": {
      "type": "ingress",
      "size": "139458560",
      "xoff": "20971520"
    }
  }
}
```
| Parameter | Value |
|---|---|
| PFC queues | 3 & 4 (RoCE) |
| ECN marking | DCQCN |
| Packet loss | Zero |
| GPU latency | <1 µs |
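The `xoff` value in the buffer pool above is the PFC headroom: buffer reserved to absorb traffic that is still in flight after a pause frame is sent. A common approximation is round-trip propagation delay plus the peer's PFC response time, at line rate, plus a couple of MTUs. The sketch below uses illustrative delay figures and is not a validated SONiC headroom calculation:

```python
def pfc_headroom_bytes(link_gbps: float, cable_m: float,
                       mtu: int = 9216, pfc_response_us: float = 5.0) -> int:
    """Estimate per-port PFC headroom needed for lossless operation."""
    # Propagation delay: roughly 5 ns per metre of fibre, each way.
    prop_us = cable_m * 0.005
    # Data keeps arriving for the round-trip propagation time plus the
    # peer's PFC response time after XOFF is sent, all at line rate.
    in_flight_us = 2 * prop_us + pfc_response_us
    bytes_per_us = link_gbps * 1e9 / 8 / 1e6
    # Two MTUs cover the frame in flight and the frame being serialised.
    return int(in_flight_us * bytes_per_us + 2 * mtu)

# Example: 100G link over 100 m of fibre with jumbo frames.
print(pfc_headroom_bytes(100, 100))  # 93432
```

Summing this per-port headroom across all lossless ports is what sizes an ingress lossless pool like the one configured above.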
High-IOPS distributed storage tuned for AI workloads with striped pools.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nvmeof
provisioner: nvmeof.csi.openebs.io
parameters:
  replicas: "3"
  poolType: "striped"
allowVolumeExpansion: true
```
| Parameter | Value |
|---|---|
| Protocol | RDMA / TCP |
| IOPS per pod | >1 million |
| Replicas / stripe | 3-way stripe |
| Latency class | Memory-class |
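Little's law gives a quick sanity check on whether an IOPS target like the one above is reachable: in-flight requests = arrival rate x average completion time. A small sketch, where the 100 µs figure is an assumed average NVMe-oF completion time rather than a measured value:

```python
import math

def required_inflight(target_iops: int, avg_latency_us: float) -> int:
    """Little's law: concurrent requests needed to sustain target_iops."""
    return math.ceil(target_iops * avg_latency_us / 1_000_000)

# Sustaining 1M IOPS at an assumed 100 us average completion time needs
# ~100 requests outstanding across the pod's NVMe-oF queue pairs.
print(required_inflight(1_000_000, 100))  # 100
```

This is why queue depth and queue-pair count matter as much as raw media speed: if the workload cannot keep enough requests outstanding, the advertised IOPS ceiling is unreachable regardless of transport.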
Designed for AI fabrics, cloud interconnects, and enterprise data centres under real production load.
Network latency between GPU nodes via RoCE v2 lossless fabric.
Sustained during large model training workloads at full scale.
Per GPU pod via NVMe-oF with RDMA transport and tiered storage.
Achieved during training — network no longer the bottleneck.
Purpose-designed infrastructure for the environments where AI performance, reliability, and scale are non-negotiable.
Data center networks supporting large-scale distributed training, inference pipelines, and GPU-dense environments—where network latency directly impacts model iteration speed.
Highly available, observable data center networks for transaction systems, analytics platforms, and compliance-driven environments—where audit trails and predictable performance are required.
Modern data center networks designed to integrate cleanly with public cloud—consistent networking policies, automation-first design, and DC interconnect for hybrid workload placement.
Environments where rapid scale and frequent infrastructure change demand stable, predictable network behavior—open disaggregated architectures that grow without re-architecture.
A proven engineering methodology that delivers production-grade results with operational excellence from day one.
Business goals, workload profiles, and scale requirements translated into architecture and fabric designs.
Engineering open fabric configurations, integration, and deployment tooling for production environments.
IntelliSuite-driven testing against real traffic, scale limits, and failure scenarios before cutover.
Ongoing support, telemetry monitoring, and continuous optimization so the fabric remains healthy long term.
Deploy and scale AI workloads without infrastructure bottlenecks.
Consistent latency and throughput under varying load conditions.
Real-time telemetry for proactive issue detection and resolution.
Open architectures and multi-vendor hardware preserve long-term flexibility.
Observability-first design keeps operations manageable as networks grow.
Deployments across AI fabrics, multi-cloud, automation, and security.
Next steps
Share your SLO targets for latency, bandwidth utilization, storage IOPS, and visibility. PalC will help design an open, production-ready RoCE + NVMe-oF architecture.