SOLUTION

EMFASYS:
Elastic AI Memory Fabric System

  • Industry's first RDMA Ethernet-based AI Memory Fabric architecture
  • Offloads GPU, HBM, and local head node DDR resource consumption
  • Drives down the cost of LLM inference at fleet scale
  • Seamless integration with popular LLM inference application frameworks

Under the Hood

  • Based on Enfabrica's 3.2 Tbps ACF SuperNIC silicon
  • Elastically connects up to 144 CXL memory lanes per system to resilient bundles of 400G / 800G RDMA Ethernet ports
  • Pooled memory target up to 18 TB CXL DDR5 DRAM per system (expandable to 28 TB in the future)
  • High memory bandwidth aggregation enabled by striping transactions across 18+ memory channels per system
  • Read access times in microseconds
  • Designed to interoperate with multiple GPU servers and initiator RDMA NICs
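The bandwidth-aggregation bullet above describes striping transactions across 18+ memory channels. A minimal sketch of how channel striping fans sequential accesses across channels, assuming an illustrative stripe granularity (the constant `STRIPE_BYTES` and the function `channel_for` are hypothetical, not part of the EMFASYS product):

```python
STRIPE_BYTES = 256    # assumed stripe granularity, for illustration only
NUM_CHANNELS = 18     # memory channels per system, per the spec above

def channel_for(addr: int) -> int:
    """Map an address to a memory channel by round-robin striping."""
    return (addr // STRIPE_BYTES) % NUM_CHANNELS

# A sequential stream of reads lands on every channel in turn,
# so aggregate bandwidth approaches the sum of all channels.
channels = [channel_for(a) for a in range(0, NUM_CHANNELS * STRIPE_BYTES, STRIPE_BYTES)]
assert sorted(channels) == list(range(NUM_CHANNELS))
```

The same idea underlies any interleaved memory controller: consecutive stripes map to different channels, so no single channel serializes a large transfer.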

System design for reduced cost per token per user

  • Ideal for agentic, batched, expert-parallel, high-turn, and/or large-context inference workloads
  • Applies to AI training: activation storage offload, distributed checkpointing, optimizer state sharding
  • Software-enabled, high-performance caching hierarchy hides transfer latency within compute pipelines
  • Designed to outperform flash-based inference storage offload solutions with 100x lower latency and unlimited write/erase transactions
  • 100% fabric-attached, headless memory free from head node CPU thread contention or locality constraints
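The caching bullet above says transfer latency is hidden within compute pipelines. One common way to do that is prefetching with a small buffer, so the fetch of the next block overlaps compute on the current one. A generic double-buffering sketch (the function `prefetch_pipeline` and its parameters are hypothetical, not the EMFASYS software API):

```python
import threading
import queue

def prefetch_pipeline(fetch, compute, keys):
    """Overlap fetching the next block with computing on the current one."""
    q = queue.Queue(maxsize=2)  # small bound -> classic double-buffering

    def producer():
        for k in keys:
            q.put(fetch(k))     # fetch runs ahead of the consumer
        q.put(None)             # sentinel: no more blocks

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while (block := q.get()) is not None:
        results.append(compute(block))  # compute while next fetch is in flight
    return results

# Usage: stand-ins for a remote-memory read and a compute step.
out = prefetch_pipeline(lambda k: k * 2, lambda b: b + 1, [1, 2, 3])
```

As long as a fetch completes within one compute step, the transfer latency is fully hidden behind the pipeline.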

ACF-S: World’s highest throughput AI SuperNIC