NCCL | Conselara Labs

Qwen3-235B-A22B-GPTQ-Int4 is ~118 GB. A single DGX Spark has 128 GB of unified memory, which is enough in theory, but once CUDA overhead and the KV cache are factored in, it’s tight. Running the model across two Sparks with TP=2 gives headroom for real workloads. Each DGX Spark is a single logical GPU with no NVSwitch, so tensor parallelism across two units means Ray + NCCL over a direct interconnect. This is what the setup looks like, and what will silently fail if it is not configured correctly. ...
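To make the headroom argument concrete, here is a rough back-of-the-envelope sketch. The 118 GB and 128 GB figures come from the text above. The 6 GB overhead value is a placeholder assumption, not a measurement, and the helper `per_gpu_headroom_gb` is a hypothetical name. The sketch also assumes tensor parallelism splits the weights evenly across ranks, which is only approximately true in practice.

```python
def per_gpu_headroom_gb(weights_gb: float, mem_per_gpu_gb: float,
                        tp_size: int, overhead_gb: float = 0.0) -> float:
    """Return memory left per GPU (e.g. for KV cache) after the weight shard
    and a fixed overhead are subtracted.

    Assumes an even split of weights across tensor-parallel ranks.
    """
    shard_gb = weights_gb / tp_size
    return mem_per_gpu_gb - shard_gb - overhead_gb


WEIGHTS_GB = 118.0    # from the text: approximate size of the GPTQ-Int4 checkpoint
SPARK_MEM_GB = 128.0  # from the text: unified memory per DGX Spark
OVERHEAD_GB = 6.0     # ASSUMPTION: illustrative CUDA/runtime overhead, not measured

single = per_gpu_headroom_gb(WEIGHTS_GB, SPARK_MEM_GB, tp_size=1, overhead_gb=OVERHEAD_GB)
dual = per_gpu_headroom_gb(WEIGHTS_GB, SPARK_MEM_GB, tp_size=2, overhead_gb=OVERHEAD_GB)

print(f"TP=1 headroom per Spark: {single:.1f} GB")
print(f"TP=2 headroom per Spark: {dual:.1f} GB")
```

Because the memory is unified, the OS and other CPU-side processes draw from the same 128 GB, so the real headroom will be lower than this estimate on both configurations.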