<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>vLLM on Conselara Labs</title>
    <link>https://conselara.dev/tags/vllm/</link>
    <description>Recent content tagged vLLM on Conselara Labs</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Thu, 14 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://conselara.dev/tags/vllm/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Building a Two-Node Ray Cluster for Distributed LLM Inference on DGX Spark</title>
      <link>https://conselara.dev/notes/two-node-ray-cluster-dgx-spark/</link>
      <pubDate>Thu, 14 May 2026 00:00:00 +0000</pubDate>
      <guid>https://conselara.dev/notes/two-node-ray-cluster-dgx-spark/</guid>
      <description>Step-by-step setup for a two-node Ray cluster running Qwen3-235B across two DGX Sparks, including RoCE verification, model sync, NCCL configuration, and the specific gotchas that cause silent failures.</description>
    </item>
    <item>
      <title>vLLM on DGX Spark: What the SM121 Architecture Actually Requires</title>
      <link>https://conselara.dev/notes/vllm-dgx-spark-sm121-gotchas/</link>
      <pubDate>Wed, 13 May 2026 00:00:00 +0000</pubDate>
      <guid>https://conselara.dev/notes/vllm-dgx-spark-sm121-gotchas/</guid>
      <description>Hard-won config rules for running vLLM on DGX Spark GB10 (SM121). Covers broken backends, unified memory limits, quantization traps, multi-node NCCL, and flags that silently destroy throughput.</description>
    </item>
    <item>
      <title>DGX Spark Benchmark Results: vLLM on SM121</title>
      <link>https://conselara.dev/notes/dgx-spark-benchmarks/</link>
      <pubDate>Sat, 09 May 2026 00:00:00 +0000</pubDate>
      <guid>https://conselara.dev/notes/dgx-spark-benchmarks/</guid>
      <description>Measured throughput and latency for Qwen3-235B-A22B (two-node) and gpt-oss-120b (single node) on DGX Spark GB10 hardware with vLLM 0.19.0.</description>
    </item>
    <item>
      <title>DGX Spark Model Comparison: What Fits and What Runs (SM121, 128 GB)</title>
      <link>https://conselara.dev/notes/dgx-spark-model-comparison/</link>
      <pubDate>Sat, 09 May 2026 00:00:00 +0000</pubDate>
      <guid>https://conselara.dev/notes/dgx-spark-model-comparison/</guid>
      <description>A quick-reference comparison of open-weight models for a single DGX Spark GB10 — what fits in 128 GB unified memory, expected throughput, and SM121 compatibility notes.</description>
    </item>
    <item>
      <title>vLLM Model Selection for DGX Spark (SM121)</title>
      <link>https://conselara.dev/notes/dgx-spark-model-selection/</link>
      <pubDate>Sat, 09 May 2026 00:00:00 +0000</pubDate>
      <guid>https://conselara.dev/notes/dgx-spark-model-selection/</guid>
      <description>How to choose a model for the DGX Spark GB10 (SM121). Covers architecture compatibility, quantization format requirements, and what to expect from each option.</description>
    </item>
    <item>
      <title>Running Qwen3.5-122B on a Single DGX Spark</title>
      <link>https://conselara.dev/notes/dgx-spark-qwen35-122b/</link>
      <pubDate>Tue, 05 May 2026 00:00:00 +0000</pubDate>
      <guid>https://conselara.dev/notes/dgx-spark-qwen35-122b/</guid>
      <description>Qwen3.5-122B-A10B runs at 51 tok/s on a single DGX Spark in NVFP4 quantization. This post covers the SM121 constraints, required vLLM flags, and what to expect.</description>
    </item>
  </channel>
</rss>
