Llm on Conselara Labs

https://conselara.dev/tags/llm/
Recent content in Llm on Conselara Labs (Thu, 14 May 2026 00:00:00 +0000)

Building a Two-Node Ray Cluster for Distributed LLM Inference on DGX Spark
https://conselara.dev/notes/two-node-ray-cluster-dgx-spark/
Thu, 14 May 2026 00:00:00 +0000
Step-by-step setup for a two-node Ray cluster running Qwen3-235B across two DGX Sparks, including RoCE verification, model sync, NCCL configuration, and the specific gotchas that cause silent failures.

DGX Spark GB10 Hardware Reference: SM121 Architecture, Memory, and Networking
https://conselara.dev/notes/dgx-spark-gb10-hardware-reference/
Thu, 14 May 2026 00:00:00 +0000
Architecture constraints, memory behavior, networking setup, and kernel compatibility for the NVIDIA DGX Spark GB10 Grace Blackwell Superchip (SM121).

vLLM on DGX Spark: What the SM121 Architecture Actually Requires
https://conselara.dev/notes/vllm-dgx-spark-sm121-gotchas/
Wed, 13 May 2026 00:00:00 +0000
Hard-won config rules for running vLLM on DGX Spark GB10 (SM121). Covers broken backends, unified memory limits, quantization traps, multi-node NCCL, and flags that silently destroy throughput.

We Replaced an MCP Server with FastAPI and It Worked Everywhere
https://conselara.dev/notes/mcp-to-fastapi-lessons-learned/
Tue, 12 May 2026 00:00:00 +0000
We built a company knowledge base server as an MCP SSE endpoint. It worked in Claude Code and nowhere else. Here is what we learned and how we fixed it.

AI Across a Health Research Information Platform
https://conselara.dev/notes/ai-health-research-platform/
Sat, 09 May 2026 00:00:00 +0000
How we are integrating AI into a federal health research platform: publication discovery, LLM evaluations, AI-assisted development, and vetting AI-powered CMS modules.

DGX Spark Model Comparison: What Fits and What Runs (SM121, 128 GB)
https://conselara.dev/notes/dgx-spark-model-comparison/
Sat, 09 May 2026 00:00:00 +0000
A quick-reference comparison of open-weight models for a single DGX Spark GB10: what fits in 128 GB unified memory, expected throughput, and SM121 compatibility notes.

vLLM Model Selection for DGX Spark (SM121)
https://conselara.dev/notes/dgx-spark-model-selection/
Sat, 09 May 2026 00:00:00 +0000
How to choose a model for the DGX Spark GB10 (SM121). Covers architecture compatibility, quantization format requirements, and what to expect from each option.

Running Qwen3.5-122B on a Single DGX Spark
https://conselara.dev/notes/dgx-spark-qwen35-122b/
Tue, 05 May 2026 00:00:00 +0000
Qwen3.5-122B-A10B runs at 51 tok/s on a single DGX Spark in NVFP4 quantization. This post covers the SM121 constraints, required vLLM flags, and what to expect.