If you’re running SearXNG as a self-hosted search backend for an automated pipeline, the default engine selection will cause you problems quickly. Here’s what we’ve found running SearXNG 24/7 for a federal procurement intelligence pipeline.
What doesn’t work
Google: returns 403 Forbidden once requests are flagged as automated. In our experience this happens almost immediately on self-hosted instances unless you add aggressive anti-bot workarounds. Don’t rely on it for automated queries.
Startpage: serves CAPTCHAs after a few queries. Fine for occasional manual searches, unusable for scheduled pipelines.
time_range parameter: setting time_range=month or time_range=week causes Bing to return 0 results. The parameter appears to be handled inconsistently between engines, and in our setup Bing simply returns an empty result set whenever it is present. Omit it entirely and apply your own date filter in Python.
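As a rough sketch of the client-side date filter, the function below keeps only results newer than a cutoff. It assumes each result dict may carry an ISO-formatted `publishedDate` field; many results have no date at all, so check what your engines actually return. The name `filter_recent` and the `keep_undated` flag are hypothetical, introduced only for illustration.

```python
from datetime import date, datetime, timedelta

def filter_recent(results, max_age_days=30, today=None, keep_undated=True):
    """Keep results published within the last max_age_days."""
    today = today or date.today()
    cutoff = today - timedelta(days=max_age_days)
    kept = []
    for r in results:
        raw = r.get("publishedDate")  # assumed field name; often missing or null
        if not raw:
            if keep_undated:
                kept.append(r)
            continue
        try:
            # fromisoformat() on older Pythons rejects a trailing "Z"
            published = datetime.fromisoformat(str(raw).replace("Z", "+00:00")).date()
        except ValueError:
            # Unparseable date: treat it like an undated result
            if keep_undated:
                kept.append(r)
            continue
        if published >= cutoff:
            kept.append(r)
    return kept
```

Keeping undated results by default is a judgment call: dropping them is stricter, but it may discard most of what the engines return.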
What works reliably
import json
import time
import urllib.parse
import urllib.request

SEARXNG = "http://your-searxng-host/search"

def search(query: str) -> list[dict]:
    """Query SearXNG's JSON API; return the result list, or [] on failure."""
    params = urllib.parse.urlencode({
        "q": query,
        "format": "json",  # json must be enabled under search.formats in settings.yml
        "engines": "duckduckgo,bing",
        # Do NOT set time_range: it breaks Bing
    })
    req = urllib.request.Request(
        f"{SEARXNG}?{params}",
        headers={"User-Agent": "Mozilla/5.0"},
    )
    try:
        # Close the connection promptly instead of leaving it to the GC
        with urllib.request.urlopen(req, timeout=15) as resp:
            data = json.loads(resp.read())
        return data.get("results", [])
    except Exception as e:
        print(f"Search failed: {e}")
        return []

# Always sleep between queries
results = search("your query here")
time.sleep(3)  # mandatory: skipping this triggers CAPTCHA within minutes
engines=duckduckgo,bing is the reliable combination. DuckDuckGo handles the bulk of results; Bing covers gaps. Together they stay stable as long as you respect the rate limits below.
Rate limit in practice
From our experience running 20–40 queries per 3-hour cron window:
- Under 20 queries: stable, no CAPTCHA
- 20–40 queries with 3s sleep between: generally stable
- Over 40 queries or sleep < 2s: CAPTCHA within the session
Cap your query count per run slot. We use MAX_QUERIES = 20 as a hard limit.
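One way to enforce the cap and the sleep in a single place is a small batch runner. This is a minimal sketch, not our production code: `run_batch` and its `search_fn` and `sleep_fn` parameters are hypothetical names, and `search_fn` stands in for whatever function you use to query SearXNG.

```python
import time

MAX_QUERIES = 20    # hard cap per run slot
SLEEP_SECONDS = 3   # pause between consecutive queries

def run_batch(queries, search_fn, sleep_fn=time.sleep):
    """Run at most MAX_QUERIES queries, sleeping between consecutive ones."""
    batch = list(queries)[:MAX_QUERIES]  # silently drop anything over the cap
    results_by_query = {}
    for i, q in enumerate(batch):
        if i > 0:
            sleep_fn(SLEEP_SECONDS)  # no sleep needed before the first query
        results_by_query[q] = search_fn(q)
    return results_by_query
```

Passing the sleep function as a parameter keeps the runner testable without real delays. Note that duplicate query strings collapse into one dictionary key, so deduplicate the input first if that matters.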
intitle: operator
Through SearXNG, intitle: is honored by DuckDuckGo but not by Bing, so Bing results come back unfiltered. Since you’re running both engines, you still get broader coverage from Bing alongside the intitle:-filtered DuckDuckGo results.
For federal procurement monitoring, intitle: queries are the highest-signal pattern:
intitle:"sources sought" AHRQ OR ONC OR NIH
intitle:"request for information" "health IT"
intitle:"industry day" CMS OR FDA
Flag hits from intitle: queries as high priority — these are active pre-solicitations, not news articles.
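Because Bing’s results are not filtered by intitle:, it can help to confirm the phrase really appears in the title before flagging a hit. The sketch below does that check client-side. It assumes each result dict has a `title` key; `tag_priority` and the `priority` field are hypothetical names for illustration, and only the first quoted intitle: phrase in the query is considered.

```python
import re

# Matches intitle:"some phrase" and captures the quoted phrase
INTITLE_RE = re.compile(r'intitle:"([^"]+)"', re.IGNORECASE)

def tag_priority(query, results):
    """Return copies of results with a 'priority' key added."""
    match = INTITLE_RE.search(query)
    phrase = match.group(1).lower() if match else None
    tagged = []
    for r in results:
        title = (r.get("title") or "").lower()
        is_hit = phrase is not None and phrase in title
        # Copy the dict so the caller's results are left untouched
        tagged.append({**r, "priority": "high" if is_hit else "normal"})
    return tagged
```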
site: operator
site: does not work on most default SearXNG configurations. Queries like site:fda.gov contract 2026 return 0 results because neither DuckDuckGo nor Bing passes the operator through the SearXNG adapter correctly.
Replace with keyword queries:
| Instead of | Use |
|---|---|
| site:fda.gov IT contract 2026 | FDA HHS IT health technology contract award 2026 |
| site:ahrq.gov procurement | AHRQ health outcomes data cloud contract |
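If you still need results restricted to one domain, an optional complement to the keyword rewrite is to filter on the result URL after the fact. This is a sketch under the assumption that each result dict has a `url` key; `filter_by_domain` is a hypothetical helper, not part of SearXNG.

```python
from urllib.parse import urlparse

def filter_by_domain(results, domain):
    """Keep results whose URL host is domain or a subdomain of it."""
    domain = domain.lower()
    kept = []
    for r in results:
        host = (urlparse(r.get("url") or "").hostname or "").lower()
        # Exact match or a true subdomain; "notfda.gov" must not match "fda.gov"
        if host == domain or host.endswith("." + domain):
            kept.append(r)
    return kept
```

Keyword queries return far fewer on-domain hits than a working site: operator would, so expect this filter to discard most of each result set.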
Dynamic year
Never hardcode a year in query templates. Queries go stale on January 1 and you won’t notice until you’re looking at a month of zero results.
import datetime
year = datetime.date.today().year
query = f'intitle:"sources sought" AHRQ {year}'