SearXNG: Engine Selection for Reliable Results

If you’re running SearXNG as a self-hosted search backend for an automated pipeline, the default engine selection will cause you problems quickly. Here’s what we’ve found running SearXNG 24/7 for a federal procurement intelligence pipeline.

What doesn’t work

Google: returns 403 Forbidden once it flags requests as automated. This happens almost immediately on most self-hosted instances unless you set up aggressive bot-detection workarounds. Don’t rely on it for automated queries.

Startpage: CAPTCHAs after a few queries. Fine for occasional manual searches, unusable for scheduled pipelines.

time_range parameter: setting time_range=month or time_range=week causes Bing to return 0 results. The parameter appears to be handled inconsistently between engines; Bing’s implementation simply returns empty when it’s set. Omit it entirely and apply your own date filter in Python.
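A minimal sketch of such a client-side date filter follows. It assumes results carry a `publishedDate` field in ISO 8601 form, which only some engines populate and whose exact format may vary by SearXNG version; `filter_recent`, `max_age_days`, and `keep_undated` are illustrative names, not SearXNG settings.

```python
from datetime import datetime, timedelta, timezone

def filter_recent(results, max_age_days=30, keep_undated=True, now=None):
    """Keep results published within the last max_age_days.

    Assumption: each result may have a "publishedDate" ISO 8601 string.
    Results with a missing or unparseable date are kept or dropped
    according to keep_undated.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    kept = []
    for r in results:
        raw = r.get("publishedDate")
        try:
            # Older Python versions reject a trailing "Z", so normalize it.
            published = datetime.fromisoformat(str(raw).replace("Z", "+00:00")) if raw else None
        except ValueError:
            published = None
        if published is None:
            if keep_undated:
                kept.append(r)
            continue
        if published.tzinfo is None:
            # Assumption: naive timestamps are treated as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if published >= cutoff:
            kept.append(r)
    return kept
```

Many results come back without a date, so `keep_undated=True` errs on the side of not losing hits; set it to `False` if you prefer a strict window.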

What works reliably

import urllib.parse, urllib.request, json, time

SEARXNG = "http://your-searxng-host/search"

def search(query: str) -> list[dict]:
    params = urllib.parse.urlencode({
        "q": query,
        "format": "json",
        "engines": "duckduckgo,bing",
        # Do NOT set time_range; breaks Bing
    })
    req = urllib.request.Request(
        f"{SEARXNG}?{params}",
        headers={"User-Agent": "Mozilla/5.0"}
    )
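    try:
        # Close the HTTP response promptly once the JSON body is parsed
        with urllib.request.urlopen(req, timeout=15) as resp:
            data = json.load(resp)
        return data.get("results", [])
    except Exception as e:
        # Treat any network or parsing error as "no results" for this query
        print(f"Search failed: {e}")
        return []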

# Always sleep between queries
results = search("your query here")
time.sleep(3)  # mandatory; skipping this triggers CAPTCHA within minutes

engines=duckduckgo,bing is the reliable combination. DuckDuckGo handles the bulk of results; Bing covers gaps. Together they have stayed stable for us as long as we respect the rate limits described in the next section.

Rate limit in practice

From our experience running 20–40 queries per 3-hour cron window:

Under 20 queries: stable, no CAPTCHA
20–40 queries with 3s sleep between: generally stable
Over 40 queries or sleep < 2s: CAPTCHA within the session

Cap your query count per run slot. We use MAX_QUERIES = 20 as a hard limit.
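The cap and the sleep can be enforced in one place. This is a sketch, assuming a `search` callable like the one defined earlier; `run_batch`, `SLEEP_SECONDS`, and the injectable `sleep_fn` are illustrative names chosen here.

```python
import time

MAX_QUERIES = 20     # hard cap per run slot, as described above
SLEEP_SECONDS = 3    # pause between consecutive queries

def run_batch(queries, search_fn, max_queries=MAX_QUERIES,
              delay=SLEEP_SECONDS, sleep_fn=time.sleep):
    """Run at most max_queries searches, sleeping between them.

    Returns a list of (query, results) pairs in execution order.
    Queries beyond the cap are silently dropped for this run.
    """
    batch = list(queries)[:max_queries]
    out = []
    for i, query in enumerate(batch):
        out.append((query, search_fn(query)))
        if i < len(batch) - 1:
            # No need to sleep after the final query
            sleep_fn(delay)
    return out
```

Passing `sleep_fn` as a parameter keeps the function testable without real delays; in production the default `time.sleep` applies.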

`intitle:` operator

intitle: works in DuckDuckGo, not Bing. But since you’re running both engines, you still get value from Bing results alongside the intitle:-filtered DuckDuckGo results.

For federal procurement monitoring, intitle: queries are the highest-signal pattern:

intitle:"sources sought" AHRQ OR ONC OR NIH
intitle:"request for information" "health IT"
intitle:"industry day" CMS OR FDA

Flag hits from intitle: queries as high priority: these are active pre-solicitations, not news articles.
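Because Bing ignores the operator, some hits from an intitle: query will not actually have the phrase in the title. One way to flag only genuine matches is to re-check the `title` field in Python. The sketch below is an assumption about how you might do that; `intitle_terms` and `tag_priority` are hypothetical helper names, and the `priority` key is added by this code, not by SearXNG.

```python
import re

# Matches intitle:"quoted phrase" or intitle:word
INTITLE_RE = re.compile(r'intitle:(?:"([^"]+)"|(\S+))', re.IGNORECASE)

def intitle_terms(query):
    """Extract the lowercase phrases that follow intitle: in a query."""
    return [(quoted or bare).lower() for quoted, bare in INTITLE_RE.findall(query)]

def tag_priority(query, results):
    """Return copies of results with a "priority" key.

    A result is "high" only if the query used intitle: and every
    intitle phrase really appears in the result title.
    """
    terms = intitle_terms(query)
    tagged = []
    for r in results:
        title = (r.get("title") or "").lower()
        is_high = bool(terms) and all(t in title for t in terms)
        tagged.append({**r, "priority": "high" if is_high else "normal"})
    return tagged
```

Results that fail the title check are kept as normal priority, so the Bing hits still contribute without diluting the high-priority queue.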

`site:` operator

site: does not work on most default SearXNG configurations. Queries like site:fda.gov contract 2026 return 0 results because neither DuckDuckGo nor Bing passes the operator through the SearXNG adapter correctly.

Replace with keyword queries:

Instead of	Use
`site:fda.gov IT contract 2026`	`FDA HHS IT health technology contract award 2026`
`site:ahrq.gov procurement`	`AHRQ health outcomes data cloud contract`
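If you still need results restricted to one domain, an option is to run the keyword query and filter on the `url` field of each result afterwards. This is a sketch of that idea, not something SearXNG does for you; `filter_by_domain` is a hypothetical helper name.

```python
from urllib.parse import urlparse

def filter_by_domain(results, domain):
    """Keep results whose URL host is domain or one of its subdomains."""
    domain = domain.lower().lstrip(".")
    kept = []
    for r in results:
        host = (urlparse(r.get("url", "")).hostname or "").lower()
        # Exact match or a true subdomain; "notfda.gov" must not match "fda.gov"
        if host == domain or host.endswith("." + domain):
            kept.append(r)
    return kept
```

For example, `filter_by_domain(search("FDA HHS IT health technology contract award"), "fda.gov")` approximates what the site: query was meant to return, at the cost of discarding off-domain hits after the fact.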

Dynamic year

Never hardcode a year in query templates. Queries go stale on January 1 and you won’t notice until you’re looking at a month of zero results.

import datetime
year = datetime.date.today().year
query = f'intitle:"sources sought" AHRQ {year}'
