How to integrate Baidu Baike search with OpenClaw


Baidu Baike search can be integrated with OpenClaw by standing up a secure, external search API (a wrapper around Baidu's official API if you have access, or a responsibly written server-side scraper/proxy if not) and exposing it to an OpenClaw skill that you install and configure via ClawHub. Keep the search service, storage, rate limiting, and credentials outside the agent runtime; configure API keys or OAuth in ClawHub; and have the skill make explicit, authenticated REST calls to your proxy. Add caching, logging, and request validation so the agent remains stateless and reliable.

 

Overview & goals

  • Goal: Allow an OpenClaw agent/skill to perform Baidu Baike searches and return structured results to users.
  • Constraints: Don’t assume any hidden OpenClaw magic — use explicit REST APIs, credentials, and external services for stateful work. Keep scraping/legal constraints in mind; prefer an official API if available.
  • Architecture: Agent skill (stateless) <--HTTPS--> External Search Service (stateful: cache, DB, rate limiting) <--HTTPS--> Baidu (official API or HTML pages).

 

Step: Choose how to retrieve Baike data

  • Official API (preferred): If Baidu provides a documented Baike API with keys/scopes, use that. You’ll manage an API key or OAuth client, request the allowed endpoints, and parse JSON responses into your skill output.
  • Fallback: server-side scraping/proxy: If no official API is available, build a server-side scraper that fetches Baike pages, parses content, and exposes a clean JSON API. Respect robots.txt, rate limits, and local laws/terms-of-service. Never run scraping directly inside the agent runtime.

 

Step: Build the external search service (example Node.js proxy)

  • Responsibilities for this service: Authentication with Baidu (if needed), result parsing, caching, rate-limiting, logging, error handling, and returning consistent JSON to the OpenClaw skill.
// Minimal example: Node.js Express search proxy that calls an upstream Baidu endpoint
const express = require('express');
const fetch = require('node-fetch');

const app = express();
app.use(express.json());

const BAIDU_API_URL = process.env.BAIDU_API_URL; // set to the official API endpoint if available
const BAIDU_API_KEY = process.env.BAIDU_API_KEY; // or a token for the upstream service

app.get('/api/baike/search', async (req, res) => {
  const q = req.query.q;
  if (!q) return res.status(400).json({ error: 'q parameter required' });

  try {
    // If you have an official Baidu JSON API:
    const url = `${BAIDU_API_URL}?q=${encodeURIComponent(q)}&apikey=${encodeURIComponent(BAIDU_API_KEY)}`;
    const upstream = await fetch(url, { method: 'GET' });
    if (!upstream.ok) {
      const text = await upstream.text();
      return res.status(502).json({ error: 'upstream error', status: upstream.status, body: text });
    }
    const data = await upstream.json();

    // Normalize upstream fields to a stable shape:
    const normalized = (data.items || []).map(item => ({
      title: item.title || item.name,
      snippet: item.snippet || item.summary,
      url: item.url || item.link
    }));

    return res.json({ query: q, results: normalized });
  } catch (err) {
    console.error('search error', err);
    return res.status(500).json({ error: 'internal_error' });
  }
});

app.listen(process.env.PORT || 3000, () => {
  console.log('Baike proxy listening');
});
  • If scraping is necessary: implement server-side HTML parsing (cheerio) with careful rate-limiting and caching. Put it behind authenticated endpoints. Example omitted here to avoid encouraging fragile scraping; if you must scrape, implement robust selectors, user-agent handling, and respect robots.txt.

 

Step: Secure and operationalize the search service

  • Authentication: Protect your proxy with an API key or OAuth token. Do not embed upstream credentials in the skill; store them in ClawHub environment/secret storage and inject into the skill at runtime.
  • Rate-limiting & retries: Implement per-IP and per-client throttling and exponential backoff for upstream calls.
  • Caching: Cache frequent queries in Redis or a DB to reduce upstream load and improve latency.
  • Monitoring & logs: Emit structured logs and metrics (requests, errors, latency). Keep request/response snippets for debugging, but avoid logging secrets or full HTML content long-term.
  • Legal/compliance: Confirm Baidu terms allow your usage. If required, obtain permission or use official APIs.
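
The rate-limiting and caching points above can be sketched in-process; a production proxy would more likely back both with Redis, but the logic is the same. The class and parameter names here are illustrative, not part of any OpenClaw API.

```javascript
// Hedged sketch: an in-memory per-client rate limiter and a TTL cache for the
// proxy. In production, swap the Maps for Redis so state survives restarts.

class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counters = new Map(); // clientId -> { windowStart, count }
  }
  allow(clientId, now = Date.now()) {
    const entry = this.counters.get(clientId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counters.set(clientId, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false; // over the limit for this window
  }
}

class TtlCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.store = new Map(); // key -> { value, expiresAt }
  }
  get(key, now = Date.now()) {
    const hit = this.store.get(key);
    if (!hit || now > hit.expiresAt) return undefined;
    return hit.value;
  }
  set(key, value, now = Date.now()) {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

In the proxy route, check the limiter first, then consult the cache before calling upstream, and write successful responses back into the cache.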

 

Step: Create the OpenClaw skill (stateless)

  • Design: The OpenClaw skill should be a thin, stateless wrapper that receives a user's query and calls your external search API. All heavy lifting (scraping, caching) remains outside the agent runtime.
  • Authentication with ClawHub: In ClawHub, configure the skill and set environment variables/secrets so the skill can call your proxy (for example, BAIKE_PROXY_URL and BAIKE_PROXY_KEY). Do not hardcode secrets in code.
// Example code that would run in a skill runtime: perform a call to your proxy.
// This uses fetch to call your external service; adapt it to your skill runtime pattern.
const fetch = require('node-fetch');

async function searchBaike(query) {
  const base = process.env.BAIKE_PROXY_URL; // set in the ClawHub environment for the skill
  const key = process.env.BAIKE_PROXY_KEY;  // stored as a secret in ClawHub

  const url = `${base}/api/baike/search?q=${encodeURIComponent(query)}`;
  const resp = await fetch(url, {
    method: 'GET',
    headers: {
      'Authorization': `Bearer ${key}`,
      'Accept': 'application/json'
    }
  });

  if (!resp.ok) {
    const body = await resp.text();
    throw new Error(`proxy error ${resp.status}: ${body}`);
  }
  const json = await resp.json();
  return json.results;
}

// Example usage inside the skill
(async () => {
  const results = await searchBaike('人工智能'); // "artificial intelligence"
  console.log(results);
})();

 

Step: Configure ClawHub / deployment notes

  • Upload the skill package: Package your skill code per the packaging rules you follow (zip, container image, etc.) and register it via ClawHub’s UI/CLI — set runtime, CPU/memory, and required environment variables.
  • Secrets & credentials: Store your BAIKE_PROXY_KEY, any OAuth client secrets, and BAIDU_API_KEY in ClawHub’s secret manager (or equivalent). Ensure only the skill has access to the minimum secrets required.
  • Permissions: If using OAuth, configure redirect URIs and scopes with the upstream provider; implement server-side token refresh in your external service, not in the agent.

 

Testing and verification

  • Local end-to-end test: Run the external proxy locally and hit it with the skill code to verify normalized JSON shape.
  • Integration test in staging: Deploy the proxy to staging, set ClawHub env vars to point to staging, and test user flows (search → present result → open link).
  • Edge cases: Test empty queries, network failures, upstream 429/503 responses, and HTML changes (if scraping).
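
To verify the proxy's JSON shape locally without any network calls, the normalization step can be pulled out as a pure function (a hypothetical mirror of the mapping in the proxy example above) and exercised with sample payloads:

```javascript
// Hypothetical re-implementation of the proxy's normalization step as a pure
// function, so the result shape can be unit-tested offline.
function normalizeResults(data) {
  return (data.items || []).map(item => ({
    title: item.title || item.name,
    snippet: item.snippet || item.summary,
    url: item.url || item.link
  }));
}

// Example upstream payload mixing both field-name variants:
const sample = {
  items: [
    { title: 'Alpha', snippet: 'first', url: 'https://example.com/a' },
    { name: 'Beta', summary: 'second', link: 'https://example.com/b' }
  ]
};

console.log(normalizeResults(sample));
```

If you change this shape later, bump the version in the proxy URL so old skill builds keep working.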

 

Debugging checklist when things break

  • Is the skill invoking the proxy? Check skill logs to see outgoing requests and HTTP response codes.
  • Authentication errors: Verify API keys / bearer tokens are present, not expired, and have correct scopes.
  • Upstream errors: Inspect proxy logs for 4xx/5xx from Baidu. If scraping, check selectors and page structure changes.
  • Rate limits: Look for 429 responses; confirm client-side and upstream rate limiting is configured.
  • Timeouts & retries: Ensure sensible HTTP timeouts and retry policies so the agent doesn’t hang.
  • Data shape: Confirm the proxy returns the JSON shape the skill expects and include versioning if you change the shape later.
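
For the timeout/retry item, one possible sketch is a small exponential-backoff helper wrapped around an injected fetch function. The base delay and cap are arbitrary choices for illustration, not values mandated by OpenClaw.

```javascript
// Hedged sketch: retry with exponential backoff so the agent never hangs on a
// flaky proxy. `doFetch` is injected, which keeps the helper testable offline.

function backoffDelay(attempt, baseMs = 200, maxMs = 5000) {
  // 200 ms, 400 ms, 800 ms, ... capped at maxMs
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

async function withRetries(doFetch, { retries = 3, sleep = ms => new Promise(r => setTimeout(r, ms)) } = {}) {
  let lastErr;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await doFetch();
    } catch (err) {
      lastErr = err;
      if (attempt < retries) await sleep(backoffDelay(attempt));
    }
  }
  throw lastErr; // all attempts exhausted
}
```

In the skill, wrap the `fetch` call to the proxy with `withRetries`, and pair it with an `AbortController` timeout so a single slow request cannot block the agent.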

 

Operational tips and best practices

  • Keep state outside the agent: Use Redis/DB for caching and a job queue for expensive refreshes.
  • Use consistent result shapes: Provide title, snippet, canonical URL, and a small metadata block so the agent can render summaries or cards.
  • Graceful degradation: Return a helpful message when upstream is down and avoid exposing raw errors to end users.
  • Logging & retention: Log enough context for debugging but scrub PII and avoid logging secrets.
  • Version your API: Put /v1/ in your proxy URL so skills can be upgraded safely.
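
Two of these practices (versioned paths and graceful degradation) can be combined in one small helper. The URL layout and the user-facing message wording below are illustrative assumptions:

```javascript
// Hedged sketch: build versioned proxy URLs and degrade gracefully when the
// upstream search service fails. Names and message text are illustrative.
const API_VERSION = 'v1';

function searchUrl(base, query) {
  return `${base}/${API_VERSION}/baike/search?q=${encodeURIComponent(query)}`;
}

function presentResults(outcome) {
  if (outcome.ok) {
    return { status: 'ok', results: outcome.results };
  }
  // Hide raw upstream errors from end users; log them server-side instead.
  return {
    status: 'degraded',
    results: [],
    message: 'Search is temporarily unavailable. Please try again shortly.'
  };
}
```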

Concluding summary: build a secured external Baike search service (official API or careful scraper), expose it as a stable JSON API, keep all credentials and state outside the OpenClaw agent runtime, install/configure the skill via ClawHub with secrets injected, and verify via thorough testing and monitoring.

Troubleshooting Baidu Baike Search and OpenClaw Integration

1. Why does OpenClaw Spider get HTTP 302/403 or JS redirects for Baidu Baike search pages?

Baidu Baike returns HTTP 302/403 or JavaScript redirects because its front end and anti-bot systems detect non-browser requests (missing cookies, headers, JS execution, or suspicious IP/rate patterns) and push you to login/captcha pages or a JS challenge. An OpenClaw Spider (an agent skill making plain HTTP calls) won't automatically run page JS or solve captchas, so it hits redirects or 403 responses.

 

Why and how to fix it


Common causes and practical steps:

  • Anti-bot / rate limits — use throttling, rotate IPs or proxies, obey robots.txt and site terms.
  • Missing browser behavior — add realistic User-Agent, headers, and cookie handling; follow Location redirects and preserve Referer.
  • JS challenge — use a headless browser or external rendering service (outside agent runtime) to execute JS and return final HTML.
  • Authentication — if pages require login, provision credentials and session management securely (env vars/API keys) and log responses and Location headers to debug.
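
As an illustration of the "missing browser behavior" point, a helper that assembles browser-like request headers might look like the following. The header values are examples only and are not guaranteed to pass Baidu's checks; rotate User-Agent strings from your own pool.

```javascript
// Hedged sketch: build browser-like headers for Baike fetches. Values are
// illustrative assumptions, not an official recipe.
function buildHeaders({ referer = 'https://www.baidu.com/', cookie = '' } = {}) {
  const headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'Referer': referer
  };
  if (cookie) headers['Cookie'] = cookie; // persist cookies between requests
  return headers;
}
```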

2. How do you correctly decode Baidu Baike response content (GBK vs UTF-8) in an OpenClaw Spider/Parser?

Always treat the HTTP body as raw bytes; detect the encoding from the Content-Type header, the HTML meta tag, or a charset detector; then decode the bytes with the correct codec (mapping GB2312 to GBK). In an OpenClaw Spider/Parser, read the response bytes, pick the encoding, decode to a UTF-8 string, then pass that string to your parser.

 

How to do it

  • Check HTTP header for charset first.
  • Fallback to <meta charset> in the HTML.
  • Use a detector (jschardet/chardet) if unclear, then decode (iconv-lite/codec).

const axios = require('axios');
const iconv = require('iconv-lite');
const jschardet = require('jschardet');
const cheerio = require('cheerio');

async function fetchAndParse(url) {
  const res = await axios.get(url, { responseType: 'arraybuffer' });
  const buf = Buffer.from(res.data);
  // Prefer the charset from the Content-Type header
  const headerCharset = (res.headers['content-type'] || '').match(/charset=([^;]+)/i)?.[1];
  let enc = headerCharset || jschardet.detect(buf).encoding || 'utf-8';
  enc = enc.toLowerCase().replace('gb2312', 'gbk');
  const text = iconv.decode(buf, enc);
  const $ = cheerio.load(text);
  // Now extract content reliably
  return $('body').text();
}

3. How do you handle Baidu Baike anti-scraping (JS-rendered results, captcha) using OpenClaw Downloader Middleware or headless rendering?

Use the OpenClaw Downloader Middleware to detect Baidu Baike responses that require JS or show captchas, then route those requests to an external headless-rendering service (outside the agent runtime) and to a captcha-resolution workflow; combine proxy rotation, proper headers/cookies, and environment-stored credentials so skills remain authorized.

 

Practical approach


Use middleware to inspect HTTP responses; when you see JS placeholders or a captcha challenge, forward the URL to an external renderer (Puppeteer/Playwright service or headless-render API) that returns fully rendered HTML and cookies. Use rotating residential proxies, realistic User-Agent and Referer headers, and persist cookies back into the downloader for future requests.

  • Captcha: send image/token to a human-in-the-loop queue or verified solver via an external API; store solution in a secure vault and continue.
  • Architecture: run renderers and captcha workers outside OpenClaw agents; call them via REST API from middleware. Keep credentials in env vars or ClawHub secrets.
  • Debug: log raw responses, renderer screenshots, and skill execution paths to trace failures.
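
The middleware's detection step could be sketched as a predicate over the response body. The marker strings below are assumptions to tune against real Baike responses, not an official list:

```javascript
// Hedged sketch: decide whether a response body looks like a captcha or JS
// challenge rather than real Baike content, so middleware can reroute it to
// the external renderer. Marker strings are illustrative assumptions.
function needsRendering(html) {
  const markers = [
    'wappass.baidu.com',         // assumed captcha/passport host
    'window.location.replace',   // JS redirect instead of content
    '百度安全验证'                 // "Baidu security verification" banner
  ];
  const lowered = html.toLowerCase();
  return markers.some(m => lowered.includes(m.toLowerCase()));
}
```

When `needsRendering` returns true, forward the URL to the headless-rendering service and retry the request with the cookies it returns.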

4. How do you implement Baidu Baike search pagination, request deduplication and item filtering using OpenClaw Scheduler, Request fingerprinting and Item Pipeline?

Use the OpenClaw Scheduler to enqueue page requests and stop when results end; compute a stable request fingerprint (URL + sorted params) and store it in a durable set (Redis, DB) to deduplicate before fetching; run an Item Pipeline step that validates and filters items (fields, language, length, duplicates) and only emits cleaned items. Keep auth and rate limits in the runtime config and move state to external storage for reliability.

 

Implementation sketch

  • Pagination: scheduler enqueues next page while results present.
  • Fingerprinting: sha1(url+sorted params) stored in Redis.sadd to check/add atomically.
  • Item pipeline: validate fields, normalize, drop and log bad items.

import hashlib, json, redis, requests

r = redis.Redis()

def fingerprint(url, params):
    key = url + json.dumps(params, sort_keys=True)
    return hashlib.sha1(key.encode()).hexdigest()

def fetch_page(url, params):
    fp = fingerprint(url, params)
    if r.sadd("fingerprints", fp) == 0:
        return []  # already fetched
    resp = requests.get(url, params=params)
    return resp.json().get("items", [])

def item_pipeline(item):
    if not item.get("title") or len(item.get("summary", "")) < 30:
        return None  # drop
    return {"title": item["title"].strip(), "summary": item.get("summary", "")}
