algolia-crawl
v1.2.16
Crawl your site and sync your Algolia search index
🕷️🔍 Algolia Crawl
Crawl your website, sync every page to an Algolia search index, and automatically generate a sitemap from that index.
⭐️ Features
- Crawl your website using Puppeteer
- Sync all pages to an Algolia search index
- Generate sitemap.xml from the index
💻 Getting started
Install from npm:
npm install algolia-crawl
Use the Node.js API:
import { algoliaCrawl, generateSitemap } from "algolia-crawl";
await algoliaCrawl(); // Crawl all pages and sync index
await generateSitemap("sitemap.xml"); // Generate a sitemap.xml file
CLI usage:
npx algolia-crawl crawl # Crawl all pages and sync index
npx algolia-crawl sitemap sitemap.xml # Generate a sitemap.xml file
Configuration
You can create a .algoliacrawlrc.json configuration file with the following keys:
{
  "algoliaCrawlAppId": "2UFBBTMSYW",
  "algoliaCrawlIndex": "dev_KOJ",
  "algoliaCrawlStartUrl": "https://koj.co",
  "algoliaCrawlBaseUrl": "https://koj.co"
}
algoliaCrawlAppId is your Algolia application ID, and algoliaCrawlIndex is the name of the index. algoliaCrawlStartUrl is the first page to crawl (it can also be an array of strings). Only pages whose URL starts with algoliaCrawlBaseUrl are indexed.
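The two URL rules above can be illustrated with a minimal sketch. The helpers `normalizeStartUrls` and `shouldIndex` are hypothetical names invented for this example and are not part of the algolia-crawl API. The sketch also assumes that "starting with" means a plain string-prefix comparison, and the candidate URLs are made up for illustration.

```javascript
// Hypothetical helpers for illustration only; not algolia-crawl's implementation.

// The start URL may be a single string or an array of strings,
// so normalize it to an array before queueing pages.
function normalizeStartUrls(startUrl) {
  return Array.isArray(startUrl) ? startUrl : [startUrl];
}

// Assumption: a page qualifies for indexing when its URL begins with the base URL.
function shouldIndex(pageUrl, baseUrl) {
  return pageUrl.startsWith(baseUrl);
}

const startUrls = normalizeStartUrls("https://koj.co");

// Made-up candidate links, as a crawler might discover them on a page.
const candidates = ["https://koj.co/about", "https://example.com/about"];
const indexable = candidates.filter((url) => shouldIndex(url, "https://koj.co"));

console.log(startUrls); // [ 'https://koj.co' ]
console.log(indexable); // [ 'https://koj.co/about' ]
```

With a plain prefix check like this one, a base URL without a trailing slash also matches look-alike hosts that merely begin with the same characters, so a base URL ending in `/` is the stricter choice in this sketch.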
Alternatively, you can provide these values as environment variables instead of a configuration file:
| Environment variable    | Description                    |
| ----------------------- | ------------------------------ |
| ALGOLIA_CRAWL_APP_ID    | Algolia search application ID  |
| ALGOLIA_CRAWL_INDEX     | Algolia search index           |
| ALGOLIA_CRAWL_START_URL | First page to crawl            |
| ALGOLIA_CRAWL_BASE_URL  | Index pages with this base URL |
The following environment variable is also required:
| Environment variable  | Description            |
| --------------------- | ---------------------- |
| ALGOLIA_CRAWL_API_KEY | Algolia search API key |
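If you configure everything through environment variables, a small preflight check can report which ones are missing before a crawl starts. The sketch below is an assumption-laden illustration: `missingEnvVars` is a hypothetical helper, not something algolia-crawl exports, and it treats all five variables named above as required (when you use a .algoliacrawlrc.json file, only ALGOLIA_CRAWL_API_KEY would need to come from the environment).

```javascript
// Hypothetical preflight helper; not part of the algolia-crawl API.

// The five variable names documented in the tables above.
const REQUIRED_ENV_VARS = [
  "ALGOLIA_CRAWL_APP_ID",
  "ALGOLIA_CRAWL_INDEX",
  "ALGOLIA_CRAWL_START_URL",
  "ALGOLIA_CRAWL_BASE_URL",
  "ALGOLIA_CRAWL_API_KEY",
];

// Return the names that are unset or empty in the given environment object.
function missingEnvVars(env) {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]);
}

// Sample environment for illustration, reusing values from the configuration example.
const sampleEnv = {
  ALGOLIA_CRAWL_APP_ID: "2UFBBTMSYW",
  ALGOLIA_CRAWL_INDEX: "dev_KOJ",
};

console.log(missingEnvVars(sampleEnv));
// [ 'ALGOLIA_CRAWL_START_URL', 'ALGOLIA_CRAWL_BASE_URL', 'ALGOLIA_CRAWL_API_KEY' ]
```

In a real script you would pass `process.env` instead of the sample object and stop before calling `algoliaCrawl()` if the returned array is not empty.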