
wikipedia-elasticsearch-import

Import Wikipedia dumps into an Elasticsearch server using streams and bulk indexing for speed.

What does this module do?

Wikipedia publishes dumps of its entire database, in every language, on a regular basis, and you can download and use them for free. This module parses the giant Wikipedia XML dump file as a stream and bulk-imports the contents right into your Elasticsearch server or cluster.
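
In outline, that means attaching a streaming XML parser to the dump file and flushing assembled pages to Elasticsearch in batches. The sketch below shows the idea only, not this module's actual source; it assumes the sax parser and a v7 @elastic/elasticsearch client, and the index name, tag handling, and batch size are illustrative.

    // Minimal sketch of the stream + bulk-index idea (not this module's code).
    // Assumes `sax` and a v7 `@elastic/elasticsearch` client.
    const fs = require('fs');
    const sax = require('sax');
    const { Client } = require('@elastic/elasticsearch');

    const client = new Client({ node: 'http://localhost:9200' });
    const parser = sax.createStream(true); // strict mode

    const BULK_SIZE = 100; // documents per bulk request (matches the default below)
    let batch = [];        // alternating action/document entries for the bulk API
    let page = null;       // the <page> currently being assembled
    let field = null;      // which child tag ('title' or 'text') we are inside

    parser.on('opentag', (node) => {
      if (node.name === 'page') page = { title: '', text: '' };
      else if (page && (node.name === 'title' || node.name === 'text')) field = node.name;
    });

    parser.on('text', (chunk) => {
      if (page && field) page[field] += chunk;
    });

    parser.on('closetag', (name) => {
      if (name === 'title' || name === 'text') field = null;
      if (name === 'page' && page) {
        batch.push({ index: { _index: 'wikipedia' } }, page);
        page = null;
        if (batch.length >= BULK_SIZE * 2) { // two entries per document
          const body = batch;
          batch = [];
          // Fire off one bulk request; a real importer would also handle
          // backpressure (pause/resume the stream) and retries.
          client.bulk({ body }).catch(console.error);
        }
      }
    });

    fs.createReadStream('enwiki-20180801-pages-articles-multistream.xml').pipe(parser);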

How to import a Wikipedia dump into your own Elasticsearch server

  1. In order to import a Wikipedia dump you must have an Elasticsearch server running. Please refer to the Elasticsearch documentation for how to do this.
  2. Download the latest Wikipedia dump from one of the following locations, depending on the language you want:
  • Wikipedia in English: https://dumps.wikimedia.org/enwiki/
  • Wikipedia in German: https://dumps.wikimedia.org/dewiki/
  • Wikipedia in Polish: https://dumps.wikimedia.org/plwiki/
  • Other Wikidata dumps: https://dumps.wikimedia.org/wikidatawiki/
  3. Unzip the downloaded .xml.bz2 file, e.g. enwiki-20180801-pages-articles-multistream.xml.bz2, to get the plain .xml dump file.
  4. Edit the config.js file to set the path to the dump .xml file and the Elasticsearch server connection settings (a hypothetical example is sketched after this list).
  5. Run the importer with npm start and watch as your Elasticsearch database is populated with raw Wikipedia documents.
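
Based on the settings described in the next section, config.js might look roughly like the sketch below; the key names here are guesses for illustration, so check the config.js shipped with the module for the exact shape.

    // Hypothetical config.js, inferred from the Settings section below;
    // the actual file shipped with the module may use different key names.
    module.exports = {
      dumpFile: './enwiki-20180801-pages-articles-multistream.xml', // unzipped dump
      host: 'localhost',        // Elasticsearch host
      port: 9200,               // Elasticsearch port
      index: 'wikipedia',       // target index name
      type: 'page',             // document type
      bulkSize: 100,            // documents per bulk request (default 100)
      logFile: './import.log',  // where progress and errors are logged
      httpAuth: 'user:password' // only relevant if X-Pack security is enabled
    };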

Settings

  • You can set a limit on how many documents are sent per bulk import in config.js; the default is 100.
  • Set index, type, host, port, and logFile. If you have enabled the X-Pack plugin for Elasticsearch, you can also set the httpAuth setting; otherwise it is ignored.

Please contribute

Please visit the project on GitHub to post your questions and suggestions, or to open pull requests.