
webscrapinglang v1.0.205
Web Scraping Language CLI from https://scrape.it
Downloads: 21

Readme

Web Scraping Language as a Service

Create and scale web crawls quickly and easily in the cloud with WSL (Web Scraping Language).

Turning complex into simple, this is intelligence. - Sadhguru

No more Boilerplate Code.

Today's frameworks and tools require boilerplate code for defining a web crawler, running it, and maintaining it. WSL reduces several lines of code to a single sentence. WSL is intuitive and simple to understand, and lets you focus on the actual web crawler development.

Easy to Learn & Maintain.

Custom code whipped up by a freelancer that later turns into spaghetti is criminal. WSL is minimal, intuitive, and simple to learn. It lets you express web crawls in a sentence and reads like English. Anyone can therefore read, learn, and maintain your WSL scripts.

No Infrastructure to Build.

There are no servers to run or complex GUI software to learn. Create web crawlers with our command line tool, available on Linux, Mac & Windows. Scaling is easy: just add additional workers to your job to increase crawl speed. Your web crawls are distributed across multiple workers, which also gives you a nice rotating IP address effect.

Read more in the FAQ...

How do I get started?

Make sure you have Node.js installed, then type the following in your terminal and run it: npm install -g comingsoon; comingsoon

Web Scraping Language (WSL)

WSL is a declarative domain-specific language for the web. It automates web browser actions such as following a set of links (i.e. crawling), extracting data from each page that loads, and filling out forms. Actions run in order and are separated by the pipe operator >>; a combined sketch follows the Syntax list below.

Syntax

- ACTION1 >> ACTION2 [PAGINATOR] SELECTOR >> ACTION3 ...
- You reference element(s) on the page with a CSS or XPath SELECTOR
- Extract data via JSON, e.g. { product: .title, usd: .price, ...etc }
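
Putting those pieces together, here is a minimal illustrative one-liner. The URL (example.com) and the selectors (.product a, .title, .price) are hypothetical and only show the shape of a WSL pipeline:

GOTO example.com/products >> CRAWL .product a >> EXTRACT { product: .title, usd: .price }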

Web Crawling Scenarios

Crawl URL Params

Format: GOTO URL[range] >> EXTRACT {json}
Example: GOTO github.com/search?p=[1-3]&q=[cms, chess, minecraft] >> EXTRACT {title: h3}

3 pages × 3 keywords = 9 URL permutations will be crawled and their data extracted.
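
A numeric range should also work on its own without a keyword list (an assumption based on the Format above); the URL and selector here are hypothetical:

GOTO example.com/archive?page=[1-10] >> EXTRACT {headline: h2}

This would request 10 URL permutations, one per page number.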

Crawl & Extract

Format: GOTO URL >> CRAWL SELECTOR >> EXTRACT {json}
Example: GOTO en.wikipedia.org/wiki/List_of_Dexter_episodes >> CRAWL .summary a >> EXTRACT {title: h1, code: //tr[7]/td}

Follows each Dexter episode link and extracts the title and production code.
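
The same pattern works with CSS-only selectors; the blog URL and class names below are hypothetical:

GOTO example-blog.com >> CRAWL .post-title a >> EXTRACT {title: h1, author: .byline}

Follows each post link and extracts the title and author from the page it leads to.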

Paginated Crawl

Format: CRAWL [strategy, pageEnd, pageStart] SELECTOR
Strategies:
1. Clicking a 'next page' element, which runs the crawl again on each subsequent page.
2. Mouse-wheel scrolling to load the next page.
3. Clicking numbered elements that load the next page.

GOTO news.ycombinator.com >> CRAWL [.morelink] .hnuser
GOTO news.ycombinator.com >> CRAWL [.morelink,4] .hnuser
1st ex: Crawls all pages using the .morelink element until it can no longer be found.
2nd ex: Navigates via .morelink until the 4th page is reached.

GOTO news.ycombinator.com >> CRAWL [autoscroll,2] .hnuser
GOTO news.ycombinator.com >> CRAWL [autoscroll,4,3] .hnuser
1st ex: Crawls past the first page by scrolling down one page length, 2 times.
2nd ex: Navigates to the 3rd page first and continues crawling until the 4th page.

GOTO news.ycombinator.com >> CRAWL [number] .hnuser
GOTO news.ycombinator.com >> CRAWL [number,4,3] .hnuser
1st ex: Finds a numbered link or element and increments it exhaustively.
2nd ex: Navigates to the 3rd page by finding and clicking the numbered link, continuing until the 4th page.
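
A paginated CRAWL can presumably be followed by an EXTRACT, as in the Crawl & Extract pattern above; this sketch assumes that combination and uses a hypothetical feed URL and selectors:

GOTO example.com/feed >> CRAWL [autoscroll,3] .card a >> EXTRACT {title: h1}

Scrolls through three pages of the feed, follows each .card link, and extracts the title from every page visited.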

Extract Rows

Format: GOTO URL >> EXTRACT {json} _IN_ SELECTOR
Example: GOTO en.wikipedia.org/wiki/List_of_Dexter_episodes >> EXTRACT { title: h1, aired: //table[2]//tr[2]/td[5] } _IN_ .wikiepisodetable

Extracts every Dexter episode's title and air date under the parent element with class "wikiepisodetable".
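
Another _IN_ sketch, this time against a hypothetical product listing (the URL and selectors are assumptions):

GOTO example-shop.com/catalog >> EXTRACT {name: .title, usd: .price} _IN_ .product-row

Each element matching .product-row becomes one extracted row of data.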

Paginated Extract

Format: GOTO URL >> EXTRACT [.selector, limit] {json}
Example: GOTO news.ycombinator.com >> EXTRACT [.morelink,2] {news: .storylink}

Continues extracting every news headline on each page until the 2nd page is reached.

Nested Crawls

Format: GOTO URL >> CRAWL SELECTOR >> CRAWL SELECTOR >> ....
Example: GOTO github.com/marketplace >> CRAWL nav/ul/li/a >> CRAWL .h4 >> EXTRACT {app: h1, language: .py-3 .d-block}

Follows the category links, then each app on the first page of results, and extracts the app name and supported languages. Crawls recursively!

Typing Text

Format: GOTO URL >> TYPE SELECTOR [keyword1, keyword2...] >> TYPE [KEY_...]
Example: GOTO github.com/search?q= >> TYPE input[@name="q"] ["time", "security", "social"] >> TYPE ["KEY_ENTER"] >> EXTRACT {"search url": ".text-gray-dark.mr-2"}

For each keyword, we send a "KEY_ENTER" to submit the search form via the return key. Then we crawl the first page of search results and scrape each result's URL into a data column named "search url".
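
A simpler sketch of the same pattern, typing a single keyword into a hypothetical search box and submitting it with the return key (the URL and selectors are assumptions):

GOTO example.com >> TYPE input[@name="search"] ["wsl"] >> TYPE ["KEY_ENTER"] >> EXTRACT {result: .result-title}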

Clicking & Forms

Format: CLICK[n-index] SELECTOR
Strategies:
1. Find elements with the selector and click the Nth one; note you can also just use XPath indexing in the selector.
2. Try out every possible permutation for selected forms, e.g. crawling dropdown options.
3. Click a link and execute a macro action, like downloading a file.

GOTO news.ycombinator.com/login >> CLICK input >> CLICK input[last()] >> CLICK input[3] >> CLICK[3] input

Clicks the first element, then the last element, and finally shows two methods for selecting the same 3rd element.

GOTO redux-form.com/6.6.3/examples/simple/ >> TYPE input[@name="email"] [[email protected], [email protected]] >> CRAWL select/options

For each email address entered, every option in the dropdown is tried.

GOTO https://www.putty.org >> CLICK //tr[1]/td[2]/p[2]/a >> __SAVE__ //div[1]/div[2]/span[2]

Clicks on a link that navigates to a different domain. We then save the file with the macro command wrapped in double underscores: __SAVE__.

FAQ

How do I contact you?

Email support@this domain.

Is there a free version?

Yes, just sign up after npm install and enjoy 1,000 API Call Credits.

How do I purchase?

Email support@ this domain.

I paid but I changed my mind!

Please email support@this domain within 30 days to get a full refund.

Can I get some help?

Ask your questions on Stack Overflow (I check it often) or email support@ this domain.

What are API Call Credits?

Each web page downloaded by your crawler is 1 API Call Credit.
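
For example, the URL-parameter crawl shown earlier downloads 3 pages × 3 keywords = 9 pages, so one run costs 9 API Call Credits; the free plan's 1,000 credits would cover roughly 111 such runs.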

What happens when I run out of credits?

You need to upgrade to a larger plan.

Plans

Free
- 1,000 API Calls
- 1 Worker Per Job
- Included with npm install

Micro
- 50,000 API Calls
- 4 Workers Per Job

Small
- 150,000 API Calls
- 8 Workers Per Job

Medium
- 500,000 API Calls
- 12 Workers Per Job

Large
- 1,500,000 API Calls
- 24 Workers Per Job

email: support AT this domain

© Brilliant Code Inc. 1918 Boul Saint Régis, Dorval, Québec, Canada.