@markoskon/scrape-url-node
v0.3.1
A Node.js script that recursively scrapes the same-origin URLs linked from an initial URL and reports the unique characters it found in the console.
Scrape URLs
A Node.js script that starts from an initial URL and recursively scrapes the linked URLs under the same origin. It then reports in the console:
- The unique characters from the HTML pages it visited.
- The URLs it discovered and visited. In other words, you get a list of all the linked pages under the same origin.
- The URLs it discovered but did not visit because they are not from the same origin. In other words, you get a list of all the external URLs you link to (though this list also includes URLs with hash fragments and non-HTTP(S) URLs).
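The three reports above can be sketched as a breadth-first crawl. This is a minimal sketch under stated assumptions, not the package's source. It uses an iterative queue in place of literal recursion. It also relies on a hypothetical `fetchPage(url)` function that returns a page's HTML, and it extracts `href` values with a regex instead of a real HTML parser. Finally, it counts only the non-whitespace characters of the text left after stripping tags.

```javascript
// Minimal sketch of a same-origin crawl; not the package's actual code.
// Assumptions: `fetchPage` is a hypothetical async function (url -> HTML),
// links are found with a simple regex instead of an HTML parser, and the
// "unique characters" are the non-whitespace characters of the page text.

// Resolve an href against the page it appears on and decide whether to visit it.
function classifyLink(href, pageUrl, rootOrigin) {
  let url;
  try {
    url = new URL(href, pageUrl);
  } catch {
    return { url: href, visit: false }; // not a valid URL
  }
  const isHttp = url.protocol === "http:" || url.protocol === "https:";
  // Hash links and non-http(s) links go to the "not visited" list too.
  const visit = isHttp && url.hash === "" && url.origin === rootOrigin;
  return { url: url.href, visit };
}

// Collect the href values of <a> tags (simplified: quoted attributes only).
function extractHrefs(html) {
  const re = /<a\s[^>]*href\s*=\s*["']([^"']*)["']/gi;
  return Array.from(html.matchAll(re), (match) => match[1]);
}

// Remove scripts, styles and tags; HTML entities are not decoded.
function visibleText(html) {
  return html
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, "")
    .replace(/<[^>]*>/g, "");
}

// Breadth-first crawl: an iterative queue stands in for recursion.
async function crawl(rootUrl, fetchPage) {
  const rootOrigin = new URL(rootUrl).origin;
  const queue = [new URL(rootUrl).href];
  const queued = new Set(queue); // URLs already scheduled, to avoid loops
  const visited = [];
  const notVisited = new Set();
  const characters = new Set();

  while (queue.length > 0) {
    const pageUrl = queue.shift();
    const html = await fetchPage(pageUrl);
    visited.push(pageUrl);

    for (const ch of visibleText(html)) { // iterates by code point
      if (!/\s/.test(ch)) characters.add(ch);
    }

    for (const href of extractHrefs(html)) {
      const { url, visit } = classifyLink(href, pageUrl, rootOrigin);
      if (!visit) {
        notVisited.add(url);
      } else if (!queued.has(url)) {
        queued.add(url);
        queue.push(url);
      }
    }
  }

  return {
    characters: [...characters].sort().join(""),
    visited,
    notVisited: [...notVisited],
  };
}
```

To try it without network access, `fetchPage` can look pages up in a plain object. Against a live site on Node 18 or later, something like `crawl("https://example.com/", async (url) => (await fetch(url)).text())` should work. A real tool would also need to skip non-HTML responses and handle failed requests.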
Install
npm i -g @markoskon/scrape-url-node
Usage
scrape [option..] <url>
# get the help menu
scrape --help
Future updates
There's a (small) chance I'll extend it to search for other information, not only URL lists and characters, because the ability to recursively discover URLs from a root URL seems useful (I'm sure many programs already do this, but, oh well). I could implement this by parsing additional CLI options, or even with a callback, if I export a function that you can then import into your code.
For the time being, I'm not planning to execute JavaScript with a tool like Puppeteer, because the information I want doesn't require it.
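The callback idea mentioned above could look something like the following. This is a hypothetical sketch: `scrapeSite`, `fetchPage` and `onPage` are illustrative names, not part of the package. It also uses a simplified regex for link extraction. The crawler only discovers same-origin pages, and the caller decides what to extract from each one. Counting unique characters would then be just one possible `onPage` callback. The example collects page titles instead.

```javascript
// Hypothetical sketch of a callback-based API; `scrapeSite`, `fetchPage`
// and `onPage` are illustrative names, not exports of
// @markoskon/scrape-url-node. The crawler only discovers same-origin pages;
// what to extract from each page is up to the caller's `onPage` callback.
async function scrapeSite(rootUrl, { fetchPage, onPage }) {
  const rootOrigin = new URL(rootUrl).origin;
  const queue = [new URL(rootUrl).href];
  const seen = new Set(queue);
  while (queue.length > 0) {
    const pageUrl = queue.shift();
    const html = await fetchPage(pageUrl);
    await onPage({ url: pageUrl, html }); // caller-defined extraction
    // Simplified link extraction with a regex (not a full HTML parser).
    for (const [, href] of html.matchAll(/<a\s[^>]*href\s*=\s*["']([^"']*)["']/gi)) {
      let url;
      try {
        url = new URL(href, pageUrl);
      } catch {
        continue; // skip hrefs that are not valid URLs
      }
      const sameOriginPage =
        url.origin === rootOrigin && url.hash === "" &&
        (url.protocol === "http:" || url.protocol === "https:");
      if (sameOriginPage && !seen.has(url.href)) {
        seen.add(url.href);
        queue.push(url.href);
      }
    }
  }
}

// Example: collect page titles instead of characters. With Node 18+, a live
// fetchPage could be: async (url) => (await fetch(url)).text()
const demoSite = {
  "https://example.com/": "<title>Home</title><a href='/docs'>Docs</a>",
  "https://example.com/docs": "<title>Docs</title><a href='/'>Home</a>",
};
const titles = [];
scrapeSite("https://example.com/", {
  fetchPage: async (url) => demoSite[url] || "",
  onPage: ({ url, html }) => {
    const match = html.match(/<title>([^<]*)<\/title>/i);
    titles.push(`${url}: ${match ? match[1] : "(untitled)"}`);
  },
}).then(() => console.log(titles));
```

Keeping discovery separate from extraction means the CLI's current reports and any new kind of information would share the same crawl loop.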