webquery
v1.0.9
WebQuery
Query the web with SQL-like syntax
Inspired by the great Yahoo! YQL, this tool can help you generate stub files for development, scrape data from multiple sources for your portal, run website health checks, test your app, or just have fun!
Installation
npm install -g webquery
Usage
Terminal (*NIX)
webquery [options...]
For example, to print to the console the text content, the lang attribute value, and the number of child elements of every paragraph inside an element with the content class on the page https://twitter.com/feditorio, run the following command:
webquery -l -q "SELECT text, attr(lang), size(children) as total FROM https://twitter.com/feditorio WHERE jquery=(.content p)"
Options:
-q "QUERY" - Query statement
-f "JSON_OUTPUT_FILE_PATH" - JSON output file path
-ua "USER_AGENT" - Valid browser user agent
-l - Indicates whether results should be logged to console
-h - Prints usage information
-v - Prints the version number
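If you want to launch the command-line tool from a script, the flags above can be assembled programmatically. The following is only a sketch: buildWebqueryArgs and the shape of its options object are hypothetical names introduced here for illustration, and only the flags documented above (-q, -f, -ua, -l) are used.

```javascript
// Hypothetical helper (not part of webquery): maps an options object
// to the command-line flags documented above.
function buildWebqueryArgs(options) {
  const args = ['-q', options.query];
  if (options.outputFile) args.push('-f', options.outputFile);
  if (options.userAgent) args.push('-ua', options.userAgent);
  if (options.log) args.push('-l');
  return args;
}

const args = buildWebqueryArgs({
  query: 'SELECT text FROM https://twitter.com/feditorio WHERE jquery=(.content p)',
  log: true
});
console.log(args);

// To actually run the command (assumes webquery is installed and on your PATH):
// require('child_process').execFile('webquery', args, function (err, stdout) {
//   console.log(err || stdout);
// });
```

Passing the arguments as an array to execFile avoids shell quoting problems with selectors that contain spaces or parentheses.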
Node App
var wq = require('webquery');
// Arguments:
// 1 - {string} Query statement
// 2 - {string} JSON output file path
// 3 - {string} Valid browser user agent
// 4 - {boolean} Indicates whether results should be logged to console
// Returns a promise
wq.query('SELECT text, attr(lang), size(children) as total FROM https://twitter.com/feditorio WHERE jquery=(.content p)', null, null, true).then(
    function success (result) {
        console.log('Query completed successfully!');
        console.log(result);
    },
    function error (err) {
        console.error('Query failed to complete: %s', err);
    }
);
Query Statement
SELECT {PROPERTY1}[, {PROPERTY2}[,...]] FROM {URL1}[, {URL2}[,...]] WHERE {SELECTOR1} [OR {SELECTOR2} [OR...]]
Property
You can use single or multiple comma-separated properties from the list below:
tag - Tag name
type - Element type
html - HTML contents
text - Combined text contents, including their descendants
value - Current value (form element)
id - Id attribute value
name - Name attribute value
class - CSS class names
index - Position of the element, relative to its sibling elements
attr(_attribute_) - Value of the attribute _attribute_
data(_attribute_) - Value of the data attribute _attribute_ (without the data- prefix)
size(children) - Number of children
size(attributes) - Number of attributes
URL
Any valid URL that starts with the http:// or https:// protocol.
You can query a single URL or multiple comma-separated URLs.
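A quick pre-check of the protocol rule above can catch bad URLs before a query is sent. This is a sketch only: isQueryableUrl is a hypothetical helper, not part of webquery, and it relies on Node's built-in URL class rather than on whatever validation webquery performs internally.

```javascript
// Hypothetical helper (not part of webquery): returns true only for
// absolute URLs that use the http: or https: protocol.
function isQueryableUrl(value) {
  try {
    const protocol = new URL(value).protocol;
    return protocol === 'http:' || protocol === 'https:';
  } catch (err) {
    return false; // not parseable as an absolute URL
  }
}

console.log(isQueryableUrl('https://twitter.com/feditorio')); // true
console.log(isQueryableUrl('ftp://example.com/file.txt'));    // false
console.log(isQueryableUrl('twitter.com/feditorio'));         // false
```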
SELECTOR
You can use valid jQuery or XPath selectors. You may also mix both types, or use as many selectors of each type as you like, separated with the OR operator:
jQuery:
WHERE jquery=(YOUR_SELECTOR_GOES_HERE)
For example: WHERE jquery=(p > div.content)
XPath:
WHERE xpath=(YOUR_SELECTOR_GOES_HERE)
For example: WHERE xpath=(/*[@id='foo'])
Mixed:
WHERE jquery=(p > div.content) OR jquery=(#messages li) OR xpath=(/*[@id='foo'])
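Putting the three parts together, a query statement can be assembled from lists of properties, URLs, and selectors. The sketch below follows the grammar shown under Query Statement; buildQuery and the { type, value } selector objects are hypothetical names used only for illustration, and no escaping or validation is attempted.

```javascript
// Hypothetical helper (not part of webquery): builds a statement of the form
// SELECT {PROPERTIES} FROM {URLS} WHERE {SELECTOR1} [OR {SELECTOR2} ...]
function buildQuery(properties, urls, selectors) {
  const where = selectors
    .map(function (s) { return s.type + '=(' + s.value + ')'; })
    .join(' OR ');
  return 'SELECT ' + properties.join(', ') +
    ' FROM ' + urls.join(', ') +
    ' WHERE ' + where;
}

const statement = buildQuery(
  ['text', 'attr(lang)', 'size(children) as total'],
  ['https://twitter.com/feditorio'],
  [
    { type: 'jquery', value: '.content p' },
    { type: 'xpath', value: "/*[@id='foo']" }
  ]
);
console.log(statement);
```

The resulting string can be passed to the -q option or to wq.query as the query statement.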
Output
{
    "meta": {
        "date": 1439761398928, // UNIX time (in milliseconds) at which the query was executed
        "duration": 2881, // Time in milliseconds it took the query to complete
        "url": [ // Array of URLs which were used in the "FROM" clause
            "https://my.website.com"
        ],
        "title": [ // Array of page titles of the URL(s) above
            "My Website"
        ],
        "items": 36 // Number of items found
    },
    "data": [ // Array of all items which were found
        {
            // ..
        },
        //...
    ]
}
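As an illustration of how the meta block might be consumed, the sketch below turns it into a short summary. summarize is a hypothetical helper, the sample object simply mirrors the structure shown above, and pairing each URL with the title at the same array index is an assumption about how the two arrays line up.

```javascript
// Hypothetical helper (not part of webquery): condenses the "meta" block
// of a result object into a human-friendly summary.
function summarize(result) {
  const meta = result.meta;
  return {
    executedAt: new Date(meta.date).toISOString(), // "date" is in milliseconds
    seconds: meta.duration / 1000,                 // "duration" is in milliseconds
    // Assumption: title[i] belongs to url[i].
    pages: meta.url.map(function (url, i) {
      return { url: url, title: meta.title[i] };
    }),
    items: meta.items
  };
}

// Sample object mirroring the structure documented above.
const sample = {
  meta: {
    date: 1439761398928,
    duration: 2881,
    url: ['https://my.website.com'],
    title: ['My Website'],
    items: 36
  },
  data: []
};

console.log(summarize(sample));
```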
Known Issues
You may experience problems while executing webquery if you had to use sudo to install it globally.
In general, it is best to use NPM without having to run commands as administrator.
To do so, follow the instructions below:
- Change prefix in NPM configuration:
npm config set prefix ~/npm
- Add NPM's bin folder to your system's PATH in ~/.bashrc:
PATH=$PATH:$HOME/npm/bin
- Reload ~/.bashrc:
. ~/.bashrc
You can now reinstall the webquery package.