add-urls-to-planning-file
v8.54.1
NB: THIS PROGRAM HAS NOT BEEN TESTED FOR THE CASE WHERE THE PRODUCT IS NOT AVAILABLE ON A WEBSITE. THE BEHAVIOR IN THAT CASE IS UNPREDICTABLE.
Usage
ENV=production node bin/run.js -u "http://spreadsheetUrl.com" -n "sheetName"
Manual test
- Use the following spreadsheet to run a manual test: https://docs.google.com/spreadsheets/d/1EvD9mGr9zPAjTgifBc-dtT6PNlB4mcUywtUo_XJGo48/edit#gid=1097386816
How it works
Given a SKU and a channel, it searches Bing using one of the following queries, selected by QUERY_TYPE:
site:amazon.de "HF3507/20" // QUERY_TYPE=v1
domain:amazon.de intitle:"QE75Q7F" "QE75Q7F" "Produktinformation" // QUERY_TYPE=v2
domain:amazon.de && intitle:"QE65Q7F" && "QE65Q7F" && -intitle:"Suchergebnis auf" // QUERY_TYPE=v3
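The three query shapes above could be produced by a small helper along these lines. This is only a sketch: the function name `buildQuery` and its parameters are hypothetical and not taken from the package, and it assumes the channel is a domain such as amazon.de.

```javascript
// Hypothetical helper: builds the search query for a SKU and a channel.
// Assumes `channel` is a domain such as "amazon.de".
function buildQuery(sku, channel, queryType = "v1") {
  switch (queryType) {
    case "v1":
      return `site:${channel} "${sku}"`;
    case "v2":
      // "Produktinformation" is copied from the example query; it only fits German pages.
      return `domain:${channel} intitle:"${sku}" "${sku}" "Produktinformation"`;
    case "v3":
      // "Suchergebnis auf" excludes search-result pages, as in the example query.
      return `domain:${channel} && intitle:"${sku}" && "${sku}" && -intitle:"Suchergebnis auf"`;
    default:
      throw new Error(`Unknown QUERY_TYPE: ${queryType}`);
  }
}

console.log(buildQuery("HF3507/20", "amazon.de", "v1"));
// site:amazon.de "HF3507/20"
```

Throwing on an unknown QUERY_TYPE is a design choice of this sketch; the real program may fall back to a default instead.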
It returns the triple <URL, name, URL_QUALITY>, representing:
- URL: the page of the product (e.g. where the product is sold)
- name: the name of the page as it appears on the search engine when you run a search
- URL_QUALITY: whether the name contains the SKU. The name is usually the title of the page, and if the title contains the SKU, there is a 99% chance that it is the correct page
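The URL_QUALITY check described above can be sketched as a substring test on the result name. The sketch assumes that URL_QUALITY is a boolean and that the comparison ignores case and extra whitespace; the package may represent or compute it differently. The names `urlQuality` and `toTriple`, and the example result, are made up for illustration.

```javascript
// Hypothetical helper: true when the search-result name contains the SKU.
// Ignoring case and collapsing whitespace is an assumption of this sketch.
function urlQuality(name, sku) {
  const normalize = (s) => s.toLowerCase().replace(/\s+/g, " ").trim();
  return normalize(name).includes(normalize(sku));
}

// Builds the <URL, name, URL_QUALITY> triple from one search result.
function toTriple(result, sku) {
  return {
    url: result.url,
    name: result.name,
    urlQuality: urlQuality(result.name, sku),
  };
}

// Illustrative input only, not real search output.
const example = { url: "https://www.amazon.de/dp/EXAMPLE", name: "Philips HF3507/20 Wake-up Light" };
console.log(toTriple(example, "HF3507/20").urlQuality);
// true
```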
Ideas for improvements (next versions)
- a query like
site:amazon.de "Modellnummer:HF3507/20"
is much stricter and returns few results, usually just one. It could be used to increase the precision of the algorithm (lower recall is acceptable)
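A possible shape for that stricter query, as a sketch only. `buildStrictQuery` is a hypothetical name, and the default label "Modellnummer" comes from the amazon.de example above, so other channels would presumably need a different label.

```javascript
// Hypothetical helper: builds the stricter "label:SKU" query.
// The default label only fits German amazon.de product pages.
function buildStrictQuery(sku, channel, label = "Modellnummer") {
  return `site:${channel} "${label}:${sku}"`;
}

console.log(buildStrictQuery("HF3507/20", "amazon.de"));
// site:amazon.de "Modellnummer:HF3507/20"
```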
Resources
Sometimes the site operator works and sometimes it doesn't; the same goes for domain. Check the following examples
TODO
- [ ] Print a report: out of the total number of entries, how many URLs have been found (percentage and absolute numbers)
- [ ] Add validation of the spreadsheet columns: each row should have at least the fields <sku, channel, urlName, urlQuality, url>
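Both TODO items could start from something like the sketch below. It assumes each spreadsheet row has already been read into a plain object keyed by column name, and that a URL counts as found when the url field is a non-empty string. `missingFields` and `buildReport` are hypothetical names, not part of the package.

```javascript
const REQUIRED_FIELDS = ["sku", "channel", "urlName", "urlQuality", "url"];

// Returns the required column names that are absent from a row object.
function missingFields(row) {
  return REQUIRED_FIELDS.filter((field) => !(field in row));
}

// Counts rows with a non-empty url and reports absolute number and percentage.
function buildReport(rows) {
  const total = rows.length;
  const found = rows.filter(
    (row) => typeof row.url === "string" && row.url.trim() !== ""
  ).length;
  const percentage = total === 0 ? 0 : (found / total) * 100;
  return {
    total,
    found,
    percentage,
    text: `${found} of ${total} URLs found (${percentage.toFixed(1)}%)`,
  };
}

// Illustrative rows only.
const rows = [
  { sku: "HF3507/20", channel: "amazon.de", urlName: "", urlQuality: "", url: "https://www.amazon.de/dp/EXAMPLE" },
  { sku: "QE75Q7F", channel: "amazon.de", urlName: "", urlQuality: "", url: "" },
];
console.log(buildReport(rows).text);
// 1 of 2 URLs found (50.0%)
```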