beam-uri
v3.2.0
URI Swiss Knife 🇨🇭🔪
URI Swiss Army Knife Utilities
You can install this via the published NPM package:
npm i beam-uri
URL Validation
A complete definition of what constitutes a valid URL can be found in RFC 3986 (URIs) and RFC 3987 (IRIs, their internationalized form). The short version is that a valid URL must, at minimum, consist of a scheme (such as https://, ftp:// or gopher://) and a host name. If it does not, validation should fail and an error should be raised (in browsers and Node.js, the WHATWG URL constructor throws a TypeError for input it cannot parse).
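To make the "scheme plus host name" minimum concrete, here is a sketch that accepts a string only if the WHATWG URL parser can parse it and the result has a non-empty host name. The helper name hasSchemeAndHost is hypothetical; this is an illustration of the rule, not how beam-uri validates URLs (isValidURI relies on a regular expression).

```javascript
// Hypothetical helper illustrating the "scheme + host name" minimum.
// Uses the WHATWG URL class, a global in browsers and Node.js >= 10.
function hasSchemeAndHost(input) {
  let parsed;
  try {
    parsed = new URL(input); // throws a TypeError if input cannot be parsed
  } catch {
    return false;
  }
  // A successful parse always yields a scheme (e.g. "https:"),
  // so only the host name still needs checking.
  return parsed.hostname !== "";
}

console.log(hasSchemeAndHost("https://sub.example.com/p")); // true
console.log(hasSchemeAndHost("example.com"));               // false: no scheme, parsing throws
console.log(hasSchemeAndHost("mailto:user@example.com"));   // false: parses, but has no host
```

The mailto: case shows why the host check is needed on top of parsing: the parser accepts schemes that have no host at all.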
A URL string is a structured string containing multiple meaningful components. When parsed, a URL object is returned containing properties for each of these components.
The Node.js url module provides two APIs for working with URLs: a legacy API that is Node.js specific, and a newer API that implements the same WHATWG URL Standard used by web browsers.
┌────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              href                                              │
├──────────┬──┬─────────────────────┬────────────────────────┬───────────────────────────┬───────┤
│ protocol │  │        auth         │          host          │           path            │ hash  │
│          │  │                     ├─────────────────┬──────┼──────────┬────────────────┤       │
│          │  │                     │    hostname     │ port │ pathname │     search     │       │
│          │  │                     │                 │      │          ├─┬──────────────┤       │
│          │  │                     │                 │      │          │ │    query     │       │
"  https:   //    user   :   pass   @ sub.example.com : 8080   /p/a/t/h  ?  query=string   #hash "
│          │  │          │          │    hostname     │ port │          │                │       │
│          │  │          │          ├─────────────────┴──────┤          │                │       │
│ protocol │  │ username │ password │          host          │          │                │       │
├──────────┴──┼──────────┴──────────┼────────────────────────┤          │                │       │
│   origin    │                     │         origin         │ pathname │     search     │ hash  │
├─────────────┴─────────────────────┴────────────────────────┴──────────┴────────────────┴───────┤
│                                              href                                              │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
(All spaces in the "" line should be ignored. They are purely for formatting.)
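As an illustration of the lower (WHATWG) half of the diagram, this sketch parses the diagram's example URL with the standard URL class and reads each labelled component. It uses only the built-in URL API, not beam-uri.

```javascript
// Parse the diagram's example URL with the WHATWG URL API
// (a global in browsers and Node.js >= 10) and collect each component.
const u = new URL("https://user:pass@sub.example.com:8080/p/a/t/h?query=string#hash");

const components = {
  protocol: u.protocol, // "https:"
  username: u.username, // "user"
  password: u.password, // "pass"
  host: u.host,         // "sub.example.com:8080"
  hostname: u.hostname, // "sub.example.com"
  port: u.port,         // "8080"
  pathname: u.pathname, // "/p/a/t/h"
  search: u.search,     // "?query=string"
  hash: u.hash,         // "#hash"
  origin: u.origin,     // "https://sub.example.com:8080"
};

console.log(components);
// The legacy API's "query" field has no direct WHATWG property;
// individual parameters are read through u.searchParams instead.
console.log(u.searchParams.get("query")); // "string"
```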
getDomain(url) ⇒ String
We can extract the domain from a URL by leveraging our method for parsing the hostname. Since getHostName() (documented below) gets us very close to a solution, we just need to remove the sub-domain and clean up special cases (such as .co.uk).
Returns: String - the extracted domain
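The "strip the sub-domain, but mind .co.uk" idea can be sketched as follows. The function sketchGetDomain and the MULTI_PART_SUFFIXES set are hypothetical; the set is a tiny illustrative sample, whereas a robust implementation would consult the full Public Suffix List (as parse-domain, listed under References, does).

```javascript
// Hypothetical sketch, not beam-uri's implementation.
// Keeps the last two hostname labels, or the last three when the hostname
// ends in a multi-part suffix such as "co.uk".
// MULTI_PART_SUFFIXES is a small illustrative sample, not a complete list.
const MULTI_PART_SUFFIXES = new Set(["co.uk", "org.uk", "com.au", "co.jp"]);

function sketchGetDomain(url) {
  const labels = new URL(url).hostname.split(".");
  const lastTwo = labels.slice(-2).join(".");
  const keep = MULTI_PART_SUFFIXES.has(lastTwo) ? 3 : 2;
  return labels.slice(-keep).join(".");
}

console.log(sketchGetDomain("https://www.bbc.co.uk/news"));     // "bbc.co.uk"
console.log(sketchGetDomain("https://sub.example.com:8080/p")); // "example.com"
```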
getDomainName(url) ⇒ String
Extracts the main domain name without its suffix notation (for example, without the ".com" of example.com).
Returns: String - the extracted domain
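Assuming "without the .domain notation" means dropping the public suffix (our reading, not confirmed by the library), a sketch could return just the label in front of the longest matching suffix. The names sketchGetDomainName and SUFFIXES are hypothetical, and SUFFIXES is a tiny illustrative subset of the Public Suffix List.

```javascript
// Hypothetical sketch: "example" from "sub.example.com", "bbc" from
// "www.bbc.co.uk". SUFFIXES is illustrative only, not beam-uri's data.
const SUFFIXES = new Set(["com", "org", "net", "uk", "co.uk", "com.au"]);

function sketchGetDomainName(url) {
  const labels = new URL(url).hostname.split(".");
  // Try the longest candidate suffix first so "co.uk" wins over "uk",
  // always leaving at least one label in front of the suffix.
  for (let n = labels.length - 1; n >= 1; n--) {
    const suffix = labels.slice(-n).join(".");
    if (SUFFIXES.has(suffix)) {
      return labels[labels.length - n - 1];
    }
  }
  return labels[0]; // no known suffix (e.g. "localhost")
}

console.log(sketchGetDomainName("https://www.bbc.co.uk/news")); // "bbc"
console.log(sketchGetDomainName("https://sub.example.com"));    // "example"
```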
getHostName(url) ⇒ String
Extracting the hostname from a URL is generally easier than parsing the domain. The hostname of a URL consists of the entire domain plus any sub-domain. We can parse it with a regular expression that captures everything to the right of the double slash in a URL, up to the next "/", ":", "?" or "#". We remove the leading "www" (and numbered variants such as www2), as it is typically not needed when parsing the hostname from a URL.
Returns: String - the extracted hostname
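A regex-based extractor in the spirit of the description above might look like this. The pattern and the name sketchGetHostName are our own assumptions, not beam-uri's actual regular expression; it skips optional user:pass@ credentials and does not handle bracketed IPv6 hosts.

```javascript
// Hypothetical sketch: capture the text after "scheme://" (skipping any
// "user:pass@" part) up to the first "/", ":", "?" or "#", then drop a
// leading "www." or numbered variant such as "www2.".
function sketchGetHostName(url) {
  const match = url.match(/^[a-z][a-z0-9+.-]*:\/\/(?:[^@\/?#]*@)?([^\/:?#]+)/i);
  if (!match) return "";
  return match[1].toLowerCase().replace(/^www\d*\./, "");
}

console.log(sketchGetHostName("https://www2.example.com/path"));            // "example.com"
console.log(sketchGetHostName("https://user:pass@sub.example.com:8080/p")); // "sub.example.com"
```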
getLinkType(source) ⇒ String
Identifies whether the link points to a social website.
Kind: global function
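One way such a check could work is to match the link's hostname, including sub-domains such as m.facebook.com, against a table of known social domains. The function sketchGetLinkType, the SOCIAL_HOSTS table and the returned labels ("facebook", "other", ...) are all hypothetical; beam-uri's actual return values are not documented here.

```javascript
// Hypothetical sketch: domain table and return labels are assumptions.
const SOCIAL_HOSTS = {
  "facebook.com": "facebook",
  "twitter.com": "twitter",
  "x.com": "twitter",
  "linkedin.com": "linkedin",
  "instagram.com": "instagram",
};

function sketchGetLinkType(source) {
  const host = new URL(source).hostname;
  for (const [domain, label] of Object.entries(SOCIAL_HOSTS)) {
    // Match the domain itself or any sub-domain of it, but not look-alikes
    // such as "notfacebook.com".
    if (host === domain || host.endsWith("." + domain)) return label;
  }
  return "other";
}

console.log(sketchGetLinkType("https://www.linkedin.com/in/someone")); // "linkedin"
console.log(sketchGetLinkType("https://example.com/blog"));            // "other"
```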
isValidIP(ip) ⇒ Boolean
Validates whether a string is a valid IP address, according to: http://jsfiddle.net/AJEzQ/
Returns: Boolean - whether the string is a valid IP address
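The jsfiddle pattern that isValidIP follows is not reproduced here. For comparison, Node.js ships net.isIP(), which returns 4 for an IPv4 address, 6 for an IPv6 address and 0 otherwise. The wrapper name isIpAddress is hypothetical, and its answers may differ from isValidIP's (for example on IPv6 input).

```javascript
// Comparison sketch using Node.js's built-in net module,
// not the regular expression that beam-uri's isValidIP uses.
const net = require("net");

function isIpAddress(input) {
  // net.isIP returns 4 (IPv4), 6 (IPv6) or 0 (not an IP address)
  return net.isIP(input) !== 0;
}

console.log(isIpAddress("192.168.0.1")); // true
console.log(isIpAddress("::1"));         // true
console.log(isIpAddress("256.1.1.1"));   // false: octet out of range
console.log(isIpAddress("example.com")); // false
```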
isValidURI(url) ⇒ Boolean
Validates whether a string is a valid URI, according to: https://gist.github.com/dperini/729294
Returns: Boolean - whether the string is a valid URI
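The referenced gist targets web URLs specifically. A much rougher approximation of that idea, built on the URL parser instead of a regular expression, could restrict the scheme to http, https or ftp and require a dotted host name ending in a letters-only top-level domain. The names looksLikeWebURL and WEB_SCHEMES are hypothetical, and this is far less thorough than the gist's regex.

```javascript
// Hypothetical, deliberately rough "web URL" check.
const WEB_SCHEMES = new Set(["http:", "https:", "ftp:"]);

function looksLikeWebURL(input) {
  let u;
  try {
    u = new URL(input);
  } catch {
    return false; // unparseable input
  }
  // Allowed scheme and a host name ending in ".<2+ letters>"
  return WEB_SCHEMES.has(u.protocol) && /\.[a-z]{2,}$/i.test(u.hostname);
}

console.log(looksLikeWebURL("https://example.com/path?q=1")); // true
console.log(looksLikeWebURL("http://localhost:3000"));        // false: no TLD
console.log(looksLikeWebURL("gopher://example.com"));         // false: scheme not allowed
```

The letters-only TLD rule is a simplification; for instance, it rejects punycode TLDs such as xn--p1ai.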
normalize(url) ⇒ String
Normalizes and canonicalises URLs, including data URLs. The function first normalizes the URL through a series of steps, from lower-casing to encoding. It then strips any URL trackers and padding from the URL. Finally, where possible, it canonicalises the URL according to configurations that depend on the domain name.
Returns: String - the normalized, canonical URL
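A few generic normalization steps can be sketched with the WHATWG parser, which already lower-cases the scheme and host, drops default ports and percent-encodes characters such as spaces in the path. On top of that, this hypothetical sketchNormalize trims whitespace, drops the fragment and sorts query parameters; these extra steps are our assumptions, not beam-uri's pipeline.

```javascript
// Hypothetical normalization sketch, not beam-uri's implementation.
function sketchNormalize(input) {
  // Parsing lower-cases scheme/host, removes default ports
  // and percent-encodes characters that need it.
  const u = new URL(input.trim());
  u.hash = "";            // drop the fragment
  u.searchParams.sort();  // order query parameters by name
  return u.href;
}

console.log(sketchNormalize("  HTTPS://Example.COM:443/a b?z=1&a=2#top "));
// "https://example.com/a%20b?a=2&z=1"
```

Domain-specific canonicalisation and tracker removal, which normalize also performs, are outside the scope of this sketch. Note that sorting re-serializes the query, which can change how some characters are encoded.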
removeURLTracking(url) ⇒ String
Removes tracking query parameters from the URL.
Returns: String - the URL after the tracking parameters have been stripped
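Tracker stripping can be sketched with URLSearchParams. The list below (any utm_ campaign parameter, plus Facebook's fbclid and Google Ads' gclid) is an illustrative assumption; beam-uri's own list of trackers may differ, and the name sketchRemoveTracking is hypothetical.

```javascript
// Hypothetical sketch; the tracker list is illustrative only.
const TRACKING_PARAMS = new Set(["fbclid", "gclid"]);

function sketchRemoveTracking(input) {
  const u = new URL(input);
  // Copy the keys first so deleting entries does not disturb iteration.
  for (const key of [...u.searchParams.keys()]) {
    if (key.startsWith("utm_") || TRACKING_PARAMS.has(key)) {
      u.searchParams.delete(key);
    }
  }
  return u.href;
}

console.log(sketchRemoveTracking("https://example.com/a?id=7&utm_source=x&fbclid=abc"));
// "https://example.com/a?id=7"
```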
parse(url) ⇒ Object
Parses a valid URI into its components.
Returns: Object - the parsed url
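The exact shape of the object returned by parse is not documented here. As a hypothetical illustration, sketchParse copies a subset of the WHATWG URL components into a plain object and expands the query string into key/value pairs; the field names are our own choice.

```javascript
// Hypothetical parse-like helper; field names are assumptions.
function sketchParse(input) {
  const u = new URL(input);
  return {
    protocol: u.protocol,
    hostname: u.hostname,
    port: u.port,
    pathname: u.pathname,
    query: Object.fromEntries(u.searchParams), // last value wins for repeated keys
    hash: u.hash,
  };
}

console.log(sketchParse("http://example.com:8080/docs?lang=en&page=2#intro"));
```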
References
- In search of the perfect URL validation regex
- uri-js: An RFC 3986 compliant, scheme extendable URI parsing/validating/normalizing/resolving library for JavaScript
- regex-weburl: Regular Expression for URL validation
- parse-domain: Splits a URL into sub-domain, domain and the top-level domain. Provides TypeScript typings
- normalize-url: Normalize a URL