extract-us-city

v0.0.7

A library that extracts City, State and Zip information from text and provides a structured response.

Downloads

1,185

Readme

Extract US Cities

This project uses Natural Language Processing techniques to identify US cities in a body of text.

Overall, we use basic tokenization to break the input text into an array of proper nouns (including all-caps ones, because human beings are special). We then compare those proper nouns against a US city dictionary to perform the named entity extraction. Once a city has been identified, we narrow the candidates down to a single match using other information near the found entity, such as the State and Zip Code.

The goal of this project is high-precision identification rather than loose identification (a.k.a. Micro Understanding), so you won't get matches for cities that have no refining context around them.
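To make the idea above concrete, here is a minimal sketch of the tokenize, look up, and refine steps. It is not the package's actual implementation: `CITY_INDEX`, `extractCities`, and the two sample records are hypothetical, the sketch only handles single-word city names, and it only looks for a two-letter state code (and optional zip) immediately after the candidate.

```javascript
// Hypothetical miniature dictionary; the real package ships its own data.
const CITY_INDEX = new Map([
  ['appleton', [
    { city: 'Appleton', state_code: 'WI' },
    { city: 'Appleton', state_code: 'MN' },
  ]],
]);

function extractCities(text) {
  const matches = [];
  // Capitalized or all-caps words are treated as candidate names.
  const candidate = /\b[A-Z][A-Za-z]+\b/g;
  let m;
  while ((m = candidate.exec(text)) !== null) {
    const records = CITY_INDEX.get(m[0].toLowerCase());
    if (!records) continue;

    // Look right after the candidate for "ST" or ", ST", then an optional zip.
    const tail = text.slice(m.index + m[0].length);
    const ctx = /^,?\s+([A-Z]{2})\b(?:,?\s+(\d{5}))?/.exec(tail);
    if (!ctx) continue; // no refining context, so no match

    // Use the state code to pick a single record among same-named cities.
    const record = records.find((r) => r.state_code === ctx[1]);
    if (!record) continue;

    matches.push({
      city: record.city,
      state_code: record.state_code,
      foundState: true,
      foundZip: Boolean(ctx[2]),
      zip: ctx[2],
      start: m.index,
    });
  }
  return matches;
}
```

With this sketch, `extractCities('Appleton WI, 54911')` yields one match, while `extractCities('Appleton is nice.')` yields an empty array because nothing near the name narrows it down.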

Installation

Extract US Cities is available as an npm package.

npm install extract-us-city

Getting Started

Node.js

const { extract } = require('extract-us-city');

const string = 'A string with a US city in it like Appleton WI, 54911. I hope it finds it.';
const result = extract(string);
/* result = [
  {
    city: 'Appleton',
    state_code: 'WI',
    state_name: 'Wisconsin',
    county_name: 'Outagamie',
    lat: 44.2774,
    lng: -88.3894,
    incorporated: true,
    timezone: 'America/Chicago',
    foundState: true,
    end: 53,
    foundZip: true,
    zip: '54911',
    start: 35,
  },
];
*/

Browser

<textarea id="myTextArea"></textarea>
<button id="myButton" type="button">Extract</button>
<script src="./node_modules/extract-us-city/dist/extract-us-city.js"></script>
<script src="./node_modules/jquery/dist/jquery.min.js"></script>
<script>
$('#myButton').click(() => {
  var text = $('#myTextArea').val();
  var data = extractUsCity.extract(text);
  console.log(data);
});
</script>

Data

The database contains records for over 70,000 US cities along with their associated metadata. The result will always contain the location where the match was found as well as whether or not the State and/or Zip Code were identified.

If a zip code is found, the "zip" field will be populated. If a zip code is not provided and there is only one zip code for the city, the "zip" field will also be populated. However, if no zip code is found and there are multiple zip codes for a given city, all zip codes will be provided in a different field called "zips".
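Since a match may carry either "zip" or "zips", calling code may want to normalize the two. The helper below is a small sketch: `zipCodesFor` is a hypothetical name, and it assumes "zips" is an array of strings, which the text above does not state explicitly.

```javascript
// Hypothetical helper: returns every zip code tied to a match as an array.
// Assumes `zips`, when present, is an array (not confirmed by the README).
function zipCodesFor(match) {
  if (match.zip) return [match.zip];
  return Array.isArray(match.zips) ? match.zips : [];
}
```

Called on each element of the array returned by `extract`, this gives a uniform list regardless of which field was populated.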

Examples

You can find all the test cases I'm currently validating results for in the test.js file. If you devise a test case you think this should work for where it does not, please submit an issue or a Pull Request.

Examples of what this will NOT do (nor is it intended to):

"Brandon went to the park."
This won't return results because, even though there are several cities named Brandon, there is no context indicating it is a place.

"Some interesting things happened in Charlotte today."
This also won't return results, even though we know Charlotte is a place (unless interesting things are happening inside of a person named Charlotte, which would be strange BUT possible I guess). This is because we don't have enough context to know which Charlotte it is. Is it Charlotte, MI or Charlotte, TX or... (the list goes on).

Issues

I'm not perfect, so when you find bugs please post them on the GitHub issue tracker.

Contribute

PRs are welcome. Please ensure you've run the existing test cases and, where possible, added some more.

Benchmarking

On my Dell XPS 15 it currently takes 49,967 ms to process "Pride and Prejudice" by Jane Austen.
That isn't bad (less than a minute), but I'm open to ideas on how to improve processing time.
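If you want to take a comparable measurement on your own machine, a timing wrapper along these lines may help. This is a sketch rather than the harness used for the number above: `timeMs` is a hypothetical helper, and the commented usage assumes you have the book's text in a local file of your choosing.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper: runs fn once and reports elapsed wall-clock milliseconds.
function timeMs(fn) {
  const t0 = performance.now();
  const result = fn();
  return { result, ms: performance.now() - t0 };
}

// Example usage (file name is an assumption):
// const fs = require('fs');
// const { extract } = require('extract-us-city');
// const book = fs.readFileSync('./pride-and-prejudice.txt', 'utf8');
// const { result, ms } = timeMs(() => extract(book));
// console.log(`${result.length} matches in ${ms.toFixed(0)} ms`);
```

A single run is noisy, so averaging a few runs gives a fairer comparison when trying out an optimization.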

License

This project is licensed under the terms of the ISC license.