compromise-dates
v3.7.0
plugin for nlp-compromise
This library is an earnest attempt to get date information out of text, in a clear way:
import nlp from 'compromise'
import datePlugin from 'compromise-dates'
nlp.plugin(datePlugin)
let doc = nlp('the second monday of february')
doc.dates().get()[0]
/*
{ start: '2021-02-08T00:00:00.000Z', end: '2021-02-08T23:59:59.999Z'}
*/
Things it does well:
| explicit-dates | description | Start | End |
| ----------------------------------- | :-----------------------------------: | ---------------: | ---------------: |
| march 2nd | | March 2, 12:00am | March 2, 11:59pm |
| 2 march | | '' | '' |
| tues march 2 | | '' | '' |
| march the second | natural-language number | '' | '' |
| on the 2nd | implicit months | '' | '' |
| tuesday the 2nd | date-reckoning | '' | '' |
| numeric-dates: | | | |
| 2020/03/02 | iso formats | '' | '' |
| 2020-03-02 | | '' | '' |
| 03-02-2020 | british formats | '' | '' |
| 03/02 | | '' | '' |
| 2020.08.13 | alt-ISO | '' | '' |
| named-dates: | | | |
| today | | - | - |
| tomorrow | | '' | '' |
| christmas eve | calendar-holidays | Dec 24, 12:00am | Dec 24, 11:59pm |
| easter | astronomical holidays | -depends- | - |
| q1 | | Jan 1, 12:00am | Mar 31, 11:59pm |
| times: | | | |
| 2pm | | '' | '' |
| 2:12pm | | '' | '' |
| 2:12 | | '' | '' |
| 02:12:00 | weird iso-times | '' | '' |
| two oclock | written formats | '' | '' |
| before 1 | | '' | '' |
| noon | | '' | '' |
| at night | informal daytimes | '' | '' |
| in the morning | | '' | '' |
| tomorrow evening | | '' | '' |
| timezones: | | | |
| eastern time | informal zone support | '' | '' |
| est | TZ shorthands | '' | '' |
| peru time | | '' | '' |
| ..in beirut | by location | '' | '' |
| GMT+9 | by UTC/GMT offset | '' | '' |
| -4h | '' | '' | '' |
| Canada/Eastern | IANA codes | '' | '' |
| relative durations: | | | |
| this march | | '' | '' |
| this week | | '' | '' |
| this sunday | | '' | '' |
| next april | | '' | '' |
| this past year | | '' | '' |
| second week of march | | '' | '' |
| last weekend of march | | '' | '' |
| last spring | | '' | '' |
| the saturday after next | | '' | '' |
| punted dates: | | | |
| in seven weeks | now+duration | '' | '' |
| two days after june 6th | date+duration | '' | '' |
| 2 weeks from now | | '' | '' |
| 2 weeks after june | | '' | '' |
| 2 years, 4 months, and 5 days ago | complex durations | '' | '' |
| a week and a half before | written-out numbers | '' | '' |
| a week friday | idiom format | '' | '' |
| start/end: | | | |
| end of the week | up-against the ending | '' | '' |
| start of next year | lean-toward starting | '' | '' |
| middle of q2 last year | rough-center calculation | '' | '' |
| date-ranges: | | | |
| between june and july | explicit ranges | '' | '' |
| from today to next halloween | | '' | '' |
| aug 1 - aug 31 | dash-ranges | '' | '' |
| 22-23 February | | '' | '' |
| today to next friday | | '' | '' |
| during june | | '' | '' |
| aug to june 1999 | shared range info | '' | '' |
| before [2019] | up-to a date | '' | '' |
| by march | | '' | '' |
| after february | date-to-infinity | '' | '' |
| repeating-intervals: | | | |
| any wednesday | n-repeating dates | | |
| any day in June | repeating-date in range | June 1 ... | .. June 30 |
| any wednesday this week | | '' | '' |
| weekends in July | more-complex interval | '' | '' |
| every weekday until February | interval until date | '' | '' |
Things it does awkwardly:
| hmmm, | description | Start | End |
| ------------------------ | :--------------------------------------------: | :-----: | :---: |
| middle of 2019/June | tries to find the sorta-center | June 15 | '' |
| good friday 2025 | tries to reckon astronomically-set holidays | '' | '' |
| Oct 22 1975 2am in PST | historical DST changes (assumes current dates) | '' | '' |
Things it doesn't do:
| 😓, | description | Start | End |
| ------------------------------------------- | :----------------------: | :-----: | :---: |
| not this Saturday, but the Saturday after | self-reference logic | '' | '' |
| 3 years ago tomorrow | folksy short-hand | '' | '' |
| 2100 | military time formats | '' | '' |
| may 97 | 'bare' 2-digit years | '' | '' |
API
- .dates() - find dates like 'June 8th' or '03/03/18'
- .dates().get() - simple start/end json result
- .dates().json() - overloaded output with date metadata
- .dates().format('') - convert the dates to specific formats
- .dates().isBefore(iso) - return only dates occurring before given date
- .dates().isAfter(iso) - return only dates occurring after given date
- .dates().isSame(unit, iso) - return only dates within a given year, month, date
- .durations() - '2 weeks' or '5mins'
- .durations().get() - return simple json for duration
- .durations().json() - overloaded output with duration metadata
- .times() - '4:30pm' or 'half past five'
- .times().get() - return simple json for times
- .times().json() - overloaded output with time metadata
Configuration:
.dates() accepts an optional object that lets you set the context for the date parsing.
const context = {
  timezone: 'Canada/Eastern', // the default timezone is 'ETC/UTC'
  today: '2020-02-20', // the implicit, or reference day/year
  punt: { weeks: 2 }, // the implied duration to use for 'after june 2nd'
  dayStart: '8:00am',
  dayEnd: '5:30pm',
  dmy: false // assume british-format dates, when unclear
}
nlp('in two days').dates(context).get()
/*
[{ start: '2020-02-22T08:00:00.000+5:00', end: '2020-02-22T17:30:00.000+5:00' }]
*/
Opinions:
Start of week:
By default, weeks start on a Monday, and 'next week' will run from Monday morning to Sunday night. This can be configured in spacetime, but right now we are not passing this config through.
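The Monday-to-Sunday rule can be sketched in plain javascript. This is only an illustration of the rule, not the library's code: weekRange is a hypothetical helper, and it works in UTC for simplicity.

```javascript
// Hypothetical helper, not part of compromise-dates or spacetime.
// Returns the Monday-to-Sunday week containing isoDay, shifted by offsetWeeks.
function weekRange(isoDay, offsetWeeks = 0) {
  const day = new Date(isoDay + 'T00:00:00.000Z')
  // getUTCDay() is 0 for Sunday; shift it so Monday is 0 and Sunday is 6
  const sinceMonday = (day.getUTCDay() + 6) % 7
  const start = new Date(day)
  start.setUTCDate(day.getUTCDate() - sinceMonday + 7 * offsetWeeks)
  const end = new Date(start)
  end.setUTCDate(start.getUTCDate() + 6)
  end.setUTCHours(23, 59, 59, 999)
  return { start: start.toISOString(), end: end.toISOString() }
}

// 2021-03-10 is a Wednesday, so 'next week' is Monday the 15th to Sunday the 21st
console.log(weekRange('2021-03-10', 1))
// { start: '2021-03-15T00:00:00.000Z', end: '2021-03-21T23:59:59.999Z' }
```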
Implied durations:
'after October' returns a range starting Nov 1st, and ending 2-weeks after, by default.
This can be configured by setting the punt param in the context object:
doc.dates({ punt: { month: 1 } })
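To make the arithmetic concrete, here is a rough sketch of how an 'after [month]' range could be built from a punt value. afterMonth is a hypothetical helper, and the exact end instant the library returns is an assumption here.

```javascript
// Hypothetical sketch of the 'after <month>' rule; not the library's code.
// monthIndex is 0-based (9 = October). Only week and month punts are handled.
function afterMonth(year, monthIndex, punt = { weeks: 2 }) {
  // the range starts on the first day of the following month
  const start = new Date(Date.UTC(year, monthIndex + 1, 1))
  const end = new Date(start)
  end.setUTCMonth(end.getUTCMonth() + (punt.months || punt.month || 0))
  end.setUTCDate(end.getUTCDate() + 7 * (punt.weeks || punt.week || 0))
  return { start: start.toISOString(), end: end.toISOString() }
}

console.log(afterMonth(2021, 9))
// { start: '2021-11-01T00:00:00.000Z', end: '2021-11-15T00:00:00.000Z' }
console.log(afterMonth(2021, 9, { month: 1 }))
// { start: '2021-11-01T00:00:00.000Z', end: '2021-12-01T00:00:00.000Z' }
```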
Future bias:
'May 7th' will prefer a May 7th in the future.
The parser will return a past-date though, in the current-month:
// from march 2nd
nlp('feb 30th').dates({ today: '2021-02-01' }).get()
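One way to picture the future-bias rule is the sketch below. resolveMonthDay is a hypothetical helper, and the exact tie-breaking the library uses is an assumption: a date earlier than today rolls forward a year, unless it falls in the current month.

```javascript
// Hypothetical sketch of future bias; not the library's code.
// monthIndex is 0-based (4 = May). Returns a YYYY-MM-DD string.
function resolveMonthDay(monthIndex, day, todayIso) {
  const today = new Date(todayIso + 'T00:00:00.000Z')
  const year = today.getUTCFullYear()
  const candidate = new Date(Date.UTC(year, monthIndex, day))
  const inCurrentMonth = monthIndex === today.getUTCMonth()
  // prefer the future, but allow a past date inside the current month
  if (candidate < today && !inCurrentMonth) {
    candidate.setUTCFullYear(year + 1)
  }
  return candidate.toISOString().slice(0, 10)
}

console.log(resolveMonthDay(4, 7, '2021-03-01')) // '2021-05-07'
console.log(resolveMonthDay(4, 7, '2021-06-15')) // '2022-05-07'
console.log(resolveMonthDay(5, 2, '2021-06-15')) // '2021-06-02' (past, but current month)
```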
This/Next/Last:
named-weeks or months eg 'this/next/last week' are mostly straight-forward.
This monday
A bare 'monday' will always refer to itself, or the upcoming monday.
- Saying 'this monday' on monday, is itself.
- Saying 'this monday' on tuesday, is next week.
Likewise, 'this june' in June, is itself. 'this june' in any other month, is the nearest June in the future.
Future versions of this library could look at sentence-tense to help disambiguate these dates - 'i paid on monday' vs 'i will pay on monday'.
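The 'this monday' rule above amounts to "today, or the next one coming up". A minimal sketch, assuming UTC days and a hypothetical thisWeekday helper (not part of the library):

```javascript
// Hypothetical sketch; weekday uses Date#getUTCDay numbering (0 = Sunday, 1 = Monday).
function thisWeekday(todayIso, weekday) {
  const today = new Date(todayIso + 'T00:00:00.000Z')
  // 0 when today already is that weekday, otherwise the days until the next one
  const ahead = (weekday - today.getUTCDay() + 7) % 7
  today.setUTCDate(today.getUTCDate() + ahead)
  return today.toISOString().slice(0, 10)
}

console.log(thisWeekday('2021-03-08', 1)) // said on a Monday: '2021-03-08'
console.log(thisWeekday('2021-03-09', 1)) // said on a Tuesday: '2021-03-15'
```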
Last monday
If it's Tuesday, 'last monday' will not mean yesterday.
- Saying 'last monday' on a tuesday will be -1 week.
- Saying 'a week ago monday' will also work.
- Saying 'this past monday' will return yesterday.
For reference, Wit.ai & chronic libraries both return yesterday. Natty and SugarJs return -1 week, like we do.
'last X' can be less than 7 days backward, if it crosses a week starting-point:
- Saying 'last friday' on a monday will be only a few days back.
Next Friday
If it's Tuesday, 'next wednesday' will not be tomorrow. It will be a week after tomorrow.
- Saying 'next wednesday' on a tuesday, will be +1 week.
- Saying 'a week wednesday' will also be +1 week.
- Saying 'this coming wednesday' will be tomorrow.
For reference, Wit.ai, chronic, and Natty libraries all return tomorrow. SugarJs returns +1 week, like we do.
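Both the 'last' and 'next' behaviours fall out of treating weeks as Monday-started containers: find the current week, step one week back or forward, then pick the weekday inside it. The sketch below is an assumption about the logic, using a hypothetical weekdayInWeek helper rather than anything from the library.

```javascript
// Hypothetical sketch; weekday uses Date#getUTCDay numbering (0 = Sunday).
// offsetWeeks is -1 for 'last' and +1 for 'next'.
function weekdayInWeek(todayIso, weekday, offsetWeeks) {
  const day = new Date(todayIso + 'T00:00:00.000Z')
  const sinceMonday = (day.getUTCDay() + 6) % 7 // Monday = 0 ... Sunday = 6
  const target = (weekday + 6) % 7 // position of the weekday in a Monday-started week
  day.setUTCDate(day.getUTCDate() - sinceMonday + 7 * offsetWeeks + target)
  return day.toISOString().slice(0, 10)
}

// 2021-03-08 is a Monday, 2021-03-09 a Tuesday
console.log(weekdayInWeek('2021-03-09', 1, -1)) // 'last monday' on Tuesday: '2021-03-01'
console.log(weekdayInWeek('2021-03-08', 5, -1)) // 'last friday' on Monday: '2021-03-05'
console.log(weekdayInWeek('2021-03-09', 3, 1)) // 'next wednesday' on Tuesday: '2021-03-17'
```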
Nth Week:
The first week of a month, or a year, is the first week with a Thursday in it. This is a weird, but widely-held standard; for years, it is the rule ISO 8601 uses to number weeks. It cannot be (easily) configured. This means that the start-date for first week of January may be a Monday in December, etc.
As expected, first monday of January will always be in January.
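The first-Thursday rule is easy to check by hand. Here is a minimal sketch, with a hypothetical firstWeekStart helper, that finds the Monday that starts the first week of a month:

```javascript
// Hypothetical sketch of the first-Thursday rule; monthIndex is 0-based.
function firstWeekStart(year, monthIndex) {
  const first = new Date(Date.UTC(year, monthIndex, 1))
  // days from the 1st to the first Thursday (getUTCDay() === 4) of the month
  const toThursday = (4 - first.getUTCDay() + 7) % 7
  // the Monday of that week is three days before its Thursday
  first.setUTCDate(1 + toThursday - 3)
  return first.toISOString().slice(0, 10)
}

console.log(firstWeekStart(2020, 0)) // '2019-12-30', a Monday in December
console.log(firstWeekStart(2021, 0)) // '2021-01-04'
```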
British/American ambiguity:
By default, we use the same interpretation of dates as javascript does: we assume 01/02/2020 is Jan 2nd (US-version), but allow 13/01/2020 to be Jan 13th (UK-version).
If you want to coerce an interpretation of 02/03/1999, you can set it with the dmy:true option:
nlp('02/03/1999').dates().get() //February 3
nlp('02/03/1999').dates({dmy:true}).get() // March 2
ISO dates (like 1999-03-02) are unaffected by the change.
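The disambiguation rule can be sketched as follows. parseNumericDate is a hypothetical helper that only handles the slash format, and it is an illustration of the rule rather than the library's parser.

```javascript
// Hypothetical sketch of the US/UK rule for 'a/b/yyyy' dates.
function parseNumericDate(text, opts = {}) {
  const m = text.match(/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/)
  if (!m) return null
  const [first, second, year] = m.slice(1).map(Number)
  // read as day/month when asked to, or when the first number cannot be a month
  const dayFirst = opts.dmy === true || first > 12
  const month = dayFirst ? second : first
  const day = dayFirst ? first : second
  return { year, month, day }
}

console.log(parseNumericDate('02/03/1999')) // { year: 1999, month: 2, day: 3 }
console.log(parseNumericDate('02/03/1999', { dmy: true })) // { year: 1999, month: 3, day: 2 }
console.log(parseNumericDate('13/01/2020')) // { year: 2020, month: 1, day: 13 }
```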
Seasons:
By default, 'this summer' will return June 1 - Sept 1, which is northern hemisphere ISO. Configuring the default hemisphere should be possible in the future.
Day times:
There are some hardcoded times for 'lunch time' and others, but mainly, a day begins at 12:00am and ends at 11:59pm - the last millisecond of the day.
Invalid dates:
compromise will tag anything that looks like a date, but not validate the dates until they are parsed.
- 'january 34th 2020' will return Jan 31 2020.
- 'tomorrow at 2:62pm' will just return 'tomorrow'.
- '6th week of february' will return the 2nd week of march.
- Setting an hour that's skipped, or repeated by a DST change will return the closest valid time to the DST change.
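The first two behaviours in this list can be sketched as a clamp and a validity check. Both helpers are hypothetical and only illustrate the idea; the library's own validation may work differently.

```javascript
// Hypothetical sketch; monthIndex is 0-based (0 = January).
function clampDay(year, monthIndex, day) {
  // day 0 of the next month is the last day of this month
  const lastDay = new Date(Date.UTC(year, monthIndex + 1, 0)).getUTCDate()
  return Math.min(Math.max(day, 1), lastDay)
}

// A time that fails this check would simply be dropped from the result.
function isValidClock(hour, minute) {
  return hour >= 0 && hour < 24 && minute >= 0 && minute < 60
}

console.log(clampDay(2020, 0, 34)) // 31
console.log(clampDay(2020, 1, 30)) // 29 (2020 is a leap year)
console.log(isValidClock(14, 62)) // false
```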
Inclusive/exclusive ranges:
'between january and march' will include all of march. This is usually pretty ambiguous.
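An inclusive month range ends on the last millisecond of the closing month. A minimal sketch, using a hypothetical betweenMonths helper and UTC times:

```javascript
// Hypothetical sketch of an inclusive month range; month indexes are 0-based.
function betweenMonths(year, startMonth, endMonth) {
  const start = new Date(Date.UTC(year, startMonth, 1))
  // one millisecond before the month after endMonth begins
  const end = new Date(Date.UTC(year, endMonth + 1, 1) - 1)
  return { start: start.toISOString(), end: end.toISOString() }
}

console.log(betweenMonths(2021, 0, 2))
// { start: '2021-01-01T00:00:00.000Z', end: '2021-03-31T23:59:59.999Z' }
```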
Date greediness:
This library makes no assumptions about the input text, and is careful to avoid false-positive dates. If you know your text is a date, you can crank-up the date-tagger with a compromise-plugin, like so:
nlp.extend(function (Doc, world) {
  // ambiguous words
  world.addWords({
    weds: 'WeekDay',
    wed: 'WeekDay',
    sat: 'WeekDay',
    sun: 'WeekDay',
  })
  world.postProcess(doc => {
    // tag '2nd quarter' as a date
    doc.match('#Ordinal quarter').tag('#Date')
    // tag '2/2' as a date (not a fraction)
    doc.match('/[0-9]{1,2}/[0-9]{1,2}/').tag('#Date')
  })
})
Misc:
- 'thursday the 16th' - will set to the 16th, even if it's not thursday
- 'in a few hours/years' - in 2 hours/years
- 'jan 5th 2008 to Jan 6th the following year' - date-range explicit references
- assume 'half past 5' is 5pm
About:
Parsing dates, times, durations, and intervals from natural language can be a solved-problem.
A rule-based, community open-source library - one based on simple NLP - is the best way to build a natural language date parser - commercial, or otherwise - for the frontend, or the backend.
The match-syntax is effective and easy, javascript is prevailing, and the more people who contribute, the better.
See also
- Duckling - by wit.ai (facebook)
- Sugarjs/dates - by Andrew Plummer (js)
- Chronic - by Tom Preston-Werner (Ruby)
- SUTime - by Angel Chang, Christopher Manning (Java)
- Natty - by Joe Stelmach (Java)
- rrule - repeating date-interval handler (js)
- ParseDateTime by Mike Taylor (Python)
compromise-date is sponsored by
MIT licenced