mime-to-jmap

v0.2.0

This is a reference implementation of a [JMAP](https://jmap.io/spec-mail.html)-compatible email parser.

Email to JMAP tools

This is a reference implementation of a JMAP-compatible email parser.

The code is a bit of a mess - most of the implementation here is C code which has been ripped out of cyrus-imap, since that's already maintained and used for JMAP at Fastmail. The C code compiles to wasm and does all the heavy lifting of extracting usable JMAP-compatible structure information, HTML and text bodies, and file attachments.

This is needed because most (all?) of the email parsing libraries on npm are missing features:

  • Many JS email libraries pack the headers into an object, which results in the headers being reordered arbitrarily. I'm very sad to say, the order of email headers matters in many cases.
  • In their truest form, emails actually contain a tree of message objects. Most email parsing libraries are too keen to merge message envelopes together.
  • The cyrus email parser used here produces clean HTML from any email for rendering in a browser-like context. (It's the same code that renders emails for Fastmail.) It also produces meaningful text snippets - even for HTML-heavy emails.
  • We have full internationalization support.
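
To make the header-ordering point concrete, here is a minimal sketch. It is not this library's parser: parseHeadersOrdered is a hypothetical helper written for illustration, and it simplifies header unfolding by joining continuation lines with a single space. It shows that an ordered list keeps every header in wire order, while packing the same headers into a plain object keeps only one value per name.

```javascript
// Hypothetical helper (not part of mime-to-jmap): parse a raw header block
// into an ordered list of {name, value} pairs.
// Simplification: folded continuation lines are joined with one space.
function parseHeadersOrdered(raw) {
  const headers = []
  for (const line of raw.split(/\r?\n/)) {
    if (line === '') break // a blank line ends the header block
    if (/^[ \t]/.test(line) && headers.length > 0) {
      // Continuation of the previous header
      headers[headers.length - 1].value += ' ' + line.trim()
      continue
    }
    const colon = line.indexOf(':')
    if (colon === -1) continue // not a header line; skip it
    headers.push({
      name: line.slice(0, colon),
      value: line.slice(colon + 1).trim()
    })
  }
  return headers
}

const raw = [
  'Received: from b.example by c.example',
  'Received: from a.example by b.example',
  'Subject: hi',
  'From: x@example.com',
  '',
  'body text'
].join('\r\n')

const ordered = parseHeadersOrdered(raw)

// Packing into an object keeps only the last value seen for each name.
const packed = Object.fromEntries(ordered.map(h => [h.name, h.value]))

console.log(ordered.map(h => h.name))   // all four headers, in wire order
console.log(Object.keys(packed).length) // fewer keys: one Received was overwritten
```

The two Received headers record separate hops, so losing one (or their relative order) loses information about how the message travelled.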

In general, the email spec is nightmare fuel. This library attempts to tame the horror show.

There are two main pieces to this library:

  • A WASM-compiled bundle which bundles cyrus-imap's JMAP handling code. Cyrus is used in production with millions of emails, so this code should be pretty reliable and stable.
  • Some simple JavaScript code to interact with the native module, and iterate through emails in an mboxrd file. I've only really exercised this thoroughly with emails extracted from Gmail. There's a good chance of bugs lurking here.

The mbox handling may contain some bugs simply by virtue of being young code. Please file issues when you find them.

A note on bundle size

Note that the wasm-compiled bundle (which can be downloaded from npm) is quite large for use in the browser, sitting at 735k gzipped. The main reason for this is that the library bundles parts of libicu for internationalization support, so the wasm file contains character encoding tables from the pre-Unicode world.

There are two ways we could shrink the library:

  • We could build without support for all legacy character encodings (and so stop supporting emails which aren't encoded in ASCII, Latin-1 or a UTF-* variant).
  • We could use libicu from the execution environment instead of bundled copies. All modern operating systems (and browsers) already have the character encodings we need. We could rewrite the charset code to call out to JavaScript's String#normalize and TextDecoder to convert obscure formats to Unicode.
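
As a rough sketch of what the second option might look like from the JavaScript side, the snippet below decodes bytes in a named charset with TextDecoder and normalizes the result with String#normalize. decodeToUnicode is a hypothetical name, not something this library exports, and which legacy charsets are available depends on the runtime's own ICU data.

```javascript
// Hypothetical helper (not part of mime-to-jmap): decode bytes in the named
// charset using the runtime's TextDecoder, then normalize to NFC.
function decodeToUnicode(bytes, charset) {
  // TextDecoder throws a RangeError if the runtime doesn't know the charset.
  const decoder = new TextDecoder(charset)
  return decoder.decode(bytes).normalize('NFC')
}

// In windows-1252 the single byte 0xE9 is "é".
const latin = decodeToUnicode(
  new Uint8Array([0x63, 0x61, 0x66, 0xe9]),
  'windows-1252'
)
console.log(latin)
```

The trade-off is that decoding behaviour would then vary with the host environment rather than being fixed by the bundled tables.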

Please vote for fixing this by filing a GitHub issue if this matters to you. Otherwise I'll assume nobody minds.

Parsing RFC 2822 email objects

To convert mime objects into JMAP-compatible JSON use envelope_to_jmap(content: ArrayBufferView | Buffer | string, [options]). This function returns {json, [attachments]} where json is a JMAP email object with all default fields and full header information.

The parser code is synchronous, but because the guts of this library live in a wasm module, you have to wait on the ready promise before you can call any methods from this library.

const {ready, envelope_to_jmap} = require('mime-to-jmap')
const fs = require('fs')

const file = fs.readFileSync('some.eml')

ready.then(() => {
  const {json} = envelope_to_jmap(file)

  // json contains {to: [...], from: [...], cc, subject, headers, ... etc as per the jmap spec}
  console.log('From: ', json.from, 'To', json.to, 'subject', json.subject)
  console.log('text body:', json.bodyValues[json.textBody[0].partId].value)

  // Or the HTML body:
  //console.log('html body:', json.bodyValues[json.htmlBody[0].partId].value)
})
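
The body lookup above assumes the email has at least one text part. A slightly more defensive version might look like the sketch below. firstBody is a hypothetical helper, and the example object is a hand-written stand-in for the json returned by envelope_to_jmap(), containing only the fields used in the example above.

```javascript
// Hypothetical helper (not part of mime-to-jmap): return the first text or
// HTML body value from a JMAP-style email object, or null if there is none.
function firstBody(json, kind = 'textBody') {
  const parts = json[kind] || []
  if (parts.length === 0) return null
  const entry = json.bodyValues && json.bodyValues[parts[0].partId]
  return entry ? entry.value : null
}

// Hand-written stand-in for the `json` returned by envelope_to_jmap().
const example = {
  subject: 'Hello',
  textBody: [{partId: '1', type: 'text/plain'}],
  htmlBody: [],
  bodyValues: {'1': {value: 'Hi there'}}
}

console.log('text body:', firstBody(example))
console.log('html body:', firstBody(example, 'htmlBody')) // null: no HTML part
```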

The main entrypoint for this library is envelope_to_jmap(). The method takes a buffer with the email data and an optional options object. If supplied, the options object takes the following fields:

  • attachments: (boolean, default false) By default, envelope_to_jmap will not parse out email attachments beyond text and HTML sections. If the attachments flag is set, envelope_to_jmap will return an attachments field, which is an object mapping from blob IDs to the attachment data itself. Note the metadata for each attachment is returned through the regular JSON email body object. The data is attached separately to fit with the JMAP data model. See examples/extract_attachments.js for an example of how to extract this data in practice. The attachment data is returned as an ArrayBuffer because the Node Buffer class isn't available in the browser.
  • want_headers: (string[], default []) A list of extra custom headers for envelope_to_jmap to parse and include in the returned JSON object. Currently all raw headers are returned by default anyway, but the library can parse custom headers into standard objects for many types of data. For example, to fetch unsubscribe links, pass ['header:List-Unsubscribe:asURLs'] here and the corresponding header will be automatically extracted and decoded to URLs. Values extracted this way are put on the returned JMAP object using the supplied search string as their key (e.g. via json['header:List-Unsubscribe:asURLs']).
  • want_bodyheaders: (string[], default []) This is functionally the same as want_headers above, but instead of looking for headers on the root object, this looks for the named header in each body inside the email envelope.

Note: The returned object is slightly non-spec compliant as JMAP objects are expected to have a receivedAt date, but we can't calculate that value from the mime object alone - it needs to be filled in by the receiving email server. As a result, the json.receivedAt property will be null on the returned object.

Parsing emails in MBOX files

This library also comes with some utility methods for pulling emails out of mbox files (using the mboxrd format). This has been tested with emails from Google Takeout, but there may be bugs importing emails from other systems. (Email is a hot mess.)

const {ready, mbox_each, mbox_to_eml, envelope_to_jmap} = require('mime-to-jmap')
const fs = require('fs')

process.on('unhandledRejection', e => {
  throw e
})

;(async () => {
  await ready

  const filename = process.argv[2] || 'archive.mbox'
  const stream = fs.createReadStream(filename)

  for await (const msg of mbox_each(stream)) {
    const {body, mboxFromAddress, receivedAt} = mbox_to_eml(msg)
    const {json} = envelope_to_jmap(body)

    json.receivedAt = receivedAt
    
    console.log('Email', mboxFromAddress, 'from', json.from[0].name, 'subject', json.subject)
  }
})()

Note that the mbox format contains two extra fields for each email:

  • From address
  • IMAP timestamp

Gmail (and possibly other systems) uses the 'from' address to carry a system-internal identifier for the email message.

The timestamp is the time the email was received on the server, so if you're going to serve these messages over jmap (or another protocol), you can assign the timestamp into the jmap JSON object. This is returned as an ISO string, as expected of JMAP.
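
For readers unfamiliar with mboxrd, here is a simplified sketch of how the two extra fields sit on each message's separator line. This is not the library's mbox_each or mbox_to_eml implementation: splitMboxrd is a hypothetical function, it works on an in-memory string rather than a stream, and it leaves the timestamp as the raw text from the file instead of converting it to an ISO string.

```javascript
// Hypothetical sketch (not mime-to-jmap's implementation): split mboxrd text
// into messages. Each message starts with a separator line of the form
// "From <address> <timestamp>".
function splitMboxrd(text) {
  const messages = []
  let current = null
  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith('From ')) {
      if (current) messages.push(current)
      const rest = line.slice(5)
      const space = rest.indexOf(' ')
      current = {
        fromAddress: space === -1 ? rest : rest.slice(0, space),
        timestamp: space === -1 ? '' : rest.slice(space + 1).trim(),
        lines: []
      }
    } else if (current) {
      // mboxrd quoting: body lines like ">From " or ">>From " lose one '>'
      current.lines.push(/^>+From /.test(line) ? line.slice(1) : line)
    }
  }
  if (current) messages.push(current)
  return messages.map(({fromAddress, timestamp, lines}) => ({
    fromAddress,
    timestamp,
    body: lines.join('\n')
  }))
}

const sample = [
  'From 123@xxx Mon Jan 01 00:00:00 +0000 2018',
  'Subject: one',
  '',
  '>From the start, this line was quoted.',
  '',
  'From 456@xxx Tue Jan 02 00:00:00 +0000 2018',
  'Subject: two',
  '',
  'hello'
].join('\n')

for (const msg of splitMboxrd(sample)) {
  console.log(msg.fromAddress, '|', msg.timestamp)
}
```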

Compiling from source

It's a pain in the neck to compile from source. You need to:

  • Build libicu from the wasm32 branch using emscripten (emconfigure ./configure).
  • Change CMakeLists.txt in mime-to-jmap to point to your compiled libicu, then run make or make Release.

LICENSE

This library contains code from Cyrus, which can trace its origins back to the dark days at CMU. See COPYING_cyrus for the cyrus license.

All other code is copyright 2019 Joseph Gentle

Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.