allscribe

v0.9.1

Published

2 years ago

A library for processing ebooks

mattmc

ebook epub kindle kf7 markup css processing

allscribe

A small Node library providing utility methods for processing ePubs.

Installation

npm install allscribe --save

Usage

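var path = require('path');
var os = require('os');
var allscribe = require('allscribe');

// open an unpacked ePub directory
// (Node does not expand '~', so build the home-directory path explicitly)
var book = allscribe.openEpub(path.join(os.homedir(), 'Documents/Books/magnum-opus'));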

// compress into an .epub
book.zip(function(compressed){ 

    // open an .epub into a temporary directory, perform modifications and re-compress it
    compressed.process(function(uncompressed){

        // easily add files
        uncompressed.add('oebps/toc.xhtml', 'hello world!');    

        // cleans up cluttered markup
        // good for handling ePubs generated by InDesign
        uncompressed.cleanupMarkup();                           

        // preparing for KF7/mobi conversion
        uncompressed.simplifyCssAndMarkup();                    

        // more preparation for KF7/mobi conversion
        uncompressed.mergeMarkupClasses();

        // access & modify the toc.ncx
        var tocNcx = uncompressed.tocNcx();
        var index = tocNcx.indexOfFile('chapter1.xhtml');
        tocNcx.insertAfterIndex(index, 'Chapter Two', 'chapter2.xhtml');
        tocNcx.save();

        // access & modify the content.opf
        var contentOpf = uncompressed.contentOpf();
        contentOpf.addItemToManifestAndSpine(
            'style2', 'css/styles2.css', 'text/css', false, 'css/styles1.css'
        );

    });
    
});
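If you prefer promises to the nested callbacks in the usage example, a thin wrapper is one option. This is only a sketch: `zipAsync` is a hypothetical helper, not part of allscribe, and it assumes (as the usage example suggests) that `zip()` calls its callback exactly once with a single argument and no error parameter, so any failure inside `zip()` is not surfaced through the promise.

```javascript
// Hypothetical helper, not part of allscribe's API.
// Assumes epubDir.zip(callback) calls callback(compressed) exactly once,
// with no error argument, as in the usage example.
function zipAsync(epubDir) {
  return new Promise(function (resolve) {
    epubDir.zip(resolve);
  });
}
```

With it, the outer callback becomes `zipAsync(book).then(function (compressed) { compressed.process(...); });`. Only `zip()` is wrapped here, because `process()` appears to re-compress once its callback returns, and deferring that work into a promise could change when the modifications happen.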

Todo: Describe each function in detail.

FAQ

Disclaimer: these haven't necessarily been asked at all, let alone frequently.

Why are there no functions to manipulate the `toc.ncx` and `content.opf` simultaneously? Isn't that an obvious utility?

Some ePubs map their HTML files directly to their chapters (one file per chapter), and in that case the <spine> in the content.opf will probably match the <navMap> in the toc.ncx. So you'd think you could have functions that do CRUD operations on both metadata files simultaneously.

However, I have also seen books that are not set up this way, for example A Dog's Tale, the book this package uses for testing. It has only one HTML file for the entire book, so there is only one entry in the content.opf, but that file contains multiple chapters and so merits multiple entries in the toc.ncx.

Because this could come up, I opted to leave out CRUD functions that attempt to manipulate both files simultaneously. It's up to the user of Allscribe to know how their books are set up and to update each file accordingly.
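Before editing both files in lockstep, it can help to check which kind of book you have. The sketch below is a hypothetical helper, not part of allscribe. It takes two plain arrays: the file hrefs of the spine items (resolved through the manifest, since an `<itemref>` only holds an `idref`) and the `src` of every `<navPoint>`, nested ones included, however you extract them (for example with Cheerio). It assumes both sets of paths are relative to the same directory and strips `#fragment` parts, because chapters inside a single HTML file are usually addressed with links such as `book.xhtml#ch2`.

```javascript
// Hypothetical helper, not part of allscribe's API.
// spineHrefs: file paths of the spine items, in reading order.
// navSrcs:    the src of every navPoint's <content>, nested ones included.
// Assumes both lists use paths relative to the same base directory.
function isOneToOneMapping(spineHrefs, navSrcs) {
  // Count navPoints per file, ignoring any "#fragment" part.
  var counts = new Map();
  navSrcs.forEach(function (src) {
    var file = src.split('#')[0];
    counts.set(file, (counts.get(file) || 0) + 1);
  });

  // Both sides must cover the same number of distinct files...
  var spineFiles = new Set(spineHrefs);
  if (spineFiles.size !== spineHrefs.length || counts.size !== spineFiles.size) {
    return false;
  }
  // ...and every spine file must be targeted by exactly one navPoint.
  return spineHrefs.every(function (href) {
    return counts.get(href) === 1;
  });
}
```

A one-file-per-chapter book such as `isOneToOneMapping(['ch1.xhtml', 'ch2.xhtml'], ['ch1.xhtml', 'ch2.xhtml'])` passes, while a single-file book like `isOneToOneMapping(['tale.xhtml'], ['tale.xhtml#ch1', 'tale.xhtml#ch2'])` does not, which is exactly the case where editing both files together would go wrong.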

Why aren't the functions for manipulating the `toc.ncx` more complete or less complex?

If the <navMap> were just a flat sequence of <navPoint> elements, it would be super easy to create CRUD functions. But <navPoint> elements can be nested, which makes it really exciting. At the time of this writing, if you need to do something more complex than adding a <navPoint>, you can access the Cheerio object and do it yourself.
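To illustrate why nesting complicates things, here is a sketch that works on a plain-object model of the navMap instead of Cheerio. The `{ label, src, children }` shape and the `insertAfterSrc` and `renumberPlayOrder` names are hypothetical, not allscribe's API. Inserting a sibling after a navPoint means searching the whole tree for it, and since NCX `playOrder` values are expected to increase in reading order across all nesting levels, every navPoint after the insertion point has to be renumbered.

```javascript
// Hypothetical plain-object model of a <navMap>; not allscribe's API.
// Each node: { label: string, src: string, children: Node[] }.

// Insert newNode as the next sibling of the first node whose src matches,
// searching depth-first through nested children. Returns true if inserted.
function insertAfterSrc(nodes, src, newNode) {
  for (var i = 0; i < nodes.length; i++) {
    if (nodes[i].src === src) {
      nodes.splice(i + 1, 0, newNode);
      return true;
    }
    if (insertAfterSrc(nodes[i].children, src, newNode)) {
      return true;
    }
  }
  return false;
}

// Assign playOrder 1, 2, 3, ... in pre-order (reading) order, so nested
// navPoints fall between their parent and the parent's next sibling.
// Returns the next unused number.
function renumberPlayOrder(nodes, start) {
  var next = start === undefined ? 1 : start;
  nodes.forEach(function (node) {
    node.playOrder = next++;
    next = renumberPlayOrder(node.children, next);
  });
  return next;
}
```

For example, inserting `{ label: 'Chapter Two', src: 'chapter2.xhtml', children: [] }` after `chapter1.xhtml`, where Chapter One is nested under a "Part One" entry, puts it inside Part One and shifts the playOrder of everything that follows. A real implementation would also have to write the new numbers back to each `<navPoint>`, give the new element a unique `id`, and, if you follow the NCX specification strictly, let navPoints that target the same content share a playOrder, which this sketch does not handle.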

Tests

npm test

Dependencies

reworkcss/css for CSS manipulation
cheerio for markup manipulation

Contributing

In lieu of a formal styleguide, take care to maintain the existing coding style. Add unit tests for any new or changed functionality. Lint and test your code.

Release History

0.9.0 Added capability for dealing with the toc.ncx and content.opf.
0.8.0 Added add function, for easily adding files.
0.7.1 Fixed error from calling .zip() on an ePub directory without passing a callback
0.7.0 Added setOnRule util function
0.6.0 Added string and rework-walk package access
0.5.2 Stop treating h# element selectors as complex
0.5.1 Cheerio output customization
0.4.5 Bugfixes
0.4.4 Bugfixes
0.4.3 Bugfixes
0.4.2 Bugfixes
0.4.1 Bugfixes
0.4.0 Added Process method to the Epub class
0.3.0 Added Zip method to the EpubDir class
0.2.0 Added MergeMarkupClasses method to the EpubDir class
0.1.1 Bugfixes
0.1.0 Initial release
