Create simple and clean HTML from Microsoft® Word Docs with Mammoth
Convert Word documents into semantic HTML without worrying about content being lost.
What is Mammoth?
Mammoth is a simple, easy-to-use package for converting Word documents created in Google Docs, Microsoft Word, or LibreOffice into HTML. As an open source Word to HTML converter, Mammoth converts a document based on its semantic structure rather than on the styling, colors, or fonts it uses.
Mammoth provides a web demo that shows how it converts documents into HTML. One of the best features of this converter is the range of platforms it supports, including WordPress, Java/JVM, .NET, and Python (through PyPI). If your documents are complicated and rely on many styles and colors, the output may not match the input file exactly.
Regardless, for simple Word documents that need to be converted into HTML, Mammoth gets the job done.
Getting Started with Mammoth
The recommended way to install the Mammoth library is via npm. Use the following command:
Install Mammoth via npm
npm install mammoth
Convert Microsoft® Word to HTML via Free JavaScript API
Mammoth is a free, open source JavaScript API for converting Word documents to HTML. It carries common document features over into the converted HTML, such as headings, lists, images, italic and bold text, and line breaks. All images in the Word document are included in the HTML output by default. You can also extract the raw text of a document with the mammoth.extractRawText function; however, it ignores all formatting in the original document.
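As a minimal sketch of raw text extraction, the snippet below wraps mammoth.extractRawText in a small helper. The helper name docxToPlainText and its optional converter parameter are assumptions made for illustration and are not part of Mammoth; the parameter exists only so that the function can be exercised without a real document.

```javascript
// Hypothetical helper: docxToPlainText is not part of Mammoth.
function docxToPlainText(path, converter) {
    // Load Mammoth lazily unless a converter object is passed in explicitly.
    var mammoth = converter || require("mammoth");
    return mammoth.extractRawText({path: path}).then(function(result) {
        return result.value; // The raw text; formatting is ignored
    });
}
```

Calling docxToPlainText("path/to/document.docx") returns a promise that resolves to the text of the document.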
Convert an existing .docx file to HTML
var mammoth = require("mammoth");

mammoth.convertToHtml({path: "path/to/document.docx"})
    .then(function(result) {
        var html = result.value; // The generated HTML
        var messages = result.messages; // Any messages, such as warnings during conversion
    })
    .catch(function(error) {
        // Report any failure, such as a missing or unreadable file
        console.error(error);
    });
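The messages returned alongside the HTML are easy to overlook, so here is one possible way to inspect them. This sketch assumes each message is an object with a string type (such as "warning" or "error") and a string message. The helper name groupMessages and the sample data are made up for illustration and do not come from a real conversion.

```javascript
// Hypothetical helper: groups conversion messages by their type.
function groupMessages(messages) {
    var grouped = {};
    messages.forEach(function(entry) {
        // Create the list for this type the first time it is seen.
        if (!Object.prototype.hasOwnProperty.call(grouped, entry.type)) {
            grouped[entry.type] = [];
        }
        grouped[entry.type].push(entry.message);
    });
    return grouped;
}

// Hand-written sample messages, not real Mammoth output.
var sampleMessages = [
    {type: "warning", message: "first sample warning"},
    {type: "error", message: "sample error"},
    {type: "warning", message: "second sample warning"}
];
console.log(groupMessages(sampleMessages));
```

In practice, result.messages from convertToHtml would be passed in place of sampleMessages.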
Map Styles from Word to HTML via JavaScript API
By default, Mammoth maps common Microsoft Word DOCX styles from the original document to HTML elements. For example, Heading 1 in Word is converted into an h1 element in HTML. For styles that Mammoth does not map by default, you can supply a custom style map through the styleMap option.
Custom Style Map
var mammoth = require("mammoth");

var options = {
    styleMap: [
        "p[style-name='Section Title'] => h1:fresh",
        "p[style-name='Subsection Title'] => h2:fresh"
    ]
};
mammoth.convertToHtml({path: "path/to/document.docx"}, options);
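When many paragraph styles need mapping, the rule strings can be generated rather than typed by hand. The following is only a sketch: paragraphStyleMap is a hypothetical helper, not a Mammoth function, and it assumes that style names contain no apostrophes, since it does no escaping.

```javascript
// Hypothetical helper: builds style map rules for paragraph styles.
// Keys are Word style names; values are the HTML targets to map them to.
function paragraphStyleMap(mapping) {
    return Object.keys(mapping).map(function(styleName) {
        return "p[style-name='" + styleName + "'] => " + mapping[styleName];
    });
}

var generatedOptions = {
    styleMap: paragraphStyleMap({
        "Section Title": "h1:fresh",
        "Subsection Title": "h2:fresh"
    })
};
console.log(generatedOptions.styleMap);
```

The resulting generatedOptions object has the same shape as the options object in the example above and can be passed to mammoth.convertToHtml in the same way.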