How to read/write Word docx files in Python

python-read-write-word-docx-files-feature-image

In this tutorial, we’re gonna look at way to use python-docx module to read, write Word docx files in Python program.

Word documents

Word .docx file has more structures than plain text. With python-docx module, we have 3 different data types:
– a Document object for entire document.
Paragraph objects for the paragraphs inside Document object.
– Each Paragraph object contains a list of Run objects.

read-write-word-docx-files-in-python-docx-module-docx-file

Read/Write Word docx files in Python

Install python-docx module

Open cmd, then run:
pip install python-docx

Once the installation is successful, we can see docx folder at Python\Python[version]\Lib\site-packages.
(In this tutorial, we use python-docx 0.8.10)

Now we can import the module by running import docx.

Read docx file
Open file

We call docx.Document() function and pass the filename to open a docx file under a Document object.

Get paragraphs

Document object has paragraphs attribute that is a list of Paragraph objects.

Get full-text

To get full-text of the document, we will:
– open the Word document
– loop over all Paragraph objects and then appends their text

Write docx file
Create Word document

We call docx.Document() function to get a new, blank Word Document object.

Save Word Document

When everything’s done, we must use save('filename.docx') Document’s method with filename to save the Document object to a file.

Add Paragraphs

We use add_paragraph() Document’s method to add a new paragraph and get a reference to this Paragraph object.

We can add text to the end of an existing Paragraph object using Paragraph’s add_run(text) method.

Result in grokonez.docx:

read-write-word-docx-files-in-python-docx-module-add-paragraphs

To make text styled, we add Run with text attributes.

Result in grokonez.docx:

read-write-word-docx-files-in-python-docx-module-add-paragraphs-add-runs

These are some text attributes:
bold
italic
underline
strike (strikethrough)
double_strike (double strikethrough)
all_caps (capital letters)
shadow
outline
rtl (right-to-left)

Add Headings

We use Document’s add_heading(heading, i) method to add a paragraph with heading style with i argument from 0 to 9 for heading levels.

read-write-word-docx-files-in-python-docx-module-add-headings

Add Line Breaks, Page Breaks

Instead of starting a new paragraph, we can add a line break using Run object add_break() method on the one that we want to have the break appear after.

Result in grokonez.docx:

read-write-word-docx-files-in-python-docx-module-add-line-break

We can also add a page break with add_break() method by passing the value docx.enum.text.WD_BREAK.PAGE as an argument to it.

Result in grokonez.docx:

read-write-word-docx-files-in-python-docx-module-add-page-break

Add Pictures

We can use Document object’s add_picture() method to add an image to the end of the document.

Result in grokonez.docx:

read-write-word-docx-files-in-python-docx-module-add-picture

add_picture() method has optional width and height arguments.
If we don’t use them, the width and height will default to the normal size of the image.

Result in grokonez.docx:

read-write-word-docx-files-in-python-docx-module-add-picture-width

Result in grokonez.docx:

read-write-word-docx-files-in-python-docx-module-add-picture-width-height

By grokonez | January 31, 2019.



Related Posts


Got Something To Say:

Your email address will not be published. Required fields are marked *

*