How to read/write Word docx files in Python


In this tutorial, we’re going to look at how to use the python-docx module to read and write Word docx files in a Python program.

Word documents

A Word .docx file has more structure than plain text. The python-docx module represents it with 3 different data types:
– a Document object for the entire document.
– Paragraph objects for the paragraphs inside the Document object.
– Run objects: each Paragraph object contains a list of Run objects.


Read/Write Word docx files in Python

Install python-docx module

Open cmd, then run:
pip install python-docx

Once the installation is successful, we can see a docx folder under Python\Python[version]\Lib\site-packages.
(In this tutorial, we use python-docx 0.8.10)

Now we can import the module by running import docx.

Read docx file
Open file

We call the docx.Document() function and pass it the filename to open a docx file as a Document object.

Get paragraphs

The Document object has a paragraphs attribute, which is a list of Paragraph objects.

Get full-text

To get the full text of the document, we will:
– open the Word document
– loop over all Paragraph objects and append their text

Write docx file
Create Word document

We call the docx.Document() function with no arguments to get a new, blank Word Document object.

Save Word Document

When everything’s done, we must call the Document’s save() method with a filename, for example save('filename.docx'), to write the Document object to a file.

Add Paragraphs

We use the Document’s add_paragraph() method to add a new paragraph and get a reference to the new Paragraph object.

We can add text to the end of an existing Paragraph object using Paragraph’s add_run(text) method.

Result in grokonez.docx:


To style text, we add a Run and set text attributes on it.

Result in grokonez.docx:


These are some of the text attributes, which are set through the Run’s font property (for example run.font.strike = True):
– strike (strikethrough)
– double_strike (double strikethrough)
– all_caps (capital letters)
– rtl (right-to-left)

Add Headings

We use the Document’s add_heading(heading, i) method to add a paragraph with a heading style. The i argument is an integer from 0 to 9 that sets the heading level: 0 gives the Title style, and 1 to 9 give Heading 1 to Heading 9.


Add Line Breaks, Page Breaks

Instead of starting a new paragraph, we can add a line break by calling the add_break() method on the Run object that the break should appear after.

Result in grokonez.docx:


We can also add a page break by passing the value docx.enum.text.WD_BREAK.PAGE as an argument to the add_break() method.

Result in grokonez.docx:


Add Pictures

We can use Document object’s add_picture() method to add an image to the end of the document.

Result in grokonez.docx:


The add_picture() method has optional width and height arguments.
If we don’t use them, the picture is inserted at the native size of the image.

Result in grokonez.docx:



By grokonez | January 31, 2019.
