How to read/write Word docx files in Python


In this tutorial, we’re going to look at how to use the python-docx module to read and write Word .docx files in a Python program.

Word documents

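A Word .docx file has more structure than plain text. The python-docx module represents it with 3 different data types:
– a Document object for the entire document.
– Paragraph objects for the paragraphs inside the Document object.
– Run objects: each Paragraph object contains a list of Runs. A Run is a contiguous piece of text with the same formatting, for example a bold word inside an otherwise plain paragraph.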


Read/Write Word docx files in Python

Install python-docx module

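Open a command prompt (cmd on Windows, Terminal on macOS/Linux) and run the following command there, not inside the Python shell or IDLE: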
pip install python-docx

Once the installation succeeds, we can find the docx folder in Python\Python[version]\Lib\site-packages (on a typical Windows install).
(In this tutorial, we use python-docx 0.8.10)

Now we can import the module with import docx. Note that the package is installed as python-docx but imported as docx.

Read docx file
Open file

We call the docx.Document() function and pass it a filename to open that .docx file as a Document object.

>>> import docx
>>> gkzDoc = docx.Document('grokonez.docx')
Get paragraphs

A Document object has a paragraphs attribute, which is a list of Paragraph objects. Each Paragraph object has a text attribute holding the paragraph’s text.

>>> gkzDoc = docx.Document('grokonez.docx')

>>> len(gkzDoc.paragraphs)
4
>>> gkzDoc.paragraphs[0].text
' was the predecessor website to'
>>> gkzDoc.paragraphs[1].text
'In this brandnew site, we don\u2019t only focus on Java & Javascript Technology but also approach to other technologies & frameworks, other fields of computer science such as Machine Learning and Testing. All of them will come to you in simple, feasible, practical and integrative ways. Then you will feel the connection of everything.'
>>> gkzDoc.paragraphs[2].text
'What does grokonez mean?'
>>> gkzDoc.paragraphs[3].text
'Well, grokonez is derived from the words grok and konez.'
Get full-text

To get the full text of the document, we:
– open the Word document
– loop over all Paragraph objects and append their text to a list
– join the list into a single string

>>> import docx
>>> gkzDoc = docx.Document('grokonez.docx')

>>> fullText = []
>>> for paragraph in gkzDoc.paragraphs:
...     fullText.append(paragraph.text)
...
>>> fullText
[' was the predecessor website to',
 'In this brandnew site, we don\u2019t only focus on Java & Javascript Technology but also approach to other technologies & frameworks, other fields of computer science such as Machine Learning and Testing. All of them will come to you in simple, feasible, practical and integrative ways. Then you will feel the connection of everything.',
 'What does grokonez mean?',
 'Well, grokonez is derived from the words grok and konez.']

# join the paragraphs into one string, with a blank line between them
>>> '\n\n'.join(fullText)
Write docx file
Create Word document

Calling docx.Document() with no filename returns a new, blank Document object.

>>> import docx
>>> gkzDoc = docx.Document()
>>> gkzDoc

Save Word Document

When everything’s done, we call the Document’s save('filename.docx') method to write the Document object to a file. Changes made in Python are not written to disk until save() is called.

Add Paragraphs

We use the Document’s add_paragraph() method to add a new paragraph at the end of the document. It returns a reference to the new Paragraph object.

>>> gkzDoc.add_paragraph(' for developers!')

We can add text to the end of an existing Paragraph object using the Paragraph’s add_run(text) method.

>>> para1 = gkzDoc.add_paragraph('1- Python Tutorials')
>>> para2 = gkzDoc.add_paragraph('2- Tensorflow Tutorials')
>>> para1.add_run(' - Basics')

Result in grokonez.docx:


To style text, we add a Run and set its text attributes, such as bold and italic, to True.

>>> para2.add_run(' - ')

>>> para2.add_run('Machine Learning').bold = True
>>> para2.add_run(' Tutorials').italic = True

# both bold and italic
>>> para3 = gkzDoc.add_paragraph()
>>> runner = para3.add_run('3- Big Data Tutorials')
>>> runner.bold = True
>>> runner.italic = True


Result in grokonez.docx:


Some other text attributes are available through a Run’s font property (for example, runner.font.strike = True):
– strike (strikethrough)
– double_strike (double strikethrough)
– all_caps (capital letters)
– rtl (right-to-left)

Add Headings

We use the Document’s add_heading(text, level) method to add a paragraph with a heading style. The level argument is an integer from 0 to 9: level 0 uses the Title style, and levels 1 to 9 use the Heading 1 to Heading 9 styles.

>>> gkzDoc.add_heading('grokonez 1', 1)

>>> gkzDoc.add_heading('grokonez 2', 2)

>>> gkzDoc.add_heading('grokonez 3', 3)

>>> gkzDoc.add_heading('grokonez 4', 4)

>>> gkzDoc.add_heading('grokonez 5', 5)

>>> gkzDoc.add_heading('grokonez 6', 6)

>>> gkzDoc.add_heading('grokonez 7', 7)

>>> gkzDoc.add_heading('grokonez 8', 8)

>>> gkzDoc.add_heading('grokonez 9', 9)


Add Line Breaks, Page Breaks

Instead of starting a new paragraph, we can add a line break by calling the add_break() method on the Run object that the break should appear after.

>>> import docx
>>> gkzDoc = docx.Document()
>>> para = gkzDoc.add_paragraph('grokonez Tutorials')
>>> para.runs[0].add_break()
>>> para.add_run('Python Basics')

>>> para.text
'grokonez Tutorials\nPython Basics'

Result in grokonez.docx:


We can also add a page break by passing docx.enum.text.WD_BREAK.PAGE as the argument to add_break().

>>> gkzDoc = docx.Document()
>>> para = gkzDoc.add_paragraph('grokonez Tutorials')
>>> para.runs[0].add_break(docx.enum.text.WD_BREAK.PAGE)
>>> gkzDoc.add_paragraph('Python Basics')


Result in grokonez.docx:


Add Pictures

We can use the Document object’s add_picture() method to add an image to the end of the document.

>>> gkzDoc = docx.Document()
>>> gkzDoc.add_paragraph('grokonez Tutorials')

>>> gkzDoc.add_picture('gkn-logo-sm.png')


Result in grokonez.docx:


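The add_picture() method has optional width and height arguments, which take length values such as docx.shared.Inches() or docx.shared.Cm().
If we omit both, the image is inserted at its native size. If we pass only one, the other is scaled to keep the image’s aspect ratio.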

>>> gkzDoc.add_picture('gkn-logo-sm.png', width=docx.shared.Inches(1))


Result in grokonez.docx:


>>> gkzDoc.add_picture('gkn-logo-sm.png', width=docx.shared.Inches(1), height=docx.shared.Cm(3))


Result in grokonez.docx:


By grokonez | January 31, 2019.


42 thoughts on “How to read/write Word docx files in Python”

  2. “Open cmd, then run:
    pip install python-docx

    Once the installation is successful…”

    I’m still pretty green, so this is not very clear. This line does not execute in the command prompt, nor does it execute in IDLE. I assume this is Windows and not Mac? What version of Python? If this was more clear, I’d be home free. Perhaps a hyperlink for “cmd” to clear things up?

    I know this sounds like the typical internet tantrum – it isn’t – but a lot of stuff that is understandably taken for granted by seasoned programmers is the difference between going and stopping for us newcomers. And I would hazard to guess that newcomers outnumber experienced folk when searching for answers online. It doesn’t take much to lock us up, but we’re learning.
