In this tutorial, we look at how to use the python-docx module to read and write Word docx files in a Python program.
Word documents
A Word .docx file has more structure than plain text. The python-docx module represents it with 3 different data types:
– a Document object for the entire document.
– Paragraph objects for the paragraphs inside the Document object.
– Run objects: each Paragraph object contains a list of them.
Read/Write Word docx files in Python
Install python-docx module
Open a command prompt (cmd on Windows, or a terminal on macOS/Linux), then run:
pip install python-docx
Once the installation is successful, we can see the docx folder at Python\Python[version]\Lib\site-packages.
(In this tutorial, we use python-docx 0.8.10.)
Now we can import the module by running import docx.
Read docx file
Open file
We call the docx.Document() function and pass it the filename to open a docx file as a Document object.
>>> import docx
>>> gkzDoc = docx.Document('grokonez.docx')
Get paragraphs
The Document object has a paragraphs attribute that is a list of Paragraph objects.
>>> gkzDoc = docx.Document('grokonez.docx')
>>> len(gkzDoc.paragraphs)
4
>>> gkzDoc.paragraphs[0].text
'JavaSampleApproach.com was the predecessor website to grokonez.com.'
>>> gkzDoc.paragraphs[1].text
'In this brandnew site, we don\u2019t only focus on Java & Javascript Technology but also approach to other technologies & frameworks, other fields of computer science such as Machine Learning and Testing. All of them will come to you in simple, feasible, practical and integrative ways. Then you will feel the connection of everything.'
>>> gkzDoc.paragraphs[2].text
'What does grokonez mean?'
>>> gkzDoc.paragraphs[3].text
'Well, grokonez is derived from the words grok and konez.'
Get full-text
To get the full text of the document, we will:
– open the Word document
– loop over all Paragraph objects and append their text to a list
>>> import docx
>>> gkzDoc = docx.Document('grokonez.docx')
>>> fullText = []
>>> for paragraph in gkzDoc.paragraphs:
...     fullText.append(paragraph.text)
...
>>> fullText
[
'JavaSampleApproach.com was the predecessor website to grokonez.com.',
'In this brandnew site, we don\u2019t only focus on Java & Javascript Technology but also approach to other technologies & frameworks, other fields of computer science such as Machine Learning and Testing. All of them will come to you in simple, feasible, practical and integrative ways. Then you will feel the connection of everything.',
'What does grokonez mean?',
'Well, grokonez is derived from the words grok and konez.'
]
# join all paragraphs, separated by a blank line
>>> '\n\n'.join(fullText)
Write docx file
Create Word document
We call the docx.Document() function without arguments to get a new, blank Word Document object.
>>> import docx
>>> gkzDoc = docx.Document()
>>> gkzDoc
Save Word Document
When everything’s done, we call the Document’s save('filename.docx') method with a filename to write the Document object to a file.
>>> gkzDoc.save('grokonez.docx')
Add Paragraphs
We use the Document’s add_paragraph() method to add a new paragraph and get a reference to this Paragraph object.
>>> gkzDoc.add_paragraph('grokonez.com for developers!')
We can add text to the end of an existing Paragraph object using the Paragraph’s add_run(text) method.
>>> para1 = gkzDoc.add_paragraph('1- Python Tutorials')
>>> para2 = gkzDoc.add_paragraph('2- Tensorflow Tutorials')
>>> para1.add_run(' - Basics')
Result in grokonez.docx:
To style text, we add a Run and set text attributes on it.
>>> para2.add_run(' - ')
>>> para2.add_run('Machine Learning').bold = True
>>> para2.add_run(' Tutorials').italic = True
>>> # both bold and italic
>>> para3 = gkzDoc.add_paragraph()
>>> runner = para3.add_run('3- Big Data Tutorials')
>>> runner.bold = True
>>> runner.italic = True
>>> gkzDoc.save('grokonez.docx')
Result in grokonez.docx:
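These are some of the text attributes. bold, italic and underline can be set directly on a Run; the others are set on the Run’s font attribute (for example runner.font.strike):
– bold
– italic
– underline
– strike (strikethrough)
– double_strike (double strikethrough)
– all_caps (capital letters)
– shadow
– outline
– rtl (right-to-left)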
Add Headings
We use the Document’s add_heading(heading, i) method to add a paragraph with a heading style. The i argument is an integer from 0 to 9 that sets the heading level; level 0 gives the Title style.
>>> gkzDoc.add_heading('grokonez 1', 1)
>>> gkzDoc.add_heading('grokonez 2', 2)
>>> gkzDoc.add_heading('grokonez 3', 3)
>>> gkzDoc.add_heading('grokonez 4', 4)
>>> gkzDoc.add_heading('grokonez 5', 5)
>>> gkzDoc.add_heading('grokonez 6', 6)
>>> gkzDoc.add_heading('grokonez 7', 7)
>>> gkzDoc.add_heading('grokonez 8', 8)
>>> gkzDoc.add_heading('grokonez 9', 9)
Add Line Breaks, Page Breaks
Instead of starting a new paragraph, we can add a line break by calling the add_break() method on the Run object that the break should appear after.
>>> import docx
>>> gkzDoc = docx.Document()
>>> para = gkzDoc.add_paragraph('grokonez Tutorials')
>>> para.runs[0].add_break()
>>> para.add_run('Python Basics')
>>> gkzDoc.save('grokonez.docx')
>>> para.text
'grokonez Tutorials\nPython Basics'
Result in grokonez.docx:
We can also add a page break with the add_break() method by passing the value docx.enum.text.WD_BREAK.PAGE as an argument to it.
>>> gkzDoc = docx.Document()
>>> para = gkzDoc.add_paragraph('grokonez Tutorials')
>>> para.runs[0].add_break(docx.enum.text.WD_BREAK.PAGE)
>>> gkzDoc.add_paragraph('Python Basics')
>>> gkzDoc.save('grokonez.docx')
Result in grokonez.docx:
Add Pictures
We can use the Document object’s add_picture() method to add an image to the end of the document.
>>> gkzDoc = docx.Document()
>>> gkzDoc.add_paragraph('grokonez Tutorials')
>>> gkzDoc.add_picture('gkn-logo-sm.png')
>>> gkzDoc.save('grokonez.docx')
Result in grokonez.docx:
The add_picture() method has optional width and height arguments.
If we don’t pass them, the image is inserted at its native size. If we pass only one of them, the other is scaled to keep the aspect ratio.
>>> gkzDoc.add_picture('gkn-logo-sm.png', width=docx.shared.Inches(1))
>>> gkzDoc.save('grokonez.docx')
Result in grokonez.docx:
>>> gkzDoc.add_picture('gkn-logo-sm.png', width=docx.shared.Inches(1), height=docx.shared.Cm(3))
>>> gkzDoc.save('grokonez.docx')
Result in grokonez.docx:
“Open cmd, then run:
pip install python-docx
Once the installation is successful…”
I’m still pretty green, so this is not very clear. This line does not execute in the command prompt, nor does it execute in IDLE. I assume this is Windows and not Mac? What version of Python? If this was more clear, I’d be home free. Perhaps a hyperlink for “cmd” to clear things up?
I know this sounds like the typical internet tantrum – it isn’t – but a lot of stuff that is understandably taken for granted by seasoned programmers is the difference between going and stopping for us newcomers. And I would hazard to guess that newcomers outnumber experienced folk when searching for answers online. It doesn’t take much to lock us up, but we’re learning.
Nice bro keep it up! Also can you make articles about Excel and PowePoint?