PDF TO WORD PROMPT CONVERSION

Thato Mmusi
6 min read · Aug 24, 2022

PDF to Microsoft Word conversion

Background

Before we delve into the tutorial, I should declare up front that there are many free online tools one can use for the task I illustrate here. One might then ask why I should spend time showing this when such services are only a click away.

I recently faced a challenge, though. A client had a four-page confidential document that he wanted to edit in Microsoft Word. He did not want to expose it to third-party apps by uploading it online, and he did not want to buy the likes of Adobe's tools either, as he felt that was no different in terms of security. He still wanted the document converted (or, as he put it, typed).

Being a technical person, I quickly browsed through the “Cheese Shop” (https://pypi.org/), the open-source Python package repository. I soon came across the pdf2docx library, which proved to be a time saver and easy to work with.

With a few lines of simple code and the command prompt (CMD), I was able to do the conversion, solving the gentleman’s security issue and saving him time.

Below is an illustration of such. Follow the steps to handle your own files and save time and money:

To begin with, we need to install the library from PyPI. We are going to need the pdf2docx library:

· Open your command prompt on Windows

Command Prompt Window(cmd)

· Make sure you change to the directory you are working in (in my case, the pdftodocx folder, as illustrated below)

Directory Address

· To do that, you simply type the following in the prompt window:

CD C:\Users\Thato Mmusi\Documents\GitHub\Automation\pdftodocx

· And press enter

The prompt window should now look as follows:

Path to working directory

· Next, we install the pdf2docx package

Type in the following line in the command prompt for a new installation

pip install pdf2docx

· Press Enter

You should see output like the picture below (wait for it to finish installing):

Successful Installation

Congratulations, you have successfully installed the pdf2docx library on your system. Next we move on to converting the files, but first we need a sample PDF file and some tools to write the Python code.
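If you would like to confirm the installation from Python rather than from the pip output, a minimal sketch along these lines may help. It uses only the standard library; the helper name installed_version is my own for illustration and is not part of pdf2docx.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pdf2docx")
    if found is None:
        print("pdf2docx is not installed; run: pip install pdf2docx")
    else:
        print(f"pdf2docx {found} is installed")
```

Save it under any name you like (for example check_install.py, a hypothetical name) and run it with python from the same command prompt.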

I have created a simple pdf file for this illustration and you can download it from [here].

Download project files

Alternatively, use any PDF file. Place the PDF file in your root directory. See the picture below:

Sample PDF in project directory
A sample of the pdf.

With the files ready, we have to prepare the IDE that we will use to write the code that handles the conversion. You are free to use any IDE you like; popular ones include Visual Studio Code and NetBeans. I prefer Notepad++ and do most of my development in it for its simplicity. You can download it [here] and install it.

You can also follow this video on how to download and install it: [Notepad++ setup].

Once done, you should have a window like the one below:

notepad++ workspace (IDE)

Next, we add the folder to the notepad++ workspace so we can write some code.

· On the left panel, right click

· A file explorer will come up

· Navigate to your root folder

You should end up with a workspace as below:

Workspace which includes work folder (directory)

You now have a workspace on your left that contains a folder with the sample PDF file in it.

With the above set, we then create a new Python file. This is where we will write the code.

· Go to file

· Click on new (A new file will appear on your right)

Save the file as a Python file.

· Go to file

· Click on Save As

· Navigate to the root folder

· Give the file a name (in our case, main.py)

Our file explorer on notepad++ will now look as follows:

Work folder including main.py

Next we write the code. To start with, we rely on the package we installed earlier, so we write code to import the dependencies.

NB: For this tutorial we will do the conversion in a few lines, but the library is a very extensive one. You can visit its official documentation to learn all about its capabilities.

There are two methods with which conversion can be done but for this tutorial I am going to explain the use of “Converter”:

1. We start by importing the dependencies as follows

from pdf2docx import Converter

Next, we define the input and output files. Add the following lines of code in your IDE:

pdf_file = 'A SIMPLE SAMPLE PDF.pdf'
docx_file = 'A SIMPLE SAMPLE PDF.docx'

The input path should point to the PDF file, while the output path is where you would like the output file to be placed. In the case above, each is just a file name, because the PDF sits in the root of our project folder and the output file will be written there too.
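If you would rather not type the output name by hand, the sketch below derives it from the input name with the standard pathlib module. The helper name output_path_for is an assumption of mine for illustration; it is not part of pdf2docx.

```python
from pathlib import Path


def output_path_for(pdf_path):
    """Return the same path with its extension replaced by .docx."""
    return Path(pdf_path).with_suffix(".docx")


if __name__ == "__main__":
    pdf_file = Path("A SIMPLE SAMPLE PDF.pdf")
    docx_file = output_path_for(pdf_file)
    print(f"Input : {pdf_file}")
    print(f"Output: {docx_file}")
    # A relative name like this is resolved against the directory you run from.
    print(f"Input exists here: {pdf_file.is_file()}")
```

Because the paths are relative, this only behaves as expected when the command prompt is in the folder that holds the PDF, which is why we changed directory earlier.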

Finally, we write the code to convert the file to Word (docx). The code is as follows:

cv = Converter(pdf_file)
cv.convert(docx_file)
cv.close()

Upon completion of the above, the code in your IDE should look as below:

main.py code

With the above in place, you are ready to run the conversion.

· Go back to your command prompt
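For reference, here is a slightly more defensive sketch of the same conversion. It is not the code from the screenshot; the function name convert_pdf_to_docx is my own, and the start and end arguments are, to my understanding of the pdf2docx documentation, the page range passed to Converter.convert, with the defaults covering the whole document.

```python
from pathlib import Path


def convert_pdf_to_docx(pdf_file, docx_file, start=0, end=None):
    """Convert pdf_file to docx_file using pdf2docx's Converter."""
    if not Path(pdf_file).is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_file}")

    # Imported here so a missing PDF is reported before pdf2docx is loaded.
    from pdf2docx import Converter

    cv = Converter(str(pdf_file))
    try:
        # start=0 and end=None convert every page of the document.
        cv.convert(str(docx_file), start=start, end=end)
    finally:
        # Always release the PDF, even if the conversion fails.
        cv.close()


if __name__ == "__main__":
    try:
        convert_pdf_to_docx("A SIMPLE SAMPLE PDF.pdf", "A SIMPLE SAMPLE PDF.docx")
    except FileNotFoundError as error:
        print(error)
```

The try/finally block is the main design choice here: it closes the converter even when the conversion raises an error, so the PDF is not left open.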

· Type: python main.py

· Press Enter

Conversion

Upon completion, the command prompt will look as follows:

Successful generation of docx

You should find a generated Word (docx) file in the root of your project folder.

Working directory showing generated word docx
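If you want a quick check that the generated file is a real Word document without opening it, the sketch below relies on the fact that a docx file is a ZIP archive containing word/document.xml. The helper name looks_like_docx is hypothetical and uses only the standard library.

```python
import zipfile
from pathlib import Path


def looks_like_docx(path):
    """Return True if path is a ZIP archive that contains word/document.xml."""
    path = Path(path)
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "word/document.xml" in archive.namelist()


if __name__ == "__main__":
    print(looks_like_docx("A SIMPLE SAMPLE PDF.docx"))
```

This only confirms the file's structure; it says nothing about how faithfully the layout was converted, so it is still worth opening the result in Word.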

Congratulations, you have successfully completed the simple task of converting a PDF to a Word file using Python.

Thank you for taking the time to read this article. It belongs to you and to others for whom it might make a difference: MAKE IT GROW,

  • Share it
  • Follow me here on Medium
  • Star the project page on GitHub

Finally follow me on Twitter


Written by Thato Mmusi

Zealous theorist, Web/Software enthusiast, Avid Reader & Writer, Freelance Explorer all year round... Otherwise just a chilli loving beer over soccer fundi...
