Merge Data From 100s of excel files in Bulk Using Python

If you already work or intend to work with excels, the following tutorial would be very useful to you.

Let’s say you prepare expense report for your company, and you log in daily expenses in the sheet. But what if you want to calculate all the expenses for a particular month, or even for a year? You would need to pull up each file, do copy-paste, and it may take hours to do the Job. Now, you can cut down this time to a minute of two. by using Python Modules, and do it all at once. You can even reuse the code and get this menial job done in seconds.

Even if you don’t know anything about Python, or if you are a beginner, don’t worry. I have broken this down in easy steps so that you can understand. If you are an advanced python user, directly go to the code section (Third Step).

Remember, the first time when you set this up, you may find this intimidating, but keep going. This will help you do this tedious task quite easily for the rest of your life. You can use it without any coding experience. I will also record a video for your convenience.

First Step – Install Python
1. Go to Python’s Official Website, and download Python. I recommend downloading version 3.5 or higher.

2. Once you download it, install python, and environment variable for python. I have covered it in this python pip post.

3. Once you set up the variable. Go to your command prompt (windows+r) key, and install the below modules by typing in these commands
python –m pip install openpyxl   (For .xlsx files)
python –m pip install xlrd  (For .xls files)

4. Since you now have Python installed, open IDLE (It is a python shell, where we would be doing all of our task). IDLE looks like this

Second Step – Go to the folder where you have those files

  1. Open the folder where the excel files are located, and note down the directory.
  2. In your python shell (which looks like the picture above), type the following commands after >>> (Don’t leave any spaces)
    import openpyxl
    import xlrd
    import os
    import globe
  3. Now, go to the directory, you do that in python using the below command
    os.chdir(“directory name”). Say if my directory name is C:\Users\Deepak\ExpenseSheet, the code will be os.chdir(‘C:\\Users\\Deepak\\ExpenseSheet’) {Notice the two ‘\’ instead of one. For starters, windows use one \ as a separator and python uses two, so always put two backslashes)

Third Step – Code. 

Now that you are in the directory, you are all set to merge the excel files. But, you may have either .xls or .xlsx files. So depending on the files that you are looking to merge, use the respective codes.

a) If you want to merge .xlsx files

list = glob.glob(‘*.xlsx’)
for key in list:
wb = openpyxl.load_workbook(key)
sheet = wb.get_sheet_by_name(“Sheet1”)
for j in range (1,200,1):
print (j, sheet.cell(row =j, column =2).value)

Note {Column = 2 means 2nd column, as openpyxl in the code starts counting from 1, so if you store expenses in say column 3, then  change the number to 3}

a) If you want to merge .xls files

list = glob.glob(‘*.xls’)
for key in list:
wb = xlrd.open_workbook(key)
sheet = wb.sheet_by_index(0)
for j in range (1,sheet.nrows,1):
print (j, sheet.cell_value(rowx =j, colx =1))

Note{colx = 1 means 2nd column, as xlrd in the code starts counting from zero, so if you store expenses in say column 3, then  change the number to 2}

When you hit enter you will see all the data along with the names in the python shell, Just copy it, and paste it to a new file – which will have all your data.

I hope this was easy, and as with all my posts, if you find any difficulty in using it, drop me a comment and I would be happy to help you. Python is a great language, and I encourage you to learn it.

Easy way to find repetation of words in Excel

“Most humans use 2000 words in a language”, says an old adage. Although I do not agree with it, I still believe that word analysis is essential for everyone.

Suppose you are learning a language, how great would be it if you knew the keywords that most newspapers use. Or, if you are a technical writer who is tasked to write a highly specialized content that demands a specific vocabulary, in such cases, word analysis can help.

Even if you read newspapers, you are likely to find a few words that you may not pay attention to, but if you did, then you could improve your vocabulary. Or say if you are a content head, and want to check content for plagiarism, you could find the frequency of the words used, and make out if the content is plagiarized.

This article will help you understand how this can be done — in simple steps. All you need is Microsoft Excel installed on your computer.

1. Start your excel program.

2. Copy Any string, or piece of string that you want to perform a repetition check for. (I have posted the image just below step 3)

3. Go to “Developer” option on your Excel and click on Macros (As located in the image below)
4. Once you click there, assign macro a name (like, Find String Macro), and click on ‘create’ which is located on the right-hand side of the dialog box.

5. You will see a dialogue box, which would be blank, paste the code below.

Sub Fable()
Dim BigString As String, I As Long, J As Long, K As Long
         BigString = “”
For Each r In Selection
BigString = BigString & ” ” & r.Value
Next r
BigString = Trim(BigString)
ary = Split(BigString, ” “)
Dim cl As Collection
Set cl = New Collection
For Each a In ary
On Error Resume Next
cl.Add a, CStr(a)
Next a

For I = 1 To cl.Count
v = cl(I)
Cells(I, “B”).Value = v
J = 0
For Each a In ary
If a = v Then J = J + 1
Next a
Cells(I, “C”) = J
Next I
End Sub

6. Now, press Alt+F11 or click Excel icon on the top left (below file menu).

7. Select the cells that you want to run text repetition check for, and click on macros again, and select the macro that you just created (with the name you assigned in step 4).

Say you have data in cells A1 to A4, so, you need to first select all the four cells and then click on macro (So that it excel knows where to run the string repetition check).

8. You are done! now you will see two rows, one with the word, and second with the number of repetition of the word in the excel file.

Now, if you are learning a language like Spanish or English,  pick up the words that you don’t know, and put them in your flashcard or some other repetition list.

High Typing Accuracy By Building Auto Correction Tool

If you type for 6 hours a day, with 95% accuracy, you lose around 15 days in a year, correcting your typos. That’s why you should invest your time in reading this — as it certainly can help you weed out 14 out of those 15 days, like it has in my case.

It’s been a while since my last post, which gives tips on typing over 10k words per day. But since then, I realized that there are quite a few things that can help us type much more — easily and efficiently. By doing this, I have been able to type 20,000 words per day.

Accuracy: The Perennial Foe of Typing Quickly
There are quite a few things that make us recede our typing speed, one of the most notorious things is — accuracy. Suppose, you type the word believe incorrectly as “believe”, in such cases, you would have to type in three additional keystrokes (Four times arrows keys to shift and a backspace)— until you reach and correct it. But, you always don’t type five additional keystrokes, so let’s take a modest estimate of three keystrokes.

It means every time you make a mistake, you press three keystrokes to correct it. Now assuming your typing accuracy is 95%, you spend 15% of your time in correcting mistakes — which is huge, by any yardstick. If you type for 6 hours a day, you would end up losing 1 hour every day on correcting a mistake, which translates spending a little over 15 days on correcting your typos. Now, that isn’t all, I have observed that one tends to lose focus and momentum the minute we try to alter our typing position. Therefore, it consumes additional mental resources to correct things.

Many software tools have autocorrected, why should I bother to build my own?
That is a good question, but do you always type in those environments like MS word etc? Well, this isn’t all, try typing in “better ” (With space) on MS word, let’s see if it corrects it, mine wouldn’t! These software care to a wide range of audience, so they lack the sort of personalization that we, individuals, need.

So, how to build your own autocorrect?

Here’s how you can go about doing it — (step by step on Windows Machine)

1. Download the auto hotkey from https://autohotkey.com/

2. Once you are done, install it.

3. Then,  go to any folder (or Desktop), and right click, select new, and then you would see an option called AutoHotkey Script.

4. Now, select that file and click edit, once you do that, it would open in a notepad.

5. Put all the word mistakes that you make, along with the corrected form. Make sure that you use two colons (::) to separate the words, mistakes on the left-hand side, and corrected words on the right-hand side.  Here’s how it should look like. You can add thousands of mistakes as such.

6. Once you are done entering it, you can save and close it (You edit it whenever you want).

7. Now, click on the file again, and it would ask you if you want to run the script, click yes!

Now, you are covered. Be very careful when you write this script and include common words. Notice the second last word in the picture, “very Good”. Now, good is a valid word and I don’t want my spell check to correct it to good everytime I type it, because I may want to thank god! Even if you only cover your vague mistakes, you would be able to achieve over 99% accuracy, which would save you months of time in the long run. Similarly, you can build such checks for special characters of other languages like French or Spanish.