In this article, we’ll look at how to use the pandas library to read an XLSX file with Python.
Before looking at how to do this, we first need to understand what a DataFrame is, because pandas returns the contents of an Excel sheet as a DataFrame.
What is a DataFrame?
Similar to a spreadsheet, a DataFrame is a data structure that arranges data into a two-dimensional table of rows and columns. Because DataFrames are a flexible and user-friendly way to store and work with data, they are among the most popular data structures in modern data analytics.
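As a small illustration of that row-and-column layout, the sketch below builds a DataFrame by hand. The column names and values are made up for the example and do not come from any real spreadsheet.

```python
import pandas as pd

# Hypothetical data: each dict key becomes a column, each list item a row.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen"],
        "score": [91, 78, 85],
    }
)

print(df.shape)            # (number of rows, number of columns)
print(list(df.columns))    # column labels
print(df.loc[1, "score"])  # value in the row labelled 1, column "score"
```

Rows get the default integer labels 0, 1, 2 here, which is also what you typically see when a sheet is loaded without choosing an index column.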
Example of reading an Excel file using pandas
The pandas library provides a function called read_excel, which is designed specifically for reading Excel files.
read_excel returns either a DataFrame or a dict of DataFrames, depending on how it is called: reading a single sheet gives a DataFrame, while reading several sheets gives a dict keyed by sheet name. Each DataFrame has rows and columns that correspond to those of the Excel sheet.
Reading one sheet from an Excel file
Suppose your Excel file has ‘n’ sheets and you want to read one particular sheet named ‘excel_sheet’. You can do it as shown below.
from pandas import read_excel
from IPython.display import display
# Change this to the name of your sheet; sheet names appear on the tabs at the bottom-left of the Excel window.
sheet_n = 'excel_sheet'  # sheet name
file_n = 'excel_file.xlsx'  # Excel file name
df = read_excel(file_n, sheet_name=sheet_n)
# display() from IPython renders the DataFrame as a table in a notebook
display(df)
The code above reads only the specified sheet from the given Excel file.
read_excel() returns the entire sheet as a DataFrame, and display() renders that DataFrame as a table.
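Once a sheet is loaded, it can help to confirm that it was read as expected. The sketch below uses a hypothetical helper, summarize_sheet, which is not part of pandas, and a hand-built DataFrame standing in for the result of read_excel, so it runs without an Excel file.

```python
import pandas as pd


def summarize_sheet(df: pd.DataFrame, preview_rows: int = 5) -> dict:
    """Return basic facts about a DataFrame (hypothetical helper)."""
    return {
        "rows": df.shape[0],
        "columns": list(df.columns),
        "preview": df.head(preview_rows),
    }


# Stand-in for: df = pd.read_excel("excel_file.xlsx", sheet_name="excel_sheet")
df = pd.DataFrame({"item": ["pen", "book", "bag"], "price": [1.5, 12.0, 30.0]})

summary = summarize_sheet(df, preview_rows=2)
print(summary["rows"])
print(summary["columns"])
print(summary["preview"])
```

Checking the row count and column labels first is a quick way to notice that the wrong sheet was loaded or that the header row was not where you assumed.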
Reading all sheets from an Excel file
If your Excel file has ‘n’ sheets and you want to read all of them, pass sheet_name=None to read_excel as shown in the following code.
from pandas import read_excel
from IPython.display import display
file_n = 'excel_file.xlsx'  # Excel file name
# sheet_name=None reads every sheet and returns a dict of {sheet name: DataFrame}
sheets = read_excel(file_n, sheet_name=None)
# Show each sheet's name followed by its data as a table
for name, df in sheets.items():
    print(name)
    display(df)
If you are using an older version of pandas (earlier than 0.21), you have to use sheetname=None instead to read all the sheets. In recent versions, use sheet_name=None as in the code above.
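Because reading all sheets yields a dict that maps each sheet name to its DataFrame, the result is usually processed with a loop. The sketch below assumes every sheet has the same columns and uses a hypothetical helper, combine_sheets, to stack them into one table. The dict is built by hand here in place of a real workbook, and the sheet names and values are made up.

```python
import pandas as pd


def combine_sheets(sheets: dict) -> pd.DataFrame:
    """Stack a {sheet_name: DataFrame} dict into one DataFrame (hypothetical helper)."""
    frames = []
    for name, frame in sheets.items():
        # Add a column recording which sheet each row came from.
        frames.append(frame.assign(sheet=name))
    return pd.concat(frames, ignore_index=True)


# Stand-in for: sheets = pd.read_excel("excel_file.xlsx", sheet_name=None)
sheets = {
    "jan": pd.DataFrame({"item": ["pen", "book"], "qty": [10, 4]}),
    "feb": pd.DataFrame({"item": ["bag"], "qty": [2]}),
}

for name, frame in sheets.items():
    print(name, frame.shape)

combined = combine_sheets(sheets)
print(combined)
```

Keeping the sheet name as a column means the combined table can still be filtered or grouped by its original sheet. If the sheets have different columns, it may be simpler to keep them as separate DataFrames.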