Pandas Excel

Posted : admin On 1/26/2022

In this post, we will see examples of saving a Pandas DataFrame as an Excel file. Pandas has a to_excel() function for writing a DataFrame to an Excel file.

  • Pandas makes it very easy to output a DataFrame to Excel. However, there are limited options for customizing the output and using Excel's features to make your output as useful as it could be. Fortunately, it is easy to use the excellent XlsxWriter module to customize and enhance the Excel workbooks created by pandas' to_excel() function.
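  • Create an Excel sheet: import pandas as pd, then writer = pd.ExcelWriter('demo.xlsx', engine='xlsxwriter'), followed by writer.close(). This code creates a new demo.xlsx file with a default sheet named Sheet1. It requires the XlsxWriter package; older pandas versions used writer.save(), which was removed in pandas 2.0.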
  • In short, Excel is great for certain tasks but becomes unwieldy and inefficient as the tasks become more complicated. The Python library pandas is a great alternative to Excel, providing much of the same functionality and more. Pandas is great for data manipulation, cleaning, analysis, and exploration.
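As a hedged sketch of the ExcelWriter approach in the list above, the snippet below writes two hypothetical DataFrames to separate sheets of one workbook and reads them back. It uses ExcelWriter as a context manager, which saves and closes the file on exit. It assumes an Excel engine such as openpyxl or XlsxWriter is installed; the file name, sheet names, and data are made up for illustration.

```python
import pandas as pd

# Two small hypothetical DataFrames, one per sheet.
personal = pd.DataFrame({"Name": ["Ann", "Bob"], "City": ["Oslo", "Lima"]})
salary = pd.DataFrame({"Name": ["Ann", "Bob"], "Salary": [52000, 61000]})

# Used as a context manager, ExcelWriter saves and closes the file on exit.
# Pass engine="xlsxwriter" (if that package is installed) to opt into XlsxWriter.
with pd.ExcelWriter("demo.xlsx") as writer:
    personal.to_excel(writer, sheet_name="Personal Info", index=False)
    salary.to_excel(writer, sheet_name="Salary Info", index=False)

# sheet_name=None reads every sheet into a dict keyed by sheet name.
sheets = pd.read_excel("demo.xlsx", sheet_name=None)
print(list(sheets))  # ['Personal Info', 'Salary Info']
```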

Working with Python Pandas and XlsxWriter. Python Pandas is a Python data analysis library. It can read, filter and re-arrange small and large data sets and output them in a range of formats, including Excel. Pandas writes .xlsx files using the openpyxl or XlsxWriter modules; older versions also wrote legacy .xls files with the xlwt module, but that support was removed in pandas 2.0.

Let us load Pandas.

We will create two lists and use them to create a DataFrame as before.

We can create a Pandas DataFrame by combining the two lists into a dictionary and passing it to the DataFrame() function. Our toy DataFrame contains two columns.
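A minimal sketch of this step, assuming two made-up lists named names and scores (the post's original lists are not shown):

```python
import pandas as pd

# Hypothetical example lists; the post's own lists are not reproduced here.
names = ["Ann", "Bob", "Cara"]
scores = [88, 92, 79]

# Map each column name to its list, then build the DataFrame from the dict.
df = pd.DataFrame({"name": names, "score": scores})
print(df)
```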

Now that the DataFrame is ready, we can use Pandas' to_excel() function to write it to an Excel file, passing the Excel file name as the argument.
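A minimal sketch of writing the toy DataFrame to disk, assuming the same hypothetical data and a file name of toy_data.xlsx; it needs an Excel writer engine such as openpyxl. Reading the file back shows that, by default, the row index is written as an extra unnamed first column:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cara"], "score": [88, 92, 79]})

# Write the DataFrame to an .xlsx file in the current working directory.
df.to_excel("toy_data.xlsx")

# The index was written too, so it comes back as an unnamed first column.
back = pd.read_excel("toy_data.xlsx")
print(back.columns.tolist())  # ['Unnamed: 0', 'name', 'score']
```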

Pandas' to_excel() function has a number of useful arguments for customizing the Excel file. For example, we can save the DataFrame as an Excel file without its index by passing index=False as an additional argument.
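A short sketch of index=False, again with hypothetical data and an assumed file name; only the data columns end up in the sheet:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cara"], "score": [88, 92, 79]})

# index=False omits the row index, so only the data columns are written.
df.to_excel("toy_data_no_index.xlsx", index=False)

back = pd.read_excel("toy_data_no_index.xlsx")
print(back.columns.tolist())  # ['name', 'score']
```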

A common customization is naming the Excel sheet. We can name the sheet using the sheet_name argument.
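A minimal sketch of sheet_name, assuming a hypothetical sheet called "Scores"; pd.ExcelFile is used here only to list the sheet names stored in the resulting workbook:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cara"], "score": [88, 92, 79]})

# sheet_name sets the worksheet's name (the default is "Sheet1").
df.to_excel("toy_data_named.xlsx", sheet_name="Scores", index=False)

# ExcelFile exposes the sheet names stored in the workbook.
with pd.ExcelFile("toy_data_named.xlsx") as xl:
    names = xl.sheet_names
print(names)  # ['Scores']
```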

This post is part of the series on Byte Size Pandas: Pandas 101, a tutorial covering tips and tricks on using Pandas for data munging and analysis.


Learn how to import an Excel file (having .xlsx extension) using python pandas.

Pandas is the most popular data manipulation package in Python, and DataFrames are the Pandas data type for storing tabular 2D data. Reading data from Excel or CSV files, and writing data to Excel or CSV files, using Python Pandas is a necessary skill for any analyst or data scientist.

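Table of Contents

  1. Pandas read_excel() Syntax
  2. Import Excel file using Python Pandas
  3. Pandas read_excel Important Parameters Examples
    1. Import Specific Excel Sheet using Python Pandas
    2. Import only n Rows of Excel Sheet using Pandas
    3. Import specific columns of Excel Sheet
  4. Common Errors and Troubleshooting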

1. Pandas read_excel() Syntax

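The read_excel() function takes the file path (or URL) as its first argument, io, followed by optional parameters. Some of the important ones are sheet_name, header, names, index_col, usecols and nrows.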

For the complete list of read_excel parameters, refer to the official pandas documentation.
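As an illustration (not the full signature), the call below reads a small hypothetical workbook while spelling out several commonly used parameters; the file name syntax_demo.xlsx and its columns are assumptions made for this sketch:

```python
import pandas as pd

# Create a small hypothetical workbook so the example is self-contained.
pd.DataFrame(
    {"id": [1, 2, 3], "name": ["Ann", "Bob", "Cara"], "score": [88, 92, 79]}
).to_excel("syntax_demo.xlsx", sheet_name="Scores", index=False)

df = pd.read_excel(
    "syntax_demo.xlsx",       # io: file path, URL, or file-like object
    sheet_name="Scores",      # sheet(s) to load; default 0 is the first sheet
    header=0,                 # row holding the column labels
    index_col="id",           # column to use as the row index
    usecols=["id", "score"],  # parse only these columns
    nrows=2,                  # read only the first 2 data rows
)
print(df)
```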

2. Import Excel file using Python Pandas

Let’s review a full example:

  • Create a DataFrame from scratch and save it as Excel
  • Import (or load) the DataFrame from above saved Excel file

We have the following data about students:
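The post's student table is not reproduced here, so the sketch below uses made-up records to show the first step of the example: building a DataFrame from scratch and saving it as an Excel file. The file name students.xlsx and the sheet name are assumptions.

```python
import pandas as pd

# Hypothetical student records standing in for the post's data.
students = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cara"],
        "Age": [20, 22, 21],
        "Grade": ["A", "B", "A"],
    }
)

# Save without the index so the sheet holds only the student columns.
students.to_excel("students.xlsx", sheet_name="Students", index=False)
```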

Read Excel file into Pandas DataFrame (Explained)

Now, let’s see the steps to import the Excel file into a DataFrame.

Step 1: Enter the path and file name where the Excel file is stored. This can be a local file path or a URL.

For example,

pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx')

Notice that the path has three parts:

  • D:\Python\Tutorial\ is the directory where the Excel file is saved.
  • Example1 is the name of the file you want to import.
  • The extension at the end is the file type. Use '.xlsx' for an Excel file.

Modify the Python code above to reflect the path where the Excel file is stored on your computer.

Note: How to find the current working directory. You can save or read an Excel file without explicitly providing a full path. In that case, the file is saved to (or read from) the current working directory. To find the current directory path, call os.getcwd() from Python's built-in os module.
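A minimal sketch using only the standard library; the file name students.xlsx is a hypothetical example of a bare file name:

```python
import os

# Relative names such as "students.xlsx" are resolved against this directory.
cwd = os.getcwd()
print(cwd)

# Full path a bare file name would map to (hypothetical file name).
print(os.path.join(cwd, "students.xlsx"))
```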


Step 2: Enter the following code and make the necessary changes to your path to read the Excel file.
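The original code for this step is not shown, so here is a hedged sketch of it. The path is hypothetical; the snippet writes a small stand-in file only if none exists, so it can run on its own.

```python
from pathlib import Path

import pandas as pd

# Hypothetical path; replace it with your own folder and file name, e.g.
# Path(r"D:\Python\Tutorial\students.xlsx") on Windows.
excel_path = Path("students.xlsx")

# Create a small stand-in file if none exists, so the sketch runs on its own.
if not excel_path.exists():
    pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [20, 22]}).to_excel(
        excel_path, index=False
    )

df = pd.read_excel(excel_path)  # reads the first sheet by default
print(df.head())
```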

Snapshot of Data Representation in Excel files

On the left side of the image, the Excel file is opened in Microsoft Excel. On the right side, the same Excel file is opened in Jupyter Notebook using pandas read_excel.

3. Pandas read_excel Important Parameters Examples

3.1 Import Specific Excel Sheet using Python Pandas

An Excel file may contain multiple sheets. The sheet_name parameter of read_excel controls which sheet or sheets are imported:

  • Default is 0: read the 1st sheet as a DataFrame
  • 1: read the 2nd sheet as a DataFrame
  • A sheet name, e.g. 'Sheet1': load the sheet named “Sheet1”
  • A list, e.g. [0, 2, 'MySheet']: load the first sheet, the third sheet, and the sheet named “MySheet” as a dictionary of DataFrames
  • None: load all sheets as a dictionary of DataFrames

1. Import Excel Sheet using Integer

By default, sheet_name=0 imports the first sheet as a DataFrame. To import the second sheet, i.e. “Salary Info” in our case, use sheet_name=1.
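A minimal sketch, assuming a hypothetical workbook employees.xlsx whose two sheets use the names from this post (“Personal Info” first, “Salary Info” second); the data is made up:

```python
import pandas as pd

# Hypothetical two-sheet workbook using the sheet names from this post.
with pd.ExcelWriter("employees.xlsx") as writer:
    pd.DataFrame({"Name": ["Ann", "Bob"], "City": ["Oslo", "Lima"]}).to_excel(
        writer, sheet_name="Personal Info", index=False
    )
    pd.DataFrame({"Name": ["Ann", "Bob"], "Salary": [52000, 61000]}).to_excel(
        writer, sheet_name="Salary Info", index=False
    )

# sheet_name=1 selects the second sheet by position (positions start at 0).
salary_df = pd.read_excel("employees.xlsx", sheet_name=1)
print(salary_df.columns.tolist())  # ['Name', 'Salary']
```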


2. Import Specific Excel Sheet using Sheet Name


To import a specific sheet by name, e.g. “Personal Info”, as a Pandas DataFrame, use sheet_name='Personal Info'.
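The same hypothetical employees.xlsx workbook (recreated here so the sketch is self-contained) can be read by sheet name instead of position:

```python
import pandas as pd

# Hypothetical two-sheet workbook using the sheet names from this post.
with pd.ExcelWriter("employees.xlsx") as writer:
    pd.DataFrame({"Name": ["Ann", "Bob"], "City": ["Oslo", "Lima"]}).to_excel(
        writer, sheet_name="Personal Info", index=False
    )
    pd.DataFrame({"Name": ["Ann", "Bob"], "Salary": [52000, 61000]}).to_excel(
        writer, sheet_name="Salary Info", index=False
    )

# Select the sheet by its exact name (the match is case-sensitive).
personal_df = pd.read_excel("employees.xlsx", sheet_name="Personal Info")
print(personal_df.columns.tolist())  # ['Name', 'City']
```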

3. Import Multiple Excel Sheet into Pandas DataFrame

Multiple Excel sheets can be read at once by passing a list to the sheet_name parameter, e.g. [0, “Salary Info”] loads the first sheet and the sheet named “Salary Info” as a dictionary of DataFrames.

To store each sheet in its own DataFrame, look it up in this dictionary by its key; the keys are the list items you passed (here 0 and “Salary Info”).
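A hedged sketch of unpacking the dictionary, again using the hypothetical employees.xlsx workbook; it also shows sheet_name=None, where the keys are the sheet names:

```python
import pandas as pd

# Hypothetical two-sheet workbook using the sheet names from this post.
with pd.ExcelWriter("employees.xlsx") as writer:
    pd.DataFrame({"Name": ["Ann", "Bob"], "City": ["Oslo", "Lima"]}).to_excel(
        writer, sheet_name="Personal Info", index=False
    )
    pd.DataFrame({"Name": ["Ann", "Bob"], "Salary": [52000, 61000]}).to_excel(
        writer, sheet_name="Salary Info", index=False
    )

# A list returns a dict keyed by the items passed in the list.
sheets = pd.read_excel("employees.xlsx", sheet_name=[0, "Salary Info"])
personal_df = sheets[0]
salary_df = sheets["Salary Info"]

# None loads every sheet; the keys are then the sheet names.
all_sheets = pd.read_excel("employees.xlsx", sheet_name=None)
print(list(sheets), list(all_sheets))
```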


3.2 Import only n Rows of Excel Sheet using Pandas

Sometimes an Excel file is very large or the system has limited memory. In that case, we can import only the top n rows of a sheet with the nrows parameter of read_excel. For example, to import only the top 2 rows, use nrows=2.
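A minimal sketch with a hypothetical 10-row sheet; note that the header row is not counted in nrows:

```python
import pandas as pd

# Hypothetical sheet with 10 rows of data.
pd.DataFrame({"value": range(10)}).to_excel("big_sheet.xlsx", index=False)

# nrows=2 keeps only the first 2 data rows below the header.
top = pd.read_excel("big_sheet.xlsx", nrows=2)
print(top)
```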

3.3 Import specific columns of Excel Sheet

There may be hundreds of columns in an Excel sheet while we need only a few of them. In this case, we can pass the usecols parameter, which accepts the following:

  • None (default): parse all columns.
  • A string: a comma-separated list of Excel column letters (e.g. “A,B,D,E”) and/or column ranges (e.g. “A:F” or “A,B,E:F”). Ranges are inclusive of both ends.
  • A list of int: the positions of the columns to parse, e.g. [0, 2, 5].
  • A list of str: the names of the columns to parse, e.g. [“Name”, “Age”].
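The sketch below tries each usecols form on a hypothetical sheet whose four columns (Name, Age, City, Salary) sit in Excel columns A to D; the file name and data are assumptions:

```python
import pandas as pd

# Hypothetical sheet with columns in Excel columns A to D.
pd.DataFrame(
    {
        "Name": ["Ann", "Bob"],
        "Age": [28, 35],
        "City": ["Oslo", "Lima"],
        "Salary": [52000, 61000],
    }
).to_excel("columns_demo.xlsx", index=False)

by_letters = pd.read_excel("columns_demo.xlsx", usecols="A,C")    # Name, City
by_range = pd.read_excel("columns_demo.xlsx", usecols="A:B")      # Name, Age
by_position = pd.read_excel("columns_demo.xlsx", usecols=[0, 3])  # Name, Salary
by_name = pd.read_excel("columns_demo.xlsx", usecols=["Name", "Salary"])

for frame in (by_letters, by_range, by_position, by_name):
    print(frame.columns.tolist())
```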

4. Common Errors and Troubleshooting


Common errors you may encounter while loading data from Excel files into a Pandas DataFrame include:

  1. FileNotFoundError: [Errno 2] No such file or directory: 'filename.xlsx'
    • Reason: This error typically occurs when there is a problem with the file path (or directory) or the file name.
    • Fix: Check the file path, file name, and file extension.
  2. SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
    • Reason: In a normal Python string, \U starts an eight-character Unicode escape, such as \U00014321. In a Windows path such as 'C:\Users', the escape is followed by the character 's', which is invalid.
    • Fix:
      • Prefix the string with r to produce a raw string: pd.read_excel(r'D:\Python\Tutorial\filename.xlsx'), or
      • Double all backslashes: pd.read_excel('D:\\Python\\Tutorial\\filename.xlsx')
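  3. ImportError: Install xlrd >= 1.0.0 for Excel support.
    • Reason: The xlrd package is not available in the Python environment.
    • Fix: Install the xlrd package with pip install xlrd. Note that pandas 1.2 and later read .xlsx files with openpyxl instead (install it with pip install openpyxl); xlrd 2.0 and later only reads legacy .xls files.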
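A small sketch of the first two fixes above: it checks that a raw string and a string with doubled backslashes spell the same (hypothetical) Windows path, and catches FileNotFoundError for a file name that is assumed not to exist:

```python
import pandas as pd

# Both literals spell the same hypothetical Windows path.
raw_path = r"D:\Python\Tutorial\filename.xlsx"
escaped_path = "D:\\Python\\Tutorial\\filename.xlsx"
print(raw_path == escaped_path)  # True

# A missing file surfaces as FileNotFoundError, which can be caught explicitly.
file_found = True
try:
    pd.read_excel("no_such_file.xlsx")
except FileNotFoundError as exc:
    file_found = False
    print("Check the path and file name:", exc)
```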

Conclusion

We have covered the steps needed to read an Excel file in Python using the pandas read_excel() function.

