In this article, we will see how to save a pandas DataFrame to an Amazon S3 bucket.

Saving a DataFrame to a CSV file is a common task in data processing. In some scenarios, however, we need to save the DataFrame directly to an Amazon S3 bucket without first writing a local file. We can achieve this using Python’s pandas and boto3 libraries.

Before diving into the details, make sure the required libraries, boto3 and pandas, are installed. If they are not, use the following commands to install them.

pip install pandas
pip install boto3

You also need to set up your AWS credentials, including the access key ID and secret access key.

Establish a connection with Amazon S3

As a first step, import the pandas and boto3 libraries and create an S3 client using boto3.

import pandas as pd
import boto3

s3 = boto3.client('s3',
                  aws_access_key_id='YOUR_ACCESS_KEY',
                  aws_secret_access_key='YOUR_SECRET_ACCESS_KEY')

Replace the placeholder keys above with your actual credentials.

Save the DataFrame to CSV

Assuming we have a DataFrame named df that we want to save to Amazon S3, the steps are as follows.

Convert the DataFrame to a CSV string using the to_csv method:

csv_string = df.to_csv(index=False)
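
The reason no local file is needed is that to_csv, when called without a path, returns the CSV text as a Python string. A small illustration with made-up data (the two-row DataFrame here is only for demonstration):

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({"Name": ["John", "Jane"], "Age": [25, 30]})

# With no path argument, to_csv returns the CSV content as a str.
# index=False leaves out the row index column.
csv_string = df.to_csv(index=False)

print(type(csv_string).__name__)   # str
print(csv_string.splitlines()[0])  # Name,Age
```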

Specify the S3 bucket name and the object key (the path inside the bucket) under which we want to store the CSV file:

bucket_name = 'your-bucket-name'
file_path = 'folder/filename.csv'
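
Note that S3 has no real directories: the key is a single string, and the "folder/" part is just a prefix, so nothing has to be created in the bucket beforehand. As an illustration, a key can be assembled from its parts like this; the helper name build_s3_key and the dated layout are assumptions made for this sketch, not part of boto3:

```python
from datetime import date


def build_s3_key(prefix: str, filename: str, day: date) -> str:
    """Return a key such as 'folder/2024-01-31/filename.csv'."""
    # S3 keys use forward slashes regardless of the local operating system
    return f"{prefix.strip('/')}/{day.isoformat()}/{filename}"


file_path = build_s3_key("folder", "filename.csv", date(2024, 1, 31))
print(file_path)  # folder/2024-01-31/filename.csv
```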

Finally, upload the CSV to Amazon S3:

s3.put_object(Body=csv_string, Bucket=bucket_name, Key=file_path)

The put_object method uploads the CSV string as an object to the specified bucket under the given key.

The following is a complete example:
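
The conversion and upload steps can also be wrapped in one reusable function. The sketch below is one possible shape, not the only way to do it: the function name upload_dataframe_as_csv is hypothetical, the CSV is encoded to UTF-8 bytes explicitly, and the optional ContentType parameter of put_object is set so the object is stored as text/csv.

```python
def upload_dataframe_as_csv(df, s3_client, bucket_name, file_path):
    """Serialize df to CSV in memory and upload it with put_object."""
    # Encode explicitly so the bytes stored in S3 are UTF-8
    body = df.to_csv(index=False).encode("utf-8")
    response = s3_client.put_object(
        Body=body,
        Bucket=bucket_name,
        Key=file_path,
        ContentType="text/csv",  # optional metadata stored with the object
    )
    # put_object returns a dict; the HTTP status is under ResponseMetadata
    return response["ResponseMetadata"]["HTTPStatusCode"]
```

With a boto3 S3 client and a DataFrame at hand, upload_dataframe_as_csv(df, s3, bucket_name, file_path) should return 200 on a successful upload. Passing the client in as an argument keeps the function easy to test with a stand-in object.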

import pandas as pd
import boto3

# Establish connection to Amazon S3
s3 = boto3.client('s3',
                  aws_access_key_id='YOUR_ACCESS_KEY',
                  aws_secret_access_key='YOUR_SECRET_ACCESS_KEY')

# Example DataFrame
df = pd.DataFrame({'Name': ['John', 'Jane', 'Bob'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'London', 'Paris']})

# Convert DataFrame to CSV string
csv_string = df.to_csv(index=False)

# Specify S3 bucket and file path
bucket_name = 'your-bucket-name'
file_path = 'folder/filename.csv'

# Upload CSV string to Amazon S3
s3.put_object(Body=csv_string, Bucket=bucket_name, Key=file_path)
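
To check that the upload worked, the object can be downloaded again and parsed back into a DataFrame. The following is a minimal sketch using the get_object method of the boto3 S3 client; the helper name read_csv_from_s3 is an assumption made for illustration.

```python
import io

import pandas as pd


def read_csv_from_s3(s3_client, bucket_name, file_path):
    """Download an object from S3 and parse it as CSV into a DataFrame."""
    response = s3_client.get_object(Bucket=bucket_name, Key=file_path)
    # response["Body"] is a streaming object; read() returns the raw bytes
    return pd.read_csv(io.BytesIO(response["Body"].read()))
```

For example, check = read_csv_from_s3(s3, bucket_name, file_path) followed by a comparison of check.shape with df.shape gives a quick sanity check that every row and column arrived.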
