In this article, you will discover new features of S3 and learn how to implement some of them using Boto3 in 🐍Python.
In this article, you will discover new features of S3 and learn how to implement some of them using Boto3 in 🐍Python. Additionally, you will deploy a Localstack container to explore these functionalities without the need to use a credit card.
AWS S3, or Simple Storage Service
, is a core service of AWS that serves as a primary storage solution.
While it is commonly known for storing objects, it offers a wide range of functionalities beyond basic storage.
Understanding these features can significantly enhance the utilization of this service.
Some of the main features
of Amazon S3 include:
# ✨ Exploring 8 Key Features of Amazon S3
While S3 is commonly associated with file storage, such as CSV, JSON, or Parquet files, it offers a wide range of other use cases as well. These include hosting static websites, sharing files, storing data for machine learning models, application configuration, and logging purposes.
In the following image, you can see the types of storage that S3 allows, which depend on the frequency of accessing the object. If you need to access objects frequently, it’s advisable to use the standard storage type. On the other hand, if access to objects is less frequent, it’s recommended to use S3 🧊 Glacier services.
Another important feature of S3 is the ability to use tags, this tags are useful for classifying objects and managing costs. They also serve as a strategy for data protection, allowing you to label objects with categories such as confidentiality or sensitive data, and apply policies accordingly.
Additionally, S3 supports using tags to trigger Lambda functions, enabling you to perform various actions based on these labels.
When managing an S3 bucket, it’s common to accumulate a large number of files within the same bucket. To efficiently organize and manage these files, it’s often necessary to generate an inventory. In S3, an inventory is a file that can be scheduled for updates and contains information about the objects stored in the bucket, including their type, size, and other metadata. This allows for better organization and management of objects within the bucket.
## 📙 S3 Lifecycle configuration
Comprising a set of rules, the S3 lifecycle configuration dictates actions applied by AWS S3 to a group of objects. In the following image, you can observe the typical actions that can be configured.
## 📙 S3 Batch operators
It is a feature provided by S3 that allows users to perform operations on objects stored in buckets. With S3 Batch Operations, users can automate tasks such as copying, tagging, deleting, and restoring objects. This feature is particularly useful for organizations managing large amounts of data in S3 and need to perform these operations at scale efficiently.
## 📙 S3 Query Select
S3 Select is a feature provided by S3 that enables users to find some data from objects stored in S3 buckets using simple SQL queries. With S3 Select, users can efficiently query large datasets stored in various formats such as CSV, JSON, and Parquet, without the need to download and process the entire object. This feature is particularly useful for applications that require selective access to data stored in S3, as it minimizes data transfer and processing overhead.
## 📙 S3 Storage Lens
S3 Storage Lens is a feature that offers a centralized dashboard with customizable reports and visualizations, allowing users to monitor key metrics such as storage usage, access patterns, and data transfer costs.
Also it provides detailed metrics, analytics, and recommendations to help organizations optimize their S3 storage resources, improve data security, and reduce costs.
Boto3
is a 🐍 Python library that allows the integration with AWS services, facilitating various tasks such as creation, management, and configuration of these services.
There are two primary implementations within Boto3: * Resource implementation: provides a higher-level, object-oriented interface, abstracting away low-level details and offering simplified interactions with AWS services. * Client implementation: offers a lower-level, service-oriented interface, providing more granular control and flexibility for interacting with AWS services directly.
Localstack is a platform that provides a local version of several cloud services, allowing you to simulate a development environment with AWS services. This allows you to debug and refine your code before deploying it to a production environment. For this reason, Localstack is a valuable tool for emulating essential AWS services such as object storage and message queues, among others.
Also, Localstack serves as an effective tool for learning to implement and deploy services using a Docker container without the need for an AWS account or the use of your credit card. In this tutorial, we create a Localstack container to implement the main functionalities of S3 services.
As mentioned earlier, LocalStudio provides a means to emulate a local environment for Amazon with some of the most popular services. This article will guide you through the process of creating a container using the LocalStudio image. Subsequently, it will demonstrate the utilization of Boto3 to create an S3 bucket and implement key functionalities within these services.
By the end of this tutorial, you’ll have a clearer understanding of how to seamlessly integrate Boto3 with LocalStudio, allowing you to simulate AWS services locally for development and testing purposes.
Before you begin, ensure that you have the following installed:
{% github r0mymendez/LocalStack-boto3 %}
git clone https://github.com/r0mymendez/LocalStack-boto3.git
cd LocalStack-boto3
docker-compose -f docker-compose.yaml up --build
Recreating localstack ... done
Attaching to localstack
localstack | LocalStack supervisor: starting
localstack | LocalStack supervisor: localstack process (PID 16) starting
localstack |
localstack | LocalStack version: 3.0.3.dev
localstack | LocalStack Docker container id: f313c21a96df
localstack | LocalStack build date: 2024-01-19
localstack | LocalStack build git hash: 553dd7e4
🥳Now we have in Localtcak running in localhost!
!pip install boto3
The following code snippet initializes a client for accessing the S3 service using the LocalStack endpoint.
import boto3
import json
import requests
import pandas as pd
from datetime import datetime
import io
import os
= boto3.client(
s3 ='s3',
service_name='test',
aws_access_key_id='test',
aws_secret_access_key='http://localhost:4566',
endpoint_url )
Below is the code snippet to create new buckets using the Boto3 library
# create buckets
= 'news'
bucket_name_news = 'news-config'
bucket_name_config
= bucket_name_new )
s3.create_bucket(Bucket=bucket_name_config) s3.create_bucket(Bucket
After creating a bucket, you can use the following code to list all the buckets available at your endpoint.
# List all buckets
= s3.list_buckets()
response 'Buckets']) pd.json_normalize(response[
Once we extract data from the API to gather information about news topics, the following code generates a JSON file and uploads it to the S3 bucket previously created.
# invoke the config news
= 'https://ok.surf/api/v1/cors/news-section-names'
url = requests.get(url)
response if response.status_code==200:
= response.json()
data # ad json file to s3
print('data', data)
# upload the data to s3
=bucket_name_config, Key='news-section/data_config.json', Body=json.dumps(data)) s3.put_object(Bucket
data ['US', 'World', 'Business', 'Technology', 'Entertainment', 'Sports', 'Science', 'Health']
Now, let’s list all the objects stored in our bucket. Since we might have stored a JSON file in the previous step, we’ll include code to retrieve all objects from the bucket.
def list_objects(bucket_name):
= s3.list_objects(Bucket=bucket_name)
response return pd.json_normalize(response['Contents'])
# list all objects in the bucket
=bucket_name_config) list_objects(bucket_name
In the following code snippet, we will request another method from the API to extract news for each topic. Subsequently, we will create different folders in the bucket to save CSV files containing the news for each topic. This code enables you to save multiple files in the same bucket while organizing them into folders based on the topic and the date of the data request.
# Request the news feed API Method
= 'https://ok.surf/api/v1/news-feed'
url = requests.get(url)
response if response.status_code==200:
= response.json()
data
# Add the json file to s3
= f'dt={datetime.now().strftime("%Y%m%d")}'
folder_dt
for item in data.keys():
= pd.json_normalize(data[item])
tmp 'section'] = item
tmp['download_date'] = datetime.now()
tmp['date'] = pd.to_datetime(tmp['download_date']).dt.date
tmp[= f"s3://{bucket_name_news}/{item}/{folder_dt}/data_{item}_news.csv"
path
# upload multiple files to s3
= io.BytesIO()
bytes_io =False)
tmp.to_csv(bytes_io, index0)
bytes_io.seek(=bucket_name_news, Key=path, Body=bytes_io)
s3.put_object(Bucket
# list all objects in the bucket
=bucket_name_news) list_objects(bucket_name
In this section, we aim to read a file containing news about technology topics from S3. To accomplish this, we first retrieve the name of the file in the bucket. Then, we read this file and print the contents as a pandas dataframe.
# Get the technology file
= list_objects(bucket_name=bucket_name_news)
files = files[files['Key'].str.find('Technology')>=0]['Key'].values[0]
technology_file print('file_name',technology_file)
file_name s3://news/Technology/dt=20240211/data_Technology_news.csv
# get the file from s3 using boto3
= s3.get_object(Bucket=bucket_name_news, Key=technology_file)
obj = pd.read_csv(obj['Body'])
data_tech
data_tech
When creating a resource in the cloud, it is considered a best practice to add tags for organizing resources, controlling costs, or applying security policies based on these labels. The following code demonstrates how to add tags to a bucket using a method from the boto3 library.
s3.put_bucket_tagging(=bucket_name_news,
Bucket={
Tagging'TagSet': [
{'Key': 'Environment',
'Value': 'Test'
},
{'Key': 'Project',
'Value': 'Localstack+Boto3'
}
]
} )
# get the tagging
=bucket_name_news)['TagSet']) pd.json_normalize(s3.get_bucket_tagging(Bucket
Another good practice to apply is enabling versioning for your bucket. Versioning provides a way to recover and keep different versions of the same object. In the following code, we will create a file with the inventory of objects in the bucket and save the file twice.
# allow versioning in the bucket
s3.put_bucket_versioning(=bucket_name_news,
Bucket={
VersioningConfiguration'Status': 'Enabled'
} )
# Add new file to the bucket
# file name
= 'inventory.csv'
file_name
# list all objects in the bucket
= list_objects(bucket_name=bucket_name_news)
files = io.BytesIO()
bytes_io =False)
files.to_csv(bytes_io, index0)
bytes_io.seek(# upload the data to s3
=bucket_name_news, Key=file_name, Body=bytes_io) s3.put_object(Bucket
# add again the same file
=bucket_name_news, Key=file_name, Body=bytes_io) s3.put_object(Bucket
# List all the version of the object
= s3.list_object_versions(Bucket=bucket_name, Prefix=file_name)
versions
'Versions']) pd.json_normalize(versions[
In this section, we need to utilize a different command, which requires prior installation of the awscli-local
tool specifically designed for use with LocalStack.
The awscli-local
tool facilitates developers in seamlessly engaging with the LocalStack instance, because you can automatically redirecting commands to local endpoints instead of real AWS endpoints.
# install awslocal to use the cli to interact with localstack
!pip3.11 install awscli-local
# the following command creates a static website in s3
!awslocal s3api create-bucket --bucket docs-web
# add the website configuration
!awslocal s3 website s3://docs-web/ --index-document index.html --error-document error.html
# syncronize the static site with the s3 bucket
!awslocal s3 sync static-site s3://docs-web
If you are using localstack, you can access the website using the following
Url: http://docs-web.s3-website.localhost.localstack.cloud:4566/
If you want to learn…
Other references:
- Image preview reference: [Imagen de vectorjuice en Freepik]
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Mendez (2024, Feb. 12). Romina Mendez: Learning AWS S3 on Localhost: Best Practices with Boto3 and LocalStack. Retrieved from https://r0mymendez.github.io/posts_en/2024-02-12-learning-aws-s3-on-localhost-best-practices-with-boto3-and-localstack/
BibTeX citation
@misc{mendez2024learning, author = {Mendez, Romina}, title = {Romina Mendez: Learning AWS S3 on Localhost: Best Practices with Boto3 and LocalStack}, url = {https://r0mymendez.github.io/posts_en/2024-02-12-learning-aws-s3-on-localhost-best-practices-with-boto3-and-localstack/}, year = {2024} }