Beautiful Soup is a Python library for parsing HTML and XML documents. In this article, we will see how to find <img> tags and read their src attribute using Beautiful Soup.
Install the library
If you don't have Beautiful Soup installed already, use the following command to install it with pip. The examples in this article also use the requests library, so the command installs both.
pip install beautifulsoup4 requests
After installing the library, we can begin parsing the <img> tag by following the steps below:
Import required libraries
We need to import the BeautifulSoup class from the bs4 package, and the requests library to make an HTTP request and retrieve the HTML content of a web page.
from bs4 import BeautifulSoup
import requests
Retrieve the HTML content
Now, we need to retrieve the HTML content from the web page where the <img> tag is located. Use the requests library to make an HTTP GET request and obtain the HTML content.
See the following example of how to retrieve the HTML content:
url = "https://example.com"  # Replace with the URL of the web page
response = requests.get(url)
html_content = response.text
In the example above, replace the URL with the actual URL of the web page you want to scrape.
The html_content variable now contains the entire HTML content of that page as a string.
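The request above does not handle failures such as a slow server or an error status code. As a minimal sketch, assuming a hypothetical helper named fetch_html and a 10-second timeout chosen only for illustration, the same request could be wrapped like this:

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML text of `url` (hypothetical helper, not part of requests)."""
    # The timeout value is an illustrative assumption; without one,
    # requests can wait indefinitely for a slow server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently returning the HTML of an error page.
    response.raise_for_status()
    return response.text
```

Calling html_content = fetch_html("https://example.com") would then stand in for the three lines shown earlier.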
Find and extract the src attribute from the img tag
Now that we have the HTML content, we can create a Beautiful Soup object by passing it the HTML content and the name of the parser we want to use. Beautiful Soup supports several parsers, including Python's built-in html.parser and the third-party lxml and html5lib parsers.
In the example below, we create a Beautiful Soup object using the built-in html.parser parser:
soup = BeautifulSoup(html_content, "html.parser")
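To try the constructor without making a network request, the sketch below parses a small hard-coded document. The sample_html string and the variable names are assumptions made up for illustration; only BeautifulSoup and the "html.parser" argument come from the steps above.

```python
from bs4 import BeautifulSoup

# A small hard-coded document (an assumption for illustration) so the
# example runs without fetching a real page.
sample_html = """
<html>
  <head><title>Gallery</title></head>
  <body>
    <img src="/images/cat.png" alt="A cat">
  </body>
</html>
"""

# Same call as above, applied to the sample string.
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Once parsed, tags can be reached as attributes of the soup object.
page_title = sample_soup.title.string
print(page_title)  # prints: Gallery
```

Swapping "html.parser" for "lxml" or "html5lib" should work the same way here, provided the corresponding package is installed.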
To extract the src attribute from the <img> tags, use the find() or find_all() method provided by Beautiful Soup. The find() method returns the first occurrence of the specified tag (or None if there is no match), while find_all() returns a list of all occurrences.
The example below finds all <img> tags in the HTML, iterates over them, and reads each src attribute using the get() method.
img_tags = soup.find_all("img")
for img_tag in img_tags:
    src = img_tag.get("src")
    print(src)
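Two practical details are worth a sketch here. The get("src") call returns None for an <img> tag that has no src attribute, and src values are often relative paths rather than full URLs. The example below, which uses a hypothetical helper name extract_image_urls and a made-up sample document, skips tags without src and resolves the rest against the page URL with urljoin from the standard library's urllib.parse module:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_image_urls(html, base_url):
    """Return absolute URLs for every <img> tag that has a src attribute.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img_tag in soup.find_all("img"):
        src = img_tag.get("src")  # None when the attribute is missing
        if not src:
            continue  # skip tags with no (or an empty) src
        # Resolve relative paths against the page URL; absolute
        # URLs are returned unchanged by urljoin.
        urls.append(urljoin(base_url, src))
    return urls


# Made-up sample markup and base URL, used only to exercise the helper.
sample_html = """
<img src="/images/cat.png">
<img alt="no src attribute">
<img src="https://cdn.example.com/dog.jpg">
"""

image_urls = extract_image_urls(sample_html, "https://example.com/gallery/")
print(image_urls)
# prints: ['https://example.com/images/cat.png', 'https://cdn.example.com/dog.jpg']
```

In a real script, base_url would normally be the same URL that was passed to requests.get().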