Beautiful Soup is a Python library for parsing HTML and XML documents. In this article, we will see how to find the <img> tags in a web page and read their src attributes using Beautiful Soup.

Install the library

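If you don’t already have Beautiful Soup installed, use the following command to install it with pip. The examples in this article also use the requests library, which can be installed the same way (pip install requests).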

pip install beautifulsoup4

After installing the library, we can parse the <img> tag by following the steps below:

Import required libraries

We need to import the BeautifulSoup class from the bs4 package, along with the requests library, which we use to make an HTTP request and retrieve the HTML content of a web page.

from bs4 import BeautifulSoup
import requests

Retrieve the HTML content

Now, we need to retrieve the HTML content from the web page where the <img> tag is located. Use the requests library to make an HTTP GET request and obtain the HTML content.

See the following example of how to retrieve the HTML content:

url = "https://example.com"  # Replace with the URL of the web page
response = requests.get(url)
html_content = response.text

In the above example, replace the URL with the actual URL of the web page you want to scrape.

The html_content variable now holds the full HTML of the requested page as a string.
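A request can fail or hang, so in practice it can help to wrap the retrieval step in a small function. The following is a minimal sketch, not part of the original example: the function name fetch_html and the 10-second default timeout are assumptions chosen for illustration. It uses the timeout argument of requests.get() and the raise_for_status() method of the response, which raises an exception for 4xx and 5xx status codes.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as a string.

    fetch_html and the default timeout are illustrative choices,
    not something the requests library prescribes.
    """
    # Without a timeout, requests can wait indefinitely for a slow server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently returning an error page.
    response.raise_for_status()
    return response.text
```

With this helper, the retrieval step becomes html_content = fetch_html("https://example.com"), and a failed request surfaces as an exception rather than as unexpected HTML.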

Find and extract the src attribute from the img tag

Now that we have the HTML content, we can create a Beautiful Soup object by passing in the HTML content and the name of the parser we want to use. Beautiful Soup supports the following parsers:

  1. html.parser
  2. lxml
  3. html5lib

In the example below, we create a Beautiful Soup object using html.parser, which ships with Python’s standard library (lxml and html5lib must be installed separately):

soup = BeautifulSoup(html_content, "html.parser")
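If you are not sure which parsers are installed, one option is to try them in order of preference. The sketch below assumes a hypothetical helper named make_soup and an arbitrary preference order. It relies on Beautiful Soup raising FeatureNotFound when the requested parser is not available.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, preferred=("lxml", "html5lib", "html.parser")):
    """Return (soup, parser_name) using the first parser that is installed.

    make_soup and the preference order are illustrative assumptions.
    """
    for parser in preferred:
        try:
            return BeautifulSoup(html, parser), parser
        except FeatureNotFound:
            # This parser is not installed; try the next one.
            continue
    raise RuntimeError("None of the requested parsers is available")


demo_soup, parser_used = make_soup("<p><img src='a.png'></p>")
print(parser_used)            # depends on which parsers are installed
print(demo_soup.img["src"])
```

Because html.parser is part of the standard library, the last entry in the tuple acts as a fallback that should always be available.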

To extract the src attribute from the <img> tags, use the find() or find_all() methods provided by Beautiful Soup. The find() method returns the first occurrence of the specified tag (or None if there is no match), while find_all() returns a list of all occurrences.

In the example below, we find all <img> tags in the HTML, iterate over them, and read each src attribute using the get() method, which returns None when the attribute is missing.

img_tags = soup.find_all("img")

for img_tag in img_tags:
    src = img_tag.get("src")
    print(src)
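To make the difference between find() and find_all() concrete, here is a small self-contained sketch that runs on an inline HTML string instead of a downloaded page. The sample markup and variable names are made up for illustration. It also shows one way to skip <img> tags that have no src attribute, by passing src=True as an attribute filter to find_all().

```python
from bs4 import BeautifulSoup

# Made-up markup: the second <img> deliberately has no src attribute.
sample_html = """
<div>
  <img src="/images/logo.png" alt="Logo">
  <img alt="Placeholder without a src">
  <img src="https://example.com/photo.jpg">
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find() returns only the first matching tag, or None if nothing matches.
first_img = soup.find("img")
first_src = first_img.get("src") if first_img is not None else None

# get() returns None for tags that lack the attribute.
all_srcs = [tag.get("src") for tag in soup.find_all("img")]

# src=True keeps only the tags that actually have a src attribute.
present_srcs = [tag["src"] for tag in soup.find_all("img", src=True)]

print(first_src)     # /images/logo.png
print(all_srcs)      # ['/images/logo.png', None, 'https://example.com/photo.jpg']
print(present_srcs)  # ['/images/logo.png', 'https://example.com/photo.jpg']
```

Note that src values are returned exactly as written in the page, so relative paths such as /images/logo.png are not turned into full URLs.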
