When working with complex web pages, we may need to locate and manipulate specific elements within a particular section or div.

In this article, we will see how to use Beautiful Soup to find children within a specific div.

Install Beautiful Soup

You can use the following command to install Beautiful Soup.

pip install beautifulsoup4

The following is an example of HTML content to parse:

from bs4 import BeautifulSoup

html_content = """
<html>
<head>
<title>Sample Page</title>
</head>
<body>
<div class="content">
    <p>Paragraph 1</p>
    <p>Paragraph 2</p>
    <div class="inner-div">
        <p>Inner Paragraph 1</p>
        <p>Inner Paragraph 2</p>
    </div>
</div>
</body>
</html>
"""

soup = BeautifulSoup(html_content, 'html.parser')

Find the div and its children

You can use the find() method along with the class or ID of the div to locate a specific div and its children.

div_element = soup.find('div', class_='content')  

For an ID-based search, pass the id argument instead of class_.

In the example above, div_element holds the div with the class "content". If no matching div exists, find() returns None.
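As a minimal sketch of the ID-based variant, the snippet below looks up a div by its id and also shows what happens when nothing matches. The markup and the IDs "main" and "sidebar" are assumptions made up for this illustration; they are not part of the sample page above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the id "main" is an assumption for illustration.
html = '<div id="main"><p>First</p><p>Second</p></div>'
soup = BeautifulSoup(html, "html.parser")

main_div = soup.find("div", id="main")        # ID-based lookup
missing_div = soup.find("div", id="sidebar")  # no element has this id

# Collect the text of each <p> inside the div that was found.
main_texts = [p.get_text() for p in main_div.find_all("p")]
print(main_texts)

# find() returns None when there is no match, so check before using the result.
print(missing_div is None)
```

Checking for None before calling methods on the result avoids an AttributeError on pages where the div may be absent.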

Extract the child elements

Once we have the div, we can use Beautiful Soup methods such as find() and find_all() to access its children.

paragraphs = div_element.find_all('p')  # Find all <p> tags within the div

for paragraph in paragraphs:
    print(paragraph.text)

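The above code extracts and prints the text content of all the <p> elements within the specified div. Because find_all() searches recursively by default, this includes the two paragraphs nested inside the inner div.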
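If only the direct children of the div are wanted, one option is to pass recursive=False to find_all(), iterate over .children, or use a CSS child selector with select(). The sketch below reuses the body of the sample page to compare these approaches; the variable names are chosen for this illustration only.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html_content = """
<div class="content">
    <p>Paragraph 1</p>
    <p>Paragraph 2</p>
    <div class="inner-div">
        <p>Inner Paragraph 1</p>
        <p>Inner Paragraph 2</p>
    </div>
</div>
"""
soup = BeautifulSoup(html_content, "html.parser")
div_element = soup.find("div", class_="content")

# Default behaviour: every descendant <p>, nested ones included.
all_paragraphs = [p.get_text() for p in div_element.find_all("p")]

# recursive=False restricts the search to direct children of the div.
direct_paragraphs = [p.get_text() for p in div_element.find_all("p", recursive=False)]

# .children yields every direct child node, including whitespace strings,
# so keep only Tag objects.
child_tags = [child.name for child in div_element.children if isinstance(child, Tag)]

# The CSS child combinator ">" also matches direct children only.
selected = [p.get_text() for p in soup.select("div.content > p")]

print(all_paragraphs)
print(direct_paragraphs)
print(child_tags)
print(selected)
```

Which form to use is largely a matter of taste: recursive=False stays close to the find_all() call already shown, while select() is convenient when the rest of the code already relies on CSS selectors.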

References:

https://pypi.org/project/beautifulsoup4/
