When working with complex web pages, we may need to locate and manipulate specific elements within a particular section or div.
In this article, we will see how to use Beautiful Soup to find children within a specific div.
Install Beautiful Soup
You can use the following command to install Beautiful Soup.
pip install beautifulsoup4
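The package is installed as beautifulsoup4 but imported as bs4, which often trips up new users. A quick check along these lines confirms that the install worked. This is a minimal sketch that assumes the pip command above succeeded in the same Python environment.

```python
# The package is installed as "beautifulsoup4" but imported as "bs4".
import bs4
from bs4 import BeautifulSoup

# Prints the installed version string; the exact value depends on the release pip installed.
print(bs4.__version__)

# Parse a tiny snippet with the built-in "html.parser" to confirm parsing works.
print(BeautifulSoup("<p>ok</p>", "html.parser").p.text)  # ok
```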
The following is an example of HTML content to parse:
from bs4 import BeautifulSoup
html_content = """
<html>
<head>
<title>Sample Page</title>
</head>
<body>
<div class="content">
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<div class="inner-div">
<p>Inner Paragraph 1</p>
<p>Inner Paragraph 2</p>
</div>
</div>
</body>
</html>
"""
soup = BeautifulSoup(html_content, 'html.parser')
Find the div and its children
Use the find() method with the div's class or ID to locate a specific div. The Tag object it returns gives you access to everything nested inside that div.
div_element = soup.find('div', class_='content')
To search by ID instead, pass the id keyword argument in place of class_. (class_ has a trailing underscore because class is a reserved word in Python.)
In the above example, div_element holds the div with the class "content".
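The ID-based search described above, along with two related details, can be sketched as follows. The markup here is hypothetical: the article's sample has no id attributes, so a div with id="main" is added purely for illustration, and "sidebar" is a made-up id that deliberately does not exist. The sketch also shows that find() returns None when nothing matches, and that select_one() with a CSS selector can express the same lookups (select_one relies on the soupsieve package, which pip installs as a dependency of beautifulsoup4).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a div with id="main" is added for illustration only.
html = """
<div id="main"><p>First</p><p>Second</p></div>
<div class="content"><p>Other</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

# ID-based search: pass id= instead of class_.
main_div = soup.find("div", id="main")
print([p.text for p in main_div.find_all("p")])  # ['First', 'Second']

# find() returns None when nothing matches ("sidebar" is a made-up id),
# so check the result before calling methods on it.
sidebar = soup.find("div", id="sidebar")
if sidebar is None:
    print("No div with id='sidebar'")

# The same lookups written as CSS selectors.
content_div = soup.select_one("div.content")
print(content_div.p.text)  # Other
print(soup.select_one("div#main").get("id"))  # main
```

Using find() with keyword arguments keeps the lookup explicit, while CSS selectors become more convenient once the target is described by several nested conditions.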
Extract the child elements
Once we have the div, we can use Beautiful Soup's find() and find_all() methods on it to access the elements it contains.
paragraphs = div_element.find_all('p')  # Find all <p> tags within the div
for paragraph in paragraphs:
    print(paragraph.text)  # Print the text inside each <p>
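The above code prints the text of every <p> element inside the specified div: Paragraph 1, Paragraph 2, Inner Paragraph 1, and Inner Paragraph 2. Because find_all() searches all descendants by default, the paragraphs nested in the inner div are included as well.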
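When only the div's direct children are wanted, rather than every nested descendant, Beautiful Soup offers two options: passing recursive=False to find_all(), or iterating over the div's .children attribute. The sketch below compares them on the same sample markup, trimmed to the relevant div. Note that .children also yields the whitespace text nodes between tags, so the sketch keeps only Tag objects.

```python
from bs4 import BeautifulSoup, Tag

html_content = """
<div class="content">
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<div class="inner-div">
<p>Inner Paragraph 1</p>
<p>Inner Paragraph 2</p>
</div>
</div>
"""
soup = BeautifulSoup(html_content, "html.parser")
div_element = soup.find("div", class_="content")

# Default find_all(): every matching descendant, at any depth.
all_paragraphs = [p.text for p in div_element.find_all("p")]

# recursive=False: only <p> tags that are direct children of the div.
top_level = [p.text for p in div_element.find_all("p", recursive=False)]

# .children yields direct children only, including whitespace strings,
# so filter down to Tag objects before reading their names.
child_tags = [child.name for child in div_element.children if isinstance(child, Tag)]

print(all_paragraphs)  # ['Paragraph 1', 'Paragraph 2', 'Inner Paragraph 1', 'Inner Paragraph 2']
print(top_level)       # ['Paragraph 1', 'Paragraph 2']
print(child_tags)      # ['p', 'p', 'div']
```

recursive=False is the shorter choice when you are filtering by tag name. .children is the better fit when you need every direct child regardless of its type.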