Beautiful Soup is a widely used Python library for scraping content from web pages. It is one of the most popular choices for this task in Python.

It is used to extract data from HTML and XML files. Beautiful Soup builds a parse tree from the document using the parser you choose, and you can then traverse, search, and modify that tree as your use case requires.
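As a rough illustration of those three operations, here is a minimal sketch. The sample HTML, the "intro" class name, and the variable names are made up for this example, and it assumes the beautifulsoup4 package is installed and that Python's built-in "html.parser" is used as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration
html = "<html><body><h1>Sample article</h1><p class='intro'>Hello</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Traverse: walk down the tree by tag name
heading_text = soup.body.h1.string

# Search: look up an element by tag name and class
intro = soup.find("p", class_="intro")

# Modify: replace the text inside the matched element
intro.string = "Hello, world"

print(heading_text)
print(intro.get_text())
```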

Extract DIV contents from HTML using Beautiful Soup

In HTML, an “id” is meant to be unique within the entire document, so the same “id” should not be used more than once. If a document does contain a duplicate “id”, it still works, but a lookup by that id may not return the element that we need.
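The effect of a duplicate id can be seen in a small sketch using Beautiful Soup's find() and find_all() lookups. The markup and the id "dup" are invented for this example, and it assumes the beautifulsoup4 package with the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical markup in which two elements wrongly share one id
html = '<div id="dup">first</div><div id="dup">second</div>'
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match in document order,
# so the second div is silently ignored
first = soup.find(id="dup")

# find_all() returns every element carrying that id
all_matches = soup.find_all(id="dup")

print(first.get_text())
print(len(all_matches))
```

If the element you wanted was the second div, a find() by id would hand back the wrong one without any warning, which is why keeping ids unique matters.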

We can give an HTML <div> a unique id and then fetch the div and its contents using that id.

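from bs4 import BeautifulSoup  # Beautiful Soup 4 (package name: beautifulsoup4)
# Parse the sample document with Python's built-in HTML parser
soup = BeautifulSoup('<html> <body><h1> This is sample article </h1> <div id="divElement"> ... </div> </body></html>', 'html.parser')
# Return the first <div> whose id is "divElement", or None if there is no match
soup.find("div", {"id": "divElement"})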

soup.find() returns the first DOM element that matches the given “id”. If you need all the DOM elements that use the id divElement, use find_all() instead, in either of the following forms.

soup.find_all('div', id="divElement")
  (or)
soup.find_all(id="divElement")
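Once the div has been found, its contents still need to be read out. The following is a minimal sketch of one way to do that. The two paragraphs inside the div are made-up sample content, and it assumes the beautifulsoup4 package with the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the paragraphs inside the div are invented
html = (
    '<html><body><h1>This is sample article</h1>'
    '<div id="divElement"><p>First paragraph</p><p>Second paragraph</p></div>'
    '</body></html>'
)
soup = BeautifulSoup(html, "html.parser")

div = soup.find("div", id="divElement")

# find() returns None when nothing matches, so guard before using the result
if div is not None:
    # Plain text of the div, with child texts joined by a space
    text = div.get_text(separator=" ", strip=True)
    # Inner HTML of the div, without the <div> tag itself
    inner_html = div.decode_contents()
    # Text of each <p> nested inside the div
    paragraphs = [p.get_text() for p in div.find_all("p")]
    print(text)
    print(inner_html)
    print(paragraphs)
```

get_text() is usually enough when only the visible text is needed, while decode_contents() keeps the nested markup intact.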
