Beautiful Soup is a Python library for parsing HTML and XML documents. In this article, we will see how to find the <script> tags in a web page and extract their contents using Beautiful Soup.

Install the library

If Beautiful Soup is not already installed, use the following command to install it with pip. The examples below also use the requests library, so the command installs it as well.

pip install beautifulsoup4 requests

After installing the library, we can parse the <script> tag by following the steps below:

Import required libraries

We need to import the BeautifulSoup class from the bs4 package, along with the requests library, which we use to make an HTTP request and retrieve the HTML content of a web page.

from bs4 import BeautifulSoup
import requests

Retrieve the HTML content

Now, we need to retrieve the HTML content from the web page where the <script> tag is located. Use the requests library to make an HTTP GET request and obtain the HTML content.

See the following example of how to retrieve the HTML content:

url = "https://example.com"  # Replace with the URL of the web page
response = requests.get(url)
html_content = response.text

In the above example, replace the URL with the actual URL of the web page you want to scrape.

The html_content variable now contains the full HTML of that page as a string.
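The snippet above does not set a timeout or check the HTTP status code, so a slow server can stall the script and an error page can end up being parsed as if it were the real content. A slightly more defensive version might look like the sketch below. The helper name fetch_html and the 10-second timeout are illustrative assumptions, not part of the original example.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the body of `url` as text (hypothetical helper for illustration)."""
    # A timeout keeps the call from hanging if the server never answers.
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx responses into a requests.HTTPError instead of
    # silently returning the HTML of an error page.
    response.raise_for_status()
    return response.text
```

Calling html_content = fetch_html("https://example.com") would then produce the same string as before, or raise an exception if the request fails.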

Find & extract the script tag

Now that we have the HTML content, we can create a Beautiful Soup object by passing it the HTML content and the name of the parser we want to use. Beautiful Soup supports the following parsers:

  1. html.parser (included with Python)
  2. lxml (installed separately)
  3. html5lib (installed separately)

In the example below, we create a Beautiful Soup object using html.parser:

soup = BeautifulSoup(html_content, "html.parser")
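Since the parser is just the second argument, switching parsers is a one-word change. The sketch below illustrates this with a small inline HTML string (an assumption, so that it runs without a network connection) and a hypothetical helper, make_soup, that tries lxml first and falls back to html.parser when lxml is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample page used only for illustration.
SAMPLE_HTML = """
<html>
  <head><title>Demo</title></head>
  <body><script>var answer = 42;</script></body>
</html>
"""


def make_soup(html: str, parsers=("lxml", "html.parser")) -> BeautifulSoup:
    """Parse `html` with the first parser in `parsers` that is available."""
    for name in parsers:
        try:
            return BeautifulSoup(html, name)
        except FeatureNotFound:
            # Beautiful Soup raises FeatureNotFound when the requested
            # parser library is not installed, so try the next one.
            continue
    raise RuntimeError("none of the requested parsers is installed")


soup = make_soup(SAMPLE_HTML)
print(soup.title.string)
```

Different parsers can build slightly different trees from malformed HTML, so it is usually worth sticking to one parser within a project.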

We can use the find() or find_all() methods to locate <script> tags. The find() method returns the first occurrence of the specified tag (or None if there is no match), whereas find_all() returns a list of all occurrences.

The following example finds all <script> tags in the HTML:

script_tags = soup.find_all("script")

In the above example, the script_tags variable contains the list of <script> tags found in the HTML.
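To make the difference between find() and find_all() concrete, here is a minimal sketch that runs against a made-up inline page (the SAMPLE_HTML string and its contents are assumptions for illustration). It also shows that both methods can filter on attributes, which helps when a page contains many <script> tags.

```python
from bs4 import BeautifulSoup

# Hypothetical page with one external and two inline scripts.
SAMPLE_HTML = """
<html>
  <head>
    <script src="/static/app.js"></script>
    <script type="application/ld+json">{"name": "Demo"}</script>
  </head>
  <body>
    <script>console.log("hello");</script>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# First <script> tag in document order, or None if there is none.
first_script = soup.find("script")
# Every <script> tag, as a list-like collection.
all_scripts = soup.find_all("script")
# Only the tags that carry a src attribute (external scripts).
external = soup.find_all("script", src=True)
# Only the tag whose type attribute has this exact value.
json_ld = soup.find("script", attrs={"type": "application/ld+json"})
# A tag that does not occur in the sample, so find() returns None.
missing = soup.find("iframe")

print(len(all_scripts))
print(first_script["src"])
print([tag["src"] for tag in external])
print(json_ld.string)
print(missing)
```

Because find() can return None, it is worth checking the result before accessing attributes on it.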

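Now iterate over the script_tags list and read the contents of each script tag using the text attribute. For an external script, which only points to a file through its src attribute, the text is an empty string.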

for script_tag in script_tags:
    script_data = script_tag.text
    print(script_data)
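The contents of a script tag come back as a plain string, so any further processing is up to you. As one illustration, pages often embed JSON in a <script type="application/ld+json"> tag. The sketch below, which uses a made-up inline page, skips external scripts and decodes that JSON with the standard json module. It reads the contents through the string attribute, which returns the tag's single text node or None if the tag is empty.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical page used only for illustration.
SAMPLE_HTML = """
<script src="/static/app.js"></script>
<script type="application/ld+json">{"name": "Demo", "price": 9.5}</script>
<script>var greeting = "hello";</script>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

inline_sources = []
for script_tag in soup.find_all("script"):
    if script_tag.get("src"):
        # External script: the code lives at the src URL, not in the page.
        continue
    # .string is None for an empty tag, so fall back to an empty string.
    inline_sources.append((script_tag.string or "").strip())

# Decode the embedded JSON into a regular Python dictionary.
json_tag = soup.find("script", attrs={"type": "application/ld+json"})
data = json.loads(json_tag.string)

print(inline_sources)
print(data["name"])
```

Ordinary JavaScript inside a script tag is not JSON, so json.loads only applies when the tag is known to hold a JSON document.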
