Beautifulsoup

Posted : admin On 1/26/2022

last modified July 27, 2020

Python BeautifulSoup tutorial is an introductory tutorial to the BeautifulSoup Python library. The examples find tags, traverse the document tree, modify the document, and scrape web pages.

BeautifulSoup

BeautifulSoup is a Python library for parsing HTML and XML documents. It is often used for web scraping. BeautifulSoup transforms a complex HTML document into a tree of Python objects, such as tags, navigable strings, or comments.

Installing BeautifulSoup

We use the pip3 command to install the necessary modules.

We need to install the lxml module, which BeautifulSoup uses as its parser: pip3 install lxml

BeautifulSoup itself is installed with: pip3 install beautifulsoup4

The HTML file

In the examples, we will use the following HTML file:

index.html

Python BeautifulSoup simple example

In the first example, we use the BeautifulSoup module to get three tags.

The code example prints the HTML code of three tags.

We import the BeautifulSoup class from the bs4 module. BeautifulSoup is the main class for doing work.

We open the index.html file and read its contents with the read method.

A BeautifulSoup object is created; the HTML data is passed to the constructor. The second argument specifies the parser.

Here we print the HTML code of two tags: h2 and head.

There are multiple li elements; the line prints the first one.

This is the output.
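The steps above can be sketched as follows. This is a minimal sketch under two assumptions: the HTML string is a stand-in for index.html (the tutorial's real file is not shown here), and html.parser is swapped in for lxml because it ships with Python.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Assumed stand-in for index.html; the real file is not reproduced in this text.
HTML = """<!DOCTYPE html>
<html>
<head>
<title>Header</title>
<meta charset="utf-8">
</head>
<body>
<h2>Operating systems</h2>
<ul id="mylist">
<li>Solaris</li>
<li>FreeBSD</li>
<li>Debian</li>
<li>NetBSD</li>
<li>Windows</li>
</ul>
<p>FreeBSD is an advanced computer operating system.</p>
<p>Debian is a Unix-like computer operating system.</p>
</body>
</html>
"""

# Write the stand-in to a temporary index.html so it can be opened and read.
path = Path(tempfile.mkdtemp()) / "index.html"
path.write_text(HTML, encoding="utf-8")

with open(path, "r", encoding="utf-8") as f:
    contents = f.read()

# Assumption: html.parser instead of lxml, to avoid an extra dependency.
soup = BeautifulSoup(contents, "html.parser")

print(soup.h2)    # the first h2 tag
print(soup.head)  # the head tag with its children
print(soup.li)    # the first of several li tags
```

Accessing a tag name as an attribute (soup.li) returns only the first matching tag.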

BeautifulSoup tags, name, text

The name attribute of a tag gives its name and the text attribute its text content.

tags_names.py

The code example prints the HTML code, name, and text of the h2 tag.

This is the output.
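A minimal sketch of reading a tag's name and text, assuming a one-line inline document instead of index.html and html.parser instead of lxml:

```python
from bs4 import BeautifulSoup

# Assumed inline document standing in for index.html.
html = "<html><body><h2>Operating systems</h2></body></html>"
soup = BeautifulSoup(html, "html.parser")

h2 = soup.h2
print(f"HTML: {h2}, name: {h2.name}, text: {h2.text}")
```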

BeautifulSoup traverse tags

With the recursiveChildGenerator method, we traverse the HTML document.

The example goes through the document tree and prints the names of all HTML tags.

In the HTML document we have these tags.
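The traversal can be sketched as follows. Note the swap: this sketch iterates over the descendants attribute, which is the current spelling in Beautiful Soup 4; recursiveChildGenerator is the older name for the same traversal. The inline document is an assumed stand-in for index.html.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumed inline document standing in for index.html.
html = (
    "<html><head><title>Header</title></head>"
    "<body><h2>Operating systems</h2><p>Text</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# descendants yields tags and text nodes in document order; keep only tags.
tag_names = [node.name for node in soup.descendants if isinstance(node, Tag)]
print(tag_names)
```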

BeautifulSoup element children

With the children attribute, we can get the children of a tag.

get_children.py

The example retrieves the children of the html tag, places them into a Python list, and prints them to the console. Since the children attribute also returns the whitespace between the tags, we add a condition to include only the tag names.

The html tag has two children: head and body.
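A minimal sketch of this filtering, assuming a small inline document with line breaks between the tags (so that whitespace text nodes appear among the children) and html.parser instead of lxml:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumed inline document; the newlines become whitespace text nodes.
html = """<html>
<head><title>Header</title></head>
<body><h2>Operating systems</h2></body>
</html>"""
soup = BeautifulSoup(html, "html.parser")

root = soup.html
# Skip the whitespace strings and keep only real tags.
root_children = [child.name for child in root.children if isinstance(child, Tag)]
print(root_children)
```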

BeautifulSoup element descendants

With the descendants attribute, we get all descendants (children of all levels) of a tag.

The example retrieves all descendants of the body tag.

These are all the descendants of the body tag.
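A minimal sketch of collecting the descendants of body, assuming an inline document in place of index.html and html.parser in place of lxml:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumed inline document standing in for index.html.
html = (
    "<html><body><h2>Operating systems</h2>"
    '<ul id="mylist"><li>Solaris</li><li>FreeBSD</li></ul>'
    "<p>Text</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Unlike children, descendants also reaches the nested li tags.
body_descendants = [n.name for n in soup.body.descendants if isinstance(n, Tag)]
print(body_descendants)
```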

BeautifulSoup web scraping

Requests is a simple Python HTTP library. It provides methods for accessing web resources via HTTP.

scraping.py

The example retrieves the title of a simple web page. It also prints its parent.

We get the HTML data of the page.

We retrieve the HTML code of the title, its text, and the HTML code of its parent.

This is the output.
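The scraping steps can be sketched as follows. The helper names fetch_page and title_info are hypothetical, the sample markup is an assumption, and the page the tutorial actually scraped is not named in this text, so the URL is left as a parameter. html.parser is used in place of lxml.

```python
from bs4 import BeautifulSoup


def fetch_page(url):
    """Download a page with Requests and return its raw bytes (hypothetical helper)."""
    # Imported here so the parsing helper below works without Requests installed.
    import requests

    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.content


def title_info(markup):
    """Return the title's HTML, its text, and the HTML of its parent."""
    soup = BeautifulSoup(markup, "html.parser")
    title = soup.title
    return str(title), title.text, str(title.parent)


# Assumed sample markup, used so the sketch runs without network access.
sample = b"<html><head><title>Sample page</title></head><body>Hello</body></html>"
title_html, title_text, parent_html = title_info(sample)
print(title_html)
print(title_text)
print(parent_html)
```

To scrape a live page, the two helpers would be combined as title_info(fetch_page(url)) with a URL of your choice.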

BeautifulSoup prettify code

With the prettify method, we can make the HTML code look better.

We prettify the HTML code of a simple web page.

This is the output.
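A minimal sketch of prettify, assuming a one-line inline document rather than a downloaded page and html.parser in place of lxml:

```python
from bs4 import BeautifulSoup

# Assumed inline document; everything sits on a single line.
html = "<html><body><h2>Operating systems</h2></body></html>"
soup = BeautifulSoup(html, "html.parser")

# prettify returns a string with one tag or text node per line, indented by depth.
pretty = soup.prettify()
print(pretty)
```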

BeautifulSoup scraping with built-in web server

We can also serve HTML pages with a simple built-in HTTP server.

We create a public directory and copy index.html there.

Then we start the Python HTTP server.

scraping2.py

Now we get the document from the locally running server.
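The whole round trip can be sketched in one script. This is a sketch under several assumptions: the server is started in-process from the standard http.server module instead of from the command line, the port is picked by the operating system, the standard urllib is swapped in for Requests, and the page content is a stand-in for index.html.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path

from bs4 import BeautifulSoup

# Create a public directory and place a stand-in index.html in it.
public = Path(tempfile.mkdtemp()) / "public"
public.mkdir()
(public / "index.html").write_text(
    "<html><body><h2>Operating systems</h2></body></html>", encoding="utf-8"
)

# Serve the public directory on a free local port in a background thread.
handler = functools.partial(
    http.server.SimpleHTTPRequestHandler, directory=str(public)
)
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

try:
    port = server.server_address[1]
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://127.0.0.1:{port}/index.html", timeout=5) as resp:
        soup = BeautifulSoup(resp.read(), "html.parser")
finally:
    server.shutdown()
    server.server_close()

heading = soup.h2.text
print(heading)
```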

BeautifulSoup find elements by Id

With the find method, we can find elements by various means, including the element id.

The code example finds the ul tag that has the mylist id. The commented line is an alternative way of doing the same task.
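A minimal sketch of the lookup by id, assuming an inline document with two lists in place of index.html and html.parser in place of lxml:

```python
from bs4 import BeautifulSoup

# Assumed inline document; only the first list carries the id.
html = (
    '<body><ul id="mylist"><li>Solaris</li><li>FreeBSD</li></ul>'
    "<ul><li>Other</li></ul></body>"
)
soup = BeautifulSoup(html, "html.parser")

ul = soup.find("ul", attrs={"id": "mylist"})
# ul = soup.find("ul", id="mylist")  # alternative way of doing the same task
print(ul)
```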

BeautifulSoup find all tags

With the find_all method, we can find all elements that meet some criteria.

find_all.py

The code example finds and prints all li tags.

This is the output.
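A minimal sketch of find_all with a tag name, assuming an inline list in place of index.html and html.parser in place of lxml:

```python
from bs4 import BeautifulSoup

# Assumed inline document standing in for index.html.
html = '<ul id="mylist"><li>Solaris</li><li>FreeBSD</li><li>Debian</li></ul>'
soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag, in document order.
li_tags = soup.find_all("li")
for tag in li_tags:
    print(tag)
```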

The find_all method can take a list of elements to search for.

The example finds all h2 and p elements and prints their text.
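A minimal sketch of passing a list of tag names, assuming an inline document in place of index.html and html.parser in place of lxml:

```python
from bs4 import BeautifulSoup

# Assumed inline document standing in for index.html.
html = (
    "<body><h2>Operating systems</h2>"
    "<ul><li>Solaris</li></ul>"
    "<p>First paragraph.</p><p>Second paragraph.</p></body>"
)
soup = BeautifulSoup(html, "html.parser")

# A list matches any of the given tag names.
texts = [tag.text for tag in soup.find_all(["h2", "p"])]
print(texts)
```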

The find_all method can also take a function which determines what elements should be returned.

find_by_fun.py

The example prints empty elements.

The only empty element in the document is meta.
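A minimal sketch of filtering with a function. The predicate name is_empty is hypothetical, the inline document is an assumed stand-in for index.html, and html.parser is used in place of lxml. The predicate relies on the is_empty_element attribute of a tag.

```python
from bs4 import BeautifulSoup


def is_empty(tag):
    """Return True for void tags such as meta or br that have no content."""
    return tag.is_empty_element


# Assumed inline document; meta is its only void element.
html = (
    '<html><head><title>Header</title><meta charset="utf-8"></head>'
    "<body><h2>Operating systems</h2><p>Text</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# find_all calls the function once per tag and keeps those returning True.
empty_tags = soup.find_all(is_empty)
print(empty_tags)
```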

It is also possible to find elements by using regular expressions.

The example prints the content of elements that contain the string 'BSD'.

This is the output.
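A minimal sketch of a regular-expression search, assuming an inline document in place of index.html and html.parser in place of lxml. It passes a compiled pattern through the string argument of find_all, which matches text nodes rather than tags.

```python
import re

from bs4 import BeautifulSoup

# Assumed inline document standing in for index.html.
html = (
    "<body><ul><li>Solaris</li><li>FreeBSD</li><li>NetBSD</li></ul>"
    "<p>FreeBSD is an advanced computer operating system.</p>"
    "<p>Debian is a Unix-like computer operating system.</p></body>"
)
soup = BeautifulSoup(html, "html.parser")

# Each result is a text node whose content contains 'BSD'.
matches = [str(s) for s in soup.find_all(string=re.compile("BSD"))]
for text in matches:
    print(text)
```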

BeautifulSoup CSS selectors

With the select and select_one methods, we can use some CSS selectors to find elements.

select_nth_tag.py

This example uses a CSS selector to print the HTML code of the third li element.

This is the third li element.
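A minimal sketch of selecting the third li element, assuming an inline list in place of index.html, html.parser in place of lxml, and the nth-of-type selector as one possible way to express "third":

```python
from bs4 import BeautifulSoup

# Assumed inline document standing in for index.html.
html = (
    '<ul id="mylist"><li>Solaris</li><li>FreeBSD</li>'
    "<li>Debian</li><li>NetBSD</li></ul>"
)
soup = BeautifulSoup(html, "html.parser")

# select always returns a list, even when a single element matches.
third = soup.select("li:nth-of-type(3)")
print(third)
```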

The # character is used in CSS to select tags by their id attributes.

The example prints the element that has the mylist id.
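A minimal sketch of an id selector, assuming an inline document with two lists in place of index.html and html.parser in place of lxml:

```python
from bs4 import BeautifulSoup

# Assumed inline document; only the first list carries the id.
html = (
    '<body><ul id="mylist"><li>Solaris</li></ul>'
    "<ul><li>Other</li></ul></body>"
)
soup = BeautifulSoup(html, "html.parser")

# select_one returns the first match, or None when nothing matches.
mylist = soup.select_one("#mylist")
print(mylist)
```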

BeautifulSoup append element

The append method appends a new tag to the HTML document.

append_tag.py

The example appends a new li tag.

First, we create a new tag with the new_tag method.

We get the reference to the ul tag.

We append the newly created tag to the ul tag.

We print the ul tag in a neat format.
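The four steps above can be sketched as follows, assuming an inline list in place of index.html, html.parser in place of lxml, and 'OpenBSD' as a hypothetical text for the new item:

```python
from bs4 import BeautifulSoup

# Assumed inline document standing in for index.html.
html = '<ul id="mylist"><li>Solaris</li><li>FreeBSD</li></ul>'
soup = BeautifulSoup(html, "html.parser")

# Create a new tag; 'OpenBSD' is a hypothetical value for illustration.
new_li = soup.new_tag("li")
new_li.string = "OpenBSD"

# Get a reference to the ul tag and append the new tag as its last child.
ul = soup.ul
ul.append(new_li)

print(ul.prettify())
```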

BeautifulSoup insert element

The insert method inserts a tag at the specified location.

The example inserts a li tag at the third position into the ul tag.
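A minimal sketch of insert, assuming an inline list without whitespace between the items, html.parser in place of lxml, and 'OpenBSD' as a hypothetical text for the new item. The position counts all children of the tag, including whitespace text nodes, which is why the stand-in markup is written on one line.

```python
from bs4 import BeautifulSoup

# Assumed inline document; no whitespace between the li tags.
html = '<ul id="mylist"><li>Solaris</li><li>FreeBSD</li><li>Debian</li></ul>'
soup = BeautifulSoup(html, "html.parser")

new_li = soup.new_tag("li")
new_li.string = "OpenBSD"  # hypothetical value for illustration

# Index 2 is the third position, because positions start at 0.
ul = soup.ul
ul.insert(2, new_li)

items = [li.text for li in ul.find_all("li")]
print(items)
```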

BeautifulSoup replace text

The replace_with method replaces the text of an element.

replace_text.py

The example finds a specific element with the find method and replaces its content with the replace_with method.
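A minimal sketch of replacing text, assuming an inline list in place of index.html and html.parser in place of lxml. Both the searched text 'Windows' and the replacement 'OpenBSD' are hypothetical values chosen for illustration.

```python
from bs4 import BeautifulSoup

# Assumed inline document standing in for index.html.
html = '<ul id="mylist"><li>Solaris</li><li>Windows</li></ul>'
soup = BeautifulSoup(html, "html.parser")

# With the string argument, find returns the matching text node itself.
target = soup.find(string="Windows")
target.replace_with("OpenBSD")

items = [li.text for li in soup.find_all("li")]
print(items)
```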

BeautifulSoup remove element

The decompose method removes a tag from the tree and destroys it.

The example removes the second p element.
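A minimal sketch of decompose, assuming an inline document with two paragraphs in place of index.html and html.parser in place of lxml. Picking the second p by indexing the find_all result is one possible approach; it is not necessarily how the original example selected it.

```python
from bs4 import BeautifulSoup

# Assumed inline document standing in for index.html.
html = "<body><p>First paragraph.</p><p>Second paragraph.</p></body>"
soup = BeautifulSoup(html, "html.parser")

# Pick the second p element and remove it from the tree.
second_p = soup.find_all("p")[1]
second_p.decompose()

remaining = [p.text for p in soup.find_all("p")]
print(remaining)
```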

In this tutorial, we have worked with the Python BeautifulSoup library.
