Beautifulsoup
Posted: admin on 1/26/2022
last modified July 27, 2020
Python BeautifulSoup tutorial is an introductory tutorial to the BeautifulSoup Python library. The examples find tags, traverse the document tree, modify the document, and scrape web pages.
BeautifulSoup
BeautifulSoup is a Python library for parsing HTML and XML documents. It is often used for web scraping. BeautifulSoup transforms a complex HTML document into a complex tree of Python objects, such as Tag, NavigableString, or Comment.
Installing BeautifulSoup
We use the pip3 command to install the necessary modules. The lxml module, which BeautifulSoup can use as its HTML parser, is installed with pip3 install lxml. BeautifulSoup itself is installed with pip3 install beautifulsoup4.
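To confirm that the installation worked, a quick check such as the following sketch can be run. It only asks Python whether the bs4 and lxml modules can be located, without importing them; the check_modules helper is our own, not part of either library.

```python
import importlib.util


def check_modules(names=("bs4", "lxml")):
    # Map each module name to True if Python can locate it (nothing is imported)
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print(check_modules())
```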
The HTML file
In the examples, we will use an HTML file named index.html.
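As a hypothetical stand-in for index.html, pieced together only from what the later examples say about the file (a title and a meta tag in the head, an h2 heading, a ul list with the mylist id whose items include FreeBSD and NetBSD, and two p paragraphs), the sketch below writes such a document to disk. The SAMPLE_HTML text and the write_sample helper are our own assumptions, not the tutorial's original file.

```python
from pathlib import Path

# Hypothetical stand-in for index.html, not the tutorial's original markup
SAMPLE_HTML = """<!DOCTYPE html>
<html>
  <head>
    <title>Header</title>
    <meta charset="utf-8">
  </head>
  <body>
    <h2>Operating systems</h2>
    <ul id="mylist">
      <li>Solaris</li>
      <li>FreeBSD</li>
      <li>Debian</li>
      <li>NetBSD</li>
      <li>Windows</li>
    </ul>
    <p>FreeBSD is an advanced computer operating system.</p>
    <p>Debian is a Unix-like computer operating system.</p>
  </body>
</html>
"""


def write_sample(path="index.html"):
    # Write the stand-in document so the file-based examples have input
    target = Path(path)
    target.write_text(SAMPLE_HTML, encoding="utf-8")
    return target
```

Calling write_sample() creates index.html in the current directory.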
Python BeautifulSoup simple example
In the first example, we use the BeautifulSoup module to get three tags.
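A minimal sketch of such a script follows, assuming index.html sits in the current directory. It passes the built-in "html.parser" so that it runs without extra packages; the tutorial installs lxml, and "lxml" can be passed instead. Wrapping the steps in a show_tags function is our own choice, made so the parsed document can be returned and inspected.

```python
from bs4 import BeautifulSoup


def show_tags(path="index.html"):
    # Read the whole file into a string with read()
    with open(path, encoding="utf-8") as f:
        contents = f.read()

    # The second argument selects the parser ("lxml" if it is installed)
    soup = BeautifulSoup(contents, "html.parser")

    # Attribute access returns the first tag with that name
    print(soup.h2)
    print(soup.head)
    print(soup.li)
    return soup
```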
The code example prints HTML code of three tags.
We import the BeautifulSoup class from the bs4 module. BeautifulSoup is the main class for doing the work.
We open the index.html file and read its contents with the read method.
A BeautifulSoup object is created; the HTML data is passed to the constructor. The second argument specifies the parser.
Here we print the HTML code of two tags: h2 and head.
There are multiple li elements; the line prints the first one.
BeautifulSoup tags, name, text
The name attribute of a tag gives its name, and the text attribute gives its text content.
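The sketch below shows both attributes on a short inline string, a hypothetical stand-in for index.html, with the built-in html.parser in place of lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for index.html
html = "<html><body><h2>Operating systems</h2></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.h2)       # <h2>Operating systems</h2>
print(soup.h2.name)  # h2
print(soup.h2.text)  # Operating systems
```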
The code example prints the HTML code, name, and text of the h2 tag.
BeautifulSoup traverse tags
With the recursiveChildGenerator method, we traverse the HTML document.
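A sketch of the traversal follows, again on hypothetical inline markup with html.parser. It iterates over soup.descendants, which yields the same depth-first sequence; recursiveChildGenerator is an older name for it, and recent bs4 releases may warn that such camelCase names are deprecated. Text nodes have a name of None, so the condition keeps only tags.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for index.html
html = (
    "<html><head><title>Header</title><meta charset='utf-8'></head>"
    "<body><h2>Operating systems</h2>"
    "<ul id='mylist'><li>Solaris</li><li>FreeBSD</li></ul></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Walk the whole tree depth-first; strings have name None and are skipped
tag_names = [child.name for child in soup.descendants if child.name]
for name in tag_names:
    print(name)
```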
The example goes through the document tree and prints the names of all HTML tags.
BeautifulSoup element children
With the children attribute, we can get the children of a tag.
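In this sketch, the newlines in the hypothetical inline markup become whitespace strings in the tree, which is why the list comprehension filters out children whose name is None. The markup and the html.parser choice are our own assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the newlines become whitespace strings in the tree
html = """<html>
<head><title>Header</title></head>
<body><h2>Operating systems</h2></body>
</html>"""
soup = BeautifulSoup(html, "html.parser")

# children yields direct children only, including the newline strings
names = [child.name for child in soup.html.children if child.name is not None]
print(names)  # ['head', 'body']
```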
The example retrieves the children of the html tag, places them into a Python list, and prints them to the console. Since the children attribute also returns the whitespace between the tags, we add a condition to include only the tag names.
The html tag has two children: head and body.
BeautifulSoup element descendants
With the descendants attribute, we get all descendants (children of all levels) of a tag.
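The following sketch lists the tag names among all descendants of body in a hypothetical inline document; unlike children, descendants also reaches the li items nested inside ul.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for index.html
html = """<html><body><h2>Operating systems</h2>
<ul id="mylist"><li>Solaris</li><li>FreeBSD</li></ul>
<p>Debian is a Unix-like operating system.</p></body></html>"""
soup = BeautifulSoup(html, "html.parser")

# descendants yields every nested node at any depth, tags and strings alike
names = [node.name for node in soup.body.descendants if node.name is not None]
print(names)  # ['h2', 'ul', 'li', 'li', 'p']
```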
The example retrieves all descendants of the body tag.
BeautifulSoup web scraping
Requests is a simple Python HTTP library. It provides methods for accessing web resources via HTTP.
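The sketch below splits the task in two, under our own naming: fetch downloads a page with Requests, and title_info extracts the title's HTML code, its text, and its parent's HTML code. Requests is imported inside fetch so that title_info can be used without it installed.

```python
from bs4 import BeautifulSoup


def fetch(url):
    # Imported here so title_info() can be used even without Requests
    import requests

    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # stop on HTTP errors such as 404
    return resp.text


def title_info(html):
    # Return the title's HTML code, its text, and its parent's HTML code
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title
    return str(title), title.text, str(title.parent)
```

For example, title_info(fetch("https://example.com")) returns the three values for that page; example.com is a placeholder, not the page used in the original example.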
The example retrieves the title of a simple web page. It also prints its parent.
We get the HTML data of the page.
We retrieve the HTML code of the title, its text, and the HTML code of its parent.
BeautifulSoup prettify code
The prettify method returns the HTML code as a formatted string, with each tag and each piece of text on its own line, indented by nesting depth.
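A minimal sketch follows, using compact one-line markup of our own as the input so that the effect of prettify is easy to see.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line markup, the kind of compact HTML prettify helps with
html = "<html><head><title>Header</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# prettify() returns a str: one tag or string per line, indented by depth
pretty = soup.prettify()
print(pretty)
```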
We prettify the HTML code of a simple web page.

BeautifulSoup scraping with built-in web server
We can also serve HTML pages with a simple built-in HTTP server.
We create a public directory and copy the index.html file there.

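Then we start the Python HTTP server, for example by running python3 -m http.server from within the public directory; by default, it listens on port 8000.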
Now we get the document from the locally running server.
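A sketch of the client side follows. It swaps Requests for the standard library's urllib.request, and it adds a serve helper that runs http.server in a background thread; that helper is our own convenience for trying the example without a separate terminal, not part of the tutorial. The names fetch_soup and serve are hypothetical.

```python
import functools
import http.server
import threading
import urllib.request

from bs4 import BeautifulSoup


def serve(directory, port=0):
    # Serve files from directory, like running python3 -m http.server there;
    # port 0 lets the operating system pick a free port
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_soup(url):
    # Download the page and parse it
    with urllib.request.urlopen(url, timeout=5) as resp:
        html = resp.read().decode("utf-8")
    return BeautifulSoup(html, "html.parser")
```

With a server running on the default port, fetch_soup("http://localhost:8000/index.html").title gives the page title. Alternatively, serve("public", 8000) starts an in-process server, which shutdown() stops.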
BeautifulSoup find elements by Id
With the find method, we can find elements by various means, including the element id.
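The sketch below runs both forms of the search on hypothetical inline markup containing two lists, so that only the one with the mylist id should match.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup with two lists, only one having the mylist id
html = (
    "<body><ul id='mylist'><li>Solaris</li><li>FreeBSD</li></ul>"
    "<ul id='other'><li>Windows</li></ul></body>"
)
soup = BeautifulSoup(html, "html.parser")

# Unrecognized keyword arguments such as id are matched against attributes
ul = soup.find("ul", id="mylist")
# The same search with an explicit attribute dictionary
same = soup.find("ul", attrs={"id": "mylist"})
print(ul)
```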
The code example finds the ul tag that has the mylist id. The commented line is an alternative way of doing the same task.
BeautifulSoup find all tags
With the find_all method, we can find all elements that meet some criteria.
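A minimal sketch on hypothetical inline markup follows; find_all returns the matching tags in document order.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for index.html
html = "<ul id='mylist'><li>Solaris</li><li>FreeBSD</li><li>Debian</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag, in document order
for tag in soup.find_all("li"):
    print(f"{tag.name}: {tag.text}")
```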
The code example finds and prints all li tags.
The find_all method can take a list of elements to search for.
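In this sketch on hypothetical inline markup, a tag matches if its name is any of the names in the list, and the li item is skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for index.html
html = (
    "<body><h2>Operating systems</h2><ul><li>Debian</li></ul>"
    "<p>FreeBSD is an operating system.</p>"
    "<p>Debian is an operating system.</p></body>"
)
soup = BeautifulSoup(html, "html.parser")

# A list of names matches any of them; results keep document order
tags = soup.find_all(["h2", "p"])
for tag in tags:
    print(tag.text)
```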
The example finds all h2 and p elements and prints their text.
The find_all method can also take a function that determines which elements should be returned.
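The sketch below passes a filter function of our own, is_empty, to find_all. It relies on the tag attribute is_empty_element, which is True for a tag that has no contents and that the parser treats as a void element such as meta.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup; meta is its only empty (void) element
html = (
    "<html><head><title>Header</title><meta charset='utf-8'></head>"
    "<body><h2>Operating systems</h2></body></html>"
)
soup = BeautifulSoup(html, "html.parser")


def is_empty(tag):
    # find_all calls this for every tag and keeps those returning True
    return tag.is_empty_element


empty = soup.find_all(is_empty)
print(empty)
```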
The example prints empty elements.
The only empty element in the document is meta.
It is also possible to find elements by using regular expressions.
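A sketch follows, assuming the search is done on text through the string argument of find_all; a compiled pattern there matches any string in which re's search finds it. The inline markup is a hypothetical stand-in.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for index.html
html = (
    "<ul><li>Solaris</li><li>FreeBSD</li><li>NetBSD</li></ul>"
    "<p>Debian is free software.</p>"
)
soup = BeautifulSoup(html, "html.parser")

# string= searches text nodes rather than tags
strings = soup.find_all(string=re.compile("BSD"))
for s in strings:
    print(s)
```

Each result is a NavigableString; its parent attribute gives the enclosing tag.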
The example prints the content of elements that contain the 'BSD' string.
BeautifulSoup CSS selectors
With the select and select_one methods, we can use CSS selectors to find elements.
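A sketch with the nth-of-type pseudo-class follows, on hypothetical inline markup. In recent releases, select and select_one delegate to the soupsieve package, which pip installs along with beautifulsoup4.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for index.html
html = (
    "<ul id='mylist'><li>Solaris</li><li>FreeBSD</li>"
    "<li>Debian</li><li>NetBSD</li></ul>"
)
soup = BeautifulSoup(html, "html.parser")

# select() returns a list of all matches; nth-of-type counts from 1
third = soup.select("li:nth-of-type(3)")
print(third)
```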
This example uses a CSS selector to print the HTML code of the third li element.
The # character is used in CSS to select tags by their id attributes.
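The sketch below uses select_one, which returns the first match or None, on hypothetical inline markup with two lists.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup with two lists, only one having the mylist id
html = (
    "<body><ul id='mylist'><li>Solaris</li></ul>"
    "<ul id='other'><li>Windows</li></ul></body>"
)
soup = BeautifulSoup(html, "html.parser")

# "#mylist" selects the element whose id attribute is mylist
ul = soup.select_one("#mylist")
print(ul)
```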
The example prints the element that has the mylist id.
BeautifulSoup append element
The append method adds a new tag as the last child of an existing tag in the HTML document.
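A sketch of the steps described next follows, on hypothetical inline markup; the OpenBSD item text is our own sample value.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for index.html
html = "<ul id='mylist'><li>Solaris</li><li>FreeBSD</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# Create a detached <li> tag and give it text
new_li = soup.new_tag("li")
new_li.string = "OpenBSD"

# append() adds it after the last existing child of <ul>
ul = soup.find("ul")
ul.append(new_li)
print(ul.prettify())
```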
The example appends a new li tag.
First, we create a new tag with the new_tag method.
We get a reference to the ul tag.

We append the newly created tag to the ul tag.
We print the ul tag in a neat format.
BeautifulSoup insert element
The insert method inserts a tag at the specified location.
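In the sketch below, the position is a zero-based index into the tag's contents. The hypothetical inline list has no whitespace between its items; with whitespace strings present, those strings would count as positions too.

```python
from bs4 import BeautifulSoup

# Hypothetical whitespace-free markup standing in for index.html
html = "<ul id='mylist'><li>Solaris</li><li>FreeBSD</li><li>Debian</li></ul>"
soup = BeautifulSoup(html, "html.parser")

new_li = soup.new_tag("li")
new_li.string = "OpenBSD"

# Index 2 is the third position among the children of <ul>
ul = soup.find("ul")
ul.insert(2, new_li)
print(ul.prettify())
```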
The example inserts an li tag at the third position in the ul tag.
BeautifulSoup replace text
The replace_with method replaces an element, or a piece of text, with a new one.
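A sketch follows, assuming the element is located by its text: find with both a tag name and string= returns the tag whose string matches, and replace_with then swaps that string for a new one. The Windows and OpenBSD values are our own sample data.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for index.html
html = "<ul id='mylist'><li>Solaris</li><li>Windows</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# Find the <li> whose text is exactly "Windows"
li = soup.find("li", string="Windows")
# Replace its text node; the <li> tag itself stays in place
li.string.replace_with("OpenBSD")
print(soup.ul)
```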
The example finds a specific element with the find method and replaces its content with the replace_with method.
BeautifulSoup remove element
The decompose method removes a tag from the tree and destroys it.
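The sketch below picks the second p by its index in the results of find_all, on hypothetical inline markup. If the removed tag is still needed afterwards, extract() can be used instead; it detaches the tag and returns it rather than destroying it.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for index.html
html = (
    "<body><p>FreeBSD is an operating system.</p>"
    "<p>Debian is an operating system.</p></body>"
)
soup = BeautifulSoup(html, "html.parser")

# Results are zero-based, so [1] is the second <p>
second_p = soup.find_all("p")[1]
# Remove the tag and everything inside it from the tree
second_p.decompose()
print(soup.body)
```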
The example removes the second p element.
In this tutorial, we have worked with the Python BeautifulSoup library.