Python Basename

Posted : admin On 1/26/2022

Summary

Python get file extension from filename: To get the file extension from a filename string, import the os module and use os.path.splitext(). It splits the pathname into a pair (root, extension).

In ArcGIS, the arcpy Describe object exposes similar properties:

    import arcpy

    # Create a Describe object
    desc = arcpy.Describe('C:/data/Install.log')

    # Print some Describe object properties for the file
    print('Data Type: ' + desc.dataType)
    print('Path: ' + desc.path)
    print('Base Name: ' + desc.baseName)
    print('Extension: ' + desc.extension)

Python os.path methods: os.path is another Python module that provides a wide range of useful methods for manipulating files and directories.

os.path.basename(path): Return the base name of pathname path. This is the second element of the pair returned by passing path to the function split(). Note that the result of this function is different from the Unix basename program; where basename for '/foo/bar/' returns 'bar', the basename() function returns an empty string ('').
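As a minimal sketch of the two functions described in this summary, the following uses only the standard library. The file name is a hypothetical value chosen for illustration.

```python
import os.path

# Hypothetical file name, used only for illustration.
filename = "Install.log"

# splitext() returns a (root, extension) pair; the extension keeps its dot.
root, ext = os.path.splitext(filename)

# basename() returns whatever follows the final separator, so a path
# with a trailing slash yields an empty string.
with_slash = os.path.basename("/foo/bar/")
without_slash = os.path.basename("/foo/bar")

print(root, ext)         # Install .log
print(repr(with_slash))  # ''
print(without_slash)     # bar
```

If the last directory name is needed from a path that may end in a separator, one option is to strip the trailing separator before calling os.path.basename().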

Parses an input into its file name, extension, path, and the last workspace name. The output can be used as an inline variable in the output name of other tools.

Usage

  • This tool is intended for use in ModelBuilder, not in Python scripting.

  • More than one variable name can be added to create unique names for the output, for example, C:\Temp\Out_%Name%_%Workspace Name%.

  • If the input to the Parse Path tool is C:\1Tool Data\City Roads.shp, it is parsed into the following outputs:

    Parse             Result
    Path              C:\1Tool Data
    Name              City Roads
    Extension         shp
    Workspace Name    1Tool Data

    If the Format Name, Extension and Workspace parameter is checked, the path above will be parsed into the following outputs:

    Parse             Result
    Path              C:\1Tool Data
    Name              City_Roads
    Extension         shp
    Workspace Name    _1Tool_Data
  • The same functionality can be accessed in scripting with the Python os module. For example, if you pass an input variable:

    inData = r'C:\1Tool Data\City Roads.shp', then

    • To get the name City Roads
    • To get the path C:\1Tool Data
    • To get the file extension shp
    • To get the workspace name 1Tool Data

    To parse paths in a similar way to when the Format Name, Extension and Workspace parameter is checked:

    • To get the name City_Roads
    • To get the path C:\1Tool Data
    • To get the file extension shp
    • To get the workspace name _1Tool_Data
  • The Path output of Parse Path has a workspace data type and can be connected directly as an input to the Create Feature Class tool's Feature Class Location parameter, which accepts a workspace data type as input. For tools such as Copy that do not have a workspace data type parameter, the Path value can be passed to the tool using inline variable substitution such as %Path%\Out_%Name%.%Extension%.
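The scripting equivalents mentioned in the usage notes can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's own implementation: it uses ntpath (the Windows flavor of os.path) so that the backslash path parses the same way on any operating system, and format_name is a hypothetical helper that only approximates what the Format Name, Extension and Workspace option does.

```python
import ntpath  # Windows path rules on any OS; os.path is ntpath on Windows
import re

in_data = r"C:\1Tool Data\City Roads.shp"

name = ntpath.splitext(ntpath.basename(in_data))[0]  # City Roads
path = ntpath.dirname(in_data)                       # C:\1Tool Data
extension = ntpath.splitext(in_data)[1].lstrip(".")  # shp
workspace_name = ntpath.basename(path)               # 1Tool Data


def format_name(text):
    """Hypothetical helper: approximate the tool's formatting option."""
    # Replace every character that is not a letter, digit or underscore.
    cleaned = re.sub(r"\W", "_", text)
    # Prefix an underscore when the name starts with a digit.
    return "_" + cleaned if cleaned[:1].isdigit() else cleaned


formatted_name = format_name(name)                 # City_Roads
formatted_workspace = format_name(workspace_name)  # _1Tool_Data
```

On Windows, os.path can be used in place of ntpath with the same results.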

Syntax

Parameter    Explanation                                           Data Type

             The input values to parse.                            Any Value

format       Removes all reserved characters. Given the input      Boolean
             value of C:\1Tool Data\InputFC.shp:

               • Path: The output will be the file path, for example, C:\1Tool Data.
               • Name: The output will be the file name, for example, InputFC.
               • Extension: The output will be the file extension, for example, shp.
               • Workspace Name: The output will be the workspace name, for example, _1Tool_Data.

Derived Output

Name              Explanation                                Data Type
path              The workspace of the input.                Workspace
name              The file name, excluding the extension.    String
extension_type    The file extension.                        String
workspace_name    The name of the workspace.                 String

Environments

This tool does not use any geoprocessing environments.

Licensing information

  • Basic: Yes
  • Standard: Yes
  • Advanced: Yes

Learning Objectives

  • Use earthpy to download files from a URL (internet address).
  • Use glob to get customized lists of files or directories.
  • Use various functions in the os package to manipulate file paths.

For many data projects, it can be helpful to manipulate and parse file and directory paths, especially when you want to programmatically access data files and automate workflows.

To start working with file and directory paths in Python, you first need some files! On this page, you will first learn how to use the earthpy package to download files from a URL (internet address).

Then, you will use the os and glob packages to access files and directories and to create lists of paths that you can parse to extract useful information from the file and directory names.

Download Files Using EarthPy

You can use the function data.get_data() from the earthpy package to download data from online sources such as the Figshare.com data repository.

Begin by importing the necessary packages: os, glob, and earthpy (using the alias et).

To use the function et.data.get_data(), you can provide a parameter value for the url, which you define by providing a text string of the URL (internet address) for the dataset.

By default, et.data.get_data() will download files to earth-analytics/data/earthpy-downloads under your home directory, and it will create the necessary directories if they do not already exist.

With this information, you can set the working directory to your earth-analytics directory and then create a relative path to the downloaded data directory.
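A minimal sketch of that last step, using only the os module. To keep it self-contained, a temporary directory stands in for the home directory and the download folders are created by hand; both are assumptions made for illustration. In a real session the home directory would typically come from os.path.expanduser("~").

```python
import os
import tempfile

# Assumption: a temporary directory stands in for the home directory so the
# sketch does not touch real files.
home_dir = tempfile.mkdtemp()

# The default download location described above, created here by hand.
working_dir = os.path.join(home_dir, "earth-analytics")
os.makedirs(os.path.join(working_dir, "data", "earthpy-downloads"))

# Set the working directory, then refer to the data with a relative path.
os.chdir(working_dir)
data_path = os.path.join("data", "earthpy-downloads")

print(os.path.isdir(data_path))  # True
```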

Glob in Python

glob is a powerful tool in Python to help with file management and filtering. While os helps manage and create specific paths that are friendly to whatever machine they are used on, glob helps to filter through large datasets and pull out only files that are of interest.

The glob() function uses Unix shell-style wildcard rules to help users find and organize their files. These rules are fairly straightforward, and you will explore them below.

Search for a Specific Folder or File

The glob function can be used to find just one folder or file. This can be done by just giving glob the path of the item you are trying to find.

This is not very useful, as you already have the data path if you are using it to search for something.

Notice, however, that glob returns a list of every item that matches your search rather than an individual string, even when only one item matches.

You can also use the glob() function in combination with the os.path.join() function to create lists of paths that are built programmatically.
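The behavior described above can be sketched as follows. The temporary directory and the file name are hypothetical stand-ins for a real data folder.

```python
import glob
import os
import tempfile

# Hypothetical data folder containing a single file.
data_dir = tempfile.mkdtemp()
file_path = os.path.join(data_dir, "precip-2001.csv")
open(file_path, "w").close()

# Searching for one exact path still returns a list.
matches = glob.glob(file_path)

# A path that does not exist returns an empty list rather than an error.
missing = glob.glob(os.path.join(data_dir, "no-such-file.csv"))

print(len(matches))  # 1
print(missing)       # []
```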

* Operator

glob uses different operators to broaden its searching abilities. The primary operator is *.

The * is a sort of wildcard that can be used to search for items that have differences in their names. Whatever text doesn’t match can be replaced by a *.

For example, if you want every file in a directory to be returned to you, you can put a * at the end of a directory path.

glob will return a list of all of the files in that directory.

If you only want .csv files, then *.csv will return every file that ends with .csv.

If you only want .csv files with the number 2 somewhere in the file name, then *2*.csv will return that list.

Note that 2*.csv would only return files that start with the number 2.

The additional asterisk in front of the 2 (e.g. *2*.csv) allows the 2 to be anywhere in the file name.

The * is meant to replace all text that does not matter to your search.
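The patterns discussed in this section can be compared side by side in a short sketch. The file names are hypothetical, and names() is a small helper defined here only to keep the output readable.

```python
import glob
import os
import tempfile

# Hypothetical data folder with a few empty files.
data_dir = tempfile.mkdtemp()
for file_name in ["2-precip.csv", "precip-1999.csv", "precip-2001.csv", "readme.txt"]:
    open(os.path.join(data_dir, file_name), "w").close()


def names(pattern):
    """Return the sorted base names in data_dir that match pattern."""
    found = glob.glob(os.path.join(data_dir, pattern))
    return sorted(os.path.basename(p) for p in found)


everything = names("*")        # all four files
csv_files = names("*.csv")     # the three .csv files
with_2 = names("*2*.csv")      # ['2-precip.csv', 'precip-2001.csv']
starts_with_2 = names("2*.csv")  # ['2-precip.csv']
```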

Recursive Searches

If you are trying to operate on files across multiple directories, you can use multiple * in a file path to indicate that you want every file in all folders in a directory.

The first * is to access all directories in the starting directory (e.g. data_folder).

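This is followed by the second * operator, which lists the contents of each of those directories. Note that this pattern reaches exactly one level below the starting directory; it does not descend into deeper subdirectories.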
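A small sketch of the two-asterisk pattern, using a hypothetical folder layout. For comparison, it also shows the ** pattern with recursive=True, which the glob module provides for searches of arbitrary depth; that variant is an addition for illustration and is not part of the walkthrough above.

```python
import glob
import os
import tempfile

# Hypothetical layout: two site folders, one with a nested subfolder.
data_dir = tempfile.mkdtemp()
for rel in ["site-a/a1.csv", "site-b/b1.csv", "site-b/deep/b2.csv"]:
    full = os.path.join(data_dir, *rel.split("/"))
    os.makedirs(os.path.dirname(full), exist_ok=True)
    open(full, "w").close()


def rel_paths(paths):
    """Return sorted paths relative to data_dir, written with / separators."""
    return sorted(os.path.relpath(p, data_dir).replace(os.sep, "/") for p in paths)


# Two * operators: everything exactly one level below data_dir.
one_level = rel_paths(glob.glob(os.path.join(data_dir, "*", "*")))

# ** with recursive=True: .csv files at any depth.
all_csv = rel_paths(glob.glob(os.path.join(data_dir, "**", "*.csv"), recursive=True))

print(one_level)  # ['site-a/a1.csv', 'site-b/b1.csv', 'site-b/deep']
print(all_csv)    # ['site-a/a1.csv', 'site-b/b1.csv', 'site-b/deep/b2.csv']
```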

Sorting glob Lists

Notice that the lists provided by glob are not sorted.

If it’s important for a list to be in a certain order, then always make sure to sort the list returned by glob using the .sort() method for lists.

Note that sorting can sometimes work differently than you may think, so check your sorted list before you move on with your project.

For example, strings are compared character by character rather than numerically, so a file name ending in 10 is placed above a file name ending in 2 and, depending on the characters that follow the number, sometimes above one ending in 1. Always double check!
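The sorting surprise can be reproduced with a plain list, no files needed. The names are hypothetical, and natural_key is an illustrative helper (not part of glob or os) showing one possible way to get numeric order.

```python
import re

# Hypothetical file names.
file_names = ["precip-10.csv", "precip-2.csv", "precip-1.csv"]

# .sort() orders the list in place, comparing character by character.
file_names.sort()
print(file_names)  # ['precip-1.csv', 'precip-10.csv', 'precip-2.csv']


def natural_key(name):
    """Hypothetical helper: split out digit runs so they compare as numbers."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


natural_order = sorted(file_names, key=natural_key)
print(natural_order)  # ['precip-1.csv', 'precip-2.csv', 'precip-10.csv']
```

Zero-padding the numbers in file names (01, 02, ... 10) is another common way to make the default sort match numeric order.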

Why Sort glob Lists?

The order in which glob returns files from a folder can vary drastically. Depending on the operating system being used, or the way the files are stored, different people may get results from a glob list in different orders.

This can lead to data errors when running projects across computers.

For example, consider how sorting a glob list changes what files you access when getting an index from the list, such as index [4] to access the 5th item in the list.
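A minimal sketch of sorting before indexing, with hypothetical file names. Without the sort, the item at index [4] could differ from one machine to another.

```python
import glob
import os
import tempfile

# Hypothetical data folder with six yearly files.
data_dir = tempfile.mkdtemp()
for year in range(2001, 2007):
    open(os.path.join(data_dir, f"precip-{year}.csv"), "w").close()

# The order returned by glob is not guaranteed, so sort before indexing.
files = glob.glob(os.path.join(data_dir, "*.csv"))
files.sort()

fifth_file = os.path.basename(files[4])
print(fifth_file)  # precip-2005.csv
```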

Using Ranges

In addition to using * to specify which parts of a file name are important to you, you can use [] to specify a range of characters to search for.

For example, you can search for all files with 2001 to 2003 in the name by using the pattern *200[1-3]*, in which [1-3] matches a single character from 1 to 3.

This is not just limited to numbers. [d-q] would also filter results for characters between the letters d and q.

Note, however, that this search range is for characters only, not strings.

For example, you can search for numbers 2-7 with [2-7] but you would not be able to search for numbers [2-14] because 14 is more than one character.

For the same reason, a search using [2001-2003] does not work as intended, because 2001 and 2003 are each more than one character; the brackets still match only a single character.
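The difference between the working and non-working range can be sketched as follows. The file names are hypothetical, and names() is a small helper defined here for readability.

```python
import glob
import os
import tempfile

# Hypothetical data folder.
data_dir = tempfile.mkdtemp()
for label in ["2", "2000", "2001", "2002", "2003", "2004"]:
    open(os.path.join(data_dir, f"precip-{label}.csv"), "w").close()


def names(pattern):
    """Return the sorted base names in data_dir that match pattern."""
    found = glob.glob(os.path.join(data_dir, pattern))
    return sorted(os.path.basename(p) for p in found)


# [1-3] matches one character, so this finds 2001 through 2003.
in_range = names("*200[1-3]*")

# [2001-2003] is still a single-character set (the digits 0 to 3),
# so it cannot match a four-digit year.
not_a_year_range = names("precip-[2001-2003].csv")

print(in_range)          # ['precip-2001.csv', 'precip-2002.csv', 'precip-2003.csv']
print(not_a_year_range)  # ['precip-2.csv']
```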

? Operator

The ? operator functions similarly to the * operator but is used for a single character.

If one character in the file name can vary, but everything else must stay the same, then ? is a good way to replace just that one character.

? is not limited to one use per search and can be used to replace more than one character in a query.
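A short sketch of the ? operator, again with hypothetical file names and a small helper for readable output.

```python
import glob
import os
import tempfile

# Hypothetical data folder.
data_dir = tempfile.mkdtemp()
for file_name in ["site-a.csv", "site-b.csv", "site-ab.csv"]:
    open(os.path.join(data_dir, file_name), "w").close()


def names(pattern):
    """Return the sorted base names in data_dir that match pattern."""
    found = glob.glob(os.path.join(data_dir, pattern))
    return sorted(os.path.basename(p) for p in found)


one_char = names("site-?.csv")    # exactly one character after the dash
two_chars = names("site-??.csv")  # exactly two characters after the dash

print(one_char)   # ['site-a.csv', 'site-b.csv']
print(two_chars)  # ['site-ab.csv']
```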

Saving a glob Output to a Variable

In order to use the output of glob later in a script, be sure to save it to a variable! This is done by assigning the output of the glob function to a variable name.

os Advanced Functionality

os is another very powerful tool and has additional functionality that can be useful when dealing with file paths, such as advanced parsing abilities.

For example, os.path.normpath() is a great way to clean up file paths. It collapses redundant separators and up-level references (such as a doubled separator, a . component, or a .. component) to make the path easier to read.

It is a good way to make sure your path is properly formatted before using other os functions on the path.

os.path.commonpath() is very useful when combined with glob. This function takes a list of file paths and finds the deepest directory that all of the paths have in common.

So if there were two files, one stored in home/user/dir/dir2/example.txt and one stored in home/user/dir/example.txt, then os.path.commonpath() would return home/user/dir, as it is the deepest directory that both file paths share.
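Both functions work on path strings alone, so they can be sketched without any real files. The messy path below is a hypothetical example; the two file paths are the ones from the text. The printed results use / on POSIX systems and \ on Windows.

```python
import os.path

# Hypothetical path with a doubled separator, a "." and a ".." component.
messy = "home/user//dir/./dir2/../example.txt"
clean = os.path.normpath(messy)

# The two example paths from the text above.
paths = ["home/user/dir/dir2/example.txt", "home/user/dir/example.txt"]
shared = os.path.commonpath(paths)

print(clean)   # home/user/dir/example.txt on POSIX systems
print(shared)  # home/user/dir on POSIX systems
```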

Python Basename Vs Dirname

os.path.basename() finds the last section of a path and returns that. If a file path is passed in, the file name will be parsed out and returned.

os.path.split() will split a path into two parts and return them as a pair:

  1. the rest of the path (everything before the last part).
  2. the last part of the path.

The second element is the same output as os.path.basename(); the first element is the rest of the path that basename leaves out, which is what os.path.dirname() returns.

You can then use indexing on the result to get each piece of the split path.
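A minimal sketch comparing the three functions on a hypothetical path built with os.path.join().

```python
import os.path

# Hypothetical file path.
path = os.path.join("data", "colorado", "precip-2001.csv")

base = os.path.basename(path)      # last part of the path
directory = os.path.dirname(path)  # everything before the last part
parts = os.path.split(path)        # (directory, base) as a tuple

print(base)      # precip-2001.csv
print(parts[1])  # precip-2001.csv, the same as basename
print(parts[0] == directory)  # True
```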

String Manipulation

Recall that when you create a file path using os.path.join(), it formats the path string with the separator of the operating system the code is running on, so the same code can be used on any operating system.

Note, however, that the file path is still just a string. Thus, you can parse file paths, just like you would strings, and extract information from them that you may need for a project.

.split() is a built-in Python string method that splits a string into a list of strings based on a separator, and it can be used in combination with os.sep to separate a file path into its component directory and file names. os.sep is a value stored in os that holds the character used to separate pathname components. This is \ for Windows and / for POSIX systems, such as Mac or Linux.

In addition to built-in functions, file paths can be parsed with string[start_index:end_index] like a normal string. This can help get important information from a file path, such as a date.

Notice that the range includes the first index value but not the second index value (e.g. if 1999 occupies index values 10 through 13, the slice is [10:14]).
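Both techniques can be sketched on a hypothetical path. The directory and file names below are made up for illustration and were chosen so that the year sits at index values 10 through 13 of the file name.

```python
import os

# Hypothetical path; os.path.join() uses the separator for the current OS.
path = os.path.join("data", "colorado", "precip-co-1999.csv")

# Split the path string on os.sep to get its components.
components = path.split(os.sep)
file_name = components[-1]

# Slice the file name: index 10 is included, index 14 is not.
year = file_name[10:14]

print(components)  # ['data', 'colorado', 'precip-co-1999.csv']
print(year)        # 1999
```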

Think about how you can use the same string parsing syntax to get the site name!

You have now learned the essentials of glob and os to create custom lists of files and directories to manipulate and parse file names and directories, which can come in handy for future projects.

Set Working Directory