File Access 102: os and shutil

Accessing and manipulating paths, directories, and files on disk (including flash drives) has essential but limited support in native Python. (See Examples: File Access – built-in methods.) Most of the heavy lifting is spread across two or three modules: os (which automatically includes os.path) and shutil.

Before we go any further: none of the following addresses the pathlib module introduced in Python 3.4. If you can use 3.4 or above, I strongly recommend replacing everything below with the pathlib equivalents. See the Data-on-Disk Toolbox for a mix of functions.
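For comparison, here is a rough sketch of how a few of the os.path calls listed below line up with pathlib. The file and directory names are made up for illustration, and PurePath is used so that nothing on disk is touched.

```python
import os
from pathlib import PurePath

# Hypothetical names, used only for illustration.
legacy_path = os.path.join("notes", "Jack.txt")   # plain string
modern_path = PurePath("notes") / "Jack.txt"      # path object

# The same three questions, asked both ways.
legacy_parts = (
    os.path.dirname(legacy_path),
    os.path.basename(legacy_path),
    os.path.splitext(legacy_path)[1],
)
modern_parts = (str(modern_path.parent), modern_path.name, modern_path.suffix)

print(legacy_parts)
print(modern_parts)
```

Both tuples hold the same directory, file name, and extension; the difference is free functions on strings versus attributes on one object.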

Gathered here are about 10% of the methods and attributes from those legacy modules – listed alphabetically. These have been merged into groups categorized by their primary area of function: PATH, DIRECTORY, FILE, ENVIRONMENT, and OTHER. These descriptions are extremely condensed and are primarily, but not exclusively, based on the documentation for modules operating under Python 3.5 which you can find at:
https://docs.python.org/3.5/py-modindex.html

We have tried to include only those “popular” functions which will work both in Unix and Windows environments – with a single exception. If you don’t see what you want, go search the other 90% at the link above. These descriptions include only about 50% of the options – we are guessing that covers 95% of programmer needs given the methods’ normal defaults.

A couple of notes: Nearly all functions in these modules raise OSError (or one of its subclasses) if something screws up. Plan for it. Don’t be confused by the doubled backslashes in Windows paths. Each pair is a single backslash, escaped so that Python does not read it as the start of an escape sequence, e.g., myPath = 'D:\\Documents\\Python3\\fileinout'.
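Both notes can be sketched in a few lines. The helper name safe_listdir and the missing directory name are hypothetical; the point is only the try/except shape and the fact that a raw string saves you from doubling backslashes.

```python
import os

# A raw string avoids the doubling; both spellings produce the same string.
doubled = 'D:\\Documents\\Python3\\fileinout'
raw = r'D:\Documents\Python3\fileinout'
print(doubled == raw)

def safe_listdir(path):
    """Return the entries in path, or an empty list if the OS call fails."""
    try:
        return os.listdir(path)
    except OSError as err:
        # FileNotFoundError, PermissionError, etc. are all subclasses of OSError.
        print('Could not list', path, '->', err.strerror)
        return []

# Hypothetical directory name that is assumed not to exist.
missing = os.path.join(os.getcwd(), 'no_such_directory_for_this_demo')
entries = safe_listdir(missing)
```

Catching OSError is the broad net; catch a subclass such as FileNotFoundError instead when you want to handle only that one case.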

PATH Methods

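os.access(path, mode) Test access to path. Use os.F_OK to test existence of path, or os.R_OK, os.W_OK, and os.X_OK to test whether it is readable, writable, or executable. Returns True if access is allowed, False if not.
os.defpath The default search path used by the exec*p*() and spawn*p*() functions if the environment has no PATH key. It is a string attribute, not a function, so do not call it with parentheses.
os.path.abspath(path) Return a normalized, absolute version of pathname ‘path’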
os.path.exists(path) Return True if path refers to an existing path or an open file descriptor.
os.path.isabs(path) Return True if path is an absolute pathname. On Unix, that means it begins with a slash, on Windows that it begins with a (back)slash after chopping off a potential drive letter.
os.path.islink(path) Return True if path refers to a directory entry that is a symbolic link. Always False if symbolic links are not supported by the Python runtime.
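os.path.join(path, *paths) Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last. If a component is an absolute path, all previous components are thrown away.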
os.path.lexists(path) Return True if path refers to an existing path. Returns True for broken symbolic links. Equivalent to exists() on platforms lacking os.lstat().
os.path.normcase(path) Normalize the case of a pathname. On Unix and Mac OS X, this returns the path unchanged; on case-insensitive filesystems, it converts the path to lowercase. On Windows, it also converts forward slashes to backward slashes. Raise a TypeError if the type of path is not str or bytes.
os.path.normpath(path) Normalize a pathname by collapsing redundant separators and up-level references. To normalize case, use normcase().
os.path.realpath(path) Return the canonical path of the specified filename, eliminating any symbolic links encountered in the path (if they are supported by the operating system).
os.path.relpath(path, start=os.curdir) Return a relative filepath to path either from the current directory or from an optional start directory. This is a path computation: the filesystem is not accessed to confirm the existence or nature of path or start.
os.path.split(path) Split the pathname path into a pair, (head, tail) where tail is the last pathname component and head is everything leading up to that. The tail part will never contain a slash.
os.path.splitext(path) Split the pathname path into a tuple (root, ext) such that root + ext == path, and ext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; splitext('.cshrc') returns ('.cshrc', '').
os.readlink(path) Return a string for the path to which ‘path’ points if it is a symbolic link.
os.rmdir(path) Remove (delete) the directory path. Only works when the directory is empty.  To remove whole directory trees, shutil.rmtree() is an alternative.
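shutil.which(cmd, mode=os.F_OK | os.X_OK, path=None) Return the path to the executable that would run if ‘cmd’ were called. If no command would be called, return None. In our testing it returned None, which is also what you get when cmd is not on the search path.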
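A minimal sketch of the pure path computations from this group follows. The component names are hypothetical, and nothing here reads the disk, so it should behave the same on Unix and Windows apart from the separator.

```python
import os

# Hypothetical components, used only for illustration.
full = os.path.join('projects', 'demo', 'report.final.txt')

head, tail = os.path.split(full)      # directory part, last component
root, ext = os.path.splitext(tail)    # only the last period starts the extension

# normpath collapses doubled separators and '..' references as a string operation.
messy = 'projects' + os.sep + os.sep + 'demo' + os.sep + os.pardir + os.sep + 'demo'
tidy = os.path.normpath(messy)

# A joined relative path is not absolute until abspath anchors it to the cwd.
was_absolute = os.path.isabs(full)
now_absolute = os.path.isabs(os.path.abspath(full))

print(head, tail, root, ext, tidy, was_absolute, now_absolute)
```

Because these are string computations, they work on paths that do not exist; use os.path.exists() or os.access() when you need to ask the filesystem.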

DIRECTORY Methods

os.chdir(path) Change current working directory to path
os.curdir Constant that denotes the current directory. Like ‘.’
os.get_exec_path(env=None) Returns the list of directories that will be searched for a named executable when launching a process. env, if given, is an environment dictionary in which to look up PATH; by default os.environ is used.
os.getcwd() Return a string with the current working directory
os.getcwdb() Return a bytes object with the current working directory
os.listdir(path='.') Return a list of the names of the entries in the path directory. os.scandir() is a better alternative when you also need file type or attribute information.
os.mkdir(path, mode=0o777) Create a directory named ‘path’. Raises FileExistsError if it already exists.
os.pardir Constant that denotes the parent directory. Like ‘..’
os.path.dirname(path) Return the directory name of pathname path.
os.path.expanduser(path) On Unix and Windows, return the argument with an initial component of ~ or ~user replaced by that user’s home directory.
os.path.isdir(path) Return True if path is an existing directory.
os.removedirs(name) Remove directories recursively. Works like rmdir(), except that after the leaf directory is removed it tries to remove each parent directory in turn, moving backward until one cannot be removed (for example, because it is not empty).
os.scandir(path='.') Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order.
os.sep Path component separator. Like ‘\’ on Windows or ‘/’ on Unix.
os.walk(top, topdown=True, onerror=None, followlinks=False) Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
scandir.close() Close the iterator and free the resources it acquired. New in Python 3.6.
shutil.move(src, dst, copy_function=copy2) Recursively move a file or directory (src) to another location (dst) and return the destination.
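The directory calls above can be tried safely inside a throwaway directory. This is a sketch under a few assumptions: tempfile is used only to get a scratch location, the names outer, inner, and Mary.txt are made up, and the with-form of os.scandir() needs Python 3.6 or later.

```python
import os
import shutil
import tempfile

# Work inside a scratch directory so nothing real is touched.
base = tempfile.mkdtemp()
try:
    outer = os.path.join(base, 'outer')
    inner = os.path.join(outer, 'inner')
    os.mkdir(outer)   # mkdir creates one level at a time
    os.mkdir(inner)
    with open(os.path.join(inner, 'Mary.txt'), 'w') as f:
        f.write('hello\n')

    # scandir yields DirEntry objects; the with-block closes the iterator.
    with os.scandir(outer) as entries:
        subdirs = sorted(e.name for e in entries if e.is_dir())

    # walk yields (dirpath, dirnames, filenames) for every directory under base.
    found = []
    for dirpath, dirnames, filenames in os.walk(base):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), base))
finally:
    shutil.rmtree(base)  # removes the whole tree, unlike os.rmdir

print(subdirs, found)
```

Unpacking the 3-tuple from os.walk() in the for statement is usually easier to read than printing the raw tuple.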

FILE Methods

os.DirEntry Object yielded by scandir() to expose the file path and other file attributes of a directory entry.
os.fsync(fd) Force write of file with filedescriptor fd to disk. On Unix, this calls the native fsync() function; on Windows, the MS _commit() function. If you’re starting with a buffered Python file object f, first do f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal buffers associated with f are written to disk.
os.path.basename(path) Return the file name from ‘path’.
os.path.isfile(path) Return True if path is an existing regular file.
os.path.sameopenfile(fp1, fp2) Return True if the file descriptors fp1 and fp2 refer to the same file.
os.remove(path) Delete the file named by path. See rmdir() to remove directories.
os.rename(src, dst) Rename the file or directory src (source) to dst (destination). Raises an error if src does not exist. On Windows it also raises FileExistsError if dst already exists.
os.replace(src, dst) Rename the file or directory src (source) to dst (destination), silently overwriting dst if it is an existing file. Raises an error if src does not exist. Unlike os.rename, the overwrite behaves the same on every platform.
shutil.copy(src, dst, *, follow_symlinks=True) Copies the file src to the file or directory dst. src and dst should be strings. If dst specifies a directory, the file will be copied into dst using the base filename from src. Returns the path to the newly created file.
shutil.copy2(src, dst, *, follow_symlinks=True) Identical to copy() except that copy2() also attempts to preserve all file metadata.
shutil.move(src, dst, copy_function=copy2) Recursively move a file or directory (src) to another location (dst) and return the destination.
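Here is a small sketch of the file calls working together, again in a scratch directory obtained from tempfile (an assumption made for safety). The file names are borrowed from the examples later in this article and are otherwise arbitrary.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
try:
    src = os.path.join(base, 'Mary.txt')
    with open(src, 'w') as f:
        f.write('little lamb\n')
        f.flush()                 # empty Python's buffer first...
        os.fsync(f.fileno())      # ...then ask the OS to write to disk

    backup_dir = os.path.join(base, 'backup')
    os.mkdir(backup_dir)

    # dst is a directory, so the copy keeps the base name 'Mary.txt'.
    copied = shutil.copy2(src, backup_dir)

    # os.replace renames and would overwrite an existing destination file.
    renamed = os.path.join(base, 'Marycopied.txt')
    os.replace(src, renamed)
    names = sorted(os.listdir(base))

    os.remove(renamed)
    still_there = os.path.isfile(renamed)
finally:
    shutil.rmtree(base)

print(os.path.basename(copied), names, still_there)
```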

ENVIRONMENT Methods

os.cpu_count() Returns the number of CPUs in the system, or None if it cannot be determined
os.get_terminal_size(fd=STDOUT_FILENO) Returns the size of the terminal window as a tuple (columns, lines). shutil.get_terminal_size() is preferred.
os.getlogin() Returns the name of the logged-in user. You are usually better off using the LOGNAME or USERNAME environment variables.
os.getpid() Returns the current process id
os.getppid() Returns the parent’s process id
os.putenv(key, value) Set the environment variable key to value. Assigning to os.environ is usually preferable, because putenv() does not update os.environ.
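A quick sketch of the environment calls follows. DEMO_SETTING is a hypothetical variable name; the example assigns through os.environ rather than calling os.putenv() directly so that the value can be read back in the same process.

```python
import os
import shutil

cpus = os.cpu_count()              # may be None if it cannot be determined
size = shutil.get_terminal_size()  # uses a fallback size when no terminal is attached
print('CPUs:', cpus, 'terminal:', size.columns, 'x', size.lines)

# Assigning to os.environ keeps the mapping and the real environment in sync.
os.environ['DEMO_SETTING'] = 'on'  # hypothetical variable name
value = os.environ.get('DEMO_SETTING')

pid, ppid = os.getpid(), os.getppid()
print('pid:', pid, 'parent pid:', ppid, 'DEMO_SETTING:', value)
```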

OTHER Methods

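os.getrandom(size, flags=0) Get up to ‘size’ random bytes, suitable for cryptography. Note that this one is Linux-only and new in Python 3.6; os.urandom(size) is the portable equivalent.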
os.linesep String that separates lines on current platform.
os.path.expandvars(path) Return the argument with environment variables expanded. Substrings of the form $name or ${name} are replaced by the value of environment variable name. Malformed variable names and references to non-existing variables are left unchanged.
os.pipe() Create an anonymous pipe (for a named or ‘FIFO’ pipe, see os.mkfifo(), which is Unix-only). Return a pair of file descriptors (r, w) usable for reading and writing, respectively.
os.startfile(path[, operation]) Start a file with its associated application.
os.strerror(code) Return error message corresponding to code.
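Two of these are easier to see in action than to describe. The sketch below sends a few bytes through os.pipe() within a single process and expands an environment variable inside a path; DEMO_HOME and NO_SUCH_VARIABLE_FOR_DEMO are hypothetical names, and the second is assumed not to be set.

```python
import os

# os.pipe() returns two raw file descriptors: read end first, write end second.
r, w = os.pipe()
os.write(w, b'ping')
os.close(w)               # closing the write end lets the reader see end-of-file
data = os.read(r, 1024)
os.close(r)

# expandvars substitutes $NAME or ${NAME}; unknown names are left unchanged.
os.environ['DEMO_HOME'] = 'demo'
expanded = os.path.expandvars(os.path.join('$DEMO_HOME', 'logs'))
untouched = os.path.expandvars('$NO_SUCH_VARIABLE_FOR_DEMO')

print(data, expanded, untouched, repr(os.linesep))
```

In real use the two ends of a pipe normally belong to different processes or threads; writing and reading in one process only works here because the message is tiny.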

You may be wondering about that single solitary exception that was for Windows only. (Yes that was redundant and repetitive plus it said the same thing twice.) It’s a small jewel that could open a lot of interesting possibilities. This is it:
os.startfile(path[, operation]): its purpose is to start a file with its associated application, loosely like a “shebang” in Linux (Debian, if you are a Raspberry Pi guy). Regretfully, so far in our testing we have not been able to get it to work. Stay tuned.
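One defensive way to call it is sketched below. The wrapper name open_with_default_app and the file name are hypothetical; the hasattr() check is there because os.startfile exists only on Windows builds of Python.

```python
import os

def open_with_default_app(path):
    """Try to open path with its associated application; return True on success."""
    if not hasattr(os, 'startfile'):
        # Not a Windows build of Python, so there is nothing to call.
        return False
    try:
        os.startfile(path)
        return True
    except OSError:
        # Raised, for example, when the file does not exist.
        return False

# Hypothetical file name that is assumed not to exist.
launched = open_with_default_app(os.path.join(os.getcwd(), 'no_such_file_for_demo.txt'))
print('Launched:', launched)
```

Note that os.startfile() returns as soon as the application is launched, so a True result says only that the launch was requested.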

A Few Usage Examples

import os   # automatically brings in the platform-specific version of os.path
import shutil
import sys
print("Examples on a Windows system:")
if os.name != 'nt':
    print("These examples are for a Windows system")
    sys.exit()
aPath = 'D:\\Documents\\Python3\\fileinout'
aFile = 'Jack.txt'
aWholePath = os.path.join(aPath, aFile)  # let os.path supply the separator
print(os.path.abspath(aWholePath)) #get normalized version of whole path
print(os.path.basename(aWholePath)) #get just file name
print(os.path.dirname(aWholePath))  #get just path to directory
print(os.getcwd()) # get string with path to current working directory
oldwkdir=os.getcwd()
os.chdir(aPath)  # change the current working directory
print('New working directory: ',os.getcwd()) # proof of changed working directory
print('Is '+ aPath + ' a good directory? : ',end=" ")
print(os.path.exists(aPath))
print('aPath is now: ', aPath)
ap=os.path.abspath(aPath)
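# for Windows only:
# This is the breadcrumb text that Explorer displays, not a real path string.
sympath = '> This Pc > Data (D:) > Documents > Python3 > fileinout'
print('This is a breadcrumb-style path: ', sympath)
spn = os.path.abspath(sympath)
print('After being passed through abspath it yields: ')
print(spn)
print("Probably not a bug: the text is not a real path, so abspath can only treat it as a relative name.")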
print('Get a tuple of (root, extension):')
print(os.path.splitext(aWholePath)) # get the path with file name, with the file type split off
print('No, we have no idea why you would want that.')
os.chdir(oldwkdir)
print('Changed back to original working directory so we \ncould demo getting a relative path to a file at \n'+aWholePath)
print(os.path.relpath(aWholePath))
print('path to an exe like Python: ')
print(shutil.which("python.exe"))
print('If that printed None, python.exe was not found on the search path (PATH).')
print('This symbol denotes current directory: ', os.curdir)
dirList = os.get_exec_path()
print ("Here is a list of paths searched for an exe file.")
for i in dirList:
    print (i)
print('Entries in the path '+aPath)
print('using os.listdir')
print(os.listdir(aPath))
print('using os.scandir iterator')
for i in os.scandir(aPath):
    print (i)
print("Directory tree: topdown set to False")
for i in os.walk( 'D:\\Temp2', topdown=False):
    print(i)
print("Directory tree: topdown default to True")
for i in os.walk( 'D:\\Temp2',topdown=True):
    print(i)
print("The file Mary.txt is in D:\\Temp2 and we want to move it to D:\\Temp2\\Temp3")
print('Find it with os.listdir(path)')
print(os.listdir(path='D:\\Temp2'))
print('now move it with shutil.move(src,dst) and change its name to Marycopied.txt')
shutil.move('D:\\Temp2\\Mary.txt','D:\\Temp2\\Temp3\\Marycopied.txt')
print("Sure enough, it is both moved and renamed")