BASh has survived for almost 35 years as an integral part of Linux and macOS because it lets users do complex things with just a few keystrokes, enables automation of repetitive tasks, and allows programs to be run on high-performance computing clusters and cloud environments from virtually anywhere in the world.
In this series of posts, we are going to learn fundamental commands for working with BASh. We will start with very basic operations such as navigating the filesystem and manipulating objects within it interactively. We will then gradually build up to writing and running scripts for programmatic handling of analytical tasks, allowing you to harness the real power of the BASh language.
So that we have some files and folders to work with, I have made a ready-to-use directory on which we can run our example commands. Go ahead and download it from here.
The file you have just downloaded is a zip archive; move it to your home directory and unzip it. Now, open your terminal and let’s start running some BASh commands.
Recall from Foundations for BASh Part 1 that every file and directory within the filesystem is identified by an absolute path that shows how to reach it from the root directory.
If you run the command pwd (short for “print working directory”) in your terminal having just opened it, the output returned will be the absolute path to your home directory.
Running pwd will always print the absolute path to your current working directory to standard output (the terminal) so that you can always know where you are in the filesystem.
While pwd tells you where you are, the ls command tells you what’s in a directory. If I run the ls command in my home directory, the output looks like this:
Applications/ Documents/ Library/ Movies/ Pictures/ cmd_line_ops/ src/
Desktop/ Downloads/ Mount/ Music/ Public/ opt/
On its own, ls lists the contents of your current working directory (the one displayed by pwd). If you add the name of a directory, ls will list the contents of that directory instead. For example, running ls cmd_line_ops would result in the following output:
imdb_top_horror.txt imdb_top_scifi.txt imdb_top_war.txt test_subdir/
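To see pwd and ls working together, here is a minimal sketch that can be pasted into a terminal. It assumes nothing about your own files: it builds a throwaway directory with mktemp, and the names demo_dir, a.txt and b.txt are made up for this illustration rather than taken from the downloaded archive.

```shell
# Create a throwaway sandbox; every name below is hypothetical.
sandbox=$(mktemp -d)
mkdir "$sandbox/demo_dir"
touch "$sandbox/demo_dir/a.txt" "$sandbox/demo_dir/b.txt"

cd "$sandbox"
pwd            # prints the absolute path to the sandbox
ls             # lists the current working directory: demo_dir
ls demo_dir    # lists the contents of demo_dir without moving into it

# Keep the listing so it can be inspected after cleaning up.
listing=$(ls demo_dir)

# Tidy up: step out of the sandbox and delete it
# (rm -r is covered later in this post).
cd /
rm -r "$sandbox"
```

Note that ls demo_dir reports on another directory while pwd stays the same; only cd changes where you are.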
Just as you can move around in a file explorer by double-clicking on directories, you can move around in the filesystem using the cd command, which stands for “change directory”.
If you run the command cd cmd_line_ops and then run pwd, you should see that your current working directory is now the cmd_line_ops folder. If you now run the ls command, the output should be identical to that shown above because you are now in the cmd_line_ops directory.
If you want to get back to your home directory, you can use the cd command with the absolute path to your home directory. There are other ways of doing this, however.
Recall from Foundations for BASh Part 1 that there are two special relative paths. The first, “.” (a single period), means “the current directory” and the second, “..” (two consecutive periods with no spaces), means “the parent of the current directory”. The term “parent” always refers to the directory one level above the one given.
To easily get back to your home directory from the cmd_line_ops directory you could run cd .. to change directory up one level. If you ran this command again it would take you to the parent of your home directory. Starting from cmd_line_ops, both steps could also be made at once by running either cd ./../../ or cd ../../ as a single command.
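The behaviour of “.” and “..” can be checked safely with a small sketch like the one below. The nested directories parent and child are hypothetical names created inside a throwaway sandbox, so the commands do not depend on where your home directory sits.

```shell
# Build sandbox/parent/child; -p creates the intermediate directory too.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/parent/child"

cd "$sandbox/parent/child"
cd ..                    # up one level: now in parent
one_up=$(pwd)

cd child                 # back down into child
cd ../..                 # up two levels in a single command
two_up=$(pwd)

cd parent/child
cd ./../..               # the leading ./ changes nothing: same destination
two_up_again=$(pwd)

echo "$one_up"
echo "$two_up"
echo "$two_up_again"     # same path as the line above

# Tidy up.
cd /
rm -r "$sandbox"
```

Each “..” climbs exactly one level, so chaining them with slashes is just a shorthand for running cd .. repeatedly.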
If you also remember from Foundations for BASh Part 1 that you can refer to your home directory using the lone tilde symbol “~” or the environment variable $HOME, you will likely already have realised that you can return to the home directory by running either cd ~ or cd $HOME. The simplest way, however, is just to run the cd command without any further arguments.
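As a quick check that the three forms really are equivalent, the following sketch runs each one and records where it lands. It assumes only that the HOME variable is set, as it normally is in a login session; the variable names are made up for this example. Quoting "$HOME" is a precaution in case the path contains spaces.

```shell
start=$(pwd)            # remember where we began

cd ~                    # the tilde expands to your home directory
via_tilde=$(pwd)
cd "$start"

cd "$HOME"              # the HOME environment variable holds the same path
via_variable=$(pwd)
cd "$start"

cd                      # no argument at all: also goes home
via_bare_cd=$(pwd)
cd "$start"

# All three lines should show the same path.
echo "$via_tilde"
echo "$via_variable"
echo "$via_bare_cd"
```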
A firm understanding of these concepts and some practice are all that is required to navigate the filesystem and specify file paths using the command line; it doesn’t really get any more complex than that.
Before we move on, let’s make sure that our current working directory is the cmd_line_ops folder. Running the command cd ~/cmd_line_ops will do the trick.
Along with navigating the filesystem, other routine tasks include copying, moving, renaming, and deleting files or folders.
To copy a file or folder we can use the cp command which, as you might have guessed, is short for “copy”.
The general format for using the cp command is shown here:
# copy a file
cp source_file target_file
# copy a file or multiple files to a directory
cp source_file_or_files target_directory
If we wanted to duplicate one of the files in our current working directory, we could do so using either of the following:
# file path relative to the current directory - implicit
cp imdb_top_horror.txt duplicate.txt
# file path relative to the current directory - explicit
cp ./imdb_top_horror.txt ./duplicate.txt
If a file with the same name as the target already exists when copying, the existing file will be overwritten by default. However, if you have set up your .bashrc file with a “safe version” of the default cp command as I showed in my how to Setup BASh for Data Analysis post, you will be prompted for permission before any files are overwritten.
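The “safe version” from the setup post is not reproduced here. One common way to get that behaviour, and this is an assumption about the approach rather than a quote from that post, is to run cp with its -i flag, which asks before overwriting. The sketch below contrasts the two behaviours in a throwaway directory using hypothetical file names.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'new contents\n' > source.txt      # hypothetical file names
printf 'precious data\n' > target.txt

# With -i, cp asks before replacing target.txt. The "n" answer is piped in
# only so that this sketch runs unattended; normally you would type it.
printf 'n\n' | cp -i source.txt target.txt || true
after_refusal=$(cat target.txt)           # still "precious data"

# Without -i, the existing target.txt is replaced with no warning.
cp source.txt target.txt
after_plain_cp=$(cat target.txt)          # now "new contents"

echo "$after_refusal"
echo "$after_plain_cp"

# Tidy up.
cd /
rm -r "$sandbox"
```

If you prefer the prompt all the time, an alias for cp that includes -i in your .bashrc is one way to make it the default.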
You can copy a file or multiple files either from or to locations other than the current working directory by specifying the relative or absolute file paths.
For example, to copy two files from our current working directory to the test_subdir directory contained within it we could use any of the following:
# copy multiple files to a directory - implicit relative path
cp imdb_top_horror.txt imdb_top_scifi.txt test_subdir/
# copy multiple files to a directory - explicit relative path
cp imdb_top_horror.txt imdb_top_scifi.txt ./test_subdir/
# copy multiple files to a directory - absolute path
cp imdb_top_horror.txt imdb_top_scifi.txt ~/cmd_line_ops/test_subdir/
Similarly, if we made the test_subdir directory our current working directory and wanted to copy the same files from the cmd_line_ops folder we could do so by correctly specifying the absolute or relative paths to the target files and destination folder, as shown here:
# change directory
cd test_subdir/
# copy files to the current working directory
cp ../imdb_top_horror.txt ~/cmd_line_ops/imdb_top_scifi.txt .
# change back to the cmd_line_ops directory
cd ..
Here, I have used a relative path to imdb_top_horror.txt using “..”, specified “imdb_top_scifi.txt” by stating the absolute path using one of the several possible ways we saw earlier, and then given my current working directory with the “.” relative path notation.
This example should illustrate that there are many ways of using the “rules” we have seen so far. This flexibility is what can make BASh so powerful.
One last thing for the cp command: copying an entire directory and its contents can be done using the -r flag, which means “recursively”, for example:
cp -r test_subdir/ copy_subdir/
This command will copy the test_subdir directory to the current working directory and call the newly created duplicate “copy_subdir”, provided that no copy_subdir directory exists there already.
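What cp -r does depends on whether the destination directory already exists, which the sketch below illustrates in a throwaway sandbox. The file a.txt is a made-up placeholder, and the source is written without a trailing slash on purpose: GNU cp (Linux) and BSD cp (macOS) treat that form the same way, whereas BSD cp copies only the contents of the directory when the source ends in a slash.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir test_subdir
touch test_subdir/a.txt           # hypothetical placeholder file

# copy_subdir does not exist yet, so it is created as a duplicate.
cp -r test_subdir copy_subdir
first_copy=$(ls copy_subdir)      # a.txt

# copy_subdir now exists, so test_subdir is copied INTO it.
cp -r test_subdir copy_subdir
second_copy=$(ls copy_subdir)     # a.txt and test_subdir

echo "$first_copy"
echo "$second_copy"

# Tidy up.
cd /
rm -r "$sandbox"
```

Checking with ls whether the destination exists before running cp -r avoids ending up with an unexpected nested copy.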
While cp copies objects, the mv command will move them from one location to another, just as if you were to drag and drop in a graphical file explorer.
The mv command works in much the same way as the cp command when moving objects around the filesystem; you specify the files you want to move, using either a relative or absolute path, and where you want them to go.
The mv command can also be used to rename files and directories by providing a different name for the file or directory when specifying the destination. Here we will rename the copy_subdir folder we just created:
mv copy_subdir/ clone_subdir/
Note that, like the cp command, mv will overwrite existing files by default if a file with the same name already exists in the specified destination.
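Renaming and overwriting with mv can be tried out safely with a sketch like this one. The file names draft.txt, renamed.txt and final.txt are hypothetical and live in a throwaway sandbox.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'new\n' > draft.txt        # hypothetical file names
printf 'old\n' > final.txt

# Same directory, different name: this is a rename.
mv draft.txt renamed.txt

# final.txt already exists, so it is silently replaced.
mv renamed.txt final.txt
contents=$(cat final.txt)         # new
remaining=$(ls)                   # final.txt is the only file left

echo "$contents"
echo "$remaining"

# Tidy up.
cd /
rm -r "$sandbox"
```

As with cp, mv accepts an -i flag that asks for confirmation before replacing an existing file.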
Now that we can copy files and move them around, we should know how to delete them.
To delete a file, we can use the rm command, which stands for “remove”.
As with cp and mv, you can give rm the names or paths to as many files as you’d like. Without changing our current working directory, let’s remove the files inside the newly renamed clone_subdir folder:
rm clone_subdir/imdb_top_horror.txt clone_subdir/imdb_top_scifi.txt
One thing to always bear in mind when using rm is that, unlike graphical file explorers, the shell doesn’t have a trash can; when you run the command to remove a file, it is gone for good!
If you try to use rm on a directory, it will print an error message instead of deleting anything. This behaviour exists primarily to stop you from deleting an entire directory of precious files accidentally.
To remove a directory, we can use the rmdir command, but it only works when the directory is empty, so you must first delete or relocate the files it contains.
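The two safeguards described above can be seen in action with a sketch along these lines. The names full_dir and file.txt are made up, and the error messages that rm and rmdir print will vary slightly between Linux and macOS.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir full_dir
touch full_dir/file.txt           # hypothetical placeholder file

# Plain rm refuses to delete a directory.
rm full_dir || echo "rm refused: full_dir is a directory"

# rmdir refuses because the directory still contains a file.
rmdir full_dir || echo "rmdir refused: full_dir is not empty"
after_refusals=$(ls)              # full_dir is still there

# Empty the directory first, then rmdir succeeds.
rm full_dir/file.txt
rmdir full_dir
after_rmdir=$(ls)                 # nothing left

# Tidy up.
cd /
rm -r "$sandbox"
```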
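If you are certain that you want to delete a directory and everything contained within it, you can use the rm command with the -r flag. Be aware that rm -r usually deletes without asking; it typically prompts for confirmation only in specific cases, such as when it encounters a write-protected file or when the -i flag is also given.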
Let’s use the rm command to clean up the cmd_line_ops directory so it’s ready for the next post in the series:
rm -r clone_subdir
Only use rm -r in this way if you are absolutely certain; it will permanently remove the folder and the files contained within it, and there’s no coming back once it’s gone. You have been warned.
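If you would like a safety net while getting used to rm -r, one option (a suggestion, not something the steps above require) is to combine it with the -i flag so that rm asks before it acts. The sketch below uses a hypothetical old_results directory inside a throwaway sandbox; the exact wording of the prompt differs between systems.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir -p old_results/run1         # hypothetical directory names
touch old_results/run1/output.txt

# With -i, rm asks before touching the directory. The "n" answer is piped
# in only so the sketch runs unattended; normally you would type it.
printf 'n\n' | rm -ri old_results || true
after_refusal=$(ls)               # old_results is still there

# Without -i, the directory and everything in it is removed immediately.
rm -r old_results
after_removal=$(ls)               # nothing left

# Tidy up.
cd /
rm -r "$sandbox"
```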
Aside from learning a handful of commands for specific tasks, the most important takeaway from this tutorial should be that a solid understanding of filesystem structure and file paths is essential when using the command line. Help for specific commands is available, so you don’t really have to memorise all the flags and what they do, but you do need to be able to specify the input and the destination for output concisely and accurately when using them.
Next time in this series, we will look at how to manipulate file contents and how to combine tools to create powerful chains of commands to achieve complex and repetitive tasks with a few keystrokes.
See you then.
Thanks for reading. I hope you enjoyed the article and that it helps you to get a job done more quickly or inspires you to further your data science journey. Please do let me know if there’s anything you want me to cover in future posts.
Happy Data Analysis!