Foundations for BASh Part 2

Tutorial - Shell

September 14, 2022 · 14 mins read

Shell Features - Greatest Hits

A system shell like BASh does much more than simply run commands; it has a powerful series of features, such as wildcards for matching filenames, a history to recall previous commands, pipes for redirecting the output of one command to become the input of another, variables for storing values, and many more.

In this post we are going to continue laying a foundation for learning BASh command line operations and shell scripting by looking at some of the more powerful shell features you will frequently encounter. Investing the time to learn some of these will enable you to become a more proficient BASh user, allowing you to write better and more efficient shell scripts, ultimately increasing effectiveness and productivity.


A level of effectiveness we can but hope to achieve…

Wildcards

Wildcards (also known as glob patterns) look a little like regular expressions, or regex (sequences of characters that specify a search pattern in text), but they are a separate and much simpler pattern language that the shell uses for matching filenames. Regex is a topic that I plan to cover in detail in future as it is a fundamental concept in computer science and particularly handy (and very frequently used) in programming for data science in common scripting languages other than BASh.

Wildcards essentially act as shorthand enabling you to specify sets of files with similar names or shared text patterns in their names. For example, e* means all files whose names begin with lower‐case “e”.

Wildcards are expanded by the shell into the actual set of filenames they match prior to being passed to the command with which they are used.

Imagine that your current working directory contains three example text files. If you run:

ls e*

The shell first expands e* into the filenames that begin with “e” in your current directory, as if you had typed:

ls example_1.txt example_2.txt example_3.txt

The ls command doesn’t know you used a wildcard; it is only passed the final list of filenames after the shell expands the wildcard.

The fact that wildcards are handled completely by the shell before the associated program even runs means that any BASh command that accepts filenames as arguments works with wildcards.

Common Wildcards:

*    matches zero or more consecutive characters
?    matches any single character
[set]    matches any single character in a given set or range e.g., [aeiou] for all lower-case vowels or [A-Z] for all capital letters
[^set]    matches any single character not in the given set, such as [^0-9] meaning any non-digit
[!set]    means the same thing as [^set]

There are two characters never matched by wildcards: a leading period (.), and the directory slash (/).

Filenames with a leading period, called dot files, will not be displayed by some programs unless you use a flag to explicitly ask to see them. As you might expect, dot files are also sometimes called “hidden files” for this reason.

If you want to match hidden files, or to match files in a specific directory, you must type these characters explicitly in the pattern, e.g., .hid* to match .hidden_file, or /Users/Lewis/tutorial/*txt to match all filenames ending in “txt” in the tutorial directory.
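To try the wildcards above without touching any real files, here is a minimal sketch that builds a throwaway directory first. The file names notes.md and .hidden_file, and all of the variable names, are my own assumptions for illustration, and the results assume the default BASh globbing settings.

```shell
# Build a scratch directory (hypothetical file names) so nothing real is touched
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch example_1.txt example_2.txt example_3.txt notes.md .hidden_file

starts_with_e=$(echo e*)               # * matches zero or more characters
single_char=$(echo example_?.txt)      # ? matches exactly one character
in_set=$(echo example_[12].txt)        # [set] matches "1" or "2" only
not_in_set=$(echo example_[!12].txt)   # [!set] matches anything except "1" or "2"
everything=$(echo *)                   # a leading period is never matched by *
hidden=$(echo .hid*)                   # so the period has to be typed explicitly

printf '%s\n' "$starts_with_e" "$single_char" "$in_set" "$not_in_set" "$everything" "$hidden"

# Tidy up the scratch directory
cd "$start_dir" || exit 1
rm -rf "$demo_dir"
```

Note that the line printed for * lists the three example files and notes.md but not .hidden_file, which only appears once the leading period is typed as part of the pattern.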

Brace Expansion

Like wildcards, expressions with curly braces are also expanded by the shell to become multiple arguments to a command. For example, the comma-separated expression {I,OO,XXX} could be used with the echo command like this:

echo typ{I,OO,XXX}ng

In this instance, the shell expands the curly braces into three separate words, one for each comma-separated item, attaching the characters on either side of the braces to each. All three words are then passed as arguments to a single echo command, resulting in the following output:

typIng typOOng typXXXng

Brace expansion will work with any string of characters or series thereof, unlike wildcards which are only expanded if they match existing filenames.
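The contrast with wildcards can be sketched as follows. The file names here are made up and deliberately do not exist, and the behaviour of the unmatched wildcard assumes the default BASh settings.

```shell
# Brace expansion needs no matching files: the words are generated from the text alone
words=$(echo typ{I,OO,XXX}ng)

# A wildcard that matches nothing is passed through unchanged (default BASh settings)
no_match=$(echo zz_no_such_file_*)

# Braces are handy for building several related names at once (hypothetical names)
names=$(echo report_{jan,feb,mar}.csv)

printf '%s\n' "$words" "$no_match" "$names"
```

The first and third lines of output contain three words each, whereas the second line is just the original pattern, asterisk and all, because there was nothing for it to match.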

Shell Variables

A variable is a value that can change. In computing, variables are used to store information to be referenced and manipulated in a program. When working with BASh, you can define variables and their values by assigning them as follows:

VARIABLE=value

All values held in variables are stored as strings. If values assigned to variables are numeric, the shell will treat them as numbers when appropriate.

To refer to the value of a variable, simply place a dollar sign in front of the variable name:

echo $VARIABLE

Running this line prints the value stored in VARIABLE which, in this case, is the string “value”.
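As a small illustration of the earlier point about numeric values, the sketch below stores a number and then uses it both as a number and as plain text. The $(( )) arithmetic syntax and the variable names are not covered in this post, so treat them as assumptions on my part.

```shell
COUNT=5

# Arithmetic expansion $(( )) treats the stored string as a number
next=$((COUNT + 1))

# In ordinary string context the characters are simply joined together
# (the braces only mark where the variable name ends)
joined="${COUNT}1"

echo "next=$next joined=$joined"
```

This prints next=6 joined=51, showing the same stored value being treated as a number in one place and as text in the other.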

When you assign a value to a variable or refer to that value, it is a good idea to surround the value with double quotes to prevent certain run‐time errors. An undefined variable, or a variable with spaces in its value, will evaluate to something unexpected if not surrounded by quotes and will likely cause your script to malfunction.

# value containing whitespace assigned to variable
FILENAME="my document"

# attempt to list the variable value
ls -l $FILENAME

The output shown below is printed in this case because the value of the FILENAME variable contains whitespace. When the shell evaluates the unquoted variable, it splits the value at the space, so ls sees two separate arguments, “my” and “document”.

ls: document: No such file or directory
ls: my: No such file or directory

If we instead run ls -l "$FILENAME" then the value of the variable is interpreted as a single entity contained within the double quotes, meaning that ls sees “my document” as one argument, with the whitespace being part of the character string.
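One way to watch this splitting happen without needing any files is to count the arguments a command receives. The count_args function below is a hypothetical helper I have made up purely for this demonstration.

```shell
# Hypothetical helper that reports how many arguments it was given
count_args() { echo "$#"; }

FILENAME="my document"

unquoted=$(count_args $FILENAME)     # split at the space into "my" and "document"
quoted=$(count_args "$FILENAME")     # kept together as a single argument

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

This prints unquoted: 2 argument(s), quoted: 1 argument(s), which is exactly the difference that ls ran into above.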

Another similar problem occurs if we attempt to evaluate a variable immediately adjacent to another character or character string:

# define variable to use in this example
ACTIVITY="coding tutorial"

# attempt to use the variable with adjacent character(s)
echo "This is one $ACTIVITY of many $ACTIVITYs I have written"

In this example I am attempting to use the defined variable to say that “This is one coding tutorial of many coding tutorials I have written” but instead I receive the following:

This is one coding tutorial of many  I have written

The absence of the second variable value occurs because the shell thinks that we are referring to a separate, undefined variable $ACTIVITYs instead of $ACTIVITY and an adjacent “s”. To remedy this issue, we can wrap the variable name in curly braces so that the shell knows exactly where the name ends. Despite the similar look, this is not the brace expansion we saw earlier; it is simply the long form of a variable reference.

echo "This is one $ACTIVITY of many ${ACTIVITY}s I have written"

In this case, the curly braces mark ACTIVITY as the complete variable name, so the shell substitutes its value and treats the adjacent “s” as literal text. The completed phrase within the double quotes is then passed to the echo command and the resulting output printed:

This is one coding tutorial of many coding tutorials I have written

These concepts are important to get a handle on if you want to avoid needless bugs and errors in your scripts and become a proficient BASh user.

Environmental Variables

There are some variables that are standard and commonly defined by default by your shell upon login, for example:

HOME    Your home directory (e.g., /Users/Lewis)
PATH    Your shell search path (more on this below)
PWD    Your shell’s current directory
SHELL    The path to your shell (e.g., /bin/bash)
USER    Your login name

The default scope of a variable (i.e., which programs know about it) is the shell in which it was defined. To make a variable and its value available to other programs your shell invokes you can use the export command. For example:

export MYVAR="567"

The variable MYVAR is now called an environment variable and it becomes available to other programs in your shell’s environment i.e., all programs run by that same shell, including sub-shells and shell scripts.
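The difference in scope can be checked with printenv, which is a separate program and so only sees what the shell passes on to it. This is a rough sketch: LOCAL_ONLY and the other variable names are my own inventions.

```shell
unset LOCAL_ONLY                 # make sure the demo starts from a clean slate
LOCAL_ONLY="shell only"          # a plain shell variable, not exported
export MYVAR="567"               # an environment variable

# printenv exits with a failure status when the variable is not in its environment
child_sees_local=$(printenv LOCAL_ONLY || echo "not set")
child_sees_exported=$(printenv MYVAR || echo "not set")

echo "LOCAL_ONLY seen by printenv: $child_sees_local"
echo "MYVAR seen by printenv: $child_sees_exported"
```

Only the exported variable reaches printenv; the plain shell variable is reported as not set even though the shell itself can still read it.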

In order to list all environment variables available in a shell, you can run the printenv command. You can also use this to display the value of a specific environmental variable:

printenv HOME

In my case this results in the output:

/Users/Lewis

One concept that will become useful later when BASh scripting is the option to provide the value of an environment variable to a specific program just once, by prepending the variable assignment to the command line. For example, we have just seen the value of the HOME environment variable on my machine. If I were to run:

HOME="/User/Batman" printenv HOME

I would receive the output:

/User/Batman

However, if I was then to run the printenv HOME command again, I would receive the original output.
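The same before-and-after behaviour can be captured in one short sketch, here using the MYVAR variable rather than HOME so that nothing important is disturbed. The values and the names during and after are assumptions for illustration only.

```shell
export MYVAR="567"

# The assignment in front of the command applies to that one command only
during=$(MYVAR="temporary" printenv MYVAR)

# Afterwards the shell's own value is untouched
after=$(printenv MYVAR)

echo "during: $during, after: $after"
```

This prints during: temporary, after: 567.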

Search Path

When you run a command, it might invoke a program (like the printenv command) that is part of the operating system, or it might be a built-in command which is a feature of the shell itself (such as the echo command). You can tell the difference using the type command. For example, running type printenv prints the location of the program on disk, while running type echo reports that echo is a shell builtin.
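A quick way to compare the two kinds of command side by side is sketched below. The exact path reported for printenv depends on your system, so I have not assumed one here.

```shell
# printenv is a separate program, so type reports where it lives on disk
printenv_kind=$(type printenv)

# echo is part of the shell itself
echo_kind=$(type echo)

printf '%s\n' "$printenv_kind" "$echo_kind"
```

The first line of output ends with a path to the printenv program, while the second reads echo is a shell builtin.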

Programs are located all over the filesystem, in directories like “/usr/bin” and “/usr/local/bin”. When you run a program via a shell command it is the value of the critical PATH variable that tells the shell where to look. Recall from earlier that PATH is one of the default environmental variables defined by your shell at login.

The value of the PATH variable is a sequence of directories separated by colons. Try running echo $PATH to see how your PATH variable looks; it should be something like this:

/usr/local/bin:/bin:/usr/bin

When you type any command such as printenv, the shell locates the program by searching through the directories specified by the value of PATH and looks for the program in each of these directories. If it finds the program, the shell executes the command, otherwise, it reports a failure, such as “command not found”.

You can temporarily add directories to the shell search path by modifying the PATH variable. For example, if we wanted to append “/usr/sbin” to the example search path above, we could do so by running:

PATH=$PATH:/usr/sbin

Then to check this worked, we could again run echo $PATH, receiving the new value as output:

/usr/local/bin:/bin:/usr/bin:/usr/sbin

This change affects only the current shell. To make such a change permanent, you must modify the PATH variable in the BASh configuration profiles, as described in my setup BASh for data analysis post.
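To see the search path doing its job, the sketch below creates a tiny program in a temporary directory and then appends that directory to PATH. The program name hello_tool and the directory are hypothetical, and command -v is used here simply as a way of asking the shell whether it can find a command.

```shell
# Create a tiny program in a scratch directory (hypothetical name: hello_tool)
tool_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from my tool"\n' > "$tool_dir/hello_tool"
chmod +x "$tool_dir/hello_tool"

# Before the directory is on the search path, the shell cannot find the program
before=$(command -v hello_tool || echo "not found")

# Append the directory to PATH for the current shell only
PATH=$PATH:$tool_dir
after=$(command -v hello_tool)
output=$(hello_tool)

echo "before: $before"
echo "after:  $after"
echo "output: $output"
```

The before line reports not found, whereas the after line shows the full path to hello_tool inside the temporary directory, and the program can then be run by name alone.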

Aliases

The built-in command alias defines a convenient shorthand for a longer command, to save typing. For example, alias ls='ls -hlGF' redefines the ls command to run with the -hlGF flags without having to type these each time. Check man ls to see what I have defined here.

To make these custom versions of commands available whenever you log in, you can define permanent aliases in your ~/.bashrc or ~/.bash_aliases file, as described in my setup BASh for data analysis post.

Simply type the alias command on the command line to list all the aliases you currently have defined.
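Here is a minimal sketch of defining, inspecting, and removing an alias. The alias name ll is my own choice, and unalias is the built-in command for removing a definition. Bear in mind that BASh expands aliases in interactive sessions; by default it does not expand them inside scripts, so this sketch only defines and lists the alias rather than running it.

```shell
# Define a shorthand (the name ll is my own choice), then ask the shell to show it
alias ll='ls -hl'
listing=$(alias ll)
echo "$listing"

# unalias removes the shorthand again
unalias ll
after_removal=$(alias ll 2>/dev/null || echo "ll is no longer defined")
echo "$after_removal"
```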

Input/Output Redirection

As mentioned in the foundations for BASh part 1 post, the shell can redirect standard input, standard output, and standard error to and from files.

Any command that reads from standard input can have its input come from a file instead with the < operator:

# example command taking input from file
command < infile

Similarly, any command that writes to standard output can write to a file instead using the > or >> operators:

# example command writing output to a file - create/overwrite
command > outfile

# example command writing output to a file - append
command >> outfile

A command that writes to standard error can have its output redirected to a file as well, while standard output still goes to the screen. This can be done with the 2> operator:

# example command writing error to a file while stdout goes to screen
command 2> errorfile

In order to redirect both standard output and standard error to files, we can use a combination of the above:

# stdout to file (create/overwrite) and stderr to separate file
command > outfile 2> errorfile

# stdout to file (append) and stderr to separate file
command >> outfile 2> errorfile

Finally, to redirect these to the same single file we can use:

# stdout and stderr to same file - create/overwrite (preferred syntax)
command &> outfile

# stdout and stderr to same file - append (preferred syntax; requires BASh 4 or later)
command &>> outfile
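The separate-file operators above can be tried safely with the sketch below, which writes only to a temporary directory. The report function is a hypothetical helper that produces one line on standard output and one on standard error (the >&2 inside it sends that echo to standard error), and the file names are assumptions.

```shell
work_dir=$(mktemp -d)

# Hypothetical helper: one line to standard output, one to standard error (>&2)
report() { echo "normal output"; echo "error output" >&2; }

report > "$work_dir/out.txt" 2> "$work_dir/err.txt"    # create/overwrite both files
report >> "$work_dir/out.txt" 2> "$work_dir/err.txt"   # append stdout, overwrite stderr

# Feed each file to grep through standard input and count the matching lines
out_count=$(grep -c "normal output" < "$work_dir/out.txt")
err_count=$(grep -c "error output" < "$work_dir/err.txt")

echo "stdout lines: $out_count, stderr lines: $err_count"
rm -rf "$work_dir"
```

This prints stdout lines: 2, stderr lines: 1, because >> appended the second run to out.txt while 2> replaced the contents of err.txt each time.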

As you can imagine, there’s more to learn about redirecting input and output. If you’re interested in more advanced concepts, you can read more about these here.

Pipes

As well as redirecting output to files, you can redirect the standard output of one command to be the standard input of another. This can be achieved using the pipe operator “|”. For example:

who | sort

In this example, the who command produces a list of all users currently logged into a multi-user system and sends the output to the sort program, printing an alphabetically sorted list of logged-in users on the system plus some additional details, resulting in an output that would look something like the fictional one shown here:

ab1sox   pts/26       Aug 09 11:00 (143.16.198.01)
ac2hgv   pts/87       Sep  7 12:10 (143.16.196.02)
bi3diy   pts/1        Aug 18 13:01 (172.202.14.123)
bj2gbh   pts/48       Sep  9 14:20 (143.16.194.03)
en1xyz   pts/53       Aug 27 15:03 (172.202.15.456)

You can use multiple pipes to create longer series of commands. For example, if we wanted to extend the pipeline from the example above to extract the first column of information using the awk command and then use the wc command to count the number of words (which would clearly be 5), we could do so like this:

who | sort | awk '{print $1}' | wc -w

We will come across many more useful examples of how to use this ability to pipe between commands later in our BASh studies.
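Since the output of who differs on every machine, here is a reproducible sketch of the same pipeline that uses a made-up stand-in. The fake_who function and the user names it prints are purely hypothetical.

```shell
# Stand-in for the output of who: made-up user names and terminals
fake_who() {
  printf '%s\n' "en1xyz pts/53" "ab1sox pts/26" "bi3diy pts/1"
}

# Sort the lines, then keep only the first column
sorted_users=$(fake_who | sort | awk '{print $1}')

# Extend the pipeline with wc -w to count the user names
user_count=$(fake_who | sort | awk '{print $1}' | wc -w)

echo "$sorted_users"
echo "users:" $user_count
```

The three user names are printed in alphabetical order, one per line, followed by a count of 3.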

Quoting

Normally, the shell treats whitespace as separating the words on the command line. We saw how this caused a problem when referring to the value of a variable that contained whitespace earlier.

If you want a word to contain whitespace, e.g., a filename with a space in it, you can surround it with single or double quotes to make the shell treat it as a unit.

An important distinction to make between single and double quotes is that single quotes treat their contents literally, while double quotes permit the evaluation of shell constructs such as variables. Note the difference between the two examples below.

Running the command echo 'The variable USER refers to $USER' results in the output:

The variable USER refers to $USER

While running the same command but using double quotes like this echo "The variable USER refers to $USER" results in the following on my machine:

The variable USER refers to Lewis

Backticks ` ` cause their contents to be evaluated as a shell command. The contents are then replaced by the standard output of the command. For example, running the command echo "This year is `date +%Y`" results in the output:

This year is 2022

A dollar sign and parentheses “$()” are equivalent to backticks:

echo "This year is $(date +%Y)"

The output here would be the same as in the example above, but the “$()” form is superior to backticks because it can be nested easily. Guess the output of the following command:

echo "Next year is $(expr $(date +%Y) + 1)"

This functionality allows us to nest BASh commands within BASh commands.
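The quoting rules in this section can be pulled together in one short sketch. The FRUIT variable and the path /usr/local/bin/tool are hypothetical; the path does not need to exist because dirname and basename only work on the text of the name.

```shell
FRUIT="apple"                      # hypothetical variable for the demo

single='I like $FRUIT'             # single quotes: contents kept literally
double="I like $FRUIT"             # double quotes: the variable is evaluated

# Nested command substitution: the inner $( ) runs first, then the outer one
parent=$(basename "$(dirname /usr/local/bin/tool)")

printf '%s\n' "$single" "$double" "$parent"
```

This prints I like $FRUIT, then I like apple, then bin. The last line arises because dirname strips the final component to give /usr/local/bin, and basename then keeps only the last part of that result.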

Don’t worry if this seems confusing at first, we will revisit these concepts again in future and revise them in the context of multiple scenarios.

Escaping

As mentioned in the foundations for BASh part 1 post, there are some characters (such as the slash “/”) that have special meaning to the shell. While the advice I gave to avoid using these special characters in file or directory names in that post is sound, you might occasionally need or want to use them literally.

To use a special character in this way, you must precede it with a backslash “\”. This is called escaping the special character. For example, if I wanted to echo all the files in my current working directory beginning with the letter “e” I could use echo e* but if I wanted to echo the letter “e” and a literal asterisk I would have to write echo e\*.
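A couple of quick escaping checks, sketched with made-up text:

```shell
literal_star=$(echo e\*)              # the backslash stops * acting as a wildcard
literal_dollar=$(echo "Price: \$5")   # \$ prints a dollar sign rather than a variable

printf '%s\n' "$literal_star" "$literal_dollar"
```

This prints e* followed by Price: $5, regardless of which files are in the current directory.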

Command History

Using the shell command history, both the feature and the literal history command, you can recall previous commands and re-execute them.

Some of the most useful history commands are:

history    Print your history
history N    Print the most recent N commands in your history
history -c    Delete your shell history
!!    Re-run the previous command
!N    Re-run command number N in your history
!-N    Re-run the command you typed N commands ago

You can also scroll through your command history using the “up” and “down” arrow keys, hitting “return” to re-execute the displayed command.

While it is possible to scroll all the way back through your history, this feature is probably best saved for the last 10 or so commands you ran; you’d be better off using a combination of the history and !N commands above to re-execute much earlier commands.

Tab Completion

Press the “Tab” key while you are in the middle of typing a file or directory name and the shell will automatically finish typing for you! If several filenames match what you’ve typed so far, the shell will beep, indicating that the match is ambiguous. Immediately pressing “Tab” again will present you with a list of alternatives; type a few more characters to disambiguate your choice and then press “Tab” again. Magic.

Here’s an example. I have three files in my tutorial directory, all beginning with the same initial “example_” string.

ls example_<tab><tab>

Running the command shown above will result in the following output:

example_1.txt example_2.txt example_3.txt

If I then type an additional “3” and then hit tab like this:

ls example_3<tab>

The shell auto-completes my command to become:

ls example_3.txt

This is one of the best easy-to-learn features of the shell (as well as script editors and integrated development environments or IDEs) as it prevents you from making typos in commands, file names, directory paths etc. and thereby saves loads of time when writing longer scripts. Get into the habit of using tab completion wherever this feature is available.

Killing a Command

If you’ve launched a command from the shell running in the foreground, and want to kill it immediately, type “Ctrl+C”. The shell recognises “Ctrl+C” as “kill the current foreground command immediately”.


“B****, You Don’t Have A Future!”

Killing a program with “Ctrl+C” might leave your shell in an odd or unresponsive state, perhaps not displaying the keystrokes you type. This happens because the killed program had no opportunity to clean up after itself.

If this happens to you:

  1. Press “Ctrl+J” to get a shell prompt; this keystroke produces the same newline character as the “return” key but will work even if “return” does not.

  2. Type the shell command “reset” (even if the letters don’t appear while you type) and press “Ctrl+J” again to run this command.

This should bring your shell back to normal.

Conclusion

We have covered a lot of useful topics and features over these last two posts, and I would strongly advise that any of you wanting to continue with this BASh series ensure that you’re 100% happy with the underlying concepts; you do not have to know the inner workings of each function, just make sure you are familiar with the basic features or commands and how they work in principle.

Next time in our BASh voyage we will move on to the first part of a command line operations sub-series in which we will cover manipulating files, directories, and data with BASh, take a more detailed look at combining tools with pipes, and learn how to undertake basic batch processing.

. . . . .

Thanks for reading. I hope you enjoyed the article and that it helps you to get a job done more quickly or inspires you to further your data science journey. Please do let me know if there’s anything you want me to cover in future posts.

Happy Data Analysis!

. . . . .

Disclaimer: All views expressed on this site are exclusively my own and do not represent the opinions of any entity whatsoever with which I have been, am now or will be affiliated.