August 29, 2007 (Lecture 2)

The Shell and Shell Scripting

Most of us are pretty familiar with the "UNIX shell". We use it, whether bash, sh, tcsh, zsh, or another variant, to start and stop processes, control the terminal, and otherwise interact with the system.
And many of you have heard of, or made use of, "shell scripting" -- the process of providing instructions to the shell in a simple, interpreted programming language.
In many ways the language of the shell is very powerful -- it has functions, conditionals, and loops, for example. In other ways, it is weak -- it is completely untyped (everything is a string).
But the real power of shell programming doesn't come from the language itself but from the diverse library that it can call upon -- any program. Shell programming remains popular because it provides a quick and easy way to integrate command-line tools and filters to solve often complex problems.

Simple Scripts

The simplest scripts of all are nothing more than lists of commands. Consider the script below:
  #!/bin/sh

  # A really simple script

  who am i     # cs395      pts/3        Sep  8 12:11    (SUNW-KRB5-AUTH-DATA)
  date         # Thu Sep  9 02:10:53 EDT 2004
  pwd          # /export/home/cs395/public_html/applications/ln
  
What to notice? Well, in general, anything after a # is a comment and is ignored by the shell. We see this used both as an entire line and next to each of several lines, where it shows example output. The only exception is the one seen in the first line. This line tells the shell that started the script to invoke /bin/sh to run the script. This is necessary because different users might be using different shells: sh, csh, bash, zsh, tcsh, &c. And these shells have slightly different languages and built-in features. In order to ensure consistent operation, we want to make sure that the same shell is used to run the script each time.

Aside: The various shells are more the same than different. As a result, on many systems, there is actually one shell program capable of behaving with different personalities. On these systems, the personality is often selected by soft linking different names to the same shell binary. Then, the shell looks at argv[0] to observe how it was invoked, sets some flags to enable/disable behaviors, and goes from there.

The bulk of this simple script is a list of commands. These commands are executed, in turn, and the output is displayed. The commands are found by searching the standard search path, PATH. PATH is a :-delimited list of directories which should be searched for executables. Here's my search path:
  /bin:/usr/bin:/usr/local/bin:/usr/openwin/bin:/usr/ccs/bin:/usr/ucb:/etc:.
  
The command which, used as in which ls, will tell you which version of a command is being executed. This is useful if different versions might be in your search path. In general, the search path is traversed from left to right, and the first match wins.
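
To make the left-to-right search concrete, here is a rough sketch of the kind of lookup that which performs. The function name find_in_path is made up for this illustration; it is not a standard command, and the real which on your system may well behave differently in the details. It also leans on a few features (functions, for loops, the [ ] test) that are covered later in these notes.

```shell
#!/bin/sh

# find_in_path is a hypothetical helper, written only for illustration.
# It walks PATH from left to right and prints the first executable match.
find_in_path() {
  name=$1
  old_ifs=$IFS
  IFS=:                      # split $PATH on colons instead of whitespace
  for dir in $PATH
  do
    # An empty entry in PATH means the current directory
    [ -n "$dir" ] || dir=.
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]
    then
      IFS=$old_ifs
      printf "%s\n" "$dir/$name"
      return 0               # stop at the first match
    fi
  done
  IFS=$old_ifs
  return 1                   # not found anywhere on the path
}

find_in_path ls
```

Because the loop returns at the first hit, a directory earlier in PATH always shadows a later one, which is exactly why the position of "." matters.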
  
  
  Aside: Notice that ".", the current working directory, is the last
  directory listed. This should almost certainly be the case. Placing
  it first is dangerous. And, I've got a personal story on this one.
  When I was a freshman, student UNIX accounts were created with this
  path incorrect -- with "." placed first.

  This led to a collection of people putting bogus "ls" and "cd" commands
  into their home directories. These commands would appear to work --
  but would also send off an email to the system admin, "I've been a bad boy
  snooping around in other people's directories. Please punish me severely."

  Curious freshmen would wander around the directory space, including
  the homes of others -- unwittingly annoying the system administrators.
  No doubt, they wanted someone punished severely!
  
  

  

Variables

PATH, discussed above, is one example of a variable. It is what is known as an environment variable. It reflects one aspect of the shell environment -- where to look for executables. Changing it changes the environment in which the shell executes programs. Environment variables are special in that they are defined before the shell begins.
Environment variables, like most other variables, can be redefined simply by assigning them a new value:
    PATH=/bin:/usr/bin:/usr/local/bin:/usr/openwin/bin:/usr/ccs/bin:/usr/ucb:/etc:/usr/local/apache/bin:.
  
And, they are evaluated (their value is examined) using the $ operator, as below:
    echo $PATH
    PATH=$PATH:/usr/local/apache/bin:.
    echo $PATH
  
To create new variables, you simply assign them a value:
    echo $GREGS_CAR
    GREGS_CAR="Tarus"
    echo $GREGS_CAR
  
All shell script variables are untyped (well, they really are strings) -- how they are interpreted often depends on what program is using them or what operator is manipulating or examining them.

Positionals, i.e. Command Line Arguments

Several special variables exist to help manage command-line arguments to a script:

$# - represents the total number of arguments (much like argc, but not counting $0)
$0 - represents the name of the script, as invoked
$1, $2, $3, ..., $8, $9 - the first 9 command line arguments
$* - all command line arguments (unquoted, so each is subject to word splitting)
$@ - all command line arguments (unquoted, it behaves like $*)
"$@" - all command line arguments, where each argument is individually quoted

Unlike other variables, positionals can't be assigned values using the = operator. Instead, they can only be changed in a very limited way.
The set command sets these values. Consider the following example:
  set a b c
  # $1 is now a
  # $2 is now b
  # $3 is now c
  
One thing should be noted about the set command: it accepts options itself, and these begin with the - sign. As a result, it can get confused and interpret as options the values that it should be assigning to positionals. To avoid this, the -- flag can be used:
  set -- -a- -b- -c- 
  # $1 is now -a-
  # $2 is now -b-
  # $3 is now -c-
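
The difference between $*, "$*", and "$@" is easiest to see by counting. The sketch below uses set -- to fake some command-line arguments, one of which contains a space. The helper count_args is a name invented for this example.

```shell
#!/bin/sh

# count_args is a hypothetical helper: it reports how many arguments it got.
count_args() {
  printf "%d\n" $#
}

set -- "first arg" second    # $1 is "first arg", $2 is "second"

count_args $*      # prints 3: unquoted, "first arg" is split into two words
count_args "$*"    # prints 1: everything is joined into a single word
count_args "$@"    # prints 2: each original argument is preserved
```

In practice "$@" is almost always the one you want when passing arguments along to another command.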
  
If there are more than 9 command-line arguments, there is a bit of a problem -- there are only 9 positionals: $1, $2, ..., $9. $0 is special and is the shell script's name.
To address this problem, the shift command can be used. It shifts all of the arguments to the left, throwing away $1. What would otherwise have been $10 becomes $9 -- and addressable. (POSIX shells also accept the braced form ${10}, but shift works in every Bourne-style shell.) We'll talk more about shift after we've talked about while loops.
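
Here is a small sketch of shift at work. The eleven single-letter arguments are made up for the example, and set -- is used to install them so that the script can be run without typing any arguments.

```shell
#!/bin/sh

set -- a b c d e f g h i j k     # 11 arguments

before=$9      # i
shift          # a is thrown away; everything moves left by one
after=$9       # j, which had been out of reach as the 10th argument

printf "before: %s  after: %s  remaining: %d\n" "$before" "$after" $#
# prints: before: i  after: j  remaining: 10
```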

Quotes, Quotes, and More Quotes

Shell scripting has several different styles of quoting -- each with a different meaning:

unquoted strings are normally interpreted
"quoted strings are basically literals -- but $variables are evaluated"
'quoted strings are absolutely literally interpreted'
`commands in quotes like this are executed, their output is then inserted as if it were assigned to a variable and then that variable was evaluated`

I think "quotes" and 'quotes' are pretty straight-forward -- and will be constantly reinforced. But, I do want to show an example using `quotes`:
  day=`date | cut -d " " -f1`
  printf "Today is %s.\n" $day
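
For a side-by-side comparison, the following sketch assigns the same text using each quoting style. The variable names are arbitrary, and basename is used only because its output is short and predictable.

```shell
#!/bin/sh

name=world

double="hello $name"            # $name is evaluated: hello world
single='hello $name'            # taken literally:    hello $name
dir=`basename /usr/local/bin`   # the command's output is substituted: bin

printf "%s\n" "$double" "$single" "$dir"
```

This prints hello world, then hello $name, then bin, each on its own line.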
  

expr

The expr program can be used to manipulate variables, normally interpreted as strings, as integers. Consider the following "adder" script:

  sum=`expr $1 + $2`

  printf "%s + %s = %s\n" $1 $2 $sum
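
expr handles the other integer operators as well. One wrinkle worth a sketch: * means something to the shell (it matches file names), so it has to be escaped before expr ever sees it. The values 7 and 3 below are arbitrary.

```shell
#!/bin/sh

a=7
b=3

sum=`expr $a + $b`           # 10
product=`expr $a \* $b`      # 21 (the backslash keeps the shell from expanding *)
quotient=`expr $a / $b`      # 2  (integer division)
remainder=`expr $a % $b`     # 1

printf "%s %s %s %s\n" $sum $product $quotient $remainder
```

Note also that each operand and operator must be a separate argument, so the spaces around the operators are required.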

A Few Other Special Variables

$? - the exit status of the last program to exit
$$ - The shell's pid

Predicates

The convention among UNIX programmers is that programs should return a 0 upon success. Typically a non-0 value indicates that the program couldn't do what was requested. Some (but not all) programmers return a negative number upon an error, such as file not found, and a positive number upon some other terminal condition, such as the user choosing to abort the request.
As a result, the shell notion of true and false is a bit backward from what most of us might expect. 0 is considered to be true and non-0 is considered to be false.
We can use the test command to evaluate an expression. The following example will print 0 if gkesden is the user and 1 otherwise. It illustrates not only test but also the use of the status variable, $?, which is automatically set to the exit value of the most recently exited program. The notation $var, such as $LOGNAME, evaluates the variable.
  test "$LOGNAME" = gkesden
  echo $?
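
Since $? is overwritten by every command that runs, it is usually worth saving right away. A minimal sketch, using made-up strings and variable names:

```shell
#!/bin/sh

test "apple" = "apple"
same=$?              # 0, which the shell treats as true

test "apple" = "orange"
different=$?         # 1, which the shell treats as false

printf "%d %d\n" $same $different    # prints: 0 1
```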
  
Shell scripting languages are typeless. By default everything is interpreted as a string. So, when using variables, we need to specify how we want them to be interpreted. So, the operators we use vary with how we want the data interpreted.

Operators for strings, ints, and files

string
  x = y       comparison: equal
  x != y      comparison: not equal
  x           not null/not 0 length
  -n x        not null/not 0 length
  -z x        is null/0 length

ints
  x -eq y     equal
  x -ge y     greater or equal
  x -le y     lesser or equal
  x -gt y     strictly greater
  x -lt y     strictly lesser
  x -ne y     not equal

file
  -f x        is a regular file
  -d x        is a directory
  -r x        is readable by this script
  -w x        is writeable by this script
  -x x        is executable by this script

logical
  x -a y      logical and, like && in C (0 is true, though)
  x -o y      logical or, like || in C (0 is true, though)
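
The choice of operator really does change the answer. In the sketch below the same variable is compared first as a string and then as an integer; the value 007 is chosen only to make the difference visible.

```shell
#!/bin/sh

n=007

test "$n" = 7
string_status=$?     # 1: as strings, "007" and "7" differ

test "$n" -eq 7
int_status=$?        # 0: as integers, they are equal

test -d / -a -r /
both_status=$?       # 0: / is a directory and it is readable

printf "%d %d %d\n" $string_status $int_status $both_status    # prints: 1 0 0
```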

[ Making the Common Case Convenient ]

We've looked at expressions evaluated as below:
  test -f somefile.txt
  
Although this form is the canonical technique for evaluating an expression, the shorthand, as shown below, is universally supported -- and much more reasonable to read:
  [ -f somefile.txt ]
  
You can think of the [] operator as a form of the test command. But, one very important note -- there must be a space to the inside of each of the brackets. This is easy to forget or mistype. But, it is quite critical.

Making Decisions

Like most programming languages, shell script supports the if statement, with or without an else. The general form is below:
  if command
  then
      command
      command
      ...
      command
  else
      command
      command
      ...
      command
  fi
  
  if command
  then
      command
      command
      ...
      command
  fi
  
The command used as the predicate can be any program or expression. The results are evaluated with a 0 return being true and a non-0 return being false.
If ever there is the need for an empty if-block, the null command, a :, can be used in place of a command to keep the syntax legal.
The following is a nice, quick example of an if-else:
  if [ "$LOGNAME" = "gkesden" ]
  then
    printf "%s is logged in\n" "$LOGNAME"
  else
    printf "Intruder! Intruder!\n"
  fi
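
And here is a sketch of the null command in use, where only the else branch has anything to do. The function check_dir is a hypothetical helper written just for this example.

```shell
#!/bin/sh

# check_dir is a hypothetical helper: it complains only about non-directories.
check_dir() {
  if [ -d "$1" ]
  then
    :      # nothing to do, but the then-block may not be empty
  else
    printf "%s is not a directory\n" "$1"
  fi
}

check_dir /
check_dir /no/such/directory
```

The first call prints nothing; the second prints its complaint, assuming no such directory exists on your system.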
  

The elif construct

Shell scripting also has another construct that is very helpful in reducing deep nesting. It is unfamiliar to those of us who come from languages like C and Perl. It is the elif, the "else if". This probably made its way into shell scripting because it drastically reduces the nesting that would otherwise result from the many special cases that real-world situations present -- without functions to hide complexity (shell does have functions, but without named parameters -- arguments arrive as positionals -- and traditionalists tend to use them sparingly).
  if command
  then
    command
    command
    ...
    command
  elif command
  then
    command
    command
    ...
    command
  elif command
  then
    command
    command
    ...
    command
  fi
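
As a concrete sketch of the form above, the following classifies an integer without any nesting. The function name classify is invented for the example, and a final else is added to catch whatever the earlier tests did not.

```shell
#!/bin/sh

# classify is a hypothetical helper: it names the sign of its integer argument.
classify() {
  if [ "$1" -lt 0 ]
  then
    printf "negative\n"
  elif [ "$1" -eq 0 ]
  then
    printf "zero\n"
  else
    printf "positive\n"
  fi
}

classify -5     # prints: negative
classify 0      # prints: zero
classify 42     # prints: positive
```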
  

The switch statement

Much like C, C++, or Java, shell has a case/switch statement. The form is as follows:

  case var
  in
  pat) command
              command
              ...
              command
              ;; # Two ;;'s serve as the break
  pat) command
              command
              ...
              command
              ;; # Two ;;'s serve as the break
  pat) command
              command
              ...
              command
              ;; # Two ;;'s serve as the break
  esac

Here's a quick example:

   #!/bin/sh

   echo "$1"

   case "$1"
   in
     "+") ans=`expr "$2" + "$3"`
          printf "%d %s %d = %d\n" "$2" "$1" "$3" "$ans"
         ;;
     "-") ans=`expr "$2" - "$3"`
          printf "%d %s %d = %d\n" "$2" "$1" "$3" "$ans"
         ;;
     # Quoted, the * is a literal star rather than a match-anything pattern.
     # expr needs it escaped, and as its own argument.
     "*") ans=`expr "$2" \* "$3"`
          printf "%d %s %d = %d\n" "$2" "$1" "$3" "$ans"
         ;;
     "/") ans=`expr "$2" / "$3"`
          printf "%d %s %d = %d\n" "$2" "$1" "$3" "$ans"
         ;;

     # Notice this: the default case is a simple *
     *) printf "Don't know how to do that.\n"
         ;;
   esac
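
The patterns in a case are the same wildcard patterns used for file names, and several can share one branch if separated by |. A minimal sketch, in which the function kind_of and the file names are made up for illustration:

```shell
#!/bin/sh

# kind_of is a hypothetical helper: it guesses a file's type from its name.
kind_of() {
  case "$1"
  in
    *.c|*.h) printf "C source\n"
             ;;
    *.sh)    printf "shell script\n"
             ;;
    *)       printf "unknown\n"
             ;;
  esac
}

kind_of main.c       # prints: C source
kind_of backup.sh    # prints: shell script
kind_of notes.txt    # prints: unknown
```

The first matching pattern wins, so the catch-all * belongs last.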

The for Loop

The for loop provides a tool for processing a list of input. The input to the for loop is a list of values. Each trip through the loop, it extracts one value into a variable and then enters the body of the loop. The loop stops when the extract fails because there are no more values in the list.
Let's consider the following example, which prints each of the command line arguments, one at a time. We'll extract them from "$@" into $var:
  for var in "$@"
  do
    printf "%s\n" "$var"
  done
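
The list doesn't have to come from the command line; it can be written right into the loop. This sketch combines for with expr to add up a few arbitrary numbers.

```shell
#!/bin/sh

total=0
for n in 1 2 3 4
do
  total=`expr $total + $n`
done

printf "The total is %d\n" $total    # prints: The total is 10
```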
  
Much like C or Java, shell has a break command, also. As you might guess, it can be used to break out of a loop. Consider this example which stops printing command line arguments, when it gets to one whose value is "quit":
  for var in "$@"
  do
    if [ "$var" = "quit" ]
    then
      break
    fi
    printf "%s\n" "$var"
  done
  
Similarly, shell has a continue that works just like it does in C or Java. This one can be used to censor me!
  for var in "$@"
  do
    if [ "$var" = "shit" ]
    then
      continue
    elif [ "$var" = "fuck" ]
    then
      continue
    elif [ "$var" = "damn" ]
    then
      continue
    fi
    if [ "$var" = "quit" ]
    then
      break
    fi
    printf "%s\n" "$var"
  done
  

The while and until Loops

Shell has a while loop similar to that seen in C or Java. It continues until the predicate is false. And, like the other loops within shell, break and continue can be used. Here's an example of a simple while loop:
  # This lists the files in a directory in alphabetical order
  # It continues until the read fails because it has reached the end of input

  ls | sort |
  while read file
  do
    echo $file
  done
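
A while loop is also the usual companion to shift: keep looking at $1 and shifting until no arguments are left. In this sketch the twelve arguments are installed with set -- purely so that the example can run on its own.

```shell
#!/bin/sh

set -- a b c d e f g h i j k l     # 12 arguments, more than $1..$9 can name

count=0
while [ $# -gt 0 ]
do
  count=`expr $count + 1`
  last=$1          # always look at $1 ...
  shift            # ... then slide the next argument into its place
done

printf "saw %d arguments, the last was %s\n" $count "$last"
# prints: saw 12 arguments, the last was l
```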
  
There is a similar loop, the until loop, that continues until the condition is successful -- in other words, while the command fails. This will pound the user for input until it gets it:
  printf "ANSWER ME! "
  # read takes the name of the variable, without the $
  until read answer
  do
    printf "ANSWER ME! "
  done
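
For an until loop that needs no keyboard input, here is a simple counting sketch. It runs the body for as long as the test fails, that is, until i is greater than 3. The variable names are arbitrary.

```shell
#!/bin/sh

i=1
seen=""
until [ $i -gt 3 ]
do
  seen="$seen$i"
  i=`expr $i + 1`
done

printf "%s\n" "$seen"    # prints: 123
```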