Shell Scripting Notes - Stop bashing your head on a wall

Notes on shell script "gotchas" that keep coming up, in particular how to manipulate arguments using shell arrays or eval.

Because it's easier to write it down than search for the same thing all the time.

Status - July 2023 - Scribbles based on need and testing


Which Shell ?

I use the following UNIX flavours which have different "shells":

  • SGI IRIX - default shell is "Korn Shell" (AT&T ksh88). "/bin/sh" -> "/usr/bin/sh" -> "/sbin/sh". Also provided are: bsh (classic Bourne), csh (classic C shell) & tcsh (enhanced csh)
  • Ubuntu - default login shell is "Bash - Bourne again shell", but the default script shell is "/bin/sh" -> "/usr/bin/dash". The name "dash" can be read as a much faster "bash", as "Debian Almquist SHell", or both. Dash's speed comes with some features removed. Other shells are available but they need to be installed.
  • FreeBSD - default shell is "sh", which was written by Kenneth Almquist (so the same code base as "dash"). FreeBSD also comes with csh and tcsh. Other shells are available but they need to be installed/built.

As this is about "scripting" (not interactive use), the important thing is that all of these shell variants are based on the original Bourne Shell scripting language. Since the effort to standardize around POSIX, which adopted Bourne/Korn as its working basis, C Shell scripting is hardly used.

So when I refer to "scripting" & "shell programming" I mean scripting in a "Bourne Shell" like language (ie not C Shell and its variants).


Cryptic Things in Shell Programming

The following cryptic things are features of Bourne scripts:

  • ":" - the "no operation" built-in command, which is used to define infinite loops such as:
while :
do
        <commands>
done
  • "test", "[" and "[[" - there are three main ways of writing the condition test for the "if", "while" and "until" compound commands. Historically, the test expression is not part of the shell syntax; it is just a command run by the compound command, and can be "test" or "[" (which requires a closing "]"). Some shells such as Korn and Bash provide a built-in conditional expression "[[ <test> ]]". This treats its arguments differently (it does not perform word splitting or filename generation on them) and supports string pattern matching (which "test" & "[" do not). So using the conditional expression is not just a syntactic nicety. Where "test" and "[" are run as external commands, "[[ ]]" also has a performance benefit, as it avoids forking a new process to execute the command.
#
# the compound command
#
if <compound_list> then

[ elif <compound_list> then ]

[ else ]

fi

#
# where <compound_list> typically used like:
#
if test -x <something> ;then
#
# where: "which test" == "/usr/bin/test"
#
# or
#
if [ <valueA> = <valueB> ]; then
#
# where: "which [" == "/usr/bin/["
#
# do "man test" for details...
#
# or (Korn or Bash)
#
if [[ <valueA> = <valueB> ]]; then
#
# Generally <compound_list> is just a sequence of commands
#   which can be optionally grouped via brackets: "(" ")" with
#   commands separated by ";" or conditionally "&&" and "||"
#   or a command pipeline "|"
#   The "if", "while" and "until" action is determined by the
#     return value ($? parameter gives last return value)
#     (0 == success) and not the output of the command
  • shell "pattern" != "regex" - when you read "pattern" you typically think "regular expression". However, the shell "pattern" is much more constrained than a general regular expression. It must also match the entire string, and the pattern itself should not be quoted (quoting turns it into a literal string). Here is an example:
#
#
# 1. simple shell pattern example
#
$ if [[ "-Dfrog=jump" = -D* ]]; then echo "true"; else echo "false"; fi
true
#
# 2. incorrect regex equivalent
#
$ if [[ "-Dfrog=jump" = ^-D\(.*\)$ ]]; then echo "true"; else echo "false"; fi
false
#
# 3. If you want to use regex then grep...
#
$ if [[ `echo "-Dfrog=jump" | grep '^-D\(.*\)$'` ]]; then echo "true"; else echo "false"; fi
true
#
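
For shells without "[[ ]]" (dash, FreeBSD sh) the same whole-string pattern match can be done with "case", which is plain Bourne/POSIX. A minimal sketch, where "is_define" is a made-up helper name used only for illustration:

```shell
#!/bin/sh
# is_define ARG: return 0 if ARG looks like a -D<name>=<value> option.
# "is_define" is a hypothetical helper, not part of any shell.
is_define() {
    case "$1" in
        -D*=*) return 0 ;;   # shell pattern, must match the whole string
        *)     return 1 ;;
    esac
}

for arg in "-Dfrog=jump" "-v" "x-Dfrog=jump"
do
    if is_define "${arg}"; then
        printf '%s: match\n' "${arg}"
    else
        printf '%s: no match\n' "${arg}"
    fi
done
```

Only the first argument matches: the third contains "-Dfrog=jump" but does not start with it, and a shell pattern is not a substring search.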

There is a well-known (in geek circles ;-) ) quote from Tom Duff: “Nobody really knows what the Bourne shell’s grammar is. Even examination of the source code is little help.”

In fact the Bourne scripting grammar is actually pretty simple at its core, with three basic things:

  • pipelines - unix pipes that take output of one program and pass it into the input of another
  • lists - sequences of commands which are executed unconditionally (separated by ";" or a new line) or conditionally through "&&" and "||"
  • commands have a result which is the return value (with 0 == success), while taking optional inputs and generating optional outputs
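
The three building blocks above can be sketched in a few lines. This is only an illustration using standard tools (printf, grep); the strings are made up:

```shell
#!/bin/sh
# 1. pipeline: the output of printf becomes the input of grep
count=$(printf 'frog\ntoad\nfrog\n' | grep -c 'frog')
echo "pipeline counted: ${count}"

# 2. lists: "&&" runs the next command only on success (return value 0),
#    "||" runs it only on failure (non-zero return value)
true  && echo "ran after success"
false || echo "ran after failure"

# 3. return value: grep -q prints nothing, only its return value matters
status=0
printf 'frog\n' | grep -q 'toad' || status=$?
echo "grep return value: ${status}"
```

The last line reports a non-zero value because "toad" is not found, even though grep produced no output at all.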

Saving and Manipulating Input Arguments

When you invoke a script the input arguments get passed in as an array of strings (just like a C program: "int main(int argc, char *argv[]) {....}"). One of the most common things that a script might need to do is to "tweak" the input arguments and then invoke another program.

This requires being able to save the input arguments and then pass them forward to the other program.

The scripting language allows you to do this using the $@ parameter. When put in double quotes ("$@") it passes the arguments on verbatim, that is, with the same argument separation and no further word splitting.

Example:

#
# 1. Invoke script with "quoted" arguments
#
$ ./args1.sh -v -a -S "debug=info do /home/here ${HOSTTYPE}" ./args2.sh
#
# 2. The script invokes "./args2.sh" with different arguments based on flags
#   (the flags themselves are passed on too, as the results further down show)
#   -v == verbatim : expect $# == 5
#   -s == split quoted inputs: expect $# == 8
#   -r == reverse args: expect $# == 5 but args reversed
#   -m == split and mirror: expect $# == 8 and in mirror order
#   -a == use array or -e use eval
#
#

So how is this achieved ?

The -v (verbatim) & -s (split) cases can be implemented trivially using the quoted "$@" and unquoted $@ parameters.
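
The difference between the quoted and unquoted forms can be seen with a small sketch. The helper names "count_args" and "forward" are made up for this example, and the host type is written as a literal string:

```shell
#!/bin/sh
# count_args: hypothetical helper that reports how many arguments it got
count_args() {
    echo "$#"
}

# forward: hypothetical helper that passes its own arguments on in two ways
forward() {
    echo "verbatim: $(count_args "$@")"   # quoted: argument boundaries kept
    echo "split:    $(count_args $@)"     # unquoted: re-split on whitespace
}

forward -S "debug=info do /home/here iris4d" ./args2.sh
# prints:
#   verbatim: 3
#   split:    6
```

Note that the unquoted form also performs filename generation, so an argument containing "*" would be expanded against the current directory.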

The -r (reverse) and -m (mirror) cases require saving and manipulating the input parameters. This is easier or harder depending on the shell. Both Korn and Bash support arrays, so it is possible to store the inputs in an array and then play them back in a different order. For Almquist-based shells (FreeBSD sh and Ubuntu dash) the problem is messier, and the general solution needs to use the shell "eval" to build shell commands that save and retrieve the values.

---
--- 1. The argument manipulation sh script: verbatim/split/reverse/mirror
---    Using either array or eval
---
# cat args1.sh
#!/bin/sh

echo "$0>> # args: '$#'."

if [ $# -gt 0 ]; then

        eval cmd='$'{$#}
        echo "${cmd}"

        usearray=no
        useeval=no

        case "$1" in
                -v)     ${cmd} "${@}" ;;
                -s)     ${cmd} ${*} ;;
                -r)     if [ "$2" = -a ]; then
                                usearray=yes
                        else
                                useeval=yes
                        fi
                        ;;
                -m)     if [ "$2" = -a ]; then
                                usearray=yes
                        else
                                useeval=yes
                        fi
                        ;;
        esac

        if [ "${usearray}" = yes ]; then
                set -A savargs
                set -A revargs
                i=0
                j=0
                if [ "$1" = "-r" ]; then 
                        for a in "${@}"
                        do
                                savargs[${i}]="${a}"
                                let "i=i + 1"
                        done
                else
                        for a in ${*}
                        do
                                savargs[${i}]="${a}"
                                let "i=i + 1"
                        done
                fi

                let "j=i - 1"
                k=0

                while [ j -ge 0 ]
                do
                        revargs[k]="${savargs[${j}]}"
                        let "k=k + 1"
                        let "j=j - 1"
                done

                ${cmd} "${revargs[@]}"

        elif [ "${useeval}" = yes ]; then
                i=0
                j=0
                if [ "$1" = "-r" ]; then 
                        for a in "${@}"
                        do
                                eval "savargs"${i}"=\"${a}\""
#                               savargs[${i}]="${a}"
                                let "i=i + 1"
                        done
                else
                        for a in ${*}
                        do
                                eval "savargs"${i}"=\"${a}\""
#                               savargs[${i}]="${a}"
                                let "i=i + 1"
                        done
                fi

                let "j=i - 1"
                k=0

                while [ j -ge 0 ]
                do
                        eval "revargs"${k}"=\"\${savargs"${j}"}\""
#                       revargs[k]="${savargs[${j}]}"
                        let "k=k + 1"
                        let "j=j - 1"
                done

                j=0
                cmdstr="\"\${revargs"${j}"}\""
                let "j=j + 1"
                while [ j -lt k ]
                do
                        cmdstr="${cmdstr} \"\${revargs"${j}"}\""
                        let "j=j + 1"
                done
                eval "${cmd} ${cmdstr}"
        fi 
fi
---
--- 2. The called sh script, which just shows what it got as arguments:
---
# cat args2.sh
#!/bin/sh

echo "$0>> # args: '$#'."

savargs="$*"
i=1

while [ $# -gt 0 ] 
do
        echo "arg[${i}]: '$1'";
        let "i=i + 1"
        shift
done

echo "savargs: '${savargs}'."

echo ${savargs}

The example result for mirror using shell array:

# ./args1.sh -m -a -S "debug=info do /home/here ${HOSTTYPE}" ./args2.sh
./args1.sh>> # args: '5'.
./args2.sh
./args2.sh>> # args: '8'.
arg[1]: './args2.sh'
arg[2]: 'iris4d'
arg[3]: '/home/here'
arg[4]: 'do'
arg[5]: 'debug=info'
arg[6]: '-S'
arg[7]: '-a'
arg[8]: '-m'
savargs: './args2.sh iris4d /home/here do debug=info -S -a -m'.
./args2.sh iris4d /home/here do debug=info -S -a -m

And the result for reverse using "eval", with visible execution (sh -x) so you can see the "eval" operations. The "eval" based manipulations required lots of careful "quoting" to work:

# sh -x ./args1.sh -r -e -S "debug=info do /home/here ${HOSTTYPE}" ./args2.sh
+ echo ./args1.sh>> # args: '5'.
./args1.sh>> # args: '5'.
+ [ 5 -gt 0 ]
+ eval cmd=${5}
+ cmd=./args2.sh
+ echo ./args2.sh
./args2.sh
+ usearray=no
+ useeval=no
+ [ -e = -a ]
+ useeval=yes
+ [ no = yes ]
+ [ yes = yes ]
+ i=0
+ j=0
+ [ -r = -r ]
+ eval savargs0="-r"
+ savargs0=-r
+ let i=i + 1
+ eval savargs1="-e"
+ savargs1=-e
+ let i=i + 1
+ eval savargs2="-S"
+ savargs2=-S
+ let i=i + 1
+ eval savargs3="debug=info do /home/here iris4d"
+ savargs3=debug=info do /home/here iris4d
+ let i=i + 1
+ eval savargs4="./args2.sh"
+ savargs4=./args2.sh
+ let i=i + 1
+ let j=i - 1
+ k=0
+ [ j -ge 0 ]
+ eval revargs0="${savargs4}"
+ revargs0=./args2.sh
+ let k=k + 1
+ let j=j - 1
+ [ j -ge 0 ]
+ eval revargs1="${savargs3}"
+ revargs1=debug=info do /home/here iris4d
+ let k=k + 1
+ let j=j - 1
+ [ j -ge 0 ]
+ eval revargs2="${savargs2}"
+ revargs2=-S
+ let k=k + 1
+ let j=j - 1
+ [ j -ge 0 ]
+ eval revargs3="${savargs1}"
+ revargs3=-e
+ let k=k + 1
+ let j=j - 1
+ [ j -ge 0 ]
+ eval revargs4="${savargs0}"
+ revargs4=-r
+ let k=k + 1
+ let j=j - 1
+ [ j -ge 0 ]
+ j=0
+ cmdstr="${revargs0}"
+ let j=j + 1
+ [ j -lt k ]
+ cmdstr="${revargs0}" "${revargs1}"
+ let j=j + 1
+ [ j -lt k ]
+ cmdstr="${revargs0}" "${revargs1}" "${revargs2}"
+ let j=j + 1
+ [ j -lt k ]
+ cmdstr="${revargs0}" "${revargs1}" "${revargs2}" "${revargs3}"
+ let j=j + 1
+ [ j -lt k ]
+ cmdstr="${revargs0}" "${revargs1}" "${revargs2}" "${revargs3}" "${revargs4}"
+ let j=j + 1
+ [ j -lt k ]
+ eval ./args2.sh "${revargs0}" "${revargs1}" "${revargs2}" "${revargs3}" "${revargs4}"
+ ./args2.sh ./args2.sh debug=info do /home/here iris4d -S -e -r
./args2.sh>> # args: '5'.
arg[1]: './args2.sh'
arg[2]: 'debug=info do /home/here iris4d'
arg[3]: '-S'
arg[4]: '-e'
arg[5]: '-r'
savargs: './args2.sh debug=info do /home/here iris4d -S -e -r'.
./args2.sh debug=info do /home/here iris4d -S -e -r

NOTE #1: While arrays are not supported in all shells, they make the argument code easy compared to using "eval", which requires very careful quoting to avoid the shell trying to execute the results or the eval failing.

NOTE #2: In both cases the ${HOSTTYPE} environment variable has been evaluated to "iris4d", as the examples were run on an SGI IRIX machine.
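
NOTE #3: The listing above was run with ksh88 as "/bin/sh" and uses "let", which dash does not provide. For a strictly POSIX shell, a shorter "eval" variant is possible that refers to the positional parameters directly as "${N}" instead of copying them into numbered variables. This is only a sketch under that assumption; "run_reversed" and "show_args" are made-up names:

```shell
#!/bin/sh
# show_args: hypothetical stand-in for ./args2.sh, prints one argument per line
show_args() {
    for a in "$@"
    do
        printf '[%s]\n' "${a}"
    done
}

# run_reversed CMD ARGS...: hypothetical helper that invokes CMD with ARGS
# in reverse order. Note: cmd, i and cmdstr are global (no "local" in POSIX).
run_reversed() {
    cmd=$1
    shift
    i=$#
    cmdstr=
    while [ "${i}" -gt 0 ]
    do
        # appends the literal text "${N}" (with quotes); expanded later by eval
        cmdstr="${cmdstr} \"\${${i}}\""
        i=$((i - 1))
    done
    # eval sees e.g.: "${cmd}" "${3}" "${2}" "${1}"
    eval "\"\${cmd}\"${cmdstr}"
}

run_reversed show_args -S "debug=info do /home/here" ./args2.sh
# prints:
#   [./args2.sh]
#   [debug=info do /home/here]
#   [-S]
```

Because eval only ever sees references such as "${3}" and never the argument text itself, the argument values are not re-parsed by the shell, which sidesteps most of the quoting trouble.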


Redirection Remembered

In priority order...

  • stdout & stderr to same file - this is the one I keep having to look up all the time, and the only redirection I use regularly (beyond the super simple). For all Bourne style shells: "dothis > out-err.txt 2>&1". Here '>' redirects stdout (file handle 1), '2>' redirects stderr (file handle 2) and '&1' is a reference to "file handle 1" == stdout.
  • for C Shell variants - "dothis >& out-err.txt", noting that C Shell is very constrained in its redirection function
  • Others ... not so important
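
One related gotcha for the Bourne style form is that redirections are processed left to right, so the order matters. A small sketch, where "both" is a made-up helper and the temporary file name is an assumption:

```shell
#!/bin/sh
# both: hypothetical helper that writes one line to stdout and one to stderr
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

tmp="${TMPDIR:-/tmp}/redir-demo.$$"

# Right order: stdout goes to the file first, then stderr is pointed
# at wherever stdout now goes (the file). Both lines are captured.
both > "${tmp}" 2>&1
echo "captured lines: $(wc -l < "${tmp}" | tr -d ' ')"

# Wrong order: stderr is pointed at the current stdout (the terminal)
# before stdout is moved to the file. Only the stdout line is captured.
both 2>&1 > "${tmp}"
echo "captured lines: $(wc -l < "${tmp}" | tr -d ' ')"

rm -f "${tmp}"
```

The first case captures two lines, the second only one, with "to stderr" still appearing on the terminal.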

Shell Scripting Language Syntax

An EBNF or syntax diagram would be helpful ... here is a grammar based on the Bash parse.y YACC file:

---
--- Backus-Naur Form (BNF) / YACC for Bourne & related scripting
---  languages based on Bash YACC definition
---
--- NOTE: Removed redirection for simplification
---
--- Shell Script Grammar / Lexical Elements:
---
--- WORD == Any Identifier | Built In Command |
---          Parameter
---     So "continue", "break" etc are treated as lexical elements
---      and lexical analysis is context sensitive (hence the
---      comment about "nobody really knows Bourne Shell grammar")
---
--- ASSIGNMENT_WORD == Like: WORD=
---
--- By John Hartley
---

inputunit:	simple_list simple_list_terminator

word_list:	WORD
	|	word_list WORD

simple_command_element: WORD
	|	ASSIGNMENT_WORD

simple_command:	simple_command_element
	|	simple_command simple_command_element

command:	simple_command
	|	shell_command
	|	function_def
	|	coproc

shell_command:	for_command
	|	case_command
 	|	'while' compound_list 'do' compound_list 'done'
	|	'until' compound_list 'do' compound_list 'done'
	|	select_command
	|	if_command
	|	subshell
	|	group_command
	|	arith_command
	|	cond_command
	|	arith_for_command

for_command:	'for' WORD newline_list 'do' compound_list 'done'
	|	'for' WORD newline_list '{' compound_list '}'
	|	'for' WORD ';' newline_list 'do' compound_list 'done'
	|	'for' WORD ';' newline_list '{' compound_list '}'
	|	'for' WORD newline_list 'in' word_list list_terminator newline_list 'do' compound_list 'done'
	|	'for' WORD newline_list 'in' word_list list_terminator newline_list '{' compound_list '}'
	|	'for' WORD newline_list 'in' list_terminator newline_list 'do' compound_list 'done'
	|	'for' WORD newline_list 'in' list_terminator newline_list '{' compound_list '}'

arith_for_command:	'for' ARITH_FOR_EXPRS list_terminator newline_list 'do' compound_list 'done'
	|		'for' ARITH_FOR_EXPRS list_terminator newline_list '{' compound_list '}'
	|		'for' ARITH_FOR_EXPRS 'do' compound_list 'done'
	|		'for' ARITH_FOR_EXPRS '{' compound_list '}'

select_command:	'select' WORD newline_list 'do' list 'done'
	|	'select' WORD newline_list '{' list '}'
	|	'select' WORD ';' newline_list 'do' list 'done'
	|	'select' WORD ';' newline_list '{' list '}'
	|	'select' WORD newline_list 'in' word_list list_terminator newline_list 'do' list 'done'
	|	'select' WORD newline_list 'in' word_list list_terminator newline_list '{' list '}'
	|	'select' WORD newline_list 'in' list_terminator newline_list 'do' compound_list 'done'
	|	'select' WORD newline_list 'in' list_terminator newline_list '{' compound_list '}'

case_command:	'case' WORD newline_list 'in' newline_list 'esac'
	|	'case' WORD newline_list 'in' case_clause_sequence newline_list 'esac'
	|	'case' WORD newline_list 'in' case_clause 'esac'

function_def:	WORD '(' ')' newline_list function_body
	|	'function' WORD '(' ')' newline_list function_body
	|	'function' WORD newline_list function_body

function_body:	shell_command

subshell:	'(' compound_list ')'

coproc:		'coproc' shell_command
	|	'coproc' WORD shell_command
	|	'coproc' simple_command

if_command:	'if' compound_list 'then' compound_list 'fi'
	|	'if' compound_list 'then' compound_list 'else' compound_list 'fi'
	|	'if' compound_list 'then' compound_list elif_clause 'fi'


group_command:	'{' compound_list '}'

arith_command:	ARITH_CMD

cond_command:	'[[' COND_CMD ']]' 

elif_clause:	'elif' compound_list 'then' compound_list
	|	'elif' compound_list 'then' compound_list 'else' compound_list
	|	'elif' compound_list 'then' compound_list elif_clause

case_clause:	pattern_list
	|	case_clause_sequence pattern_list

pattern_list:	newline_list pattern ')' compound_list
	|	newline_list pattern ')' newline_list
	|	newline_list '(' pattern ')' compound_list
	|	newline_list '(' pattern ')' newline_list

case_clause_sequence:  pattern_list ';;'
	|	case_clause_sequence pattern_list ';;'
	|	pattern_list ';&'
	|	case_clause_sequence pattern_list ';&'
	|	pattern_list ';;&'
	|	case_clause_sequence pattern_list ';;&'

pattern:	WORD
	|	pattern '|' WORD

/* A list allows leading or trailing newlines and
   newlines as operators (equivalent to semicolons).
   It must end with a newline or semicolon.
   Lists are used within commands such as if, for, while.  */

list:		newline_list list0

compound_list:	list
	|	newline_list list1

list0:  	list1 '\n' newline_list
	|	list1 '&' newline_list
	|	list1 ';' newline_list


list1:		list1 '&&' newline_list list1
	|	list1 '||' newline_list list1
	|	list1 '&' newline_list list1
	|	list1 ';' newline_list list1
	|	list1 '\n' newline_list list1
	|	pipeline_command

simple_list_terminator:	'\n'
	|	yacc_EOF

list_terminator:'\n'
	|	';'
	|	yacc_EOF

newline_list:
	|	newline_list '\n'

/* A simple_list is a list that contains no significant newlines
   and no leading or trailing newlines.  Newlines are allowed
   only following operators, where they are not significant.

   This is what an inputunit consists of.  */

simple_list:	simple_list1
	|	simple_list1 '&'
	|	simple_list1 ';'

simple_list1:	simple_list1 '&&' newline_list simple_list1
	|	simple_list1 '||' newline_list simple_list1
	|	simple_list1 '&' simple_list1
	|	simple_list1 ';' simple_list1
	|	pipeline_command

pipeline_command: pipeline
	|	'!' pipeline_command
	|	timespec pipeline_command
	|	timespec list_terminator
	|	'!' list_terminator

pipeline:	pipeline '|' newline_list pipeline
	|	pipeline '|&' newline_list pipeline
	|	command

timespec:	'time'
	|	'time' TIMEOPT
	|	'time' TIMEOPT TIMEIGN
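
A couple of the less obvious productions can be tried out directly. The sketch below exercises the "for WORD newline_list 'do' ..." form (no "in", so it loops over the positional parameters) and the difference between group_command and subshell. The function name "show" is made up for illustration:

```shell
#!/bin/sh
# for_command without "in": iterates over the positional parameters
show() {
    for a
    do
        printf 'arg: %s\n' "${a}"
    done
}
show one "two words"

# group_command '{' ... '}' runs in the current shell,
# so the assignment is still visible afterwards
x=outer
{ x=group; }
echo "after group:    ${x}"

# subshell '(' ... ')' runs in a child environment,
# so the assignment is lost when it finishes
x=outer
( x=subshell )
echo "after subshell: ${x}"
```

The first echo reports "group" and the second "outer", which is worth remembering when picking between the two kinds of brackets.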


References & Links:

Korn Shell - the classic and original web site for Korn'ers, by its author. As Korn Shell was originally developed at AT&T it was proprietary code, and a number of Open Source variations were created, with pdksh (public domain ksh) being the most popular. The AT&T code has since been made available via "AT&T Software Toolkit".

AST - AT&T Software Toolkit - the AT&T code, which is now the basis for the FreeBSD and Ubuntu Korn Shell rather than pdksh.

Bash - yep, bash brings together features from bsh, ksh, csh and more, so it's the richest of the shells

Dash - based on Almquist SHell, which is also basis for FreeBSD sh shell

"POSIX - Shell Command Language" - the POSIX language based on Bourne Shell

Bash Shell Grammar - BNF / Yacc like grammar for Bash Command Language

pdksh - Public Domain Korn Shell. A Korn clone from before the AT&T ksh code was available (now seems to be unsupported, with mksh taking up the reins)

MirBSD Korn Shell - (mksh) has picked up and continues to support pdksh. Thank you Mir team ;-)