POSIX Pitfalls

    Plan 9 argues that given a few carefully implemented abstractions it is possible to produce a small operating system that provides support for the largest systems on a variety of architectures and networks.

    ‘The Use of Name Spaces in Plan 9’ by Rob Pike et al.

    Prologue

    Since the moment I decided to take software development more seriously, I have been absolutely enamored with the Shell 1 (the POSIX shell, to be more specific). The syntax is questionable at times, and the available resources outside of the POSIX specification itself are absolutely piss-poor, because the average *NIX user fails to understand the difference between /bin/sh and Bash 2. What really drew me to the Shell was the powerful idea of composability: being able to combine simple tools into a much more powerful one in only a handful of lines. I talked more about this in my previous post.

    It didn’t take long for me to find issues with my beloved /bin/sh, however. Like it or not, the modern shells we all use, such as Bash and Zsh, are based on a language that is approaching half a century in age. It got some things right, like the idea that you can use loops and conditional statements in a pipeline, but it also got a lot of things wrong, and these are things that we can improve on. The most obvious deficiency in POSIX shells is the absolutely abhorrent handling of whitespace.

    Quite a few alternatives to the POSIX shell have been made over the years, yet I still find this to be a shockingly underdeveloped area. If you’re reading this, I implore you to attempt to design your own shell, no matter how simple. If you know how to make one, you can experiment with new ideas! If you don’t, it’s a really great learning experience, even if all your shell can do is spawn a process.

    Alternatives to POSIX

    There are a few alternative shells that have managed to garner a respectable userbase. Fish, PowerShell, Nushell, and Elvish, to name just the ones I can think of off the top of my head, have all built a following while giving the finger to POSIX. I do believe that ditching POSIX is a necessity to create a half-decent modern shell. I used Fish for close to a year, and it is probably my favorite of the bunch; it tries to do its own thing with its own ideas, but it still remains highly familiar to those coming from POSIX.

    I’m not entirely happy with Fish though. In my opinion, Fish and most of the other modern shells fall into the classic trap of over-engineering: they try to do too much and lose sight of what the shell is fundamentally about. The philosophy of the shell is to manipulate streams by composing small, simple tools, yet Fish bundles in a whole host of builtins that add nothing while replacing functionality that existing tools already provide. You can read from /dev/urandom to generate random numbers, yet Fish added a random builtin. You can do arbitrary-precision arithmetic with the Bc and Dc calculators, yet Fish added the math builtin. The same goes for the string builtin.

    I do appreciate Fish though, because despite losing sight of what a shell should be (in my opinion), its developers still tried something new, and I respect that. The same goes for all the other shells out there. They also definitely get some things right. Using Fish as an example once again, they decided to just remove the ‘?’ wildcard from globs entirely, a move I completely support.
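
    To illustrate why ‘?’ can be troublesome: in POSIX sh an unquoted ? is a glob that matches any single character, so a word such as a URL query string can change depending on which files happen to exist in the current directory. show_qmark_glob is a hypothetical name; the sketch runs in a throwaway directory and assumes a mktemp that supports -d (GNU and BSD mktemp both do).

```shell
#!/bin/sh
# show_qmark_glob: demonstrate that an unquoted '?' is a glob matching any
# single character. Runs in a throwaway directory; the ( ... ) body keeps
# the cd and the cleanup trap local.
show_qmark_glob() (
    dir=$(mktemp -d) || exit 1
    trap 'rm -rf "$dir"' EXIT
    cd "$dir" || exit 1
    echo page?id=1       # nothing matches yet, so the word stays as typed
    touch page1id=1      # create a file the pattern happens to match
    echo page?id=1       # now the shell substitutes the filename
)

show_qmark_glob          # prints page?id=1, then page1id=1
```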

    All in all, while I don’t think any of these ‘mainstream’ alternatives got it right, they are a great source of inspiration as to what I should or should not do if I make my own shell.

    Introducing Andy

    Andy is a shell that I’ve been meaning to make for around two years now, but it never materialized due to a lack of dedicated focus and of a thought-out vision and design. Part of why I’m writing this, in fact, is to help me develop a proper vision for what I want Andy to be; I find that discussing and writing about things helps a lot with this.

    I want the philosophy of Andy to reflect that of the original Bourne Shell: the fewer features the better. ‘Less is more’, as Ludwig Mies van der Rohe famously said. That being said, not all features should be thrown to the wayside; if a feature is simple to understand, simple to implement, and solves a real problem, there is no harm in adding it.

    Take process substitution, for example. To properly compare the outputs of two processes in POSIX shell, we need to go through this whole rigmarole:

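    #!/bin/sh
    
    # Save cmd2's output to a temporary file, deleted when the script exits
    tmp=$(mktemp) || exit 1
    trap 'rm -f "$tmp"' EXIT
    cmd2 >"$tmp"
    # diff reads cmd1's output from stdin ("-") and compares it to the file
    cmd1 | diff - "$tmp"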

    Now compare that to the Bash solution using process substitution:

    #!/bin/bash
    
    diff <(cmd1) <(cmd2)

    The Bash solution is more readable and far easier to understand at a glance. It’s also functionally better: it doesn’t require you to manually clean up a temporary file (cleanup which might not happen if your script receives certain signals). It’s more efficient too; instead of waiting for cmd2 to write all its output to a temporary file for us to read, cmd1 and cmd2 run concurrently. This can be solved using named pipes, but then we’re adding even more complexity to our script.
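
    For comparison, here is roughly what the named-pipe approach looks like in POSIX sh. It is a minimal sketch rather than a hardened implementation: diff_via_fifo is a hypothetical name, it only accepts argument-less commands (such as shell functions) to keep the example short, and it assumes a mktemp that supports -d (GNU and BSD mktemp both do). Note that it still needs a temporary directory and a cleanup trap, so the bookkeeping doesn’t go away.

```shell
#!/bin/sh
# diff_via_fifo CMD1 CMD2: diff the outputs of two argument-less commands,
# running them concurrently through a named pipe rather than a temp file.
# The body is a subshell ( ... ) so the trap and exit stay local to it.
diff_via_fifo() (
    dir=$(mktemp -d) || exit 2
    trap 'rm -rf "$dir"' EXIT      # still need cleanup: remove the FIFO
    mkfifo "$dir/fifo" || exit 2

    # Opening the FIFO for writing blocks until diff opens it for
    # reading; from then on both commands run at the same time.
    "$2" >"$dir/fifo" &
    "$1" | diff - "$dir/fifo"
    status=$?                      # diff: 0 if identical, 1 if different
    wait                           # reap the background writer
    exit "$status"
)

# Usage, with cmd1 and cmd2 as in the example above:
#   diff_via_fifo cmd1 cmd2
```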

    There are a few fundamental ‘problems’ I want to fix in Andy. The first is whitespace handling; safe POSIX shell scripts contain almost as many quotation marks (to avoid word splitting) as Lisp programs contain parentheses. Fixing this is an absolute must. Under no circumstances should strings expand into even more strings without the explicit consent of the user; it’s a recipe for disaster, and it’s the shell equivalent of the null pointer exception.
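
    A minimal sketch of the problem in POSIX sh: an unquoted parameter expansion is split on the characters in IFS (space, tab and newline by default), so a single filename can silently become several arguments. count_args is a hypothetical helper that just reports how many arguments it received.

```shell
#!/bin/sh
# count_args: print the number of arguments passed in.
count_args() {
    echo "$#"
}

file='holiday photo.jpeg'
count_args $file      # unquoted: field splitting produces 2 arguments
count_args "$file"    # quoted: the filename stays 1 argument
```

    The unquoted form also undergoes pathname expansion, so a value containing *, ? or [ can additionally be replaced by matching filenames.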

    The second major fix I want to make concerns datatypes. For this I took major inspiration from Plan 9’s Rc shell. While the fundamental datatype of the shell is the stream, which is well represented by the string, we are very often working with lists of items: lists of filenames, lists of regular expression matches, and so on. I want lists to be first-class citizens in Andy.
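
    To see why this matters: POSIX sh has no array type. The closest thing is the positional parameters ("$@"), which keep each item intact but exist only once per function call, so a script can juggle just one such list at a time in a given scope. A minimal sketch, with list_demo as a hypothetical name:

```shell
#!/bin/sh
# list_demo: use the positional parameters as the function's one list.
list_demo() {
    set -- 'a b.txt' 'c.txt'   # replace the list with two items
    set -- "$@" 'd e.txt'      # append a third item
    echo "$#"                  # number of items
    for f in "$@"; do          # "$@" expands to one word per item
        printf '[%s]\n' "$f"
    done
}

list_demo
# prints:
# 3
# [a b.txt]
# [c.txt]
# [d e.txt]
```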

    Outside of these major changes, there are other minor changes I want to make. I want to use a C-style syntax similar to (but even simpler than) that of Rc. The whole ‘if-then’ and ‘esac’ business is both overly verbose for a language that needs to work well in a REPL and just plain ugly. A friend of mine even suggested that the reason the Bourne Shell called them ‘case statements’ instead of ‘switch statements’ like every other language was that nobody would remember how to spell ‘hctiws’.

    I also want to allow functions to take named arguments, and to completely remove the need for newline-escaping, allowing for readable multiline pipelines.
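
    For context, this is what the escaping looks like today. In POSIX sh a newline is allowed directly after |, so a pipeline can already be split with the pipes at the ends of lines; putting each | at the start of its line instead, which many people find easier to read, requires escaping every newline with a backslash. The function names and the word-counting pipeline below are just illustrations.

```shell
#!/bin/sh
# Both functions run the same pipeline: split stdin into one word per
# line, sort the words, then count duplicates.

# Pipes at the start of each line: every newline must be escaped.
count_words_escaped() {
    tr -s ' ' '\n' \
        | sort \
        | uniq -c
}

# Pipes at the end of each line: POSIX allows a newline after '|',
# so no backslashes are needed.
count_words_trailing() {
    tr -s ' ' '\n' |
        sort |
        uniq -c
}

printf 'b a b\n' | count_words_escaped   # counts: 1 for a, 2 for b
```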

    In ‘The shell and its crappy handling of whitespace’, the author Mark Dominus offers an example piece of shell script to rename *.jpeg files to *.jpg. Take note of all the quoting that is required in his example in order to properly handle filenames with spaces, as well as the seemingly useless ‘do’ keyword:

    for i in *.jpeg; do
    	mv "$i" "$(suf "$i")".jpg  # three sets of quotes
    done
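
    The suf helper isn’t defined in this snippet; judging from the call, it presumably strips the .jpeg suffix. A self-contained POSIX version can use the ${i%.jpeg} parameter expansion instead, which removes the shortest matching suffix. jpeg_to_jpg is a hypothetical name for this sketch, and note that it still needs quotes around every expansion.

```shell
#!/bin/sh
# jpeg_to_jpg: rename every *.jpeg file in the current directory to *.jpg.
# ${i%.jpeg} stands in for the snippet's suf helper.
jpeg_to_jpg() {
    for i in *.jpeg; do
        [ -e "$i" ] || continue          # no match: the glob stays literal
        mv -- "$i" "${i%.jpeg}.jpg"      # quotes are still needed here
    done
}
```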

    Here is how I envision such a solution in Andy:

    for i in *.jpeg {
    	mv $i `{suf $i}.jpg
    }
    
    # maybe have an implicit iteration variable?
    
    for *.jpeg {
    	mv $_ `{suf $_}.jpg
    }

    Notice the complete lack of quotes in the Andy solution: without the misfeature of automatic word splitting, none are needed. The syntax is also minimal, fast to type, and visually out of the way. C-style braces work well here; they’re only one character each. We can also completely remove the ‘do’ keyword, and potentially even make the binding of an iteration variable optional, though I’m not sure about that yet.

    I’m currently in the process of actively developing Andy, and I will probably make another post on here soon detailing the current progress and features of the shell. I hope to soon be able to use Andy as my primary shell; both for scripting and interactive use.