Simplicity and Abstraction
I like my software simple and devoid of useless abstraction. I often find myself in positions where I’m searching for scissors to cut a sheet of paper and am instead greeted with a chainsaw. The urge to over-complicate and over-abstract your software can be strong. I often see people who preach simple software write programs that solve basic problems yet have 30 different command-line flags and require a 50-page PDF explaining their operation.
Why do I mention all of this? Well, as anyone who’s ever tried their hand at web development knows, websites are written in HTML. I wish I could say that’s a good thing, but as anyone who’s ever looked at HTML would know, that language is, to put it lightly, really not great. It’s extremely verbose and awkward to write and edit (angle brackets are not the easiest-to-reach keys on the keyboard).
So what’s the solution? The most obvious one to me is to create a nicer-to-read and -write language which I can easily transpile down to HTML. Ideally the CLI is very simple and works on the standard input and output like all good UNIX utilities. I should be able to transpile my site by simply running ‘cmd in.xyz >out.html’, where my input reflects the structure of my output with nicer, less-polluting syntax, such as in ‘div .cls { … }’.
The kind of tool I am describing here is what I imagine the ideal solution to be: a simple tool with a simple function. It takes an input language and produces an output language, with minimal abstraction. The input language should reflect the structure of HTML, because that’s exactly what we’re trying to output. It makes little sense to create a fundamentally different language when HTML already does a good job of defining a website’s structure, and sticking close to the language we are targeting just makes everyone’s life easier.
Most Software Sucks
So with my ideal solution being a simple language with a simple CLI that sticks close to the structure of HTML, let’s take a look at what other people have come up with:
# Markdown 4 Lyfe
Welcome to my website written in Hugo!
Oh no.
Most readers probably had the initial reaction of ‘What’s wrong with Markdown?’. To answer your question: everything. The issue I have with these highly prevalent Markdown-based replacements for HTML is that they ignore the fundamental fact that HTML and Markdown are not compatible with each other. HTML is designed around making websites (with all the added baggage of an XML-like syntax). It gives us things like semantic tags for describing input forms, navigation bars, figures, and more. With the addition of classes and IDs, we can even style two paragraphs on the same page in different ways. This is fundamentally not possible in Markdown. Even if we ignore the fact that Markdown is just poorly designed, it offers us basically none of what we need to make an even slightly complex static page, as it’s not meant for website design but simply to be a readable plain-text format you can use for documentation or your email or something.
How do you make your navigation bar in Markdown? Or style two paragraphs differently? You can’t. Some try to get around this by adding extensions to the Markdown language, but they never manage to cover all bases. Another problem I always come across when trying to use Markdown-to-HTML tools is code blocks. I always make sure to use tabs for indentation in my code blocks instead of spaces, so that I can vary the tab width (via the CSS tab-size property) based on the screen size of the reader. You obviously can’t do this with spaces, since the fundamental (and misguided) purpose of space indentation is to force everyone to view code with the same indentation, which sucks for users on mobile when you have nice large indents. To this day I have yet to find a Markdown-to-HTML converter that will let me keep tab indents without error-prone post-processing of the generated HTML.
Ok well… there are other ways of generating HTML; one rather popular option is Pug:
div
	p
		| Hello world! This is a
		| multiline paragraph.
	ul
		li foo
		li bar
		li baz
While Pug certainly nails the ‘maintain the same structure’ point, it fails in one very crucial area: it’s a JavaScript-only library, and so requires a whole JS setup simply to transpile your site to HTML. What a bummer. There is also a second issue, which is that it uses an indentation-sensitive syntax. Normally I am actually a fan of languages like this, such as Python, but in the case of a markup language like Pug it is terrible, as it makes macros and templating with tools such as m4 exceptionally difficult: a macro processor pastes its expansion in verbatim, so a multi-line expansion does not pick up the indentation of the place it was called from. Pug does offer templating facilities via JavaScript, but I really try to minimize the amount of JavaScript I need to write whenever possible.
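To make the indentation problem concrete, here is a small Python sketch. Plain string replacement stands in for m4 (a simplification: m4 does much more, but it likewise pastes an expansion in as raw text), and the `MENU` macro and both page templates are hypothetical. After expansion, the Pug-style `li` ends up at the same depth as `ul`, making it a sibling of the list rather than an item in it, while the brace-delimited version keeps its nesting however oddly it ends up indented.

```python
# Plain textual substitution stands in for m4 here (a simplification): like
# m4, it pastes the expansion verbatim, so only the expansion's first line
# gets the indentation that precedes the macro call.
def expand(template: str, name: str, body: str) -> str:
    return template.replace(name, body)


def depths(page: str) -> list[tuple[int, str]]:
    """Pair each non-empty line with its depth in leading tabs."""
    return [
        (len(line) - len(line.lstrip("\t")), line.strip())
        for line in page.splitlines()
        if line.strip()
    ]


# Hypothetical MENU macro bodies: a one-item list in each syntax.
pug_menu = "ul\n\tli foo"
gsp_menu = "ul {\n\tli {-foo}\n}"

pug_page = expand("nav\n\tMENU\n", "MENU", pug_menu)
gsp_page = expand("nav {\n\tMENU\n}\n", "MENU", gsp_menu)

# In the Pug output, `li` sits at the same depth as `ul`, so it becomes a
# sibling of the list instead of an item inside it.
print(depths(pug_page))  # [(0, 'nav'), (1, 'ul'), (1, 'li foo')]
# The GSP-style output is oddly indented, but the braces still nest `li` in `ul`.
print(gsp_page)
```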
My Solution
So with no existing tools fitting my entry criteria, I did the only reasonable next thing and made my own tool. It tries to stick to the format of HTML as closely as possible while offering an extremely easy-to-use transpiler. It also has no added bullshit like filters, templates, etc. If you want macros, use a macro processor like m4. I called it GSP because everyone knows that German Shorthaired Pointers are better than pugs. Here is a quick syntax example:
html lang="en" {
	body {
		p {- Hello, World!}
		ul {
			li {a href="#" #home {-Home Page}}
			li {a href="#" #about {-About Me}}
			li {a href="#" #links {-Fun Links}}
		}
	}
}
Here you can see almost all of GSP. The document follows the same structure as HTML, but thanks to the use of braces instead of opening- and closing tags, the syntax is far less verbose and easier to read. The language also provides shorthands for classes and IDs through CSS-selector syntax.
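For a feel of how little machinery such a syntax needs, below is a minimal Python sketch of a recursive-descent transpiler for a small GSP-like subset. It is not GSP’s actual implementation, and its grammar is an assumption inferred from the example above: a node is a name, then `key="value"` attributes and `.class` / `#id` shorthands, then a brace-delimited body holding either child nodes or text introduced by `-`. It ignores GSP features such as the `>` and `@` prefixes that appear elsewhere in this post, does not allow `}` inside text, and strips whitespace right after `-` (also an assumption). Names reuse the `node_name` pattern from GSP’s Tree-Sitter grammar.

```python
import html
import re
import sys

# Names follow the node_name pattern from GSP's Tree-Sitter grammar.
NAME = re.compile(r"[a-zA-Z:_][a-zA-Z0-9:_\-.]*")
# A quoted value directly after an attribute name (assumed form).
VALUE = re.compile(r'="([^"]*)"')


class MiniGsp:
    """Recursive-descent transpiler for a hypothetical GSP-like subset."""

    def __init__(self, src: str) -> None:
        self.src, self.pos = src, 0

    def error(self, what: str) -> ValueError:
        return ValueError(f"expected {what} at offset {self.pos}")

    def peek(self) -> str:
        """Skip whitespace; return the next character, or "" at end of input."""
        while self.pos < len(self.src) and self.src[self.pos].isspace():
            self.pos += 1
        return self.src[self.pos:self.pos + 1]

    def name(self) -> str:
        match = NAME.match(self.src, self.pos)
        if not match:
            raise self.error("a name")
        self.pos = match.end()
        return match.group()

    def node(self) -> str:
        self.peek()
        tag, classes, attrs = self.name(), [], []
        while (ch := self.peek()) != "{":
            if ch in (".", "#"):  # CSS-selector shorthands
                self.pos += 1
                value = self.name()
                if ch == ".":
                    classes.append(value)
                else:
                    attrs.append(("id", value))
            elif ch:  # key="value" attribute
                key = self.name()
                match = VALUE.match(self.src, self.pos)
                if not match:
                    raise self.error('="value"')
                self.pos = match.end()
                attrs.append((key, match.group(1)))
            else:
                raise self.error("'{'")
        self.pos += 1  # consume "{"
        body = self.body()
        if self.peek() != "}":
            raise self.error("'}'")
        self.pos += 1  # consume "}"
        if classes:
            attrs.insert(0, ("class", " ".join(classes)))
        rendered = "".join(f' {k}="{html.escape(v)}"' for k, v in attrs)
        return f"<{tag}{rendered}>{body}</{tag}>"

    def body(self) -> str:
        if self.peek() == "-":
            # Text runs to the next "}"; nested braces are not supported here.
            end = self.src.find("}", self.pos)
            if end == -1:
                raise self.error("'}'")
            text = self.src[self.pos + 1:end].lstrip()
            self.pos = end
            return html.escape(text, quote=False)
        children = []
        while self.peek() not in ("}", ""):
            children.append(self.node())
        return "".join(children)


def transpile(src: str) -> str:
    parser = MiniGsp(src)
    nodes = []
    while parser.peek():
        nodes.append(parser.node())
    return "".join(nodes)


def main() -> None:
    # Filter-style CLI: GSP-like source on stdin, HTML on stdout.
    sys.stdout.write(transpile(sys.stdin.read()) + "\n")
```

For instance, `transpile('li {a href="#" #home {-Home Page}}')` returns `<li><a href="#" id="home">Home Page</a></li>`. Calling `main()` from an `if __name__ == "__main__":` guard would turn the script into the kind of stdin-to-stdout filter described at the start of this post. Braces make the parser trivial: every body ends at an unambiguous `}`, so no indentation tracking is needed.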
Templating and macros are also very easy via macro processors thanks to the use of braces instead of whitespace-based scoping. You may have noticed that I like to make use of abbreviations on this website that expand when hovered over (unless you’re on mobile, in which case you might not see them). I do this via the HTML <abbr> tag, providing the appropriate class. For example, many times on this very page I’ve made use of ‘@abbr .html {-HTML}’. Obviously, repeating that everywhere in my document can be quite annoying, so I’ve gotten around this by making use of the following m4 macro:
m4_define(m4_abbr, `@abbr .m4_translit($1, A-Z, a-z) {-$1}')
I can then insert abbreviations by simply writing something along the lines of ‘m4_abbr(HTML)’ in my document.
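To spell out what that macro produces, here is the same expansion written as a small Python function. It is only an illustration and not part of GSP or m4. The `m4_` prefix on the builtins suggests m4 is run with GNU m4’s `-P` (`--prefix-builtins`) option, though that is an inference. Since `m4_translit($1, A-Z, a-z)` maps only the ASCII range `A-Z`, the sketch builds the same translation table rather than calling `str.lower()`.

```python
import string

# Same mapping as m4_translit($1, A-Z, a-z): ASCII uppercase to lowercase only.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def abbr(word: str) -> str:
    """Return the GSP text that m4_abbr(word) expands to."""
    # Doubled braces are literal "{" and "}" inside an f-string.
    return f"@abbr .{word.translate(ASCII_LOWER)} {{-{word}}}"


print(abbr("HTML"))  # @abbr .html {-HTML}
```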
The transpiler itself is also incredibly easy to use, something JavaScript developers would never be able to comprehend. In order to transpile a GSP document into an HTML document, I simply run ‘gsp index.gsp >index.html’. Yep, that’s it.
Syntax Highlighting
One problem that I came across while writing GSP was the lack of syntax highlighting. It may not seem that important, but syntax highlighting is crucial for helping you quickly identify different syntax elements. The awesome solution I found for this ended up being Tree-Sitter. Tree-Sitter is a parser generator that various text editors such as Neovim and Emacs can integrate with to offer efficient, high-quality syntax highlighting, amongst other features such as syntax-aware code folding and movement.
After a bit of research and reading the documentation, I found that creating your own parsers is actually really easy. You effectively just define a JavaScript object that describes the language grammar, and a C parser is generated from that. If you’re interested, you can find the GSP parser here. To give you a bit of an idea of just how simple a Tree-Sitter parser is, here’s a simplified example of how you describe the definition of a node and a node name:
{
	node: $ => seq(
		optional('>'),
		$.node_name,
		optional($.attribute_list),
		'{',
		optional($.node_body),
		'}',
	),
	node_name: $ => /[a-zA-Z:_][a-zA-Z0-9:_\-.]*/,
}
As you can see, the grammar syntax is extremely simple. You just define your core syntax elements via regular expressions, and then compose them together via helper functions such as seq, optional and repeat to define the full structure of your language.
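As a quick check of what the `node_name` rule accepts, the same regular expression can be exercised in Python; this character class means the same thing in JavaScript and in Python’s `re` module. The pattern resembles the ASCII subset of XML’s Name production: a letter, `_`, or `:` first, then letters, digits, `:`, `_`, `-`, or `.`. The sample names below are arbitrary.

```python
import re

# node_name from the GSP grammar above; fullmatch requires the whole
# string to be a single name.
NODE_NAME = re.compile(r"[a-zA-Z:_][a-zA-Z0-9:_\-.]*")


def is_node_name(text: str) -> bool:
    return NODE_NAME.fullmatch(text) is not None


samples = ["div", "h1", "my-widget", "svg:rect", "_x.y", "1div", "-x", "a b"]
for sample in samples:
    # Leading digits or hyphens, and embedded spaces, are rejected.
    print(f"{sample!r:12} {is_node_name(sample)}")
```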
This isn’t enough, though. We now have a parser for our language that can create a syntax tree our editor can take advantage of, but the editor still doesn’t know what each node actually is, so it can’t highlight it properly. Tree-Sitter solves this through query files, written in a Scheme-like S-expression syntax, where we describe how to highlight our syntax tree. This is what the highlighting configuration for GSP looks like:
queries/highlights.scm
[">" "-" "=" "@"] @operator
["{" "}"] @tag.delimiter
(node_name) @tag
[
	(attribute_name)
	(class_shorthand)
	(id_shorthand)
] @tag.attribute
(attribute_value) @string
As you can see, this is all really simple stuff, which is what I love so much about Tree-Sitter: it’s just so easy! With these basic annotations your editor knows that attribute values should be highlighted like strings, braces like tag delimiters, etc. In a similar vein, writing a query to describe code folding is really easy:
queries/folds.scm
[
	(node)
	(attribute_list)
] @fold
The Takeaway
So what’s the takeaway? I think it’s that when you have a problem, oftentimes the best solution is not to fundamentally redesign something from the ground up or to completely change the way a system works, but to identify the specific thing that annoys you and find a fix for it. I thought that the syntax of HTML was annoying and bad, so I found a solution for the syntax while keeping the core structure the same. In the same line of thinking, try not to over-abstract (I’m looking at you, Java developers). Abstraction often leads to rapidly compounding complications the moment we want to do anything different or out of the ordinary, so unless you can find a really nice abstraction that doesn’t make anyone’s life harder, try to avoid abstraction when you can.
If you’re interested in GSP, you can find the git repository over at Sourcehut.