#+TITLE: Fast Static Site with make
#+DESCRIPTION: A deeper view of my static site builder Makefile
#+KEYWORDS: blog static
#+AUTHOR: Yann Esposito
#+EMAIL: yann@esposito.host
#+DATE: [2021-05-25 Tue]
#+LANG: en
#+OPTIONS: auto-id:t
#+STARTUP: showeverything

This article digs a bit deeper into my =Makefile=-based static website generator.
In a [[https://her.esy.fun/posts/0017-static-blog-builder/index.html][previous article]] I only gave the rationale and an overview so you
could build one yourself.
In short, it is very fast and portable.

A few goals reached by my current build system are:

1. Be fast and do as little work as possible.
   I don't want to rebuild all the html pages if I only change one file.
2. Source file format agnostic. You can use markdown, org-mode or even
   directly write html.
3. Support gemini
4. Optimize size: minify HTML, CSS, images
5. Generate an index page listing the posts
6. Generate RSS/atom feed (for both gemini and http)

=make= will take care of handling the dependency graph to minimize
the amount of effort when a change occurs in the sources.
For some features, I wrote specific small shell scripts.
For example, to stay completely agnostic about the source format of my
articles, I generate the RSS feed out of a tree of HTML files.
But taking advantage of =make=, I generate an index cache that transforms those
HTML files into XML, which is faster to use when building the different indexes.
For those transformations I use very short shell scripts.

* =Makefile= overview
:PROPERTIES:
:CUSTOM_ID: -makefile--overview
:END:

A Makefile is made out of rules.
The first rule of your Makefile will be the default rule.
The first rule of my Makefile is called =all=.

A rule has the following format:

#+begin_src makefile
target: file1 file2
	cmd --input file1 file2 \
		--output target
#+end_src

If =target= does not exist, or if one of its dependencies is newer than it,
=make= runs the command to build =target=.
Before that, =make= checks every dependency the same way and runs the rules
needed to bring them up to date, in the correct order.
In short, a file needs to be rebuilt if it is missing, or if one of its
dependencies is newer or itself needs to be rebuilt.
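
To make the timestamp comparison concrete, here is a minimal sketch in plain
shell of the check =make= performs for one target.
The function name =needs_rebuild= is hypothetical, and the sketch ignores the
recursive part (real =make= first brings every dependency up to date).

#+begin_src bash
#!/usr/bin/env bash
# Hypothetical helper, not part of make or of my Makefile.
# Usage: needs_rebuild TARGET DEP...
# Returns 0 (true) when TARGET is missing or older than one of the DEPs.
needs_rebuild() {
    local target="$1"
    shift
    # a missing target always has to be built
    [ -e "$target" ] || return 0
    local dep
    for dep in "$@"; do
        # -nt is true when $dep was modified more recently than $target
        if [ "$dep" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}
#+end_src

With this, ~needs_rebuild _site/a.html src/a.org templates/post.html~ answers
the same question =make= asks before running a recipe.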

The usual use case for =make= is building a single binary out of many
source files.
But for a static website, we need to generate a lot of files from a lot of
files.
So we construct the rules like this:

#+begin_src makefile
all: site

# build a list of files that will need to be built
DST_FILES := ....
# RULES TO GENERATE DST_FILES
ALL += $(DST_FILES)

# another list of files
DST_FILES_2 := ....
# RULES TO GENERATE DST_FILES_2
ALL += $(DST_FILES_2)

site: $(ALL)
#+end_src

In my =Makefile= I have many similar blocks with the same pattern.

1. I retrieve a list of source files
2. I construct the list of destination files (change the directory, the extension)
3. I declare a rule to construct these destination files
4. I add the destination files to the =ALL= variable.
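
Step 2 is only string manipulation on paths.
As a rough illustration, here is the same mapping written in shell; the
function name =to_dst= is made up, and it assumes its argument starts with
=src/= and ends with =.org= (unlike =patsubst=, which leaves non-matching
words untouched).

#+begin_src bash
#!/usr/bin/env bash
# Hypothetical helper mirroring
#   $(patsubst %.org,%.html,$(patsubst src/%,_site/%,FILE))
to_dst() {
    local src="$1"
    local rel="${src#src/}"                  # change the directory: drop "src/"
    printf '%s\n' "_site/${rel%.org}.html"   # change the extension
}

# example path, made up for the demo
to_dst src/posts/my-post/index.org
#+end_src

The call prints =_site/posts/my-post/index.html=.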

I have a block for:
- raw assets I just want copied
- images I would like to compress for the web
- =html= I would like to generate from org mode files via pandoc
- =gmi= I would like to generate from org mode files
- =xml= files I use as cache to build different index files
- =index.html= file containing a list of my posts
- =rss.xml= file containing a list of my posts
- =gem-atom.xml= file containing a list of my posts

** Assets
:PROPERTIES:
:CUSTOM_ID: assets
:END:

The rules to copy assets will be a good first example.

1. find all assets in =src/= directory
2. generate all assets from these files in =_site/= directory
3. make this rule a dependency of the =all= rule.


#+begin_src makefile
SRC_ASSETS := $(shell find src -type f)
DST_ASSETS := $(patsubst src/%,_site/%,$(SRC_ASSETS))
_site/% : src/%
	@mkdir -p "$(dir $@)"
	cp "$<" "$@"
.PHONY: assets
assets: $(DST_ASSETS)
ALL += assets
#+end_src

OK, this looks terrible.
But mainly:

- ~SRC_ASSETS~ will contain the result of the command ~find~.
- ~DST_ASSETS~ will contain the files of ~SRC_ASSETS~  but we replace the
  =src/= by =_site/=.
- We create a generic rule; for all files matching the following pattern
  =_site/%=, look for the file =src/%= and if it is newer (in our case)
  then execute the following commands:
  - create the directory to put =_site/%= in
  - copy the file

About the line ~@mkdir -p "$(dir $@)"~:
- the =@= at the start of the command simply means that we make this execution silent.
- The =$@= is replaced by the target string.
- And =$(dir $@)= will generate the folder name of =$@=.

For the line with ~cp~, you just need to know that =$<= represents the
first dependency.
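
If the automatic variables feel cryptic, the recipe can be read as the
following shell function, where the first argument plays the role of =$<= and
the second the role of =$@=.
The name =copy_asset= is only for illustration.

#+begin_src bash
#!/usr/bin/env bash
# Hypothetical shell equivalent of the recipe of the `_site/% : src/%` rule.
# $1 stands for $< (first dependency), $2 stands for $@ (target).
copy_asset() {
    local src="$1" dst="$2"
    mkdir -p "$(dirname "$dst")"   # same purpose as $(dir $@)
    cp "$src" "$dst"
}
#+end_src

For every stale target, =make= runs the equivalent of
~copy_asset src/img/a.png _site/img/a.png~.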

My Makefile is composed of similar blocks, where I replace the first
find command to match specific files and where I use different building rules.
An important point is that the rules must be as specific as possible.
This is because =make= will use the most specific rule in case of ambiguity.
For example, the matching rule =_site/%: src/%= will match all files in
the =src/= dir.
But if we want to treat =CSS= files with another rule we could write:

#+begin_src makefile
_site/%.css: src/%.css
	minify "$<" > "$@"
#+end_src

And if the selected file is a =CSS= file, this rule will be selected.
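
What "most specific" means here: GNU make 3.82 and later prefer the matching
pattern rule whose stem (the part matched by =%=) is the shortest, while older
versions simply took the first matching rule in the file.
The sketch below reproduces that selection in shell for patterns containing a
single =%=; the function names =stem_of= and =most_specific= are hypothetical.

#+begin_src bash
#!/usr/bin/env bash
# Print the stem that "%" would match in PATTERN for TARGET,
# or return 1 when PATTERN does not match TARGET.
stem_of() {
    local pattern="$1" target="$2"
    local prefix="${pattern%%\%*}"   # text before the %
    local suffix="${pattern#*\%}"    # text after the %
    case "$target" in
        "$prefix"*"$suffix") ;;
        *) return 1 ;;
    esac
    local stem="${target#"$prefix"}"
    stem="${stem%"$suffix"}"
    printf '%s\n' "$stem"
}

# Usage: most_specific TARGET PATTERN...
# Print the matching pattern with the shortest stem.
most_specific() {
    local target="$1"
    shift
    local best="" best_len=-1 pattern stem
    for pattern in "$@"; do
        stem="$(stem_of "$pattern" "$target")" || continue
        if [ "$best_len" -lt 0 ] || [ "${#stem}" -lt "$best_len" ]; then
            best="$pattern"
            best_len="${#stem}"
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}

most_specific _site/css/main.css '_site/%' '_site/%.css'
#+end_src

For =_site/css/main.css= the stems are =css/main.css= and =css/main=, so the
=CSS= rule wins; for a =png= file only the generic rule matches.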

** Prelude
:PROPERTIES:
:CUSTOM_ID: prelude
:END:

I start with variable declarations:

#+begin_src makefile
all: site
# directory containing the source files
SRC_DIR ?= src
# directory that will contain the site files
DST_DIR ?= _site
# a directory that will contain a cache to speedup indexing
CACHE_DIR ?= .cache

# options to pass to find to prevent matching files in the src/drafts
# directory
NO_DRAFT := -not -path '$(SRC_DIR)/drafts/*'
# option to pass to find to not match org files
NO_SRC_FILE := ! -name '*.org'
#+end_src
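
To see what these two =find= filters do, here is a small sketch that runs
them on a throwaway tree.
The function names and the file names are made up, and combining both filters
in =list_raw_files= is my assumption about how they are typically used together.

#+begin_src bash
#!/usr/bin/env bash
# List org files below "$1/src", skipping "$1/src/drafts".
# The -not -path filter is the one stored in NO_DRAFT.
list_published_org() {
    (cd "$1" && find src -type f -name '*.org' -not -path 'src/drafts/*' | sort)
}

# List everything that is not an org file (the NO_SRC_FILE filter),
# still skipping drafts.
list_raw_files() {
    (cd "$1" && find src -type f ! -name '*.org' -not -path 'src/drafts/*' | sort)
}

# Demo in a temporary directory, removed at the end.
demo="$(mktemp -d)"
mkdir -p "$demo/src/posts" "$demo/src/drafts"
touch "$demo/src/posts/a.org" "$demo/src/posts/a.png" "$demo/src/drafts/wip.org"
list_published_org "$demo"
list_raw_files "$demo"
rm -rf "$demo"
#+end_src

The demo prints =src/posts/a.org= then =src/posts/a.png=; the draft is listed
by neither call.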

** CSS
:PROPERTIES:
:CUSTOM_ID: css
:END:

Here we go; the same simple pattern for CSS files.

#+begin_src makefile
# CSS
SRC_CSS_FILES := $(shell find $(SRC_DIR) -type f -name '*.css')
DST_CSS_FILES := $(patsubst $(SRC_DIR)/%,$(DST_DIR)/%,$(SRC_CSS_FILES))
$(DST_DIR)/%.css : $(SRC_DIR)/%.css
	@mkdir -p "$(dir $@)"
	minify "$<" > "$@"
.PHONY: css
css: $(DST_CSS_FILES)
ALL += css
#+end_src

This is very similar to the block for raw assets.
The difference is just that instead of using =cp= we use the =minify=
command.

** ORG → HTML
:PROPERTIES:
:CUSTOM_ID: org----html
:END:

Now this one is more complex, but it still follows the same pattern.

#+begin_src makefile
# ORG -> HTML
EXT ?= .org
SRC_PANDOC_FILES ?= $(shell find $(SRC_DIR) -type f -name "*$(EXT)" $(NO_DRAFT))
DST_PANDOC_FILES ?= $(patsubst %$(EXT),%.html, \
                        $(patsubst $(SRC_DIR)/%,$(DST_DIR)/%, \
                            $(SRC_PANDOC_FILES)))
PANDOC_TEMPLATE ?= templates/post.html
MK_HTML := engine/mk-html.sh
PANDOC := $(MK_HTML) $(PANDOC_TEMPLATE)
$(DST_DIR)/%.html: $(SRC_DIR)/%.org $(PANDOC_TEMPLATE) $(MK_HTML)
	@mkdir -p "$(dir $@)"
	$(PANDOC) "$<" "$@.tmp"
	minify --mime text/html "$@.tmp" > "$@"
	@rm "$@.tmp"
.PHONY: html
html: $(DST_PANDOC_FILES)
ALL += html
#+end_src

So to construct =DST_PANDOC_FILES= this time we also need to change the
extension of the file from =org= to =html=.
We need to provide a template that will be passed to pandoc.

And of course, since we want to regenerate all HTML files when the template
changes, we declare the template as a dependency.
But importantly, *not* in first position, because the recipe uses =$<=, which
is the first dependency.

I also have a short script instead of directly using =pandoc=.
It is easier to handle =toc= using the metadata in the file.
And if someday I want to put the template in the metadata, this will be the
right place to do that.

The =mk-html.sh= is quite straightforward:

#+begin_src bash
#!/usr/bin/env bash
set -eu

# put me at the top level of my project (like Makefile)
cd "$(git rev-parse --show-toplevel)" || exit 1
template="$1"
orgfile="$2"
htmlfile="$3"

# check if there is the #+OPTIONS: toc:t
tocoption=""
if grep -ie '^#+options:' "$orgfile" | grep -q 'toc:t'; then
    tocoption="--toc"
fi

set -x
# $tocoption is left unquoted on purpose: when empty it must expand to nothing
pandoc $tocoption \
       --template="$template" \
       --mathml \
       --from org \
       --to html5 \
       --standalone \
       "$orgfile" \
       --output "$htmlfile"
#+end_src

Once generated I also minify the HTML file.
And that's it.
But the important part is that now, if I change my script, the template,
or the source file, =make= will regenerate the HTML files that depend on it.
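
The =toc= detection of =mk-html.sh= can be tried on its own.
Here is a minimal sketch of the same two-step =grep=, wrapped in a function
whose name (=has_toc=) is only for illustration:

#+begin_src bash
#!/usr/bin/env bash
# Return 0 when the org file given as $1 has an "#+OPTIONS:" line
# containing "toc:t" (the first grep is case-insensitive, like in mk-html.sh).
has_toc() {
    grep -i '^#+options:' "$1" | grep -q 'toc:t'
}

# Demo on a temporary file.
f="$(mktemp)"
printf '%s\n' '#+TITLE: demo' '#+OPTIONS: auto-id:t toc:t' > "$f"
if has_toc "$f"; then
    echo "would pass --toc to pandoc"
fi
rm -f "$f"
#+end_src

An article such as this one, whose only option is =auto-id:t=, does not match
and gets no table of contents.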
** Indexes
:PROPERTIES:
:CUSTOM_ID: indexes
:END:

We often need indexes to build a website,
typically to list the latest articles or to build the RSS file.
So for the sake of simplicity, I decided to build my index as a set of XML files.
Of course, this could be optimized, by using SQLite for example.
But this is already really fast.

For every generated HTML file I generate a clean XML file with
=hxclean=.
Once cleaned, it is easy to access a specific node in these XML files.

#+begin_src makefile
# INDEXES
SRC_POSTS_DIR ?= $(SRC_DIR)/posts
DST_POSTS_DIR ?= $(DST_DIR)/posts
SRC_POSTS_FILES ?= $(shell find $(SRC_POSTS_DIR) -type f -name "*$(EXT)")
RSS_CACHE_DIR ?= $(CACHE_DIR)/rss
DST_XML_FILES ?= $(patsubst %.org,%.xml, \
                        $(patsubst $(SRC_POSTS_DIR)/%,$(RSS_CACHE_DIR)/%, \
                            $(SRC_POSTS_FILES)))
$(RSS_CACHE_DIR)/%.xml: $(DST_POSTS_DIR)/%.html
	@mkdir -p "$(dir $@)"
	hxclean "$<" > "$@"
.PHONY: indexcache
indexcache: $(DST_XML_FILES)
ALL += indexcache
#+end_src

This rule generates, for every HTML file under =_site/posts/=, a corresponding
=xml= file (=hxclean= takes an HTML file and tries its best to make XML out of it).

** HTML Index
:PROPERTIES:
:CUSTOM_ID: html-index
:END:

Now we just want to generate the main =index.html= page at the root of
the site.
This page should list all articles by date in reverse order.

The first step is to take advantage of the cache index.
For every XML file generated before, I generate the small HTML block I want
for that entry.
For this I use a script, =mk-index-entry.sh=.
It uses =hxselect= to retrieve the date and the title from the cached
XML files, then generates a small file containing just the date and the link.

Here is the block in the Makefile:

#+begin_src makefile
DST_INDEX_FILES ?= $(patsubst %.xml,%.index, $(DST_XML_FILES))
MK_INDEX_ENTRY := ./engine/mk-index-entry.sh
INDEX_CACHE_DIR ?= $(CACHE_DIR)/rss
$(INDEX_CACHE_DIR)/%.index: $(INDEX_CACHE_DIR)/%.xml $(MK_INDEX_ENTRY)
	@mkdir -p $(INDEX_CACHE_DIR)
	$(MK_INDEX_ENTRY) "$<" "$@"
#+end_src

It means: for every =.xml= file generate a =.index= file with
=mk-index-entry.sh=.

#+begin_src sh
#!/usr/bin/env zsh

# prelude
cd "$(git rev-parse --show-toplevel)" || exit 1
xfic="$1"
dst="$2"
indexdir=".cache/rss"

# HTML Accessors (similar to CSS accessors)
dateaccessor='.yyydate'
# title and keyword shouldn't be changed
titleaccessor='title'
finddate(){ < $1 hxselect -c $dateaccessor | sed 's/\[//g;s/\]//g;s/ .*$//' }
findtitle(){ < $1 hxselect -c $titleaccessor }

autoload -U colors && colors

blogfile="$(echo "$xfic"|sed 's#\.xml$#.html#;s#^'$indexdir'/#posts/#')"
printf "%-30s" "$blogfile"
d=$(finddate "$xfic")
echo -n " [$d]"
title=$(findtitle "$xfic")
printf ": %-55s" "$title"
# overwrite (not append) so that rebuilding an entry never duplicates it
{ printf "\\n<li>"
  printf "\\n<span class=\"pubDate\">%s</span>" "$d"
  printf "\\n<a href=\"%s\">%s</a>" "${blogfile}" "$title"
  printf "\\n</li>\\n\\n"
} > "${dst}"

echo " [${fg[green]}OK${reset_color}]"
#+end_src
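
The only subtle line of that script is the one computing =blogfile=: it turns
the path of a cached XML file into the link of the post.
Here is that transformation isolated in a small function, as a sketch; the
name =cache_to_post= is hypothetical.

#+begin_src bash
#!/usr/bin/env bash
# Map a cache path to the relative URL of the post, e.g.
#   .cache/rss/<post-dir>/index.xml  ->  posts/<post-dir>/index.html
cache_to_post() {
    local indexdir=".cache/rss"
    printf '%s\n' "$1" | sed -e 's#\.xml$#.html#' -e "s#^${indexdir}/#posts/#"
}

cache_to_post .cache/rss/0017-static-blog-builder/index.xml
#+end_src

The call prints =posts/0017-static-blog-builder/index.html=, which is the
value used for the =href= of the entry.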

Then I use these intermediate files to generate a single bigger index file.

#+begin_src makefile
HTML_INDEX := $(DST_DIR)/index.html
MKINDEX := engine/mk-index.sh
INDEX_TEMPLATE ?= templates/index.html
$(HTML_INDEX): $(DST_INDEX_FILES) $(MKINDEX) $(INDEX_TEMPLATE)
	@mkdir -p $(DST_DIR)
	$(MKINDEX)
.PHONY: index
index: $(HTML_INDEX)
ALL += index
#+end_src

This script is a big one, but it is not that complex.
For every index entry, I create a new file named =DATE-dirname=.
I sort them in reverse order and put their content in the middle of an HTML
file.

Important note: this file is only rebuilt when one of the index entries changes.

The first part of the script reads the publication date stored in each entry
and copies the entry to a file whose name starts with that date.
This will be helpful later.

#+begin_src sh
#!/usr/bin/env zsh

autoload -U colors && colors
cd "$(git rev-parse --show-toplevel)" || exit 1
# Directory
webdir="_site"
indexfile="$webdir/index.html"
indexdir=".cache/rss"
tmpdir=$(mktemp -d)

echo "Publishing"

dateaccessor='.pubDate'
finddate(){ < $1 hxselect -c $dateaccessor }
# generate files with <DATE>-<FILENAME>.index
for fic in $indexdir/**/*.index; do
    d=$(finddate $fic)
    echo "${${fic:h}:t} [$d]"
    cp $fic $tmpdir/$d-${${fic:h}:t}.index
done
#+end_src
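
Putting the date first in the file name works because dates written as
=YYYY-MM-DD= sort chronologically when compared as plain strings.
A small sketch of the two steps involved (the function names and the file
names are made up for the demo):

#+begin_src bash
#!/usr/bin/env bash
# Turn an org timestamp such as "[2021-05-25 Tue]" into "2021-05-25".
# Same sed expression as finddate in mk-index-entry.sh.
org_date_to_iso() {
    sed 's/\[//g;s/\]//g;s/ .*$//'
}

# A reverse string sort puts the most recent date first.
newest_first() {
    LC_ALL=C sort -r
}

echo '[2021-05-25 Tue]' | org_date_to_iso
printf '%s\n' \
    '2020-11-03-first-post.index' \
    '2021-05-25-makefile.index' \
    '2021-01-10-other.index' | newest_first
#+end_src

The demo prints =2021-05-25=, then the three names starting with the 2021-05-25
one and ending with the 2020-11-03 one.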

Then I use these files to generate a file that will contain the =body= of
the HTML.

#+begin_src sh
# for every post in reverse order
# generate the body (there is some logic to group by year)
previousyear=""
for fic in $(ls $tmpdir/*.index | sort -r); do
    d=$(finddate $fic)
    year=$( echo "$d" | perl -pe 's#(\d{4})-.*#$1#')
    if (( year != previousyear )); then
        if (( previousyear > 0 )); then
            echo "</ul>" >> $tmpdir/index
        fi
        previousyear=$year
        echo "<h3 name=\"${year}\" >${year}</h3><ul>" >> $tmpdir/index
    fi
    cat $fic >> $tmpdir/index
done
echo "</ul>" >> $tmpdir/index
#+end_src

And finally, I render the HTML using a template within a shell script:

#+begin_src sh
title="Y"
description="Most recent articles"
author="Yann Esposito"
body=$(< $tmpdir/index)
date=$(LC_TIME=en_US date +'%Y-%m-%d')

# A neat trick to use pandoc template within a shell script
# the pandoc templates use $x$ format, we replace it by just $x
# to be used with envsubst
template=$(< templates/index.html | \
    sed 's/\$\(header-includes\|table-of-content\)\$//' | \
    sed 's/\$if.*\$//' | \
    perl -pe 's#(\$[^\$]*)\$#$1#g' )
{
    export title
    export author
    export description
    export date
    export body
    echo ${template} | envsubst
} > "$indexfile"

rm -rf $tmpdir
echo "* HTML INDEX [done]"
#+end_src
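
The template trick is easier to see on a single line.
The sketch below only does the =$x$= to =$x= rewrite, with a stricter regular
expression than the =perl= one above (it only accepts names that are valid
shell variable names); the function name =pandoc_vars_to_shell= is
hypothetical.

#+begin_src bash
#!/usr/bin/env bash
# Rewrite pandoc template variables ($name$) into shell-style ones ($name).
pandoc_vars_to_shell() {
    sed -E 's/\$([A-Za-z_][A-Za-z0-9_]*)\$/$\1/g'
}

printf '%s\n' '<title>$title$</title> by $author$' | pandoc_vars_to_shell
#+end_src

This prints =<title>$title</title> by $author=.
Piping that result through =envsubst=, with =title= and =author= exported as
in the script above, then fills in the values.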

** RSS
:PROPERTIES:
:CUSTOM_ID: rss
:END:

My RSS generation is similar to the system I used to generate the index
file.
I just slightly improved the rules.

The =Makefile= blocks look like:

#+begin_src makefile
# RSS
DST_RSS_FILES ?= $(patsubst %.xml,%.rss, $(DST_XML_FILES))
MK_RSS_ENTRY := ./engine/mk-rss-entry.sh
$(RSS_CACHE_DIR)/%.rss: $(RSS_CACHE_DIR)/%.xml $(MK_RSS_ENTRY)
	@mkdir -p $(RSS_CACHE_DIR)
	$(MK_RSS_ENTRY) "$<" "$@"

RSS := $(DST_DIR)/rss.xml
MKRSS := engine/mkrss.sh
$(RSS): $(DST_RSS_FILES) $(MKRSS)
	$(MKRSS)

.PHONY: rss
rss: $(RSS)
ALL += rss
#+end_src
** Gemini
:PROPERTIES:
:CUSTOM_ID: gemini
:END:

I wrote a minimal script to transform my org files to gemini files.
I also need to generate an index and an atom file for gemini:

#+begin_src makefile
# ORG -> GEMINI
EXT := .org
SRC_GMI_FILES ?= $(shell find $(SRC_DIR) -type f -name "*$(EXT)" $(NO_DRAFT))
DST_GMI_FILES ?= $(patsubst %$(EXT),%.gmi, \
                        $(patsubst $(SRC_DIR)/%,$(DST_DIR)/%, \
                            $(SRC_GMI_FILES)))
GMI := engine/org2gemini.sh
$(DST_DIR)/%.gmi: $(SRC_DIR)/%.org $(GMI) engine/org2gemini_step1.sh
	@mkdir -p $(dir $@)
	$(GMI) "$<" "$@"
ALL += $(DST_GMI_FILES)
.PHONY: gmi
gmi: $(DST_GMI_FILES)

# GEMINI INDEX
GMI_INDEX := $(DST_DIR)/index.gmi
MK_GMI_INDEX := engine/mk-gemini-index.sh
$(GMI_INDEX): $(DST_GMI_FILES) $(MK_GMI_INDEX)
	@mkdir -p $(DST_DIR)
	$(MK_GMI_INDEX)
ALL += $(GMI_INDEX)
.PHONY: gmi-index
gmi-index: $(GMI_INDEX)

# GEMINI ATOM
GEM_ATOM := $(DST_DIR)/gem-atom.xml
MK_GEMINI_ATOM := engine/mk-gemini-atom.sh
$(GEM_ATOM): $(DST_GMI_FILES) $(MK_GEMINI_ATOM)
	$(MK_GEMINI_ATOM)
ALL += $(GEM_ATOM)
.PHONY: gmi-atom
gmi-atom: $(GEM_ATOM)

.PHONY: gemini
gemini: $(DST_GMI_FILES) $(GMI_INDEX) $(GEM_ATOM)
#+end_src
** Images
:PROPERTIES:
:CUSTOM_ID: images
:END:

For images, I try to compress them all with ImageMagick.

#+begin_src makefile
# Images
SRC_IMG_FILES ?= $(shell find $(SRC_DIR) -type f \( -name "*.jpg" -or -name "*.jpeg" -or -name "*.gif" -or -name "*.png" \))
DST_IMG_FILES ?= $(patsubst $(SRC_DIR)/%,$(DST_DIR)/%, $(SRC_IMG_FILES))

$(DST_DIR)/%.jpg: $(SRC_DIR)/%.jpg
	@mkdir -p $(dir $@)
	convert "$<" -quality 50 -resize 800x800\> "$@"

$(DST_DIR)/%.jpeg: $(SRC_DIR)/%.jpeg
	@mkdir -p $(dir $@)
	convert "$<" -quality 50 -resize 800x800\> "$@"

$(DST_DIR)/%.gif: $(SRC_DIR)/%.gif
	@mkdir -p $(dir $@)
	convert "$<" -quality 50 -resize 800x800\> "$@"

$(DST_DIR)/%.png: $(SRC_DIR)/%.png
	@mkdir -p $(dir $@)
	convert "$<" -quality 50 -resize 800x800\> "$@"

.PHONY: img
img: $(DST_IMG_FILES)
ALL += $(DST_IMG_FILES)
#+end_src
** Deploy
:PROPERTIES:
:CUSTOM_ID: deploy
:END:

A nice bonus is that I also deploy my website using make.

#+begin_src makefile
# DEPLOY
.PHONY: site
site: $(ALL)

.PHONY: deploy
deploy: $(ALL)
	engine/sync.sh

.PHONY: clean
clean:
	-[ ! -z "$(DST_DIR)" ] && rm -rf $(DST_DIR)/*
	-[ ! -z "$(CACHE_DIR)" ] && rm -rf $(CACHE_DIR)/*
#+end_src