Moving from Blogger to Hugo

Oct 6, 2021 · 1849 words · 9 minute read

It’d be easy, they said: move to a static site generator and you’ll be up and running in no time at all. Well, “they” might have been right, but “they” hadn’t counted on my inability to leave alone anything I don’t understand.

So here we are, a month and a half, an almost-new theme, and several ugly Bash scripts later, looking at the updated site, which, thanks to my efforts, looks almost exactly like the old one.

Why Leave Blogger?

My site had been languishing on Blogger, and that was OK, but it had a number of annoyances:

  1. The WYSIWYG editor was very annoying, so I would write each post somewhere else first and then transfer it, carefully adjusting spaces and clicking around to add links. Also, over the last couple of years I’ve become used to Emacs Org-Mode’s (Markdown-like) inline markup, which is much easier to transfer in plain text.
  2. Most posts are neatened-up forms of notes I write for myself, but because everything on Blogger has to be done via the web interface, there was always a second, final copy somewhere else, which wasn’t easy to back up. Since I already had a repository of plain text notes, there had to be an easier way of re-using those as a source of posts.
  3. Every time I update a workflow I’ve been trying to move away from vendor lock-in and proprietary solutions.
  4. The cookie consent pop-up from the built-in Google Analytics felt unnecessary and invasive for what is a very simple website¹.

For the above reasons I’d basically stopped transferring things to the web, which in turn removed the motivation to finalise and neaten my notes.

Before demonising Blogger too much: it was fine, and it has a few notable good points. It’s free, you can use it with your own domain, and there are no infrastructure or set-up problems. Plus you can customise the appearance easily with templates and, if you really need to, fiddle with the HTML and CSS directly.

Why Hugo?

Based on the good and bad from Blogger, I wanted a solution that:

  1. Uses some kind of mark-up (ideally Org-Mode, but Markdown is fine) in plain text files as the source of content
  2. Can be written locally on a computer, and then easily synced to upload
  3. Free to host
  4. Can be used with a custom domain
  5. Customisable appearance
  6. Not too technically complicated to manage²

This quickly rules out the other major ‘integrated’ solutions (like Medium, WordPress) and leads to some kind of static site generator, of which there are legion, but the main contenders tend to be Hugo, Jekyll, Gatsby and Next.js.

Any of those is probably fine, but in the end I opted for Hugo because:

  • It seemed well established with reasonable documentation and community provided answers
  • Installing is simple (only one binary, no need to set up a whole JavaScript environment)
  • Multiple free hosting options
  • Lots of free themes
  • It supports Org-Mode as a content format
  • Others have already written Blogger migration scripts

Export from Blogger and Convert to Markdown

The first step is getting your old content out of Blogger, which you can do from your site’s Blogger Dashboard via Settings -> Manage Blog -> Back up content. This leaves you with an XML file of your posts, pages and comments (which I never used).

The next step was to use blog2md, kindly provided by Palani Raja. Feed the XML file into it, and you’ll get all your posts as Markdown files, with the front matter etc. already created from the Blogger metadata.

At this point you can test your site: put all the Markdown files into the Hugo site’s ‘content/posts’ directory, install a theme and run hugo server -D.
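In command form, that quick test looks something like this (the site name, theme URL and source path here are placeholders, not the ones I actually used):

hugo new site blog
cd blog
git init
# themes are usually added as git submodules
git submodule add https://github.com/example/some-theme themes/some-theme
echo 'theme = "some-theme"' >> config.toml
# drop the converted posts in, then serve locally
cp /path/to/converted/posts/*.md content/posts/
hugo server -D   # -D also builds posts still marked as drafts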

And that works…but there are a few things that aren’t quite right yet.

Page Bundles

The first is that Hugo now prefers, and seems to recommend, Page Bundles instead of one monolithic folder with all your posts and another with all your images. It’s a more convenient way of keeping everything related to one topic/post in the same folder. That makes sense to me, and I’m happy to go along with it as it makes the entries more portable, even if it means that all the post files end up being called “index.md”, and identifying the content relies on the name of the surrounding folder.
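For illustration, a leaf bundle keeps a post and its images side by side (folder and file names here are hypothetical):

content/
  posts/
    my-first-post/
      index.md
      some_picture.jpg
    another-post/
      index.md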

The ‘blog2md’ script above seems to pre-date this move to Page Bundles, so we need a little bit of ugly Bash to do the conversion for us.

(Note: all this was done on macOS High Sierra, using Homebrew to install any of the programs, such as wget, that weren’t already present. So in some cases the utilities, such as sed, are the BSD flavour rather than the probably more expected GNU/Linux variants.)

#!/bin/bash
# Take a directory of .md files, and move each
# into its own folder of the same name,
# then rename the file index.md
# Usage: md_to_page_bundle.sh in_dir out_dir
# Quoting "$1" keeps folder names with spaces intact.

for f in "$1"/*.md;
do
    filename=$(basename -- "$f")
    postname="${filename%.*}"
    newdir="$2/$postname"
    mkdir -p "$newdir"
    cp "$f" "$newdir/index.md"
done
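Assuming the converted files landed in a folder called converted_posts (a name of my own invention here), it’s run as:

./md_to_page_bundle.sh converted_posts content/posts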

(As is always the case with Bash, I’m sure there’s a much better way of doing this, which probably fits on only one line and catches a million edge cases.)

Recovering Images

The next step is to download the pictures. The conversion above only wrote Markdown links to the pictures into the files; the pictures themselves are still hosted on the Blogger servers.

Here my scripting skills failed me. While I could extract each image’s URL and download it into the folder using wget, all my attempts to then replace the original image link with the local file name failed in a flurry of escapes using sed, or even, briefly, awk. The long URLs contained too many special characters, and I couldn’t get Bash to ignore them all and just treat the strings literally. For what it’s worth, below is the script that downloads the hi-resolution images linked in the converted Markdown files into the local folder.

#!/bin/bash
#
# 1. Pass markdown file with link generated by blogger-to-hugo
# exporter then extract the image link
# 2. download the image,
# 3. NOT WORKING: and update the .md to point to it locally.

fullfilepath=$(realpath "$1")
workingdir=$(dirname "$fullfilepath")
echo "$workingdir" # no trailing slash!
# This gets the raw links from within the file
# the second link is the one with the high-res image
linkmatches=($(grep -oEi '^(\[\!\[\]\(http.*\))' "$fullfilepath"))
# outer set of brackets above turn the result into a bash array.

echo "${#linkmatches[@]} items"

if [ ${#linkmatches[@]} -eq 0 ]; then
    echo "No links found in $1"
else
    for linkmatch in "${linkmatches[@]}"
    do
	# Extract the second link
	hiresimagelink=$(echo "$linkmatch" | grep -oEi '\)\]\(http.*[^\)]' | grep -oEi 'http.*')
	echo "$hiresimagelink"
	# make sure there are no spaces, using underscore as the preferred separator
	imagename=$(basename "$hiresimagelink" | sed -e 's/[-+[:space:]]/_/g')
	echo "$imagename"
	# download as the new imagename
	wget "$hiresimagelink" -O "$workingdir/$imagename"
	# Replace the old long link with the new one:
	# NOT WORKING
	# sed -i '' -e 's|"$linkmatch"|"$newlink"|'  "$fullfilepath"
    done
    
fi
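For completeness: the failure above is partly self-inflicted, since the single quotes stop the shell expanding $linkmatch at all, and partly sed’s fault, since the URLs are full of characters sed treats as special. A sketch of one way around both, using perl’s \Q…\E to treat the old link as a literal string and environment variables to dodge the shell quoting; the new link built here is hypothetical, and should match however your theme expects local images:

# Hypothetical replacement step for inside the loop above:
# build the local Markdown image link, then substitute literally.
newlink="![](${imagename})"
# \Q...\E makes perl treat the old link as a literal string, and
# passing both via the environment avoids the shell escaping battle.
OLD="$linkmatch" NEW="$newlink" \
    perl -pi -e 's/\Q$ENV{OLD}\E/$ENV{NEW}/g' "$fullfilepath"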

Also, many, but not all, of the converted Markdown files contained semi-randomly distributed “no-break space” characters (codepoint 160, #o240, #xa0), which render as “ ” and look similar to an underscore in my text editor. These added extra spaces throughout various sentences and in some cases had to be removed entirely, or replaced with a normal space.
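Had I scripted that clean-up, a byte-level one-liner would do it, since in UTF-8 the no-break space is the byte pair 0xC2 0xA0 (the paths assume the Page Bundle layout from earlier):

# Replace UTF-8 no-break spaces (bytes 0xC2 0xA0) with plain
# spaces, in place, across all the bundled posts.
perl -pi -e 's/\xC2\xA0/ /g' content/posts/*/index.md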

In the end I updated the links by hand. I don’t have that many posts, and it allowed me to remove the non-break spaces scattered through the text, and reformat some of the unexpected newlines that had also appeared.

With that, the content now sits nicely in text files on my hard drive; easy to back up and edit however I choose.

Resize Images

This step is entirely optional, but I don’t need the full-resolution images for my blog. Therefore, to save on space and bandwidth, especially when the hosting clones the git repository of the site, I resized all the images using the following script.

#!/bin/bash

# $1 is the directory to search for images
# $2 is the maximum pixel width

# Pipe find into a while-read loop so paths with spaces survive
find "$1" \( -iname '*.png' -o -iname '*.jpg' \) |
while IFS= read -r imagefile ; do
    # sips is macOS's built-in image manipulation tool
    pix=$(sips -g pixelWidth "$imagefile" | awk '/pixelWidth:/{print $2}')
    echo "$imagefile $pix"
    if [ "$pix" -gt "$2" ]
    then
        sips -Z "$2" "$imagefile"
    fi
done
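Saved as, say, resize_images.sh (a name of my choosing), capping everything under content/posts at 1200 pixels wide looks like:

./resize_images.sh content/posts 1200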

Theming & Configuration

The best and simplest solution is just to choose one of the many nice themes available and use that. What you shouldn’t do is start with one and tweak it so much that, by the end, you’ve re-written about half of it.

Theming and configuring a Hugo site is a post by itself, but in most cases the defaults provided in the theme are enough to be up and running.

One item that is worth changing for consistency in your site’s config.toml file is:

[permalinks]
posts = "/:year/:month/:title"

The blog2md script will have kindly added a custom URL to the front matter of each post, meaning each post keeps the same URL it had when hosted on Blogger. This is good because old links to your site don’t die. But future posts won’t automatically follow this “/year/month/title” pattern unless you set the above. This way old and new posts are consistently linked, and if you’re a little obsessive like me, that’s important.
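For example, a migrated post’s front matter ends up looking something like this (the exact fields and values are illustrative; the important part is the url entry preserving the old Blogger address):

---
title: 'An Old Post'
date: 2019-05-04T10:00:00+02:00
url: /2019/05/an-old-post.html
---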

Hosting

There are a number of hosting options, and in the end I went for Render, who offer static sites as part of their free tier. Perhaps that will change in the future, but if so, the site is now easily ported to another host; or, since the content is in Markdown, easily re-used with a different static site generator, should Hugo become unusable somehow.

If you’re intending to keep your site as a git repository anyway, this is very simple. Just follow the instructions to link the site to your repository wherever you keep the remote version (e.g. GitHub, GitLab). Then every time you add a commit to the main branch, Render will re-build the site and publish it. Additionally, you can use your own domain, should you have one.
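The day-to-day publishing loop then looks something like this (assuming the default branch is called main and the post folder name is hypothetical):

git add content/posts/a-new-post
git commit -m "Add a new post"
git push origin main   # the host re-builds and publishes the site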

That’s it, the site is moved.


  1. Minor rant warning: anyone who blames the GDPR or similar privacy-protecting legislation for this wave of annoying whack-a-mole consent banner pop-ups is wrong. No-one providing a website is forced to use such invasive personalised tracking technology. It’s the website’s choice to do so. The only difference is that now they have to tell you that they’re doing so, but instead of taking on board the purpose of the legislation, more privacy for users on the internet, it’s often twisted into “Politicians made every website annoying”. No they didn’t; you did. ↩︎

  2. This is extremely subjective and, as we’ll see later, not entirely trivial. If I could achieve this with Git, Bash, HTML and CSS then that’s fine; but managing my own webserver would be a step too far. ↩︎