Migrating from BlogEngine.NET to Ghost

Outdated and time to change

I've been running my blog on BlogEngine for well over a decade. It is no longer supported, suffers from various security issues, and lacks support for modern ways of consuming content, such as mobile. It was time to move on a long time ago, but I didn't like the idea of going to WordPress, which seemed the obvious choice. WordPress seems too cumbersome and heavy for a simple blog.

After much research, my personal preference was to move to Ghost, but self-hosted, which also gives me another machine in the cloud for other purposes.

Tasks

  • Get all email for the domain timwappat.info delivered, as that was handled by a free service with my previous provider
  • Find a hosting company
  • Install Ghost
  • Migrate posts, pages and images
  • Get Mastodon links working
  • Get old URLs to redirect to new URLs
  • CDN to make performance good around the globe
  • Google Analytics
  • DNS changes to support the above, as I already run the domain name
  • Ensure SSL certificate is installed
  • Code syntax highlighting for C#, SQL, PowerShell etc.

Guide

I stumbled on an amazing step-by-step blog post by [Meshael Rahman](https://meshaelr.com/how-to-build-a-website-using-ghost-cloudflare-and-digital-ocean/). It guides you through setting up Cloudflare for CDN and Ghost with Digital Ocean. I highly recommend it.

CDN Cloudflare

Signed up for a free account with Cloudflare and configured the DNS to point at my Digital Ocean Droplet IP address.
A CDN is something I've wanted to use for some time, as my content is read over a wide area, with Europe and the USA accounting for most of the volume, so giving a good experience on both sides of the ocean was appealing.

Hosting

Some friends and family have been saying good things about Digital Ocean recently. As luck would have it, they provide Ghost as a "Droplet", that is, a pre-configured package installer for their platform.
I signed up for an account.

Followed the guide on getting Ghost installed.
I chose to place my server in New York, as it is in the middle of the geographic footprint for my content, which is very much 50/50 Europe/USA.
I followed the guide on selecting the droplet and installing it, together with the guide on the marketplace, Ghost Droplet in Digital Ocean marketplace. The cost of running a low-specification instance was quite reasonable.

DNS

Configured my existing domain host to point my DNS at Cloudflare's nameservers so they could manage my traffic.

SSL

Meshael also showed how to get a Let's Encrypt certificate installed on the server.

Images

The site images were easy to move over, using FileZilla and an SFTP connection between the old and new sites. I just lifted the image folder and dumped it into the /content/images folder on Ghost.

Exporting posts

BlogEngine has a good XML export for getting the content out. I used this, plus some search and replace on the image names, as recommended in the blog post Migrating BlogEngine.NET to Ghost by Brian Peek. He shows how to export the content and manipulate it using a GitHub project.

  • Find /file.axd?file= and replace with /,
  • Find /image.axd?picture= and replace with /content/images/.
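The two replacements above can also be scripted. The following is a minimal Python sketch, not the tooling used for this migration; the function name and the sample markup are made up for illustration, and it assumes the handler URLs appear in the export exactly as written in the bullets (not URL-encoded).

```python
# Hypothetical helper: rewrite BlogEngine.NET handler URLs in the exported
# XML so they point at plain file paths on the Ghost site.
REPLACEMENTS = [
    ("/file.axd?file=", "/"),
    ("/image.axd?picture=", "/content/images/"),
]


def rewrite_handler_urls(export_text: str) -> str:
    """Apply each find/replace pair in order to the export text."""
    for old, new in REPLACEMENTS:
        export_text = export_text.replace(old, new)
    return export_text


# Made-up sample line from an export, for illustration only.
sample = '<img src="/image.axd?picture=2014/05/chart.png" />'
rewritten = rewrite_handler_urls(sample)
print(rewritten)
```

Reading the export file, passing its text through `rewrite_handler_urls` and writing it back out would cover the whole file in one go.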

I also downloaded the BlogEngine2GhostConverter to convert the format to the JSON used by Ghost for import. However, I found in the Ghost documentation that the version I'm using has a new format for importing content.

The developer guide showed a new field in the JSON, "mobiledoc", which is a container field for all the different block types you can create in a post in Ghost. I had to hack the project to generate this. I also had to hack it to run the regular expression ".*/(.*?)$" over the post URL to extract the "slug", that is, the URL to use. This was important to preserve my URLs from old to new.
I also corrected some of the other fields in the format while I was at it. After doing this I uploaded the JSON to Ghost, which created my posts.
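To make the slug and "mobiledoc" steps concrete, here is a rough Python sketch rather than the converter project itself. It assumes a slug pattern along the lines of `.*/(.*?)$` (everything after the last slash) and wraps the old HTML in a single Mobiledoc HTML card; the post field names and the sample values are assumptions that should be checked against the Ghost import documentation for the installed version.

```python
import json
import re

# Assumed pattern: everything after the last "/" of the old post URL.
SLUG_PATTERN = re.compile(r".*/(.*?)$")


def slug_from_post_url(post_url: str) -> str:
    """Return the last path segment of the old URL, or "" if none."""
    match = SLUG_PATTERN.match(post_url)
    return match.group(1) if match else ""


def html_to_mobiledoc(html: str) -> str:
    """Wrap raw HTML in one Mobiledoc HTML card, serialised as a string."""
    doc = {
        "version": "0.3.1",
        "atoms": [],
        "cards": [["html", {"html": html}]],
        "markups": [],
        # [10, 0] is a card section pointing at cards[0].
        "sections": [[10, 0], [1, "p", []]],
    }
    return json.dumps(doc)


# Illustrative post entry; the values are made up.
post = {
    "title": "My old post",
    "slug": slug_from_post_url("/post/2015/03/09/my-old-post"),
    "mobiledoc": html_to_mobiledoc("<p>Hello from BlogEngine</p>"),
    "status": "published",
}
print(post["slug"])
```

Note that the "mobiledoc" value is itself a JSON string inside the import JSON, which is easy to get wrong when generating it by hand.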

URL rewriting

Under the admin there is the ability to define redirects. These can be static URL redirects or use regular expressions to pattern match. I downloaded the empty redirects.json and updated it with the rules below, which ignore the old date part of the URLs and map them to the root folder.
So this deals with URLs such as
http://mydomain.com/post/*
http://mydomain.com/page/*
by mapping them to the root.
There is also the issue that non-alphanumeric characters are not allowed in slugs (URLs) on Ghost. So old URLs with ( or ) or " in them will not work by default. The URL rewriting strips some of these out to make the old URL map correctly to the new one. I also found it necessary to remove any trailing dashes from the end of the new URL of each post, as these mess up the rewriting too.


[
  {
    "from": ".*post/[\\d]{4}/[\\d]{2}/[\\d]{2}/(.*)$",
    "to": "/$1/",
    "permanent": true
  },
  {
    "from": ".*page/[\\d]{4}/[\\d]{2}/[\\d]{2}/(.*)$",
    "to": "/$1/",
    "permanent": true
  },
  {
    "from": ".*page/(.{3,})$",
    "to": "/$1/",
    "permanent": true
  },
  {
    "from": "\\((.+?)\\)",
    "to": "$1",
    "permanent": true
  },
  {
    "from": "%28(.+?)%29",
    "to": "$1",
    "permanent": true
  }
]

This means anyone using an old URL is automatically redirected to the new Ghost URL.
I then uploaded the file. With the rules above we have to be careful of Ghost's own /page/... URLs, which are used for pagination of the posts (infinite scroll).
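Before uploading, the rules can be sanity-checked offline. The sketch below is a Python approximation that assumes Ghost applies the first matching rule as a regular-expression substitution on the request path; the sample paths are made up, and only the first three rules are covered.

```python
import re

# The first three "from"/"to" pairs from redirects.json, with "$1" written
# as Python's "\1". JSON's doubled backslashes become single ones here.
RULES = [
    (r".*post/[\d]{4}/[\d]{2}/[\d]{2}/(.*)$", r"/\1/"),
    (r".*page/[\d]{4}/[\d]{2}/[\d]{2}/(.*)$", r"/\1/"),
    (r".*page/(.{3,})$", r"/\1/"),
]


def redirect_target(path):
    """Return the redirect target for path, or None if no rule matches."""
    for pattern, target in RULES:
        match = re.search(pattern, path)
        if match:
            return match.expand(target)
    return None


for sample in ["/post/2015/03/09/my-old-post", "/page/about", "/page/2/"]:
    print(sample, "->", redirect_target(sample))
```

Under these assumptions `/page/2/` falls through untouched, but a longer path such as `/page/12/` is caught by the third rule, which is the kind of clash with pagination worth checking for.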

Google analytics

The Google Analytics tag needed upgrading anyhow, so I went in, generated the new script and pasted it into the Site Header Code injection.
This was the easiest task.

Syntax Highlighting

There was a guide to using Prism code syntax highlighting with Ghost, How to set up and configure Prism.js with Ghost CMS.
It involves putting more into the site Code injection to get it running.

There was a forum post that led me to the document on how to add social media links at the top of the site, How to add social media icons to your site. Essentially this is done by styling up menu options added via the Settings >> Navigation part of the site.

Once this was done I had Mastodon and LinkedIn links on the site.

Wait to brew

Then it was a case of letting DNS propagate around the web.

Tidy up

The about us page needed creating and moving from BlogEngine, and the author details needed filling in and moving too.
The tags become more important, so an overhaul of those is required to tag the content up appropriately.

Comments Disqus

My blog has been using Disqus as its comment provider. I followed the guide on getting Disqus working by editing the theme file and replacing the handlebars helper {{comments}} with custom code to inject Disqus. Site settings >> Integrations >> Disqus has a guide to doing this.
The Disqus site also needs to be fed a URL mapping file that maps the old URLs to the new ones, so that post comments migrate from old to new.
Disqus + Ghost
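The URL mapping file is a plain CSV with one "old URL, new URL" pair per line. A small Python sketch of generating it follows; it assumes the old post URLs follow the /post/yyyy/mm/dd/slug shape handled by the redirects, and the function name and example URL are illustrative only.

```python
import csv
import io
import re

# Assumed shape of an old BlogEngine post URL: host, date folders, slug.
OLD_POST = re.compile(r"^(https?://[^/]+)/post/\d{4}/\d{2}/\d{2}/(.*)$")


def disqus_url_map(old_urls):
    """Build CSV text with one 'old,new' row per recognised old URL."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for old in old_urls:
        match = OLD_POST.match(old)
        if match:
            writer.writerow([old, f"{match.group(1)}/{match.group(2)}/"])
    return buffer.getvalue()


mapping = disqus_url_map(["http://mydomain.com/post/2015/03/09/my-old-post"])
print(mapping)
```

URLs that do not match the pattern are skipped rather than guessed at, so they can be mapped by hand afterwards.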

Email support

My previous host threw in an email forwarder that simply sent all mail for my domain name to my Gmail account. This is all I really need. There are a number of free services that let you do this on Digital Ocean. However, one of the advantages of self-hosting is having a machine in the cloud.
How To Install and Configure Postfix as a Send-Only SMTP Server on Ubuntu 20.04
Doing this runs a small daemon that forwards mail to my Gmail, without another external service to set up and keep track of.
The guide Forwarding Email to Gmail with Digital Ocean and Ubuntu was the simplest to follow and works, as far as I can tell.
Once it is set up, point the MX records in the DNS at the same server that hosts the Ghost site.

API

I also set up the Admin API in Ghost, which lets me maintain the whole site via a C# NuGet package. This was an awesome bonus, and gives me the opportunity to do dead link checking, dead image finding etc. I used it to clear down all the posts before doing another import, as the built-in reset functionality was not reliable during testing.
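The work described here was done through a C# NuGet package, so the following Python sketch only illustrates the token step that such clients perform under the hood. It assumes an Admin API key of the form `id:secret` (hex secret), a five-minute token lifetime and an audience of `/admin/`, which matches recent Ghost versions as far as I know; older versions use a versioned audience. The key in the example is made up.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def ghost_admin_token(admin_api_key: str, now=None) -> str:
    """Create a short-lived HS256 JWT from an 'id:secret' Admin API key."""
    key_id, secret = admin_api_key.split(":")
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT", "kid": key_id}
    # Assumption: "/admin/" audience and a 5 minute expiry.
    payload = {"iat": issued_at, "exp": issued_at + 300, "aud": "/admin/"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        bytes.fromhex(secret), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


# Made-up key, for illustration only.
token = ghost_admin_token("abc123:00112233445566778899aabbccddeeff", now=1700000000)
print(token.count("."))
```

The resulting token is sent in an `Authorization: Ghost <token>` header on each Admin API request.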

Embedded images in posts

There was an issue with some old posts that had embedded images in them; some old tooling had embedded them Base64-encoded in the HTML itself. As there were a lot of large images in some of the posts, this caused the Ghost droplet server's memory use to balloon. This rapid expansion was triggered by listing the available posts in the admin page or loading posts in the blog itself. The limited memory, 1 GB, was exhausted quickly, causing the site to freeze up.
Using the web browser to save each of these images to a file manually, and then referencing the image files in the normal way, allowed the site to function again.
An example of embedding an image directly using Base64 encoding looks like this:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
  //8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
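The images here were saved by hand from the browser. For a larger number of posts the same extraction could be scripted; the Python sketch below is one possible approach, not what was actually run. The file naming scheme and the `/content/images/` target path are assumptions, and writing the returned bytes to disk is left to the caller.

```python
import base64
import re

# Matches src attributes holding Base64 data URIs for common image types.
DATA_URI = re.compile(r'src="data:image/(png|jpe?g|gif);base64,([^"]+)"')


def extract_embedded_images(html, slug):
    """Replace data-URI images with file references.

    Returns the rewritten HTML and a dict of filename -> image bytes.
    """
    images = {}

    def replace(match):
        extension = match.group(1)
        filename = f"{slug}-{len(images) + 1}.{extension}"
        # Drop line breaks and spaces inside the attribute before decoding.
        payload = re.sub(r"\s+", "", match.group(2))
        images[filename] = base64.b64decode(payload)
        return f'src="/content/images/{filename}"'

    return DATA_URI.sub(replace, html), images


sample = (
    '<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4\n'
    '  //8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />'
)
new_html, images = extract_embedded_images(sample, "red-dot")
print(new_html)
```

Each entry in `images` can then be written into the images folder, so the post body only carries a short file reference.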

Conversion to Markdown format

The blog has been written using many tools over the years, so the HTML of the posts is bloated with excessive and non-compliant markup. This makes some of the blog posts much heavier than they need to be, which could add to load time and data transferred by the site, and might cause rendering problems in browsers that don't handle malformed markup well.
I chose Ghost as it supports Markdown as an authoring format. Markdown is a simple format used for editing posts on many online platforms. It provides a very basic set of formatting, which means you end up focused on the content of the post rather than its formatting. As a consequence, the content is very light when rendered to the browser too. This keeps my bandwidth needs down, gives me good first page render times, which helps page ranking on Google, and makes it easier to have a good responsive design for mobile and tablets: all positives.
The problem is that all this old content was still stored as HTML and not Markdown. Luckily there is a NuGet package, "ReverseMarkdown", that converts HTML to Markdown on a best-efforts basis. This package converts image tags, links, tables etc. Combining this with the API allowed me to programmatically go through each post, converting the HTML documents into Markdown. Once written, the utility software did this in less than a minute.
For each post, after conversion, the new Markdown was saved as a new section after the HTML section. The posts are held in Mobiledoc format, which is essentially a container with a number of sections, each holding content of a given type.
This then allowed me to manually open each post in the browser, proofread it, visually scan all the posts and correct places where the conversion did not work too well.
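As an illustration of saving the Markdown as a new section after the HTML one, here is a short Python sketch (the real utility was written in C#). It assumes the post's Mobiledoc already contains an HTML card and appends a Markdown card plus a card section that points at it; the function name and sample content are made up.

```python
import json


def append_markdown_card(mobiledoc_json: str, markdown: str) -> str:
    """Append a Markdown card after the existing sections of a Mobiledoc."""
    doc = json.loads(mobiledoc_json)
    doc["cards"].append(["markdown", {"markdown": markdown}])
    card_index = len(doc["cards"]) - 1
    # Section type 10 is a card section; the second value indexes "cards".
    doc["sections"].append([10, card_index])
    return json.dumps(doc)


# Made-up post body holding a single HTML card.
original = json.dumps({
    "version": "0.3.1",
    "atoms": [],
    "cards": [["html", {"html": "<p><b>Hello</b></p>"}]],
    "markups": [],
    "sections": [[10, 0]],
})
updated = json.loads(append_markdown_card(original, "**Hello**"))
print(updated["sections"])
```

Keeping the HTML card in place until the Markdown has been proofread means both versions sit side by side in the editor, and the old card can be deleted once the new one looks right.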
The areas of most trouble were:

  • Tables
  • Iframes - mostly YouTube videos embedded
  • Code snippets

The package did a good job on tables, and mostly it was just a case of removing extra line breaks to bring the tables to life.
Iframes were trickier. The package can raise an error on unsupported HTML tags; turning this on allowed iframes to be caught and converted into native YouTube sections in the Mobiledoc format.
Code snippets were the worst to deal with. Mostly they needed the Prism code syntax tags adding around the snippet to identify what type of code it was for syntax highlighting. However, the formatting of the code was destroyed once all the table tags were removed from the post HTML, as the code syntax tools from the old blog had used tables to format the code. The snippets had to be copied and pasted from the rendered HTML of the original post into the Markdown. This took quite a number of hours, but was nothing compared to how long it would have taken to do the conversion by hand or using an online converter.
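For the iframe case, a first step is finding which posts contain YouTube embeds and which videos they point at. The Python sketch below does only that; the regular expression is an assumption about how the old embeds were written, and building the actual Mobiledoc embed section is deliberately left out.

```python
import re

# Assumed shape of the old embeds: an iframe whose src points at
# youtube.com/embed/<video id>, with optional query string.
YOUTUBE_IFRAME = re.compile(
    r'<iframe[^>]*\bsrc="(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com'
    r'/embed/([\w-]+)[^"]*"[^>]*>\s*</iframe>',
    re.IGNORECASE,
)


def youtube_watch_urls(html):
    """Return a watch URL for every YouTube iframe found in the HTML."""
    return [
        f"https://www.youtube.com/watch?v={video_id}"
        for video_id in YOUTUBE_IFRAME.findall(html)
    ]


# Made-up sample markup with a made-up video id.
sample = (
    '<p>Demo</p><iframe width="560" '
    'src="https://www.youtube.com/embed/abc123XYZ_-?rel=0" '
    'frameborder="0"></iframe>'
)
urls = youtube_watch_urls(sample)
print(urls)
```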

Images folder

The API was used to go through each post, extracting all the images into a list. This list was then compared against the contents of the image folder, and any images no longer used on the site were removed. A very large number of images had built up in the images folder over time, and many were no longer used, or had never been used, in posts.
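The comparison itself is a set difference. A minimal Python sketch follows, assuming the post bodies have already been fetched through the API and the folder listing is a list of paths relative to /content/images/; the names and sample data are illustrative.

```python
import re

# Captures the path after /content/images/ up to a quote, bracket or space.
IMAGE_REF = re.compile(r"/content/images/([^\s\"')]+)")


def unused_images(post_bodies, files_on_disk):
    """Return files in the images folder that no post refers to."""
    referenced = set()
    for body in post_bodies:
        referenced.update(IMAGE_REF.findall(body))
    return sorted(set(files_on_disk) - referenced)


# Made-up sample data: one HTML post, one Markdown post, three files.
posts = [
    '<img src="/content/images/2014/05/chart.png" />',
    "![dot](/content/images/red-dot.png)",
]
files = ["2014/05/chart.png", "red-dot.png", "old/unused.jpg"]
orphans = unused_images(posts, files)
print(orphans)
```

Reviewing the resulting list before deleting anything is a sensible safeguard, since images referenced from themes or pages would not show up in the post bodies.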