Self-hosted tools for web development

Make your life easier as a developer by self-hosting some services. I will teach you how to use Thumbor, Huginn, OpenFaaS, and more.


Mon May 18 2020 - 13 min read

Having some self-hosted services and tools can make your life as a developer, and your life in general, much easier. I will share some of my favorites in this post. I use them in just about every project I build.

All of this, except OpenFaaS, runs on a single VPS with 2 CPU cores, 8GB of RAM and an 80GB SSD, with plenty of capacity to spare.

If you need a cheap and solid server, feel free to use my referral code to get €20 in credit at Hetzner. Read my review for more information. I use the CX31 model for all of this, but you can probably get away with something cheaper.

Huginn

Huginn is an application for building automated agents. You can think of it as a self-hosted version of Zapier. To understand Huginn you need to understand two concepts: Agents and Events. An Agent performs a task: some Agents scrape a website, while others post a message to Slack. Agents emit Events, and Agents can also receive Events from other Agents.

As an example, you can have a Huginn agent check the local weather and pass it along as an event to a second agent that checks whether it is going to rain. If it is, the rain checker passes the event along; otherwise the event is discarded. A third agent receives the event from the second agent and sends a text message to your phone telling you that it is going to rain.
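To make the Agent/Event flow concrete, here is a toy model of the rain example in Python. It is not Huginn's API (Huginn is a Rails app and agents are configured in its web UI); the agent functions, the hard-coded city and the `rain_probability` field are hypothetical. Each agent takes one event and emits zero or more events, which are fed to the next agent.

```python
from typing import Any, Callable, Dict, List

# An event is a dict of data, like the JSON payloads Huginn passes between agents.
Event = Dict[str, Any]
# In this toy model an agent receives one event and emits zero or more events.
Agent = Callable[[Event], List[Event]]


def weather_agent(_trigger: Event) -> List[Event]:
    # Hypothetical: a real agent would call a weather API; this is hard-coded sample data.
    return [{"city": "Stockholm", "rain_probability": 0.8}]


def rain_checker(event: Event) -> List[Event]:
    # Pass the event along only if rain is likely; otherwise discard it.
    if float(event["rain_probability"]) >= 0.5:
        return [event]
    return []


def sms_agent(event: Event) -> List[Event]:
    # Hypothetical: a real agent would send a text message instead of building a string.
    return [{"message": f"It is going to rain in {event['city']}!"}]


def run_chain(agents: List[Agent], start: Event) -> List[Event]:
    """Feed every event emitted by one agent into the next agent in the chain."""
    events = [start]
    for agent in agents:
        events = [out for ev in events for out in agent(ev)]
    return events


print(run_chain([weather_agent, rain_checker, sms_agent], {}))
# [{'message': 'It is going to rain in Stockholm!'}]
```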

This barely scratches the surface of what Huginn can do, though. It has agents for almost everything: sending email, posting to Slack, IoT support with MQTT, website APIs, scrapers, and much more. There are agents that receive input from custom webhooks, cron-like agents that schedule other agents, and so on.

The Huginn interface

Huginn is a Ruby on Rails application and can be hosted in Docker. I host mine on Dokku. I use it for so many things, and it is truly the base of all my automation needs. Highly recommended! If you are looking for alternatives, take a look at Node-RED and Beehive. I don't have personal experience with either, though.

Huginn uses about 350MB of RAM on my server, including the database and the background workers.

Thumbor

Thumbor is a self-hosted image proxy, similar to Imgix. It can do all sorts of things to an image based on nothing but its URL. Some examples:

Simple caching proxy

Take the image URL and put your Thumbor URL in front of it, like so:
https://thumbs.mskog.com/https://images.pexels.com/photos/4048182/pexels-photo-4048182.jpeg

Simple enough. Now you have a copy of the image served from your own proxy. This is handy, for example, when you link to an image and don't want to hammer the origin server with requests.

Resizing

That image is much too large. Let's make it smaller!
https://thumbs.mskog.com/800x600/https://images.pexels.com/photos/4048182/pexels-photo-4048182.jpeg

Much smaller. Note that all we had to do was add the desired dimensions.

Resizing to specific height or width

What about a specific width while keeping the aspect ratio? No problem! Just leave out the height (or leave out the width to set a specific height instead):
https://thumbs.mskog.com/300x/https://images.pexels.com/photos/4048182/pexels-photo-4048182.jpeg

Quality

Smaller file size? Lower the quality with a filter:
https://thumbs.mskog.com/1920x/filters:quality(10)/https://images.pexels.com/photos/4048182/pexels-photo-4048182.jpeg

You get the idea! Thumbor also has a bunch of other filters, such as turning the image black and white, changing the format, and so on. It is very versatile and useful in more scenarios than I can count. I use it for all the images in all my applications. Thumbor also has client libraries for many languages, such as Node.
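If you generate these URLs from code, a small helper keeps the format consistent. This is a minimal sketch that assembles URLs in the same shape as the examples above: host, optional size, optional filters, then the original image URL. It assumes your Thumbor setup accepts URLs in exactly this form, as mine does; a stock Thumbor install expects either an `unsafe/` prefix or an HMAC signature in the path, which client libraries such as libthumbor can generate for you. `THUMBOR_HOST` is a placeholder.

```python
from typing import Optional, Sequence

# Placeholder: replace with the domain of your own Thumbor server.
THUMBOR_HOST = "https://thumbs.example.com"


def thumbor_url(
    image_url: str,
    width: Optional[int] = None,
    height: Optional[int] = None,
    filters: Sequence[str] = (),
    host: str = THUMBOR_HOST,
) -> str:
    """Build host/[WxH/][filters:a():b()/]image_url, mirroring the example URLs."""
    parts = [host.rstrip("/")]
    if width or height:
        # An empty side (e.g. "300x") lets Thumbor keep the aspect ratio.
        parts.append(f"{width or ''}x{height or ''}")
    if filters:
        # Multiple filters are chained with colons after the "filters:" prefix.
        parts.append("filters:" + ":".join(filters))
    parts.append(image_url)
    return "/".join(parts)


photo = "https://images.pexels.com/photos/4048182/pexels-photo-4048182.jpeg"
print(thumbor_url(photo, width=800, height=600))
print(thumbor_url(photo, width=1920, filters=["quality(10)", "grayscale()"]))
```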

Thumbor is a Python application and is most easily hosted using Docker. There are a number of great projects on GitHub with Docker Compose setups for Thumbor. I use this one. It comes with a built-in Nginx proxy for caching: all images are served through an Nginx cache, both on disk and in memory by default. This means that only the first request for an image hits Thumbor itself. Every request after that is served from the Nginx cache and is therefore very fast.

To make it even faster, you can put a CDN in front of your Thumbor server. If your site is on Cloudflare, you can use theirs for free. Just keep in mind that Cloudflare will not be happy if you use their CDN mainly to cache a very large number of big images. You can of course use any other CDN, such as CloudFront. My entire Thumbor stack uses about 200MB of RAM.

In conclusion, Thumbor is a vital part of my self-hosted stack, and I use it every time I need to show images on a website or in an app. Once you have it working properly, you never have to worry about image formatting again, since Thumbor is always there.

Hosted alternatives to Thumbor: Imgix, Cloudinary
Self-hosted alternatives: Imaginary, Imageflow

Searx

Searx is a self-hosted metasearch engine. It strips identifying headers and similar information from your searches and then runs your query on one or more search engines, for example Google, Bing and DuckDuckGo. What makes Searx great as a self-hosted service is its simple JSON API: add `format=json` to a search request and the results come back as JSON. This enables some pretty neat combinations, but more on that later. It can also search for images, music, news and more.

Searx in action

This is another killer service. The JSON output is what really sells it for me, since it can be combined with other services in lots of different ways.
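Here is a minimal Python sketch of using that JSON API, assuming an instance at the placeholder address `SEARX_URL`. It sends the `q`, `format` and `categories` query parameters and reads the `url` field of each entry in `results`. Newer SearXNG releases only answer with JSON if `json` is listed under `search.formats` in `settings.yml`, so check that first if you get an error.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

SEARX_URL = "https://searx.example.com"  # placeholder for your own instance


def search_url(query: str, categories: str = "general", base: str = SEARX_URL) -> str:
    """Build a Searx search URL that asks for JSON instead of HTML."""
    params = {"q": query, "format": "json", "categories": categories}
    return f"{base.rstrip('/')}/search?{urlencode(params)}"


def result_urls(payload: dict) -> List[str]:
    """Pull the result links out of a decoded Searx JSON response."""
    return [r["url"] for r in payload.get("results", []) if "url" in r]


def search(query: str) -> List[str]:
    # Network call: only works against a reachable instance with JSON output enabled.
    with urlopen(search_url(query), timeout=10) as resp:
        return result_urls(json.load(resp))


# Usage: search('"fried green tomatoes" site:imdb.com')
```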

Searx is another Python app and is easily hosted using the official Docker image. It uses about 230MB of RAM on my server.

InfluxDB + Telegraf + Grafana

InfluxDB is a time series database. It is built to receive time-based event data from sensors, servers and so on; for example, it is very good at storing CPU load data every 5 seconds. It also has built-in features, such as retention policies, that make sure all this data doesn't fill up your disk. There are client libraries for most languages, as well as a very simple HTTP API for adding data. It goes very well together with Huginn: you can create agents that poll data from somewhere and then use a Post Agent to store it through the InfluxDB HTTP API. There will be examples of this later on!
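Over the HTTP API, each data point is sent as a line of InfluxDB's line protocol: a measurement name, optional tags, one or more fields, and an optional timestamp in nanoseconds. This sketch formats such lines and builds the write request, assuming the InfluxDB 1.x `/write?db=...` endpoint without authentication (InfluxDB 2.x uses `/api/v2/write` with a token instead). The measurement, tag and database names in the usage are made up.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode
from urllib.request import Request

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    # Line protocol escapes special characters in names with a backslash.
    for c in chars:
        text = text.replace(c, "\\" + c)
    return text


def _field(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # the "i" suffix marks an integer field
    if isinstance(value, float):
        return repr(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # string fields are double-quoted


def line_protocol(measurement: str, tags: Dict[str, str],
                  fields: Dict[str, FieldValue], ts_ns: Optional[int] = None) -> str:
    head = _escape(measurement, ", ")
    for key, value in sorted(tags.items()):  # InfluxDB recommends tags sorted by key
        head += f",{_escape(key, ',= ')}={_escape(value, ',= ')}"
    body = ",".join(f"{_escape(k, ',= ')}={_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    return line if ts_ns is None else f"{line} {ts_ns}"


def write_request(lines: str, db: str, base: str = "http://localhost:8086") -> Request:
    """POST to the InfluxDB 1.x write endpoint; timestamps default to nanoseconds."""
    url = f"{base.rstrip('/')}/write?{urlencode({'db': db})}"
    return Request(url, data=lines.encode("utf-8"), method="POST")


line = line_protocol("cpu", {"host": "vps1"}, {"load": 0.42}, 1589800000000000000)
print(line)  # cpu,host=vps1 load=0.42 1589800000000000000
# urllib.request.urlopen(write_request(line, "telegraf")) sends it; InfluxDB answers 204.
```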

Telegraf is a service that collects data about your server and sends it to InfluxDB. It can also send data to other databases, but here we use InfluxDB. It can collect just about any server metric you want, including statistics from Docker containers. It has a simple out-of-the-box configuration that you can tweak if you wish. I install it on all my machines, including my NAS at home, to send data to InfluxDB.

Grafana is a graphing, analytics, and monitoring tool. It can graph data from many different sources, including InfluxDB, AWS CloudWatch and PostgreSQL. It can also send alerts, for example to Slack. It is a delight to use, and you will quickly be able to create some very nice-looking graphs of your data. Be careful, though: I find graphing all your things very addictive.

A Grafana dashboard for server monitoring

There are many Docker setups for this stack on GitHub, so hosting it is easy.

Grafana is truly delightful to work with. It is probably the slickest graphing tool I've ever used, and it compares well with commercial products like Datadog.

Docker Registry

This is a simple one, but oh so useful. It is good to have a place to store your own Docker images, and a registry is exactly that. You can use Docker Hub, but private images cost about $7 a month there. It is, however, very easy to host your own registry. A Docker registry is also a requirement for using OpenFaaS.
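Once the registry is running, it speaks the Docker Registry HTTP API V2, which makes it easy to check what you have pushed. This sketch lists every repository and its tags using the `/v2/_catalog` and `/v2/<name>/tags/list` endpoints. The registry address is a placeholder, basic auth is optional, and catalog pagination is ignored for brevity.

```python
import base64
import json
from typing import Callable, Dict, List, Optional
from urllib.request import Request, urlopen

REGISTRY = "https://registry.example.com"  # placeholder for your registry's address

Fetch = Callable[[str], dict]


def http_fetcher(auth: Optional[str] = None) -> Fetch:
    """Return a function that GETs a URL and decodes the JSON response."""
    def fetch(url: str) -> dict:
        req = Request(url)
        if auth:  # "user:password" when the registry sits behind basic auth
            token = base64.b64encode(auth.encode("utf-8")).decode("ascii")
            req.add_header("Authorization", f"Basic {token}")
        with urlopen(req, timeout=10) as resp:
            return json.load(resp)
    return fetch


def list_images(fetch: Fetch, base: str = REGISTRY) -> Dict[str, List[str]]:
    """Map each repository to its tags (catalog pagination is not handled here)."""
    base = base.rstrip("/")
    catalog = fetch(f"{base}/v2/_catalog")
    return {
        # A repository whose tags were all deleted reports "tags": null.
        repo: fetch(f"{base}/v2/{repo}/tags/list").get("tags") or []
        for repo in catalog.get("repositories", [])
    }


# Usage: print(list_images(http_fetcher("user:password")))
```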

Ghost CMS

Ghost is an open source publishing and blogging platform. It started out as something of a replacement for WordPress but has since grown into something more. I use it as a headless CMS for this blog as well as other websites. It has a great REST API that you can use to pull your articles and pages out for use in a static site or on another website. I have another article about how my blog works with this if you want to know more.

The Ghost editor while typing this

Ghost has a great editor that makes it very easy to embed tweets, images, Spotify links, and so on. It is hosted on your server, so you can write from anywhere. You don't have to deal with Markdown files if you don't want to, and I find it a delight to write in.
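To pull posts out of Ghost, you call its Content API with a Content API key, which you get by creating a custom integration in the Ghost admin. This is a minimal sketch that fetches post titles; the domain and key are placeholders, and the path assumes a recent Ghost version (Ghost 3.x, current when this post was written, used `/ghost/api/v3/content/` instead).

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

GHOST_URL = "https://ghost.example.com"  # placeholder for your Ghost install
CONTENT_KEY = "your-content-api-key"     # placeholder; create it under Integrations


def posts_url(base: str, key: str, limit: int = 15) -> str:
    """Build a Content API URL that returns a few fields for the latest posts."""
    query = urlencode({"key": key, "limit": limit, "fields": "title,slug,published_at"})
    return f"{base.rstrip('/')}/ghost/api/content/posts/?{query}"


def post_titles(payload: Dict) -> List[str]:
    """Extract post titles from a decoded Content API response ({"posts": [...]})."""
    return [post["title"] for post in payload.get("posts", [])]


def fetch_titles(base: str = GHOST_URL, key: str = CONTENT_KEY) -> List[str]:
    # Network call against your Ghost instance.
    with urlopen(posts_url(base, key), timeout=10) as resp:
        return post_titles(json.load(resp))
```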

OpenFaaS

OpenFaaS is a self-hosted functions-as-a-service (aka serverless) platform. I have another article about OpenFaaS, so I won't go into too much detail here. You can use OpenFaaS to easily deploy functions in any programming language without having to set up a microservice. Also, I understand the irony of self-hosting a serverless setup, but it is a strange world we live in, so just go with it.

It is very useful for a number of tools and combinations. I have quite a few of these functions; here are some examples:

Readability

A Python function that uses the newspaper3k library to pull out the metadata and article content from any URL. I use it to render snippets from articles, prepare text for sentiment analysis, and things like that.
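Here is a sketch of what such a function could look like with the classic OpenFaaS `python3` template, where `handle(req)` receives the request body as a string and its return value becomes the response. The JSON input format with a `url` key is an assumption for this example; the extraction uses newspaper3k's `Article` class (`download()`, then `parse()`).

```python
import json


def handle(req: str) -> str:
    """Take {"url": "..."} as JSON and return the article's metadata and text as JSON."""
    try:
        url = json.loads(req)["url"]
    except (ValueError, KeyError, TypeError):
        return json.dumps({"error": 'expected a JSON body like {"url": "https://..."}'})

    # Imported here so bad input is rejected even where newspaper3k isn't installed.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return json.dumps({
        "title": article.title,
        "authors": article.authors,
        "publish_date": article.publish_date.isoformat() if article.publish_date else None,
        "top_image": article.top_image,
        "text": article.text,
    })
```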

Puppeteer renderer

Sometimes websites don't work at all without JavaScript, or they have systems in place to prevent scraping and automated interaction. Rotten Tomatoes is one such site; it fights back against any automation attempts. Enter Puppeteer, the headless Chrome API. This function takes a URL, renders the page with JavaScript and returns the resulting body, which is then ready for processing by a scraper, for example in Huginn. There are examples of how to use this with Huginn later, so stick around if you're interested.
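Any client, including a Huginn agent, calls an OpenFaaS function through the gateway at `/function/<name>`. Here is a Python sketch of calling the renderer that way; the function name `render-page` and the gateway address are placeholders, and it assumes the function reads the target URL from the raw request body, as described above.

```python
from urllib.request import Request, urlopen

GATEWAY = "http://127.0.0.1:8080"  # placeholder; the usual local OpenFaaS gateway address


def render_request(page_url: str, function: str = "render-page", gateway: str = GATEWAY) -> Request:
    """Build a POST to the OpenFaaS gateway with the page URL as the request body."""
    return Request(
        f"{gateway.rstrip('/')}/function/{function}",
        data=page_url.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def render(page_url: str) -> str:
    # Rendering a JavaScript-heavy page can take a while, hence the generous timeout.
    with urlopen(render_request(page_url), timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")
```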

OpenFaaS is hosted on its own server because that made sense to me. It is a tiny little thing, though, and doesn't use many resources at all.

Combinations

This is where we unlock the real magic of having all these things. You can combine these in clever ways to create something really neat. Here are some examples to get you started:

OpenFaaS+Searx = First image

This is a combination that I really like. Create a function in OpenFaaS, in any language you like (I used JavaScript), that searches for the given query in Searx, making sure to request the results as JSON. Then parse the results in the OpenFaaS function and return the URL of the first image result.

You now have a function that you can call with any query and it returns the first image result. This is useful in a number of ways. For example, you can search for `bryan cranston site:wikipedia.org` to get a good image of the actor Bryan Cranston. Then you can use some cool Thumbor filters to process the image if you want!
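I wrote mine in JavaScript, but here is the same idea as a Python sketch for the OpenFaaS `python3` template. It assumes the request body is the search query, uses Searx's `images` category, and returns the `img_src` of the first result that has one (an empty string if none do). `SEARX_URL` is a placeholder.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

SEARX_URL = "https://searx.example.com"  # placeholder for your Searx instance


def first_image(payload: dict) -> Optional[str]:
    """Return the img_src of the first image result in a Searx JSON response."""
    for result in payload.get("results", []):
        if result.get("img_src"):
            return result["img_src"]
    return None


def handle(req: str) -> str:
    # OpenFaaS-style handler: the request body is the search query.
    query = urlencode({"q": req.strip(), "format": "json", "categories": "images"})
    with urlopen(f"{SEARX_URL}/search?{query}", timeout=10) as resp:
        return first_image(json.load(resp)) or ""
```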

Suggestions for improvements: add more functionality to the OpenFaaS function. For example, add probe-image-size to your function so that you can reject images that are too small.

OpenFaaS+Huginn+Trello = Movie recommendations

This is a simple one that adds movie recommendations to my Trello inbox daily.
Steps to create:
1. Add a Website Agent to Huginn. Use the URL for the Rotten Tomatoes front page, passed through the OpenFaaS Puppeteer function so that the page is rendered with JavaScript enabled. Scrape the section with new movies.
2. (Optional) Create a Trigger Agent in Huginn to select movies with a minimum score. Perhaps you only want movies with a score of 80 or better in your inbox.
3. Add a Post Agent to Huginn that will post the movie names to your Trello inbox using the Trello API.
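Steps 2 and 3 boil down to a filter and an API call. Here they are as plain Python rather than Huginn agent options, assuming the scraped events carry hypothetical `title` and `score` fields. The card is created with a POST to Trello's `/1/cards` endpoint using the `key`, `token`, `idList` and `name` parameters; the credentials and list id are placeholders.

```python
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request

TRELLO_KEY = "your-api-key"      # placeholder
TRELLO_TOKEN = "your-api-token"  # placeholder
INBOX_LIST_ID = "your-list-id"   # placeholder: id of the Trello list used as an inbox


def good_movies(movies: List[Dict], min_score: int = 80) -> List[Dict]:
    """Step 2: keep movies whose score is known and at least min_score."""
    return [m for m in movies if m.get("score") is not None and m["score"] >= min_score]


def trello_card_request(movie: Dict) -> Request:
    """Step 3: build a POST that creates a Trello card named after the movie."""
    params = urlencode({
        "key": TRELLO_KEY,
        "token": TRELLO_TOKEN,
        "idList": INBOX_LIST_ID,
        "name": f"{movie['title']} ({movie['score']}%)",
    })
    return Request(f"https://api.trello.com/1/cards?{params}", method="POST")


# Usage: for movie in good_movies(scraped): urllib.request.urlopen(trello_card_request(movie))
```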

Suggestions for improvements: add another step that also links to the IMDb page for the movie. You can use Searx for this. Simply search for the movie like so: `"fried green tomatoes" site:imdb.com`

RescueTime+Huginn+InfluxDB+Grafana = Productivity graph

RescueTime is automatic time-tracking software. It keeps track of what you do on your computer and tells you when you are being productive and when you are slacking off on Reddit. You can use a Website Agent in Huginn to access your productivity data on RescueTime, then use a Post Agent to add this data to InfluxDB. Finally, you can graph it using Grafana. I use something similar to get data about our hot water bill and such. Once you have Huginn and InfluxDB, you can graph almost anything.

Huginn+Slack = Notifications center

If you're like me and have a lot of notifications, you might want to use Huginn to sort them out. Instead of integrating directly with Slack or whatever notification system you use, you can use a Webhook Agent in Huginn to create an API endpoint and post your notifications to it. Then use, for example, a Slack Agent to post the notifications to Slack.

What is the point of this, then? Well, you can easily switch from Slack to something else for your notifications without changing every site that creates them. Perhaps you want to delay some notifications? You can do that with a Delay Agent in Huginn. Perhaps some notifications should go to Trello instead of Slack? No problem with Huginn. You can even use a Digest Agent to group low-priority notifications and send them all at once by email or similar. Don't forget that you can also graph all of this using Grafana.
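From an application's point of view, sending a notification is then just a POST to the Webhook Agent's URL, which Huginn shows on the agent's page in the form `/users/<user_id>/web_requests/<agent_id>/<secret>`. This sketch sends a JSON payload; the URL is a placeholder, and the `level` field is a hypothetical convention you could match in a Trigger Agent to route low-priority messages to a Digest Agent.

```python
import json
from urllib.request import Request, urlopen

# Placeholder: copy the real URL from your Webhook Agent's page in Huginn.
HUGINN_WEBHOOK = "https://huginn.example.com/users/1/web_requests/42/my-secret"


def notification_request(title: str, body: str, level: str = "info",
                         url: str = HUGINN_WEBHOOK) -> Request:
    """Build a JSON POST for a Huginn Webhook Agent; all routing happens inside Huginn."""
    payload = {"title": title, "body": body, "level": level}
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(title: str, body: str, level: str = "info") -> None:
    # The sending app never needs to know whether this ends up in Slack, Trello or email.
    with urlopen(notification_request(title, body, level), timeout=10):
        pass
```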

Conclusion

This is by no means an exhaustive list of things you can self host to make your life easier as a developer. Do you have any favorites that I've missed? Please reply in the comments below or hit me up on Twitter!
