ø Personal Space ø


This /c/ hosts my personal update channel: anything from server news to personal updates.

Hosted with love by kernelle

submitted 3 months ago* (last edited 3 months ago) by kernelle@0d.gs to c/self@0d.gs
 
 
Lemmydocs 7:4 – Thou shall create a blog

Features

  • Linked to a user via Lemmy's API, no authentication required
  • Host content on any instance
  • Category filters: set one or more communities as the categories
  • Easy to adapt to your profile
  • One page constraint
  • Anchor navigation and permalinks
  • Responsive
  • Dark / Light mode
  • No cookies or tracking
  • Interactive “about me”
  • No backend: serving a single lightweight page that can be hosted anywhere, including GitHub
  • HTML, CSS and ES6 JavaScript. That's it.

TODO

  • Possible compatibility issues with older iOS devices. Let me know if you encounter an issue! I'll be cleaning up the code in the meantime.
  • The only class not written by me is the Markdown-to-HTML translation layer, for which I'm using snarkdown. It does the conversion using regex queries. So as not to completely reinvent the wheel I've forked it for this purpose, but I'd like to write one myself.

GitHub | ./Martijn.sh > Blog

 
 

I'm running this Lemmy instance because I believe in the importance of having a decentralised and open social media platform. Being able to spin up your own server and contribute to a space where people can come together to discuss and share ideas, without being bound to one platform or userbase, has proven to be the most robust and progressive way of creating a digital agora.

What is Lemmy?

Lemmy is a federated alternative to Reddit, which means that it is made up of a network of independent servers that communicate with each other. This makes Lemmy more resistant to censorship, improves transparency and gives users more control over their data. Here you can find a list of all active and federated instances sorted by popularity.

The Web was designed from the ground up to be decentralised and open to all. These values are what made the internet, and they are being spat on by companies trying to seize control over its content and users.

Lemmy is the internet's response to an ever increasing centralisation of social media. Where every mainstream social media platform is owned by a single entity, Lemmy has no owner and consists of hundreds of intertwined instances.

This concept also connects Lemmy with many other forms of social media which are collectively referred to as the Fediverse.

Federated?

~ Anyone can host a server
~ Servers host instances
~ Instances host Communities
~ Communities host wonderful people
~ Instances can communicate freely between each other
This is federation

What makes Lemmy and any federated platform so interesting is the ActivityPub protocol. It allows Lemmy, a content-aggregator type of social media, to communicate (or federate) with other types of social media, such as Mastodon (a Twitter-style microblog) and PeerTube (video hosting). This means any instance in the Fediverse can independently read the others' content, without users needing different apps and/or accounts.

To get started, you can start browsing communities! Find any Lemmy instance to create an account, subscribe to communities you find interesting, or even create your own and start sharing content.

Why 0d.gs?

Short URLs have always intrigued me, and I have owned quite a few. With ICANN explicitly prohibiting them, it's tough to find TLDs that not only allow them but also still have them available. 0D also has a significant meaning within my main field of interest.

I hope you enjoy your time on ø Lemmy Zero D ø

Banner: David Simonds, The Economist, 19 March 2008

submitted 1 month ago* (last edited 1 month ago) by kernelle@0d.gs to c/self@0d.gs
 
 

TL;DR: Testing some CDNs reveals Vercel and GitHub Pages as the fastest and most reliable for static sites, and Cloudflare as the best option in front of a self-hosted origin.

The Problem

In my previous post, I achieved load times in Europe of under 100ms. But getting those speeds worldwide is a geographical hurdle. My preferred hosting location has always been London, because of its proximity to the intercontinental submarine fiber optic network, which provides some of the best connectivity worldwide.

Heatmap of latency

Azimuthal Projection: Measuring the latency of 574k servers around the world from my Lemmy server in London

But it's a single server, in a single location. From the heatmap we can see the significant fall-off in response times past the 2500km line, such that a visitor from Tokyo has to wait around 8 times longer than their European counterparts.

Free Web

The answer is obvious: a Content Delivery Network, or CDN, distributes a website in a decentralised fashion, always delivering from the server closest to the user and drastically reducing load times.

I could be renting servers on every continent and make my own CDN, with blackjack and .. never mind! Instead, I'll be using existing and free infrastructure. While the internet is forever changing to accommodate stock markets, many companies, organisations, and individuals still offer professional-grade services for free. We cannot overstate the value of having such accessible and professional tools ready for experimenting, exploring, and learning, with no barrier to entry.

These constraints mean the results don't tell us how good these CDNs are in absolute terms, but rather how many resources they allocate to their free tiers.

Pull vs Push CDN

Pull CDN

A Pull CDN has to request the data from an origin server, whereas a Push CDN pre-emptively distributes code from a repository worldwide.

For our static one-pager a Push CDN is a dream: it deploys your repository worldwide in an instant and keeps it there. Pull CDNs also store a cached version, but they still have to go back to our origin server regularly, and the first visitor in a particular area might wait significantly longer if a closer server hasn't yet cached from the origin. This doesn't mean Push CDNs can't be used for complex websites with extensive back-ends, but it adds complexity, optimization, and cost to the project.

Push CDN

Map: Measuring round trip time to my site when using a push CDN

Edge vs Regional vs Origin

CDNs cache in different ways, at different times, using different methods. But all of them involve Edge Nodes - literally the edge of the network, for the fastest delivery - and Regional Nodes, which come into play when Edge Nodes or "Points of Presence" need to be updated.

Origin Nodes are usually only used with Pull CDNs, so the network knows what content to serve when no cache is available: an Edge Node asks the Regional Node what the Origin has to serve. Unfortunately, that also means a CDN without a minimum amount of resources will be slower than not using one at all.

Where the cache is stored with Push CDNs also depends on the provider, but they often use a repository that automatically updates the entire network with a new version. This means they can cache much more aggressively and efficiently across the network, resulting in faster load times.

Testing

I know, I know, you're here for numbers, so let me start with some: 3 rounds of testing, 6 continents, 98 tests each, for a combined total of 588 requests spread over 34 days. Every test cycle consists of one HTTPS request per continent, using the GlobalPing network and a simple script I've written to interface with their CLI tool.

Varying the time between requests will give us insight into how many resources are allocated to us regular users. We're looking for the CDN that's not only the fastest but also the most consistent in its load times.

Included is a demonstration of a badly configured CDN - actually just two CDNs chained together. Without the two platforms talking to each other, the network gets confused and load times more than double.

Finally, I've included ioRiver - the only platform I've found offering Geo-Routing on a free trial. This would allow me to serve Europe with my own server and the rest of the world with an actual CDN. For the first 2 testing rounds, I configured ioRiver to serve only through Vercel, as a baseline test of how much delay, if any, it adds. In the 3rd round, ioRiver routed Europe to my server and the rest of the world to Vercel.

Results

We should expect my self-hosted solution to deliver the same speeds every round; this is our baseline. Any CDN with a slower average than my single self-hosted server is not worth considering.

  • Round 1: 3 days of requesting once every hour (72 tests)
  • Round 2: 7 days of requesting once every 12 hours (14 tests)
  • Round 3: 24 days of requesting once every 48 hours (12 tests)

Round 1 - data

Graph Round 1

Frequent requests to a CDN ensure the website is cached not only in faster memory tiers but also across a more diverse spread of edge servers.

  • Pull CDNs show their strong advantage by not needing an extra request to my server
  • ioRiver (Multi-CDN) is set up to serve only Vercel; we can see it adds a considerable delay
  • Vercel, GitHub Pages, and Cloudflare (Pull) show themselves to be early leaders

Round 2 - data

Graph Round 2

Most reflective of a regular day on the site (all charts are ordered by this round)

  • Some CDNs already reflect slightly slower times due to not being cached as frequently
  • GitHub stands out to me in this round of testing, being a little more stable than in the previous round

Round 3 - data

Graph Round 3

†I didn't take Static.app's 30 day trial into account when testing, which is why it's absent from this final round.

  • Surprisingly enough, Cloudflare's Pull version pulls ahead of their Push CDN
  • Adding my Self-Hosted solution to ioRiver's Multi-CDN via Geo-Routing to Europe shows it can genuinely add stability and decrease loadtimes

Notes

It's pretty clear by now Vercel is throwing more money at the problem, so it shouldn't come as a surprise they've set a limit on the monthly requests: a respectable 1 million or 100GB total per month. For evaluation, I've changed my website to start hosting from Vercel.

GitHub's limits are described as a soft bandwidth limit of 100GB/month, more like a gentleman's agreement.

Same as last time, I'll probably be leaving the different deployments up for another month.

Code / Scripts on GitHub

These scripts are provided as is. I've made them partly configurable with CLI, but there are also hard-coded changes required if you're planning on using them yourself.

CDN Test

ping.py

Interfaces with GlobalPing's CLI tool. It completes an HTTPS request for every subdomain or deployment, from every continent equally, with a number of rate-limiting functions. In hindsight, interfacing with their API might've been a better use of time...

parseGlobalPing.py

Parses all the files generated by GlobalPing during ping.py, calculates averages, and returns this data pretty-printed or as CSV (I'm partial to a good spreadsheet...). Easy to tweak with CLI arguments.

CDN Testing Round

Ping every 12h from every continent (hardcoded domains & time)
$ python3 ping.py -f official_12h -l 100 
Parse, calculate, and pretty print all pings
$ python3 parseGlobalPings.py -f official_12h

Heatmap

masscan - discovery

masscan 0.0.0.0/4 -p80 --output-format=json --output-filename=Replies.json --rate 10000

Scans a portion of the internet for servers with an open port 80, traditionally used for serving a website or redirect.

hpingIps.sh - measurement

Because masscan doesn't record RTTs, I used hping for the measurements. Nmap is a good choice as well, but hping is slightly faster. I found MassMap after my scan, which wraps Masscan and Nmap together nicely. I'll update this once I've compared its speed to my implementation's.

This is a quick and dirty script to use hping and send one packet to port 80 of any host discovered by masscan.

query.py - parse and locate

Its primary and original function is to query the GeoLite2 database with an IP address for a rough estimate of the host's physical location, to plot a heatmap. It can now also estimate the distance between my server and another host using the Haversine formula.
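For reference, the Haversine great-circle distance between two points, with φ latitude and λ longitude in radians and R the Earth's mean radius (about 6371 km):

$$a = \sin^2\left(\tfrac{\Delta\varphi}{2}\right) + \cos\varphi_1\,\cos\varphi_2\,\sin^2\left(\tfrac{\Delta\lambda}{2}\right), \qquad d = 2R\,\arcsin\left(\sqrt{a}\right)$$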

plot.py

Creates a heatmap from the output of query.py (longitude, latitude, and RTT) using Matplotlib.

query.py and plot.py are forked from Ping The World! by Erik Bernhardsson, which is over 10 years old. The new scripts fix many issues and are much improved.

Graph plot

Masscan (Command mentioned above)
# Replies.json

Masscan -> IPList
$ python3 query.py --masscan > IPList.txt

IPList -> RTT
# sh hpingIps.sh

RTT -> Combinedlog
$ python3 query.py > log_combined.txt

CombinedLog -> Plot
$ python3 plot.py

./Martijn.sh > Blog / How I made a blog using Lemmy / Measuring the latency of 574k servers around the world from my lemmy server

submitted 2 months ago* (last edited 2 months ago) by kernelle@0d.gs to c/self@0d.gs
 
 

Promoted by the IndieWeb and keeping old internet traditions alive, webring websites link to each other in a circular fashion. They are often run by like-minded individuals with common interests or themes - in my case, the Fediring. Following the links or arrows, you'll find a large community of amazing and interesting people on the Fediverse.

Other interesting webrings:

submitted 3 months ago* (last edited 2 months ago) by kernelle@0d.gs to c/self@0d.gs
 
 

This is a follow-up to my introduction of BlogOnLemmy, a simple blog frontend. If you haven't seen it, no need: I will be explaining how it works and how you can run your own BlogOnLemmy for free.

Leveraging the Federation

Having a platform that connects your content to like-minded people is invaluable. The Fediverse achieves this in a platform-agnostic way, so in theory it shouldn't matter which platform we use. But platforms have different userbases that interact with posts in different ways. I've always preferred the forum variety, where communities form and discussion is encouraged.

My posts are shared as original content on Lemmy, and that's who they're meant for. I chose a traditional blog style to make the content more palatable for a wider audience, and in this way also promote Lemmy.

Constraints

Starting off, I did not want the upkeep of another federated instance. Not every new thing deployed on the Fediverse needs to stand on its own or be built from the ground up as an ActivityPub-compatible service; it can instead use existing infrastructure, already federated, already primed for interconnectivity. Taking it one step further means not having a back-end at all - a 'dumb' website, as it were. Posts are made, edited, and cross-posted on Lemmy.

The world of CSS and JavaScript on the other hand - how websites are styled and made feature-rich - is littered with libraries. They are treated like black boxes: often only a few functions are used, with the rest clogging up our internet experience. Even jQuery, which is used by over 74% of all websites, is already 23kB in its smallest form. I'm not aiming for the smallest possible footprint†, but rather showing that a modern web browser provides an underused toolset of natively supported functionality; something the first webdevs would have given their left kidney for.

Lastly, to improve maintainability and simplicity, one page is enough for a blog, provided that its content can be altered dynamically.

See optimization

How it's made

Graphviz

1) URL: Category/post

Even before the browser has completely loaded the page, we can take a look at the URL. With our constraints, only two types of additions are available to us: the anchor and GET parameters. When an anchor, or '#', is present, websites scroll to a specific place on the page after loading. We can hijack this behavior and use it to load predefined categories, like '#blog' or '#linkdumps'. For posts, '#/post/3139396' would look nicer than '?post=3139396', but anchors are rarely search-engine compatible, so I'm extracting a GET parameter to load an individual post.
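A minimal sketch of that routing step (the function name and defaults are illustrative, not the actual BlogOnLemmy code):

```javascript
// Decide what to load based on the URL, before anything is rendered.
function routeFromUrl() {
  // '#blog' or '#linkdumps' select a predefined category filter
  const anchor = window.location.hash.replace('#', '');
  // '?post=3139396' selects a single post by its Lemmy post id
  const postId = new URLSearchParams(window.location.search).get('post');

  if (postId) return { type: 'post', id: Number(postId) };
  if (anchor) return { type: 'category', name: anchor };
  return { type: 'category', name: 'blog' }; // default view
}
```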

Any JavaScript that runs before the page has finished loading should be swift and simple, like coloring the filters or setting Dark/Light mode, so it doesn't delay the site.
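A pre-load theme check can be as small as this (a sketch, assuming a 'dark' CSS class on the root element):

```javascript
// Runs before first paint: follow the device's colour scheme preference.
const prefersDark = window.matchMedia('(prefers-color-scheme: dark)').matches;
document.documentElement.classList.toggle('dark', prefersDark);
```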

2) API -> Lemmy

A simple 'Fetch' is all that's required. Lemmy's API is already extensive, because it's used by the different frontends and apps that make each individual's experience unique. When selecting a category, we request all the posts made by me in one or more Lemmy communities. A post or permalink uses the same post_id as on the Lemmy instance. Pretty straightforward.
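As an unauthenticated sketch of those two requests (the endpoints and parameters are my reading of Lemmy's v3 API, not necessarily what BlogOnLemmy ships; check the API docs before relying on them):

```javascript
const instance = 'https://0d.gs';

// All posts by one user, newest first; each entry carries the post id and Markdown body.
async function loadPosts(username, page = 1) {
  const url = `${instance}/api/v3/user?username=${username}&sort=New&page=${page}&limit=10`;
  const data = await (await fetch(url)).json();
  return data.posts;
}

// A single post by its id, for permalinks like '?post=3139396'.
async function loadPost(id) {
  const data = await (await fetch(`${instance}/api/v3/post?id=${id}`)).json();
  return data.post_view;
}
```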

3) Markdown -> HTML

When we get a reply from the Lemmy instance, the posts are formatted in Markdown, just as they are when you submit them. But browsers render HTML, a different markup language. This is where the only code that's not written by me steps in: a Markdown-to-HTML layer called snarkdown. It's very efficient and probably has the smallest footprint possible for what it is, around 1kB.
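Using it comes down to a single call (a sketch with the npm package; the blog inlines a forked copy, and I'm assuming the Markdown sits at postView.post.body):

```javascript
import snarkdown from 'snarkdown';

// Convert a post's Markdown body to HTML and drop it into the page.
function renderPost(postView, container) {
  container.innerHTML = snarkdown(postView.post.body ?? '');
}
```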

Optimization

When my blog was launched, I was using a Cloudflare proxy for no-hassle HTTPS handling, caching and CDN. Within the EU, I'm aiming for sub-100ms† to be faster than the blink of an eye. With Cloudflare's free tier we can expect a variance between 150 and 600ms at best, but intercontinental caching can take seconds.

Nginx and OpenLiteSpeed are regarded as the fastest webservers out there. I often use Apache for testing, but for deployment I prefer Nginx's speed and reliability. I could sidetrack here and write another 1000 words about the optimization of static content and TLS handling in Nginx, but that's a story for another time.

For the website, API calls are made asynchronously while the page is loaded and are not counted

Mythical 14kB, or less?

All data transferred on the internet is split up into manageable chunks, or frames. Their size, the Maximum Transmission Unit, is defined by IEEE 802.3-2022 1.4.207 with a maximum of 1518 bytes†. They usually carry 1460 bytes of actual application data, the Maximum Segment Size.

Followed by most server operating systems, RFC 6928 proposes 10x MSS as the initial Congestion Window for the first reply. In other words, the server 'tests' your network by sending 10 frames at once. If your device acknowledges each frame, the server knows to double the Congestion Window with every subsequent reply until some frames are dropped. This is called TCP Slow Start, defined in RFC 5681.

10 frames of 1460 bytes contain 14.6kB of usable data. Or at least, they used to. The modern web changed with the use of encryption: in my use case, the Initial Congestion Window includes 2 TLS frames, and each remaining frame loses an extra 29 bytes, reducing our window to 11.4kB. If we manage to fit the website within this first Slow Start routine, we avoid an extra round trip in the TCP/IP protocol, speeding up the website by as much as your latency to the server. Min-maxing TCP traffic is the name of the game.
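Spelled out, the arithmetic behind those two figures:

$$10 \times 1460\,\text{B} = 14\,600\,\text{B} \approx 14.6\,\text{kB}$$

$$(10 - 2) \times (1460 - 29)\,\text{B} = 8 \times 1431\,\text{B} = 11\,448\,\text{B} \approx 11.4\,\text{kB}$$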

Can vary with MTU settings of your network or interface, but around 1500 (+ 14 bytes for headers) is the widely accepted default

10kB vs 15kB with TCP Slow Start

Visualizes two raw web requests, 10.7kB vs 13.3kB with TCP Slow Start

  • Above Blue: Request Starts
  • Between Green: TLS Handshake
  • Inside Red: Initial Congestion Window

Icons

Icons are tricky, because describing pixel positions takes up a considerable amount of data. Instead, SVGs are commonplace: they create complex shapes programmatically, significantly reducing the footprint. Feathericons is a FOSS icon library providing a beautiful SVG-rendered solution for my navbar. The favicon, or website icon, I initially coded by hand with the same font as the blog itself. But after different browsers took liberties rendering the font and spacing, I converted it to a path-traced design, describing each shape individually and making sure it renders consistently everywhere.

Regular vs. Inline vs Minified

If we sum up the file sizes, we're looking at around 50kB of data. Luckily servers compress† our code, and are pretty good at it, leaving only 15kB to be transferred; just above our 11kB threshold. By making the code unreadable for humans with minifying scripts we can reduce the final size even more. Only... the files that make up this blog are split up. Common guidelines recommend doing so to prevent one big file from clogging up load times. For us, that would mean splitting our precious 11kB over multiple round trips, the opposite of our goal. Inline code blocks to the rescue, with the added bonus that the entire site is now compressed as one file, making the compression more efficient and ending the optimization at a neat 10.7kB.

The Web uses Gzip. A more performant choice today is Brotli, which I compiled for use on my server

In Practice

All good in theory; now let's see the effect in practice. I've deployed the blog 4 times, and each version was measured for total download time over 20 requests. In the first graph we notice the impact of not staying inside the Initial Congestion Window: only the second scenario is delayed by a second round trip when loading the first page.

Scenarios 1 and 3 use separate files, so separate requests are made. That prioritizes displaying the website, i.e. the first file, but neglects potentially usable space inside the init_cwnd. Comparing with the second graph, we can tell it ends up almost doubling their respective total load times.

The final version is the only one transferring all the data in one round trip, and it's the one deployed on the main site, with total download times as low as 51ms, around 150ms as a soft upper limit, and an 85ms average in Europe. Unfortunately, worldwide tests still show load times of 700ms, so I'll eventually implement a CDN.

Speedtest 4 scenarios

  1. Regular (14.46kB): no minification, separate files
  2. Inline (13.29kB): no minification, one file
  3. Regular Minified (10.98kB): but still using separate files
  4. Inline Minified (10.69kB): one page as small as possible

~~I'll be leaving up dev versions until there's a significant update to the site~~

Content Delivery Network

Speeds like this can only be achieved when you're close to my server, which is in London. For my Eurobros that means blazing fast response times. For anyone else, cdn.martijn.sh points to Cloudflare's CDN and git.martijn.sh to GitHub's CDN. These services allow us to distribute our blog to servers across the globe, so requesting clients always choose the closest server available.

GitHub Pages

An easy and free way of serving a static webpage. Fork the BlogOnLemmy repository and name it 'GitHub-Username'.github.io. Your website is now available as username.github.io and even supports the use of custom domain names. Mine is served at git.martijn.sh.

While testing its load times worldwide, I got response times as low as 64ms with 250ms on the high end. Not surprisingly they deliver the page slightly faster globally than Cloudflare does, because they're optimizing for static content.

Extra features

  • Following the Light or Dark mode of the user's device is a courtesy more than anything else. On top of that there is a selectable permanent setting: my way of highlighting the overuse of cookies and localStorage, by giving the user the choice to store data for a website that is built from the ground up not to use any (see the sketch after this list).
  • A memorable and interactive canvas to give a personal touch to the about me section.
  • Collapsed articles with a 'Read More'-Button.
  • 'Load More'-Button loads the next 10 posts, so the page is as long as you want it to be
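That opt-in theme setting can be as simple as this (a sketch; the storage key and function names are illustrative, not the actual code):

```javascript
// Apply a theme without storing anything.
function applyTheme(dark) {
  document.documentElement.classList.toggle('dark', dark);
}

// On load: an explicitly stored choice wins, otherwise follow the device preference.
const stored = localStorage.getItem('theme'); // null unless the user opted in
applyTheme(stored ? stored === 'dark'
                  : window.matchMedia('(prefers-color-scheme: dark)').matches);

// Only called when the user flips the permanent setting themselves.
function persistTheme(dark) {
  localStorage.setItem('theme', dark ? 'dark' : 'light');
  applyTheme(dark);
}
```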

Webmentions

Essential for blogging in the current year, Webmentions keep websites up to date when links to them are created or edited. Fortunately, Lemmy has us covered: when posts are made, the originating instance sends a Webmention to the host of any link mentioned in the post.

To stay within scope I'll be using webmention.io for now, which enables us to get notified when linked somewhere else by adding just a single line of HTML to our code.

Notes

  • Enabling HTTP/2 or HTTP/3 did not speed up load times; in fact, with protocol negotiations and TLS they added one more packet to the Initial Congestion Window.
  • For now, the apex domain will be pointing directly to my server, but more testing is required in choosing a CDN.
  • Editing this site for personal use requires knowledge of HTML and JS for now, but I might create a script to make individualizing blogs easier.

GitHub | ./Martijn.sh > Blog