Indie Web Camp Nuremberg

October 31st, 2023

After two days at border:none in Nuremberg, it was time for two days at Indie Web Camp, also in Nuremberg.

I hadn’t been to an Indie Web Camp since before The Situation. It felt very good to be back. I had almost forgotten how inspiring and productive they can be.

This one had a good turnout of around twenty people. We had ourselves an excellent first day of thought-provoking sessions. Then on day two it was time to put some of those ideas into action.

A little trick I like to do on the practical day is to have two tasks to attempt: one of them quite simple, and the other more ambitious. That way, as long as I get the simpler task done, I’ll always have at least something to demo at the end of the day.

This time I attempted three bits of home improvement on my website.

Autolinking Mastodon usernames

The first problem I set myself was ostensibly the simple one. But it involved regular expressions, so then I had two problems.

I wanted to automatically link up Mastodon usernames if I mentioned one in my notes. For example, during border:none I mentioned Brian’s Mastodon username in a note: @briansuda@loðfíll.is.

That turned out to be an excellent test case. Those Icelandic characters made sure I wasn’t making unwarranted assumptions about character sets.

Here’s the regular expression I came up with. It’s not foolproof by any means. Basically it looks for @something@something.something.

Good enough. Ship it.
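The regular expression itself isn’t reproduced in this copy of the post. Purely as an illustration, here’s a JavaScript sketch of the same @something@something.something idea. It uses Unicode property escapes (`\p{L}` with the `u` flag) so that hosts like loðfíll.is match. The function name autolinkHandles is hypothetical, and linking to https://host/@user assumes Mastodon’s profile URL pattern, which other fediverse servers may not follow.

```javascript
// Sketch: match @user@host.tld, where the user and host may contain any
// Unicode letter or digit. Assumption: the profile lives at https://host/@user
// (Mastodon's convention, not guaranteed for every fediverse server).
const HANDLE = /@([\p{L}\p{N}._-]+)@((?:[\p{L}\p{N}-]+\.)+\p{L}{2,})/gu;

function autolinkHandles(text) {
  // The match only contains letters, digits, ".", "_", "-" and "@",
  // so it is safe to drop straight into the markup.
  return text.replace(HANDLE, (match, user, host) =>
    `<a href="https://${host}/@${user}">${match}</a>`);
}

console.log(autolinkHandles("Say hi to @briansuda@loðfíll.is today."));
// Say hi to <a href="https://loðfíll.is/@briansuda">@briansuda@loðfíll.is</a> today.
```

It’s not foolproof either: a trailing full stop is left outside the link, and a plain email address such as jeremy@example.com is ignored because it lacks the leading @, but it would happily double-link a handle that is already inside an existing link, so it’s best run on plain text.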

Related posts

My next task was a bit more ambitious. It involved SQL queries, something I’m slightly better at than regular expressions, but that’s a very low bar.

I wanted to show related posts when you get to the end of one of my blog posts.

I’ve been tagging all my blog posts for years, so that’s the mechanism I used for finding similar posts. There’s probably a clever SQL statement that could do this, but I ended up brute-forcing it a bit.

I don’t feel too bad about the hacky, clunky nature of my solution, because I cache blog post pages. That means only the first person to view the blog post (usually me) will suffer any performance hit from my clunky database queries. After that, everything’s available straight from a cached file.

Let’s say you’re reading a blog post of mine that I’ve tagged with ten different keywords. I make a separate SQL query for each keyword to get all the other posts that use that tag. Then it’s a matter of sorting through all the results.

I loop through the results of each tag and apply a score to the tagged post. If the post shares one tag with the post you’re looking at, it has a score of one. If it shares two tags, it has a score of two, and so on.

I decided that for a post to be considered related, it had to share at least three tags. I also decided to limit the list of related posts to a maximum of five.
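Here’s one way that scoring could look, sketched in JavaScript rather than taken from the site’s own code. It assumes the per-tag query results have already been gathered into a Map called postsByTag (a hypothetical name) from each tag to the IDs of the posts using it. The thresholds of at least three shared tags and at most five results come from the post.

```javascript
// Sketch: score other posts by how many tags they share with the current one.
// Assumption: postsByTag stands in for the per-tag SQL queries, mapping each
// tag to the IDs of every post carrying that tag.
function relatedPosts(currentId, tags, postsByTag, minShared = 3, maxResults = 5) {
  const scores = new Map();
  for (const tag of tags) {
    for (const id of postsByTag.get(tag) ?? []) {
      if (id === currentId) continue; // a post isn't related to itself
      scores.set(id, (scores.get(id) ?? 0) + 1); // one point per shared tag
    }
  }
  return [...scores]
    .filter(([, score]) => score >= minShared) // must share enough tags
    .sort((a, b) => b[1] - a[1]) // highest score first
    .slice(0, maxResults)
    .map(([id]) => id);
}

const postsByTag = new Map([
  ["javascript", ["a", "b", "c", "d"]],
  ["accessibility", ["a", "b", "c"]],
  ["progressive enhancement", ["a", "b"]],
  ["css", ["a", "b", "c"]],
]);
console.log(relatedPosts("a", [...postsByTag.keys()], postsByTag)); // [ 'b', 'c' ]
```

Here post b shares four tags, c shares three and d only one, so d is dropped. Because `Array.prototype.sort` is stable, posts with equal scores keep the order in which they were first seen; a real version might break ties by date instead.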

It worked out pretty well. If you scroll down on my recent post about JavaScript, you’ll see links to related posts about JavaScript. If you read through a post on accessibility testing, you’ll find other posts about accessibility testing. If you make it to the end of this post about Mars colonisation, you’ll see links to more posts about exploring our solar system.

Right now I’m just doing this for my blog, but I’d like to do it for my links too. A job for a future Indie Web Camp.

Link rot

I was very inspired by Remy’s recent post on how he’s tackling link rot on his site. I wanted to do the same for mine.

On the first day at Indie Web Camp I led a session on link rot to gather ideas and alternative approaches. We had a really good discussion, though it’s always worth bearing in mind that there’ll never be a perfect solution. There’ll always be some false positives and some false negatives.

The other Jeremy at Indie Web Camp Nuremberg blogged about the session. Sebastian Greger was attending remotely and the session inspired him to spend the second day also tackling link rot.

In the end I decided to stick with Remy’s two-pronged approach:

a client-side script that—as a progressive enhancement—intercepts outbound links and re-routes them to
a server-side script that redirects to the Internet Archive if the link is broken.

Here’s the JavaScript I wrote for the first part.

It’s very similar to Remy’s but with one little addition. I check to see if the clicked link is inside an h-entry and if it is, I pass on the date from the post’s dt-published value.
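A rough sketch of that client-side part follows. It assumes a hypothetical /redirect endpoint that takes url and date query parameters; the real endpoint and parameter names aren’t given here. It also assumes the date lives in the datetime attribute of the post’s .dt-published element, falling back to its text.

```javascript
// Hypothetical endpoint path and parameter names (not given in the post).
const REDIRECT_ENDPOINT = "/redirect";

// Build the re-routed URL, adding the post's publication date when known.
function rerouteUrl(href, published) {
  const params = new URLSearchParams({ url: href });
  if (published) params.set("date", published);
  return `${REDIRECT_ENDPOINT}?${params}`;
}

// Progressive enhancement: only runs in a browser, and only touches
// links that point away from the current site.
if (typeof document !== "undefined") {
  document.addEventListener("click", (event) => {
    const link = event.target.closest?.("a[href]");
    if (!link || link.origin === location.origin) return;
    // If the link sits inside an h-entry, pass along its dt-published value.
    const time = link.closest(".h-entry")?.querySelector(".dt-published");
    const published = time?.getAttribute("datetime") ?? time?.textContent.trim();
    link.href = rerouteUrl(link.href, published);
  });
}
```

Rewriting href inside the click handler, rather than cancelling the event, lets the browser carry on with its normal navigation. And if the script never runs, the links still work as usual.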

Here’s the PHP I wrote for the server-side redirector. The comments tell the story of what the code is doing:

Check that the request is coming from my site.
There also has to be a URL provided in the query string.
Make a very quick curl request to get the response headers from the URL. The time limit is set to 1 second.
If there was any error (like a time out), give up and go to the URL.
Pick the response headers apart to get the HTTP status code.
If the response is OK, go to the URL.
If the response is a redirect, go around again but this time use the redirect URL.
Construct the archive.org search endpoint.
If we have a date, provide it. Otherwise ask for the latest snapshot.
Ping that archive.org URL. This time there’s no time limit; this might take a while.
If there’s an archived copy, redirect to that.
There’s no archived copy. Give up and go to the URL anyway.
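For illustration, here’s the same decision logic ported to JavaScript. It is a sketch, not a transcription of the PHP. The network calls are injected as functions (fetchHeaders and fetchJson are hypothetical) so the logic can be tested on its own. The archive lookup assumes the Wayback Machine availability API at https://archive.org/wayback/available, which returns the snapshot closest to an optional timestamp, or the most recent one when no timestamp is given.

```javascript
// Pull the numeric status out of a raw header block, e.g. "HTTP/1.1 404 Not Found".
function parseStatus(rawHeaders) {
  const match = /^HTTP\/[\d.]+\s+(\d{3})/.exec(rawHeaders ?? "");
  return match ? Number(match[1]) : null;
}

// Find the Location header, resolving relative redirects against the current URL.
function parseLocation(rawHeaders, baseUrl) {
  const match = /^location:\s*(\S+)/im.exec(rawHeaders ?? "");
  return match ? new URL(match[1], baseUrl).href : null;
}

// No response or 2xx: go to the URL. 3xx: follow it. Anything else: archive.
function decide(rawHeaders, currentUrl) {
  const status = parseStatus(rawHeaders);
  if (status === null || (status >= 200 && status < 300)) return { action: "go" };
  if (status >= 300 && status < 400) {
    const next = parseLocation(rawHeaders, currentUrl);
    return next ? { action: "follow", next } : { action: "go" };
  }
  return { action: "archive" };
}

// Wayback timestamps are 1 to 14 digits of YYYYMMDDhhmmss.
function archiveLookupUrl(url, date) {
  const params = new URLSearchParams({ url });
  if (date) params.set("timestamp", date.replace(/\D/g, "").slice(0, 14));
  return `https://archive.org/wayback/available?${params}`;
}

function snapshotFrom(json) {
  const closest = json?.archived_snapshots?.closest;
  return closest?.available ? closest.url : null;
}

// fetchHeaders(url) resolves to raw headers, or null on error/timeout.
// fetchJson(url) resolves to parsed JSON.
async function destinationFor(url, date, fetchHeaders, fetchJson, maxHops = 5) {
  let current = url;
  for (let hop = 0; hop <= maxHops; hop++) {
    const step = decide(await fetchHeaders(current), current);
    if (step.action === "go") return current;
    if (step.action === "archive") break;
    current = step.next; // redirect: go around again with the new URL
  }
  // Broken (or too many redirects): ask the archive about the original link.
  try {
    const snapshot = snapshotFrom(await fetchJson(archiveLookupUrl(url, date)));
    if (snapshot) return snapshot;
  } catch {
    // Treat a failed archive lookup the same as "no archived copy".
  }
  return url; // give up and go to the URL anyway
}
```

Capping the number of redirect hops isn’t one of the steps listed above; it just stops a redirect loop from spinning forever. Looking up the originally linked URL in the archive, rather than the last redirect target, is likewise a judgement call.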

Not perfect by any means, but it works for the most common cases of link rot.

For the demo at the end of the day I went back into my archive of over 10,000 links and plucked out some old posts, like this one from December 2005. It takes a little while to do the rerouting but eventually you get to see the archived version from the same time period as when I linked to it.

Here’s another link from 2005. Here’s another. Those links are broken now, but with a little patience, you’ll still get to read them on the Internet Archive.

The Internet Archive’s Wayback Machine really is a gift. I can’t imagine how it would be even remotely possible to try to address link rot on my site without archive.org.

I will continue to donate money to the Internet Archive and I encourage you to do the same.


Responses

Valtteri Laitinen

@adactio I would use `\p{L}` or `\pL` to match all Unicode letters, so the regex would become `/@?\b([\p{L}0-9._%+-]+)@([\p{L}0-9.-]+\.\p{L}{2,})\b/iu` (note the `u` flag).

Jeremy Cherfas

IndieWeb Camp and border:none in Nürnberg were wonderful. I had a great time seeing old friends, making new ones and just giving myself over to the whole thing. Well worthwhile, including even the two twelve-hour train journeys that took me there and back. No complaints.

But. Because there has to be a but.

I failed miserably in the main task I chose for myself on the hack day on Sunday, which is why I am especially grateful to Jeremy Keith for his timely reminder that it is a really good idea to have a little task that you are pretty certain to accomplish. That I did, on the day before the hack day, no less, though it is possible that nobody but me will notice it or care if they do.[1]

The big task was to fix the way my microblogging machine, WithKnown, handles maps. There are actually two problems, which may or may not be related. Most obviously, check-ins to a location have not displayed a map of the location since 22 May 2022. And that’s a bit weird, because the form for posting a new check-in does show a map that allows you to see where it thinks you are. A couple of days ago, however, that map started warning me that the map tiles it displays would no longer be available after today.

I’d known about this for a while. Stamen map tiles have been very up-front about moving hosting to Stadia Maps and I did my homework. According to the guide, the changeover could be as simple as “just” changing the URL from which I requested the tiles. But I also knew how difficult WithKnown’s code is for me to understand, which is why I postponed the changeover to a time and place when I knew that smarter and more experienced people would be available to help: IndieWeb Camp.

Lost in the Code

I’m quite certain WithKnown’s code makes a great deal of sense to its own developers, but it is a dreadful tangle for a mere amateur to explore. There is precious little guidance in the way of either documentation or comments in the code, which perhaps makes sense for people who work on the code every day, but the sad truth is that nobody is doing that any more. If they were, these problems would have been addressed aeons ago. So, nothing for it but to tie a thread round my waist and go spelunking.

As an aside, Jeremy Keith has also written about his hack day projects, unsurprisingly a lot more successful than mine, and the code he shared is a delight. The comments tell me exactly what is going on, and why, making it possible for even a tyro like me to follow along, no thread necessary.

Anyway, grep found what I was looking for in a flash, a call to the old Stamen URL in tile.stamen.js. At first glance it looked quite straightforward, assembling the request from a base URL to which were added variables for the zoom, the latitude and longitude, and the type of image required (because different tilesets offer different types of image). Modifying the URL constructor to use the new base URL, however, failed miserably, and looking at the request being sent revealed that it was not what I expected, having a tile “flavor” added to the tile style.

Further digging revealed another function that seemed to offer the opportunity to indicate the style and flavour of the map tiles required, with the flavour added to the style by a hyphen, -. The new URL joins the flavour to the style with an underscore, _. OK, let’s change the join to use that.[2] Still no change. Maybe something else was adding -flavor?
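As a reference point, here’s a sketch of how such a tile URL is typically assembled. It is not WithKnown’s code, and tileFor and tileUrl are hypothetical names. It uses the standard slippy-map formula to turn latitude and longitude into tile x/y coordinates at a given zoom, and it joins style and flavour with an underscore. The Stadia base URL and the stamen_ prefix are assumptions based on Stadia’s naming (stamen_toner_lite in place of Stamen’s old toner-lite), and Stadia may also require an API key. In Leaflet, a URL template with {z}/{x}/{y} placeholders normally does the coordinate conversion for you.

```javascript
// Standard "slippy map" formula: latitude/longitude -> tile x/y at a zoom level.
// Latitudes beyond about ±85.0511° fall outside Web Mercator tiles.
function tileFor(lat, lon, zoom) {
  const n = 2 ** zoom; // tiles per side at this zoom
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return { x, y };
}

// Assumed base URL and naming scheme for Stadia-hosted Stamen tiles.
const BASE = "https://tiles.stadiamaps.com/tiles";

function tileUrl(style, flavour, lat, lon, zoom, ext = "png") {
  // Join with "_" (Stadia) rather than "-" (old Stamen); skip an empty flavour.
  const name = ["stamen", style, flavour].filter(Boolean).join("_");
  const { x, y } = tileFor(lat, lon, zoom);
  return `${BASE}/${name}/${zoom}/${x}/${y}.${ext}`;
}

console.log(tileUrl("toner", "lite", 0, 0, 1));
// https://tiles.stadiamaps.com/tiles/stamen_toner_lite/1/1/1.png
```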

We looked, but we couldn’t find anything. It certainly wasn’t in the bits that actually build the maps in Leaflet. All very frustrating.

Options not Offered

A bigger mystery is why WithKnown code includes functions to set the style and flavour of the map tiles it displays. I’ve never seen those options offered to me anywhere. At this point I was desperate, so I tried deleting that whole function; it didn’t break anything, but it didn’t fix the problem either. My guess is that WithKnown’s developers modified an original tile.stamen.js just enough to use the single tile source they wanted but left various bits and pieces in even though they made no use of them. Or rather, no use of them that I could find.

At that point I resigned myself to failure, and spent the final 45 minutes of the hack day thinking about my own options. One of the attractions of the IndieWeb in general and WithKnown in particular is that it ought to allow me to take the data I have been so careful to “own” and move it to some other system. I’ve tried to get WithKnown’s export to work before, unsuccessfully, but of course I tried again. After a good long wait, it downloaded an empty file, with no indication that it had run out of time, or memory, or patience. Twice.

Then I started reading up about web scraping and getting the HTML source files (and images) so that I could maybe work on those. Because the source contains microformats, it ought to be possible to import them into another suitably equipped machine, like micro.blog. Then I had a few moments of existential doubt. What was so precious about the roughly 3500 posts I had created with Known? Maybe better to cut and run? That does and does not appeal. It would be very, very easy, but it would also somehow negate the point of having used WithKnown in the first place: that the content was mine to migrate.

My Solution is Known

I’ve a feeling I am going to keep banging my head against this brick wall for a little longer yet. Step One will be to share this post and see whether I can’t get people who know WithKnown to step up. Step Two will be to keep spelunking through the code to see whether I can trace the complete path from clicking on the location button to the display of a check-in on my site, complete with map.

Step Three I don’t want to contemplate any further. Yet.

[1] On the day itself, I fixed the display of search results here, not that important.
[2] A further complication, spotted by my helpers, was that tile.stamen.min.js was also present, and rather than re-minify the original file, we edited that too, and that definitely worked as planned.

Theresa O’Connor

@adactio The redirector seems broken, by the way. It 403s for me in STP.


5 Likes

Liked by beyond tellerrand on Tuesday, October 31st, 2023 at 4:35pm

Liked by Chris Burnell on Tuesday, October 31st, 2023 at 5:01pm

Liked by Florian Ziegler on Tuesday, October 31st, 2023 at 5:33pm

Liked by Roma Komarov on Tuesday, October 31st, 2023 at 9:47pm

Liked by Jan on Wednesday, November 1st, 2023 at 9:29am

