
Back when the web was young, it wasn’t yet clear what the rules were. Like, could you really just link to something without asking permission?

Then came some legal rulings to establish that, yes, on the web you can just link to anything without checking if it’s okay first.

What about search engines and directories? Technically they’re rifling through all the stuff we publish and reposting snippets of it. Is that okay?

Again, through some legal precedents—but mostly common agreement—everyone decided that on balance it was fine. After all, those snippets they publish are helping your site get traffic.

In short order, search came to rule the web. And Google came to rule search.

The mutually beneficial arrangement persisted uneasily. Despite Google’s search results pages getting worse and worse in recent years, the company’s huge market share of search means you generally want to be in their good books.

Google’s business model relies on us publishing web pages so that they can put ads around the search results linking to that content, and we rely on Google to send people to our websites by responding smartly to search queries.

That has now changed. Instead of responding to search queries by linking to the web pages we’ve made, Google is instead generating dodgy summaries rife with hallucina… lies (a psychic hotline, basically).

Google still benefits from us publishing web pages. We no longer benefit from Google slurping up those web pages.

With AI, tech has broken the web’s social contract:

Google has steadily been manoeuvring their search engine results to more and more replace the pages in the results.

As Chris puts it:

Me, I just think it’s fuckin’ rude.

Google is a portal to the web. Google is an amazing tool for finding relevant websites to go to. That was useful when it was made, and it’s nothing but grown in usefulness. Google should be encouraging and fighting for the open web. But now they’re like, actually we’re just going to suck up your website, put it in a blender with all other websites, and spit out word smoothies for people instead of sending them to your website. Instead.

Ben proposes an update to robots.txt that would allow us to specify licensing information:

Robots.txt needs an update for the 2020s. Instead of just saying what content can be indexed, it should also grant rights.

Like crawl my site only to provide search results not train your LLM.

It’s a solid proposal. But Google has absolutely no incentive to implement it. They hold all the power.

Or do they?

There is still the nuclear option in robots.txt:

User-agent: Googlebot
Disallow: /
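You can check what a rule like this actually says before deploying it by running it through a robots.txt parser. The sketch below is an illustration, not part of the original post. It assumes Python's standard-library urllib.robotparser, and the page URL is a hypothetical placeholder. It shows that the two lines above shut out Googlebot but leave crawlers that aren't named, such as bingbot, alone. A parser only reports what the file asks for. Whether a crawler obeys is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# The "nuclear option" from above, as the body of a robots.txt file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if robots_txt permits agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse locally; nothing is fetched
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/journal/"  # hypothetical URL
    for agent in ("Googlebot", "bingbot"):
        verdict = "allowed" if can_crawl(ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

Running it prints `Googlebot: blocked` followed by `bingbot: allowed`.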

That’s what Vasilis is doing:

I have been looking for ways to not allow companies to use my stuff without asking, and so far I couldn’t find any. But since this policy change I realised that there is a simple one: block Google’s bots from visiting your website.

The general consensus is that this is nuts. “If you don’t appear in Google’s results, you might as well not be on the web!” is the common cry.

I’m not so sure. At least when it comes to personal websites, search isn’t how people get to your site. They get to your site from RSS, newsletters, links shared on social media or on Slack.

And isn’t it an uncomfortable feeling to think that there’s a third party service that you absolutely must appease? It’s the same kind of justification used by people who are still on Twitter even though it’s now a right-wing transphobic cesspit. “If I’m not on Twitter, I might as well not be on the web!”

The situation with Google reminds me of what Robin said about Twitter:

The speed with which Twitter recedes in your mind will shock you. Like a demon from a folktale, the kind that only gains power when you invite it into your home, the platform melts like mist when that invitation is rescinded.

We can rescind our invitation to Google.



Charles Roper


What about Bing? I’ve found Google Bard to be largely useless and irrelevant. I’ve found Bing’s approach of being able to “search the web” (Microsoft’s technology - Prometheus - can call on the Bing search index) to be the most useful of all AI chat tools, specifically because it provides links to sources. You can also provide it with links/sites to work with; e.g. “based on information at …”.

I find it very hard to trust LLMs that do not also cite sources.

Adactio: Jeremy Keith

Jeremy Felt

We can rescind our invitation to Google.

Jeremy Keith

I’ve been thinking about this quite a bit lately and I’m happy to see I’m not the only one. It’s hard to imagine Google’s value add for me and my website.

Bonus: fewer requests a day to my server—by thousands.

Published Tuesday, Jul 11 at 13:58

Tracy Durnell

ADDED 4 October 2023:

Google has announced a new token you can block to exclude your website from training Bard and Vertex AI: Google-Extended. To block your site from being used to train Google’s AI products, you should include this code in your robots.txt file:

# Google AI
User-agent: Google-Extended
Disallow: /

Because Google-Extended is a standalone token, we don’t need to block Google from indexing our websites in order to stop them using our content to train their AI products.
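A similar sketch shows the difference in practice. It again uses Python's urllib.robotparser and a hypothetical URL. With only the Google-Extended group in place, Googlebot is still allowed to crawl, while the Google-Extended token is disallowed. Google documents Google-Extended as a product token rather than a separate crawler. So this only demonstrates what the file requests, and acting on it is up to Google.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# The Google-Extended rule from the update above, one directive per line.
ROBOTS_TXT = """\
# Google AI
User-agent: Google-Extended
Disallow: /
"""


def verdicts(robots_txt: str, tokens: list[str], url: str) -> dict[str, str]:
    """Report, per user-agent token, whether the rules allow fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # comment lines are ignored
    return {t: ("allowed" if parser.can_fetch(t, url) else "blocked") for t in tokens}


if __name__ == "__main__":
    page = "https://example.com/notes/"  # hypothetical URL
    print(verdicts(ROBOTS_TXT, ["Googlebot", "Google-Extended"], page))
    # {'Googlebot': 'allowed', 'Google-Extended': 'blocked'}
```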

⭐ ADDED 11 December 2023:

Except!!!! Google-Extended applies to their products but not their generative search results. So if you don’t want your content to appear in generative search results, you still need to block Googlebot.

ORIGINAL ARTICLE (published 11 July 2023):

After thinking about it for a couple days, I’ve decided to de-index my website from Google. It’s reversible — I’m sure Google will happily reindex it if I let them — so I’m just going ahead and doing it for now. I’m not down with Google swallowing everything posted on the internet to train their generative AI models. I was pushed over the edge by posts from Jeremy Keith and Vasilis van Gemert, thanks y’all.

I don’t have Google Search Console set up for this website so I don’t know how much search traffic I get. My other blog, Cascadia Inspired, got about 200 hits in the past three months. I’m not going to cry over that — they’re mostly going to one 2015 article anyway (and probably not that helpful of a post, to my eye. Around New Year’s every year I usually get an influx of people to my ten-year-old guide to doing a creative annual review. Sorry folks, I’m sure someone else has written something better by now.) 😉

I’m going to start by pulling my websites out of Google search, then work on adding my sites to directories. Maybe I’ll even join a webring 💍✨

Adding a noindex meta tag to my WordPress header

Because my website has already been indexed by Google, I need to allow the Google bot to re-crawl the pages and see the new “noindex” instruction. So in the future I’ll also block the Googlebot crawler, but not just yet 😉

I added this code to the functions.php file of my child theme:

add_action( 'wp_head', function() {
    // Ask Googlebot not to index this page, follow its links, or index its images.
    echo '<meta name="Googlebot" content="noindex, nofollow, noimageindex">' . "\n";
} );

I figured out how to adapt this from WPExplorer. This random WordPress plugin help forum suggested another version; I don’t know which is better 🤷‍♀️

I’m not 100% on whether the noimageindex is actually helpful for Googlebot since that’s their text bot, but can’t hurt right? (Tell me if it hurts lol.) Yoast says there’s a better way to block image indexing but I’m scared of touching the .htaccess file and definitely nothing with my server 😂 (I’m on shared hosting anyway, so I think the edits I can make are limited?)
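The noindex instruction only works if Googlebot can still fetch the page and see it, so it's worth confirming the tag really ends up in the rendered HTML. Here is a minimal sketch in Python, which is an assumption and not something from the original post. It scans a page's markup with the standard-library html.parser and reports whether a robots or googlebot meta tag asks for noindex. The class and function names are hypothetical. Meta names are compared case-insensitively, so the "Googlebot" spelling used above is matched. It also treats the "none" directive as implying noindex, which is how Google's documentation describes it.

```python
from __future__ import annotations

from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "meta":
            return
        attributes = {key: value or "" for key, value in attrs}
        # Meta names are case-insensitive, so "Googlebot" and "googlebot" both match.
        if attributes.get("name", "").lower() not in ("robots", "googlebot"):
            return
        for token in attributes.get("content", "").split(","):
            if token.strip():
                self.directives.add(token.strip().lower())


def is_noindexed_for_googlebot(html: str) -> bool:
    """True if the markup asks Googlebot not to index the page."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    # "none" is shorthand for "noindex, nofollow".
    return bool(finder.directives & {"noindex", "none"})


if __name__ == "__main__":
    head = '<head><meta name="Googlebot" content="noindex, nofollow, noimageindex"></head>'
    print(is_noindexed_for_googlebot(head))  # True
```

In practice you would feed it the HTML of a live page, for example one fetched with `curl` or your browser's "view source".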

Blocking bots that collect training data for AIs (and more)

In addition, I created a robots.txt file to tell “law abiding” bots what they’re not allowed to look at. I ought to have done this before but kind of assumed it came with my WordPress install 😅 (Nope.)

AI user agents to block

There’s so many now, just copy from my robots file tbh.

ADDED 4 October 23: To block training of Google’s Bard, I blocked Google-Extended.

I specifically want to deter my website being used for training LLMs, so I blocked Common Crawl.

To block OpenAI, I blocked both user agents ChatGPT-User and GPTBot. (Added GPTBot 10 August 23)

ADDED 4 October 23: Per Neil Clarke’s article, I have also blocked Omgilibot, Omgili, and FacebookBot. (Via Jeremy Keith)

ADDED 14 February 2024: I also blocked user agents used in AI training sets: anthropic-ai, Bytespider, FacebookBot, and PerplexityBot (source)

ADDED 16 April 2024: prompted by Ethan Marcotte, I blocked several more known and suspected user agents used in AI training: Claude-Web, ClaudeBot, cohere-ai, Diffbot, YouBot, ChatGPT

ADDED 17 June 2024: I’ve now blocked Apple’s AI training bot Applebot-Extended (thanks for the heads-up, James!) Does anyone else feel like this is getting ridiculous?

I also blocked Amazonbot and Applebot to block Siri’s and Alexa’s “smart answers.” I believe this also excludes me from Apple search.

I’ve also now blocked Googlebot and bingbot in protest of their generative AI search results. I’ve had the code up asking Google to deindex my pages for over six months, and I’m done waiting.

Dark Visitors apparently has a WordPress plugin to update your robots.txt whenever a new agent comes out, but for now I’m stickin’ with manual. I am also still wary of modifying my .htaccess file and breaking something, so it’s just my robots.txt making my stance clear — I can’t control whether companies have any sort of ethics and comply, unfortunately.
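Keeping a list like this by hand gets tedious. Here is a Python sketch of how such a file could be generated and sanity-checked, offered as an illustration only. The agent names are the ones mentioned in the updates above, plus CCBot, the user agent Common Crawl's crawler uses. The function names are hypothetical. Each agent gets its own group with `Disallow: /`. Duplicates are dropped, since FacebookBot appears twice above. Python's urllib.robotparser then confirms that every listed agent is blocked and an unlisted one is not. Note that Python's parser matches user-agent names by substring, which is looser than some crawlers, so treat the check as a rough test. As noted above, the file only states a preference and can't make non-compliant bots obey.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# User agents named in the updates above. CCBot is Common Crawl's crawler;
# the rest are kept exactly as listed in the post.
BLOCKED_AGENTS = [
    "Google-Extended", "CCBot", "ChatGPT-User", "GPTBot",
    "Omgilibot", "Omgili", "FacebookBot",
    "anthropic-ai", "Bytespider", "FacebookBot", "PerplexityBot",
    "Claude-Web", "ClaudeBot", "cohere-ai", "Diffbot", "YouBot", "ChatGPT",
    "Applebot-Extended", "Amazonbot", "Applebot", "Googlebot", "bingbot",
]


def build_robots_txt(agents: list[str], path: str = "/") -> str:
    """Return a robots.txt body with one Disallow group per unique agent."""
    seen: set[str] = set()
    groups: list[str] = []
    for agent in agents:
        key = agent.lower()  # robots.txt user-agent matching is case-insensitive
        if key in seen:
            continue  # skip repeats such as the second FacebookBot
        seen.add(key)
        groups.append(f"User-agent: {agent}\nDisallow: {path}\n")
    # A blank line between groups keeps each record clearly separated.
    return "\n".join(groups)


def check_blocked(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each agent to True if the rules disallow it from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    robots = build_robots_txt(BLOCKED_AGENTS)
    print(robots)
    page = "https://example.com/"  # hypothetical URL
    results = check_blocked(robots, BLOCKED_AGENTS + ["DuckDuckBot"], page)
    for agent, blocked in results.items():
        print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```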

Other user agents

Searching on DuckDuckGo, I found an older article from a theme maker with specific advice for WordPress robots.txt. From there I jumped to Jeff Starr’s recommendations from 2020.

I also appreciate fellow opinionated individuals on the internet so I followed some other blocks from Rohan Kumar. I would happily take more opinionated suggestions of junk bots to block if anyone else has opinions or can point me to a list somewhere 😉

Note: this article generated a lot of interest! See a Hacker News discussion.

Syndicated to IndieWeb News

Ed Summers

@adactio thanks for writing about this and the pointers to the conversation that has already been happening.

I wonder if formulating this as a well-known URI might be easier to layer in than further encumbering robots.txt? @mnot has there been talk of this already?

Also re: buy in from Google, there does seem to be some interest from them?

A principled approach to evolving choice and control for web content

# Posted by Ed Summers on Wednesday, July 12th, 2023 at 11:42am

Andy Carolan

@anniegreens That’s VERY true. Google has indeed broken the social contract.

If I stopped Google’s bots (or any search engine’s bots, for that matter) from crawling my site, I would notice zero change. I’ve never been found through search engines or benefited from them.

Every business contact has been made through recommendation, social media presence or by my own ‘reaching out’.

Michael Martinez

@simoncox That’s a very nuanced question. I would say we probably inadvertently shot ourselves in the foot. Even those of us who have held firmly to non-search promotion strategies still rely on search engines for exposure.


@simoncox Don’t hate the player, hate the game.

Also, if we didn’t help decent brands improve their sites, just imagine the shite that Google would be ranking top.

# Posted by Boswell on Wednesday, July 12th, 2023 at 5:09pm

Charles Roper

@adactio Same question to @chriscoyier. Does the inclusion of links in an answer, as Bing AI does, restore the social contract?

I find this conundrum really interesting; I’m ambivalent and conflicted on it.

What’s happening with LLMs seems to be a version of the original vision of the Semantic Web, as envisioned by Tim Berners-Lee.

And are we ourselves not spitters of word smoothies based on our own unique experience of the world?

Semantic Web - Wikipedia

Tane Piper

@charlesroper @adactio @chriscoyier It’s one thing we looked at: data pods as a way to allow people to store personalisation, so we don’t have to, and then, along with our Knowledge Graph and LLM, provide real recommendations, not ones based on weak signals. We didn’t get to finish it in this hackathon PoC, but it’s something we might add later, along with allowing “bring your identity” and sharing.

# Posted by Tane Piper on Wednesday, July 12th, 2023 at 6:15pm

Tane Piper

@charlesroper @adactio @chriscoyier The video of the demo should be up soon from OntoCommons. Haven’t really touched upon it enough yet to write about it, but it’s only part of the puzzle of being non-creepy with recommendations.

# Posted by Tane Piper on Wednesday, July 12th, 2023 at 6:23pm

Charles Roper

@chriscoyier @adactio I’d not seen Phind before - interesting. Quite similar to Bing AI.

What I found working in biodiversity informatics for over a decade is that people are generally much more comfortable sharing their data - often a lifetime of dedicated, expert work - if a) they’re credited and b) it’s used for good causes and not for profit.

It was one of the more challenging aspects of the work. Remarkable parallels with LLMs consuming data scraped from the web….


(As usual, this roundup is a mix of French and English.)

Some personal news

I’m really starting to feel comfortably settled in Grenoble. Having my friends a 10-minute bike ride away is truly extraordinary, and something I had forgotten about in Paris: no need to organise things several days in advance, since a quick “I’m going to [insert bar] in an hour if you want” covers the basics and makes getting together so much easier!

I also made the most of July to read and wander around France, as you’ll see in this roundup.

Reading

Books

In English:

  • We Deserve Monuments, by Jas Hammonds, had me sobbing during most of the second half of the book.
  • The graphic novel Shubeik Lubeik by Deena Mohamed was beautiful and moving, and I highly recommend it. It was also nice to read something in translation from Arabic – I haven’t done much of that yet and the writing style is really special!
  • All My Rage by Sabaa Tahir follows two Pakistani teenagers in a US high school, with their own issues and trauma and, well, all their rage.
  • Spinning by Tillie Walden, which was recommended by the hosts of the (French) RomComment podcast, is Walden’s recollections of being a tween mid-level figure skater who moves to Texas, trains every morning at 4am, finds out she’s gay, and is understandably a bit overwhelmed.
  • In Wild and Crooked, by Leah Thomas, two kids become unlikely friends. There’s the criminal-in-training daughter of a guy who murdered someone, and a disabled kid who finds solace in tabletop role playing. Oh, and whose dad got murdered by the first one’s. When that comes to light, oddly enough, it doesn’t go down well with the town and their respective families.
  • I read the Scholomance trilogy pretty much in one sitting (aside from that pesky little “I need sleep” thing).
  • Finally, a bit more nonfiction: I love Daniel M. Lavery’s blog The Chatner, so I couldn’t miss the essay collection Something that may shock and discredit you, about being trans, or being religious, or maybe Greek mythology or reality TV, I’m not quite sure.

In French:

  • Abobo Marley by Yaya Diomandé is about a young Ivorian man who dreams of coming to Europe, and it perfectly illustrates the saying “a bird in the hand is worth two in the bush” as he methodically destroys every one of his chances of succeeding back home.
  • Grenoble Calling: une histoire orale du punk dans une ville de province, by Nicolas Bonanni and Margaux Capelier, was a very fitting read for this move, and a really interesting one. I summarised a few relevant points on the new Wikipedia page Punk à Grenoble, but I really recommend the book, and especially the CD that comes with it.
  • Je suis une fille sans histoire, by Alice Zeniter, is a little love letter to storytelling, to women, and to the French language. It’s funny, it’s moving, it doesn’t revolutionise anything but it is a reminder of important messages, and I loved it. Which is handy, since I still have L’art de perdre somewhere in my boxes (no, the bookshelf still hasn’t been delivered).

Academic articles

Non-academic articles

Watching

Movies

In chronological order:

  • But I’m a Cheerleader was very cringe and perfect for a too-hot afternoon where I didn’t know what to do. It’s not a good movie. I didn’t want a good movie.
  • Nimona, however, absolutely was a good movie, and you should watch it.
  • Birds of Prey was fun, and sometimes fun is all I want.
  • Barbie was also fun, but this time I have to be honest – fun was not all I wanted. I was happy to see something pink and happy at the theater for once and that made it all worth it, but honestly, not a masterpiece…
  • unlike Oppenheimer, which solidifies my transition into a very manly man, I guess. That one was absolutely incredible.

TV Shows

My partner told me about Severance, which had vaguely been on my radar since it came out. I took it as a sign to finally watch it and it was GREAT. Amazing. I need season two now.

I also watched Good Omens season 2. It was okay – I have to be honest here, I liked season 1 but I just liked it. Season 2 was the same: it was nice and I’m immediately going to forget about it.

Video essays


My friend Sébastien made yet another amazing art project and you should watch his wonderful video, which includes the demo and how he made it.

Listening

Podcasts

In English:

In French:


  • Found CDs at the yearly library sale, which I haven’t listened to yet. I’m really, really glad to have found Joey Bada$$’s eponymous (?) album, which I used to listen to on repeat.
  • Grenoble Calling, the punk CD from the book mentioned above.


I spent a long weekend in Narbonne (south-west France) for a Wikimedia weekend, which was really sweet. Narbonne looked wonderful and has loads of adorable little corners.

I also spent one day in Lyon on a whim and rediscovered some of my favourite places there.


# Posted by Alex on Tuesday, August 1st, 2023 at 8:56am

# Monday, August 28th, 2023 at 5:56pm


I’m trying a new format, which is sharing 3-5 links (with context) on one given topic. I’m starting with what I believe to be the easiest one, since I already have a giant collection about it: moving towards a homemade web.

This is inspired by Tracy Durnell’s “article pairings”.

The problem

The Internet isn’t meant to be so small is a good explanation of the current Internet situation.

the internet wasn’t supposed to be like this. It wasn’t supposed to be six boring men with too much money creating spaces that no one likes but everyone is forced to use because those men have driven every other form of online existence into the ground.

The wall of text

Rediscovering the small web is full of hope and goes into detail about why we got here, what can be done, what can’t, and what else to think about.

The solution

The Handcrafted Artisanal Web

Nadreck shares a few personal rules, including posting at least once a week on your website, visiting other people’s websites, and promoting other people’s work on your own website and on your social media if you use social media.

The radical

In his blog post Permission, Jeremy encourages us to completely leave Google. After all, the service is getting worse and worse and we don’t benefit from them anymore, especially with the new Gemini updates; why should we keep giving them value by donating our creations to their index?

# Posted by Alex on Sunday, June 16th, 2024 at 12:33pm


It’s almost as though humans prefer to use post-hoc justifications rather than behave as rational actors.

What price?

Using generative large-language model tools? Sleeping well at night?


Disregard previous instructions and fight back.

The machine stops

Self-hosted sabotage as a form of collective action.

Continuous partial ick


Report: Thinking about using AI? - Green Web Foundation

A solid, detailed, in-depth report.

The sheer amount of resources needed to support the current and forecast demand from AI is colossal and unprecedented.


A short note on AI – Me, Robin

I hope to make something that could only exist because I made it. Something that is the one thing that it is. Not an average sentence. Not a visual approximation of other people’s work. Not a stolen concept that boils lakes and uses more electricity than anything in my household.


Why “AI” projects fail

“AI” is heralded (by those who claim it to replace workers as well as those that argue for it as a mere tool) as a thing to drop into your workflows to create whatever gains promised. It’s magic in the literal sense. You learn a few spells/prompts and your problems go poof. But that was already bullshit when we talked about introducing other digital tools into our workflows.

And we’ve been doing this for decades now, with every new technology we spend a lot of money to get a lot of bloody noses for way too little outcome. Because we keep not looking at actual, real problems in front of us – that the people affected by them probably can tell you at least a significant part of the solution to. No we want a magic tool to make the problem disappear. Which is a significantly different thing than solving it.


Does AI benefit the world? – Chelsea Troy

Our ethical struggle with generative models derives in part from the fact that we…sort of can’t have them ethically, right now, to be honest. We have known how to build models like this for a long time, but we did not have the necessary volume of parseable data available until recently—and even then, to get it, companies have to plunder the internet. Sitting around and waiting for consent from all the parties that wrote on the internet over the past thirty years probably didn’t even cross Sam Altman’s mind.

On the environmental front, fans of generative model technology insist that eventually we’ll possess sufficiently efficient compute power to train and run these models without the massive carbon footprint. That is not the case at the moment, and we don’t have a concrete timeline for it. Again, wait around for a thing we don’t have yet doesn’t appeal to investors or executives.


Why A.I. Isn’t Going to Make Art | The New Yorker

Using ChatGPT to complete assignments is like bringing a forklift into the weight room; you will never improve your cognitive fitness that way.

Another great piece by Ted Chiang!

The companies promoting generative-A.I. programs claim that they will unleash creativity. In essence, they are saying that art can be all inspiration and no perspiration—but these things cannot be easily separated. I’m not saying that art has to involve tedium. What I’m saying is that art requires making choices at every scale; the countless small-scale choices made during implementation are just as important to the final product as the few large-scale choices made during the conception.

This bit reminded me of Simon’s rule:

Let me offer another generalization: any writing that deserves your attention as a reader is the result of effort expended by the person who wrote it. Effort during the writing process doesn’t guarantee the end product is worth reading, but worthwhile work cannot be made without it. The type of attention you pay when reading a personal e-mail is different from the type you pay when reading a business report, but in both cases it is only warranted when the writer put some thought into it.

Simon also makes an appearance here:

The programmer Simon Willison has described the training for large language models as “money laundering for copyrighted data,” which I find a useful way to think about the appeal of generative-A.I. programs: they let you engage in something like plagiarism, but there’s no guilt associated with it because it’s not clear even to you that you’re copying.

I could quote the whole thing, but I’ll stop with this one:

The task that generative A.I. has been most successful at is lowering our expectations, both of the things we read and of ourselves when we write anything for others to read. It is a fundamentally dehumanizing technology because it treats us as less than what we are: creators and apprehenders of meaning. It reduces the amount of intention in the world.
