authorship: Difference between revisions

From IndieWeb
(explicit theoretical section, shorten a couple of wordpress issues, move to theoretical lacking illustrative examples)
(copyedit)
(24 intermediate revisions by 7 users not shown)
Line 1: Line 1:
'''<dfn>authorship</dfn>''' is an algorithm that determines the author of a [[post]].
'''<dfn>authorship</dfn>''' is both how you indicate who the [[author]] of a [[post]] is, and an algorithm that determines the author of a [[post]].

See the specification to implement the algorithm:
* [[authorship-spec]]


== Why ==
If you write code which consumes [[h-entry]], e.g. your [[CMS]] receives [[Webmention]]s and you parse the source for post information including who the author is, '''authorship''' is how you determine from that information who the author(s) is/are.
You should clearly indicate who is the author of a post so when other sites summarize or reply to your post, they can properly recognize the author(s)!
== How to publish ==
<span id="How_to"></span>


== How to ==
=== How to publish ===
Publishing your authorship of a post is designed to be both easy and flexible to adapt to a variety of publishing methods and designs. Any of the following are fine.


Choose whichever is the least work for you, your site, your theme(s), your code, and easiest for you to maintain!


'''[[posts]]:'''
=== Authorship for individual posts ===
* [[h-entry]] markup on your posts including an explicit <code>p-author</code> or <code>u-author</code> (preferably enhanced with an embedded [[h-card]], e.g. <code>&lt;a class="u-author h-card" href="/">…&lt;/a></code> if your posts have your name, photo etc. already visible)
* OR if you have an h-card in your footer (e.g. from a global template) that links to your homepage, then add a minimal <code>&lt;a class="u-author" href="/">&lt;/a></code> inside your post h-entry.


'''[[streams]] of posts''' (like on an [[archive]], [[homepage]], etc.) without explicit authorship per post:
For individual '''[[posts]]''', (such as a post displayed on its [[permalink]]), you have a few options for how to provide authorship information.
* [[h-feed]] markup including an explicit <code>p-author</code> or <code>u-author</code> (preferably enhanced with an embedded [[h-card]] as described above)
* OR if you have an h-card in your footer (e.g. from a global template) that links to your homepage, then add a minimal <code>&lt;a class="u-author" href="/">&lt;/a></code> inside your h-feed.


'''separate author description pages''' (e.g. when using <code>u-author</code>)
==== Full author information in each post ====
* [[h-card]] with information about you on your author description page (e.g. [[homepage]])
If your posts have your name and photo already visible, then your [[h-entry]] for the post can include a full [[h-card]] with the complete author info for the post, by adding an explicit <code>p-author</code> or <code>u-author</code> property with an embedded [[h-card]], e.g. <code>&lt;a class="u-author h-card" href="/">…&lt;/a></code>.


=== Validate ===
Example:
Try this [[authorship testing tool]] to validate your authorship markup - it will tell you how the authorship algorithm finds your author information on a permalink:
* https://sturdy-backbone.glitch.me/


=== How to determine ===
<pre><nowiki>
How to <span id="Determining">determine</span> authorship of a post on a page - AKA the Authorship [[discovery]] algorithm / processing model for implementations.
<div class="h-entry">
  <a href="/" class="u-author h-card">
    <img src="/me.jpg" class="u-photo">
    <span class="p-name">Author Name</span>
  </a>
  <div class="e-content">This is an example note</div>
</div>
</nowiki></pre>


# start with a particular <code>[[h-entry]]</code> to determine authorship for, and no <code>author</code>. if no <code>h-entry</code>, then there's no post to find authorship for, abort.
==== Author information in a header or footer ====
# parse the <code>[[h-entry]]</code>
If you have an [[h-card]] on your website in a header or footer, you can avoid duplicating the author information inside the [[h-entry]] by establishing a link between the two: add a <code>&lt;a class="u-author" href="/">&lt;/a></code> inside your post h-entry (an invisible link to your home page), and ensure the [[h-card]] in your header or footer has a <code>url</code> property that also links to your home page.
# if the <code>h-entry</code> has an <code>author</code> property, use that
# otherwise if the h-entry has a parent <code>[[h-feed]]</code> with <code>author</code> property, use that
# if an <code>author</code> property was found
## if it has an [[h-card]], use it, exit.
## otherwise if <code>author</code> property is an <code>http(s)</code> URL, let the <var>author-page</var> have that URL
## otherwise use the <code>author</code> property as the author name, exit
# if there is no <var>author-page</var> and the <code>h-entry</code>'s page is a permalink page, then
## if the page has a [[rel-author]] link, let the <var>author-page</var>'s URL be the href of the [[rel-author]] link
##* Note: this is for backcompat [[rel-author]] support, and not recommended for new sites/posts.
# if there is an <var>author-page</var> URL
## get the <var>author-page</var> from that URL and parse it for microformats2
## if <var>author-page</var> has 1+ <code>[[h-card]]</code> with <code>url</code> == <code>uid</code> == <var>author-page</var>'s URL, then use first such <code>h-card</code>, exit.
## else if <var>author-page</var> has 1+ <code>[[h-card]]</code> with <code>url</code> property which matches the <code>href</code> of a [[rel-me]] link on the <var>author-page</var> (perhaps the same hyperlink element as the u-url, though not required to be), use first such <code>h-card</code>, exit.
## if the <code>h-entry</code>'s page has 1+ <code>[[h-card]]</code> with <code>url</code> == <var>author-page</var> URL, use first such <code>h-card</code>, exit.
# otherwise no deterministic author can be found. Implementations are encouraged to document additional heuristics below for consideration for incorporation in to the authorship algorithm.
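The numbered steps above can be sketched in code. The following is a minimal, non-authoritative Python sketch, assuming the page has already been parsed into microformats2 JSON (a dict with <code>items</code> and <code>rels</code>) and that the parser has resolved URLs to absolute form. The names <code>find_author</code>, <code>h_cards</code>, <code>walk</code> and the injected <code>fetch</code> callable are hypothetical and not part of any existing library; URL comparison is a plain string match, which a real implementation may want to normalize.

```python
from urllib.parse import urljoin


def walk(items):
    """Yield every microformat object in parsed mf2 JSON, at any nesting depth."""
    for item in items:
        if not isinstance(item, dict) or "type" not in item:
            continue  # plain strings and e-* values are not microformat objects
        yield item
        for values in item.get("properties", {}).values():
            yield from walk(values)
        yield from walk(item.get("children", []))


def h_cards(parsed):
    """Return all h-cards found anywhere in a parsed page."""
    return [i for i in walk(parsed.get("items", [])) if "h-card" in i["type"]]


def find_author(entry, page, page_url, feed=None, is_permalink=True, fetch=None):
    """Return an h-card dict, a plain author name, or None.

    entry/feed are mf2 objects, page is the parsed page the entry came from,
    and fetch (hypothetical) maps a URL to that URL's parsed mf2 JSON.
    """
    # Steps 3-4: author property on the h-entry, else on its parent h-feed.
    authors = entry.get("properties", {}).get("author")
    if not authors and feed is not None:
        authors = feed.get("properties", {}).get("author")

    author_page = None
    if authors:
        first = authors[0]
        if isinstance(first, dict) and "h-card" in first.get("type", []):
            return first  # step 5.1: embedded h-card
        if isinstance(first, str) and first.startswith(("http://", "https://")):
            author_page = first  # step 5.2: URL of the author page
        elif isinstance(first, str):
            return first  # step 5.3: use the string as the author name

    # Step 6: backcompat rel-author, only on permalink pages.
    if author_page is None and is_permalink:
        rel_author = page.get("rels", {}).get("author", [])
        if rel_author:
            author_page = urljoin(page_url, rel_author[0])
    if author_page is None:
        return None  # step 8: no deterministic author

    # Steps 7.1-7.3: look for a representative h-card on the author page.
    if fetch is not None:
        remote = fetch(author_page)
        cards = h_cards(remote)
        for card in cards:
            props = card.get("properties", {})
            if author_page in props.get("url", []) and author_page in props.get("uid", []):
                return card
        rel_me = set(remote.get("rels", {}).get("me", []))
        for card in cards:
            if rel_me.intersection(card.get("properties", {}).get("url", [])):
                return card

    # Step 7.4: fall back to an h-card on the entry's own page.
    for card in h_cards(page):
        if author_page in card.get("properties", {}).get("url", []):
            return card
    return None
```

Passing <code>fetch=None</code> skips the network steps, so the sketch can be tried against a single parsed page; how to decide <code>is_permalink</code> is left to the caller, since that question is still open.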


Note: the steps of checking for "url == uid == page's URL" and "url that's also a rel-me" were incorporated inline from the steps for [http://microformats.org/wiki/representative-h-card-parsing parsing a representative h-card]. Some improvements have been made here due to feedback from implementations in practice, and those improvements should be incorporated into an iteration of representative h-card.
Example:


==== Questions ====
<pre><nowiki>
* In step 7, "if author-page has 1+ h-card...", can you clarify the meaning of "has"? In my implementation, I was checking for the presence of this h-card by iterating through all top-level "items" on the page and checking those. However, I believe I found an example where this fails. When the home page h-card is actually the "author" property of the top-level h-feed, my consuming code doesn't find this h-card. Is this sentence intended to match this case? If so, clarification on what it means to say a page "has" an h-card would be helpful. Parsing the HTML for h-card will easily find this h-card, but once you're working with the mf2 parsed result, it is buried a little deeper in the data structure. <span class="h-card" style="white-space:nowrap">{{sparkline|https://aaronparecki.com/photo.jpg}} [[User:Aaronparecki.com|Aaron Parecki]]</span> 06:45, 24 February 2017 (PST)
<div class="h-entry">
** "has" like CSS <code>.h-card</code> - <span class="h-card" style="white-space:nowrap">{{sparkline|https://twitter.com/t/profile_image}} [[User:Tantek.com|Tantek Çelik]]</span> 13:02, 26 July 2017 (PDT)
  <div class="e-content">This is an example note</div>
** so, if you're working with the JSON structure, this means you have to iterate through every mf2 object and all levels of nesting to look for an h-card then? <span class="h-card" style="white-space:nowrap">{{sparkline|https://aaronparecki.com/photo.jpg}} [[User:Aaronparecki.com|Aaron Parecki]]</span> 08:43, 21 November 2017 (PST)
  <a href="/" class="u-author"></a>
</div>


* {{sknebel}} 2017-05-13: "if there is no author-page and the h-entry's page is a permalink page": What makes a page a permalink page? url == u-url? only one root h-entry on the page (and no feed)?
...
** url of page == h-entry's u-url seems like a good start at that. <span class="h-card" style="white-space:nowrap">{{sparkline|https://twitter.com/t/profile_image}} [[User:Tantek.com|Tantek Çelik]]</span> 13:02, 26 July 2017 (PDT)
*** I was looking at a post by {{adactio}} today and this would fail, as it does not have a marked up u-url. Neither is the <code>h-entry</code> the only root object as there is also an <code>h-card</code> at the same permalink. See {{citation|url=https://adactio.com/links/14366|title=The State of Fieldset Interoperability - Bocoup}}. --<span class="h-card" style="white-space:nowrap"><a href="https://vanderven.se/martijn/" class="u-url" style="padding-right:0;background:none">{{sparkline|https://vanderven.se/martijn/martijn.jpg}}</a> <span class="p-name">[[User:Vanderven.se_martijn|Martijn van der Ven]]</span></span> 08:39, 25 September 2018 (PDT)


* {{martijnvdven}} 2017-07-16: could this same authorship algorithm be applied to <code>h-feed</code>? [https://chat.indieweb.org/dev/2017-07-16#t1500238350593000 See chat.] Are there any obvious problems with this?
<footer>
** It would start at step 4.
  <a href="/" class="h-card">
** Step 5 is done as normal.
    <img src="/me.jpg" class="u-photo">
** Step 6 would need “the h-entry’s page” changed to “the h-feed’s page”.
    <span class="p-name">Author Name</span>
** Same for step 7.4.
  </a>
** (end algorithm summary)
</footer>
** Seems like a good idea, what are the consuming code use-cases for this? <span class="h-card" style="white-space:nowrap">{{sparkline|https://twitter.com/t/profile_image}} [[User:Tantek.com|Tantek Çelik]]</span> 13:02, 26 July 2017 (PDT)
*** Consuming use-case would be looking for the author of a homepage feed. E.g. who authors <code>https://licit.li/</code>. {{martijnvdven}} 10:35, 12 August 2017 (CEST)


* {{martijnvdven}} 2017-08-09: people are no longer only using h-entry for posts, but h-event as well. Should authorship be extended to also parse h-event?
</nowiki></pre>
** The testing tool https://sturdy-backbone.glitch.me/ supports h-event [https://chat.indieweb.org/dev/2017-08-14/1502739756528000 as of 2017-08-14].


* <span id="list-of-h-entrys-with-h-card"></span> {{aaronpk}} 2018-02-18: Both https://adactio.com and https://miklb.com use a pattern that is not supported by the current authorship algorithm. On those pages, there is a list of h-entrys as well as an h-card at the same level. There is no author property on the h-entrys, so nothing currently links those entries to the h-card. Is this a pattern that should be recognized by the authorship algorithm? Or should we encourage them to change that markup?
(This example has a minimal [[h-card]] but you can include as many details in the h-card as you would like)
** {{t}} I think we should encourage [[h-feed]] markup for such cases because clustering related objects/properties ("list of h-entrys as well as an h-card at the same level") is exactly what a parent object is for. Also worth documenting as an h-feed use-case.
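The first question above, about what it means for a page to "have" an h-card once you are working with the parsed result, can be illustrated with a short sketch. This is a hedged Python example, assuming the usual microformats2 JSON shape (objects with <code>type</code>, <code>properties</code> and optional <code>children</code>); the function name <code>find_all</code> and the sample data are hypothetical. It shows the case described there: an h-card that is the <code>author</code> of a top-level h-feed is missed by a top-level scan but found by a walk over all levels of nesting.

```python
def find_all(items, mf_type="h-card"):
    """Collect objects of mf_type from parsed mf2 JSON at every nesting level."""
    found = []
    for item in items:
        if not isinstance(item, dict) or "type" not in item:
            continue  # plain strings and e-* values carry no microformat
        if mf_type in item["type"]:
            found.append(item)
        # Nested microformats can appear as property values or as children.
        for values in item.get("properties", {}).values():
            found.extend(find_all(values, mf_type))
        found.extend(find_all(item.get("children", []), mf_type))
    return found


# Hypothetical parsed page: the only h-card is the author of the h-feed.
parsed = {
    "items": [
        {
            "type": ["h-feed"],
            "properties": {
                "author": [
                    {
                        "type": ["h-card"],
                        "properties": {
                            "name": ["Author Name"],
                            "url": ["https://example.com/"],
                        },
                        "value": "Author Name",
                    }
                ]
            },
            "children": [
                {"type": ["h-entry"], "properties": {"content": ["This is an example note"]}}
            ],
        }
    ]
}

top_level = [i for i in parsed["items"] if "h-card" in i["type"]]
nested = find_all(parsed["items"])
```

Here <code>top_level</code> is empty while <code>nested</code> contains the feed author's h-card, which matches the "has like CSS <code>.h-card</code>" reading given in the answer.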


==== Test cases ====
==== Authorship on a dedicated page ====
* HTML files for testing out the above Authorship Algorithm (replace raw.github.com with rawgithub.com to serve the pages with text/html):
** https://github.com/sandeepshetty/authorship-test-cases (appears to need updates)


== Issues ==
See [[#Dedicated author page]]
=== Formal spec and issues ===
Authorship is supported by many implementations, which would benefit from a formal specification (for interoperability), test cases, and a more structured issues list.
* perhaps a new repo on github.com/indieweb like https://github.com/indieweb/indieauth


=== Spoofing ===
Authorship can potentially be spoofed, as the current algorithm may only look at the markup within an arbitrary page to determine the author.

For example, any page with markup like http://paste.debian.net/plainh/587c8bb3 would be parsed as being written by [http://waterpigs.co.uk/ Barnaby Walters]. Examples of such spoofing are present at:
* http://waterpigs.co.uk/notes/4QbH5C/ (search for "Whose articles are these"): the comment was not actually written by aaronpk, but posted on a Debian pastebin site and linked via a webmention.
* http://aaronparecki.com/notes/2013/10/12/2/indieweb (search for "How does parsing work"): this is similarly spoofed.

http://checkmention.appspot.com allows you to test receiving a spoofed webmention from Jonathan Ive.

=== Authorship for streams of posts ===
For a page with a '''[[stream]] of posts''' (like on an [[archive]], [[homepage]], etc.), you may want to avoid duplicating authorship information on each individual post.


=== multiple authors ===
==== Include authorship with h-feed markup ====
A lot of steps could be expanded (have discovery steps check for a list of authors, do "enrichment" steps for each of them individually) for dealing with multiple authors. Do we have examples of posts published with multiple authors?
If you have a top-level [[h-feed]], then you can include a <code>u-author</code> property on that feed, either as a full [[h-card]] or referencing an h-card in the header or footer as described above.
* {{sknebel}}: all the steps that turn URLs into h-cards could be run for multiple author-pages. The main question IMHO would be if authors can be discovered from multiple places, or if they have to be in the same place? The first would mean early steps could not exit the algorithm, but would have to check further places for additional authors, so my first instinct would be to not allow this. (Theoretical pattern where this could come up: main author on h-feed level, with a "in collaboration with $Second_Author" on an individual post)


== Theoretical Issues ==
Example:
Issues are moved here when they have no real world examples, or all examples given don't actually illustrate the problem case. Please find real world examples for them.


=== h-card without photo ===
<pre><nowiki>
Warning: theoretical, needs actual real world examples.
<div class="h-feed">
* https://ma.tt/ has no photo on permalinks or homepage - thus not helpful
  <a href="/" class="u-author h-card">
    <img src="/me.jpg">
    Author Name
  </a>


Issue:
  <div class="h-entry">
    <span class="e-content">This is an example note</span>
  </div>
  <div class="h-entry">
    <span class="e-content">This is an example note</span>
  </div>
</div>
</nowiki></pre>


A common practice, especially in WordPress themes (see for example https://ma.tt/), is to have an h-card with only a name and URL, but no photo, on individual h-entrys inside a feed. The IndieWeb plugin for WordPress allows the URL to point to the homepage, but cannot modify the theme to stop it adding a mini hCard. A minimal h-card suggests that consumers should follow the URL inside the h-card to find the photo, assuming the goal is to offer a visual representation of the individual.
(This example has a minimal [[h-card]] but you can include as many details in the h-card as you would like)


* {{sknebel}}: there's also the pattern of having a larger h-card in the sidebar (in WP through the h-card widget). For both, a possible change to the algorithm could take u-uid (or a single u-url) from the mini-card and use it as the author-page in the later stages of the algorithm. For the steps where an additional page is fetched, it'd be interesting how often this actually helps vs how often it doesn't produce a result. This could also benefit from integration into the consuming use case: check if there's actually properties not yet known before going further to find them.
==== Include authorship without h-feed markup ====
If you don't have an [[h-feed]] and want to display your author info in your website header or footer, you will need to include an invisible <code>u-author</code> property in each [[h-entry]] that shares the same <code>url</code> property as your header or footer h-card.


=== WordPress author archives ===
Example:
These author archives have no h-card of their own, and have the same h-entrys with author property h-cards linking back to the author archive.


=== Other theoretical ===
<pre><nowiki>
Theoretical issues are grouped here for capturing purposes. If you find a real world example of one of these, feel free to promote it to an actual issue with its own === subhead.
  <div class="h-entry">
    <span class="e-content">This is an example note</span>
    <a href="/" class="u-author"></a>
  </div>
  <div class="h-entry">
    <span class="e-content">This is an example note</span>
    <a href="/" class="u-author"></a>
  </div>
  ...
  <footer>
    <a href="/" class="h-card">
      <img src="/me.jpg">
      Author Name
    </a>
  </footer>
</nowiki></pre>


* '''string-only author''': What to do if page h-entry has an author property which is not an h-card, but a string? Treat as empty and continue to fallback methods?
=== Dedicated author page ===
** Algorithm updated for handling a string-only author that is a URL. See 5.2.
If you don't want to show any author information on your post permalinks or feed pages, you can create an [[h-card]] on your [[home page]] and link to it from an invisible <code>u-author</code> property in each post or feed.
** This probably isn’t worth discussing much as I can’t find a single example of this in the wild, but as it’s supported in the h-entry spec (“'''optionally''' embedded h-card(s)”) it might be worth considering/specifying for when it does happen --[[User:Waterpigs.co.uk|Barnaby Walters]] 15:26, 17 January 2014 (PST)
** {{aaronpk}}'s new site is now an example in the wild, e.g. https://aaronparecki.com/2016/04/06/15/ [[User:Kylewm.com|Kylewm.com]] 14:04, 7 April 2016 (PDT)
* Step 7.4 looks for <code>h-card</code>s on the current page and will use the first one where the <code>url</code> property matches the author URL. The current page could have many such <code>h-card</code>s when minimal person cards are used for linking to other people (e.g. for [[person-tag]]s), and these <code>h-card</code>s may have been created by someone other than the currently sought after author. They could use a nickname for the person rather than the actual name they would have expected.
** This is an acceptable problem when 7.1–7.3 have all failed, as the algo has tried to get a full name from the author’s own specified page. Moving 7.4 to before 7.1–7.3 [https://chat.indieweb.org/dev/2017-11-12#t1510515347917000 was discussed], which may highlight the trade-off.
*** Checking whether the author page is [https://chat.indieweb.org/dev/2017-11-12/1510516706284000 on a different domain] could help against this. E.g. by not accepting an <code>h-card</code> on the current page if the current page is on a different domain from the author URL.
*** Another option is to use 7.4 as a value [https://chat.indieweb.org/dev/2017-11-12/1510516583872000 until you get around] to parsing the external page as outlined by 7.1–7.3. Then [https://chat.indieweb.org/dev/2017-11-12/1510516569815000 the value will be updated] for its actually specified value but the overhead of the web request can be offloaded.
** Applying the algo to <code>h-entry</code> objects for [[comments]] could potentially make for a bigger chance that a minimal person card has been used previously in source order. E.g. someone commenting on a post they were person tagged in.
** Looking through the current document for <code>h-card</code>s in other places can help out when the document’s mark-up does not allow the <code>h-card</code> to be nested directly under the <code>h-entry</code>, [https://chat.indieweb.org/microformats/2017-11-20/1511137462728000 because of layout/DOM limitations]. See [https://chat.indieweb.org/microformats/2017-11-21#t1511277129777000 a discussion] on the <code>#microformats</code> [[discuss|IRC channel]]. (This time brought on by [[User:Keithjgrant.com|Keith J. Grant]].)
*** Keith also wishes “there was a way to say the mf of element X is associated with the h-* defined by element Y.” [https://chat.indieweb.org/microformats/2017-11-21/1511278502855000] This is basically what is being done by having the <code>u-author</code> property map to any <code>h-card</code> on the page with a matching <code>u-url</code>. In the mapping use-case it makes sense to do so before retrieving external resources.
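The different-domain check proposed above for step 7.4 could look roughly like the following. This is only a sketch of that proposal, not part of the algorithm: the names <code>same_host</code> and <code>local_card_for</code> are hypothetical, the h-cards are assumed to be mf2 JSON objects already collected from the current page, and "same domain" is interpreted here as an exact hostname match, which is an assumption.

```python
from urllib.parse import urlsplit


def same_host(url_a, url_b):
    """True when both URLs have the same hostname (compared case-insensitively)."""
    host_a = urlsplit(url_a).hostname
    host_b = urlsplit(url_b).hostname
    return host_a is not None and host_a == host_b


def local_card_for(author_url, page_url, page_cards):
    """Step 7.4 with the proposed guard: only trust an h-card on the current
    page when that page is on the same host as the author URL."""
    if not same_host(author_url, page_url):
        return None  # e.g. a person-tag card written by someone else
    for card in page_cards:
        if author_url in card.get("properties", {}).get("url", []):
            return card
    return None
```

With this guard, a minimal person-tag h-card for <code>https://example.com/</code> on a page hosted elsewhere is ignored, while the same card on <code>example.com</code> itself is still accepted.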


== Resolved Issues ==
Example permalink:
As [[#Issues|issues]] are resolved, they'll be moved here. Next step is to rewrite their resolutions as [[#FAQ|FAQs]] below.


=== Consider head meta link tags ===
<pre><nowiki>
The algorithm described above relies only on new markup and does not consider the page's head meta and link tags with <tt>rel="author"</tt>. They have been used since HTML 3.2 and are still defined in HTML5:
<div class="h-entry">
* http://www.w3.org/TR/REC-html32#meta
  <div class="e-content">This is an example note</div>
** '''RESOLVED WON'T FIX'''. meta tags have long been abandoned in practice (with few exceptions, not author). Lacking any real world indieweb examples that depend/need this, makes no sense to burden consuming code with unreliable legacy hidden metadata. Better to encourage the less-bad rel=author link to a page with visible author information marked up with h-card. - <span class="h-card" style="white-space:nowrap">{{sparkline|https://twitter.com/t/profile_image}} [[User:Tantek.com|Tantek Çelik]]</span> 12:56, 26 July 2017 (PDT)
  <a href="/" class="u-author"></a>
* http://www.w3.org/TR/html5/links.html#linkTypes
</div>
** '''RESOLVED WORKS ALREADY'''. rel=author was already in algorithm at step 6.1 when issue noted. - <span class="h-card" style="white-space:nowrap">{{sparkline|https://twitter.com/t/profile_image}} [[User:Tantek.com|Tantek Çelik]]</span> 12:56, 26 July 2017 (PDT)
</nowiki></pre>


Previously: this issue was originally named <i><span id="Does_not_implement_standards">Does not implement standards</span></i>. This is an example of a bad name for an issue because it is (1) Negative framing - it complains about what something is not rather than requesting something positive, (2) As a general statement, it is false, as of course the authorship algorithm is based on standards. Left here as an example, please avoid such negative framing overgeneral statements that are provably false, and instead, name your issue as something you want (use-case) but which does not appear to be handled, e.g. in this case, it was renamed to the actual request which was to "Consider head meta link tags".
Home page:


== FAQ ==
<pre><nowiki>
=== What about meta author ===
...
Q: Can the algorithm consider the meta author tag as described in http://www.w3.org/TR/REC-html32#meta ?
<div class="h-card">
  <img src="/me.jpg" class="u-photo">
  <a href="/" class="u-url p-name">Author Name</a>
</div>
...
</nowiki></pre>


A: No. Avoid [[meta]] tags. Old style meta tags (including "author") have largely been abandoned due to data rot and spam, two common results of [[invisible metadata]].
(This example has a minimal [[h-card]] but you can include as many details in the h-card as you would like)


== Use Cases ==
== Validate ==
=== Name avatar display in comments ===
Try this [[authorship testing tool]] to validate your authorship markup - it will tell you how the authorship algorithm finds your author information on a permalink:
In [[comments-presentation]], it describes how a site that accepts indieweb [[reply]] posts via webmention can retrieve those replies and display them as full-fledged comments on a post, including name and icon/avatar of commenter.
* https://sturdy-backbone.glitch.me/


[[h-card]] is the most common way that name, URL, photo information is published about a person on the web. Thus parsing for an author's [[h-card]] to retrieve their name and avatar makes the most sense.
You can use the preview feature of [[Monocle]] to see if a feed or post will look good in a [[reader]]:
* https://monocle.p3k.io/preview


==== Minimal h-cards ====
== How to determine ==
Maybe handle u-uid on h-cards for fallback for representative h-card? My site showed no avatars in comments with a minimal h-card like this: <code>&lt;a class="u-author h-card" href="https://fireburn.ru"&gt;Vika&lt;/a&gt;</code> until I removed h-card class from the markup. -- [[User:Fireburn.ru|Fireburn.ru]] ([[User talk:Fireburn.ru|talk]]) 16:56, 5 December 2018 (PST)
See [[authorship-spec]].


=== Name avatar display in a reader ===
[[Category:building-blocks]]
In a [[reader]] (feature of an IndieWeb site), it's nice to show the name and icon/avatar of the person whose posts you're reading from their indieweb home page h-feed.

Typically this name/icon information is found via the authorship algorithm.

==== Unhandled Examples ====
In some (many?) cases, an indieweb h-feed of h-entry elements does not have explicit author information for a couple of reasons:
* No author information inside each h-entry, because it would be redundant, since all the entries (the entire h-feed) are from the same author.
* No rel=author link, because the page itself is likely the home page and thus the page representing the author.

IndieWeb examples:
* {{t}}'s home page at http://tantek.com/
** tried <code><nowiki><a class="u-author" href="/"></a></nowiki></code> inside <code>h-entry</code> but there doesn't seem to be much interest in handling that. Also am leaning against having to include such seemingly empty / non-visible markup just to work around algorithm limitations.
* ...
==== Fallback to page representative h-card ====
Proposal: we could add one more fallback for when there is no author h-card and no rel-author: use the page being processed as the author-page if no other author page has yet been found.

I.e. change "7. if there is an author-page URL" to "7. if there is no author-page URL, use the page itself as the author-page URL" and then continue processing the rest of the algorithm accordingly.

This would handle the examples from above:
* {{t}}'s home page at http://tantek.com/
* ...
==== Fallback to icon for photo ====
Proposal: if the author doesn't have a photo in their entry h-card, and they also don't have a photo on their representative h-card, but you still want to display a photo as an avatar, how about looking for an [[icon]] in the head of the document?

If a rel="icon" value exists that is using type="image/jpg" (or maybe type="image/png", but *not* type="image/ico"), then that image could be used as an avatar. If a sizes attribute exists, a parser can look for the most appropriate size for display as an avatar.

This could be used as a last resort if every other photo discovery mechanism fails to return an image.


(idea by [http://loic.mathaud.fr/ Loic])
Examples of sites with icons, but without photos in h-cards:
* https://adactio.com/
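A rough sketch of the icon fallback proposed in this section, in Python. It assumes the <code>&lt;link&gt;</code> elements from the document head have already been extracted into dicts with <code>rel</code>, <code>type</code>, <code>sizes</code> and <code>href</code> keys; that input shape, the function names, and the "smallest icon that is at least the wanted size" rule are all assumptions made for illustration. The proposal names <code>image/jpg</code>; the registered type is <code>image/jpeg</code>, so the sketch accepts both.

```python
# Types accepted as avatars; .ico style types are deliberately left out.
ALLOWED_TYPES = {"image/jpeg", "image/jpg", "image/png"}


def icon_sizes(sizes_attr):
    """Parse a sizes attribute such as "32x32 64x64" into edge lengths.
    Unparseable or missing values (including "any") count as size 0."""
    sizes = []
    for token in (sizes_attr or "").lower().split():
        width, _, height = token.partition("x")
        if width.isdigit() and height.isdigit():
            sizes.append(min(int(width), int(height)))
    return sizes or [0]


def pick_avatar_icon(links, wanted=64):
    """Return the href of the best icon for an avatar of `wanted` pixels, or None."""
    candidates = []
    for link in links:
        if "icon" not in link.get("rel", "").lower().split():
            continue
        if link.get("type", "").lower() not in ALLOWED_TYPES:
            continue
        for size in icon_sizes(link.get("sizes")):
            candidates.append((size, link["href"]))
    if not candidates:
        return None
    # Prefer the smallest icon that is still large enough; else the largest one.
    big_enough = [c for c in candidates if c[0] >= wanted]
    if big_enough:
        return min(big_enough)[1]
    return max(candidates)[1]
```

As in the proposal, this would only be consulted as a last resort, after every h-card based photo lookup has failed.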
== Algorithm Design Notes ==
Why do we parse for the authorship details in the order that we do?
First, we prefer the p-author of the h-entry because that is the most direct way of specifying the information, visibly, on the page. There's also established practice among indieweb sites of publishing a mini h-card with photo, name (sometimes as the alt text of the photo img), and URL to the person's indieweb site root / home page. Also, it may be possible that the post is a guest post, in which case we really want the post-specific authorship information rather than anything general to the site.
Only if the post itself lacks direct authorship information do we fall back to checking for a [[rel-author]] link, which is a fairly well established practice for linking from posts to pages representing authors.
On such sites that use rel-author, they almost always point to a page that has a much richer h-card about the author than the post page itself, including a much higher likelihood of having a good photo / avatar image as part of that h-card. Thus we next prefer to go retrieve that rel-author destination, and look for a representative h-card there (per the "url == uid == page's URL" and "url that's also a rel-me" steps noted above).
Only if the rel-author page lacks an h-card do we then fall back to looking for a likely smaller (if present) h-card on the post page itself that has a u-url of the same value as the destination of the rel-author, thus indicating that it is an h-card for the author.
== Implementations ==
=== php mf2 getAuthor ===
[https://github.com/barnabywalters/php-mf-cleaner/blob/master/src/BarnabyWalters/Mf2/Functions.php barnabywalters/mf-cleaner getAuthor()] implements several extra steps whilst missing out the steps above which require fetching another URL; at the moment getAuthor completely lacks side effects:
* If the given h-entry has an author or reviewer (for compat. with h-review if it doesn’t become more consistent with h-entry) property:
** if the found value is a string, search all the h-cards on the page for one with a name property equal to the found value, and return it
* If the found value is a microformat, return it
* Look for page-scoped [[rel-author]], search all the h-cards on the page for one with a url property equal to the first rel-author value; if found, return it
* If a page URL is given, or the h-entry has a url property, search all h-cards on the page for one where the domain of their url property is the same as the domain of the found url; if found, return that
* Otherwise return the first found h-card on the page, or null
=== Converspace ===
* First pass at the authorship algo: [https://github.com/converspace/converspace/blob/795a70965ede4cdbdacdb5858d3eb3863bd46d20/webmention.php#L66 get_h_card()] (announcement: http://www.sandeep.io/116)
** Should be extracting it into an independent lib soon -[[User:Www.sandeep.io|Www.sandeep.io]] 01:23, 24 July 2013 (PDT)
=== XRay ===
* The [[XRay]] library/API implements the authorship algorithm
** Tests: https://github.com/aaronpk/XRay/tree/master/tests and test data: https://github.com/aaronpk/XRay/tree/master/tests/data/author.example.com
=== ProcessWire Webmention ===
* The [[ProcessWire]] Webmention plugin handles the authorship algorithm as of 2016-04-10 with the exception of step 4
** {{gRegor}} runs this on gregorlove.com. I'm skipping step 4 for now because it's undecided what the desired behavior would be if an h-feed page sent a webmention.
=== Dobrado ===
* [[dobrado]] implements the authorship algorithm in its Post module ReceiveWebmention function: https://gitlab.com/dobrado/dobrado/blob/master/install/Post.php#L841
=== SimplePie ===
* SimplePie implements the authorship algorithm in its Parser class: https://github.com/simplepie/simplepie/blob/master/library/SimplePie/Parser.php#L440
=== mf-obj ===
* Implements authorship in getEntryFromUrl(): https://github.com/notenoughneon/mf-obj#getentryfromurl
== Brainstorming ==
=== feed authorship ===
* See: https://chat.indieweb.org/dev/2018-11-07#t1541618658454100
* XRay issues: https://github.com/aaronpk/XRay/issues/79
{{gRegor}}: Here's my attempt at finding the author of a feed. Step 2 is based on [https://github.com/aaronpk/XRay/commit/8043ba575f203d7fdea249877c83c52d3d86b47c#diff-cd791fa54f76e837af01ffee8ff5efc6 this update to XRay].
Parse the page for microformats2 and start with no author.
#if an h-feed microformat is found, then
## if the h-feed has an author property, use that
## if an author property was found
### if it has an h-card, use it, exit
### otherwise if the author property is an http(s) URL, let the <var>author-page</var> have that URL
### otherwise use the author property as the author name, exit
## if there is no <var>author-page</var>, then
### if the page has a rel-author link, let the <var>author-page</var> have that URL
## if there is an <var>author-page</var> URL
### get the <var>author-page</var> from that URL and parse it for microformats2
### if <var>author-page</var> has 1+ h-card with url == uid == <var>author-page</var>'s URL, then use first such h-card, exit.
### else if <var>author-page</var> has 1+ h-card with url property which matches the href of a rel-me link on the <var>author-page</var> (perhaps the same hyperlink element as the u-url, though not required to be), use first such h-card, exit.
### if the h-feed's page has 1+ h-card with url == <var>author-page</var> URL, use first such h-card, exit.
## otherwise no deterministic author can be found
# Otherwise if the page has a list of 2+ h-* and exactly one of them is an <code>h-card</code>, then use that <code>h-card</code>, exit
## otherwise no deterministic author can be found
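A rough Python sketch of the brainstormed steps above, to make the control flow easier to follow. It is not XRay's code and not a settled algorithm. Assumptions: <code>parsed</code> is the usual microformats2 JSON result (<code>items</code> and <code>rels</code>), <code>fetch_parsed</code> is a hypothetical callable that fetches a URL and returns the same structure, and URLs are compared as exact strings without normalization.

```python
def h_cards_in(items):
    """Return all h-cards in a parsed mf2 item list, including nested ones."""
    cards = []
    for item in items:
        if not isinstance(item, dict):
            continue
        if "h-card" in item.get("type", []):
            cards.append(item)
        for values in item.get("properties", {}).values():
            cards.extend(h_cards_in(values))
        cards.extend(h_cards_in(item.get("children", [])))
    return cards


def prop(item, name):
    """All values of one property of a parsed item (empty list if absent)."""
    return item.get("properties", {}).get(name, [])


def feed_author(parsed, fetch_parsed):
    items = parsed.get("items", [])
    feed = next((i for i in items if "h-feed" in i.get("type", [])), None)
    if feed is None:
        # No h-feed: accept a lone h-card among 2+ top-level items
        cards = [i for i in items if "h-card" in i.get("type", [])]
        return cards[0] if len(items) >= 2 and len(cards) == 1 else None

    author_page = None
    authors = prop(feed, "author")
    if authors:
        author = authors[0]
        if isinstance(author, dict) and "h-card" in author.get("type", []):
            return author  # embedded h-card: done
        if isinstance(author, str) and author.startswith(("http://", "https://")):
            author_page = author
        elif isinstance(author, str):
            # Plain text author: wrap the name in a minimal h-card (sketch's choice)
            return {"type": ["h-card"], "properties": {"name": [author]}}
    if author_page is None:
        rel_author = parsed.get("rels", {}).get("author", [])
        author_page = rel_author[0] if rel_author else None
    if author_page is None:
        return None  # no deterministic author

    remote = fetch_parsed(author_page)
    remote_cards = h_cards_in(remote.get("items", []))
    for card in remote_cards:  # url == uid == author-page URL
        if author_page in prop(card, "url") and author_page in prop(card, "uid"):
            return card
    rel_me = set(remote.get("rels", {}).get("me", []))
    for card in remote_cards:  # url matches a rel-me link on the author page
        if rel_me.intersection(u for u in prop(card, "url") if isinstance(u, str)):
            return card
    for card in h_cards_in(items):  # h-card on the feed's own page
        if author_page in prop(card, "url"):
            return card
    return None
```

Passing the fetcher in as a parameter keeps the sketch independent of any particular HTTP client or microformats parser, and makes it easy to exercise with canned data.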
----
== Silo Implementations ==
=== none currently ===
== Past Implementations ==
=== Google Search ===
'''Support dropped 2014-08-28[http://searchengineland.com/goodbye-google-authorship-201975]'''
[[Google]]'s search spider supported only part of the authorship algorithm, rel=author, in an oddly [[silo]]-specific way:
* if there's a <code>rel=author</code> link on a page
** if it links to a [[Google+]] profile, use that profile for authorship information
** if it links to a [[home page]]
*** if the home page has a <code>rel=me</code> link to a Google+ profile
*** and the Google+ profile has a <code>rel=contributor-to</code> link back to the home page
*** then use that profile for authorship information
[[File:googleplus-authorship.png]]
[[Category:building-blocks]]
== See Also ==
* [[authorship-spec]]
* [[discovery]]
* [[posts]]
* [[h-card]]
* [[rel-author]]
* https://sturdy-backbone.glitch.me/ – a testing tool by {{sknebel}} to check whether your authorship markup is working
* Why: Because heuristics will get it wrong, sometimes hilariously: https://twitter.com/AlexanderRKlotz/status/1252389218514427909
** "Google Scholar has parsed this cafeteria lunch menu as an author list, and it's delightful" [https://twitter.com/AlexanderRKlotz @AlexanderRKlotz] April 21, 2020
* https://twitter.com/thornet/status/1475924019308371978
** "“The meaning of authorship depends heavily on how we draw the boundaries of our selfhood.”<br><br>Nice line from its footnotes." [http://michellethorne.cc/ @thornet] December 28, 2021

Revision as of 12:38, 6 January 2023

authorship is how to indicate who the author is for a post, and an algorithm that determines the author of a post.

See the specification to implement the algorithm: authorship-spec

Why

You should clearly indicate who is the author of a post so when other sites summarize or reply to your post, they can properly recognize the author(s)!

How to publish

Publishing your authorship of a post is designed to be both easy and flexible to adapt to a variety of publishing methods and designs. Any of the following are fine.

Choose whichever is the least work for you, your site, your theme(s), your code, and easiest for you to maintain!

Authorship for individual posts

For individual posts (such as a post displayed on its permalink), you have a few options for how to provide authorship information.

Full author information in each post

If your posts have your name and photo already visible, then your h-entry for the post can include a full h-card with the complete author info for the post, by adding an explicit p-author or u-author property with an embedded h-card, e.g. <a class="u-author h-card" href="/">…</a>

Example:

<div class="h-entry">
  <a href="/" class="u-author h-card">
    <img src="/me.jpg" class="u-photo"> 
    <span class="p-name">Author Name</span>
  </a>
  <div class="e-content">This is an example note</div>
</div>

Author information in a header or footer

If you have an h-card on your website in a header or footer, you can avoid duplicating the author information inside the h-entry by establishing a link between the two: add a <a class="u-author" href="/"></a> inside your post h-entry (an invisible link to your home page), and ensure the h-card in your header or footer has a url property that also links to your home page.

Example:

<div class="h-entry">
  <div class="e-content">This is an example note</div>
  <a href="/" class="u-author"></a>
</div>

...

<footer>
  <a href="/" class="h-card">
    <img src="/me.jpg" class="u-photo"> 
    <span class="p-name">Author Name</span>
  </a>
</footer>

(This example has a minimal h-card but you can include as many details in the h-card as you would like)
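If you want a quick sanity check that each bare u-author link really points at an h-card on the same page, something like the following Python sketch may help. It is deliberately not a microformats parser: it only looks at <code>a</code> elements, assumes the h-card's URL comes from the <code>href</code> of an <code>a class="h-card"</code> root as in the example above, and the names <code>AuthorLinkCollector</code> and <code>unmatched_author_links</code> are made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AuthorLinkCollector(HTMLParser):
    """Collect absolute hrefs of u-author links and of <a class="h-card"> roots."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.author_links = set()
        self.card_links = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if tag != "a" or href is None:
            return
        classes = (attrs.get("class") or "").split()
        url = urljoin(self.base_url, href)  # resolve relative links like "/"
        if "h-card" in classes:
            # Also covers class="u-author h-card", which carries its own h-card
            self.card_links.add(url)
        elif "u-author" in classes:
            self.author_links.add(url)


def unmatched_author_links(html, base_url):
    """Return u-author URLs that no <a class="h-card"> on the page links to."""
    collector = AuthorLinkCollector(base_url)
    collector.feed(html)
    return collector.author_links - collector.card_links
```

An empty result means every bare u-author link on the page has a matching h-card link. An h-card built as a <code>div</code> with a nested u-url link is not recognized by this sketch.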

Authorship on a dedicated page

See #Dedicated author page


Authorship for streams of posts

For a page with a stream of posts (like on an archive, homepage, etc.), you may want to avoid duplicating authorship information on each individual post.

Include authorship with h-feed markup

If you have a top-level h-feed, then you can include a u-author property on that feed, either as a full h-card or referencing an h-card in the header or footer as described above.

Example:

<div class="h-feed">
  <a href="/" class="u-author h-card">
    <img src="/me.jpg"> 
    Author Name
  </a>

  <div class="h-entry">
    <span class="e-content">This is an example note</span>
  </div>
  <div class="h-entry">
    <span class="e-content">This is an example note</span>
  </div>
</div>

(This example has a minimal h-card but you can include as many details in the h-card as you would like)

Include authorship without h-feed markup

If you don't have an h-feed and want to display your author info in your website header or footer, you will need to include an invisible u-author property in each h-entry, linking to the same URL as the url property of your header or footer h-card.

Example:

  <div class="h-entry">
    <span class="e-content">This is an example note</span>
    <a href="/" class="u-author"></a>
  </div>
  <div class="h-entry">
    <span class="e-content">This is an example note</span>
    <a href="/" class="u-author"></a>
  </div>
  ...
  <footer>
    <a href="/" class="h-card">
      <img src="/me.jpg">
      Author Name
    </a>
  </footer>

Dedicated author page

If you don't want to show any author information on your post permalinks or feed pages, you can create an h-card on your home page and link to it from an invisible u-author property in each post or feed.

Example permalink:

<div class="h-entry">
  <div class="e-content">This is an example note</div>
  <a href="/" class="u-author"></a>
</div>

Home page:

...
<div class="h-card">
  <img src="/me.jpg" class="u-photo">
  <a href="/" class="u-url p-name">Author Name</a>
</div>
...

(This example has a minimal h-card but you can include as many details in the h-card as you would like)

Validate

Try this authorship testing tool to validate your authorship markup; it will tell you how the authorship algorithm finds your author information on a permalink: https://sturdy-backbone.glitch.me/

You can use the preview feature of Monocle to see if a feed or post will look good in a reader:

How to determine

See authorship-spec.

See Also