Creating Portable Social Networks With Microformats

June 21st, 2008

A presentation from XTech 2008 in Dublin.

Download the slides from this presentation.
Listen to the audio recording of this presentation.

Hello. Fáilte romhabh go léir.

I’m going to talk about, supposedly, creating portable social networks with microformats. It’s a very grandiose title for something that’s actually very, very simple.

Hands up: who was in David Recordon’s keynote yesterday? He had about a two-minute segment there where he showed himself on his own blog having rel="me" in links, and then he showed the OpenSocial thing. That’s pretty much it, really. So, he did it in two minutes and I’ve been given 45. We’ll go into a little more detail, but there really isn’t that much to it. It’s pretty straightforward.

So what this is about is all these different social networks—and there are quite a few of them now. I’m on quite a few social networks myself, and I want to see which ones you guys are on as well. So, a show of hands for any of these:

Pownce, anybody on Pownce? Okay, one or two. Magnolia? Magnolia, a few—like del.icio.us, but with more of the whole social aspect going on. Anybody know a site called Edenbee? No? Social network site for environmental stuff. It’s based out of Dublin, which is why I thought I’d mention it. Upcoming, for events—who uses Upcoming? Okay, quite a few. Last.fm? Music? Okay, good, good, lots of Last.fm-ers. Twitter. Oh, I’m surprised not to see every single hand in the room go up for Twitter. But how about Flickr? Okay, most people are using Flickr.

Alright, so, we’ve got a lot of different social networks here, and on each one, you have to sign up, and you have to enter your details when you sign up—and then you have to go and find your friends on each one, and you have to say, “Yes, I know that person, yes, I want to share my photos with that person, yes, I want to share my music with that person.” And by the fourth or fifth social network, it gets a little bit tiring having to through all this, right? I see some nodding heads—yes, yes, we’re fed up with this.

So, this is one angle of the whole idea of portable social networks: this idea of social network fatigue. But, to be honest, that’s not really such a big problem. It’s kind of something that’s going to affect those of us who use a lot of these social networks, but maybe we’re sort of canaries in the coalmine, and this isn’t really an issue that’s going to affect the average user.

But it’s about much more than that. This isn’t really about the whole social network fatigue thing. It’s about being able to move freely. It’s freedom of travel, the idea that you should be able to seamlessly move between these social networks and not have to deal with the hassles of having to fill in another form or go through another process of finding all your contacts.

And when I talk about social network portability, I’m not talking about, “I’m leaving this social network, I’m taking all my friends with me, and I’m going over to this new social network that everybody’s raving about.” It’s not that kind of portability. It’s more about ease of movement. It’s the idea that I can have as many social networks as I want, and the ease of getting into it and setting it up is nice and simple, and it’s not complicated. So that’s the reason why there’s a lot of different people, very smart people, trying to tackle this problem. It’s really a question of interoperability more than portability.

The ways that people have tried to tackle this problem of trying to make it easy to sign up to a site and then to get all your contacts listed on that site… Well, something that people have been doing for a while is to ask you for your user name and your password—for instance, from a webmail client you might use, like Google Mail or Yahoo Mail or Hotmail.

…Short hiatus while Jeremy is rickrolled by Aral…

Aral just rickrolled me, the bastard!

Audience member: That was antisocial networking.

Jeremy: Yeah. It’s always me! Is everybody familiar with the concept of rickrolling? Okay. Sorry, I just got live-rickrolled, and I think it’s going to be on the Internets in a few minutes.

Where was I? Oh yes, user name and password for third-party sites. So you sign up to a new social network and it says, “Do you use Google Mail, Gmail? Do you use Yahoo Mail? Well then, why don’t you just give us your user name and your password.” Now, crucially here, they’re asking for your Gmail password not on Gmail, they’re asking for it on example.com. They’re asking on the latest social network site that’s probably disemvoweled and ends with the letter “r” without the letter “e” attached.

This is just wrong. This is something that gets referred to as the password anti-pattern, because essentially what you’re doing here is you’re teaching users how to be phished. You’re telling them that it’s okay for you to throw around your password on any site. So instead of saying, “Only ever enter your Gmail password on Gmail,” this is saying, “It’s okay to throw around your Gmail password willy-nilly.” And in the long term, this is really, really harmful.

And it’s just really, really insecure. How do you trust this site? How do you know it’s not going to spam all your friends? How do you know it’s not going to go into your inbox and use that account that is also the same account you use for Google Checkout, if you use Google Checkout.

To get around that, what people have been doing is using APIs, which is a much more secure way of dealing with this issue, where you can authenticate, give authorization to a site like, say, Gmail or Yahoo Mail or whatever. This combination of using an API together with some kind of authentication is much safer and much more secure, and this is why you see this combination of some sort of API and some sort of authentication, like OAuth, or like BBAuth, or AuthSub—there’s all these different kinds of authentication things.

Now, for instance, with Gmail you can do this; there’s a Google Contacts API. So in this case, you say, “Hey, do you have a Gmail account?” They say, “Yes.” “Well, okay, click this button to start importing friends.” And you get sent off to the Google domain—and there you might have to input your user name and password, but that’s an order of magnitude better than doing it on a third-party site. The flow works much better.

OAuth is aimed at making this flow work the same for all these different social networks and sites so that you don’t have to write a different API call for every different social network. And it’s pretty much based on the Flickr model. If you’ve ever used some desktop application that uses the Flickr API, you know that first you have to authorize it, and that involves going to the Flickr site and saying, yes, I give permission to allow this application to look at my photos, or maybe I give it permission to also upload photos. You set the permissions. So this is good. This whole combination of some sort of authentication, like OAuth, and some sort of API works really, really well.

But it is quite complex. That’s not a bad thing for us; we’re all pretty smart developers, and we can implement this kind of stuff. But there is a certain barrier to entry with getting this stuff done. It’s not something you can do overnight.

I would say this is generally the best way if you are trying to get, say, email addresses out of an address book. Whether that really defines who your friends are is another question again. I don’t know how it is for you, but for me, email is no longer really the defining factor of whether somebody is my friend, somebody I know. I’m very good friends with a bunch of people and I don’t even know their email addresses. I might know them on Twitter and Flickr and Last.fm and all these other places, and if I stopped and thought about it and I wanted to write them an email, I’d think, actually, I don’t even know their email addresses. So if you think of email as this way of getting at contacts, then this way works pretty well. But like I said, I’m not even sure that an email address is still an identifier for having a friend.

There’s this other way which kind of complements all these other methods like APIs, which is to use microformats. Microformats are not very full-featured, not a very complex way of transferring information or storing data. It’s really all about being lazy, frankly. There’s a couple of microformats principles, and they’re all pretty much based on being really quite lazy.

Hands up if you’re familiar with microformats. Okay, good, that’s good. I was kind of assuming that I wouldn’t have to go into much detail. But just to give a quick overview of the philosophy behind microformats, and this philosophy of laziness: They’re built on the idea of reusing. At all costs, avoid reinventing the wheel. If somebody’s already solved the problem and there’s some kind of standard out there, steal it. Just take it verbatim, use it. And they’re deliberately simple, they deliberately don’t try and solve every problem. “Avoid boiling the ocean” is one of the principles.

There’s this idea, the Pareto principle—which comes from economics, the Italian economist Pareto—also called the 80/20 principle. I think what he noticed was that 80 percent of the wealth was with 20 percent of the population. These numbers show up in quite a few different places, and the Pareto principle applies to microformats in the sense that if we can hit 80 percent of the use cases with 20 percent of the effort, that’s good enough. Because as soon as you get into that extra 20 percent, the edge cases, the effort required to cover those edge cases gets exponential. So whereas other formats will aim to hit 100 percent of the possible use cases—that you should be able to encode absolutely any possible, conceivable scenario—the formats tend to get kind of complex, because the effort required to design a format that can cover all those scenarios increases exponentially.

With microformats, the idea is, “You know what? There are some cases where this just won’t work.” There’s an event microformat, for instance—hCalendar—and you might have a situation where you say, “Aha, but what about this kind of event, where it starts here and it’s in a leap year on a full moon, and how does the hCalendar microformat cope with that?” Well, it doesn’t. Basically, we’re not even going to try. You’re falling into that 20 percent edge-case stuff, and you’ll need a different format.

Again, it’s kind of being lazy. There’s this whole idea—I think was Joel Spolsky who said that good programmers were lazy programmers because you just always look for the simplest solution, and that’s the sign of a good programmer. That’s definitely the philosophy behind microformats. It’s like, “Aww, you mean we have to work on this stuff? Can’t we just steal somebody else’s work?” Because that’s what we do: just reuse, borrow, steal, build on existing formats. That’s a key thing: to not try and come up with something brand new, to always build on what’s already out there.

Crucially, as well, when deciding what needs formatting, what should be a microformat and what shouldn’t, is look at what people are publishing. Not what we think people should be publishing—we’re not trying to encourage people to publish this kind of data or that kind of data. It’s much more about what kind of data are people publishing anyway that could do with being formatted a bit better.

If you look at the language that people are publishing in, the existing language, that is very much HTML on the World Wide Web. It’s far more popular. Theoretically, you can put a Word document on the Web, a PDF, you can put all sorts of things on the Web. But HTML is the lingua franca of the Web because it’s a good, simple markup language.

There’s this idea that microformats, because they’re in the HTML, people are coming to this idea that because we’re publishing these microformats in the HTML—and HTML on the World Wide Web is kind of like a RESTful interface because you’ve got a URL, and at that URL there’s a Web document which happens to be HTML rather than XML or JSON, but still, it’s an addressable document—could your website be your API? Could you point at a URL and say, “Extract this data.” And to a certain extent, you can with microformats, though it’s very much a read-only API, whereas with a fully-featured API you can read and you can write.

With microformats, it’s kind of possible to build a super-simple, almost dumb, read-only API. Because if you look at what’s being published inside the HTML, you’ll find there’s a bunch of different stuff. On a social network site, there will be a page that has my details. So that data is already being published. This isn’t something we need to encourage social network sites to publish; they’re publishing it already. Who my contacts are on a social network site—that’s already there, that’s already in the markup at a publicly available URL. And then the details of those contacts are also being published in the same way that my details are.

So each one of these is being published, but if they’re just being published in regular HTML, there’s some semantic fidelity being lost because there are no elements in HTML to say this is a human being and this is that human being’s contact. We’ve got a nice set of elements in HTML, but they don’t provide quite that level of semantic fidelity.

Let’s go through these. For my details, somewhere on a social network site—how do we identify a person? To do this, we have the hCard microformat, which is essentially an HTML version of vCard. vCard is the address book format that you’ll find on your desktop in Outlook Express, you’ll find it on your mobile phone in your address book there, you’ll find it anywhere where addresses are used. So that whole idea of being lazy and reusing existing formats very much applied to the creation of hCard, where, when we were deciding what values should be in hCard, it was every single value that’s in vCard—and that’s it, we’re done. Game over.

Let’s say on a social network site, I’ve got a link like this that’s linking off to my profile. This social network site, example.com, has got a profile page for me. My user name on this site is adactio, because my user name is adactio on pretty much every social networking site. I can wrap that with some kind of containing element—in this case it’s a span, but it could be any element, we don’t mandate what elements you need to use—and this is going to be an hCard. The reason it why says vCard rather than hCard is because, as I said, we took all the values from vCard verbatim. Because vCard begins with the root element of “vCard”, it’s always called vCard in hCard. And I say, “This is the formatted name and the nickname, and this is the URL.”

There’s actually a whole lot of optimization going on here, because technically in vCard, you must a formatted name—or “name” is the only required value, I think. There are all these optimization rules, like, if you see “fn” and “nickname” together, that means, “This is the nickname and it’s the formatted name because we don’t have the full name.” And for the URL, it knows to look in the href rather than in the text between the opening and closing tags.

There are just a couple of classes there, and, of course, the great thing about the “class” attribute is the fact that you can space-separate values. And I could be throwing on my own classes there and that would be absolutely fine; I could be putting in non-hCard values: fn, nickname, url, user name, profile—any of my own class names are absolutely fine.

What this does is that it basically says: this is a human being, these are the contact details for a person, and this is that person’s nickname, and this is a URL that represents that person—just with the addition of a few class names.

This is something that I think people sometimes forget with hCard. If you’re familiar with hCard, the usual way it’s sold or pushed is that, well, it’s like vCard, so you put up your contact details and then people can translate it to vCard. There are plug-ins for Firefox, there’s a Technorati service that will convert hCard to vCard. So people wonder sometimes: what’s the point of putting up an hCard that’s only got this much information? Because here you’ll see that there’s no email address, there’s no telephone number, there’s no street address—none of these kind of real contact details are in there. All this really does is identify one piece of string on a page as being a person as opposed to some other piece of string which is just a text.

So, there is value in using really, really simple hCards. If you converted this to a vCard, it wouldn’t be much use. You’d have that in your address book, but you couldn’t call me, you couldn’t email me—it’s not much use. But there is still value in identifying this piece of text as being a human being.

What we’re doing here is essentially making up for the fact that there is no “person” element in HTML. The elements I’ve had to use are things like “span” or “a”—like I say, use whatever elements you want. But the great thing about HTML is that we are provided with our own semantic building blocks, like the “class” attribute, which, as the spec said, is for general-purpose processing by user agents. So it there for you to add your own semantics. It kind of got co-opted by CSS for many years, where people thought that the only reason you ever used classes was for CSS. But that’s not true. There is a CSS class selector, but you can use classes as hooks for Javascript, as hooks for whatever you want. It’s for adding your own semantic data.

On this page on this social network site, it’s linking off to a profile page that represents me. And if we go to that profile page, I’d probably have another hCard there. In this case, the element might be h1, say, because now we’re on the page that is about this person. Like I say, you can use whatever element you want. And I’m saying, this is the URL, and it’s linking off to my other URL, my homepage, my blog or whatever. And here I’m saying “fn”, formatted name—here we’ve actually got my full name rather than just my user name, so there’s more information being added here. And this is the kind of stuff that a lot of social networks are publishing anyway.

So, this is useful; we’re getting a lot of different bits of data about me pulled together. But what’s really nice is to tie together the fact that, okay, this page we’re on, on example.com, this represents me, this is a page about me—but this other site as well, adactio.com, that’s also a URL about me. It’s a URL representing a person. So, by adding one value, rel="me", we make that explicit. We say, that URL is me also.

So, the rel attribute you’re probably mostly familiar with from using in the link element at the top of your pages when you link off to a stylesheet; you’ll say link rel="stylesheet". And what that is doing is saying, this document I’m linking off to is a stylesheet for the current document. So it’s the relationship—the relationship of the linked document is “stylesheet”. Here, what we’re saying is the relationship of the linked document—the value in the href—is “me”. There’s a relationship being established.

Audience question: Why do you need the URL in the class anymore?

Jeremy Keith: What’s happening here is the fn and the URL are for the hCard, and this rel=”me” is part of XFN. So now what we’ve done is that actually we’ve got two different microformats going on. So: fn, URL, hCard, rel="me", XFN. And that’s one of the other nice things about microformats: you can mash them up. You can have an hCard and an hCalendar and XFN. They lend themselves well to being built Lego-style and put together like this. That explains why they’re all there: there’s two different microformats here.

So, XFN is this other microformat. I say it’s a microformat, but it’s actually kind of a proto-microformat in that it existed before microformats did. Microformats have to go through this whole Process before you have a finished microformat. It takes a while and, like I say, it has to fulfil all these criteria, it has to solve a real problem, it has to be using stuff that people are already publishing.

XFN didn’t do any of that. XFN was kind of born fully-featured by a bunch of people who got together—it was Tantek Çelik, Eric Meyer, Matt Mullenweg who got together and said, okay, we want to have a whole bunch of values for defining relationships between people. They did base it on what people were publishing, because around that time—this was a good few years ago—blogs were taking off and getting very popular, and what you’d have a lot in blogs was you’d have a sidebar with a blogroll. It’s always Americans who call it blogroll. Nobody over here seems to call it blogroll because I think it sounds too much like bogroll. There it sounds like logroll, but here it sounds like bogroll.

Anyway, what that means is you’ve now got something for representing the idea of contacts, which was the second part of what we were talking about—the kind of data that’s already being used on social networks.

Let’s say that on my profile page, I might have a list—an unordered list, an ordered list, whatever—of people who are contacts of me. I’ll mark them up as hCards as well—might as well, there’s no harm in there—but I will also add an XFN value, in this case “contact”. So here the relationship is the linked resource has the relationship of being a contact of the current resource.

XFN actually has a whole bunch of values—13, 14 values I think—but really, contact is enough. This is something that Chris Messina talked about recently after we had a panel discussion at South by Southwest, that instead of trying to convince people that you’ve gotta use XFN, you’ve gotta use XFN, it was like, well, actually, you’ve gotta just put these few characters into a link, rel="contact". That’s actually enough.

There’s a whole bunch of XFN values like friend, sweetheart, crush, co-worker, colleague, all these things. But I think for social networks, you don’t need them. I think on a personal site, it’s still good to have that. When I’m writing a blog post and I link off to someone, I like to have the ability to say I’ve met that person and I consider them a friend and they work in the same industry, so they’re a colleague. I think that level of fidelity is nice to have on personal publishing sites, but for social networks it can actually be really dangerous to programmatically try and say, “You are friends with that person.” That’s something that a person should be deciding for themselves. So I kind of agree with Chris that, at least in the case of social networks, rel="contact" is plenty. That’s enough.

If you have a social network that’s, say, a dating site, then maybe you would want to use rel="crush" or rel="sweetheart". But that’s kind of an edge case. Or if you have a professional site like LinkedIn, then there’s a use case for rel="colleague", rel="co-worker", these kinds of things. But, mostly, rel="contact".

So, essentially, out of XFN, two really, really useful values are rel="me" and rel="contact". And that’s it. There are a whole bunch of other values, but you don’t have to worry about it, I would say. For the purpose of building a social network, at least, don’t worry about them.

I’ve probably got a whole bunch of friends like this, and I’m linking to all of them: this is a contact, this person is a contact, and they’d all be marked up as hCards and all that. What happens a lot of the time is that I’ll have so many friends ‘cause I’m so popular that it’ll go on to more than one page. It’ll paginate. This is how it works on Flickr and on Twitter; you’ve got a page full of twenty contacts, click “Next” to see the next twenty and the next twenty and the next twenty. What you can do there is that you can semantically attach some more data here to say, okay, we’re continuing on. So this is another page representing me, and we use the rel="prev" and rel="next" values that have actually been baked into HTML for quite a while. They’d be used more in the “link” element, but there’s no reason why you can’t use them in the “a” element as well. So this works really well for pagination. This is another page representing me, the next page, and the previous page representing me as well.

So, again, this rel="me" thing is very handy. And, like the “class” attribute, the “rel” attribute can take multiple values, space-separated. Very, very useful. So there are few little things in HTML that are incredibly powerful. Incredibly powerful.

Finally, you’ve got their details. And that’s really just the same as what you did for your own details. In the same way as I had my page that was marked up with hCard and XFN, any one of my contacts—in this case, my friend Brian—he’s got his own profile page, and that’s got his hCard on it, and it’s linking off to his URL, and that’s tied up using rel="me". So, if every one of my friends has something like this, that’s quite a large network of friends.

And that’s pretty much it as regards what you need to do to add that extra bit of semantic goodness in there, to make it clear that this is a person, this is a contact of that person, here’s another page of contacts of that person. It’s just using hCard and XFN. Not even a full hCard, a very simple hCard, and not even many XFN values. Two values we’re talking about here, really: “me”, “contact”. “Prev” and “next” aren’t even XFN values, they’re just values. And we’re talking about two attributes of HTML—not even elements, attributes. Not new, custom attributes that we’ve invented; these are attributes that have been in HTML for quite a while.

It isn’t a replacement for an API. I’m not saying you don’t have to worry about making an API. But this makes a nice complement: the fact that, until you get around to building that API, why don’t you just add a few class and rel values and you’ve got a nice, simple, read-only way of allowing people to get at that data.

And generally I would also say: allow people to get at your data in as many formats as possible. So if you are building an API, of course you’ll export in XML, say, but you’ll want to export in JSON as well because developers want to have that format. Well, why not do this as well, so that if that’s what they want, rf they want to get at it in this hCard or XFN format, give them that option too. Basically, you can’t have too many data formats as far as developers are concerned. Give them anything you possibly can.

As regards publishing this stuff, it is that simple. It is that two-minute thing that David showed in his keynote yesterday: rel="me", rel="contact", some hCard values where applicable.

What about parsing this stuff, though? What if you want to consume this information from a social network that is publishing XFN and hCard? That’s tricky, because that’s hard work. Parsing can be tricky, parsing an HTML page—you’d usually have to run it through something like HTML Tidy to get it all cleaned up so you can then parse it as a well-formed document, because most HTML is actually pretty messy out there. Then what’s even harder than parsing is spidering, having to follow all those links, all those rel="next", all those rel="contact". Following all of those, programmatically, that’s actually a lot of work to do.

Fortunately for us, Google have taken care of it for us. This is kind of exactly the thing that Google are good at, because to do this spidering and parsing, you kind of need to have a copy of the World Wide Web stored somewhere on you server—which they happen to have, which is really handy. There’s a video of Brad Fitzpatrick talking about how they did this, and it’s basically, “Well, we took all the links on the World Wide Web and threw away all the ones that didn’t have rel values.” But that’s pretty much what they can do.

So, we can use what they’ve done. They’ve provided an API, the Social Graph API, which, again, David was demonstrating yesterday. And this will spider rel values—it’ll spider rel="me", it’ll spider rel="contact", it’ll spider all that stuff, it’ll spider the previous and next values—and it will send back a JSON file of what it finds. You can also use this to parse FOAF files, and it will follow all the FOAF files. What it does is it actually takes the FOAF files and converts them to XFN, so it always ends up as XFN right before the end anyway.

Let’s say you’ve got a new social network site, and you’re trying to get sign-ups, you’re trying to get people to join your site, and you want to make it as easy as possible. Give them the option to provide a URL somewhere else, and then use this Social Graph API to spider that URL and look for contacts, and look for other URLs that represent this person, and look for contacts over there. And then you’ve got to try and do some matching and figure out if you’ve found a contact for this person. This isn’t a Boolean thing, this isn’t a one/zero kind of thing where you can say, “Oh, that person on that social network is the same person as this person over here on this social network.” We don’t have any kind of unique identifier for this, but you can use some fuzzy logic here, I think.

If you’ve got a formatted name, which is the fn name in hCard, and it’s the same value over here as it is over here, that’s a pretty good chance. Not 100 percent, as I say, there are edge cases. But if there’s a Jeremy Keith on that site and there’s a Jeremy Keith on your site, it could well be the same person.

You’ve got the nickname, because a lot of people use the same user names on a lot of different sites, so that’s something to check out. If you get a combination of the two, that’s looking really good. And actually, URL is something that people identify with a lot, so when you allow people to put in a URL on a website—it’s usually their blog or something—and if you find a match for that, that’s looking pretty good.

None of these are 100 percent, absolute certainty things to say, “Yes, that is definitely the same person as that person over there.” But with a combination of matching this kind of stuff, you could probably make a pretty good guess.

What you don’t get is email, generally, because this isn’t something that’s generally published on the public Web because of the spam we’ve had to deal with for years. So most social networking sites, on your profile page, they will not display your email. And rightly so; that’s information you wouldn’t necessarily want published. So when it comes to parsing this stuff and matching values, you don’t have access to email, usually, whereas you would if you were doing the whole address book API thing.

But I think you can get by pretty well with what you’ve got. And email is often used as a unique identifier, but I’m not even sure that that’s a safe bet to make.

So, what I’m saying here is, basically, you don’t get 100 percent accuracy, you get maybe, about, 80 percent. So we’re kind of back to this 80/20 principle. You’ll probably get about 80 percent of the people where you say, “I think they’re pretty much the same person,” and you might get some edge cases that fall through and it doesn’t match them up, or you get false matches and say, “Ah, that person isn’t actually the same person.” But that’s pretty much it as regards parsing.

To anticipate some of the questions you might have and pre-empt some of them, questions that get asked a lot include: what about trust? How can you believe the assertion that this person is that same person over there on another social network. I’m signing up to your social network and it says, “Give me a URL,” and I put in a URL. How do you know that’s really my URL? I could be putting in your URL or your URL. How do we trust this input?

Well, we can’t. Or at least, that’s not something that microformats can solve. But this question of trust is something that’s true of everything on the Web. How do you trust any piece of information you read on the Web? And it’s different for every person. You can try and solve this programmatically—you’ve got secure certificates and things like that—but generally, this is actually a problem of just publishing on the Web anywhere.

Some people will read a Wikipedia article and they will trust that source because it is on Wikipedia. Other people will read a Wikipedia article and they will not trust that source because it is on Wikipedia. What I’m saying is that it’s different for every person. So you can’t make any assertions about the veracity of a claim just using the format. However, there are other technologies out there that do aim to solve this. So, OpenID aims to solve this one, to say you can authenticate the fact that that person claiming to reside at that URL really does reside at that URL.

By mashing up microformats for the formatting and OpenID for the authentication, then you can be pretty secure. But the whole idea of trust and authentication is not something to be solved at the formatting level. That is something to be solved at the protocol level.

Now, what about walled gardens? Because everything I’ve been talking about has been publicly published information, and I would say that this whole idea of hCard and XFN does work best on public websites. If you are running a walled garden like Facebook—Facebook does not allow much in the way of public access to profile information or friends lists, contact lists, all that kind of stuff—what do you do in that case? Well, I still think there’s no harm in publishing hCard and XFN because it only takes a few seconds to edit a template and put in those values. But, of course, the Social Graph API or other spiders can’t get to that information because you’ve locked it up in a walled garden behind a user name and password.

Again, some kind of mashup with an authentication layer like OpenID or SSL or password authentication would allow access. But generally, for walled gardens, you’re probably going to be relying on a more complex solution to get at that data: an API together with something like OAuth, or we saw the Open Social building-new-widgets thing and doing all that. But generally, for walled gardens, things get more complex. It’s just the way it is. For stuff that’s public, it’s generally straightforward.

So, who is publishing this stuff on the public Web? Well, every single one of those sites that I asked you about earlier on, every single one of those sites is publishing hCard and XFN. And you could plug one of your profiles on those sites into the Social Graph API and get back a whole list of contacts that you could then suck into another website. So there are a lot of people publishing.

Who’s parsing this stuff? Not many. Surprisingly few, considering how relatively straightforward it is now using the Social Graph API. Although the Social Graph API has only been around since the end of February, start of March. So, okay, that’s not very long. It’s not that surprising. Dopplr is doing some nice stuff, where you enter a URL and it tries to find people who are already on Dopplr who are on some social network site over there. Get Satisfaction does something interesting; it doesn’t have a friends list, a contact list or anything like that, but it does have a single-field sign-up which is really nice. The first question—you can fill in the whole form, saying what’s your name, what’s your email address, all this kind of stuff, or you can fill in a form over here which is one field, which is: What’s your URL? And if that URL is encoded as an hCard, it will suck out the hCard data. But it’s not yet doing the whole friends list stuff.

So there is room for some innovation here, for you people to get in there and start doing this stuff and get a leap. So the future is looking good. Like I said, this idea of one form field for sign-up. And actually, David mentioned this yesterday, and I didn’t even realize that Plaxo and Pulse were doing this, that it asks you for an OpenID URL, a URL that represents your OpenID, and while it’s got that URL, it says, “You know what? I’ll just run this through the Social Graph API and see if I find any rel links that I can match against people who are on my social network and see if you’ve got any friends here.”

So it’s this idea that I can come to a new social network site, and instead of filling out a long form with my contact details and then going through the whole process of saying that these are my friends, I could least try at the start to say, look, here’s my URL, you do the work. Don’t make me do the work. You go find who my friends are, you go find my contact information. Again, you won’t get 100 percent accuracy, but you’d get maybe 80 percent and you’d do pretty well.

So what you can do, certainly, is start publishing this stuff. The parsing, like I said, is pretty hard, but certainly publish this stuff, because it’s so easy to add these rel values and it’s so easy to add these class names.

But I would say that the real challenge here—because from a technological point of view, it’s really simple—the real challenge is in design. How do you design this stuff to flow nicely, how do you design it to not be creepy? That you make this flow nice… Some of the things would be: don’t use jargon. Don’t ever mention microformats or hCard or XFN or any of that stuff. People don’t care, and nor should they. They shouldn’t know what the technology is.

And don’t make assumptions. You’ve got an 80 percent match, you think that person is the same person as this person, we’ll make them friends on my network—don’t assume that. Always allow people to explicitly make that connection. So you might provide a list of names with checkboxes and allow people to check or uncheck those names. Don’t assume anything like that.

Should you notify people? When you’ve stored their network from over here, and now when a friend of theirs from over there joins a week later, do you let that person know? Do you send out an email saying, “Hey, your buddy from Flickr just joined our new social networking site!” Or is that creepy? These are kind of design challenges.

And what about allowing people to subscribe to it: rather than a one-time import, store that URL. So you say, “Do you have a URL on some other social network?” “Yeah, here’s my Flickr URL.” Instead of throwing that away once you’ve spidered it for XFN values, hold on to it, and every couple of days, spider it again and see if there’s any new people? In other words, allowing people to subscribe to a contact list somewhere else rather than just import. Again, design challenges, trying not to be creepy, that can be tough.

What I would say is that the technological side of things is super simple. It really is just a couple of attributes. The real challenge here is design. Big “D” design thinking sort of stuff. But that’s why we have designers. They’re going to help us solve this.

So, my website is adactio.com, I’ll post these slides up there and blog about this later. And microformats.org is where you can go to learn more about microformats, but you pretty much got everything you need from that little session there.

Thank you very much.

June 21st, 2008

« Newer Older »