Duplicate properties #61

Open
JKingweb opened this issue Jul 5, 2023 · 2 comments

@JKingweb
JKingweb commented Jul 5, 2023

Currently if an element has two or more of the same property, all major parsers add the property multiple times:

<div class="h-test">
  <div class="p-name p-name u-name">W</div>
</div>

{
  "items": [{
    "type": ["h-test"],
    "properties": {
      "name": [
        "W",
        "W",
        "http://example.com/W"
      ]
    }
  }],
  "rels": {},
  "rel-urls": {}
}

While it's good that implementations are consistent with each other, this is unfortunately inconsistent with other aspects of parsing:

Do we want to deduplicate properties in v2 processing? If so, how should collisions between different prefixes be resolved? My implementation originally used a simple ranking, with e- winning over u-, u- over dt-, and dt- over p-, before I realized that other implementations do nothing at all to resolve the duplication.
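As a rough illustration of the ranking idea (not taken from any actual parser), here is a minimal Python sketch. The names `PREFIX_RANK` and `dedupe_by_prefix` are hypothetical; only the ordering e- over u- over dt- over p- comes from the description above, and property-name validation is deliberately left out.

```python
# Hypothetical sketch: one winner per property name on a single element.
# Only the ordering (e- over u- over dt- over p-) comes from the issue text.
PREFIX_RANK = {"e": 3, "u": 2, "dt": 1, "p": 0}

def dedupe_by_prefix(class_names):
    """Return one (prefix, name) pair per property name, keeping the highest-ranked prefix."""
    best = {}  # property name -> winning prefix, in first-seen order
    for cls in class_names:
        prefix, sep, name = cls.partition("-")
        if not sep or not name or prefix not in PREFIX_RANK:
            continue  # not a property class (e.g. "h-test" or an unrelated class)
        current = best.get(name)
        if current is None or PREFIX_RANK[prefix] > PREFIX_RANK[current]:
            best[name] = prefix
    return [(prefix, name) for name, prefix in best.items()]

print(dedupe_by_prefix(["p-name", "p-name", "u-name"]))  # [('u', 'name')]
```

With this ranking, the example element would yield a single `name` value parsed with the u- rules, rather than three values.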

Either way I intend to write tests to cover this.

@gRegorLove
Member

Interesting find. I think this is somewhat intentional since properties can be multi-valued, allowing publishers to put the same property on different elements, e.g. multi-photo posts:

<div class="h-entry">
  <img src="/photo1.jpg" class="u-photo">
  <img src="/photo2.jpg" class="u-photo">
</div>

I think a case could be made for de-duplicating class names on an individual element before following the parsing rules. A case could also be made that keeping the parsed duplicate will help publishers find likely mistakes in their markup. I don't have a strong opinion currently.

De-duplication in the spec could look something like:

'parse a child element class for property class name(s) "p-*,u-*,dt-*,e-*". If any are found, normalize the list of classes, then continue parsing the element'

It would need to be precise about the normalization process:

  • split the classList by space character
  • trim whitespace and newlines around each class name
  • compare class names in a case-sensitive manner and remove duplicates
  • use the normalized classList to continue parsing properties, e.g. "parsing a p- property," etc.

That's just off the top of my head. There's probably an HTML spec to reference for more precision on this.

This would change your example into this before parsing the p-*:

<div class="h-test">
  <div class="p-name u-name">W</div>
</div>
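The normalization steps listed above can be sketched in Python as follows. This is only an illustration of the proposal as worded in this comment, not spec text; the function name `normalize_class_list` is hypothetical.

```python
def normalize_class_list(class_attr):
    """De-duplicate class names case-sensitively, preserving first-seen order."""
    seen = set()
    normalized = []
    for raw in class_attr.split(" "):   # split the class list by space character
        name = raw.strip()              # trim whitespace and newlines
        if not name or name in seen:    # drop empties and case-sensitive duplicates
            continue
        seen.add(name)
        normalized.append(name)
    return normalized

print(" ".join(normalize_class_list("p-name p-name u-name")))  # p-name u-name
```

Note that this only removes exact duplicates, so `p-name` and `u-name` both survive and the property would still be parsed twice, once per prefix.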

@JKingweb
Author
JKingweb commented Jul 5, 2023

> I think this is somewhat intentional since properties can be multi-valued, allowing publishers to put the same property on different elements, e.g. multi-photo posts:

Yes, I meant specifically the same property on the same element, not within the same microformat (where the same property appearing multiple times on different elements is absolutely expected).

> There's probably an HTML spec to reference for more precision on this.

There is, yes. It's not complicated, though it performs no deduplication itself; the salient detail is really the reference to splitting on whitespace, which in turn references the definition of whitespace. Note that the same logic should be used in evaluating link relations (both are DOMTokenLists).
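For illustration, splitting on whitespace in the HTML sense could be sketched in Python as below. The helper name `split_on_ascii_whitespace` is hypothetical; the character set is ASCII whitespace as defined by the WHATWG Infra Standard (TAB, LF, FF, CR, SPACE). As noted, the split itself does no deduplication.

```python
import re

# ASCII whitespace per the WHATWG Infra Standard: TAB, LF, FF, CR, SPACE.
ASCII_WHITESPACE = re.compile(r"[\t\n\f\r ]+")

def split_on_ascii_whitespace(value):
    """Split an attribute value into tokens; duplicates are kept."""
    return [token for token in ASCII_WHITESPACE.split(value) if token]

# The same helper would serve both class and rel attribute values.
print(split_on_ascii_whitespace("p-name\tp-name\nu-name"))  # ['p-name', 'p-name', 'u-name']
print(split_on_ascii_whitespace("  me  nofollow "))         # ['me', 'nofollow']
```

One reason to spell out the character set rather than rely on a language's default splitter: Python's bare `str.split()` also splits on non-ASCII whitespace such as U+00A0, which the HTML definition does not treat as a separator.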

> I don't have a strong opinion currently.

I don't, either, for what it's worth. I do feel it needs to be specified one way or the other, though.
