-
Notifications
You must be signed in to change notification settings - Fork 1k
[localize] Fix "&", "<" and ">" getting replaced with html escape sequences. #5058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
@aomarks any thoughts? @IIIMADDINIII is there some kind of test you can add so this won't regress if it's a good fix? |
🦋 Changeset detectedLatest commit: a366b9b The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
After digging a little deeper i figured out the following: @justinfagnani There is already a test for this, but the expected value included the escape sequences: There also already was a test case for keeping escape sequences in html translations:
This Pull request is changing this, by not removing the escape sequences during extract (copy source string instead of using the parser result). So the escape sequences do not need to be added back by escapeTextContentToEmbedInTemplateLiteral which is flawed. Advantages of this solution:
Disadvantages of this solution:
I would consider this a breaking change, because if some project has existing translations, these would need to be fixed to add the escape sequences which where previously removed. I think this pull request implements the correct behavior, but if the braking change is not desired, It might be a good idea to instead only apply the escaping in escapeTextContentToEmbedInTemplateLiteral for html Templates. |
await checkTransform( | ||
'msg(str`Hello <b>${msg("World", {id: "bar"})}</b>!`, {id: "foo"});', | ||
'`Hola <b>Mundo</b>!`;', | ||
'`Hola <b>Mundo</b>!`;', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test does look more correct to me. We should not be doing HTML escaping of expressions that only contain msg()
calls, string literals, and str
templates.
We do need to be careful about HTML that we emit into html
templates, both because that is a security boundary for Lit (a maliciously crafted html template can execute arbitrary code. this is fine becaus
8000
e html templates are themselves source code).
<trans-unit id="h02c268d9b1fcb031"> | ||
<source><Hello<ph id="0"><b></ph><World & Friends><ph id="1"></b></ph>!></source> | ||
<target><Hola<ph id="0"><b></ph><Mundo & Amigos><ph id="1"></b></ph>!></target> | ||
<source>&lt;Hello<ph id="0"><b></ph>&lt;World &amp; Friends&gt;<ph id="1"></b></ph>!&gt;</source> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks less correct. Why is it double-escaped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR
The original template for this translation is html`<Hello<b><World & Friends></b>!>`
which already includes escape sequences. To write this in an XML file, the &
symbols need to be escaped.
Long Version:
I have the opinion, that localize should not change how I write my HTML in templates. Lets say I want to put the cent Symbol in a temlate (¢). But for some reason i need to escape it. I would write: html`¢`
.
The old Version would convert all escape sequences back, so the ¢
would become a ¢
Symbol. This is then written to the translation file as a ¢
Symbol.
During build it would read the ¢
Symbol and output html`¢`
to the source. So it is not the same as how I wrote the template.
To fix this, we need to preserve the original escape sequences. So instead off writing the ¢
symbol to the translation files we need to write ¢
to the translation file. Doing so will cause the XML serializer to escape the &
Sign to make it valid XML. This results in &cent;
to be written to the translation file. If you open this with an XML viewer you will not See the double escape.
The XML parser will convert the &cent;
back to ¢
while reading the file during build. So localize can See the original content and will emit html`¢`
to the source. This is the behaivor i would expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That explanation sounds reasonable to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me. I'm surprised that no other tests needed changing here, just these .xlf files. Do we not use them in any other parts of the flow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the behavior is mostly the same and most of the test do not include escape sequences, not a lot needed to be changed. If it helps, i could add some tests to check preserving escape sequences.
Regarding the Question: I do not know exactly. My understanding is, that the translation files are updated when extract is run. And when build is run it will do an extract first (without updating translations) to check for missing translations and then read the translations from the files.
Is there anything i need to to to get this merged?? |
Fixes #5012
"&", "<" and ">" do not need to be escaped in Template Literal Strings.
This causes invalid translations when these symbols are used in non HTML Context.
I think it is the code author/translator responsibility to properly escape special characters depending on the context.
lit/localize can not know in which context the strings are used.