Applying strip_html filter to escaped html will unescape the string #306

Closed
ugenl opened this issue Jul 18, 2024 · 6 comments

Comments

ugenl commented Jul 18, 2024

Encountered this on liqp 0.9, though it may affect earlier versions as well.

Example:
{{ "<em>test</em>" | escape }} --> &lt;em&gt;test&lt;/em&gt;

{{ "<em>test</em>" | escape | strip_html }} --> <em>test</em>

Collaborator
msangel commented Jul 27, 2024

This happens because we use the Jsoup library to strip HTML, while the Ruby implementation is simple and naive:

    STRIP_HTML_BLOCKS = Regexp.union(
      /<script.*?<\/script>/m,
      /<!--.*?-->/m,
      /<style.*?<\/style>/m
    )
    STRIP_HTML_TAGS = /<.*?>/m

    def strip_html(input)
      empty  = ''
      result = input.to_s.gsub(STRIP_HTML_BLOCKS, empty)
      result.gsub!(STRIP_HTML_TAGS, empty)
      result
    end

We should probably switch to the naive implementation too.
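To illustrate why the regex approach does not unescape anything, here is a minimal runnable sketch. The `strip_html` body follows the Ruby snippet quoted above; `escape_html` and `HTML_ESCAPES` are hypothetical stand-ins for the `escape` filter, written here only for the demonstration and not taken from Liquid or liqp.

```ruby
# Patterns copied from the Ruby snippet quoted above.
STRIP_HTML_BLOCKS = Regexp.union(
  /<script.*?<\/script>/m,
  /<!--.*?-->/m,
  /<style.*?<\/style>/m
)
STRIP_HTML_TAGS = /<.*?>/m

def strip_html(input)
  result = input.to_s.gsub(STRIP_HTML_BLOCKS, '')
  result.gsub(STRIP_HTML_TAGS, '')
end

# Hypothetical helper standing in for the `escape` filter (an assumption).
HTML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;'
}.freeze

def escape_html(input)
  input.to_s.gsub(/[&<>"']/, HTML_ESCAPES)
end

escaped = escape_html('<em>test</em>')
puts escaped                       # &lt;em&gt;test&lt;/em&gt;
puts strip_html(escaped)           # &lt;em&gt;test&lt;/em&gt; (no "<" left to match)
puts strip_html('<em>test</em>')   # test
```

Because the escaped string no longer contains a literal `<`, neither pattern matches and the entities pass through untouched, which is the behavior the issue asks for.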

Collaborator
msangel commented Jul 27, 2024

Will it be safer?
No.
Will it be more compatible?
Yes.
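As one possible illustration of the "not safer" part (my own examples, not cases raised in this thread), the tag pattern from the Ruby snippet can be probed with two inputs. The names `TAG_PATTERN` and `strip_tags_naively` are hypothetical; only the regex comes from the snippet above.

```ruby
# Hypothetical names; only the pattern is taken from the quoted Ruby code.
TAG_PATTERN = /<.*?>/m

def strip_tags_naively(input)
  input.to_s.gsub(TAG_PATTERN, '')
end

# An unterminated tag has no closing ">", so nothing matches and it survives.
puts strip_tags_naively('<img src=x onerror=alert(1)')

# Plain text containing "<" and ">" is treated as a single tag and removed.
puts strip_tags_naively('a < b and c > d')
```

The first input comes back unchanged and the second loses the text between the two comparison signs, so the output of the naive filter should not be treated as sanitized HTML.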

Collaborator
msangel commented Jul 27, 2024

Fixed in 0.9.1.0. Side effect: the jsoup dependency has been removed, since it is no longer used. If you relied on it as a transitive dependency in your own project, you must add it back manually:

    <dependency>
      <groupId>org.jsoup</groupId>
      <artifactId>jsoup</artifactId>
      <version>1.15.3</version>
    </dependency>

Within this library, this filter was the only place Jsoup was used.

msangel closed this as completed Jul 27, 2024
Author
ugenl commented Sep 4, 2024

Very much appreciate the fix here! Now that it is in place, though, is there any other way to unescape strings?

Collaborator
msangel commented Sep 4, 2024

@ugenl Probably not. Unescaping is a destructive operation: once it is done, you cannot tell which characters were originally written as escape sequences and which were not.
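A small sketch of that information loss, assuming a hypothetical `unescape_html` helper (not part of liqp or Liquid): two different inputs collapse to the same output, so the original form cannot be recovered afterwards.

```ruby
# Hypothetical helper for illustration; not an API of liqp or Liquid.
UNESCAPES = {
  '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&quot;' => '"', '&#39;' => "'"
}.freeze

def unescape_html(input)
  # Single pass, so an entity produced by a replacement is not decoded again.
  input.to_s.gsub(/&(?:amp|lt|gt|quot|#39);/, UNESCAPES)
end

literal_markup = '<b>bold</b>'
escaped_markup = '&lt;b&gt;bold&lt;/b&gt;'

# Both inputs produce the same string, so the distinction is lost.
puts unescape_html(literal_markup) == unescape_html(escaped_markup)
```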

Author
ugenl commented Sep 4, 2024

Yeah, fair enough. People can substitute directly via the replace filter if necessary anyway. Sounds good.
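For reference, the substitution approach could look roughly like the sketch below, where each `gsub` call plays the role of one `replace` filter in a template chain. The helper name `unescape_by_replace` and the chosen set of entities are assumptions. The order matters: `&amp;` should be replaced last, otherwise `&amp;lt;` would be decoded twice.

```ruby
# Hypothetical helper; each gsub mirrors one `replace` step in a filter chain.
def unescape_by_replace(input)
  input.to_s
       .gsub('&lt;', '<')
       .gsub('&gt;', '>')
       .gsub('&quot;', '"')
       .gsub('&#39;', "'")
       .gsub('&amp;', '&') # last, so "&amp;lt;" becomes "&lt;" and not "<"
end

puts unescape_by_replace('&lt;em&gt;test&lt;/em&gt;')  # <em>test</em>
puts unescape_by_replace('&amp;lt;')                   # &lt;
```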
