bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS

bluemonday

bluemonday is an HTML sanitizer implemented in Go. It is fast and highly configurable.

bluemonday takes untrusted user generated content as an input, and will return HTML that has been sanitised against a whitelist of approved HTML elements and attributes so that you can safely include the content in your web page.

If you accept user generated content, and your server uses Go, you need bluemonday.

The default policy for user generated content (bluemonday.UGCPolicy().Sanitize()) turns this:

Hello <STYLE>.XSS{background-image:url("javascript:alert('XSS')");}</STYLE><A CLASS=XSS></A>World

Into a harmless:

Hello World

And it turns this:

<a href="javascript:alert('XSS1')" onmouseover="alert('XSS2')">XSS<a>

Into this:

XSS

Whilst still allowing this:

<a href="http://www.google.com/">
  <img src="https://ssl.gstatic.com/accounts/ui/logo_2x.png"/>
</a>

To pass through mostly unaltered (it gained a rel="nofollow" which is a good thing for user generated content):

<a href="http://www.google.com/" rel="nofollow">
  <img src="https://ssl.gstatic.com/accounts/ui/logo_2x.png"/>
</a>

It protects sites from XSS attacks. There are many vectors for an XSS attack and the best way to mitigate the risk is to sanitize user input against a known safe list of HTML elements and attributes.

You should always run bluemonday after any other processing.

If you use blackfriday or Pandoc then bluemonday should be run after these steps. This ensures that no insecure HTML is introduced later in your process.
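
For example, a typical markdown pipeline renders first and sanitizes last (a minimal sketch assuming blackfriday v2; the same ordering applies to any renderer):

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
	blackfriday "gopkg.in/russross/blackfriday.v2"
)

func main() {
	// Render the markdown to HTML first...
	unsafe := blackfriday.Run([]byte("Hello <script>alert('XSS')</script> *World*"))

	// ...then sanitize the rendered HTML as the final step
	html := bluemonday.UGCPolicy().SanitizeBytes(unsafe)
	fmt.Println(string(html))
}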

bluemonday is heavily inspired by both the OWASP Java HTML Sanitizer and the HTML Purifier.

Technical Summary

Whitelist based: you either build a policy describing the HTML elements and attributes to permit (including the regexp patterns that attribute values must match), or use one of the supplied policies representing good defaults.

The policy containing the whitelist is applied using a fast non-validating, forward only, token-based parser implemented in the Go net/html library by the core Go team.

We expect to be supplied with well-formatted HTML (closing elements for every applicable open element, nested correctly) and so we do not focus on repairing badly nested or incomplete HTML. We focus on simply ensuring that whatever elements do exist are described in the policy whitelist and that attributes and links are safe for use on your web page. GIGO does apply and if you feed it bad HTML bluemonday is not tasked with figuring out how to make it good again.

Supported Go Versions

bluemonday is tested against Go 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 1.10, 1.11, 1.12, and tip.

We do not support Go 1.0 as we depend on golang.org/x/net/html which includes a reference to io.ErrNoProgress which did not exist in Go 1.0.

We support Go 1.1 but Travis no longer tests against it.

Is it production ready?

Yes

We are using bluemonday in production having migrated from the widely used and heavily field tested OWASP Java HTML Sanitizer.

We are passing our extensive test suite (including AntiSamy tests as well as tests for any issues raised). Check for any unresolved issues to see whether anything may be a blocker for you.

We invite pull requests and issues to help us ensure we are offering comprehensive protection against various attacks via user generated content.

Usage

Install in your ${GOPATH} using go get -u github.com/microcosm-cc/bluemonday

Then call it:

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// Do this once for each unique policy, and use the policy for the life of the program
	// Policy creation/editing is not safe to use in multiple goroutines
	p := bluemonday.UGCPolicy()

	// The policy can then be used to sanitize lots of input and it is safe to use the policy in multiple goroutines
	html := p.Sanitize(
		`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
	)

	// Output:
	// <a href="http://www.google.com" rel="nofollow">Google</a>
	fmt.Println(html)
}

We offer three ways to call Sanitize:

p.Sanitize(string) string
p.SanitizeBytes([]byte) []byte
p.SanitizeReader(io.Reader) bytes.Buffer

If you are obsessed with performance, p.SanitizeReader(r).Bytes() will return a []byte without performing any unnecessary casting of the inputs or outputs. Though the difference is so negligible you should never need to care.
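
For instance, sanitizing straight from an io.Reader (a brief sketch; p is a policy as in the example above, and fmt and strings are imported):

// Sanitize directly from an io.Reader such as a request body or file
buf := p.SanitizeReader(strings.NewReader(
	`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
))
fmt.Println(buf.String())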

You can build your own policies:

package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.NewPolicy()

	// Require URLs to be parseable by net/url.Parse and either:
	//   mailto: http:// or https://
	p.AllowStandardURLs()

	// We only allow <p> and <a href="">
	p.AllowAttrs("href").OnElements("a")
	p.AllowElements("p")

	html := p.Sanitize(
		`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
	)

	// Output:
	// <a href="http://www.google.com">Google</a>
	fmt.Println(html)
}

We ship two default policies:

  1. bluemonday.StrictPolicy() which can be thought of as equivalent to stripping all HTML elements and their attributes as it has nothing on its whitelist. An example usage scenario would be blog post titles where HTML tags are not expected at all and if they are then the elements and the content of the elements should be stripped. This is a very strict policy.
  2. bluemonday.UGCPolicy() which allows a broad selection of HTML elements and attributes that are safe for user generated content. Note that this policy does not whitelist iframes, object, embed, styles, script, etc. An example usage scenario would be blog post bodies where a variety of formatting is expected along with the potential for TABLEs and IMGs.
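
To illustrate the difference between the two (a short sketch; expected output shown in the comments):

input := `Hello <b>World</b>`

// StrictPolicy has an empty whitelist, so the tags are stripped and only
// the text content remains:
//   Hello World
fmt.Println(bluemonday.StrictPolicy().Sanitize(input))

// UGCPolicy keeps safe formatting elements such as <b>:
//   Hello <b>World</b>
fmt.Println(bluemonday.UGCPolicy().Sanitize(input))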

Policy Building

The essence of building a policy is to determine which HTML elements and attributes are considered safe for your scenario. OWASP provide an XSS prevention cheat sheet to help explain the risks, but essentially:

  1. Avoid anything other than the standard HTML elements
  2. Avoid script, style, iframe, object, embed, base elements that allow code to be executed by the client or third party content to be included that can execute code
  3. Avoid anything other than plain HTML attributes with values matched to a regexp

Basically, you should be able to describe what HTML is fine for your scenario. If you do not have confidence that you can describe your policy please consider using one of the shipped policies such as bluemonday.UGCPolicy().

To create a new policy:

p := bluemonday.NewPolicy()

To add elements to a policy either add just the elements:

p.AllowElements("b", "strong")

Or using a regex:

Note: if an element is added by name as shown above, any matching regex will be ignored

It is also recommended to ensure multiple patterns don't overlap as order of execution is not guaranteed and can result in some rules being missed.

p.AllowElementsMatching(regexp.MustCompile(`^my-element-`))

Or add elements by virtue of adding an attribute:

// Not the recommended pattern, see the recommendation on using .Matching() below
p.AllowAttrs("nowrap").OnElements("td", "th")

Again, this also supports a regex pattern match alternative:

p.AllowAttrs("nowrap").OnElementsMatching(regex.MustCompile(`^my-element-`))

Attributes can either be added to all elements:

p.AllowAttrs("dir").Matching(regexp.MustCompile("(?i)rtl|ltr")).Globally()

Or attributes can be added to specific elements:

// Not the recommended pattern, see the recommendation on using .Matching() below
p.AllowAttrs("value").OnElements("li")

It is always recommended that an attribute be made to match a pattern. XSS in HTML attributes is very easy otherwise:

// \p{L} matches unicode letters, \p{N} matches unicode numbers
p.AllowAttrs("title").Matching(regexp.MustCompile(`[\p{L}\p{N}\s\-_',:\[\]!\./\\\(\)&]*`)).Globally()

You can stop at any time and call .Sanitize():

// string htmlIn passed in from a HTTP POST
htmlOut := p.Sanitize(htmlIn)

And you can take any existing policy and extend it:

p := bluemonday.UGCPolicy()
p.AllowElements("fieldset", "select", "option")

Inline CSS

Although it's possible to handle inline CSS using AllowAttrs with a Matching rule, writing a single monolithic regular expression to safely process all inline CSS which you wish to allow is not a trivial task. Instead of attempting to do so, you can whitelist the style attribute on whichever element(s) you desire and use style policies to control and sanitize inline styles.

It is suggested that you use Matching (with a suitable regular expression), MatchingEnum, or MatchingHandler to ensure each style matches your needs, but default handlers are supplied for most widely used styles.

Similar to attributes, you can allow specific CSS properties to be set inline:

p.AllowAttrs("style").OnElements("span", "p")
// Allow the 'color' property with valid RGB(A) hex values only (on any element allowed a 'style' attribute)
p.AllowStyles("color").Matching(regexp.MustCompile("(?i)^#([0-9a-f]{3,4}|[0-9a-f]{6}|[0-9a-f]{8})$")).Globally()

Additionally, you can allow a CSS property to be set only to an allowed value:

p.AllowAttrs("style").OnElements("span", "p")
// Allow the 'text-decoration' property to be set to 'underline', 'line-through' or 'none'
// on 'span' elements only
p.AllowStyles("text-decoration").MatchingEnum("underline", "line-through", "none").OnElements("span")

Or you can specify elements based on a regex pattern match:

p.AllowAttrs("style").OnElementsMatching(regex.MustCompile(`^my-element-`))
// Allow the 'text-decoration' property to be set to 'underline', 'line-through' or 'none'
// on 'span' elements only
p.AllowStyles("text-decoration").MatchingEnum("underline", "line-through", "none").OnElementsMatching(regex.MustCompile(`^my-element-`))

If you need more specific checking, you can create a handler that takes a string and returns a bool to validate the value of a given property. The string passed to the handler has already been converted to lowercase and had its Unicode code points converted.

myHandler := func(value string) bool {
	return true
}
p.AllowAttrs("style").OnElements("span", "p")
// Allow the 'color' property with values validated by the handler (on any element allowed a 'style' attribute)
p.AllowStyles("color").MatchingHandler(myHandler).Globally()

Links

Links are difficult beasts to sanitise safely and also one of the biggest attack vectors for malicious content.

It is possible to do this:

p.AllowAttrs("href").Matching(regexp.MustCompile(`(?i)mailto|https?`)).OnElements("a")

But that will not protect you: the regular expression is insufficient in this case to prevent a malformed value from doing something unexpected.
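
A short sketch shows the problem: the pattern is unanchored, so any href merely containing "http" or "mailto" satisfies it, including a javascript: URL:

re := regexp.MustCompile(`(?i)mailto|https?`)

// Prints true, because the value contains "https" even though the scheme is javascript:
fmt.Println(re.MatchString(`javascript:alert('https://example.com')`))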

We provide some additional global options for safely working with links.

RequireParseableURLs will ensure that URLs are parseable by Go's net/url package:

p.RequireParseableURLs(true)

If you have enabled parseable URLs then the following option will AllowRelativeURLs. By default this is disabled (bluemonday is a whitelist tool... you need to explicitly tell us to permit things) and when disabled it will prevent all local and scheme relative URLs (i.e. href="localpage.html", href="../home.html" and even href="//www.google.com" are relative):

p.AllowRelativeURLs(true)

If you have enabled parseable URLs then you can whitelist the schemes (commonly called protocol when thinking of http and https) that are permitted. Bear in mind that allowing relative URLs in the above option will allow for a blank scheme:

p.AllowURLSchemes("mailto", "http", "https")

Regardless of whether you have enabled parseable URLs, you can force all URLs to have a rel="nofollow" attribute. This will be added if it does not exist, but only when the href is valid:

// This applies to "a" "area" "link" elements that have a "href" attribute
p.RequireNoFollowOnLinks(true)

Similarly, you can force all URLs to have "noreferrer" in their rel attribute.

// This applies to "a" "area" "link" elements that have a "href" attribute
p.RequireNoReferrerOnLinks(true)

We provide a convenience method that applies all of the above, but you will still need to whitelist the linkable elements for the URL rules to be applied to:

p.AllowStandardURLs()
p.AllowAttrs("cite").OnElements("blockquote", "q")
p.AllowAttrs("href").OnElements("a", "area")
p.AllowAttrs("src").OnElements("img")

An additional complexity regarding links is the data URI as defined in RFC2397. The data URI allows for images to be served inline using this format:

<img src="data:image/webp;base64,UklGRh4AAABXRUJQVlA4TBEAAAAvAAAAAAfQ//73v/+BiOh/AAA=">

We have provided a helper to verify the mime type and base64 content of data URI links:

p.AllowDataURIImages()

That helper will enable GIF, JPEG, PNG and WEBP images.

It should be noted that there is a potential security risk with the use of data URI links. You should only enable data URI links if you already trust the content.
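
A minimal example enabling data URI images (a sketch; the payload is the WEBP data URI shown above):

p := bluemonday.UGCPolicy()
p.AllowDataURIImages()

// The img element and its data URI src are preserved
html := p.Sanitize(
	`<img src="data:image/webp;base64,UklGRh4AAABXRUJQVlA4TBEAAAAvAAAAAAfQ//73v/+BiOh/AAA=">`,
)
fmt.Println(html)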

We also have some features to help deal with user generated content:

p.AddTargetBlankToFullyQualifiedLinks(true)

This will ensure that anchor <a href="" /> links that are fully qualified (the href destination includes a host name) will get target="_blank" added to them.

Additionally, any link that has target="_blank" after the policy has been applied will also have its rel attribute adjusted to add noopener. This means a link may start as <a href="//host/path"/> and end up as <a href="//host/path" rel="noopener" target="_blank">. The addition of noopener is a security feature, not a bug: browsers allow a window opened via target="_blank" to control the opener (your web page), and noopener protects against that. The background to this can be found here: https://dev.to/ben/the-targetblank-vulnerability-by-example
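
For example (a sketch; the exact attribute ordering in the output may vary):

p := bluemonday.UGCPolicy()
p.AddTargetBlankToFullyQualifiedLinks(true)

// The fully qualified link gains target="_blank", and noopener is added
// to the rel attribute alongside the nofollow that UGCPolicy already requires
fmt.Println(p.Sanitize(`<a href="https://www.google.com/">Google</a>`))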

Policy Building Helpers

We also bundle some helpers to simplify policy building:

// Permits the "dir", "id", "lang", "title" attributes globally
p.AllowStandardAttributes()

// Permits the "img" element and its standard attributes
p.AllowImages()

// Permits ordered and unordered lists, and also definition lists
p.AllowLists()

// Permits HTML tables and all applicable elements and non-styling attributes
p.AllowTables()

Invalid Instructions

The following are invalid:

// This does not say where the attributes are allowed, you need to add
// .Globally() or .OnElements(...)
// This will be ignored without error.
p.AllowAttrs("value")

// This does not say where the attributes are allowed, you need to add
// .Globally() or .OnElements(...)
// This will be ignored without error.
p.AllowAttrs(
	"type",
).Matching(
	regexp.MustCompile("(?i)^(circle|disc|square|a|A|i|I|1)$"),
)

Both examples exhibit the same issue, they declare attributes but do not then specify whether they are whitelisted globally or only on specific elements (and which elements). Attributes belong to one or more elements, and the policy needs to declare this.
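
The corrected forms declare where the attribute is permitted, for example (a sketch; the chosen elements are illustrative):

// Allow the "value" attribute, but only on "li" elements
p.AllowAttrs("value").OnElements("li")

// Allow the "type" attribute, constrained to the listed values, on list elements
p.AllowAttrs(
	"type",
).Matching(
	regexp.MustCompile("(?i)^(circle|disc|square|a|A|i|I|1)$"),
).OnElements("ol", "ul", "li")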

Limitations

Full CSS sanitization is not included: only inline "style" attribute values can be sanitised, using the style policies described above. Unless you use those policies (or wish to do the heavy lifting in a single regular expression, which is inadvisable), you should not allow the "style" attribute anywhere.

It is not the job of bluemonday to fix your bad HTML, it is merely the job of bluemonday to prevent malicious HTML getting through. If you have mismatched HTML elements, or non-conforming nesting of elements, those will remain. But if you have well-structured HTML bluemonday will not break it.

TODO

  • Investigate whether devs want to blacklist elements and attributes. This would allow devs to take an existing policy (such as the bluemonday.UGCPolicy() ) that encapsulates 90% of what they're looking for but does more than they need, and to remove the extra things they do not want to make it 100% what they want
  • Investigate whether devs want a validating HTML mode, in which the HTML elements are not just transformed into a balanced tree (every start tag has a closing tag at the correct depth) but also that elements and character data appear only in their allowed context (i.e. that a table element isn't a descendent of a caption, that colgroup, thead, tbody, tfoot and tr are permitted, and that character data is not permitted)

Development

If you have cloned this repo you will probably need the dependency:

go get golang.org/x/net/html

Gophers can use their familiar tools:

go build

go test

I personally use a Makefile as it spares typing the same args over and over whilst providing consistency for those of us who jump from language to language and enjoy just typing make in a project directory and watching magic happen.

make will build, vet, test and install the library.

make clean will remove the library from a single ${GOPATH}/pkg directory tree

make test will run the tests

make cover will run the tests and open a browser window with the coverage report

make lint will run golint (install via go get github.com/golang/lint/golint)

Long term goals

  1. Open the code to adversarial peer review similar to the Attack Review Ground Rules
  2. Raise funds and pay for an external security review

Comments
  • Inline Images get stripped

    Using this content:

    <img src="data:image/gif;base64,R0lGODdhEAAQAMwAAPj7+FmhUYjNfGuxYY
        DJdYTIeanOpT+DOTuANXi/bGOrWj6CONzv2sPjv2CmV1unU4zPgISg6DJnJ3ImTh8Mtbs00aNP1CZSGy0YqLEn47RgXW8amasW
        7XWsmmvX2iuXiwAAAAAEAAQAAAFVyAgjmRpnihqGCkpDQPbGkNUOFk6DZqgHCNGg2T4QAQBoIiRSAwBE4VA4FACKgkB5NGReAS
        FZEmxsQ0whPDi9BiACYQAInXhwOUtgCUQoORFCGt/g4QAIQA7">
    

    and this policy:

    	// Define a policy, we are using the UGC policy as a base.
    	p := bluemonday.UGCPolicy()
    
    	// Allow images to be embedded via data-uri
    	p.AllowDataURIImages()
    

I get an empty string back... what's wrong with my policy?

    cheers max

  • apostrophes get turned into HTML entities - take 2

    I realize this was asked before, but I would like to renew the discussion for my use case.

    I am getting data that is being entered into an html form by a user, and that data will be saved in a database eventually and redisplayed later. Obviously such data needs to be sanitized, so I turned to bluemonday. If a user enters an apostrophe in the data, the apostrophe is coming through to my application in the form data as an apostrophe, so GO is not converting it. When I run it through bluemonday, it gets converted to an html entity.

    bluemonday states that it is designed for sanitizing html that will be displayed as html, so I understand why this is correct behavior from bluemonday's perspective.

    Soo..., I am asking for a new sanitizer policy that allows bluemonday to be used as a basic text sanitizer for my scenario, where I am not trying to have the end result be html, but still I want the goal that XSS attacks are cleaned out. I imagine it could be a simple process of running the result through UnescapeString, which I will do for now, but you guys are the experts and might have other thoughts as to why this may or may not be adequate.

  • AllowElements iframe doesn't work

    Hey

I've tried to add iframe to the whitelist, but it still sanitizes iframes.

    Example code:

    package main
    
    import (
    	"github.com/microcosm-cc/bluemonday"
    	"fmt"
    )
    
    func main() {
    	raw := "<iframe></iframe>"
    	p := bluemonday.NewPolicy()
    	p.AllowElements("iframe")
    	 res := p.Sanitize(raw)
    	 if res != raw {
    	 	fmt.Printf("got: %s\n", res)
    	 } else {
    	 	fmt.Println("happy, happy, joy, joy!")
    	}
    }
    

    Any help will be awesome.

    Thanks, Jonathan

  • Prevent escaping special characters

    Hello,

    We have this snippet:

    package main
    
    import (
            "fmt"
    
            "github.com/microcosm-cc/bluemonday"
    )
    
    func main() {
            p := bluemonday.UGCPolicy()
            html := p.Sanitize(
                    `"Hello world!"  <script>alert(document.cookie)</script>`,
            )
    
            // Output:
            // &#34;Hello world!&#34;
            fmt.Println(html)
    }
    

    Which produces the following output:

    &#34;Hello world!&#34;  
    

    We'd like to prevent this script from escaping the special characters like ", is there any way we can tell bluemonday to not escape special chars by default?

  • Custom handlers

What about adding custom handlers for HTML elements? Without one, the user is forced to parse the HTML a second time themselves. A custom handler would also allow more complicated, content-dependent sanitizing.

For example, emails: emails are stored as-is but displayed sanitized, and to display embedded images the content-id ("cid:url") needs to be replaced with a real URL.

    p.SetCustomElementHandler(
        func(token html.Token) bluemonday.HandlerResult {
            for i := range token.Attr {
                // possible image locations
                if token.Attr[i].Key == "src" || token.Attr[i].Key == "background" {
                    cid := token.Attr[i].Val // get content-id
                    url := GetUrlFromCid(cid)
                    token.Attr[i].Val = url
                }
            }
    
            return bluemonday.HandlerResult{
                Token:         token,
                SkipContent:   false,
                SkipTag:       false,
                DoNotSanitize: false,
            }
        },
    )
    

Or blog posts: an editor can produce HTML with special values in custom attributes (e.g. x-my-attribute), and while saving to the database the backend can process these attributes during sanitizing without needing to parse the HTML again -> a speedup.

I suggest this syntax for custom handlers: it receives an html.Token and returns a struct with the modified token and flags:

• SkipContent - skip the content of the current HTML element only
• SkipTag - skip the closing tag of the current HTML element only
• DoNotSanitize - do not apply sanitizing rules to this HTML element only

These flags allow sanitizing with more complex rules than a regexp can handle. For example, Go's regexp does not support lookahead.

  • Force html attribute to specific values

    It would be useful to be able to forcibly add an attribute to elements. You have similar functionality with the rel="nofollow" on links. Something like:

    Policy.AllowAttrs("rel").Force("nofollow").OnElements("a")
    

    or:

    Policy.ForceAttrs("rel").Value("nofollow").OnElements("a")
    
  • Feature Request: Ability to filter URLs on a finer grained level.

    Suppose I would like to allow using data URI scheme for image urls. For example:

    <img src="data:image/png;base64,iVBORw0KGgoAAAANS...K5CYII=">
    

    Currently, I can achieve that by doing:

    p := bluemonday.UGCPolicy()
    p.AllowURLSchemes("data")
    

    However, that will allow all kinds of things, including "data:text/javascript;charset=utf-8,alert('hi');" or other unexpected values.

    What I'd like to do is be able to filter on a finer level, similarly to what's possible with attributes and elements.

    For example, I would imagine an API something like this:

    p.RequireBase64().AllowMimeTypes("image/png", "image/jpeg").OnURLSchemes("data")
    

And it would make sure to filter out anything that is not one of those two mime types, is not base64, or contains a charset, and would check that the payload is valid base64 encoding (i.e. it doesn't contain other characters, and there is no query and no fragment).

    What are your thoughts on this proposal?

  • When a self-closing iframe is present, content afterward does not get sanitized.

    package main
    
    import (
    	"fmt"
    	"html"
    
    	"github.com/microcosm-cc/bluemonday"
    )
    
    func main() {
    	input := `<iframe /><script type="text/javascript">doBadStuff();</script>`
    
    	fmt.Println(html.UnescapeString(bluemonday.UGCPolicy().Sanitize(input)))
    	// output:
    	// <script type="text/javascript">doBadStuff();</script>
    }
    
  • How to allow custom elements using a regex

I need to be able to whitelist elements based upon a regex pattern. The reason for this is because I have lots of web components, e.g. {namespace}-my-element.

I would like to whitelist anything containing the {namespace} pattern.

    Can this be achieved easily?

  • Allow all body/head/title and only do xss removal

What's the easiest way to allow all default/basic HTML tags? In particular body/head/title are always removed, even if I allow them:

        p.AllowElements("html", "head", "title")
    

    looking for a quick&dirty xss remover

  • golang.org/x/net/html is obsoleted.

Your package imports golang.org/x/net/html. This URL is obsolete, which leads to

    unrecognized import path "golang.org/x/net/html" 
    

Maybe you need to change it to https://github.com/golang/net

  • Way to skip html escaping code blocks?

I have a use case where I take user input, apply the strict policy to escape any HTML (all input is considered plain text), run it through a markdown parser and then through a custom bluemonday policy to strip any HTML tags from the markdown-generated code that I do not want to support.

    Now what I need is to tell bluemonday to NOT escape input into html entities when it is being wrapped by ``` or ` because it will be rendered by the markdown parser into syntax-highlighted and <pre> or <code> wrapped blocks.

    Right now it seems that I have to insert one step after the strict BM policy and the MD parser and unescape these blocks manually.

  • Paragraph sanitization (e.g. img.alt) is too restrictive, disallows punctuation

    This regexp is used to validate alt text of images. It disallows common punctuation, which causes issues when alt text is copied from news articles or source code listings for example. The result is alt attribute being dropped, rendering the image inaccessible to vision impaired people. And the text author is unlikely to even notice the issue, as visually the result seems just fine.

    Subset of common symbols (some used in non-English languages) currently forbidden by this regular expression: "„“”‘’«»#$§%‰&*+±–—:;=?‽¡¿@{}|~…°®™.

    I’m not sure I understand the purpose of restricting to a specific character set here, as opposed to properly escaping special characters (which I believe bluemonday does automatically). Is the concern that the contents of the alt or title attribute might be taken as the HTML source of some pop-up? Wouldn’t it make more sense to blacklist only angle brackets then?

  • Test case not sanitising

    I've found a test case that does not sanitize correctly. I've done a preliminary investigation to see if I could contribute a fix, but it doesn't seem like a simple case.

The golang html package is providing the html.Attribute as key="src", val="onmouseover="alert('xxs')"".

    {
      in:              `<IMG SRC= onmouseover="alert('xxs')">`,
      expected: ``,
    },
    

    Here is the output

            input   : <IMG SRC= onmouseover="alert('xxs')">
            output  : <img src="onmouseover=%22alert%28%27xxs%27%29%22">
            expected: 
    

Happy to try to contribute a fix, but I'm a bit short of ideas. I contemplated trying to re-parse attribute values to identify any nested attributes due to this type of input, but I'm not sure how I'd go about re-parsing just attributes; it doesn't seem like something supported in the html package?

  • Add callback function before parsing the attributes of an element

    Add callback function before parsing the attributes of an element. It can add/modify/remove attributes.

    If the callback returns nil or empty array of html attributes then the attributes will not be included in the output.

  • Filter external resources

    Sometimes it's desirable to disallow external resources (<img>, background: url(…), etc), to prevent sanitized HTML from "calling home" (triggering HTTP requests, e.g. using pixel images for tracking purposes). For instance a webmail might want to do this.

    Would you be interested in adding an API to validate external resources?

  • Add <style> support to allowStyles API
