Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance.

Last update: Dec 26, 2022


Overview

sanitize-html

sanitize-html provides a simple HTML sanitizer with a clear API.

sanitize-html is tolerant. It is well suited for cleaning up HTML fragments such as those created by CKEditor and other rich text editors. It is especially handy for removing unwanted CSS when copying and pasting from Word.

sanitize-html allows you to specify the tags you want to permit, and the permitted attributes for each of those tags.

If a tag is not permitted, the contents of the tag are not discarded. There are some exceptions to this, discussed below in the "Discarding the entire contents of a disallowed tag" section.

The syntax of poorly closed p and img elements is cleaned up.

href attributes are validated to ensure they only contain http, https, ftp, mailto and tel URLs (the default allowedSchemes). Relative URLs are also allowed. Ditto for src attributes.

Allowing only particular URLs as the src of an iframe tag, by filtering on hostnames or domains, is also supported.

HTML comments are not preserved.

Requirements

sanitize-html is intended for use with Node.js and supports Node 10+. All of its npm dependencies are pure JavaScript. sanitize-html is built on the excellent htmlparser2 module.

Regarding TypeScript

sanitize-html is not written in TypeScript and there is no plan to directly support it. There is, however, a community-supported set of type definitions, @types/sanitize-html. Any questions or problems with those definitions should be directed to their maintainers, following that project's contribution guidelines.

How to use

Browser

Think first: why do you want to use it in the browser? Remember, servers must never trust browsers. You can't sanitize HTML for saving on the server anywhere else but on the server.

But, perhaps you'd like to display sanitized HTML immediately in the browser for preview. Or ask the browser to do the sanitization work on every page load. You can if you want to!

Install the package via npm (or yarn):

npm install sanitize-html # yarn add sanitize-html

The primary change in the 2.x version of sanitize-html is that it no longer includes a build that is ready for browser use. Developers are expected to include sanitize-html in their project builds (e.g., webpack) as they would any other dependency. So while sanitize-html is no longer ready to link to directly in HTML, developers can now more easily process it according to their needs.

Once built and linked in the browser with other project JavaScript, it can be used to sanitize HTML strings in front-end code:

import sanitizeHtml from 'sanitize-html';

const html = "<strong>hello world</strong>";
console.log(sanitizeHtml(html));
console.log(sanitizeHtml("<img src=x onerror=alert('img') />"));
console.log(sanitizeHtml("console.log('hello world')"));
console.log(sanitizeHtml("<script>alert('hello world')</script>"));

Node (Recommended)

Install the module from the console:

npm install sanitize-html

Import the module:

// In ES modules
import sanitizeHtml from 'sanitize-html';

// Or in CommonJS
const sanitizeHtml = require('sanitize-html');

Use it in your JavaScript app:

const dirty = 'some really tacky HTML';
const clean = sanitizeHtml(dirty);

That will allow our default list of allowed tags and attributes through. It's a nice set, but probably not quite what you want. So:

// Allow only a super restricted set of tags and attributes
const clean = sanitizeHtml(dirty, {
  allowedTags: [ 'b', 'i', 'em', 'strong', 'a' ],
  allowedAttributes: {
    'a': [ 'href' ]
  },
  allowedIframeHostnames: ['www.youtube.com']
});

Boom!

Default options

allowedTags: [
  "address", "article", "aside", "footer", "header", "h1", "h2", "h3", "h4",
  "h5", "h6", "hgroup", "main", "nav", "section", "blockquote", "dd", "div",
  "dl", "dt", "figcaption", "figure", "hr", "li", "main", "ol", "p", "pre",
  "ul", "a", "abbr", "b", "bdi", "bdo", "br", "cite", "code", "data", "dfn",
  "em", "i", "kbd", "mark", "q", "rb", "rp", "rt", "rtc", "ruby", "s", "samp",
  "small", "span", "strong", "sub", "sup", "time", "u", "var", "wbr", "caption",
  "col", "colgroup", "table", "tbody", "td", "tfoot", "th", "thead", "tr"
],
disallowedTagsMode: 'discard',
allowedAttributes: {
  a: [ 'href', 'name', 'target' ],
  // We don't currently allow img itself by default, but this
  // would make sense if we did. You could add srcset here,
  // and if you do the URL is checked for safety
  img: [ 'src' ]
},
// Lots of these won't come up by default because we don't allow them
selfClosing: [ 'img', 'br', 'hr', 'area', 'base', 'basefont', 'input', 'link', 'meta' ],
// URL schemes we permit
allowedSchemes: [ 'http', 'https', 'ftp', 'mailto', 'tel' ],
allowedSchemesByTag: {},
allowedSchemesAppliedToAttributes: [ 'href', 'src', 'cite' ],
allowProtocolRelative: true,
enforceHtmlBoundary: false

Common use cases

"I like your set but I want to add one more tag. Is there a convenient way?"

Sure:

const clean = sanitizeHtml(dirty, {
  allowedTags: sanitizeHtml.defaults.allowedTags.concat([ 'img' ])
});

If you do not specify allowedTags or allowedAttributes, our default list is applied. So if you really want an empty list, specify one.

"What if I want to allow all tags or all attributes?"

Simple! Instead of leaving allowedTags or allowedAttributes out of the options, set either one or both to false:

allowedTags: false,
allowedAttributes: false

"What if I don't want to allow any tags?"

Also simple! Set allowedTags to [] and allowedAttributes to {}.

allowedTags: [],
allowedAttributes: {}

"What if I want disallowed tags to be escaped rather than discarded?"

If you set disallowedTagsMode to discard (the default), disallowed tags are discarded. Any text content or subtags are still included, depending on whether the individual subtags are allowed.

If you set disallowedTagsMode to escape, the disallowed tags are escaped rather than discarded. Any text or subtags are handled normally.

If you set disallowedTagsMode to recursiveEscape, the disallowed tags are escaped rather than discarded, and the same treatment is applied to all subtags, whether otherwise allowed or not.

"What if I want to allow only specific values on some attributes?"

When configuring the attribute in allowedAttributes, simply use an object with the attribute name and an array of allowed values. In the following example, sandbox="allow-forms allow-modals allow-orientation-lock allow-pointer-lock allow-popups allow-popups-to-escape-sandbox allow-scripts" would become sandbox="allow-popups allow-scripts":

allowedAttributes: {
  iframe: [
    {
      name: 'sandbox',
      multiple: true,
      values: ['allow-popups', 'allow-same-origin', 'allow-scripts']
    }
  ]
}

With multiple: true, several allowed values may appear in the same attribute, separated by spaces. Otherwise the attribute must exactly match one and only one of the allowed values.
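To make the multiple semantics concrete, here is a small standalone sketch. filterMultipleValues and filterSingleValue are hypothetical helpers written for illustration, not part of sanitize-html's API, and splitting on any run of whitespace is an assumption; they only mirror the behavior described above.

```javascript
// Hypothetical helpers for illustration; not sanitize-html's API.

// multiple: true -> keep each whitespace-separated token that appears in the allowed list.
function filterMultipleValues(value, allowedValues) {
  return value
    .split(/\s+/)
    .filter((token) => allowedValues.includes(token))
    .join(' ');
}

// multiple omitted -> the whole value must equal exactly one allowed value, else it is rejected (null).
function filterSingleValue(value, allowedValues) {
  return allowedValues.includes(value) ? value : null;
}

const allowed = ['allow-popups', 'allow-same-origin', 'allow-scripts'];
const input = 'allow-forms allow-modals allow-orientation-lock allow-pointer-lock ' +
  'allow-popups allow-popups-to-escape-sandbox allow-scripts';

console.log(filterMultipleValues(input, allowed)); // allow-popups allow-scripts
console.log(filterSingleValue(input, allowed));    // null
```

Note that allow-popups-to-escape-sandbox is dropped even though it starts with an allowed value: tokens are compared whole, not by prefix.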

Wildcards for attributes

You can use the * wildcard to allow all attributes with a certain prefix:

allowedAttributes: {
  a: [ 'href', 'data-*' ]
}

You can also use * as the name of a tag, to allow the listed attributes on any tag:

allowedAttributes: {
  '*': [ 'href', 'align', 'alt', 'center', 'bgcolor' ]
}
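The two wildcard forms can be pictured with the following sketch. isAttributeAllowed is a hypothetical function, not part of sanitize-html; it handles only string patterns with a trailing *, and it assumes that both a tag's own list and the '*' list are consulted.

```javascript
// Hypothetical matcher for illustration only; sanitize-html's internal matching may differ.
// Assumption: the patterns for the tag and the patterns under '*' are both consulted.
function isAttributeAllowed(tag, attrName, allowedAttributes) {
  const patterns = [
    ...(allowedAttributes[tag] || []),
    ...(allowedAttributes['*'] || [])
  ];
  return patterns.some((pattern) =>
    pattern.endsWith('*')
      ? attrName.startsWith(pattern.slice(0, -1)) // 'data-*' matches any name starting with 'data-'
      : attrName === pattern
  );
}

const allowedAttributes = {
  a: ['href', 'data-*'],
  '*': ['align']
};

console.log(isAttributeAllowed('a', 'data-id', allowedAttributes)); // true
console.log(isAttributeAllowed('a', 'onclick', allowedAttributes)); // false
console.log(isAttributeAllowed('p', 'align', allowedAttributes));   // true
console.log(isAttributeAllowed('p', 'data-id', allowedAttributes)); // false
```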

Additional options

Allowed CSS Classes

If you wish to allow specific CSS classes on a particular element, you can do so with the allowedClasses option. Any other CSS classes are discarded.

This implies that the class attribute is allowed on that element.

// Allow only a restricted set of CSS classes and only on the p tag
const clean = sanitizeHtml(dirty, {
  allowedTags: [ 'p', 'em', 'strong' ],
  allowedClasses: {
    'p': [ 'fancy', 'simple' ]
  }
});

Similar to allowedAttributes, you can use * as a tag name, to allow listed classes to be valid for any tag:

allowedClasses: {
  '*': [ 'fancy', 'simple' ]
}

Allowed CSS Styles

If you wish to allow specific CSS styles on a particular element, you can do that with the allowedStyles option. Declare each permitted CSS property with an array of regular expressions for its allowed values, either under a tag name or under the global (*) entry. Specific elements inherit the allowed properties from the global (*) entry. Any other CSS properties are discarded.

You must also use allowedAttributes to activate the style attribute for the relevant elements. Otherwise this feature will never come into play.

When constructing regular expressions, don't forget ^ and $. It's not enough to say "the string should contain this." It must also say "and only this."

URLs in inline styles are NOT filtered by any mechanism other than your regular expression.
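A quick way to see why anchoring matters is to test an unanchored and an anchored pattern against the same value. The background-image use case and the example.com and evil.test hosts are made-up placeholders; the point is only that /pattern/ accepts any value that merely contains a match, while /^pattern$/ requires the whole value to match.

```javascript
// Intended rule: only allow background images hosted on example.com (a placeholder domain).
const unanchored = /https:\/\/example\.com\//;
const anchored = /^url\(https:\/\/example\.com\/[\w\/.-]+\)$/;

const hostile = 'url(https://evil.test/pixel.png?https://example.com/)';
const legit = 'url(https://example.com/img/bg.png)';

console.log(unanchored.test(hostile)); // true: the pattern appears somewhere in the value
console.log(anchored.test(hostile));   // false: the whole value must fit the pattern
console.log(anchored.test(legit));     // true
```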

const clean = sanitizeHtml(dirty, {
  allowedTags: ['p'],
  allowedAttributes: {
    'p': ["style"],
  },
  allowedStyles: {
    '*': {
      // Match HEX and RGB
      'color': [/^#(0x)?[0-9a-f]+$/i, /^rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)$/],
      'text-align': [/^left$/, /^right$/, /^center$/],
      // Match any number with px, em, or %
      'font-size': [/^\d+(?:px|em|%)$/]
    },
    'p': {
      'font-size': [/^\d+rem$/]
    }
  }
});
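The interaction between the '*' entry and a tag-specific entry can be sketched as below. isStyleAllowed is hypothetical, and treating a tag's regular expressions as added to the global ones for the same property is our assumption about how the inheritance works; it is not the library's code.

```javascript
// Hypothetical check, loosely modeled on the allowedStyles description above.
// Assumption: a tag's own regexes are combined with the '*' regexes for the same property.
function isStyleAllowed(tag, property, value, allowedStyles) {
  const globalRules = (allowedStyles['*'] || {})[property] || [];
  const tagRules = (allowedStyles[tag] || {})[property] || [];
  return [...globalRules, ...tagRules].some((re) => re.test(value));
}

const allowedStyles = {
  '*': {
    'text-align': [/^left$/, /^right$/, /^center$/],
    'font-size': [/^\d+(?:px|em|%)$/]
  },
  p: {
    'font-size': [/^\d+rem$/]
  }
};

console.log(isStyleAllowed('p', 'font-size', '2rem', allowedStyles));      // true (p-specific rule)
console.log(isStyleAllowed('p', 'font-size', '12px', allowedStyles));      // true (inherited from '*')
console.log(isStyleAllowed('span', 'font-size', '2rem', allowedStyles));   // false
console.log(isStyleAllowed('p', 'text-align', 'justify', allowedStyles));  // false
```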

Discarding text outside of `<html></html>` tags

Some text editing applications generate HTML to allow copying over to a web application. These can sometimes include undesirable control characters after the terminating html tag. By default, sanitize-html will not discard these characters, instead returning them in the sanitized string. This behaviour can be modified using the enforceHtmlBoundary option.

Setting this option to true will instruct sanitize-html to discard all characters outside of html tag boundaries -- before <html> and after </html> tags.

enforceHtmlBoundary: true
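The effect described above can be approximated with plain string handling. trimToHtmlBoundary is a hypothetical helper, not how sanitize-html implements the option, and returning the input unchanged when no complete <html>...</html> pair exists is an assumption of this sketch.

```javascript
// Hypothetical helper approximating the documented effect of enforceHtmlBoundary: true.
// Assumption: when no complete <html>...</html> pair is found, the input is returned unchanged.
function trimToHtmlBoundary(input) {
  const closeTag = '</html>';
  const start = input.search(/<html[\s>]/i);             // first <html> or <html ...>
  const end = input.toLowerCase().lastIndexOf(closeTag); // last </html>
  if (start === -1 || end === -1 || end < start) {
    return input;
  }
  return input.slice(start, end + closeTag.length);
}

const pasted = 'StartFragment<html><body><p>Hi</p></body></html>\u0000\u0001';
console.log(trimToHtmlBoundary(pasted)); // <html><body><p>Hi</p></body></html>
```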

htmlparser2 Options

sanitize-html is built on htmlparser2. By default the only option passed down is decodeEntities: true. You can set the options to pass by using the parser option.

Security note: changing the parser settings can be risky. In particular, decodeEntities: false has known security concerns and a complete test suite does not exist for every possible combination of settings when used with sanitize-html. If security is your goal we recommend you use the defaults rather than changing parser, except for the lowerCaseTags option.

const clean = sanitizeHtml(dirty, {
  allowedTags: ['a'],
  parser: {
    lowerCaseTags: true
  }
});

See the htmlparser2 wiki for the full list of possible options.

Transformations

What if you want to add or change an attribute? What if you want to transform one tag to another? No problem, it's simple!

The easiest way (will change all ol tags to ul tags):

const clean = sanitizeHtml(dirty, {
  transformTags: {
    'ol': 'ul',
  }
});

The most advanced usage:

const clean = sanitizeHtml(dirty, {
  transformTags: {
    'ol': function(tagName, attribs) {
      // My own custom magic goes here
      return {
        tagName: 'ul',
        attribs: {
          class: 'foo'
        }
      };
    }
  }
});

You can specify the * wildcard instead of a tag name to transform all tags.

There is also a helper method which should be enough for simple cases in which you want to change the tag and/or add some attributes:

const clean = sanitizeHtml(dirty, {
  transformTags: {
    'ol': sanitizeHtml.simpleTransform('ul', {class: 'foo'}),
  }
});

The simpleTransform helper method has 3 parameters:

simpleTransform(newTag, newAttributes, shouldMerge)

The last parameter (shouldMerge) is set to true by default. When true, simpleTransform will merge the current attributes with the new ones (newAttributes). When false, all existing attributes are discarded.
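The effect of shouldMerge can be seen in a small stand-in. simpleTransformSketch is a hypothetical re-implementation written to mirror the description above; that new attributes win over existing ones with the same name when merging is an assumption. In real code, use sanitizeHtml.simpleTransform itself.

```javascript
// Hypothetical stand-in for sanitizeHtml.simpleTransform, for illustration only.
function simpleTransformSketch(newTag, newAttribs = {}, shouldMerge = true) {
  return function (tagName, attribs) {
    // Assumption: when merging, newAttribs override existing attributes with the same name.
    const resultAttribs = shouldMerge ? { ...attribs, ...newAttribs } : { ...newAttribs };
    return { tagName: newTag, attribs: resultAttribs };
  };
}

const merged = simpleTransformSketch('ul', { class: 'foo' });
const replaced = simpleTransformSketch('ul', { class: 'foo' }, false);

// Merged: tagName 'ul', attribs { id: 'steps', class: 'foo' }
console.log(merged('ol', { id: 'steps', class: 'old' }));
// Replaced: tagName 'ul', attribs { class: 'foo' }
console.log(replaced('ol', { id: 'steps', class: 'old' }));
```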

You can also add or modify the text contents of a tag:

const clean = sanitizeHtml(dirty, {
  transformTags: {
    'a': function(tagName, attribs) {
      return {
        tagName: 'a',
        // Pass the existing attributes through so href is preserved
        attribs: attribs,
        text: 'Some text'
      };
    }
  }
});

For example, you could transform a link element with missing anchor text:

<a href="http://somelink.com"></a>

To a link with anchor text:

<a href="http://somelink.com">Some text</a>

Filters

You can provide a filter function to remove unwanted tags. Let's suppose we need to remove empty a tags like:

<a href="page.html"></a>

We can do that with the following filter:

sanitizeHtml(
  '<p>This is <a href="http://www.linux.org"></a><br/>Linux</p>',
  {
    exclusiveFilter: function(frame) {
      return frame.tag === 'a' && !frame.text.trim();
    }
  }
);

The frame object supplied to the callback provides the following attributes:

tag: The tag name, e.g. 'img'.
attribs: The tag's attributes, e.g. { src: "/path/to/tux.png" }.
text: The text content of the tag.
mediaChildren: Immediate child tags that are likely to represent self-contained media (e.g., img, video, picture, iframe). See the mediaTags variable in src/index.js for the full list.
tagPosition: The index of the tag's position in the result string.
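Because exclusiveFilter receives a plain frame object, the predicate can be written and tested on its own. One refinement worth considering: a link that wraps only an image has no text, so the predicate from the example above would remove it too. The sketch below also checks mediaChildren. The name dropEmptyLinks and the hand-built frames are hypothetical, and treating mediaChildren as an array is an assumption based on the description above.

```javascript
// Return true to have the tag filtered out (same contract as the exclusiveFilter example above).
function dropEmptyLinks(frame) {
  return frame.tag === 'a' &&
    !frame.text.trim() &&
    frame.mediaChildren.length === 0; // keep links that wrap an image, video, etc.
}

// Hand-built frames containing only the fields this predicate reads.
const frames = [
  { tag: 'a', text: '   ', mediaChildren: [] },    // empty link
  { tag: 'a', text: '', mediaChildren: ['img'] },  // image link
  { tag: 'a', text: 'Linux', mediaChildren: [] },  // normal link
  { tag: 'p', text: '', mediaChildren: [] }        // not a link
];

console.log(frames.map(dropEmptyLinks)); // [ true, false, false, false ]
```

To use it, pass exclusiveFilter: dropEmptyLinks, and allow img with its src attribute if image links should survive sanitization.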

You can also process all text content with a provided filter function. Let's say we want an ellipsis instead of three dots.

<p>some text...</p>

We can do that with the following filter:

sanitizeHtml(
  '<p>some text...</p>',
  {
    textFilter: function(text, tagName) {
      if (['a'].indexOf(tagName) > -1) return text; // Skip anchor tags (leave their text unchanged)

      return text.replace(/\.\.\./g, '&hellip;');
    }
  }
);

Note that the text passed to the textFilter method is already escaped for safe display as HTML. You may add markup and use entity escape sequences in your textFilter.
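Pulled out as a named function, the filter above is easy to test in isolation. This version (ellipsisFilter is a hypothetical name) always returns a string, including for anchor text, since the filter's return value is what ends up in the output, and it uses the g flag so every run of three dots is replaced rather than only the first.

```javascript
// A textFilter written as a named function so it can be unit-tested on its own.
// It receives already-escaped text, so inserting the &hellip; entity is safe.
function ellipsisFilter(text, tagName) {
  if (tagName === 'a') {
    return text; // leave link text unchanged, but still hand it back
  }
  return text.replace(/\.\.\./g, '&hellip;'); // the g flag replaces every occurrence
}

console.log(ellipsisFilter('wait... what...', 'p')); // wait&hellip; what&hellip;
console.log(ellipsisFilter('read more...', 'a'));    // read more...
```

It can then be passed as sanitizeHtml(html, { textFilter: ellipsisFilter }).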

Iframe Filters

If you would like to allow iframe tags but want to control the domains that are allowed through, you can provide an array of hostnames and/or an array of domains that you would like to allow as iframe sources. These arrays are passed as the allowedIframeHostnames and allowedIframeDomains properties of the options object given to sanitizeHtml.

The src URL of each iframe in the input is checked against these arrays, and only src URLs whose hostname matches an allowed hostname or domain are kept. The src URL must be well formed (with a valid hostname); otherwise, the module strips the src attribute from the iframe.

Make sure to pass a valid hostname along with the domain you wish to allow, e.g.:

allowedIframeHostnames: ['www.youtube.com', 'player.vimeo.com'],
allowedIframeDomains: ['zoom.us']

You may also specify whether or not to allow relative URLs as iframe sources.

allowIframeRelativeUrls: true

Note that if unspecified, relative URLs will be allowed by default if no hostname or domain filter is provided but removed by default if a hostname or domain filter is provided.

Remember that the iframe tag must be allowed as well as the src attribute.

For example:

const clean = sanitizeHtml('<p><iframe src="https://www.youtube.com/embed/nykIhs12345"></iframe></p>', {
  allowedTags: [ 'p', 'em', 'strong', 'iframe' ],
  allowedClasses: {
    'p': [ 'fancy', 'simple' ],
  },
  allowedAttributes: {
    'iframe': ['src']
  },
  allowedIframeHostnames: ['www.youtube.com', 'player.vimeo.com']
});

will pass through as safe whereas:

const clean = sanitizeHtml('<p><iframe src="https://www.youtube.net/embed/nykIhs12345"></iframe></p>', {
  allowedTags: [ 'p', 'em', 'strong', 'iframe' ],
  allowedClasses: {
    'p': [ 'fancy', 'simple' ],
  },
  allowedAttributes: {
    'iframe': ['src']
  },
  allowedIframeHostnames: ['www.youtube.com', 'player.vimeo.com']
});

const clean = sanitizeHtml('<p><iframe src="https://www.vimeo/video/12345"></iframe></p>', {
  allowedTags: [ 'p', 'em', 'strong', 'iframe' ],
  allowedClasses: {
    'p': [ 'fancy', 'simple' ],
  },
  allowedAttributes: {
    'iframe': ['src']
  },
  allowedIframeHostnames: ['www.youtube.com', 'player.vimeo.com']
});

will return an empty iframe tag.

If you want to allow any subdomain of a domain, at any level, you can provide the domain in allowedIframeDomains:

// This iframe markup will pass through as safe.
const clean = sanitizeHtml('<p><iframe src="https://us02web.zoom.us/embed/12345"></iframe></p>', {
  allowedTags: [ 'p', 'em', 'strong', 'iframe' ],
  allowedClasses: {
    'p': [ 'fancy', 'simple' ],
  },
  allowedAttributes: {
    'iframe': ['src']
  },
  allowedIframeHostnames: ['www.youtube.com', 'player.vimeo.com'],
  allowedIframeDomains: ['zoom.us']
});
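The hostname and domain rules can be summarized in a standalone sketch. isIframeSrcAllowed, its option names, and the placeholder base URL used to detect relative URLs are all assumptions made for this illustration. It shows the exact-match rule for hostnames and the "same domain or any subdomain" rule for domains, using the hosts from the examples above.

```javascript
// Hypothetical check for illustration; sanitize-html's own logic may differ in details.
// Assumption: relative URLs are detected by resolving against a placeholder base URL.
const PLACEHOLDER_BASE = 'relative://relative-site';

function isIframeSrcAllowed(src, { hostnames = [], domains = [], allowRelative = false } = {}) {
  if (/^relative:/i.test(src)) return false; // refuse input that imitates the placeholder base
  let url;
  try {
    url = new URL(src, PLACEHOLDER_BASE);
  } catch (err) {
    return false; // an unparsable src is rejected
  }
  if (url.protocol === 'relative:' && url.hostname === 'relative-site') {
    return allowRelative; // a genuinely relative src such as "/embed/1"
  }
  const host = url.hostname;
  return hostnames.includes(host) ||
    domains.some((domain) => host === domain || host.endsWith('.' + domain));
}

const options = { hostnames: ['www.youtube.com', 'player.vimeo.com'], domains: ['zoom.us'] };

console.log(isIframeSrcAllowed('https://www.youtube.com/embed/nykIhs12345', options)); // true
console.log(isIframeSrcAllowed('https://www.youtube.net/embed/nykIhs12345', options)); // false
console.log(isIframeSrcAllowed('https://us02web.zoom.us/embed/12345', options));       // true
console.log(isIframeSrcAllowed('https://notzoom.us/embed/12345', options));           // false
```

The leading dot in '.' + domain is what stops notzoom.us from passing as a subdomain of zoom.us.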

Allowed URL schemes

By default, we allow the following URL schemes in cases where href, src, etc. are allowed:

[ 'http', 'https', 'ftp', 'mailto', 'tel' ]

You can override this if you want to:

sanitizeHtml(
  // teeny-tiny valid transparent GIF in a data URL
  '<img src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" />',
  {
    allowedTags: [ 'img', 'p' ],
    allowedSchemes: [ 'data', 'http' ]
  }
);

You can also allow a scheme for a particular tag only:

allowedSchemes: [ 'http', 'https' ],
allowedSchemesByTag: {
  img: [ 'data' ]
}

And you can forbid the use of protocol-relative URLs (starting with //) to access another site using the current protocol, which is allowed by default:

allowProtocolRelative: false
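The interplay of these three options can be sketched as follows. isUrlAllowed is hypothetical. Stripping control characters and spaces before reading the scheme guards against tricks such as a tab hidden inside "javascript:". Treating a tag's allowedSchemesByTag entry as replacing, rather than extending, allowedSchemes for that tag is our reading, so check it against the library before relying on it.

```javascript
// Hypothetical check for illustration, not sanitize-html's code.
// Assumption: a tag listed in allowedSchemesByTag uses that list instead of allowedSchemes.
function isUrlAllowed(tag, href, options) {
  // Drop control characters and spaces, which browsers often ignore inside URLs.
  const cleaned = href.replace(/[\x00-\x20]+/g, '');
  const match = cleaned.match(/^([a-zA-Z][a-zA-Z0-9.+-]*):/);
  if (!match) {
    // Two leading slashes (or backslashes) mean a protocol-relative URL.
    if (/^[\/\\]{2}/.test(cleaned)) return options.allowProtocolRelative;
    return true; // no scheme at all: an ordinary relative URL such as "page.html"
  }
  const scheme = match[1].toLowerCase();
  const byTag = options.allowedSchemesByTag || {};
  const allowed = Object.prototype.hasOwnProperty.call(byTag, tag)
    ? byTag[tag]
    : options.allowedSchemes;
  return allowed.includes(scheme);
}

const options = {
  allowedSchemes: ['http', 'https'],
  allowedSchemesByTag: { img: ['data'] },
  allowProtocolRelative: false
};

console.log(isUrlAllowed('a', 'https://example.com/', options));            // true
console.log(isUrlAllowed('a', 'java\tscript:alert(1)', options));           // false
console.log(isUrlAllowed('img', 'data:image/gif;base64,R0lGOD', options));  // true
console.log(isUrlAllowed('a', '//example.com/page', options));              // false
```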

Discarding the entire contents of a disallowed tag

Normally, with a few exceptions, if a tag is not allowed, all of the text within it is preserved, and so are any allowed tags within it.

The exceptions are:

style, script, textarea, option

If you wish to replace this list, for instance to discard whatever is found inside a noscript tag, use the nonTextTags option:

nonTextTags: [ 'style', 'script', 'textarea', 'option', 'noscript' ]

Note that if you use this option you are responsible for stating the entire list. This gives you the power to retain the content of textarea, if you want to.

The content still gets escaped properly, with the exception of the script and style tags. Allowing either script or style leaves you open to XSS attacks. Don't do that unless you have good reason to trust their origin. sanitize-html will log a warning if these tags are allowed, which can be disabled with the allowVulnerableTags: true option.

Choose what to do with disallowed tags

Instead of discarding, or keeping text only, you may enable escaping of the entire content:

disallowedTagsMode: 'escape'

This will transform <disallowed>content</disallowed> to &lt;disallowed&gt;content&lt;/disallowed&gt;.

Valid values are: 'discard' (default), 'escape' (escape the tag) and 'recursiveEscape' (to escape the tag and all its content).
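In escape mode the disallowed tag survives as visible text instead of markup. The hypothetical escapeMarkup helper below shows the kind of entity escaping involved, applied to just the tag markup from the example above; it is an illustration, not the library's escaping routine.

```javascript
// Hypothetical helper for illustration: turn markup into text that displays literally.
function escapeMarkup(markup) {
  return markup
    .replace(/&/g, '&amp;') // runs first so the entities added below are not escaped again
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

console.log(escapeMarkup('<disallowed>content</disallowed>'));
// &lt;disallowed&gt;content&lt;/disallowed&gt;
```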

Restricting deep nesting

You can limit the depth of HTML tags in the document with the nestingLimit option:

nestingLimit: 6

This will prevent the user from nesting tags more than 6 levels deep. Tags deeper than that are stripped out exactly as if they were disallowed. Note that this means text is preserved in the usual ways where appropriate.

About ApostropheCMS

sanitize-html was created at P'unk Avenue for use in ApostropheCMS, an open-source content management system built on Node.js. If you like sanitize-html you should definitely check out ApostropheCMS.

Support

Feel free to open issues on GitHub.

Comments

load error: Cannot assign to read only property 'exports'

To Reproduce

Just upgraded to sanitize-html 2.0.0 (after noticing the vulnerability reported by Snyk in version 1.27.4) and found that my app does not load. The error produced is:

Reverting to 1.27.4 fixes it.

Expected behavior

There should not be any errors.

Details

This is happening in a rails project that is using webpacker to compile assets.

webpacker v5.2.1 webpack 4.44.1 @babel/core 7.11.6 postcss 7.0.34

opened by gsar 39

Community contribution required: Typescript Compilebreak after updating from nodejs 8 to 12

We used sanitize-html in a TypeScript project with Node 8. After upgrading to Node 12, the following TypeScript compile errors occur in sanitize-html and its dependencies:

node_modules/@types/domutils/index.d.ts:6:10 - error TS2614: Module '"../../../../../node_modules/domhandler/lib"' has no exported member 'DomElement'. Did you mean to use 'import DomElement from "../../../../../node_modules/domhandler/lib"' instead?

6 import { DomElement } from "domhandler";
           ~~~~~~~~~~

node_modules/@types/htmlparser2/index.d.ts:17:10 - error TS2614: Module '"../../../../../node_modules/domhandler/lib"' has no exported member 'DomElement'. Did you mean to use 'import DomElement from "../../../../../node_modules/domhandler/lib"' instead?

17 export { DomElement, DomHandlerOptions, DomHandler, Element, Node } from 'domhandler';


node_modules/@types/sanitize-html/index.d.ts:17:10 - error TS2305: Module '"../../../../../node_modules/htmlparser2/lib"' has no exported member 'Options'.

17 import { Options } from "htmlparser2";

In the package.json, we are using

"sanitize-html": "^1.20.1",

and

"@types/sanitize-html": "^1.20.2",

for the typings.

Could anyone else update successfully to node 12?

Side-note - maybe it helps: It seems the dependency htmlparser2 was ported to typescript in https://github.com/fb55/htmlparser2/commit/759b1220c03e55a895f971deab9f1e94c30a7f61 which was released in https://github.com/fb55/htmlparser2/releases/tag/v4.0.0. Still sanitize-html pulls in "htmlparser2": "^3.10.0", as dependency.

seeking contributions typescript

opened by ceisele-r 24

Wrong build 1.18.3? Uncaught TypeError: (0 , _sanitizeHtml2.default) is not a function

Hey guys, just received this when new version has been released.

    sanitizeHtml.js?70b5f51:7 Uncaught TypeError: (0 , _sanitizeHtml2.default) is not a function
    at sanitize (sanitizeHtml.js?70b5f51:7)
    at createSanitizeMarkup (sanitizeHtml.js?70b5f51:12)
    at TemplatePainterAboutView (View.js?f4af559:50)
    at mountIndeterminateComponent (react-dom.development.js?02e2fdd:8574)
    at beginWork (react-dom.development.js?02e2fdd:8978)
    at performUnitOfWork (react-dom.development.js?02e2fdd:11814)
    at workLoop (react-dom.development.js?02e2fdd:11843)
    at HTMLUnknownElement.callCallback (react-dom.development.js?02e2fdd:100)
    at Object.invokeGuardedCallbackDev (react-dom.development.js?02e2fdd:138)
    at invokeGuardedCallback (react-dom.development.js?02e2fdd:187)

Update: We use import sanitizeHtml from 'sanitize-html' Update2: Affects FE build Update3: in package.json until it's fixed

"sanitize-html": "^1.18.2"
// =>
"sanitize-html": "1.18.2"

opened by march08 19

Allow filtering of iframe src urls
Hey @boutell looks like I'm the motivated developer 😄 .

This PR makes it so the user can pass allowable urls as a property to the sanitizeHtml function which will strip out any urls from an iframe tag's src attribute that are not included in the allowable urls property array.

This is accomplished by just adding a new property to the options object passed to the function like so:

clean = sanitizeHtml(dirty, {
  allowedTags: ['p', 'iframe', 'a', 'img', 'i'],
  allowedAttributes: {
    'iframe': ['src', 'href'],
    'a': ['src', 'href'],
    'img': ['src']
  },
  allowedUrls: ['youtube.com', 'vimeo.com']
})

The PR contains some default URLs allowed (youtube.com and vimeo.com); I'm happy to adjust these or remove them as you see fit. I can simply add a conditional for passing all as the only property if that is something you think would be helpful.

2 passing tests written.
opened by ryan-verys 19
1.16+ breaks IE

Hello,

Our app suddenly broke in IE11. After digging and digging I figured that postcss includes chalk -> ansi-styles which uses (const key of *). This breaks IE11 as it will not be transpiled. As far as I can see, ansi-styles isn't intended to be run in the client? I'm not really sure where to report this.

Can someone confirm?

Yours,

opened by noelheesen 19
Error: Cannot find module './foreignNames.json'
I've also encountered this error:

Error: Cannot find module './foreignNames.json' from '/node_modules/sanitize-html/dist'

This is the culprit from an htmlparser2 dependency making its way into the minified scripts: https://github.com/cheeriojs/dom-serializer/blob/master/index.js#L11
v2
opened by dhoffmann 18

Encoding ampersands on HTML entities when parser.decodeEntities = false

E.g. Code like...

const text = 'This &amp; that &reg';
const sanitizeHtmlOptions = {
    parser: {
        decodeEntities: false
    }
};
demand(sanitizeHtml(text, sanitizeHtmlOptions)).equal(text);

...results in...

AssertionError: "This &amp;amp; that &amp;reg" must equal "This &amp; that &reg"
+ expected - actual

-This &amp;amp; that &amp;reg
+This &amp; that &reg

I'm guessing that this behaviour is not intended?

stale

opened by WillGibson 18

Sanitize-html incorrectly recognizes (less than)(equals) as a starting tag.

In vanilla NodeBB, any combination of <, >, <=, >= can be entered in a post and the results are rendered correctly.

With sanitize-html installed, < and > are handled correctly, but using <= in a post will treat it as an HTML start tag and not include anything beyond the symbol combination.

This, for example: "this <= is a >= test" renders as "this = test"

opened by ELadner 18
Unclosed `<a>` tag breaks tag balance
This is a contrived example of some wild HTML that broke sanitize-html:

> sanitizeHTML('Hey! Here is a broken link tag: <a href="http://www.example.com/lel...')
'Hey! Here is a broken link tag: </a>'

I thought this was because it was in the middle of a string, but it actually triggers if the string ends in the middle of any part of the attribute. But not if the tag doesn't have attributes yet!

> sanitizeHTML('lel <a')
'lel ' // good
> sanitizeHTML('lel <a href')
'lel </a>' // bad!
> sanitizeHTML('lel <a href=""')
'lel </a>' // bad!
opened by fabiosantoscode 17
OWASP: Grave accent obfuscation

Attempts to solve the grave accent obfuscation XSS attack point (https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet#Grave_accent_obfuscation) by just killing grave accents in naughty-href. Not sure if it's the best strategy but seems to fix the problem and pass old tests.

opened by cwill747 16
Possible to prevent removing Doctype?
The following code removes the <!DOCTYPE> tag from the HTML.

I tried to add it to allowed tags or even catch it in the exclusiveFilter but nothing!

var allowedTags = [
  'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'blockquote', 'p', 'a', 'ul', 'ol', 'nl', 'li',
  'b', 'i', 'strong', 'em', 'hr', 'br', 'div', 'table', 'thead', 'caption', 'tbody',
  'tr', 'th', 'td', 'pre', 'html', 'title', 'iframe', 'header', 'footer', 'body',
  'form', 'img', 'meta', 'link',
  // 'strike', 'code',
];

var allowedAttributes = {
  a: ['href', 'name', 'target'],
  img: ['src', 'width', 'height'],
  iframe: ['src', 'width', 'height'],
  div: ['id', 'class'],
  ul: ['id', 'class'],
  nav: ['id', 'class'],
  meta: ['name', 'content'],
  link: ['rel', 'href']
};

var clean = sanitizeHtml(dirtyHtml, {
  allowedTags: allowedTags,
  allowedAttributes: allowedAttributes,
  transformTags: transformTags,
});
enhancement seeking contributions stale hacktoberfest
opened by NizarBlond 15

feat(parseStyleAttributes): add option to skip style parsing [fix 547]

This will fix https://github.com/apostrophecms/sanitize-html/issues/547

This PR introduces a new option:

options.parseStyleAttributes: boolean

By default it is set to true, to match the current behavior, but a user can set it to false to skip parsing style attributes. This can avoid issues when the package is used in the browser.

  it('Should ignore styles when options.parseStyleAttributes is false', function() {
    assert.equal(
      sanitizeHtml('<span style=\'color: blue; text-align: justify\'></span>', {
        allowedTags: false,
        allowedAttributes: {
          span: [ 'style' ]
        },
        allowedStyles: {
          span: {
            color: [ /blue/ ],
            'text-align': [ /left/ ]
          }
        },
        parseStyleAttributes: false
      }), '<span style="color: blue; text-align: justify"></span>'
    );
  });

opened by bertyhell 5

<sup> tag doesn't work with sanitizeHtml

To Reproduce

const formattedTitle = 'My Company®'; console.log(sanitizeHtml(formattedTitle));

The superscript tag doesn't work, even though it is in the default tags

Expected behavior

The symbol should be a superscript


Details

The superscript tag doesn't work, even though it is in the default tags

Version of Node.js: Node 16

Server Operating System: Mac OS X

bug
opened by samilieberman 1
Is it possible to transform and wrap?

Say my "dirty HTML" is a <h3>Hello world!</h3> element which I'd like transformed to the following:

Hello world!

How might I achieve that with transformTags, or is there another way?
question

opened by rob-rountree-cyted 0
How can I filter values of the transform attribute?

Question or comment

For certain SVG tags I would like to allow the "transform" attribute. I would like to limit the possible values of this attribute to either "transform='translate(...)'" or "transform='rotate(...)'" where ... means any parameters for the functions indicated.

How do I achieve this?

For the G element, I have tried adding the following object to the allowedAttributes:

g: { name: "transform", multiples: false, values: [ "translate", "rotate" ] }

I have also tried using wildcards like this:

g: { name: "transform", multiples: false, values: [ "translate*", "rotate*" ] }

And I have tried using a regular expression like this:

g: { name: "transform", multiples: false, values: [/^(translate|rotate).*$/] }

But all of these options result in the "transform" attribute being passed to the browser without any value.

Details

Node version 16.15.1 MacOS 10.14.6 Chrome browser 107.0.5304.110 (Official Build) (x86_64)
question

opened by novasilva-wouter 0
Add attribute rel to default attributes

It is recommended to add the attribute rel="noopener noreferrer" to links opening in a new tab with target="_blank". See tabnabbing.

Currently, it is not possible to use the target and rel attributes together without overriding the allowedAttributes list.

Does it make sense to add the rel attribute to the list of default attributes? https://github.com/apostrophecms/sanitize-html/blob/795d079282bc4660e2d0740cf112ac6973aa77b1/index.js#L806

opened by jonasgrilleres 0
All subsequent attributes get removed when an invalid attribute is encountered

<a href="test.com" href="javascript:abc.com">test link </a> is changed to <a href="test.com">test link </a> WHEREAS <a href="javascript:abc.com" href="test.com">test link </a> is changed to <a>test link </a>

opened by aditigupta50 0


Owner

Apostrophe Technologies
