How to Safely Sanitize HTML Before Rendering on Your Website with ButterCMS

The code snippets in this article show Next.js components written in TypeScript. However, the principles are the same across tech stacks, and there are many packages and libraries you can use to sanitize HTML in your programming language of choice.

When you’re working with user-generated or third-party HTML, whether it comes from a WYSIWYG or HTML field in a component or from a blog post in Butter, it’s important to sanitize the HTML before rendering it on a page.

Best practices for HTML sanitization

  • 📚 Use a trusted library: Use a library like DOMPurify (JavaScript) or bleach (Python) to sanitize HTML instead of writing custom logic. Well-maintained libraries are updated regularly to handle new security threats.
  • 🔒 Whitelist tags and attributes: Allow only the tags and attributes you know are safe, like <p>, <a>, and <strong>, and strip everything else.
  • 🔄 Regularly update libraries: Make sure you update libraries frequently so that you don’t risk exposing your website or app to vulnerabilities.
  • ⚠️ Avoid using dangerouslySetInnerHTML when possible: Only use it when you absolutely have to, and always sanitize the HTML first.
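
The whitelisting practice above can be sketched with DOMPurify's ALLOWED_TAGS and ALLOWED_ATTR options. This is a minimal example rather than a recommended configuration: the tag and attribute lists and the helper name sanitizeWithAllowlist are assumptions chosen for illustration, and isomorphic-dompurify is assumed to be installed.

```typescript
import DOMPurify from 'isomorphic-dompurify';

// Hypothetical allowlist for illustration; adjust it to the markup your content needs.
const ALLOWED_TAGS = ['p', 'a', 'strong', 'em', 'ul', 'ol', 'li'];
const ALLOWED_ATTR = ['href', 'title'];

// Hypothetical helper: anything not on the allowlists is stripped by DOMPurify.
export function sanitizeWithAllowlist(html: string): string {
    return DOMPurify.sanitize(html, { ALLOWED_TAGS, ALLOWED_ATTR });
}

// The onclick attribute and the <script> element are not on the allowlists,
// so they should be absent from the result.
console.log(
    sanitizeWithAllowlist('<p onclick="steal()">Hi <strong>there</strong><script>alert(1)</script></p>')
);
```

Starting from a short allowlist and adding tags as your content requires them tends to be easier to reason about than starting permissive and blocking individual tags.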

Why you need a package or library to sanitize HTML

Without HTML sanitization, you can expose your website or app to security risks and vulnerabilities like cross-site scripting (XSS) attacks, which can compromise your site’s integrity, SEO ranking, and compliance obligations around user data. In this article, we’re going to explore how you can safely sanitize HTML from ButterCMS.

How does an HTML sanitizer work?

An HTML sanitizer library parses the HTML it is given and removes anything potentially dangerous, such as <script> elements, inline event handlers like onerror, and javascript: URLs, that could lead to cross-site scripting (XSS) attacks. This ensures that the content displayed on your website or app is safe for users.
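
To make the idea concrete, here is a deliberately simplified sketch of the allowlist approach that sanitizers are built around. It works on an already-parsed tree instead of an HTML string, and every name in it (HtmlNode, sanitizeTree, isSafeUrl, the allowlists) is hypothetical. It is for illustration only and is not a substitute for a maintained library like DOMPurify, which handles far more edge cases.

```typescript
// A tiny stand-in for a parsed HTML tree (hypothetical type, illustration only).
type HtmlNode =
    | { kind: 'text'; value: string }
    | { kind: 'element'; tag: string; attrs: Record<string, string>; children: HtmlNode[] };

const ALLOWED_TAGS = new Set(['p', 'a', 'strong', 'em']);
const ALLOWED_ATTRS = new Set(['href', 'title']);
const SAFE_SCHEMES = new Set(['http', 'https', 'mailto']);

// Accept relative URLs and a few known schemes; reject javascript:, data:, and the rest.
function isSafeUrl(value: string): boolean {
    // Remove whitespace and control characters that could be used to disguise a scheme.
    const compact = value.replace(/[\u0000-\u0020]/g, '').toLowerCase();
    const match = /^[a-z][a-z0-9+.-]*(?=:)/.exec(compact);
    return match === null || SAFE_SCHEMES.has(match[0]);
}

// Keep only allowlisted elements and attributes; drop everything else.
function sanitizeTree(nodes: HtmlNode[]): HtmlNode[] {
    const result: HtmlNode[] = [];
    for (const node of nodes) {
        if (node.kind === 'text') {
            result.push(node);
            continue;
        }
        const tag = node.tag.toLowerCase();
        if (!ALLOWED_TAGS.has(tag)) continue; // drop the element and its subtree

        const attrs: Record<string, string> = {};
        for (const [name, value] of Object.entries(node.attrs)) {
            const key = name.toLowerCase();
            if (!ALLOWED_ATTRS.has(key)) continue; // e.g. onclick, onerror, style
            if (key === 'href' && !isSafeUrl(value)) continue; // e.g. javascript: URLs
            attrs[key] = value;
        }
        result.push({ kind: 'element', tag, attrs, children: sanitizeTree(node.children) });
    }
    return result;
}
```

The key design choice is that the code lists what is allowed instead of what is forbidden, so markup nobody anticipated is removed by default.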

dangerouslySetInnerHTML

If you’re using JavaScript or TypeScript with React, you have probably heard the criticisms and warnings surrounding dangerouslySetInnerHTML. Those warnings are well founded and should be considered before using this prop on a JSX element.

Cross-Site Scripting (XSS) attacks

The most critical concern with dangerouslySetInnerHTML is cross-site scripting. If you embed an iframe or other HTML or JavaScript whose source you’re not sure of, you could be inviting malicious scripts to execute in the browser.

These vulnerabilities arise when user input or external data is injected into the DOM without being sanitized first. Attackers can steal cookies and passwords, hijack sessions, and even deface websites.

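// Unsafe: the string is injected as-is, so the onerror handler runs when the image fails to load.
const maliciousHTML = "<img src=x onerror=alert('Hacked!') />";
const UnsafeExample = () => <div dangerouslySetInnerHTML={{ __html: maliciousHTML }} />;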
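
As a rough sketch of the fix, the same string can be passed through DOMPurify with its default settings before it is rendered. This assumes isomorphic-dompurify is installed; the variable name sanitizedHTML is only illustrative.

```typescript
import DOMPurify from 'isomorphic-dompurify';

const maliciousHTML = "<img src=x onerror=alert('Hacked!') />";

// With default settings, DOMPurify strips inline event handlers such as onerror
// while keeping the harmless <img> element itself.
const sanitizedHTML = DOMPurify.sanitize(maliciousHTML);

console.log(sanitizedHTML);
```

The sanitized string no longer carries the onerror handler, so there is nothing left for the browser to execute when the image fails to load.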

Its name

You shouldn’t always judge a book by its cover, but the name of this prop gives it away. The creators of React aptly named it dangerouslySetInnerHTML to signal the risks of using it.

Harder to debug and maintain

Using this prop can make code difficult to debug because the injected markup sits outside of the JSX tree and lacks type safety. Invalid markup can also be introduced this way, and the source of the problem can be hard to locate.

Performance issues

If you’re using React or a React-based framework, you know that React uses the Virtual DOM to update and render the UI of your website or app. dangerouslySetInnerHTML adds HTML and/or JavaScript directly into the actual DOM, which can undermine the performance benefits of using React in the first place.

When it’s okay to use dangerouslySetInnerHTML

Yes, it’s okay to use dangerouslySetInnerHTML sometimes, but only when you know and trust the source of the injected HTML and JavaScript, such as:

  • A CMS, like ButterCMS. You can read more about our security policies and compliance here.
  • Embedded third-party content from providers you trust, like ads, widgets, and charts

React TypeScript Component

// 'isomorphic-dompurify' is chosen because it works on both the client and the server,
// which suits Next.js applications where server-side rendering (SSR) is common.
import type { FC } from 'react';
import DOMPurify from 'isomorphic-dompurify';
import '@/components/_styled/HTMLContent/html-content.css';

// Props for the HTMLContent component
interface HTMLContentI {
    // Raw, unsanitized HTML string, e.g. from a WYSIWYG field in Butter
    textContent: string;
}

// HTMLContent component for rendering sanitized HTML
const HTMLContent: FC<HTMLContentI> = ({ textContent }) => {
    // Sanitize the HTML content using DOMPurify.
    // DOMPurify removes potentially dangerous HTML or JavaScript that could lead to
    // XSS (cross-site scripting) attacks, so the rendered content is safe for users.
    const sanitizedContent = DOMPurify.sanitize(textContent);

    // 'dangerouslySetInnerHTML' is acceptable here only because the content has just been
    // sanitized by DOMPurify, which mitigates the risk of XSS attacks.
    return (
        <div
            className='html-content'
            dangerouslySetInnerHTML={{ __html: sanitizedContent }}
        />
    );
};

export default HTMLContent;

HTML sanitization libraries for different programming languages

  1. JavaScript / Node.js:
    • DOMPurify: As mentioned above, it's one of the most popular libraries for sanitizing HTML in JavaScript/Node.js applications. It runs in the browser and, with a DOM implementation such as jsdom, on the server.
    • xss: Another popular library that is specifically designed for XSS protection. It allows you to filter and clean HTML content efficiently.
  2. Python:
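    • bleach: A Python library for sanitizing and cleaning HTML. It allows you to whitelist tags and attributes while removing potentially dangerous ones. Its maintainers have marked the project as deprecated, so check its current status before adopting it in a new project.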
    • html-sanitizer: A lightweight Python library that focuses on stripping unwanted HTML tags, attributes, and ensuring safe HTML content.
  3. PHP:
    • HTMLPurifier: A well-established PHP library for sanitizing HTML. It ensures content adheres to standards and removes harmful code, making it a good choice for projects requiring strict sanitization.
    • DOMDocument: The built-in DOMDocument class in PHP is an HTML parser rather than a dedicated sanitizer. It can be used to strip tags and attributes deemed unsafe, but that logic is yours to write and maintain.
  4. Ruby:
    • Sanitize: A Ruby gem designed to clean HTML content. It supports whitelisting of safe tags and attributes while removing unsafe elements like <script> or <iframe>.
    • Nokogiri: Though not specifically a sanitization library, Nokogiri is a powerful HTML and XML parser for Ruby. It can be used to manipulate and sanitize HTML with additional configurations.
  5. Java:
    • OWASP Java HTML Sanitizer: This library from OWASP allows you to sanitize HTML content in Java applications, offering flexibility to define safe HTML tags and attributes while removing potentially dangerous ones.
    • Jsoup: Jsoup is a Java HTML parser that can also be used for sanitizing and manipulating HTML. It’s commonly used for web scraping, but it can also clean HTML input against a list of allowed tags and attributes.

Still have a question?

Our Customer Success team is standing by to help.