I’ve been doing some new development in my off time and I’m thoroughly enjoying the challenges and hurdles. I wanted to share an interesting problem that I solved; it is a small part of a project that is not public yet (as of September 2021).

Problem

I have an HTML string that I have to convert to React elements.

But why? The main reasons were

OK, how do I solve this problem?

Step 1:

I need to parse the HTML string into a DOM tree. Luckily, there is DOMParser: https://developer.mozilla.org/en-US/docs/Web/API/DOMParser

Step 2:

I need to convert the DOM tree to React elements, but how? Since the DOM is a tree, the problem is to transform one tree into another, and that lends itself to a recursive solution: convert a node, then convert each of its children the same way.
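
To make the recursive shape concrete before touching the DOM or React, here is a minimal sketch on plain objects. The SourceNode and TargetNode types and the transformTree name are stand-ins I made up for illustration; the real input is a DOM node and the real output is a React element.

```typescript
// Hypothetical stand-in types, not the DOM or React types.
interface SourceNode {
  tag: string;
  children: SourceNode[];
}

interface TargetNode {
  type: string;
  children: TargetNode[];
}

// Convert one node, then convert each of its children the same way.
const transformTree = (node: SourceNode): TargetNode => ({
  type: node.tag.toLowerCase(),
  children: node.children.map((child) => transformTree(child)),
});

const sampleTree: SourceNode = {
  tag: 'DIV',
  children: [
    { tag: 'SPAN', children: [] },
    { tag: 'P', children: [{ tag: 'B', children: [] }] },
  ],
};

// Logs the converted tree with lower-cased tag names at every level.
console.log(JSON.stringify(transformTree(sampleTree)));
```

The recursion ends by itself: a node without children maps over an empty array, so no explicit base case is needed.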

Step 3:

Identify the API calls that are needed to transform the DOM tree into a React tree. Checking the docs, I only need one: React.createElement (https://reactjs.org/docs/react-api.html#createelement)

Step 4:

Attach my wrapper to it.

DOMParser: a DOM tree from a string

I was surprised to find that many people don’t know about https://developer.mozilla.org/en-US/docs/Web/API/DOMParser. Its constructor takes no arguments; its parseFromString method takes a string and a MIME type and returns a DOM tree (a Document). I believe this is the best way to parse HTML into a DOM tree.

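import React, { ReactNode } from 'react';

// One parser instance can be reused for every call.
const domParser: any = new DOMParser();

export const htmlProcessor = (html: string): ReactNode => {
  // Guard against non-string input from untyped callers.
  if (typeof html !== 'string') {
    console.error('htmlParser: html is not a string');
    return React.createElement('p', {}, 'errors: please check dev console') as ReactNode;
  }

  let doc = domParser.parseFromString(html, 'text/html');

  // Defensive check in case no document comes back.
  if (doc === null) {
    console.error('htmlParser: unable to process html');
    return React.createElement('p', {}, 'errors: please check dev console') as ReactNode;
  }

  // The conversion of doc into React elements is added in the next section.
};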

Writing a function that converts a DOM tree and returns a React tree

With React.createElement, I can create a React element.

const nodeName = 'div';
const attributes = {
  className: 'class-name-that-i-want', // note: attributes must be camelCase
  id: 'created_by_custom',
};
const children = null; // note: if not null, children must be React nodes, e.g. the output of createElement, or strings.

React.createElement(nodeName, attributes, children);

Experimenting with the little example above, I identified three things.

  • I need a tagName (‘div’, ‘span’, etc.)
  • I need attributes (key, className, id, etc.)
  • I need children (null, React elements, text)

So let’s begin.

I have to be careful here: I do this operation on every user action. To maintain performance, I need to keep the keys consistent.

So what is a key, and why am I stressing about it? See the React docs on keys. In the example below, the keys come from a single counter that runs over the whole tree. If I add a new element after div key=“4”, it takes key 5, and every element from the old key=“5” onward ends up with a different key than before. React then treats those elements as new and re-creates them instead of reusing them.

div key="1"
    div key="2"
        div key="3"
        div key="4"
    div key="5"
        div key="6"
div key="7"
    div key="8"

Instead, I need to generate keys in the following structure, where the numbering restarts within each group of siblings. This way the keys stay consistent in the virtual DOM. If I append a new element to the div ‘new-child-add’, only that div and its children are re-rendered.

div key="1"
    div id="new-child-add" key="1"
        div key="1"
        div key="2"
    div key="2"
        div key="1"
div key="2"
    div key="1"
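
The difference between the two numbering schemes can be checked on a toy tree. This is a minimal sketch under my own assumptions: OutlineNode, the labels, and all function names are made up for illustration, and the code only compares key assignments; it does not run React's reconciliation.

```typescript
// Hypothetical toy tree; labels are unique so they can identify a node.
interface OutlineNode {
  label: string;
  children: OutlineNode[];
}

// Scheme 1: a single counter over the whole tree, in document order.
const assignGlobalKeys = (root: OutlineNode): Map<string, number> => {
  const keys = new Map<string, number>();
  let counter = 1;
  const visit = (node: OutlineNode): void => {
    keys.set(node.label, counter++);
    node.children.forEach((child) => visit(child));
  };
  visit(root);
  return keys;
};

// Scheme 2: numbering restarts at 1 within each list of siblings.
const assignSiblingKeys = (root: OutlineNode): Map<string, number> => {
  const keys = new Map<string, number>();
  const visit = (node: OutlineNode, position: number): void => {
    keys.set(node.label, position);
    node.children.forEach((child, index) => visit(child, index + 1));
  };
  visit(root, 1);
  return keys;
};

// Labels of nodes that existed before and whose key is different afterwards.
const changedLabels = (before: Map<string, number>, after: Map<string, number>): string[] => {
  const changed: string[] = [];
  before.forEach((key, label) => {
    if (after.get(label) !== key) changed.push(label);
  });
  return changed;
};

const leaf = (label: string): OutlineNode => ({ label, children: [] });

// Same tree twice; the second version has one extra child appended under 'a'.
const buildTree = (withExtraChild: boolean): OutlineNode => ({
  label: 'root',
  children: [
    { label: 'a', children: withExtraChild ? [leaf('a1'), leaf('a2'), leaf('a3')] : [leaf('a1'), leaf('a2')] },
    { label: 'b', children: [leaf('b1')] },
  ],
});

console.log(changedLabels(assignGlobalKeys(buildTree(false)), assignGlobalKeys(buildTree(true)))); // [ 'b', 'b1' ]
console.log(changedLabels(assignSiblingKeys(buildTree(false)), assignSiblingKeys(buildTree(true)))); // []
```

With the single counter, appending one child under 'a' shifts the keys of 'b' and 'b1'. With sibling-relative keys, no existing node changes its key, which is the property I am after.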

Keeping the things above in mind, I wrote the following function.

Define a function that takes a DOM element. I need to call it recursively later. To maintain key consistency, I also pass the key as a parameter.

const converter = (element: HTMLElement, key = 0) => {
  if (element === undefined) {
    return;
  }
};

First I need the nodeName and the attributes of the current element. I use nodeName and attributes, with special handling for the style attribute.

const nodeName = element.nodeName.toLowerCase();

let attributes: { [key: string]: string | any } = {};

for (let i = 0; element.attributes && i < element.attributes.length; i++) {
  const attribute = element.attributes[i];
  const reactName = toCamelCase(attribute.name);
  let value = attribute.name === 'style' ? convertStyleStringToObject(attribute.value) : attribute.value.trim();
  attributes[reactName] = value;
}

attributes['key'] = key++;
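
The snippet above calls toCamelCase and convertStyleStringToObject without showing them. Below is a minimal sketch of what such helpers could look like. These bodies are my assumption for illustration, not the project's actual code: the table of special names is deliberately short, and the style parser is naive.

```typescript
// Hypothetical helper bodies; only the two function names come from the article.

// A few HTML attribute names that React spells differently (not a complete list).
const SPECIAL_ATTRIBUTE_NAMES = new Map<string, string>([
  ['class', 'className'],
  ['for', 'htmlFor'],
  ['tabindex', 'tabIndex'],
  ['readonly', 'readOnly'],
  ['colspan', 'colSpan'],
  ['rowspan', 'rowSpan'],
]);

// 'background-color' -> 'backgroundColor'
const hyphenToCamel = (text: string): string =>
  text.replace(/-([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());

const toCamelCase = (attributeName: string): string => {
  const lower = attributeName.toLowerCase();
  // React expects data-* and aria-* attributes in their hyphenated form.
  if (lower.startsWith('data-') || lower.startsWith('aria-')) {
    return lower;
  }
  return SPECIAL_ATTRIBUTE_NAMES.get(lower) ?? hyphenToCamel(lower);
};

const convertStyleStringToObject = (style: string): Record<string, string> => {
  const result: Record<string, string> = {};
  // Naive split: a value that itself contains ';' (e.g. a data URL) would break it.
  for (const declaration of style.split(';')) {
    const colon = declaration.indexOf(':');
    if (colon === -1) continue;
    const property = declaration.slice(0, colon).trim();
    const value = declaration.slice(colon + 1).trim();
    if (!property || !value) continue;
    // CSS custom properties (--my-var) keep their name as written.
    const key = property.startsWith('--') ? property : hyphenToCamel(property.toLowerCase());
    result[key] = value;
  }
  return result;
};

console.log(toCamelCase('class')); // className
console.log(convertStyleStringToObject('color: red; background-color: blue;')); // { color: 'red', backgroundColor: 'blue' }
```

The style attribute needs the object form because React's style prop takes an object with camelCased property names rather than a CSS string.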

Before creating the element, I need to process all the child nodes, if there are any. I use childNodes. Note that ‘#text’ nodes have to be handled separately: a text node is not an element, so I push its text as a plain string. Also note that the function calls itself recursively.

let children: ReactNode[] = [];

for (let i = 0; i < element.childNodes.length; i++) {
  let child = element.childNodes[i];

  if (child['nodeName'] === '#text') {
    if (child.textContent) {
      const content = child.textContent.replaceAll('\n', '');
      // push the text onto the children array (not onto the DOM node)
      content && children.push(content);
    }
    continue;
  }

  if (child['nodeName'] === '#comment') {
    continue; // I dont want to render comments
  }

  if (nodeName !== 'script' && nodeName !== 'style') {
    children.push(converter(child as HTMLElement, key++)); //recursive call
  }
}
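
The child loop can also be tried out without a browser. The sketch below is a stand-in under my own assumptions: MiniNode models only the fields the loop reads, Collected replaces React nodes with plain objects, and it checks nodeType numbers (which have the same values as the DOM's Node.nodeType) instead of nodeName strings.

```typescript
// Hypothetical minimal node shape; a real DOM node has many more fields.
interface MiniNode {
  nodeType: number;
  nodeName: string;
  textContent: string | null;
  childNodes: MiniNode[];
}

// Same numeric values as Node.ELEMENT_NODE, Node.TEXT_NODE and Node.COMMENT_NODE.
const ELEMENT_NODE_TYPE = 1;
const TEXT_NODE_TYPE = 3;
const COMMENT_NODE_TYPE = 8;

// Plain-object replacement for a React node, so the sketch needs no React.
type Collected = string | { tag: string; children: Collected[] };

const collectChildren = (parentNode: MiniNode): Collected[] => {
  const collected: Collected[] = [];
  for (const child of parentNode.childNodes) {
    if (child.nodeType === TEXT_NODE_TYPE) {
      // Text becomes a plain string; text that is empty after removing newlines is dropped.
      const content = (child.textContent ?? '').replace(/\n/g, '');
      if (content) collected.push(content);
      continue;
    }
    if (child.nodeType === COMMENT_NODE_TYPE) {
      continue; // comments are not rendered
    }
    if (child.nodeType === ELEMENT_NODE_TYPE) {
      // Recursive call for element children.
      collected.push({ tag: child.nodeName.toLowerCase(), children: collectChildren(child) });
    }
  }
  return collected;
};

const textNode = (text: string): MiniNode => ({ nodeType: TEXT_NODE_TYPE, nodeName: '#text', textContent: text, childNodes: [] });

const paragraph: MiniNode = {
  nodeType: ELEMENT_NODE_TYPE,
  nodeName: 'P',
  textContent: null,
  childNodes: [
    textNode('Hello\n'),
    { nodeType: COMMENT_NODE_TYPE, nodeName: '#comment', textContent: 'skip me', childNodes: [] },
    { nodeType: ELEMENT_NODE_TYPE, nodeName: 'SPAN', textContent: null, childNodes: [textNode('x')] },
    textNode('\n'),
  ],
};

console.log(JSON.stringify(collectChildren(paragraph))); // ["Hello",{"tag":"span","children":["x"]}]
```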

I have all the properties, now I can create the element.

return React.createElement(nodeName, attributes, children);

My function is now complete. I call my wrapper function in different locations based on the type of the element and its attributes.

const domParser: any = new DOMParser();

export const htmlProcessor = (html: string): ReactNode => {
  if (typeof html !== 'string') {
    console.error('htmlParser: html is not a string');
    return React.createElement('p', {}, 'errors: please check dev console') as ReactNode;
  }

  let doc = domParser.parseFromString(html, 'text/html');

  if (doc === null) {
    console.error('htmlParser: unable to process html');
    return React.createElement('p', {}, 'errors: please check dev console') as ReactNode;
  }

  return converter(doc as unknown as HTMLElement, 1);
};

const converter = (element: HTMLElement, key = 0) => {
  if (element === undefined) {
    return;
  }
  const nodeName = element.nodeName.toLowerCase();

  let attributes: { [key: string]: string | any } = {};

  for (let i = 0; element.attributes && i < element.attributes.length; i++) {
    const attribute = element.attributes[i];
    const reactName = toCamelCase(attribute.name);
    let value = attribute.name === 'style' ? convertStyleStringToObject(attribute.value) : attribute.value.trim();
    attributes[reactName] = value;
  }
  attributes['key'] = key++;

  let children: ReactNode[] = [];

  /**
   * Wrapper logic goes here (returning the element early) if I only want
   * the attributes to be processed and not the children.
   */

  for (let i = 0; i < element.childNodes.length; i++) {
    let child = element.childNodes[i];

    if (child['nodeName'] === '#text') {
      if (child.textContent) {
        const content = child.textContent.replaceAll('\n', '');
        // push the text onto the children array (not onto the DOM node)
        content && children.push(content);
      }
      continue;
    }

    if (child['nodeName'] === '#comment') {
      continue;
    }

    if (nodeName !== 'script' && nodeName !== 'style') {
      children.push(converter(child as HTMLElement, key++));
    }
  }

  /**
   * Wrapper logic goes here (returning the element) if I want both the
   * attributes and the children to be processed.
   */

  return React.createElement(nodeName, attributes, children);
};

End

This was a fun problem to solve. Well, that marks the end. ~seeya


I mention the edge cases below. Only read on if you are interested.

Edge cases

The following edge cases are not covered in this article, but I handle them in my project:

  • An img tag should not have the children parameter passed to it.

    if (nodeName === 'img') {
      return React.createElement(nodeName, { key: key++, ...attributes }, null);
    }
    
  • If it’s a text node, then its children should be a string.

    if (element.nodeType === 3) {
      return React.createElement(nodeName, { ...attributes, key: key++ }, element.textContent);
    }
    
  • Handling the tags inside ‘<head>’ is different too, because each tag, with its different properties, serves a different purpose.

    let el;
    switch (nodeName) {
      case 'meta':
        el = element as HTMLMetaElement;
        // meta tag has http-equiv attribute & content attribute, but no children
        return React.createElement(nodeName, { httpEquiv: el.httpEquiv, content: el.content, key: nodeName + key++ }, null);

      case 'link':
        el = element as HTMLLinkElement;
        let type = null;
        if (el.type) {
          type = el.type;
        }
        // external link has rel attribute & href attribute, but no children
        return React.createElement(
          nodeName,
          {
            href: el.href,
            rel: el.rel,
            type,
            key: nodeName + key++,
          },
          null
        );

      case 'br':
        return React.createElement(nodeName, { key: nodeName + key++ }, null);

      case 'title':
      case 'script':
      case 'style':
        // script/style/title tags have no child elements
        // script & style have contents in them, so innerHTML
        // key is a prop of the element itself, not part of dangerouslySetInnerHTML
        return React.createElement(nodeName, {
          dangerouslySetInnerHTML: { __html: element.innerHTML },
          key: nodeName + key++,
        });
    }
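
The img, br, meta and link cases above share one trait: they are HTML void elements, which cannot have children, and react-dom reports an error when a void element is given children. Instead of one branch per tag, the check could be generalized. This is a minimal sketch, not what my project does; isVoidElement and childrenFor are hypothetical names, and the tag list is the set of void elements from the HTML standard.

```typescript
// Void elements as listed in the HTML standard.
const VOID_ELEMENTS = new Set<string>([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// True when the tag can never have children.
const isVoidElement = (nodeName: string): boolean => VOID_ELEMENTS.has(nodeName.toLowerCase());

// Returns null for void elements so that no children reach createElement.
function childrenFor<T>(nodeName: string, children: T[]): T[] | null {
  return isVoidElement(nodeName) ? null : children;
}

console.log(isVoidElement('IMG')); // true
console.log(childrenFor('p', ['some text'])); // [ 'some text' ]
```

Inside the converter, the final call could then become React.createElement(nodeName, attributes, childrenFor(nodeName, children)), assuming no other wrapper logic needs the children of those tags.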