I've been doing new development in my off-time and I'm totally enjoying the challenges and hurdles. I wanted to share an interesting problem that I solved. It is a small part of a project that is not public yet (as of September 2021).
Problem
I have an HTML string (htmlString) that I have to convert to React elements.
But why? The main reasons were:
- I have a React wrapper that enhances the DOM elements.
- I need this wrapper to handle all the elements.
- DOM operations are expensive, and React keeps them to a minimum: https://reactjs.org/docs/faq-internals.html
OK, how do I solve this problem?
Step 1:
I need to parse the htmlString into a DOM tree. Luckily, there is https://developer.mozilla.org/en-US/docs/Web/API/DOMParser
Step 2:
I need to convert the DOM tree to React elements, but how? Since the DOM is a tree, the problem is to transform a DOM tree into a React tree, which, if you give it a thought, has a naturally recursive structure.
Step 3:
Identify the API calls that are needed to transform the DOM tree into a React tree. Checking the docs, I only need one: https://reactjs.org/docs/react-api.html#createelement
Step 4:
Attach my wrapper to it.
DOMParser: a DOM tree from a string
I was surprised to find that many people don't know about https://developer.mozilla.org/en-US/docs/Web/API/DOMParser. Its constructor takes no arguments; its parseFromString method takes a string and a MIME type (such as 'text/html') and returns a Document, that is, a DOM tree. I believe this is the best way to parse HTML into a DOM tree.
const domParser: any = new DOMParser();
export const htmlProcessor = (html: string): ReactNode => {
if (typeof html !== 'string') {
console.error('htmlParser: html is not a string');
return React.createElement('p', {}, 'errors: please check dev console') as ReactNode;
}
let doc = domParser.parseFromString(html, 'text/html');
if (doc === null) {
console.error('htmlParser: unable to process html');
return React.createElement('p', {}, 'errors: please check dev console') as ReactNode;
}
};
Writing a function that converts a DOM tree and returns a React tree
With React.createElement, I can create a React element.
const nodeName = 'div';
const attributes = {
className: 'class-name-that-i-want', // note: attributes must be camelCase
id: 'created_by_custom',
};
const children = null; // note: if not null, the children should also be react nodes, ie the output from createElement.
React.createElement(nodeName, attributes, children);
Experimenting with the little example above, I identified three things:
- I need the tagName ('div', 'span', etc.)
- I need the attributes (key, className, id, etc.)
- I need the children (null, React elements, text)
So let's begin.
I have to be careful here: I run this conversion on every user action, so to maintain performance I need to keep the keys consistent.
So what is a key, and why am I stressing about it? See the React docs on keys. After going through the docs, consider the example below: if I add a new element after div key="4", every element that follows it ends up with a different key than it had before. React then treats those elements as new and re-creates all of them.
div key="1"
div key="2"
div key="3"
div key="4"
div key="5"
div key="6"
div key="7"
div key="8"
Instead, I need to generate keys in the following structure. Keys only have to be unique among siblings, so this keeps them consistent in the virtual DOM: if I were to append a new element to the div 'new-child-add', only that div and its children would be re-rendered.
div key="1"
div id="new-child-add" key="1"
div key="1"
div key="2"
div key="2"
div key="1"
div key="2"
div key="1"
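The sibling-local key idea above can be sketched without React or the DOM. This is a simplified illustration, not the converter from this article: `PlainNode`, `KeyedNode`, and `assignSiblingKeys` are hypothetical names, and keys are assigned purely by position among siblings.

```typescript
// Hypothetical plain-object tree standing in for DOM nodes.
interface PlainNode {
  tag: string;
  children: PlainNode[];
}

// The same tree with a key attached to every node.
interface KeyedNode {
  tag: string;
  key: number;
  children: KeyedNode[];
}

// Keys restart at 1 under every parent, so a key only depends on the
// node's position among its siblings, not on the rest of the tree.
const assignSiblingKeys = (node: PlainNode, key = 1): KeyedNode => ({
  tag: node.tag,
  key,
  children: node.children.map((child, index) => assignSiblingKeys(child, index + 1)),
});

const leaf = (): PlainNode => ({ tag: 'div', children: [] });

const before: PlainNode = {
  tag: 'div',
  children: [
    { tag: 'div', children: [leaf()] },
    { tag: 'div', children: [leaf(), leaf()] },
  ],
};

// Append one child to the first branch only; the second branch is reused as-is.
const after: PlainNode = {
  tag: 'div',
  children: [{ tag: 'div', children: [leaf(), leaf()] }, before.children[1]],
};

const keyedBefore = assignSiblingKeys(before);
const keyedAfter = assignSiblingKeys(after);
```

Comparing `keyedBefore.children[1]` with `keyedAfter.children[1]` shows that the untouched branch keeps exactly the same keys, which is the property React needs in order to reuse those elements.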
Keeping the things above in mind, I wrote the following function.
Define a function that takes a dom-element. I need to call this recursively later. To maintain key consistency, I also need to pass the key as a parameter.
const converter = (element: HTMLElement, key = 0) => {
if (element === undefined) {
return;
}
};
I need to get the nodeName and attributes for the current element first. I use nodeName, attributes & style
const nodeName = element.nodeName.toLowerCase();
let attributes: { [key: string]: string | any } = {};
for (var i = 0; element.attributes && i < element.attributes.length; i++) {
const attribute = element.attributes[i];
const reactName = toCamelCase(attribute.name);
let value = attribute.name === 'style' ? convertStyleStringToObject(attribute.value) : attribute.value.trim();
attributes[reactName] = value;
}
attributes['key'] = key++;
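The snippet above calls toCamelCase and convertStyleStringToObject, which are not shown in this article. Below is one possible sketch of such helpers, not the project's actual implementations. The rename table is a partial, illustrative list, and the style parser assumes simple declarations with no semicolons inside values.

```typescript
// HTML attribute names that React spells differently (partial, illustrative list).
const RENAMED_ATTRIBUTES = new Map<string, string>([
  ['class', 'className'],
  ['for', 'htmlFor'],
  ['tabindex', 'tabIndex'],
  ['readonly', 'readOnly'],
  ['maxlength', 'maxLength'],
  ['colspan', 'colSpan'],
  ['rowspan', 'rowSpan'],
]);

// Turn "foo-bar" into "fooBar".
const kebabToCamel = (text: string): string =>
  text.replace(/-([a-z])/g, (_match, letter: string) => letter.toUpperCase());

// Map an HTML attribute name to the prop name React expects.
const toCamelCase = (name: string): string => {
  const lower = name.toLowerCase();
  // React keeps data-* and aria-* attributes hyphenated.
  if (lower.startsWith('data-') || lower.startsWith('aria-')) {
    return lower;
  }
  return RENAMED_ATTRIBUTES.get(lower) ?? kebabToCamel(lower);
};

// Convert "color: red; font-size: 12px" into { color: 'red', fontSize: '12px' }.
// Assumption: values contain no ';' (so data URLs in url(...) are not handled).
const convertStyleStringToObject = (style: string): Record<string, string> => {
  const result: Record<string, string> = {};
  for (const declaration of style.split(';')) {
    const separator = declaration.indexOf(':');
    if (separator === -1) continue;
    const property = declaration.slice(0, separator).trim();
    const value = declaration.slice(separator + 1).trim();
    if (!property || !value) continue;
    if (property.startsWith('--')) {
      result[property] = value; // CSS custom properties keep their name
      continue;
    }
    // React expects "msTransform" but "WebkitTransform", so only -ms- loses its dash.
    result[kebabToCamel(property.toLowerCase().replace(/^-ms-/, 'ms-'))] = value;
  }
  return result;
};
```

With helpers like these, an attribute such as style="font-size: 12px" becomes the object React expects for its style prop, and class becomes className.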
Before creating the element, I need to process all the child nodes, if there are any. I use childNodes. Note that '#text' nodes have to be handled separately, because a text node is not a DOM element; it's just a string. Also note that I call the function recursively.
let children: ReactNode[] = [];
for (let i = 0; i < element.childNodes.length; i++) {
let child = element.childNodes[i];
if (child['nodeName'] === '#text') {
if (child.textContent) {
const content = child.textContent.replaceAll('\n', '');
content && children.push(content);
}
continue;
}
if (child['nodeName'] === '#comment') {
continue; // I don't want to render comments
}
if (nodeName !== 'script' && nodeName !== 'style') {
children.push(converter(child as HTMLElement, key++)); //recursive call
}
}
I have all the properties, now I can create the element.
return React.createElement(nodeName, attributes, children);
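To try the recursive walk outside a browser, its shape can be reproduced with plain objects. This is only a sketch under stated assumptions: `NodeLike` and `ElementLike` are hypothetical stand-ins for a DOM node and a React element, `convertTree` is a hypothetical name, and attribute handling is left out.

```typescript
// Hypothetical minimal shape of a DOM node: only the fields the walk reads.
interface NodeLike {
  nodeName: string;
  textContent: string | null;
  childNodes: NodeLike[];
}

// Hypothetical stand-in for a React element, so the sketch runs without React.
interface ElementLike {
  type: string;
  key: number;
  children: Array<ElementLike | string>;
}

const convertTree = (node: NodeLike, key = 0): ElementLike => {
  const type = node.nodeName.toLowerCase();
  const ownKey = key++; // this node's key; later siblings continue from key
  const children: Array<ElementLike | string> = [];

  for (const child of node.childNodes) {
    if (child.nodeName === '#text') {
      // Text nodes become plain strings, with newlines stripped.
      const content = (child.textContent ?? '').replace(/\n/g, '');
      if (content) children.push(content);
      continue;
    }
    if (child.nodeName === '#comment') {
      continue; // comments are not rendered
    }
    if (type !== 'script' && type !== 'style') {
      children.push(convertTree(child, key++)); // recursive call
    }
  }
  return { type, key: ownKey, children };
};

const sampleTree: NodeLike = {
  nodeName: 'DIV',
  textContent: null,
  childNodes: [
    { nodeName: '#text', textContent: 'hi\n', childNodes: [] },
    { nodeName: '#comment', textContent: 'note', childNodes: [] },
    {
      nodeName: 'SPAN',
      textContent: 'x',
      childNodes: [{ nodeName: '#text', textContent: 'x', childNodes: [] }],
    },
  ],
};

const converted = convertTree(sampleTree);
```

Here `converted` is a div with key 0 whose children are the string 'hi' and a span with key 1; the comment node is dropped.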
My function is now complete. I call my wrapper function in different locations based on the type of the element and its attributes.
const domParser: any = new DOMParser();
export const htmlProcessor = (html: string): ReactNode => {
if (typeof html !== 'string') {
console.error('htmlParser: html is not a string');
return React.createElement('p', {}, 'errors: please check dev console') as ReactNode;
}
let doc = domParser.parseFromString(html, 'text/html');
if (doc === null) {
console.error('htmlParser: unable to process html');
return React.createElement('p', {}, 'errors: please check dev console') as ReactNode;
}
return converter(doc as unknown as HTMLElement, 1);
};
const converter = (element: HTMLElement, key = 0) => {
if (element === undefined) {
return;
}
const nodeName = element.nodeName.toLowerCase();
let attributes: { [key: string]: string | any } = {};
for (var i = 0; element.attributes && i < element.attributes.length; i++) {
const attribute = element.attributes[i];
const reactName = toCamelCase(attribute.name);
let value = attribute.name === 'style' ? convertStyleStringToObject(attribute.value) : attribute.value.trim();
attributes[reactName] = value;
}
attributes['key'] = key++;
let children: ReactNode[] = [];
/**
* Wrapper logic goes here (returning the element) if I only want the
* attributes to be processed and don't want to process the children.
*/
for (let i = 0; i < element.childNodes.length; i++) {
let child = element.childNodes[i];
if (child['nodeName'] === '#text') {
if (child.textContent) {
const content = child.textContent.replaceAll('\n', '');
content && children.push(content);
}
continue;
}
if (child['nodeName'] === '#comment') {
continue;
}
if (nodeName !== 'script' && nodeName !== 'style') {
children.push(converter(child as HTMLElement, key++));
}
}
/**
* Wrapper logic goes here (returning the element) if I want both the
* attributes and the children to be processed.
*/
return React.createElement(nodeName, attributes, children);
};
End
This was a fun problem to solve. Well, that marks the end. ~seeya
I mention the edge cases below; only read on if you are interested.
Edge cases
There are a few edge cases that I have not covered in this article but have handled in my project.
An img tag should not have children passed to it:
if (nodeName === 'img') { return React.createElement(nodeName, { key: key++, ...attributes }, null); }
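The img check generalizes to every void element, since React rejects children on all of them. A small sketch of that idea follows; `VOID_ELEMENTS`, `isVoidElement`, and `childrenFor` are hypothetical names, and the list is the set of void elements from the HTML standard.

```typescript
// Void elements per the HTML standard: they can never have children.
const VOID_ELEMENTS: ReadonlySet<string> = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// Case-insensitive check, since DOM nodeName is upper case for HTML elements.
const isVoidElement = (nodeName: string): boolean =>
  VOID_ELEMENTS.has(nodeName.toLowerCase());

// Return null instead of a children array for void elements, so the result
// can be passed as the third argument of createElement.
const childrenFor = <T>(nodeName: string, children: T[]): T[] | null =>
  isVoidElement(nodeName) ? null : children;
```

A converter could then call something like React.createElement(nodeName, attributes, childrenFor(nodeName, children)) instead of special-casing each tag.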
If it's a text node, then its child should be a string:
if (element.nodeType === 3) { return React.createElement(nodeName, { ...attributes, key: key++ }, element.textContent); }
Handling the tags inside '<head>' is different too, because each tag, with its different properties, serves a different purpose.
let el;
switch (nodeName) {
  case 'meta':
    el = element as HTMLMetaElement;
    // meta tag has http-equiv attribute & content attribute, but no children
    return React.createElement(nodeName, { httpEquiv: el.httpEquiv, content: el.content, key: nodeName + key++ }, null);
  case 'link': {
    el = element as HTMLLinkElement;
    let type = null;
    if (el.type) {
      type = el.type;
    }
    // external link has rel attribute & href attribute, but no children
    return React.createElement(nodeName, { href: el.href, rel: el.rel, type, key: nodeName + key++ }, null);
  }
  case 'br':
    return React.createElement(nodeName, { key: nodeName + key++ }, null);
  case 'title':
  case 'script':
  case 'style':
    // script/style/title tags have no children
    // script & style have contents in them, so innerHTML
    // key is a prop of the element itself, not part of dangerouslySetInnerHTML
    return React.createElement(nodeName, {
      key: nodeName + key++,
      dangerouslySetInnerHTML: { __html: element.innerHTML },
    });
}