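How can I improve the algorithm built around the JS Selection object (the one returned by window.getSelection()) so that it highlights text accurately in web pages with all kinds of complex, deeply nested markup?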

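Here's the situation: I wrote a JS plugin. When you click a word on a page, the plugin highlights that word (red background) and the sentence containing it (blue background).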
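Both highlights come down to finding boundaries around the clicked offset in a plain string. The plugin's `getIndex` does this with nested loops over the end-marker and reserved lists. Below is a minimal sketch of the same idea using regular expressions. It assumes a word is a run of ASCII letters, apostrophes and hyphens, and that a sentence ends at `.`, `?` or `!` (optionally followed by closing quotes or a bracket) before whitespace or the end of the text. `findBounds` is a hypothetical helper, not part of the plugin. Abbreviations such as "e.g." will still split sentences under these rules.

```javascript
// Sketch (hypothetical helper): find the word and the sentence around `offset`
// in a plain string. Assumed rules: a word is letters plus ' ’ -, and a sentence
// ends at . ? or ! (plus optional closing quotes/bracket) before whitespace or end.
const WORD_CHAR = /[A-Za-z'\u2019-]/;

function findBounds(text, offset) {
  if (offset < 0 || offset >= text.length || !WORD_CHAR.test(text[offset])) {
    return null; // the click did not land on a word character
  }

  // Word: expand left and right while characters belong to a word.
  let wordStart = offset;
  while (wordStart > 0 && WORD_CHAR.test(text[wordStart - 1])) wordStart--;
  let wordEnd = offset + 1;
  while (wordEnd < text.length && WORD_CHAR.test(text[wordEnd])) wordEnd++;

  // Sentence: walk the sentence terminators; the last one ending at or before
  // `offset` opens the sentence, the first one ending after it closes it.
  const END = /[.?!]+["'\u201d\u2019)]*(?=\s|$)/g;
  let sentenceStart = 0;
  let sentenceEnd = text.length;
  let m;
  while ((m = END.exec(text)) !== null) {
    const endPos = m.index + m[0].length;
    if (endPos <= offset) {
      sentenceStart = endPos;
    } else {
      sentenceEnd = endPos;
      break;
    }
  }
  // Skip the whitespace separating this sentence from the previous one.
  while (sentenceStart < sentenceEnd && /\s/.test(text[sentenceStart])) sentenceStart++;

  return { word: [wordStart, wordEnd], sentence: [sentenceStart, sentenceEnd] };
}
```

Because it works on one flat string, the same function applies unchanged once the text of a whole paragraph (including the parts inside `<em>`/`<strong>`) has been concatenated.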

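English articles occasionally use tags such as bold and italic. When the plugin walks the sibling nodes of such a sentence, the text inside those tags always ends up highlighted as a sentence of its own or as a single word, instead of as part of the whole sentence.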
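The splitting follows from the DOM shape: inside `<p>Some <strong>bold</strong> text.</p>` one sentence is spread over three text nodes, while the plugin searches one node's text at a time and then moves sibling by sibling. A minimal sketch of the usual alternative, assuming the highlight target is the block element around the click: collect every text node under it in document order, search the concatenated string, and keep an offset table to map positions back to nodes. `collectTextSegments`, `toGlobalOffset` and `toLocalPosition` are hypothetical names. The code only reads `nodeType`, `childNodes` and `data`, so it works on real DOM nodes as well as on plain objects.

```javascript
// Sketch (hypothetical helpers): flatten all text under `root`, however deeply
// it is nested in <em>, <strong>, <a>..., into one string plus an offset table.
const TEXT_NODE = 3;

function collectTextSegments(root) {
  const segments = []; // [{ node, start, end }] with offsets into `text`
  let offset = 0;
  (function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      segments.push({ node, start: offset, end: offset + node.data.length });
      offset += node.data.length;
      return;
    }
    for (const child of node.childNodes) walk(child);
  })(root);
  const text = segments.map(s => s.node.data).join("");
  return { text, segments };
}

// (text node, offset inside it), e.g. from a click -> offset in `text`.
function toGlobalOffset(segments, node, localOffset) {
  const seg = segments.find(s => s.node === node);
  return seg ? seg.start + localOffset : -1;
}

// Offset in `text` -> the text node containing it and the offset inside that node.
function toLocalPosition(segments, globalOffset) {
  for (const seg of segments) {
    if (globalOffset < seg.end) return { node: seg.node, offset: globalOffset - seg.start };
  }
  const last = segments[segments.length - 1];
  return last ? { node: last.node, offset: last.node.data.length } : null;
}
```

In a browser, `root` could be something like `focusNode.parentElement.closest("p")`, which assumes the page keeps each paragraph in a `<p>`. The boundaries are then searched in `text`, where a period inside `<strong>` is just another character.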

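If the clicked word comes before the bold/italic word, the sentence highlight does show up correctly. (Because the plugin rewrites innerHTML directly, inserting tags to split out the word and the sentence and to change their background colors, the original em, strong and other bold/italic tags get overwritten, but that is a minor issue.)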
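Rewriting innerHTML is also what drops the em/strong tags. An approach that leaves existing markup alone is to split only the affected text nodes and wrap each piece in its own span. The sketch below assumes an offset table of the form `[{ node, start, end }]`, listing each text node with its start/end position in the paragraph's concatenated text. `splitRange`, `wrapPiece`, `highlightRange` and `unwrap` are hypothetical names; the DOM calls used (`splitText`, `insertBefore`, `appendChild`, `removeChild`, `normalize`) are standard.

```javascript
// Sketch: highlight [start, end) of a paragraph's concatenated text without
// touching innerHTML, so <em>/<strong> stay where they are.
// `segments`: [{ node: Text, start, end }, ...] in document order.

// Pure part: which slice of which text node falls inside [start, end)?
function splitRange(segments, start, end) {
  const pieces = [];
  for (const seg of segments) {
    const from = Math.max(start, seg.start);
    const to = Math.min(end, seg.end);
    if (from < to) pieces.push({ node: seg.node, from: from - seg.start, to: to - seg.start });
  }
  return pieces;
}

// DOM part: isolate [from, to) of one text node and wrap it in a styled <span>.
function wrapPiece(textNode, from, to, background) {
  let target = textNode;
  if (from > 0) target = target.splitText(from); // target now begins at `from`
  if (to - from < target.data.length) target.splitText(to - from); // cut off the tail
  const span = textNode.ownerDocument.createElement("span");
  span.style.background = background;
  target.parentNode.insertBefore(span, target);
  span.appendChild(target);
  return span;
}

function highlightRange(segments, start, end, background) {
  return splitRange(segments, start, end).map(p => wrapPiece(p.node, p.from, p.to, background));
}

// Undo: move the text back out of each span and merge adjacent text nodes.
function unwrap(spans) {
  for (const span of spans) {
    const parent = span.parentNode;
    while (span.firstChild) parent.insertBefore(span.firstChild, span);
    parent.removeChild(span);
    parent.normalize();
  }
}
```

For the two-color case, highlight the sentence first, then rebuild the offset table before highlighting the word: `splitText` shortens the recorded nodes, so the old offsets no longer line up. `unwrap` replaces the save-and-restore of innerHTML.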

Full JS code:

```javascript
/**
 * @param {function} callback callback function
 * @param {array} wordReserved reserved list for words
 * @param {array} wordEnd list of word-end markers
 * @param {array} sentenceReserved reserved list for sentences
 * @param {array} sentenceEnd list of sentence-end markers
 */

function plugin(callback, wordReserved, wordEnd, sentenceReserved, sentenceEnd) {
    if (typeof callback !== "function") {
        window.pluginState = false;
        restore();
    }

    window.pluginOpen = function () {
        window.pluginState = typeof callback === "function";
    }

    window.pluginClose = function () {
        restore();
        window.pluginState = false;
    }

    window.plugLastElements = [];
    window.pluginStateLastInnerHTMLs = [];

    // Letters and the reserved characters count as part of a word.
    function check(letter) {
        if ((letter >= 'A' && letter <= 'Z') || (letter >= 'a' && letter <= 'z') || wordReserved.indexOf(letter) >= 0) {
            return true;
        }
        return false;
    }

    // Insert the highlight spans from the rightmost offset to the leftmost,
    // so earlier insertions do not shift the offsets still to be used.
    function render(paragraph, index) {

        let newParagraph = paragraph;
        if (index.wordI != null && index.wordJ != null && index.sentenceI != null && index.sentenceJ != null) {
            newParagraph = insertStr(newParagraph, index.sentenceJ, "</span>");
            newParagraph = insertStr(newParagraph, index.wordJ, "</span>");
            newParagraph = insertStr(newParagraph, index.wordI, "<span style='background:red'>");
            newParagraph = insertStr(newParagraph, index.sentenceI, "<span style='background:blue'>");
        } else if (index.sentenceI != null && index.sentenceJ != null) {
            newParagraph = insertStr(newParagraph, index.sentenceJ, "</span>");
            newParagraph = insertStr(newParagraph, index.sentenceI, "<span style='background:blue'>");
        }
        return newParagraph;
    }

    // Scan backwards from startIndex for the word start (wordI) and the sentence
    // start (sentenceI), then forwards for the word end (wordJ) and sentence end (sentenceJ).
    function getIndex(paragraph, startIndex) {
        paragraph = paragraph.replace(/&nbsp;/g, " ");
        let wordI = null;
        let wordJ = null;
        let sentenceI = null;
        let sentenceJ = null;

        for (let i = startIndex; i >= 0; i--) {

            if (wordI == null) {
                for (const end of wordEnd) {
                    for (let j = 0; j <= end.length; j++) {
                        if (paragraph.substring(i - end.length + j, i + j) == end) {
                            let isReserved = false;
                            for (const reserved of wordReserved) {
                                for (let k = 0; k < reserved.length; k++) {
                                    if (paragraph.substring(i - reserved.length + k, i + k) == reserved) {
                                        isReserved = true;
                                        break;
                                    }
                                }
                                if (isReserved) {
                                    break;
                                }
                            }
                            if (!isReserved) {
                                wordI = i + 1;
                                break;
                            }
                        }
                    }
                    if (wordI != null) {
                        break;
                    }
                }
            }

            if (sentenceI == null) {
                for (const end of sentenceEnd) {
                    for (let j = 0; j <= end.length; j++) {
                        if (paragraph.substring(i - end.length + j, i + j) == end) {
                            let isReserved = false;
                            for (const reserved of sentenceReserved) {
                                for (let k = 0; k < reserved.length; k++) {
                                    if (paragraph.substring(i - reserved.length + k, i + k) == reserved) {
                                        isReserved = true;
                                        break;
                                    }
                                }
                                if (isReserved) {
                                    break;
                                }
                            }
                            if (!isReserved) {
                                sentenceI = i;
                                break;
                            }
                        }
                    }
                    if (sentenceI != null) {
                        break;
                    }
                }
            }

            if (wordI != null && sentenceI != null) {
                break;
            }
        }

        for (let i = startIndex; i < paragraph.length; i++) {

            if (wordJ == null) {
                for (const end of wordEnd) {
                    for (let j = 0; j <= end.length; j++) {
                        if (paragraph.substring(i + end.length + j, i + j) == end) {
                            let isReserved = false;
                            for (const reserved of wordReserved) {
                                for (let k = 0; k <= reserved.length; k++) {
                                    if (paragraph.substring(i - reserved.length + k, i + k) == reserved) {
                                        isReserved = true;
                                        break;
                                    }
                                }
                                if (isReserved) {
                                    break;
                                }
                            }
                            if (!isReserved) {
                                wordJ = i + end.length;
                                break;
                            }
                        }
                    }

                    if (wordJ != null) {
                        break;
                    }
                }
            }

            if (sentenceJ == null) {
                for (const end of sentenceEnd) {
                    for (let j = 0; j <= end.length; j++) {
                        if (paragraph.substring(i + end.length + j, i + j) == end) {
                            let isReserved = false;
                            for (const reserved of sentenceReserved) {
                                for (let k = 0; k <= reserved.length; k++) {
                                    if (paragraph.substring(i - reserved.length + k, i + k) == reserved) {
                                        isReserved = true;
                                        break;
                                    }
                                }
                                if (isReserved) {
                                    break;
                                }
                            }
                            if (!isReserved) {
                                sentenceJ = i + j + end.length;
                                while (paragraph[sentenceJ - 1] == " ") {
                                    sentenceJ--;
                                }
                                break;
                            }
                        }
                    }

                    if (sentenceJ != null) {
                        break;
                    }
                }
            }

            if (wordJ != null && sentenceJ != null) {
                break;
            }
        }

        return {
            wordI: wordI,
            wordJ: wordJ,
            sentenceI: sentenceI,
            sentenceJ: sentenceJ,
        };

    }

    function insertStr(source, start, newStr) {
        return source.slice(0, start) + newStr + source.slice(start);
    }

    // Turn a node into an element whose innerHTML is plain text: either its parent
    // (when the node holds all of the parent's text) or a new <span> put in its
    // place. This is where <em>/<strong> markup gets lost.
    function tranSpan(ele) {
        if (ele.parentElement.innerText == ele.textContent) {
            ele = ele.parentElement;
            ele.innerHTML = ele.textContent.replace(/\s/g, " ");
            return ele;
        } else {
            let newSpan = document.createElement("span");
            newSpan.innerHTML = ele.textContent.replace(/\s/g, " ");
            ele.parentElement.insertBefore(newSpan, ele);
            ele.parentElement.removeChild(ele);
            return newSpan;
        }
    }

    // Put back the innerHTML saved before the last highlight.
    function restore() {
        const elements = window.plugLastElements || [];
        for (let i = 0; i < elements.length; i++) {
            elements[i].innerHTML = window.pluginStateLastInnerHTMLs[i];
        }
    }

    // Currently unused: returns the parent when the node has siblings
    // (the sentence may continue in them), otherwise the node itself.
    function forText(selNode) {
        if (selNode.previousSibling || selNode.nextSibling) {
            return selNode.parentNode;
        }
        return selNode;
    }


    if ((navigator.userAgent.match(/(phone|pad|pod|iPhone|iPod|ios|iPad|Android|Mobile|BlackBerry|IEMobile|MQQBrowser|JUC|Fennec|wOSBrowser|BrowserNG|WebOS|Symbian|Windows Phone)/i))) {

        window.addEventListener("touchstart", e => {
            if (!window.pluginState) {
                return;
            }
            restore();
        })
    } else {

        window.addEventListener("mousedown", e => {
            if (!window.pluginState) {
                return;
            }
            restore();
        })
    }

    var mousecheck = 0;

    window.addEventListener("mouseup", e => {
        mousecheck++;
        if (!window.pluginState) {
            return;
        }

        let selection = window.getSelection();
        if (!selection.focusNode) {
            return;
        }
        let anchorOffset = selection.anchorOffset;
        // Only the text node under the cursor is examined here; text inside a
        // neighbouring <em>/<strong> lives in a different node.
        let paragraph = selection.focusNode.data;

        if (!paragraph || !check(paragraph[anchorOffset])) {
            mousecheck = mousecheck % 2 == 0 ? mousecheck : mousecheck + 1;
            return;
        }

        if (mousecheck % 2 == 0) {
            return;
        }

        let word = "";
        let sentence = "";
        let wordIndex = 0;

        let parentElement = tranSpan(selection.focusNode);

        window.plugLastElements = [];
        window.pluginStateLastInnerHTMLs = [];
        window.plugLastElements.push(parentElement);
        window.pluginStateLastInnerHTMLs.push(parentElement.innerHTML);

        let index = getIndex(parentElement.innerHTML, anchorOffset);
        parentElement.innerHTML = render(parentElement.innerHTML, {
            wordI: index.wordI ? index.wordI - 1 : 0,
            wordJ: index.wordJ ? index.wordJ : parentElement.innerHTML.length,
            sentenceI: index.sentenceI ? index.sentenceI : 0,
            sentenceJ: index.sentenceJ ? index.sentenceJ : parentElement.innerHTML.length,
        });

        let element = parentElement;
        let worded = index.wordI != null;

        let wI = index.wordI ? index.wordI - 1 : 0;
        let wJ = index.wordJ ? index.wordJ : element.innerText.length;
        word = element.innerText.substring(wI, wJ) + word;
        wordIndex += wI;

        if (index.sentenceI != null) {
            if (index.sentenceJ != null)
                sentence = element.innerText.substring(index.sentenceI, index.sentenceJ);
            else
                sentence = element.innerText.substring(index.sentenceI, element.innerText.length);
        }
        else {
            if (index.sentenceJ != null)
                sentence = element.innerText.substring(0, index.sentenceJ);
        }

        var record1 = 0, record2 = 0;

        // Sentence start not found in this node: walk back through previous siblings.
        while (index.sentenceI == null) {
            record1++;
            if (element.previousSibling || element.previousElementSibling) {
                element = tranSpan(element.previousSibling || element.previousElementSibling);
            } else {
                break;
            }

            window.plugLastElements.push(element);
            window.pluginStateLastInnerHTMLs.push(element.innerHTML);

            let index2 = getIndex(element.innerText + sentence, element.innerText.length - 1);

            sentence = (index2.sentenceI ? element.innerText.substring(index2.sentenceI, element.innerText.length) : element.innerText) + sentence;
            element.innerHTML = render(element.innerHTML, {
                wordI: worded ? null : index2.wordI,
                wordJ: worded ? null : index2.wordJ,
                sentenceI: index2.sentenceI ? index2.sentenceI : 0,
                sentenceJ: element.innerHTML.length,
            });

            if (!worded && index2.wordI != null) {
                worded = true;
            }

            wordIndex += index2.sentenceI ? index2.sentenceI : element.innerText.length;

            if (index2.sentenceI != null) {
                break;
            }

        }

        element = parentElement;
        worded = index.wordJ != null;
        // Sentence end not found in this node: walk forward through next siblings.
        while (index.sentenceJ == null) {
            record2++;
            if (element.nextSibling || element.nextElementSibling) {
                element = tranSpan(element.nextSibling || element.nextElementSibling);
            } else {
                break;
            }

            window.plugLastElements.push(element);
            window.pluginStateLastInnerHTMLs.push(element.innerHTML);

            let length = sentence.length;
            index = getIndex(sentence + element.innerText, length);
            // getIndex ran on `sentence + element.innerText`, so its offsets include
            // the `length` characters of `sentence`; convert to offsets in this element.
            let endInElement = index.sentenceJ != null ? Math.max(index.sentenceJ - length, 0) : null;

            sentence = sentence + (endInElement != null ? element.innerText.substring(0, endInElement) : element.innerText);

            element.innerHTML = render(element.innerHTML, {
                wordI: worded || index.wordI == null ? null : Math.max(index.wordI - length, 0),
                wordJ: worded || index.wordJ == null ? null : index.wordJ - length,
                sentenceI: 0,
                sentenceJ: endInElement != null ? endInElement : element.innerHTML.length,
            });


            if (!worded && index.wordJ != null) {
                worded = true;
            }

            if (index.sentenceJ != null) {
                break;
            }
        }

        if (record1 == 1) {
            if (index.sentenceJ != null)
                sentence = element.innerText.substring(0, index.sentenceJ);
            if (record2 != 0)
                sentence = parentElement.innerText + sentence;
        }
        if (record2 == 1) {
            if (index.sentenceI != null)
                sentence = element.innerText.substring(index.sentenceI, element.innerText.length);
        }

        var sentence1 = "";
        for (var i = 0; i < sentence.length; i++) {
            if (sentence[i] == " " && (sentence[i + 1] == " " || i + 1 == sentence.length || sentence[i + 1] == "\"" || sentence[i + 1] == "\'" || sentence[i + 1] == "”" || sentence[i + 1] == "’")) {
                sentence1 += " " + word;
            }
            else {
                sentence1 += sentence[i];
            }
        }

        // Keys: 句子 = sentence, 单词 = word, 位置 = position of the word, 坐标 = coordinates
        callback({
            "句子": sentence1.trim(),
            "单词": word.trim(),
            "位置": wordIndex,
            "坐标": {
                "pageX": e.pageX,
                "pageY": e.pageY,
                "screenX": e.screenX,
                "screenY": e.screenY,
            }
        })
    })
}

// Configuration
plugin(d => {
    console.log(d);
},
    ["'", "-", ".", ","],
    [". ", " ", ", ", "\n", "? ", "! ", ".)", "!)", "?)"],
    ["\"", " ", "'", "·", "-", ","],
    [". ", ".\n", "? ", "! ", "?\n", "!\n", ".)", "!)", "?)"]
)

// Start
pluginOpen();

// Stop
// pluginClose();
```

I'd appreciate any suggestions for improving this 🌹

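If you don't care about preserving the formatting, preprocess first: strip the extra tags and split each paragraph into pieces, each held in a container such as `<span class="section">`. Highlighting then just means setting that container's background.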
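A minimal sketch of that preprocessing step. It assumes markup is discarded (the paragraph's `textContent` is used) and that a sentence ends at `.`, `?` or `!`, optionally followed by closing quotes or a bracket, and then whitespace. `toSectionsHtml` and `escapeHtml` are hypothetical helpers. The split uses a regular-expression lookbehind, which current browsers and Node support.

```javascript
// Sketch (hypothetical helpers): rebuild a paragraph as one
// <span class="section"> per sentence, discarding the original markup.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function toSectionsHtml(plainText) {
  return plainText
    .trim()
    // Split on whitespace that directly follows . ? or ! (plus closing quotes/bracket).
    .split(/(?<=[.?!]["'\u201d\u2019)]*)\s+/)
    .filter(s => s.length > 0)
    .map(s => `<span class="section">${escapeHtml(s)}</span>`)
    .join(" ");
}

// Browser-only wiring (hypothetical):
// p.innerHTML = toSectionsHtml(p.textContent);
// document.addEventListener("click", e => {
//   const section = e.target.closest(".section");
//   if (section) section.style.background = "blue";
// });
```

With every sentence in its own element, the click target (via `closest(".section")`) already identifies the sentence, so no offset arithmetic is needed for the blue highlight.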

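Otherwise you depend on how the original article was marked up: if, say, a sentence's period sits inside a strong tag, you also have to split that tag apart.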

This really shouldn't be done with the Selection API.
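If the goal is the character under the pointer rather than a selection, the DOM can report it directly. `document.caretPositionFromPoint(x, y)` is the standardized method (Firefox, and newer Chromium releases), and `document.caretRangeFromPoint(x, y)` is the older WebKit/Blink one. Both take viewport coordinates such as `e.clientX`/`e.clientY`. The wrapper below is written against a `doc` parameter so it can be exercised without a browser. `caretFromPoint` is a hypothetical name.

```javascript
// Sketch (hypothetical helper): text position under a viewport point, using
// whichever caret-from-point method the document provides.
function caretFromPoint(doc, x, y) {
  if (typeof doc.caretPositionFromPoint === "function") {
    const pos = doc.caretPositionFromPoint(x, y); // CaretPosition or null
    return pos ? { node: pos.offsetNode, offset: pos.offset } : null;
  }
  if (typeof doc.caretRangeFromPoint === "function") {
    const range = doc.caretRangeFromPoint(x, y); // collapsed Range or null
    return range ? { node: range.startContainer, offset: range.startOffset } : null;
  }
  return null; // neither method available
}

// Browser usage (hypothetical):
// document.addEventListener("click", e => {
//   const hit = caretFromPoint(document, e.clientX, e.clientY);
//   if (hit && hit.node.nodeType === 3) { /* hit.node.data, hit.offset */ }
// });
```

The result is a caret position between characters, so clicking the right half of a letter yields the offset after it. Checking both `offset` and `offset - 1` against the text node's data is a reasonable guard.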