Remove HTML line breaks between <li> tags using PHP (or Python)

I have a large data set of HTML text, and I frequently find unnecessary, and sometimes multiple, <br> line breaks inside <li> elements.

For example:

<li>Some string here<br></li><br><li>Another string here<br><br></li><br>

I would like to remove these <br> that appear between <li> and </li> but preserve everything else, including <br> outside of <li> tags. The text above would become:

<li>Some string here</li><br><li>Another string here</li><br>

What is the regular expression for doing this with preg_replace() in PHP (or re.sub() in Python)?

Replacing (<br>)+</li> with </li> will at least take care of line breaks at the end of each <li>'s content, which may be good enough for you. Beyond that, because an <li> can contain other <li> elements, the general task is difficult to solve with a regular expression (and may not be solvable with regular expressions alone); see the accepted answer to this question.
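For the Python side of the question, the substitution above could be sketched roughly as follows. The names TRAILING_BR and strip_trailing_br are made up for illustration, and the pattern is slightly broader than (<br>)+</li>: it also accepts <br/>, <br /> and upper-case tags, which is an assumption about what the data contains. It only removes <br> runs that sit directly before </li>.

```python
import re

# One or more <br>, <br/> or <br /> tags (any case, optional whitespace after
# each) that are immediately followed by a closing </li>. The lookahead keeps
# the </li> itself out of the match, so nothing has to be re-inserted.
TRAILING_BR = re.compile(r"(?:<br\s*/?>\s*)+(?=</li>)", re.IGNORECASE)


def strip_trailing_br(html: str) -> str:
    """Remove <br> runs that immediately precede </li>; leave all others."""
    return TRAILING_BR.sub("", html)


sample = "<li>Some string here<br></li><br><li>Another string here<br><br></li><br>"
cleaned = strip_trailing_br(sample)
print(cleaned)
```

A <br> in the middle of an <li> (for example <li>a<br>b</li>) is left alone by this pattern, which is the limitation the answer describes.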

Using PHP Simple HTML DOM Parser, you can achieve this easily with a jQuery-style selector:

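// simple_html_dom.php is the third-party PHP Simple HTML DOM Parser library
include('simple_html_dom.php');
$html = str_get_html('<li>Some string here<br></li><br><li>Another string here<br><br></li><br>');
// 'li br' selects every <br> that is a descendant of an <li>
foreach ($html->find('li br') as $br) {
    $br->outertext = ''; // blank the tag so it is dropped from the output
}
echo $html;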

The output will be

<li>Some string here</li><br><li>Another string here</li><br>
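A parser-based approach is also possible in Python without third-party packages. The sketch below uses the standard library's html.parser and keeps a counter of open <li> elements, so nested lists are handled. The names LiBrStripper and strip_br_inside_li are hypothetical. It assumes every <li> has an explicit </li>, and it rewrites end tags in lower case, so treat it as a starting point rather than a drop-in equivalent of the PHP code.

```python
from html.parser import HTMLParser


class LiBrStripper(HTMLParser):
    """Re-emit the input, omitting any <br> that is nested inside an <li>."""

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; pass through as-is
        super().__init__(convert_charrefs=False)
        self.li_depth = 0  # number of <li> elements currently open
        self.out = []      # output fragments, joined at the end

    def handle_starttag(self, tag, attrs):
        if tag == "br" and self.li_depth > 0:
            return  # drop <br> inside an <li>
        if tag == "li":
            self.li_depth += 1
        self.out.append(self.get_starttag_text())  # original tag text

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <br/>
        if tag == "br" and self.li_depth > 0:
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_br_inside_li(html: str) -> str:
    parser = LiBrStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = "<li>Some string here<br></li><br><li>Another string here<br><br></li><br>"
result = strip_br_inside_li(sample)
print(result)
```

Like the 'li br' selector, this removes every <br> inside an <li>, including ones in the middle of the text, while <br> tags outside any <li> are kept.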