Python: counting the occurrences of each HTML tag

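Count how many times each HTML tag appears across all of the web page files being processed.
Print the three most frequent tags and their counts to the screen, one per line.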

[image]


The tag types in my set contain duplicates. How should I fix that? And how should the function below be written?

[image]


Thanks!

Sorry, I couldn't quite follow your problem description...

That said, you can do this directly with a third-party HTML parsing library such as BeautifulSoup:


from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def count_html_tags(html_doc):
  # Parse the HTML document using BeautifulSoup
  soup = BeautifulSoup(html_doc, 'html.parser')

  # Create a dictionary to store the counts of each tag
  tag_counts = {}

  # Iterate over all the tags in the document
  for tag in soup.find_all():
    # If the tag is not in the dictionary, add it and set the count to 1
    if tag.name not in tag_counts:
      tag_counts[tag.name] = 1
    # If the tag is already in the dictionary, increment the count by 1
    else:
      tag_counts[tag.name] += 1

  return tag_counts

html_doc = """
<html>
  <head>
    <title>Example HTML Document</title>
  </head>
  <body>
    <h1>Hello, world!</h1>
    <p>This is an example HTML document.</p>
    <ul>
      <li>Item 1</li>
      <li>Item 2</li>
      <li>Item 3</li>
    </ul>
  </body>
</html>
"""

tag_counts = count_html_tags(html_doc)
print(tag_counts)
# Output: {'html': 1, 'head': 1, 'title': 1, 'body': 1, 'h1': 1, 'p': 1, 'ul': 1, 'li': 3}
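
The original task also asks for totals across several pages and for the three most frequent tags, which the snippet above does not cover. Here is a minimal sketch of that part. It uses the standard library's html.parser and collections.Counter instead of BeautifulSoup, purely so that it runs without extra installs. The names TagCounter, count_tags_in_documents, top_tags and the sample pages list are made up for illustration, and reading the actual files from disk is left out.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts opening tags. HTMLParser passes tag names in lowercase,
    so <P> and <p> end up under the same key."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag (also for self-closing tags like <br/>)
        self.counts[tag] += 1


def count_tags_in_documents(documents):
    """Return a Counter with tag totals summed over all HTML strings."""
    total = Counter()
    for doc in documents:
        parser = TagCounter()
        parser.feed(doc)
        parser.close()
        total.update(parser.counts)
    return total


def top_tags(documents, n=3):
    """Return the n most frequent tags as (tag, count) pairs."""
    return count_tags_in_documents(documents).most_common(n)


# Hypothetical sample data standing in for the contents of the page files
pages = [
    "<p>a</p><p>b</p><p>c</p><p>d</p><img src='x.png'>",
    "<ul><li>1</li><li>2</li><li>3</li></ul><img src='y.png'>",
]

# Print the top three tags, one per line
for tag, count in top_tags(pages):
    print(tag, count)
```

Because a Counter (like a dict) stores each tag name only once, duplicates in the collection of tag types are not an issue here. With BeautifulSoup the same idea would probably be Counter(tag.name for tag in soup.find_all()), followed by most_common(3).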