重构技巧[关闭]

I don't really have experience in factoring. My code is really long, i don't use functions because i don't know if it needs to be a function. I hope you could give me tips so i could clean up my code.

<?php

# Required files
include("simple-html-dom.php");
require("{$_SERVER['DOCUMENT_ROOT']}/config/pipeline-x.php");

# Define variables
$fn = urlencode($_REQUEST['fn']);
$ln = urlencode($_REQUEST['ln']);

# Connect to database
$db = new px_dbasei();
$db->connect("192.168.50.70", "****", "****", "piasdgeline_tesh45t");

# Query database if a record exist
$sql = "SELECT * FROM linkedin_parse "
       ."WHERE "
       ."`first_name` = '{$fn}' AND "
       ."`last_name` = '{$ln}' ";
$results = $db->query($sql);

# If there is no result
if($results->num_rows == 0):

    # Search linkedin and download page
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://www.linkedin.com/pub/dir/?first={$fn}&last={$ln}&search=Search");
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($ch, CURLOPT_TIMEOUT, 8);
    $res = curl_exec($ch);
    curl_close($ch);
    $html = str_get_html($res);

    # Parse records from the download page
    foreach($html->find('li.vcard') as $vcard):
            $table = array();  

        foreach($vcard->find('span.given-name') as $given_name):                    
            $table['first_name'] = (trim(addslashes($given_name->plaintext), " "));
        endforeach; 
        foreach($vcard->find('span.family-name') as $family_name):
            $table['last_name'] = (trim(addslashes($family_name->plaintext)," "));
        endforeach;
        foreach($vcard->find('span.location') as $location):
            $table['location'] = (trim(addslashes($location->plaintext), " "));
        endforeach;
        foreach($vcard->find('span.industry') as $industry):
            $table['industry'] = (trim(addslashes($industry->plaintext), " "));
        endforeach;
        foreach($vcard->find('dd.current-content') as $headline):
            $table['headline'] = (trim(addslashes($headline->plaintext), " "));
        endforeach;
        foreach($vcard->find('a.btn-primary') as $url):
            $table['url'] = addslashes($url->href);
        endforeach;

        # Insert generated results to the database
        $sql = "INSERT INTO linkedin_parse (`first_name`,`last_name`,`location`,`industry`,`headline`,`url`) "
              ."VALUES "
              ."('{$table['first_name']}',"
              ."'{$table['last_name']}',"
              ."'{$table['location']}',"
              ."'{$table['industry']}',"
              ."'{$table['headline']}',"
              ."'{$table['url']}')";
        $db->query($sql);

        # Get last insert id and query database again
        $new_id = $db->insert_id();
        $sql2 = "SELECT * FROM linkedin_parse WHERE `linkedin_parse_id` = '{$new_id}'";
        $result = $db->query($sql2);

        # Display results in HTML
        ?>
        <ol>
            <?php while($row = $result->fetch_assoc()): ?>
                <li class="vcard">
                    <span class="given-name"><?php echo $row['first_name'] ?></span>
                    <span class="family-name"><?php echo $row['last_name'] ?></span>
                    <span class="location"><?php echo $row['location'] ?></span>
                    <span class="industry"><?php echo $row['industry'] ?></span>
                    <dd class="current-content">
                        <span><?php echo $row['headline'] ?></span>
                    </dd>
                    <a href="<?php echo $row['url'] ?>"></a>
                </li>
            <?php endwhile; ?>
        </ol>
        <?php
    endforeach;
else:
    # Query database if record is 30 days old
    $sql = "SELECT * FROM linkedin_parse "
          ."WHERE "
          ."`first_name` = '{$fn}' AND"
          ."`last_name` = '{$ln}' AND"
          ."`date_inserted` >= DATE_SUB(NOW(), INTERVAL 30 DAY)";
    $results = $db->query($sql);

    if($results->num_rows != 0):
        # Retrieve from database
        $sql = "SELECT * FROM linkedin_parse "
          ."WHERE "
          ."`first_name` = '{$fn}' AND"
          ."`last_name` = '{$ln}' ";
        $result = $db->query($sql);

        # Display results in HTML
        ?>
        <ol>
            <?php while($row = $result->fetch_assoc()): ?>
                <li class="vcard">
                    <span class="given-name"><?php echo $row['first_name'] ?></span>
                    <span class="family-name"><?php echo $row['last_name'] ?></span>
                    <span class="location"><?php echo $row['location'] ?></span>
                    <span class="industry"><?php echo $row['industry'] ?></span>
                    <dd class="current-content">
                        <span><?php echo $row['headline'] ?></span>
                    </dd>
                    <a href="<?php echo $row['url'] ?>"></a>
                </li>
            <?php endwhile; ?>
        </ol>
        <?php
    else:
        # Search linked-in for updated records
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_URL, "http://www.linkedin.com/pub/dir/?first={$fn}&last={$ln}&search=Search");
            curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
            curl_setopt($ch, CURLOPT_TIMEOUT, 8);
            $res = curl_exec($ch);
            curl_close($ch);
            $html = str_get_html($res);

            # Parse records from the download page
            foreach($html->find('li.vcard') as $vcard):
                $table = array();  

                foreach($vcard->find('span.given-name') as $given_name):                    
                    $table['first_name'] = (trim(addslashes($given_name->plaintext), " "));
                endforeach; 
                foreach($vcard->find('span.family-name') as $family_name):
                    $table['last_name'] = (trim(addslashes($family_name->plaintext)," "));
                endforeach;
                foreach($vcard->find('span.location') as $location):
                    $table['location'] = (trim(addslashes($location->plaintext), " "));
                endforeach;
                foreach($vcard->find('span.industry') as $industry):
                    $table['industry'] = (trim(addslashes($industry->plaintext), " "));
                endforeach;
                foreach($vcard->find('dd.current-content') as $headline):
                    $table['headline'] = (trim(addslashes($headline->plaintext), " "));
                endforeach;
                foreach($vcard->find('a.btn-primary') as $url):
                    $table['url'] = addslashes($url->href);
                endforeach;

                # Update records
                $sql = "UPDATE linkedin_parse "
                      ."SET "
                      ."`date_inserted` = now(),"
                      ."`first_name` = '{$table['first_name']}',"
                      ."`last_name` = '{$table['last_name']}', "
                      ."`location` = '{$table['location']}', "
                      ."`industry` = '{$table['industry']}', "
                      ."`headline` = '{$table['headline']}', "
                      ."`url` = '{$table['url']}' "
                      ."WHERE "
                      ."`first_name` = '{$table['first_name']}' AND"
                      ."`last_name` = '{$table['last_name']}' AND "
                      ."`location` = '{$table['location']}' ";
                $result = $db->query($sql);
                ?>
                <ol>
                    <?php while($row = $result->fetch_assoc()): ?>
                        <li class="vcard">
                            <span class="given-name"><?php echo $row['given-name'] ?></span>
                            <span class="family-name"><?php echo $row['family-name'] ?></span>
                            <span class="location"><?php echo $row['location'] ?></span>
                            <span class="industry"><?php echo $row['industry'] ?></span>
                            <dd class="current-content">
                                <span><?php echo $row['headline'] ?></span>
                            </dd>
                            <a href="<?php echo $row['url'] ?>"></a>
                        </li>
                    <?php endwhile; ?>
                </ol>
                <?php

            endforeach;
    endif;
endif;

As a general concept, I would recommend a few things:

  • "DRY" or "Don't Repeat Yourself" is a good concept to start with, as others have mentioned. If you do something more than once, chances are that it deserves its own function or can be simplified.
  • While typically applied to object-oriented programming, the "single responsibility principle" can be well applied to functions for their maintainability, but you should also avoid creating functions just because you can (function calls require overhead). Eventually, collections of functions often end up in reusable classes.
  • "KISS" or "Keep it simple, stupid" (note: this is better looked at as a self-referencing "stupid", like when someone says, "I'm such an idiot!" when they figure something out) -- Simplify your logic and code whenever you can. "The simplest explanation is usually the correct one."

To apply these concepts, here is how I would re-structure (not how I would write) your script:

  1. Query whether any profiles match that are less that 30 days old, since you are updating the profiles when none exist or all are more than 30 days old.
  2. If you didn't return any rows:
    • Query whether any profiles match at all, to determine updates vs inserts.
    • Download and parse the page.
    • Save the new/updated records.
    • Keep the matches you parsed, rather than downloading from the database what you just inserted/updated.
  3. Finally, display your matches (regardless of whether they came from database or parsing).

By restructuring your code this way:

  • You eliminate most of the duplication and potential confusion
  • You simplify your codepath
  • You group your logical components (search, parse/store if necessary, display)
  • All of your HTML can be placed at the end of your script, which is more readable (or can be easily placed in a separate file)
  • The places where a function will make sense will be more apparent, such as your parsing loops.

Last note -- you should place an emphasis on security whenever you process data from an "unknown source" (user, website, provided file, etc.). While addslashes() and urlencode() are a nice idea, there are a number of resources that can help you understand how to avoid SQL Injection, cross-site scripting and other potential threats. An example of a risk in your code is the use of $_REQUEST without escaping your database query.