How to read a .vcf file from the browser?

I am trying to collect the email addresses of all exhibitors at IFA Berlin. The exhibitor list itself is pretty easy to crawl.

The tricky part is that the site only lets us download a .vcf file or send an email (through their server, I guess). I would like to find the email address without downloading the .vcf file. Failing that, I could download it and read it easily with PHP (my crawler is also written in PHP).

This is also my first question here after lurking for years! Nice meeting you guys.

How to read .vcf file from browser?

This file will always be served as a download and never displayed in the browser. One way to work around that is to set up a custom browser extension that temporarily stores the file, parses the vCard format, and displays the information.

PHP scraping approach

There are vCard parsers out there, for example https://github.com/nuovo/vCard-parser, but I think a regular expression would be enough here: /EMAIL;INTERNET:(.*)/.
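vCard files do not all write the email line the same way: some use EMAIL;INTERNET:, others EMAIL;TYPE=INTERNET: or a bare EMAIL:, and lines normally end in CRLF. As a rough sketch of a more tolerant pattern (the function name extractEmails and the sample card are made up for illustration, and I have not checked which variant the IFA site actually emits):

```php
<?php

// Hypothetical helper, not part of any library: returns every address found
// on an EMAIL line, whatever parameters follow the property name.
function extractEmails(string $vcard): array
{
    // ^EMAIL            property name at the start of a line (m flag), any case (i flag)
    // (?:;[^:\r\n]*)?   optional parameters such as ;INTERNET or ;TYPE=INTERNET,WORK
    // :(.+)$            the value after the colon, up to the end of the line
    preg_match_all('/^EMAIL(?:;[^:\r\n]*)?:(.+)$/mi', $vcard, $matches);

    // Lines end in CRLF, so strip the trailing "\r" that (.+) may capture.
    return array_map('trim', $matches[1]);
}

// Made-up sample data, for illustration only.
$sample = "BEGIN:VCARD\r\nVERSION:3.0\r\nN:Doe;Jane;;;\r\nFN:Jane Doe\r\n"
        . "EMAIL;TYPE=INTERNET,WORK:jane@example.com\r\n"
        . "EMAIL;INTERNET:j.doe@example.org\r\n"
        . "END:VCARD\r\n";

print_r(extractEmails($sample));
```

This still ignores folded lines, grouped properties such as item1.EMAIL, and quoted parameter values, so a real parser is the safer choice if the cards turn out to be messy.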

Suppose your first scraping run gives you a list of attendee IDs. A second (vCard) scraping run could then fetch each card by ID and extract the name and email:

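<?php

// Download the raw vCard text for one attendee; returns false if the request fails.
function getVcard($id)
{
    return file_get_contents('http://www.virtualmarket.ifa-berlin.de/?Action=attendeeVcard&id=' . $id);
}

// Return the address from the first EMAIL;INTERNET line, or null if there is none.
function getEmailFromVcard($vcard)
{
    // Anchor to the line start (m flag); trim() drops the "\r" of a CRLF line ending.
    if (preg_match('/^EMAIL;INTERNET:(.*)$/m', $vcard, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

// Return "Given Family" built from the N: property, or null if it is missing.
function getNameFromVcard($vcard)
{
    // N: holds Family;Given;Additional;Prefix;Suffix. Anchoring avoids matching the FN: line.
    if (preg_match('/^N:(.*)$/m', $vcard, $matches)) {
        $parts = explode(';', trim($matches[1]));
        $given = isset($parts[1]) ? trim($parts[1]) : '';
        $family = trim($parts[0]);
        return trim($given . ' ' . $family);
    }
    return null;
}

$id = 1775586;

$vcard = getVcard($id);
if ($vcard === false) {
    exit("Could not download the vCard for ID $id\n");
}

$email = getEmailFromVcard($vcard);
$name = getNameFromVcard($vcard);

echo $name . ' ' . $email;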