Log files vs. database: where should user activity data be saved for analysis?

I am currently working on a website which has login functionality. I need to track user activities such as login/logout times, total browsing duration, IP address, location, etc. All of this data will be used for analysis and security purposes.

Now, there are two options (at least that I know of) for saving such a large amount of data: either in a database or in log files.

What is the right thing to do: save it in the DB or in logs?

In case anyone wants to know, I am using PHP as the programming language and MySQL as the DB, and I don't have any experience in data analysis.

Better to go with the DB, because if you want to analyze or sort login attempts by IP, location, etc., you can easily do that with MySQL queries, whereas with a log file you would need an editor, and searching for something would be really hard. I personally log the same functionality in my app; here is some code showing how to get the browser info and IP.

<?php

// Log one login attempt. getBrowser() is a custom user-agent parser
// (not shown) and DB::insert() is the application's DB wrapper.
function log_login_activity($loginEmail, $loginAuthType = '', $loginAttemptStatus = '', $error = '', $loginRedirect = '', $HeaderInfo = '')
{
    $loginTime = time();
    $browserInfo = getBrowser();
    $browser = $browserInfo['name'] . ' ' . $browserInfo['version'];
    // X-Forwarded-For is client-supplied and spoofable; trust it only behind your own proxy.
    $loginIP = isset($_SERVER['HTTP_X_FORWARDED_FOR']) ? $_SERVER['HTTP_X_FORWARDED_FOR'] : $_SERVER['REMOTE_ADDR'];
    // HTTPS either via the HTTPS server variable or port 443.
    $protocol = ((!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') || $_SERVER['SERVER_PORT'] == 443) ? 'HTTPS' : 'HTTP';
    $browserAgent = $browserInfo['userAgent'];
    DB::insert('?:login_logs', array(
        'email' => $loginEmail, 'time' => $loginTime, 'browserInfo' => $browser,
        'loginAuthType' => $loginAuthType, 'IP' => $loginIP, 'error' => $error,
        'protocol' => $protocol, 'loginRedirect' => $loginRedirect, 'browser' => $browserAgent,
    ));
}
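Since the X-Forwarded-For header is spoofable, the IP lookup above can be hardened a little. This is a minimal sketch (the function name `resolve_client_ip` is hypothetical, not from the code above): it validates the header's left-most entry with `filter_var()` and falls back to `REMOTE_ADDR` when it is not a real IP.

```php
<?php

// Hypothetical helper: resolve the client IP more defensively.
// Takes the server array as a parameter so it can be tested.
function resolve_client_ip(array $server): string
{
    $forwarded = $server['HTTP_X_FORWARDED_FOR'] ?? '';
    // The header may hold a comma-separated chain; the left-most entry
    // is the original client (only meaningful behind a trusted proxy).
    $candidate = trim(explode(',', $forwarded)[0]);
    if (filter_var($candidate, FILTER_VALIDATE_IP) !== false) {
        return $candidate;
    }
    return $server['REMOTE_ADDR'] ?? '0.0.0.0';
}
```

In `log_login_activity()` above, `$loginIP` could then be set with `resolve_client_ip($_SERVER)`.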

I think DB is the right choice here. It's far more powerful & flexible. Otherwise, you'll just end up with (multiple?) large & meaningless files.

That really depends on two things:
1. The volume of user actions.
2. How the data will be used.
For instance, if there are 500,000 new records daily and all you want is some aggregate analysis, then you can save the log data to HDFS and do the analytics using Apache Hive or Apache Spark.
If the data volume is still huge but, besides analytics, you also want the ability to retrieve individual action records by user and timestamp, then you need to save the data in a key-value database (like Apache Cassandra) first and then perform the analytics using Apache Spark. You can read more about Cassandra and Big Data scenarios here (disclaimer: I work at this company).
If there are 2,000 records daily, just put them in any relational database and do the analysis right there; that would be the best solution.
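To illustrate the relational-database case: once login rows are in a table, "sort login attempts by IP" is a single GROUP BY query. This sketch is hypothetical (function name, table, and column names are illustrative, not from the question) and uses an in-memory SQLite database via PDO so it is self-contained; with MySQL only the PDO DSN would change.

```php
<?php

// Hypothetical: return the IPs with the most login attempts.
// Works against any PDO connection whose login_logs table has an `ip` column.
function top_login_ips(PDO $db, int $limit = 5): array
{
    $stmt = $db->prepare(
        'SELECT ip, COUNT(*) AS attempts
           FROM login_logs
          GROUP BY ip
          ORDER BY attempts DESC
          LIMIT :limit'
    );
    $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

The equivalent question against raw log files would mean grepping and counting across every rotated file.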

It's worth stepping back here and analysing the requirements.

Typically, business users need to understand the business-focused behaviour of the site. How many people logged in yesterday? How much time did they spend on the site? Did they buy something?

The common way to meet this requirement is by configuring an analytics package (e.g. Google Analytics). Analytics packages are good at understanding behaviour on the web site, and can be configured easily to change the reporting and analysis structures. However, they're usually not very good at reporting on individual actions, and their reporting is based on "web behaviour" - you have to translate "clicked on add to cart button" to the business concept of "bought something".

Customer support users, and application logic, need to understand the specific behaviour of individuals. When customer support gets a call saying "Help, I can't log in", they probably want to know when the last time this user logged in was. If an application logic module wants to know whether this user is interested in product X, it needs to know whether they looked at related products.

This data is usually stored as relational data in a database, because that makes it easy to query. However, relational models are hard to modify, and non-technical users can't write SQL queries, so this approach is much more rigid.
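The "when did this user last log in?" question from the customer-support scenario is one indexed query when the data is relational. A minimal sketch, with hypothetical names (the function, table, and columns are illustrative); an in-memory SQLite database via PDO keeps it self-contained, but the same SQL works on MySQL.

```php
<?php

// Hypothetical: latest login timestamp for a user, or null if none recorded.
function last_login_time(PDO $db, string $email): ?int
{
    $stmt = $db->prepare('SELECT MAX(time) FROM login_logs WHERE email = :email');
    $stmt->execute(array(':email' => $email));
    $value = $stmt->fetchColumn();
    // MAX() yields NULL when no rows match.
    return ($value === null || $value === false) ? null : (int)$value;
}
```

With an index on `email`, this stays fast however large the table grows; the log-file equivalent means parsing every retained file.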

Technical users need to understand the health of the application, and be able to investigate incidents.

This information is usually stored as log files. Log files are often huge - a moderately busy website will create Apache logs of many gigabytes per day - and can only be queried using dedicated log-parsing tools; these are aimed at technical users, not business people. Log files are often retained for a short period (weeks or months) and rotated once a day. So answering the question "when did user X last log in?" may require parsing a month's worth of log files, and if you delete logs after a month, you may not get the right answer. However, log statements are easy to insert into the code, and changing the logging (e.g. recording only "error" messages, not "debug" messages) is easy.
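The "changing the logging is easy" point can be sketched in a few lines: one function formats a log line and drops anything below a configured minimum level, so switching production from "debug" to "error" is a single threshold change. Everything here (function name, levels, format) is a hypothetical illustration, not a real logging library; in a real app the returned line would go to `error_log()` or a file.

```php
<?php

// Hypothetical log formatter with level filtering. Returns the line to
// write, or null when the message is below the minimum level.
// $minLevel defaults to 1 ("info"), so "debug" messages are suppressed.
function format_log_line(string $level, string $message, int $timestamp, int $minLevel = 1): ?string
{
    $levels = array('debug' => 0, 'info' => 1, 'error' => 2);
    if (($levels[$level] ?? 0) < $minLevel) {
        return null; // suppressed, e.g. "debug" in production
    }
    return sprintf('%s [%s] %s', gmdate('Y-m-d\TH:i:s\Z', $timestamp), strtoupper($level), $message);
}
```

Raising `$minLevel` to 2 would keep only "error" lines, which is exactly the kind of one-line logging change described above.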

So, for "analysis" (I'm assuming that's by business users) - insert into a database or use web analytics. For "security purposes" (I'm assuming that's for incident analysis by technical users) - log files.