I'm running a LAMP web server.
I'd like to include my script files on my page with:
<script src="http://domain.com/script.js"></script>
I would like visiting http://domain.com/script.js to display either an error or a blank page.
I've seen other similar questions, of which the answer was "just obfuscate it", or "security by obfuscation is bad".
This isn't for the sake of security. I want to stop bots from pulling my code automatically. I'm OK with human users getting the code. I would simply like this as an alternative to obfuscation.
I've already attempted this using base64_encoded $_GET and $_SESSION parameters. I'm wondering if there's a more elegant solution out there.
Clarification 1:
I am aware that JavaScript is still available to the user. I am perfectly fine with the code being accessible via Firebug, Chrome's developer tools, etc. I simply want the code accessible via my <script> tags, and inaccessible directly. This is not for security, and not to "hide" my code.
Clarification 2:
The reason I need this is that our company recently found a competitor running scripts to scrape data off our site. I would like to prevent the data from being scraped by their scripts and force them to do it manually.
I opted to just pursue the $_SESSION/$_GET/$_POST-gated script I had started before visiting StackOverflow.
The solution's not perfect, but it suits my needs, in that the scripts are accessible via my <script> tags but inaccessible directly. This is a simplified version of what I am doing:
File 1 is the PHP file generating the HTML page the user sees. It creates a random value and stores it in the session. The script file, File 2, is included using this random value as a GET parameter.
File 1:
<?php
session_start();
// Random, unguessable value for this page view
$gate = [
    'first_gate'  => bin2hex(random_bytes(16)),
    'second_gate' => null,
];
$_SESSION['gate'] = json_encode($gate);
?>
<html>
...
<!--this is just the HTML page including the script-->
<script src="file_2.php?gate=<?=urlencode(base64_encode(json_encode($gate)))?>"></script>
...
</html>
File 2 is the PHP file functioning as a gate for the actual JavaScript code. It verifies that the randomized session variable is equal to the GET parameter, then grabs the code from File 3 using a POST request.
File 2:
<?php
session_start();
$session_gate = json_decode($_SESSION['gate'] ?? 'null');
$get_gate = json_decode(base64_decode($_GET['gate'] ?? ''));
//Exit unless the session holds an unused first gate and the GET value matches it
if(empty($session_gate->first_gate) || !is_string($get_gate->first_gate ?? null)
    || !hash_equals($session_gate->first_gate, $get_gate->first_gate)) exit;
//Set first gate to null to prevent re-visit
$session_gate->first_gate = null;
$session_gate->second_gate = bin2hex(random_bytes(16));
$_SESSION['gate'] = json_encode($session_gate);
header('Content-Type: application/javascript');
?>
//This is visible via "view source" (then clicking on the script's URL)
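//Grab the actual JS file, hidden behind a POST "wall".
//dataType "script" tells jQuery to execute the response rather than just fetch it.
$.post("file_3.php", { gate: '<?=base64_encode($_SESSION['gate'])?>' }, null, "script");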
File 3 is inaccessible when directly viewing the page, as it exits without the POST data from File 2. Bots will still be able to ping it with a POST request, so some additional safety measures should be added here.
File 3:
<?php
session_start();
//Exit unless the expected POST field is present
if(!is_string($_POST['gate'] ?? null)) exit; //or print an error message
$session_gate = json_decode($_SESSION['gate'] ?? 'null');
$post_gate = json_decode(base64_decode($_POST['gate']));
//Exit unless the session holds an unused second gate and the POST value matches it
if(empty($session_gate->second_gate) || !is_string($post_gate->second_gate ?? null)
    || !hash_equals($session_gate->second_gate, $post_gate->second_gate)) exit;
//Set both gates to null to prevent re-visit
$session_gate->first_gate = null;
$session_gate->second_gate = null;
$_SESSION['gate'] = json_encode($session_gate);
//Additional safety measures (such as IP address/HOST check) here, if desired
header('Content-Type: application/javascript');
?>
//JavaScript code here
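File 2 and File 3 each perform the same gate comparison, so it could be moved into a single helper. This is a sketch and not part of the solution above. The names `check_and_clear_gate()` and `new_gate_value()` are hypothetical, and the sketch assumes the gate is stored in the session as a plain array rather than as a JSON string. `$session` is passed by reference in place of `$_SESSION` so the logic can be tried from the command line. The helper refuses an empty slot. Once a gate has been used and set to `null`, sending `null` back can't match it.

```php
<?php
// Hypothetical helpers for the one-time gates used by File 2 and File 3.
// $session stands in for $_SESSION and is assumed to hold the gate as a plain
// array, e.g. ['gate' => ['first_gate' => '...', 'second_gate' => null]].

// Generate a gate value: 32 hex characters from a cryptographically secure source.
function new_gate_value(): string
{
    return bin2hex(random_bytes(16));
}

// Return true (and clear the slot) only if $presented matches the stored,
// still-unused gate value under $key ('first_gate' or 'second_gate').
function check_and_clear_gate(array &$session, string $key, ?string $presented): bool
{
    $expected = $session['gate'][$key] ?? null;

    // An empty slot means the gate was never issued or has already been used.
    if (!is_string($expected) || $presented === null) {
        return false;
    }

    // Constant-time comparison of the stored and presented values.
    if (!hash_equals($expected, $presented)) {
        return false;
    }

    // Clear the slot so the same value cannot be replayed.
    $session['gate'][$key] = null;
    return true;
}
```

In File 2, this might be called as `check_and_clear_gate($_SESSION, 'first_gate', $presentedValue)`. On success, `new_gate_value()` would then be stored in the `second_gate` slot.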
I'm asking, quite specifically, how to make the file accessible via my <script> tag and inaccessible directly.
Two options come to mind:
As @MichaelBerkowski pointed out, this is very similar to the common requirement of not allowing hotlinking of images, and the same sorts of solutions apply, with the same caveats and pitfalls. Basically, it's either of the following or both in combination:
1. Checking REFERER (sic) headers on requests for your JavaScript files and denying those requests if REFERER doesn't refer to one of your pages.
2. Remembering the IP addresses of machines that request your HTML pages for a brief time (say, up to a minute), and only allowing those IP addresses to download the JavaScript files, denying attempts from all other IP addresses.
The first is trivial, but also trivially bypassed. The second is a lot less trivial, but also readily bypassed (by simply issuing a request for the HTML and then disregarding the result), but does at least require that the request be made.
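Each check takes only a few lines of PHP. This is a minimal illustration and not a hardened implementation. The function names are hypothetical. `$allowedHost` is assumed to be your own host name. `$store` is a plain array standing in for storage that survives between requests, such as APCu, memcached, or a database table.

```php
<?php
// Sketch of the two hotlink-style checks. Hypothetical names; $store stands in
// for shared storage that would have to persist between requests in practice.

// Check 1: the Referer header must point at one of your pages.
function referer_allowed(?string $referer, string $allowedHost): bool
{
    if ($referer === null || $referer === '') {
        return false; // bots (and some privacy tools) may send no Referer at all
    }
    $host = parse_url($referer, PHP_URL_HOST);
    return is_string($host) && strcasecmp($host, $allowedHost) === 0;
}

// Check 2a: when an HTML page is served, remember the client IP and the time.
function remember_ip(array &$store, string $ip, int $now): void
{
    $store[$ip] = $now;
}

// Check 2b: when a script is requested, allow it only if the same IP
// fetched a page within the last $ttl seconds.
function ip_recently_seen(array $store, string $ip, int $now, int $ttl = 60): bool
{
    return isset($store[$ip]) && ($now - $store[$ip]) <= $ttl;
}
```

A script-serving PHP file would pass in `$_SERVER['HTTP_REFERER'] ?? null`, `$_SERVER['REMOTE_ADDR']` and `time()`, and send a 403 when a check fails. The `Referer` is supplied by the client, so it is easy to fake. The IP check only forces a bot to fetch the HTML first.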
An alternative to doing that is to use an Apache module to minify the script and inject it into your HTML file at the point where you have your <script src="myfile.js"></script> tag, resulting in a <script>codehere</script> tag instead. Then there's no JavaScript file to request. This has the downside that the same JavaScript on multiple pages doesn't benefit from caching, but it has the upsides of A) not requiring a separate HTTP request, and B) making it impossible for people to download your JavaScript files (as you simply wouldn't host them anywhere externally visible).
Neither of the above means people can't get access to your code, because fundamentally that's impossible (the best you can do is obfuscate, and de-obfuscators are pretty good; fundamentally if the browser can run your script, anyone can see it), but it's clear from the comments on the question that you understand that.
With your clarification #2 in mind, you might consider using PHP sessions.
You could first have the user hit a page that requires a captcha to proceed. Once the captcha is submitted and verified, a PHP session is started (or updated) with a boolean $isHuman that shows you are indeed dealing with a human.
Requests for scripts are directed to a PHP page that serves the script only if a session exists and $isHuman is true.
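Here is a minimal sketch of that flow. The captcha verification itself is not shown: `$captchaPassed` stands for whatever result your captcha provider's check returns. The function names are hypothetical, and `$session` stands in for `$_SESSION` so the logic can be tested outside a web request. The script path is assumed to live outside the web root.

```php
<?php
// Sketch of a captcha-gated script endpoint. Hypothetical names throughout;
// $session stands in for $_SESSION.

// Call after the captcha form is submitted. $captchaPassed is the result of
// your captcha provider's verification, which is not shown here.
function mark_human(array &$session, bool $captchaPassed): void
{
    if ($captchaPassed) {
        $session['isHuman'] = true;
    }
}

// Call from the PHP page that serves scripts. Returns the JavaScript to send,
// or null if this session has not passed the captcha or the file can't be read.
function script_for(array $session, string $scriptPath): ?string
{
    if (($session['isHuman'] ?? false) !== true) {
        return null;
    }
    $code = file_get_contents($scriptPath);
    return $code === false ? null : $code;
}

// Example web entry point (e.g. script.php), left commented out here:
// session_start();
// $js = script_for($_SESSION, __DIR__ . '/../js/main.js');
// if ($js === null) { http_response_code(403); exit; }
// header('Content-Type: application/javascript');
// echo $js;
```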
As several people have tried to explain in the comments, this isn't really possible because the server has no way to know whether a JS file is being requested as part of the HTML page or on its own.
The closest you are going to get to achieving this is by creating a random string and appending it to your script when the HTML is generated and checking for that string when the JS is called.
This is how CAPTCHAs work, BTW.
In your HTML:
<?php
session_start(); //start session
// Function to generate random str, borrowed from here: http://stackoverflow.com/questions/4356289/php-random-string-generator
function randStr() {
$characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$randomString = '';
for ($i = 0; $i < 10; $i++) $randomString .= $characters[rand(0, strlen($characters) - 1)];
return $randomString;
}
// set random str to session variable
$_SESSION['JS_STR'] = randStr();
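// append the random string to the JS file's URL; the file will need a .php extension
// (a <script> element cannot be self-closing in HTML, so close it explicitly)
echo "<script src='myjavascriptfile.php?str=" . $_SESSION['JS_STR'] . "'></script>";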
?>
In your JS:
You will have to rename your .js file to .php.
<?php
session_start();
//make sure the session variable matches the appended string
if(!isset($_SESSION['JS_STR']) || !isset($_GET['str']) || $_GET['str'] !== $_SESSION['JS_STR']) die("you don't have permission to view this");
//tell the browser you're serving some JS
header('Content-Type: application/javascript');
?>
window.alert("your JS goes here...");
Following your answer, this is a simplified version of your solution:
<?php
//file1
session_start();
$token = uniqid();
$_SESSION['token'] = $token;
?>
<!--page html here-->
<script src="/js.php?t=<?php echo $token;?>"></script>
<?php
//js.php
session_start();
header('Content-Type: application/javascript');
$token = isset($_GET['t'])? $_GET['t'] : null;
if(!isset($_SESSION['token']) || $_SESSION['token'] != $token){
//let's mess with them and inject some random JS, in this case a random chunk of compressed jQuery
die('n=function(a,b){return new n.fn.init(a,b)},o=/^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g,p=/^-ms-/,q=/-([\da-z])/gi,r=function(a,b){return b.toUpperCase()};n.fn=n.prototype={jquery:m,constructor:n,selector:"",length:0,toArray:function(){return d.call(this)},get:function(a){return null!=a?0>a?this[a+this.length]:this[a]:d.call(this)},pushStack:function(a){var b=n.merge(this.constructor(),a);return b.prevObject=this,b.context=this.context,b},each:function(a,b){return n.each(this,a,b)},map:function(a){return this.pushStack(n.map(this,function(b,c){return a.call(b,c,b)}))},slice:function(){return this.pushStack(d.apply(this,arguments))},first:function(){return this.eq(0)},last:function(){return this.eq(-1)},eq:function(a){var b=this.length,c=+a+(0>a?b:0);return this.pushStack(c>=0&&b>c?[this[c]]:[])},end:function(){return this.prevObject||this.constructor(null)}');
}
//regenerate token, this invalidates current token
$token = uniqid();
$_SESSION['token'] = $token;
?>
$.getScript('js2.php?t=<?php echo $token;?>');
<?php
//js2.php
//much the same as before
session_start();
header('Content-Type: application/javascript');
$token = isset($_GET['t'])? $_GET['t'] : null;
if(!isset($_SESSION['token']) || $_SESSION['token'] != $token){
//let's mess with them and inject some random JS, in this case a random chunk of compressed jQuery
die('n=function(a,b){return new n.fn.init(a,b)},o=/^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g,p=/^-ms-/,q=/-([\da-z])/gi,r=function(a,b){return b.toUpperCase()};n.fn=n.prototype={jquery:m,constructor:n,selector:"",length:0,toArray:function(){return d.call(this)},get:function(a){return null!=a?0>a?this[a+this.length]:this[a]:d.call(this)},pushStack:function(a){var b=n.merge(this.constructor(),a);return b.prevObject=this,b.context=this.context,b},each:function(a,b){return n.each(this,a,b)},map:function(a){return this.pushStack(n.map(this,function(b,c){return a.call(b,c,b)}))},slice:function(){return this.pushStack(d.apply(this,arguments))},first:function(){return this.eq(0)},last:function(){return this.eq(-1)},eq:function(a){var b=this.length,c=+a+(0>a?b:0);return this.pushStack(c>=0&&b>c?[this[c]]:[])},end:function(){return this.prevObject||this.constructor(null)}');
}
unset($_SESSION['token']);
//get the actual JS file from a folder outside the web root, so it is never directly accessible, even if the filename is known
readfile(__DIR__ . '/../js/main.js');
Note the main changes are:
Simplifying the token system. As the token is in the page source, all it needs to do is be unique; attempts to make it "more secure" with encoding, salts, etc. do nothing.
The actual JS file is saved outside the web root, so it's not possible to access it directly even if you know the filename.
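The token hand-off in `js.php` and `js2.php` reduces to two small functions. This makes it easy to check that a token copied out of the page source stops working once the next stage has run. The names `issue_token()` and `token_matches()` are hypothetical, and `$session` stands in for `$_SESSION`. `random_bytes()` is used here in place of `uniqid()`, because `uniqid()` is time-based and the PHP manual notes it does not guarantee uniqueness.

```php
<?php
// Sketch of the rotating-token hand-off. Hypothetical names; $session stands
// in for $_SESSION.

// Issue a fresh token, replacing (and so invalidating) any previous one.
function issue_token(array &$session): string
{
    $token = bin2hex(random_bytes(8));
    $session['token'] = $token;
    return $token;
}

// True only if a token is stored and the presented value matches it.
function token_matches(array $session, ?string $presented): bool
{
    $expected = $session['token'] ?? null;
    return is_string($expected) && $presented !== null
        && hash_equals($expected, $presented);
}
```

In the answer's flow:

- file1 calls `issue_token()`.
- `js.php` checks `token_matches()` and calls `issue_token()` again for `js2.php`.
- `js2.php` checks the token and then runs `unset($_SESSION['token'])`.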
Please note that I still stand by my comment about IP-banning bots. This solution will make scraping a lot harder, but not impossible, and could have unforeseen consequences for genuine visitors.