I'm trying to create a version-control-style backup/log for a webpage: whenever the page (including its JS and CSS) is altered, a static copy is saved to disk.
How do I get the CSS and JavaScript of a webpage? Getting the HTML is easy: I just connect to the page, read the contents, and return them. But how do I get the page's CSS and JavaScript as well?
The system doesn't have direct access to the web server(s), so I have to do everything remotely over the network.
My idea is to search the scraped HTML for `.css` and `.js`, take everything up to the first quote (`"`), and then fetch the CSS/JavaScript file directly as its own page. But I suspect this might not be very reliable.
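Roughly what I had in mind so far (Python here just for illustration; the URL and the regex are placeholders for the string-search idea described above):

```python
import re
from urllib.request import urlopen

# Fetch the raw HTML of the page (the part that already works).
html = urlopen("https://example.com/page").read().decode("utf-8", errors="replace")

# Naive idea: pull out every quoted value ending in .css or .js
# and fetch each one as its own "page" afterwards.
asset_urls = re.findall(r'["\']([^"\']+\.(?:css|js))["\']', html)
print(asset_urls)
```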
Not sure why this is marked as too broad; I'm asking how to get the CSS and JavaScript of a webpage. I've reworded my question, hopefully it's better now.
Instead of searching for `.js` and `.css`, I'd look for `<script>` and `<link>` tags and use their `src` and `href` attributes, respectively, to perform additional network requests and retrieve those files for comparison.

This is more reliable because you won't have to worry about the page's content merely containing the strings `js` or `css`, and you could also use an XML parser so that things like single vs. double quotes aren't an issue.