Every so often, when you want to archive a webpage, you notice it’s full of dynamic content and JavaScript that won’t easily be archived. I was recently looking to archive a Matterport 3D image. This is a typical website that won’t save easily using normal web archivers, as it relies on JavaScript to dynamically fetch images as you move through the 3D space.
One generic solution to capture something like this is to use a proxy in the web browser and save everything that passes through it. But most proxies only cache things for a limited time and respect headers like no-cache1. If the proxy instead ignored that and stored all requests that flow through it indefinitely, you could create a “snapshot” of a website simply by browsing it through this archiving proxy.
Turns out I am not the first one to come up with this idea; there are at least two tools out there that do this. The first one I tried was Proxy Offline Browser, a Java GUI application. It worked quite well, but the free version does not do TLS/HTTPS. The Pro version is only 30 euro, but I was curious to see if there was an open-source solution that could do the same.
Turns out there is: it’s called WWWOFFLE and it has a lovely compatible webpage. After some trying, I got it working, and I’ll describe the rough outline of how to get it working here. Note though, if you value your time or don’t feel like fiddling around in the terminal, I do recommend just paying the 30 euro for Proxy Offline Browser and being done with it.
First you need to download the wwwoffle source code and ensure you have the GnuTLS headers and libraries, so you can use it for HTTPS.
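For reference, on macOS with Homebrew (which the /usr/local/Cellar prefix in the configure call below assumes), getting the dependency and unpacking the source might look roughly like this; the exact tarball name depends on where and which version you downloaded:
# GnuTLS headers and libraries, needed for the HTTPS support
brew install gnutls
# unpack the wwwoffle source (2.9j in my case) and enter the directory
tar xzf wwwoffle-2.9j.tgz
cd wwwoffle-2.9j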
Then compile it with
./configure --prefix=/usr/local/Cellar/wwwoffle/2.9j/ --with-gnutls=/usr/local --with-spooldir=/usr/local/var/run/wwwoffle --with-confdir=/usr/local/etc/
make
make install
Then run it
wwwoffled -c /usr/local/etc/wwwoffle.conf -d
Now there are a few more steps before you can start archiving.
First, reconfigure your browser2 to use wwwoffle as a proxy. Then visit http://localhost:8080 in the browser to get to the wwwoffle page. Using this page, you can control wwwoffle and see what it has cached.
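Before pointing the whole browser at it, you can sanity-check that the proxy answers by sending a request through it with curl (example.com is just a placeholder):
# fetch a page through the wwwoffle proxy and watch the request/response headers
curl -v -x http://localhost:8080 http://example.com/ -o /dev/null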
First, you will need to get the CA certificate, so you won’t get SSL warnings all the time. Go to http://localhost:8080/certificates/root, download and install it.
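On macOS you can also trust it system-wide from the terminal instead of clicking through Keychain Access; this assumes you saved the certificate as wwwoffle-root.pem (the filename is mine):
# add the wwwoffle root CA to the system keychain and mark it as trusted
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain wwwoffle-root.pem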
Then you need to put wwwoffled into online mode, which you can do at http://localhost:8080/control/
Then configure wwwoffled itself, which you can do using the built-in web-based configuration tool.
The settings to change are http://localhost:8080/configuration/SSLOptions/enable-caching to yes, and http://localhost:8080/configuration/SSLOptions/allow-cache to allow-cache = *:443.
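If you prefer editing the config file directly instead, those two web-config paths correspond roughly to this section of wwwoffle.conf (my reconstruction, using the same block syntax as the CensorHeader example further down; check the shipped config for the exact option names):
SSLOptions
{
 enable-caching = yes
 allow-cache = *:443
}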
That should hopefully be enough. Now try browsing some website. Then go to the control page and put wwwoffled into offline mode. Hopefully, you should still be able to browse the same page, using the cache.
Additionally, I had to add
CensorHeader
{
Access-Control-Allow-Origin = *
}
to http://localhost:8080/configuration/CensorHeader/no-name to ensure AJAX3 requests worked in some cases.
If you run into other issues, you can either start debugging or go back and cough up the money :-)
Have you ever wanted to deploy a website to test that it works, without everyone else being able to see it?
If you are using a dynamic language or CMS for your webpage (PHP, WordPress, or Ruby on Rails), there are straightforward ways to accomplish this.
But what if you have a static webpage? Here I will present one solution that accomplishes this using only an nginx config file.
# first we need to allow access to the soon.html
# and also a logo which is linked from the soon.html
# if your soon.html links more resources in this server
# you need to update the regex to match that also
location ~ /(soon\.html|images/logo_white.png) {
    try_files $uri =404;
}

# this is the secret way to get past the block
# it will set a magic cookie with a lifetime of 1 month
# and redirect back to the host
location /iwanttobelieve {
    add_header Set-Cookie "iwantto=believe;Domain=$host;Path=/;Max-Age=2629746";
    return 302 $scheme://$host;
}

# this is the normal serve, but with a condition that
# everyone that does NOT have the magic cookie set will be served
# the content of soon.html
location / {
    if ($http_cookie !~* "iwantto=believe") { rewrite ^ /soon.html last; }
    try_files $uri $uri/ =404;
}
That’s it! Copy and paste the above into a server {} block. Make sure to take note of the order, though, so that you don’t have anything else before this that would take precedence. Then change all occurrences of soon.html if you use something else. And remember that the first location match needs to cover everything that soon.html references; otherwise those requests will just get back the content of /soon.html, like all other requests.
Note that if is a bit finicky in nginx; check their documentation for more details.
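If you want to verify the gate from the command line, a curl session along these lines should show the behaviour (example.com stands in for your host):
# without the cookie, any path serves the contents of soon.html
curl -s https://example.com/ | head
# the magic path sets the cookie and redirects back to the host
curl -si https://example.com/iwanttobelieve | grep -iE 'set-cookie|location'
# with the cookie presented, the real content is served
curl -s -H 'Cookie: iwantto=believe' https://example.com/ | head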
The other day I wanted to use my noscript.it with one of my old iPhones, a 4S running iOS 6, but I was met with “could not establish a secure connection to the server”.
Turns out it was because I had, out of habit, configured the server with a “modern” list of TLS ciphers. And the poor old iOS 6 didn’t support any of them.
So, I went on a mission to ensure noscript.it works with as old devices as possible.
It turns out enabling TLS1 and TLS1.1 on Ubuntu 20.04 is a bit harder than I expected1. Luckily someone else solved it already.
So now, after using the old Mozilla SSL config and appending @SECLEVEL=1 to the cipher string, it works. Even on my vintage iPhone 3G. Hurray!
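For the record, the relevant nginx directives end up looking roughly like this; the cipher string here is shortened for illustration (the real one comes from the Mozilla generator’s “old” profile), the important part being the trailing @SECLEVEL=1, which lets OpenSSL on Ubuntu 20.04 negotiate the legacy protocols and ciphers again:
ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3;
# illustrative cipher list; append @SECLEVEL=1 to whatever list you actually use
ssl_ciphers 'HIGH:MEDIUM:@SECLEVEL=1';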
But, I hear you say, isn’t this less secure? I mean now you only get a B on the Qualys SSL Report! Clearly this is bad!?
Let’s take a step back and think about what the score actually means. noscript.it automatically gets a B because it supports TLS1. But let’s go one step further and assume we’re looking at a bank with a C2. A site gets a C if it supports SSLv3, meaning it is vulnerable to the SSLv3 POODLE3 attack. This is clearly bad for a bank!? Or is it? How likely is it that someone will successfully execute this attack, which requires the attacker to be able to intercept and modify the transmitted data? And compare that with the likelihood that someone will need to access the bank’s website from an old XP (pre-SP3) machine that only supports SSLv3. The second seems more likely to me.4
Okay, you say, but won’t keeping SSLv3 around make everyone vulnerable because of downgrade attacks? If that were the case, the risk calculation would be different. But luckily, we have TLS_FALLBACK_SCSV to avoid that. TLS_FALLBACK_SCSV ensures that a modern client or browser can’t be fooled into downgrading its encryption.
So to wrap things up, don’t stare blindly at the rating or certification. A site with an A++ is more secure than one with a C rating. But if you (or someone less fortunate) can’t access the site when they need it, it will be a pretty useless site. Personally, from now on, unless the site needs5 absolute security, all my projects will optimise for compatibility rather than getting an A++. After all, it is much more likely that someone will try to use it from a Windows XP machine or an old smart TV than that someone is MITM-ing that person at that very moment.
Please note though: don’t read this as an argument against doing things securely by default and following best practices. Rather, it is just some thoughts on this specific issue of TLS and SSL configuration. If you break with best practice, make sure you understand why it’s best practice to begin with and what risks or weaknesses you introduce by not following it.
There are lots of static page generators; I’ve personally used Hugo, and there are like 100 others. But I had a project where I wanted something even simpler, and I had a few requirements. I wanted to
For 1, you don’t need anything other than an editor. 2 is where you need something more than HTML.
I recently came across a project that promised to do more or less exactly what I wanted, xm. But it was written in Node/JavaScript, so I went to look for something else.1
After not finding anything similar, I decided to do it myself in the 4th most disliked programming language, PHP.
PHP is ubiquitous on Linux servers, and it’s great at generating HTML. The downside of using it as a static page generator is… that it’s not static.
Each time you request a .php page, PHP will compile and interpret the code and return the output.
The first and obvious solution is to just store the output as HTML, and you’ve turned it into a static page generator. Like so:
php page.php > page.html
This might get tedious though, and although you could just set up a build system to do it, I got curious whether it would be possible to do it “on-demand”.
And as a challenge to myself, I wanted to see if I could make it small enough to fit in a tweet2, with no dependencies other than PHP.
And without further ado, I present to you: PHP keep It Stupid Simple, in short PISS.
<?php
ob_start(
    function($output) {
        $t = substr(__FILE__, 0, -4) . '.html';
        ($f = fopen($t, 'w')) || header("HTTP/1.1 500") && exit(1);
        fwrite($f, $output);
        header("Location: " . substr($_SERVER['REQUEST_URI'], 0, -4) . ".html");
    }
);
?>
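The snippet is meant to be pasted at the top of each .php page (it relies on __FILE__ pointing at the page itself). The first request to, say, about.php (a hypothetical page) runs PHP, writes about.html next to it, and redirects there; after that the web server can hand out the static file directly:
# first request: PHP runs, the .html file is written, and you get redirected to it
curl -iL https://example.com/about.php
# later requests can go straight to the generated static file
curl -s https://example.com/about.html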
Because this is a Real-Serious-Project™ it’s available on GitHub with an issue tracker and all other features that a Real-Serious-Project™ needs.
lol what?
font-family: monospace, monospace
Is not the same as
font-family: monospace
I’m so happy there are other people who figure these things out. https://stackoverflow.com/questions/38781089/font-family-monospace-monospace
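For context, my summary of the linked answer: when an element’s font-family computes to just the generic monospace keyword, browsers substitute their smaller “default monospace” font size; repeating the keyword makes the value no longer “just the default”, so the element keeps the inherited size. The usual workaround looks something like this:
code, pre, kbd, samp {
 /* the doubled keyword avoids the shrunken default monospace size */
 font-family: monospace, monospace;
 font-size: 1em;
}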