Here comes another tip for leveraging one of the most important inventions of the 20th century: the spreadsheet.
Say you have a list of GDPR fines and the countries where they were issued, and you want to know the fine/population ratio for each country.
The easiest and quickest way would be to Google it, copy/paste the first population table you find into a new sheet, and then use the =VLOOKUP function to grab the population for each country.
But I got curious whether there was a more automated way to do this. It turns out there was: ten years ago Google Spreadsheets had more or less exactly this function, =GoogleLookup("entity" ; "attribute"), but it was deprecated in 2011 (probably for being too useful… or, more likely, for being abused somehow).
Luckily, there are still a few ways to import data into Google Sheets programmatically, using =IMPORTXML, =IMPORTDATA, =IMPORTHTML and some third-party solutions like =IMPORTWEB.
Now that we have a way to import data, we need to find a good place to import the data from. Preferably a place which has all kinds of data, so we can reuse what we learn here to programmatically fetch more complex data next time. Wikipedia seems like a good candidate, and it turns out there is a project called Wikidata, which aims to provide the knowledge stored on Wikipedia in a more structured format.
Getting data out of Wikidata is not that straightforward, though: the data is represented as a graph, which you query using a language called SPARQL. They do a much better job of teaching it on their site, and I'd recommend starting with this tutorial if you are interested.
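For the population lookup, the query is quite small: every country (wd:Q6256, via the instance-of property wdt:P31) together with its population (wdt:P1082). Decoding the URL-encoded query from the =IMPORTXML call below gives:
# defaultView:BubbleChart
SELECT DISTINCT ?countryLabel ?population
{
  ?country wdt:P31 wd:Q6256 ;
           wdt:P1082 ?population .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?population ?countryLabel
ORDER BY DESC(?population)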
After you have figured out your SPARQL query, you can import the result directly into Google Sheets by copying the query URL, giving it to =IMPORTXML, and then passing it an XPath expression to extract the values.
The full =IMPORTXML command will look something like this:
=IMPORTXML("https://query.wikidata.org/sparql?query=%23%20defaultView%3ABubbleChart%0ASELECT%20DISTINCT%20%3FcountryLabel%20%3Fpopulation%0A%7B%0A%20%20%3Fcountry%20wdt%3AP31%20wd%3AQ6256%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20wdt%3AP1082%20%3Fpopulation%20.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22%20%7D%0A%7D%0AGROUP%20BY%20%3Fpopulation%20%3FcountryLabel%0AORDER%20BY%20DESC(%3Fpopulation)","/*/*/*[name()='result']","utf8")`
Two important things to note about this. First, the SPARQL result is namespaced. You can see it from the xmlns attribute at the beginning:
<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
This means you would need to select that namespace before you can run queries like /*/*/result, but the =IMPORTXML command does not (as far as I know) allow you to do that. A workaround is to use XPath functions which search all namespaces, like /*/*/*[name()='result']. (When testing your XPath, note that xpather.com does not take the namespace into account, so an expression that works there may still fail in the Google Sheet. Better to use https://extendsclass.com/xpath-tester.html, which does require the correct namespace, or a function which searches all namespaces.)
The second thing to consider is that the SPARQL specification gives no guarantees about the order of the columns in the response. You might get <binding name='countryLabel'> back first or second within each result. This is annoying because =VLOOKUP requires the key to be to the left of the value you are looking up.
A workaround I stumbled upon is to add an ORDER BY DESC(?population) clause, which causes the columns to be ordered as listed in the query.
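Once the query results are imported into the sheet, say with countries in column D and populations in column E, the ratio is one lookup away. A hedged example, assuming your fine data has country names in column A and fine counts in column B:
=B2 / VLOOKUP(A2, D:E, 2, FALSE)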
Every so often when you want to archive a webpage, you notice it's full of dynamic content and JavaScript which won't easily be archived. I was recently looking to archive a Matterport 3D image. This is a typical website that won't save easily using normal web archivers, as it relies on JavaScript to dynamically fetch images as you move through the 3D space.
One generic solution to capture something like this is to use a proxy in the web browser and save everything that passes through it. But most proxies only cache things for a limited time and respect headers like no-cache. If the proxy instead ignored those and stored all requests that flow through it indefinitely, you could create a "snapshot" of a website by browsing it through this archiving proxy.
Turns out I am not the first one to come up with this idea; there are at least two tools out there which do this. The first one I tried was Proxy Offline Browser, a Java GUI application. It worked quite well, but the free version does not do TLS/HTTPS. The Pro version is only 30 euro, but I was curious to see if there was any open-source solution that could do this.
Turns out there is: it's called WWWOFFLE and it has a lovely compatible webpage. After some trying, I got it working, and I'll describe the rough outline of how to get it working here. Note though, if you value your time or don't feel like fiddling around in the terminal, I do recommend just paying the 30 euro for Proxy Offline Browser and being done with it.
First you need to download the wwwoffle source code and ensure you have the GnuTLS headers and libraries, so you can use it for HTTPS.
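The configure flags below assume a Homebrew-style layout on macOS (note the /usr/local/Cellar prefix). If that matches your setup, getting the GnuTLS dependency should be a single command:
brew install gnutls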
Then compile it with
./configure --prefix=/usr/local/Cellar/wwwoffle/2.9j/ --with-gnutls=/usr/local --with-spooldir=/usr/local/var/run/wwwoffle --with-confdir=/usr/local/etc/
make
make install
Then run it
wwwoffled -c /usr/local/etc/wwwoffle.conf -d
Now there are a few more steps before you can start archiving.
First, reconfigure your browser to use wwwoffle as a proxy. Then visit http://localhost:8080 in the browser to get to the wwwoffle page. Using this page, you can control wwwoffle and see what it has cached.
First, you will need to get the CA certificate, so you won’t get SSL warnings all the time. Go to http://localhost:8080/certificates/root, download and install it.
Then you need to put wwwoffled into online mode, which you can do at http://localhost:8080/control/
Then configure wwwoffled itself, which you can do using the built-in web-based configuration tool.
The settings to change are:
http://localhost:8080/configuration/SSLOptions/enable-caching to enable-caching = yes
and
http://localhost:8080/configuration/SSLOptions/allow-cache to allow-cache = *:443
That should hopefully be enough. Now try browsing some website. Then go to the control page and put wwwoffled into offline mode. Hopefully, you should still be able to browse the same page, using the cache.
Additionally, I had to add
CensorHeader
{
Access-Control-Allow-Origin = *
}
to http://localhost:8080/configuration/CensorHeader/no-name, to ensure AJAX requests worked in some cases.
If you run into other issues, you can either start debugging or go back and cough up the money :-)
Yesterday, in a datacenter somewhere in France, there was suddenly an eerie silence as the last remaining racks fell silent for the first time in a long time. As of yesterday, the 1st of September 2021, Scaleway turned off their C1 ARM servers.
I know because I still had one trusty little C1 server until today, a server I had kept since it was brought online 7 years ago. It was never the fastest or the biggest server I've had, but it was my little dedicated server. It never complained, never crashed, never rebooted; it just kept running, serving my homepage and some side projects.
If you are not familiar with the C1, and why it deserves its own little obituary, then let me give you a bit of backstory.
The C1 was introduced around 2015, first as a free 15-minute trial at labs.online.net and then launched as a commercial product under the brand Scaleway (archive.is/scaleway). The C1s were an interesting take on the virtualisation market: instead of cramming as many virtualised hosts as possible onto a powerful machine, they crammed as many tiny custom SoCs as they could into a rack, backed by shared disk storage. A bit like a cloud-hosted Raspberry Pi, but with a network-attached SSD and a public, static IP with good connectivity.
For most people or projects this is no big deal; a virtual server will do just fine, or even be a better choice than dedicated hardware. But there are a few reasons I like small dedicated servers. One is that with a dedicated machine, you can be sure you are always getting the same performance (barring running multiple things on it yourself). Virtual servers might be faster in bursts, but they are generally oversubscribed, and if you are unlucky you might get very varying performance depending on how busy your neighbours are. If a dedicated server is fast now, it will be as fast tomorrow. The C1 was never really fast, though, which I took as a fun challenge: if I could get something working well on this limited machine, it would be extremely fast if I ever needed to scale it up to a top-of-the-line server.
Maybe, in theory at least. Another benefit was the threat model. For a VM, when doing threat modelling, you should always consider the risk of someone else on the host escaping their VM and accessing your VM and files. Back in 2015 there had been a few VM escapes, but the future would bring many more, plus a whole new range of side-channel attacks against shared processors and memory. My little dedicated ARM server never had to worry about Spectre, Meltdown, Rowhammer or any of the other processor bugs which have rattled the whole VPS ecosystem. Being able to just go "oh, that's interesting" at news of a new Spectre-like attack, without having to even consider my little C1 losing performance or needing a reboot, was quite nice.
Another benefit was the 4 GiB of RAM; in 2015 that was unheard of for a €2.99 server (and I think it still is today). And it is "dedicated" RAM, which can't be accessed as easily by the provider, which is important if you care about that. Although I bet that if Scaleway wanted, they could figure out some way to read it out using something like pcileech. [Update 2021-09-06: I was informed by one of the lead designers that it was designed with attacks like these in mind, so at least any physical attack would not have been straightforward.]
Life goes on, except for the C1. I do wonder what will happen to them; maybe they will end up on a flea market somewhere. I am not ready to move my personal things to a VPS just yet, but there are not that many cheap dedicated alternatives out there. The only one I've found at that price point is Kimsufi, but they are mostly out of stock and lower-specced.
In the end, I decided to stay with what I know and move to another of Scaleway's dedicated offers, an Atom C2350, which has served me well for testing and to which I have now migrated everything.
Sometimes, if you are logging in to LuCI multiple times per day, the default 1-hour session time tied to a browser tab/window might be a bit annoying.
To increase the session time to, for example, 24 days (2147483 seconds, roughly the maximum allowed due to a ubus limitation), you need to do:
uci set luci.sauth.sessiontime=2147483
uci commit
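As a quick sanity check, you can read the value back:
uci get luci.sauth.sessiontime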
But it's still set as a session cookie. To fix that, you need to modify /usr/lib/lua/luci/dispatcher.lua and change the line which begins with http.header("Set-Cookie", inserting a Max-Age= attribute to make it a persistent cookie. Like so:
http.header("Set-Cookie", 'sysauth=%s; Max-Age=2629746; path=%s; SameSite=Strict; HttpOnly%s' %{
Then you need to clear the luci-modulecache (or reboot):
rm -rf /tmp/luci-modulecache/
There: if you re-login to LuCI, you should now have a persistent cookie which will persist for one month. To remove it, press logout.
Recently, for no specific reason at all, I did a review of the backup plans for the tiny personal VMs I have.
As my disaster recovery plan was mostly "I hope they don't lose it all at once", I decided to upgrade it to "I have some backups, so I don't lose it all at once".
To keep things simple, and because I love micro-optimising to see how cheaply I can run my personal VMs, I decided to use my home NAS for backups instead of just paying for third-party storage like B2.
So, here is a rough overview of how you can use a local Linux NAS as a destination for backing up a cloud VM.
First we need to get borg working on the Turris (an alternative design would be to have borg running on the NAS and have it reach out to the servers, but here the servers push to the NAS). Luckily, the Turris has LXC, so we can just spin up an Alpine instance and run apk add borgbackup and apk add openssh-server. Then set the network type to none to share the host network, and mount any disk you want:
# first comment out any other network
lxc.net.0.type = none
# bind-mount /mnt/sdb2/dir
lxc.mount.entry = /mnt/sdb2/mydir /mnt/sdb2/lxc/borg/rootfs/mnt/mydir rw,bind 0,0
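For completeness, creating and provisioning the Alpine container might look roughly like this. A hedged sketch assuming the standard LXC download template is available; the release and architecture values are illustrative and depend on your Turris model:
lxc-create -n borg -t download -- --dist alpine --release 3.18 --arch armhf
lxc-start -n borg
lxc-attach -n borg -- apk add borgbackup openssh-server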
I decided to run a separate sshd inside the LXC container for a bit of additional sandboxing. Add the following to authorized_keys to allow the server you want to back up to run borg, but nothing else:
command="borg serve --restrict-to-path /mnt/server-bakups",no-port-forwarding,no-agent-forwarding,no-pty,no-X11-forwarding ssh-rsa AAA...
Time to start backing up. First, because the C1 is an armv7 instance, download ARM binaries from https://borg.bauerj.eu.
Then check that you can connect to your Turris and get some borg output back through the restricted SSH key, similar to below.
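A minimal check could look like this (using the same address and port as the commands below). Because of the forced command, the connection should answer with borg serve protocol output rather than give you a shell:
ssh -p 40022 root@100.127.112.32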
If that works, you can initialise the repository and start backing up according to the borg instructions.
Something like this:
borg init -e repokey ssh://root@100.127.112.32:40022/mnt/server-bakups/personal
and if that works:
borg create ssh://root@100.127.112.32:40022/mnt/server-bakups/personal::{hostname}-{user}-{now} /home /etc /var/log
And if that works, then either call it a day or address the obvious issues like running the receiving borg as root :-)
Have you ever wanted to deploy a website to test that it works, without everyone else being able to see it?
If you are using a dynamic language or CMS for your webpage (PHP, WordPress or Ruby on Rails), there are straightforward ways to accomplish this.
But what if you have a static webpage? Here I will present one solution that accomplishes this using only an nginx config file.
# first we need to allow access to the soon.html
# and also a logo which is linked from the soon.html
# if your soon.html links to more resources on this server
# you need to update the regex to match those as well
location ~ /(soon\.html|images/logo_white.png) {
try_files $uri =404;
}
# this is the secret way to get past the block
# it will set a magic cookie with a lifetime of 1 month
# and redirect back to the host
location /iwanttobelieve {
add_header Set-Cookie "iwantto=believe;Domain=$host;Path=/;Max-Age=2629746";
return 302 $scheme://$host;
}
# this is the normal serve, but with a condition:
# everyone who does NOT have the magic cookie set will be served
# the content of soon.html
location / {
if ($http_cookie !~* "iwantto=believe") {rewrite ^ /soon.html last; }
try_files $uri $uri/ =404;
}
That's it! Copy and paste the above into a server {} block. Make sure to take note of the ordering, though, to ensure you don't have anything else before this which would take precedence. Then change all occurrences of soon.html if you use something else. And remember that the first location match needs to cover everything that soon.html references, otherwise visitors will just get back the content of /soon.html for all other requests.
Note that if is a bit finicky in nginx; check their documentation for more details.
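You can test the gate from the command line. A hedged sketch, with example.com standing in for your own server:
# without the cookie, this should return the content of soon.html
curl -s https://example.com/
# visit the secret URL to collect the magic cookie...
curl -s -c cookies.txt https://example.com/iwanttobelieve
# ...and with the cookie you should get the real page
curl -s -b cookies.txt https://example.com/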
The other day I wanted to use my noscript.it from an old iPhone 4S running iOS 6, but I was met with "could not establish a secure connection to the server".
Turns out it was because I had, out of habit, configured the server with a “modern” list of TLS ciphers. And the poor old iOS 6 didn’t support any of them.
So, I went on a mission to ensure noscript.it works with as old devices as possible.
It turns out enabling TLS 1.0 and TLS 1.1 on Ubuntu 20.04 is a bit harder than I expected. Luckily, someone else had solved it already. So now, after using the old Mozilla SSL config and appending @SECLEVEL=1, it works. Even on my vintage iPhone 3G. Hurray!
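In nginx terms the result looks roughly like this. This is a sketch rather than my exact config: the real cipher list comes from the old Mozilla generator profile, with the OpenSSL @SECLEVEL=1 directive appended to re-enable the legacy algorithms on Ubuntu 20.04:
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers "HIGH:MEDIUM:!aNULL:!MD5:@SECLEVEL=1";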
But, I hear you say, isn’t this less secure? I mean now you only get a B on the Qualys SSL Report! Clearly this is bad!?
Let's take a step back and think about what the score actually means. noscript.it automatically gets a B because it supports TLS 1.0. But let's go one step further and imagine a bank with a C. A site gets a C if it supports SSLv3, meaning it is vulnerable to the SSLv3 POODLE attack. This is clearly bad for a bank!? Or is it? How likely is it that someone will successfully execute this attack, which requires the attacker to be able to intercept and modify the data in transit? And compare that likelihood with the likelihood that someone will need to access the bank website from an old XP (pre-SP3) machine that only supports SSLv3. The second seems more likely to me.
Okay, you say, but won't keeping SSLv3 around make everyone vulnerable because of downgrade attacks? If that were the case, the risk calculation would be different. But luckily we have TLS_FALLBACK_SCSV to avoid that: it ensures that modern clients and browsers won't risk being fooled into downgrading their encryption.
So, to wrap things up: don't stare blindly at the rating or certification. A site with an A++ is more secure than one with a C rating, but if you (or someone less fortunate) can't access the site when they need it, it is a pretty useless site. Personally, from now on, unless a site needs absolute security, all my projects will optimise for compatibility rather than getting an A++. After all, it is much more likely that someone will try using it from a Windows XP machine or an old smart TV than that someone is MITM-ing that person at that exact moment.
Please note, though: don't read this as an argument against doing things securely by default and following best practices. Rather, these are just some thoughts on the specific issue of TLS and SSL configurations. If you break with best practice, make sure you understand the reason why it's best practice to begin with, and what risks or weaknesses you introduce by not following it.
I needed to have someone transfer some files to me securely, but I had a few requirements.
Previously I have used locked-down SSH keys and force-command; both are good solutions.
This time I ended up using a small sandboxed SSH environment in a Docker container with a mounted folder. The benefit compared to internal-sftp is that it gives the sender some flexibility in how they want to transfer the files: scp, sftp and, specifically, rsync all work.
Warning: Docker containers are not secure sandboxes. The uploader can (by design) upload anything, and has shell access, so they can upload and execute any executable. Any kernel or Docker vulnerability could lead to an escape from the container. Don't use this unless you trust the uploader.
In this case, I found a Docker image made specifically for a locked-down ssh/scp/rsync environment.
First create a folder, for example named upload, in the directory where you want the files to end up. Then run the following, remembering to change <USER> and <CHANGEME> to something else:
docker run --rm -it \
--name docker_ssh --hostname ssh \
-c 128 -m 256m \
-e PGID=1000 -e PUID=1000 \
-p 64822:64822 \
-v $PWD/upload:/home/<USER> -v $PWD:/etc/ssh \
-e CNTUSER=<USER> \
-e CNTPASS=<CHANGEME> \
-e ROOTPASS=$(openssl rand -base64 12) \
woahbase/alpine-ssh:x86_64 \
/bin/bash
And then get uploading!
For example,
scp -P 64822 test3.sh <USER>@<SERVER>:~/
or
rsync -e "ssh -p 64822" ./ <USER>@<SERVER>:~/
One thing which can make you happier and sleep better is doing less (doom)scrolling late in the evening.
Convincing myself to stop is hard, though: I'm tired, and the dopamine rushes from seeing something slightly entertaining or interesting have kept me up too late many times.
I've tried or investigated quite a few different tools and solutions to help me break this bad habit.
Now I've finally found something which works for me (at the time of writing): DNS-based "parental control" using nextdns.io. This works for me because it's kind of annoying to change the DNS, and I use NextDNS anyway, so it's not yet another piece of software. It also works on the phone, which is where most of the doomscrolling happens.
If you are not familiar with nextdns.io, it is, in essence, a DNS service with lots of extras. You could call it a cloud version of Pi-hole. If you don't know what DNS or Pi-hole is, this solution is probably not for you; it's quite technical and might cause some confusing and hard-to-debug issues.
First sign up on https://nextdns.io and follow their instructions to enable it. It's free.
Then go to Parental Control, set up the recreation time to, for example, 7:00–22:00 every day, and add whatever websites or apps you want to the list of restricted apps. Then click the small clock icon to enable the time limit for that app/site.
And then enjoy Twitter stopping working at roughly 22:00.
Beware, though: DNS-based blocking might cause things to misbehave in unexpected ways. It might not work right away, or might not appear to work at all, because of how DNS is cached.
But it works fine for my purpose; Twitter and Reddit both generally stop working around 22:00.
U2F and WebAuthn are the two most exciting developments in web authentication in the last 20 years.
The most common way to use them is with a hardware dongle like a YubiKey, which I never got around to. Instead, I relied on TOTP for my two-factor authentication.
That was until I found SoftU2F and combined it with Safari-FIDO-U2F to get it working with Safari, which worked, most of the time.
With the release of Safari 14, Apple finally brought proper WebAuthn support to Safari.
So now, you can quite easily get this experience without any additional hardware.
All you have to do is get the latest SoftU2F.pkg and install it.
Now you have two options. You can let SoftU2F store the key material in your keychain, which is the default; there you authenticate by approving or rejecting a notification.
Or you can use the slightly hidden option and store the key in the Secure Enclave Processor (SEP), i.e. behind TouchID. But be warned: while the keychain can be backed up and transferred, the SEP can't. So make sure you have backup authentication methods for when your Mac decides to stop working.
To use the SEP, you need to run the following command:
/Applications/SoftU2F.app/Contents/MacOS/SoftU2F --enable-sep
You can find more documentation about the SEP implementation in the pull request.
All done!
Now you can enjoy having your own built-in FIDO2 key.
There are lots of static page generators; I have personally used Hugo, and there are like 100 others. But I had a project where I wanted something even simpler, and I had a couple of requirements.
For the first, you don't need anything other than an editor. The second is where you need something more than HTML.
I recently came across a project that promised more or less exactly what I wanted, xm. But it was written in Node/JavaScript, so I went looking for something else.
After not finding anything similar, I decided to do it myself in the 4th most disliked programming language: PHP.
PHP is ubiquitous on Linux servers, and it's great at generating HTML. The downside of using it as a static page generator is… that it's not static.
Each time you request a .php page, PHP will compile and interpret the code and return the output.
The first and obvious solution is to just store the output as HTML, and you have turned it into a static page generator. Like so:
php page.php > page.html
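If you have many pages, a one-line shell loop works as the whole "build system" (a sketch; run it in the directory with your pages):
for f in *.php; do php "$f" > "${f%.php}.html"; done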
Doing that on every change might get tedious though, and I got curious whether it would be possible to do it "on-demand" instead.
And as a challenge to myself, I wanted to see if I could make it small enough to fit in a tweet, with no dependencies other than PHP.
And without further ado, I present to you: PHP keep It Stupid Simple, in short PISS.
<?php
// buffer all output; the callback below runs when the script finishes
ob_start(
    function($output) {
        // write the rendered output next to this file, with .html instead of .php
        $t = substr(__FILE__, 0, -4) . '.html';
        // bail with a 500 if the file can't be opened for writing
        ($f = fopen($t, 'w')) || header("HTTP/1.1 500") && exit(1);
        fwrite($f, $output);
        // redirect the visitor to the static .html version
        header("Location: " . substr($_SERVER['REQUEST_URI'], 0, -4) . ".html");
    }
);
?>
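To spell out the usage (this is my reading of the snippet, not official project documentation): paste the block at the top of each page and write the page as ordinary PHP/HTML below it. It has to be pasted rather than include()d, since __FILE__ would otherwise point at the included file. The first request to a hypothetical page.php then writes page.html next to it and redirects there:
<?php /* ...the PISS snippet from above goes here... */ ?>
<html>
<body>
<h1>Hello</h1>
<p>Generated <?= date('Y-m-d') ?></p>
</body>
</html>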
Because this is a Real-Serious-Project™ it’s available on GitHub with an issue tracker and all other features that a Real-Serious-Project™ needs.
lol what?
font-family: monospace, monospace
is not the same as
font-family: monospace
I’m so happy there are other people who figure these things out. https://stackoverflow.com/questions/38781089/font-family-monospace-monospace
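The short version, as I understand that answer: browsers apply a separate (usually smaller) default font size to elements whose font-family is exactly monospace, and repeating the keyword opts out of that special case. So the usual fix looks like this:
code, pre {
  /* the duplicated keyword avoids the shrunken monospace default */
  font-family: monospace, monospace;
  font-size: 1em;
}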