[RE: nyman]#_

don't let perfect be the enemy of bad

Recent Posts

ChatGPT does Advent of Code

published on

So, it’s that time of year again. Advent of Code is released, and I eagerly decided it’s a good time to learn a new programming language. This time the idea was to learn Clojure. But not being familiar with any other Lisp-style languages, I quickly got very stuck and was just about to give up.

But then I saw all these examples of ChatGPT going around, and I thought, “I wonder if ChatGPT could help me learn Clojure”. And it turns out, yes, it can!

Almost. Turns out, ChatGPT isn’t very good at Clojure. None of the solutions it provided worked without tweaking and debugging.

However, what I saw was so impressive that instead of learning Clojure, I have decided to see how far ChatGPT can get with Advent of Code.

To begin with, as with most AI solutions out there, getting the prompt right is key. I did try just copying the Advent of Code instructions into ChatGPT to see what happened, but that did not produce the right output. Instead, it took a few (sometimes a lot of) tweaks to the prompt to get a working result.

But eventually, we succeeded. ChatGPT (with a bit of help from me to combine outputs) produced a working solution for the first day of Advent of Code. So follow along with what will hopefully be 24 posts about how ChatGPT solves Advent of Code, using any means necessary. Get ready for some AI-powered fun!

ChatGPT tries to write some Clojure

I won’t repeat the original instructions here; instead, you can find them here. Let’s try with a simpler version.

But that won’t do, because Advent of Code provides the input as a very long text file, and this solution requires that I enter it inline in the format above. Also, we get the first indication that ChatGPT might not be that great at Clojure. When I tried running the code out of curiosity, I got an error.

; Execution error (ClassCastException) at java.lang.Class/cast (Class.java:3889).
; Cannot cast clojure.lang.PersistentVector to java.lang.Number

But no worries, I’ll just ask it to provide me with a solution which reads and parses the input from a text file.

Sadly, it seems ChatGPT isn’t very good at Clojure. None of the examples it provided worked outright. There was always some issue. And even if I explained the error and asked ChatGPT to provide a solution, it rarely succeeded.

If not Clojure, then what?

After a while of trying to get it to produce correct Clojure, and having it help me debug and explain the errors (which it is quite good at), I decided to just see what else it could do.

There is always a debate about which language is the “most efficient” or “concise” so let’s see what happens if we let ChatGPT choose.

awk, of course. The second most popular choice for job security (after Perl). But this time it almost works. Awk throws a syntax error. For some reason, it does not like the ternary operator in the END{} block. But if we just assume the last group isn’t the biggest one, we can change it to awk '{if($0!=""){s+=$0}else{m=s>m?s:m;s=0}}END{print m}' and, tada, that works!
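
For the record, here is how that one-liner behaves against a small made-up input (the trailing blank line is what lets us get away with dropping the comparison from END{}):

# made-up sample data: groups of numbers separated by blank lines
printf '1000\n2000\n\n3000\n\n500\n400\n\n' > input.txt
awk '{if($0!=""){s+=$0}else{m=s>m?s:m;s=0}}END{print m}' input.txt
# prints 3000, the largest group sum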

ChatGPT3 has (with a tiny bit of help) solved the first puzzle in the Advent of Code 2022.

Second part of the puzzle

Let’s see what the second step of the problem is. OK, apparently we need to find the sum of the top three now. Sadly, I seem to have misplaced the prompt, but either I am getting better at writing prompts, ChatGPT is getting better at producing solutions, or we just got lucky, because the solution it provided worked on the first try!

The code it created was

awk '
  BEGIN {
    sum=0
    num_groups=0
    top_groups[3]=0
  }
  {
    if ($0 != "") {
      sum += $0
    } else {
      if (sum > top_groups[1]) {
        top_groups[3] = top_groups[2]
        top_groups[2] = top_groups[1]
        top_groups[1] = sum
      } else if (sum > top_groups[2]) {
        top_groups[3] = top_groups[2]
        top_groups[2] = sum
      } else if (sum > top_groups[3]) {
        top_groups[3] = sum
      }
      sum = 0
      num_groups++
    }
  }
  END {
    if (num_groups < 3) {
      print "Not enough groups"
    } else {
      print top_groups[1] + top_groups[2] + top_groups[3]
    }
  }
' INPUT

The future?

Tomorrow we will try Day 2. Or rather, I have already tried, and ChatGPT3 did succeed with the first part; I just haven’t had time to do the second yet. But already the first part took a lot of tweaking of the prompt, and it highlighted one of the core issues with these Large Language Models (LLMs).

Tune in tomorrow to find out what the issue was and how it did.

published on

To extract WireGuard configurations from the official macOS client, for example from an old Keychain file:

security find-generic-password -l 'WireGuard Tunnel: <tunnel-title>' -w|xxd -r -p
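
For example, assuming a tunnel named home-vpn (substitute your own tunnel title), you can dump it straight into a config file:

security find-generic-password -l 'WireGuard Tunnel: home-vpn' -w | xxd -r -p > home-vpn.conf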

#oneliner #documentation #wireguard #osx

Reduce python breakage

published on

Recently I ran into an issue with a Python project I was working on. A dependency of a dependency decided to do a breaking change, which broke my project even though I had everything in requirements.txt pinned (same issue as here).

As part of fixing it I learned a few new things which I’ll share here.

First, you can constrain pip to prevent a newer version of the package from installing. pip has a flag, -c, which you can read more about in the documentation here.

You can add

markupsafe==2.0.1

to a file called constraints.txt and then do pip install -c constraints.txt project. And voilà, your pip install hopefully works now.

Making things future proof

Next we will take things a bit further: we’ll download every dependency we need and include it in our repo. That way we won’t be sad if the maintainers suddenly decide you can’t have an old version anymore.

First, on a working install, do a pip freeze to get a list of the things you need. Put these into, for example, freeze.txt (you might be able to use your requirements.txt directly, but not if you use -e .).
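
Something like this might do, assuming the --exclude-editable flag covers your -e . case (double-check the resulting file):

pip freeze --exclude-editable > freeze.txt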

Then run

pip download -d deps/ -r freeze.txt -c constraints.txt

And boom, you will have a bunch of .whl files in your deps/ folder.

Next we need to tell pip to use them and not rely on some index. Check that things work with

pip install -c constraints.txt --find-links ./deps/ --no-index .

You probably need to do this inside a clean Docker container or something similar to be really sure the project will install from a “clean” state.
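
A minimal sketch of such a check, assuming Docker and a python:3.10-slim image (pick whatever Python version you actually target); see also the platform caveat below if your host isn’t Linux:

# run the offline install inside a throwaway container
docker run --rm -v "$PWD":/app -w /app python:3.10-slim \
    pip install -c constraints.txt --find-links ./deps/ --no-index .

If that install succeeds with --no-index, you know everything needed really is inside deps/.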

Then commit the deps/ to the repo and never worry about missing or broken dependencies again.

Note that the .whl files are platform-specific, so you might want to look at the --platform option to pip download if you want to be really sure you have the right files when you need them. Another option is to download the source, but compiling Python packages can be tricky (even the --help for pip download says so).
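
As a rough sketch, grabbing Linux wheels from another platform could look something like this; note that pip requires --only-binary=:all: (or --no-deps) whenever --platform is used, and the platform tag here is just an example:

pip download -d deps/ -r freeze.txt -c constraints.txt \
    --platform manylinux2014_x86_64 --only-binary=:all: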

Importing data from WikiData into Google Sheets with IMPORTXML

published on

Here comes another tip for leveraging one of the most important inventions in the 20th century, the spreadsheet1.

Say you have a list of the number of GDPR fines and the country where they were issued, and you would want to know what the fine/population ratio is.

The easiest and quickest way would be to Google it and copy/paste the first table you find into a new Sheet, then use the =VLOOKUP function to grab the population for each country.
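
For example, assuming the pasted table ends up on a sheet called Populations with country names in column A and population in column B (the names here are made up):

=VLOOKUP(A2, Populations!A:B, 2, FALSE)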

But I got curious whether there would not be some more automated way to do this. And it turns out there was: about 10 years ago Google Spreadsheets had more or less this exact function, =GoogleLookup("entity" ; "attribute"), but it was deprecated in 2011 (probably for being too useful… or more likely abused somehow).

Luckily, there are still a few ways to import data into Google Sheets programmatically, using =IMPORTXML, =IMPORTDATA, =IMPORTHTML and some third-party solutions like =IMPORTWEB.

Now that we have a way to import data, we need to find a good place to import the data from. Preferably a place which has all kinds of data, so we can reuse what we learn here to programmatically fetch more complex data next time. Wikipedia seems like a good candidate, and it turns out there is a project called Wikidata, which aims to provide the knowledge stored on Wikipedia in a more structured format.

Getting data out of Wikidata is not that straightforward though: to represent the data, they use a graph format which you can query using a language called SPARQL. They do a much better job of teaching it on their site, and I’d recommend starting with this tutorial if you are interested.

After you have figured out your SPARQL query, you can import it directly into Google Sheets by copying the query URL, giving it to =IMPORTXML, and then passing it an XPath expression to extract the results you want.

The full =IMPORTXML command will look something like this

=IMPORTXML("https://query.wikidata.org/sparql?query=%23%20defaultView%3ABubbleChart%0ASELECT%20DISTINCT%20%3FcountryLabel%20%3Fpopulation%0A%7B%0A%20%20%3Fcountry%20wdt%3AP31%20wd%3AQ6256%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20wdt%3AP1082%20%3Fpopulation%20.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22%20%7D%0A%7D%0AGROUP%20BY%20%3Fpopulation%20%3FcountryLabel%0AORDER%20BY%20DESC(%3Fpopulation)","/*/*/*[name()='result']","utf8")`

Two important things to note about this. First, the SPARQL result is namespaced; you can see it from the xmlns part at the beginning.

<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>

This means you need to select that namespace before you can run queries like /*/*/result, but the =IMPORTXML command does not (as far as I know) allow you to do that. A workaround is to use XPath functions which search all namespaces, like /*/*/*[name()='result'].2

Another thing to consider is that neither the SPARQL response nor the specification gives any guarantees about the order of the columns. So, you might get back <binding name='countryLabel'> first or second within the result. This is annoying, as for =VLOOKUP to work the key needs to be to the left of the value you are looking up.

A workaround I stumbled upon is to add an ORDER BY DESC(?population); that will cause the columns to be ordered as listed in the query.


  1. My personal opinion, but I bet someone else also agrees. Spreadsheets excel (pun intended) at leveraging what computers are best at. [return]
  2. This caused quite some confusion for me, because, for example, xpather.com does not take the namespace into account, so even if it worked there it did not work in the Google Sheet. Better to instead use https://extendsclass.com/xpath-tester.html, which does require the correct namespace (or a function which searches all namespaces). [return]

Using WWWOFFLE to save a modern webpage for later

published on

Every so often when you want to archive a webpage, you notice it’s full of dynamic content and JavaScript which won’t easily be archived. I was recently looking to archive a Matterport 3D image. This is a typical website that won’t be easily saved using normal web archivers, as it relies on JavaScript to dynamically fetch images as you move through the 3D space.

One generic solution to capture something like this is to use a proxy in the web browser and save everything that passes through it. But most proxies only cache things for a limited time and respect headers like no-cache1. If the proxy instead ignores that and stores all requests that flow through it indefinitely, you can maybe create a “snapshot” of a website by browsing it through this archiving proxy.

It turns out I am not the first one to come up with this idea; there are at least two tools out there which do this. The first one I tried was Proxy Offline Browser, a Java GUI application which does exactly this. It worked quite well, but the free version does not do TLS/HTTPS. The Pro version is only 30 euro, but I was curious to see if there was any open-source solution that could do this.

It turns out there is: it’s called WWWOFFLE and it has a lovely compatible webpage. After some trying, I got it working, and I’ll describe a rough outline of how to get it working here. Note though, if you value your time or don’t feel like fiddling around in the terminal, I do recommend just paying the 30 euro for Proxy Offline Browser and being done with it.

Steps for getting it working on OS X

First you need to download the wwwoffle source code and ensure you have the GnuTLS headers and libraries, so you can use it for HTTPS.
Then compile it with

./configure --prefix=/usr/local/Cellar/wwwoffle/2.9j/ --with-gnutls=/usr/local --with-spooldir=/usr/local/var/run/wwwoffle --with-confdir=/usr/local/etc/
make
make install

Then run it

wwwoffled -c /usr/local/etc/wwwoffle.conf -d

Now there are a few more steps before you can start archiving.

First, reconfigure your browser2 to use wwwoffle as a proxy. Then visit http://localhost:8080 in the browser to get to the wwwoffle page. Using this page, you can control wwwoffle and see what it has cached.

First, you will need to get the CA certificate, so you won’t get SSL warnings all the time. Go to http://localhost:8080/certificates/root, download and install it.

Then you need to put wwwoffled into online mode, which you can do here: http://localhost:8080/control/

Then configure wwwoffled itself, which you can do using the built-in web-based configuration tool.

The settings to change are

http://localhost:8080/configuration/SSLOptions/enable-caching to yes

and

http://localhost:8080/configuration/SSLOptions/allow-cache to allow-cache = *:443

That should hopefully be enough. Now try browsing some website. Then go to the control page and put wwwoffled into offline mode. Hopefully, you should still be able to browse the same page, using the cache.

Additionally, I had to add

CensorHeader
{
 Access-Control-Allow-Origin = *
}

to http://localhost:8080/configuration/CensorHeader/no-name to ensure AJAX3 requests worked in some cases.

If you run into other issues, you can either start debugging or go back and cough up the money :-)


  1. which seems to be standard practice nowadays even for things that should definitely be cached [return]
  2. I recommend using another browser than your main one for this, to keep things separated. On OS X I’d recommend Firefox, as it keeps its trusted CAs separate from the OS’s, so you won’t need to have your whole computer trust the newly minted CA certificate. [return]
  3. yeah I’m old [return]

Farewell C1

published on

Yesterday, in a datacenter somewhere in France, there was suddenly an eerie silence as the last remaining racks fell silent for the first time in a long time. As of yesterday, 1st of September 2021, Scaleway turned off their C1 ARM servers.

I know because I still had one trusty little C1 server until today, a server I have had since it was brought online 7 years ago. It was never the fastest or the biggest server I’ve had, but it was my little dedicated server. It never complained, never crashed, never rebooted; it just kept running, serving my homepage and some side projects.

History of C1

If you are not familiar with the C1, and why it deserves its own little obituary, then let me give you a bit of backstory.

The C1 was introduced around 2015, first as a free 15-minute trial at labs.online.net and then launched as a commercial product under the Scaleway brand (archive.is/scaleway). The C1s were an interesting take on the virtualisation market: instead of cramming as many virtualised hosts as possible onto a powerful machine, they crammed as many tiny SoCs as they could into a rack. They built tiny servers using custom SoCs backed by shared disk storage. A bit like a cloud-hosted Raspberry Pi, but with a network-attached SSD disk, on a public and static IP with good connectivity.

What’s the big deal?

There is no big deal; for most people or projects a virtual server will do just fine, or even be a better choice than dedicated hardware. But there are a few reasons I like small dedicated servers. One is that with a dedicated machine, you can be sure that you are always getting the same performance (barring running multiple things on it). Virtual servers might be faster in bursts, but they are generally oversubscribed, and if you are unlucky you might get very varying performance depending on how busy your neighbours are. If it’s fast now, it will be as fast tomorrow. The C1 was never really fast, though, which I took as a fun challenge: I knew that if I could get X and Y working well on this limited machine, then if I ever needed to scale up it would be extremely fast on a top-of-the-line server.

It’s more secure

Maybe, in theory at least. For a VM, when doing threat modelling, you should always consider the risk of someone else on the host escaping their VM and accessing your VM and files. Back in 2015 there had been a few VM escapes, but the future would bring many more, plus a whole new range of side-channel attacks against shared processors or memory. My little dedicated ARM server never had to worry about Spectre, Meltdown, Rowhammer or any of the other processor bugs which have rattled the whole VPS ecosystem. Being able to just go “oh, that’s interesting” when there is news of a new Spectre-like attack, without having to even consider my little C1 losing performance or needing to be rebooted, was quite nice.

Another benefit was the 4 GiB of RAM; in 2015 that was unheard of for a €2.99 server (and I think it still is today). And that is “dedicated” RAM, which can’t as easily be accessed by the provider, which is important if you care about that. Although I bet, if Scaleway wanted, they could figure out some way to read it out using something like pcileech. [Update 2021-09-06: I was informed by one of the lead designers that it was designed with attacks like these in mind, so at least any physical attack would not have been straightforward.]

What next?

Life goes on, except for the C1. I do wonder what will happen to them; maybe they will end up on a flea market somewhere. I am not ready to move my personal things to a VPS just yet, but there are not that many cheap dedicated alternatives out there. The only one (I’ve found) at that price point is Kimsufi, but they are mostly out of stock and lower-specced.

In the end, I decided to stay with what I know and go with another of Scaleway’s dedicated offers, an Atom C2350, which has served me well for testing and to which I have now migrated everything.

Persistent login to OpenWRT luci

published on

Sometimes, if you are logging in multiple times per day, the default 1 hour session time tied to a browser tab/window might be a bit annoying.

To increase the session time to, for example, 24 days1, you need to do

uci set luci.sauth.sessiontime=2147483
uci commit
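
If you want, you can check that the value stuck with:

uci get luci.sauth.sessiontime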

But it’s still set as a session cookie. To fix that, you need to modify /usr/lib/lua/luci/dispatcher.lua and change the line which begins with http.header("Set-Cookie",. You need to insert Max-Age= to make it a persistent cookie. Like so:

http.header("Set-Cookie", 'sysauth=%s; Max-Age=2629746; path=%s; SameSite=Strict; HttpOnly%s' %{

Then you need to clear the luci-modulecache or reboot

rm -rf /tmp/luci-modulecache/

There, if you log in to luci again you should now have a persistent cookie which will persist for one month. To remove it, press logout.


  1. Update 2021-06-12: After locking myself out, I figured out that on a 32-bit system you can’t set this to anything higher than a 32-bit signed integer; this seems to be a ubus limitation [return]

Backing up your VM with borg

published on

Recently, for no specific reason at all, I did a review of the backup plans for the tiny personal VMs I have.

Octave Klaba tweeting about the fire at OVH

As my disaster recovery plan was mostly “I hope they don’t lose it all at once”, I decided to upgrade it to “I have some backups, so I don’t lose it all at once”.

To keep things simple, and as I love micro-optimising to see how cheaply I can run my personal VMs, I decided to use my home NAS for backups instead of just paying for third-party storage like B2.

So, here is a rough1 overview of how you can use a local Linux NAS as the destination for backing up a cloud VM.

Turris Omnia

First we need2 to get borg working on the Turris. Luckily the Turris has LXC, so we can just spin up an Alpine instance and do apk add borgbackup and apk add openssh-server. Then update the network type to none to share the host network, and mount any disk you want.

# first comment out any other network
lxc.net.0.type = none
# bind-mount /mnt/sdb2/dir
lxc.mount.entry = /mnt/sdb2/mydir /mnt/sdb2/lxc/borg/rootfs/mnt/mydir rw,bind 0,0

I decided to use a separate ssh inside the lxc for a bit of additional sandboxing.

Add the following to authorized_keys to allow the server you want to back up to run borg, but nothing else.

command="borg serve --restrict-to-path /mnt/server-bakups",no-port-forwarding,no-agent-forwarding,no-pty,no-X11-forwarding ssh-rsa AAA...

C1 Server

Time to start backing up. First, because the C1 is an armv7 instance, download ARM binaries from https://borg.bauerj.eu.

Then check that you can connect to your Turris and get some borg output back using the limited SSH key, similar to below.

An example of borg backup output

If that works, you can initialise the repository and start backing up according to the borg instructions.

Something like this

borg init -e repokey ssh://root@100.127.112.32:40022/mnt/mydir/myserver

and if that works

borg create ssh://root@100.127.112.32:40022/mnt/server-bakups/personal::{hostname}-{user}-{now} /home /etc /var/log

And if that works, then either call it a day or address the obvious issues like running the receiving borg as root :-)


  1. This is a very rough guide; it will not work without modifications, so don’t try to just blindly copy and paste the instructions. These instructions are specific to a Turris Omnia with Tailscale and a Scaleway C1. [return]
  2. You can probably ignore this or restrict ssh some other way, but I did this because I started out from the other direction, trying to get borg running on the NAS, which would then reach out to the servers. [return]

Conditional access using only nginx

published on

Have you ever wanted to deploy a website to test that it works, without everyone else being able to see it?

If you are using a dynamic language or CMS for your webpage (PHP, WordPress or Ruby on Rails), there are straightforward ways to accomplish this.

But what happens if you have a static webpage? Here I will present one solution, using only an nginx config file, to accomplish this.

# first we need to allow access to the soon.html
# and also a logo which is linked from the soon.html
# if your soon.html links more resources in this server
# you need to update the regex to match that also
location ~ /(soon\.html|images/logo_white.png) {
    try_files $uri =404;
}

# this is the secret way to get past the block
# it will set a magic cookie with a lifetime of 1 month
# and redirect back to the host  
location /iwanttobelieve {
  add_header Set-Cookie "iwantto=believe;Domain=$host;Path=/;Max-Age=2629746";
  return 302 $scheme://$host;
}

# this is the normal serve, but with a condition that everything
# everyone that does NOT have the magic cookie set will be served
# the content of soon.html
location / {
  if ($http_cookie !~* "iwantto=believe") { rewrite ^ /soon.html last; }
  try_files $uri $uri/ =404;
}

That’s it! Copy and paste the above into a server {} block. Make sure to take note of the order, though, to ensure you don’t have anything else before this which would take precedence. Then change all occurrences of soon.html if you use something else. And remember that the first match needs to cover everything that soon.html references, otherwise those requests will just get back the content of /soon.html as well.
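
You can sanity-check the behaviour with curl (example.com standing in for your own site):

# without the cookie you should get the soon.html content back
curl -s https://example.com/ | head
# the secret path should respond with the Set-Cookie header and a redirect
curl -si https://example.com/iwanttobelieve | grep -i -e '^set-cookie' -e '^location'
# with the cookie set, the real content should be served
curl -s -H 'Cookie: iwantto=believe' https://example.com/ | head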

Note that if is a bit finicky in nginx; check their documentation for more details.

Usability > Security

published on

Introduction

The other day I wanted to use my noscript.it with my old iPhone 4S running iOS 6, but I was met with “could not establish a secure connection to the server”.

Screenshot of Safari showing the error

Turns out it was because I had, out of habit, configured the server with a “modern” list of TLS ciphers. And the poor old iOS 6 didn’t support any of them.

So, I went on a mission to ensure noscript.it works with as old devices as possible.

It turns out enabling TLS 1.0 and TLS 1.1 on Ubuntu 20.04 is a bit harder than I expected1. Luckily, someone else solved it already.

So now, after using the old Mozilla SSL config and appending @SECLEVEL=1, it works. Even on my vintage iPhone 3G. Hurray!

Screenshot of NoScript on iPhone 4S
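
For reference, a minimal sketch of where that ends up in the nginx config; the real cipher list should come from the old Mozilla SSL config mentioned above, the ciphers below are only placeholders to show where @SECLEVEL=1 gets appended:

ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:AES128-SHA:DES-CBC3-SHA:@SECLEVEL=1';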

Wait what?

But, I hear you say, isn’t this less secure? I mean now you only get a B on the Qualys SSL Report! Clearly this is bad!?

Screenshot of Qualys SSL results

Let’s take a step back and think about what the score actually means. noscript.it automatically gets a B because it supports TLS 1.0. But let’s go one step further and assume we’re looking at a bank with a C2. A site gets a C if it supports SSLv3, meaning it is vulnerable to the SSLv3 POODLE3 attack. This is clearly bad for a bank!? Or is it? How likely is it that someone will successfully execute this attack, which requires the attacker to be able to intercept and modify the data in transit? And compare that likelihood with how likely it is that someone will need to access the bank’s website from an old XP (pre-SP3) machine that only supports SSLv3. The second seems more likely to me.4

Okay, you say, but won’t keeping SSLv3 around make everyone vulnerable because of downgrade attacks? If that were the case, the risk calculation would be different. But luckily, we have TLS_FALLBACK_SCSV to avoid that. TLS_FALLBACK_SCSV ensures that a modern client or browser won’t risk being fooled into downgrading its encryption.

Conclusion

So, to wrap things up, don’t stare blindly at the rating or certification. A site with an A++ is more secure than one with a C rating. But if you (or someone less fortunate) can’t access the site when they need it, it will be a pretty useless site. Personally, from now on, unless the site needs5 absolute security, all my projects will optimise for compatibility rather than getting an A++. After all, it is much more likely that someone will try using a site from a Windows XP machine or an old Smart TV than that someone is MITM-ing that person at that exact moment.

Disclaimer

Please note though, don’t read this as an argument against doing things securely as default and following best practices. Rather it is just some thoughts on this specific issue of TLS and SSL configurations. If you break with best practice, make sure you understand the reason why it’s best practice to begin with and what risks or weaknesses you introduce by not following them.


  1. I 100% support secure defaults, make it hard to do the wrong thing. [return]
  2. A favourite hobby of some people in the security community is to publicly shame banks or websites for not getting a great grade in the Qualys SSL test. Here is an example from Troy Hunt [return]
  3. SSLv3, not to be confused with the POODLE attacks against some weak TLS implementations. [return]
  4. Also there are mitigations which you should of course implement unless it breaks more than it fixes. [return]
  5. in some cases, like highly sensitive data or targeted users the risk calculation will be different, it might be better for a user not to be able to use the site rather than risk some known (or unknown) attack. [return]

Categories

Adblocking (1)

Css (1)

Distractions (1)

Gpt (9)

Gptaoc2022 (9)

Linux (1)

Llm (9)

Microblog (2)

Php (1)

Security (4)

Servers (2)

Spreadsheets (1)

Sysadmin (2)

Tech (21)

Web (6)