Diffstat (limited to 'docs')
-rw-r--r-- docs/projects/img/one_top_song/demo.mp4 bin 0 -> 2059865 bytes
-rw-r--r-- docs/projects/img/one_top_song/difficult_beast.mp3 bin 0 -> 102967 bytes
-rw-r--r-- docs/projects/img/one_top_song/top_logo.png bin 0 -> 25935 bytes
-rw-r--r-- docs/projects/img/one_top_song/ui_desktop.png bin 0 -> 70797 bytes
-rw-r--r-- docs/projects/img/one_top_song/ui_mobile.png bin 0 -> 61731 bytes
-rw-r--r-- docs/projects/img/one_top_song/words_all.png bin 0 -> 84651 bytes
-rw-r--r-- docs/projects/img/one_top_song/words_frequent.png bin 0 -> 51920 bytes
-rw-r--r-- docs/projects/index.md 11
-rw-r--r-- docs/projects/one_top_song.md 527
9 files changed, 538 insertions, 0 deletions
diff --git a/docs/projects/img/one_top_song/demo.mp4 b/docs/projects/img/one_top_song/demo.mp4
new file mode 100644
index 0000000..367daee
--- /dev/null
+++ b/docs/projects/img/one_top_song/demo.mp4
Binary files differ
diff --git a/docs/projects/img/one_top_song/difficult_beast.mp3 b/docs/projects/img/one_top_song/difficult_beast.mp3
new file mode 100644
index 0000000..963e093
--- /dev/null
+++ b/docs/projects/img/one_top_song/difficult_beast.mp3
Binary files differ
diff --git a/docs/projects/img/one_top_song/top_logo.png b/docs/projects/img/one_top_song/top_logo.png
new file mode 100644
index 0000000..1cdd290
--- /dev/null
+++ b/docs/projects/img/one_top_song/top_logo.png
Binary files differ
diff --git a/docs/projects/img/one_top_song/ui_desktop.png b/docs/projects/img/one_top_song/ui_desktop.png
new file mode 100644
index 0000000..5f44ca2
--- /dev/null
+++ b/docs/projects/img/one_top_song/ui_desktop.png
Binary files differ
diff --git a/docs/projects/img/one_top_song/ui_mobile.png b/docs/projects/img/one_top_song/ui_mobile.png
new file mode 100644
index 0000000..0aadb87
--- /dev/null
+++ b/docs/projects/img/one_top_song/ui_mobile.png
Binary files differ
diff --git a/docs/projects/img/one_top_song/words_all.png b/docs/projects/img/one_top_song/words_all.png
new file mode 100644
index 0000000..bf377b9
--- /dev/null
+++ b/docs/projects/img/one_top_song/words_all.png
Binary files differ
diff --git a/docs/projects/img/one_top_song/words_frequent.png b/docs/projects/img/one_top_song/words_frequent.png
new file mode 100644
index 0000000..a56ee5b
--- /dev/null
+++ b/docs/projects/img/one_top_song/words_frequent.png
Binary files differ
diff --git a/docs/projects/index.md b/docs/projects/index.md
index 529f11d..12b21ed 100644
--- a/docs/projects/index.md
+++ b/docs/projects/index.md
@@ -8,6 +8,17 @@ MkDocs). But the few that do, are here.
Projects below are sorted reverse chronologically (most recent first).
+## [One tøp song](one_top_song)
+
+![Screenshot of desktop UI](img/one_top_song/ui_desktop.png)
+
+On April 19, 2022, I released a web game made out of words that only
+appear in one twenty øne piløts song. It involves automation using curl,
+Python, and Unix utilities, but on top of it there's a lot of manual work.
+Here are the steps I took over the course of this project, from
+downloading the lyrics, to generating a dataset, and finally making
+a game.
+
## [Kanvas](kanvas)
![Screenshot of Kanvas 0.1.0](img/kanvas/screenshot_0.1.0.png)
diff --git a/docs/projects/one_top_song.md b/docs/projects/one_top_song.md
new file mode 100644
index 0000000..761b769
--- /dev/null
+++ b/docs/projects/one_top_song.md
@@ -0,0 +1,527 @@
+# One tøp song
+
+2022-05-08
+
+I'm a die-hard fan of twenty øne piløts (did you know they're a two-piece
+band?). You can tell from the fact that I take the trouble to stylize
+the band name with ø's, even in its acronym, tøp. So don't expect
+neutrality from this blog post.
+
+The band and its members, Tyler Joseph and Josh Dun, are known for
+a Grammy and two all-gold records on RIAA, but to me they're irrelevant
+(the awards, not the members). I like the vibe of their songs and
+especially the lyrics. For example, take a look at the insightful final
+lines from _Pet Cheetah_ (Trench) that build up to a pumping crescendo:
+
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+> Pet cheetah, cheetah
+
+Whatever you say, I think it's a one-of-a-kind song that discusses making
+music for a fanbase (yes they make a lot of meta songs like this). The
+lines above are simple, but unique as well. Among all tøp songs, this is
+the only one that features the word "pet", and also "cheetah", just like
+how _Nico And The Niners_ is the only one with "Nico" and "Niners". Wait,
+that's not right, because "Nico" appears earlier in the album, in the
+second verse of _Morph_.
+
+This got me thinking: How many words are there that appear in
+only one twenty øne piløts song? And to make the effort pay off, can I turn
+this into a fun game for other tøp fans to play?
+
+If you're impatient, you may skip all the procedures and technicalities. Go
+ahead and check out the [results](#results). Everyone else, please take
+your time on your ride.
+
+## Step 1: Download the lyrics
+
+This isn't as easy as it seemed, nor is it too hard. The lyric provider is
+azlyrics.com, because it works without JavaScript and serves
+machine-readable HTML. So I went ahead and curl'd a random page.
+
+```
+$ curl https://www.azlyrics.com/lyrics/twentyonepilots/truce.html
+<html>
+<head><title>302 Found</title></head>
+<body>
+<center><h1>302 Found</h1></center>
+<hr><center>nginx</center>
+</body>
+</html>
+```
+
+OK, time to `man curl` for the option to follow redirections. It's `-L`,
+btw. (HTML prettified)
+
+```
+$ curl -L https://www.azlyrics.com/lyrics/twentyonepilots/truce.html
+<!DOCTYPE html>
+<html lang="en">
+ <head>
+ <!-- some meta tags -->
+ <title>AZLyrics - request for access</title>
+ <!-- some stylesheets -->
+ <!-- some <IE9 compat scripts -->
+ <!-- jquery and the like -->
+ <!-- recaptcha script -->
+ </head>
+
+ <body>
+ <nav>...</nav>
+ <!-- a commented out banner -->
+
+ <!-- a few nested divs -->
+ Access denied.
+ <!-- end nested divs -->
+
+ <!-- a commented out block with the note "bot ban" -->
+
+ <!-- footer -->
+ </body>
+</html>
+```
+
+Damn, that's pretty… nasty, but it's exactly how I expected it to go. Now,
+I've done a lot of web scraping, so I know it's possible to fake a few
+HTTP headers to give curl some human skin. The most common headers are:
+
+- Referer
+- Cookie
+- User-Agent
+
+So I tried them one by one. User-Agent worked.
+
+```
+$ curl -L -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0' \
+ https://www.azlyrics.com/lyrics/twentyonepilots/truce.html
+```
+
+What is extra funny though, is that the server accepts even an empty UA
+string.
+
+```
+$ curl -L -H 'User-Agent: ' https://www.azlyrics.com/lyrics/twentyonepilots/truce.html
+```
+
+I couldn't resist:
+
+```
+$ curl -L -H 'User-Agent: definitely not curl' https://www.azlyrics.com/lyrics/twentyonepilots/truce.html
+```
+
+Above the lyrics an HTML comment reads:
+
+> Usage of azlyrics.com content by any third-party lyrics provider is
+> prohibited by our licensing agreement. Sorry about that.
+
+This won't stop me because I can't read.
+
+So long story short, I curl'd the twenty øne piløts index page and
+BeautifulSoup'd all the title-URL pairs, which were once again curl'd and
+BeautifulSoup'd.
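
The fetch-and-parse loop can be sketched in Python as well. This is only a rough illustration, not the script I ran: it uses the standard library's `urllib` and `html.parser` in place of curl and BeautifulSoup, and the names `build_request`, `extract_links`, and `fetch` are made up here. The real markup of the index page is not reproduced in this post, so the sketch simply collects every link it sees.

```python
import urllib.request
from html.parser import HTMLParser

# Example value only; the post shows that even odd UA strings were accepted.
USER_AGENT = "definitely not curl"


def build_request(url):
    """Prepare a GET request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return all (title, url) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def fetch(url):
    """Download a page; urllib follows redirects by default, like curl -L."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `extract_links(fetch(index_url))` would give the title-URL pairs, after which each URL gets fetched and parsed the same way. Filtering out navigation links is left out because it depends on markup I haven't shown.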
+
+Soon I had a directory full of `[song title].txt` files, but not all of them
+are useful. A few songs are not technically part of tøp's canon discography
+(some fans are gonna disagree on this one but I don't care), like the
+Elvis cover _Can't Help Falling In Love_, which is just a [YouTube
+video](https://www.youtube.com/watch?v=6ThQkrXHdh4) of Tyler singing in
+the street; another one, [_Coconut Sharks In The
+Water_](https://youtu.be/jFwsnrkK9sU), although well-known among fans, was
+only performed once for comical effect in 2011. In the end, I included
+their six studio albums and five singles, totaling 79 songs.
+
+On to step 2!
+
+## Step 2: Look for every word
+
+This is the core part of the project. I knew it was impossible by hand, so
+I sat down to write an algorithm in Python. It goes like this in
+pseudocode:
+
+```
+lyrics = dict()
+for song in all_songs:
+    lyrics[song] = read(song + ".txt").split_words()
+
+for song in all_songs:
+    other_songs = list(s in all_songs such that s != song)
+    for word in lyrics[song]:
+        found = False  # reset for every word
+        for other_song in other_songs:
+            if lyrics[other_song].includes(word):
+                found = True
+
+        # checked once per word, after all other songs were searched
+        if not found:
+            append("results.txt", song + "\t" + word)
+```
+
+The latter block had three nested for loops. To optimize it a bit, I read
+all files beforehand, split each one up into individual words, then threw
+them into a set to remove the duplicates. As for the third for loop,
+I _could_ call `break` right after `found = True`, but instead resorted to
+the magic of list comprehension (variable names and structure taken from
+pseudocode above):
+
+```
+        # the word is unique only if no other song contains it
+        if not any([(word in lyrics[o]) for o in other_songs]):
+            append("results.txt", song + "\t" + word)
+```
+
+I like to imagine Python optimizes this one for me, but I'm not sure.
+Anyway, even if it doesn't, this shouldn't be too bad. Plus, I like
+one-liners.
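
On the optimization question: a list comprehension inside `any()` is built in full before `any()` looks at it, so nothing is skipped. Dropping the square brackets turns it into a generator expression, and then `any()` stops at the first match, which has the same effect as `break`. Here is a minimal runnable sketch of the whole search written that way. The function name `unique_words` and the toy word sets are made up for illustration; they are not the real script or dataset.

```python
def unique_words(lyrics):
    """Return (song, word) pairs for words that occur in exactly one song.

    `lyrics` maps each song title to the set of words it contains.
    """
    results = []
    for song, words in lyrics.items():
        other_songs = [s for s in lyrics if s != song]
        for word in sorted(words):
            # Generator expression: any() stops at the first other song
            # that contains the word instead of checking all of them.
            if not any(word in lyrics[o] for o in other_songs):
                results.append((song, word))
    return results


# Toy word sets, not the real lyrics.
toy_lyrics = {
    "Pet Cheetah": {"pet", "cheetah", "the"},
    "Morph": {"nico", "the"},
    "Nico And The Niners": {"nico", "niners", "the"},
}
print(unique_words(toy_lyrics))
```

With these toy sets, "nico" is rejected because it shows up in two songs, while "pet", "cheetah", and "niners" survive.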
+
+When splitting words, they are converted to lowercase. Punctuation marks
+and suffixes like 's and 'd are removed, but I forgot to remove 've.
+Fortunately there weren't many of them, so I removed them by hand.
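
For reference, the splitting rules described above could look roughly like this, with 've handled from the start. It is a sketch under assumptions: the function name `split_words` mirrors the pseudocode, the regular expressions are my guess at a reasonable tokenizer rather than what the real script does, and only straight apostrophes are considered.

```python
import re

# Suffixes to drop from the end of a token: 's, 'd, and the forgotten 've.
SUFFIX = re.compile(r"'(?:s|d|ve)$")


def split_words(text):
    """Lowercase the text and return the set of cleaned-up words in it."""
    words = set()
    # Keep letters, digits, apostrophes, and hyphens; everything else
    # (spaces, commas, periods, brackets) acts as a separator.
    for token in re.findall(r"[a-z0-9'-]+", text.lower()):
        token = SUFFIX.sub("", token).strip("'-")
        if token:
            words.add(token)
    return words
```

Returning a set means duplicates within a song vanish for free, which is all the later lookup needs.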
+
+You can read the real source code here:
+[`data/one_song_words.py`](https://git.sr.ht/~fkfd/one_top_song/tree/main/item/data/one_song_words.py)
+
+## Step 3: Dedupe
+
+The previous step brought about a problem. The script I wrote treated
+inflections as separate words, e.g. "vibe" (_Chlorine_), "vibes" and
+"vibing" (_The Outside_). So I wrote a script to find most of them.
+
+The script reports occurrences of the following inflections of `word`:
+
+```
+word + "s",
+word + "es",
+word + "d",
+word + "ed",
+word + "ing",
+```
+
+and also in reverse, if `word` is already inflected:
+
+```
+re.sub("s$", "", word),
+re.sub("es$", "", word),
+re.sub("d$", "", word),
+re.sub("ed$", "", word),
+re.sub("ing$", "", word),
+```
+
+When I ran it, it caught most of the offenders, like "vibe" vs. "vibes",
+but not subtler ones like "vibing". I ended up removing those by hand as
+well, so it's possible I missed some.
+
+Why didn't I just tell the script to remove the inflections automatically?
+Because there were false positives. For example, "sing" (_Bandito_ and
+many others) and "singed" (_Leave The City_) are not the same thing. Other
+examples include "to" and "toes", "she" and "shed", "not" and "notes",
+"even" and "evening", etc. Also, although some pairs are of the same
+origin, they're pretty different semantically, like "weathered"
+(_Chlorine_) and "weather" (_Good Day_ and _Migraine_). Leaving these
+alone, I axed everything else from my list.
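
To make both the misses and the false positives concrete, here is a small sketch of the suffix check. The function name `inflection_candidates` is made up, and it only appends suffixes; stripping them in reverse finds the same pairs when both words are in the same set, so that half is omitted.

```python
SUFFIXES = ("s", "es", "d", "ed", "ing")


def inflection_candidates(words):
    """Return sorted (base, inflected) pairs where base + suffix is also a word."""
    pairs = set()
    for word in words:
        for suffix in SUFFIXES:
            if word + suffix in words:
                pairs.add((word, word + suffix))
    return sorted(pairs)


sample = {"vibe", "vibes", "vibing", "sing", "singed", "to", "toes"}
for base, inflected in inflection_candidates(sample):
    print(base, "->", inflected)
```

The pair "vibe" and "vibes" is reported, "vibing" slips through because "vibe" + "ing" is "vibeing", and "sing"/"singed" and "to"/"toes" show up even though they are unrelated words. That is why the output is only a list of candidates for a human to review.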
+
+Source code: [`data/dedupe.py`](https://git.sr.ht/~fkfd/one_top_song/tree/main/item/data/dedupe.py)
+
+## Step 4: Manual inspection
+
+It was at this moment that I realized that I had forgotten about stuff like
+"[x10]" (_Holding On To You_) that marks a repeated line. There were some
+onomatopoeic words like "mm-mm" (_Choker_), too, and don't get me started
+on hyphenated words: there were "treehouse" (_Forest_) and "tree-house"
+(_Stressed Out_). Words like "migraine", which comes from a song titled
+_Migraine_, are too easy for a game, so they are not included either.
+I also capitalized proper nouns like "Monday", and removed trailing
+periods and commas from every line I could find. In retrospect it could
+have been easier if I sanitized the lyric files from the beginning. At
+this moment there are 1,002 words left, but I don't know if there's more
+to knock out. I doubt anyone will notice.
+
+Here's a fun story: after I deployed the app (yes there'll be a web app at
+the end) on r/twentyonepilots, one player reported an incorrect lyric from
+_Migraine_:
+
+> A difficult to be, stop feasting lumber-down trees
+
+At first glance this lyric seemed unfamiliar to me, and it definitely
+isn't grammatically correct. I checked multiple sources: on azlyrics of
+course it's this one, but on
+[Genius](https://genius.com/Twenty-one-pilots-migraine-lyrics) it says
+otherwise:
+
+> A difficult beast feasting on burnt down trees
+
+Oops, better go check out the description from the [official
+audio](https://www.youtube.com/watch?v=Bs92ejAGLdw) on Fueled By Ramen's
+(tøp's label, FBR for short) YouTube channel:
+
+> a difficult to be, stop feasting lumber down trees
+
+And [this video at 14:40](https://youtu.be/HutQvZWJ_60?t=880) on Warner
+Music Japan's channel with Japanese and English subtitles:
+
+> 燒け落ちた木々貪り食う、気難しい野獸
+> A difficult beast feasting on burnt down trees
+
+Well, I tried.
+
+So to settle this the only thing I could do was find out by myself.
+I grabbed [WrightP's Official Acapella
+version](https://www.youtube.com/watch?v=qGLEH_VeCpE) and extracted that
+bit with Audacity. I slowed it down 50%, and it sounds like this:
+
+<audio controls src="../img/one_top_song/difficult_beast.mp3"></audio>
+
+Let me explain what I heard:
+
+> A difficult-a beast-a feasting-on bur- down trees
+
+The "ng" sound between "feasting" and "on" is audible. There is no "l"
+sound as in "lumber-down", and there is no /ɒ/ or /ɑ/ sound following
+"st", which rules out "stop".
+
+That settles it: Genius and WMG Japan are right, azlyrics and FBR are
+wrong. I suspect that azlyrics got its lyrics from FBR in the first place.
+
+Track-word pairs:
+[`data/tracks_words`](https://git.sr.ht/~fkfd/one_top_song/tree/main/item/data/tracks_words)
+
+## Step 5: Generating a dataset
+
+Now that I have a 1000-something-line-long file of tab-separated track titles
+and unique words, it's time to generate a dataset for the game. Since I'll be
+producing a web game, the language is gonna be JavaScript, so the dataset
+will be in JSON. The first challenge is that we need to know which line
+each word came from. This way if the player fails to recall it,
+we'll show them the line and they will go "hmm, yeah, Tyler really *did*
+sing this". But you see, my step 2 script completely scrambled the lyrics.
+So I wrote another Python script to "grep" them from the giant heap of txt
+files. It was pretty easy, and moments later I had a JSON file
+structured like this:
+
+```
+[
+ {
+ "track": "Redecorate",
+ "word": "blankets",
+ "lines": [
+ "Then one night she got cold with no blankets on her bed",
+ "Blankets over mirrors, she tends to like it"
+ ]
+ },
+ {...},{...},...
+]
+```
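
The "grep" step can be sketched like this. It is not the contents of `mkjson.py`; the names `find_lines` and `make_entry` are hypothetical, and the whole-word, case-insensitive match is an assumption about what makes sense here.

```python
import json
import re


def find_lines(word, lyric_lines):
    """Return the distinct lyric lines that contain `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    matches = []
    for line in lyric_lines:
        # Skip repeated lines (choruses) so each line is listed once.
        if pattern.search(line) and line not in matches:
            matches.append(line)
    return matches


def make_entry(track, word, lyric_lines):
    """Build one word object in the shape shown above."""
    return {"track": track, "word": word, "lines": find_lines(word, lyric_lines)}


# Two lines quoted in this post plus a made-up filler line.
redecorate = [
    "Then one night she got cold with no blankets on her bed",
    "A filler line that does not contain the word",
    "Blankets over mirrors, she tends to like it",
]
print(json.dumps(make_entry("Redecorate", "blankets", redecorate), indent=2))
```

The `\b` word boundaries keep "blankets" from matching inside a longer word, and `re.IGNORECASE` catches the capitalized "Blankets" at the start of a line.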
+
+I should try to shrink the 135kB (kilo, not kibi) dataset. First, the
+prettyprint was unnecessary, so let's do away with it. It instantly went down
+to 99kB. However, having everything on one line makes batch editing in vim
+a huge pain, and every launch took seconds. So as a compromise I inserted
+a linebreak after every word object, so for x words there would be (x+2)
+lines including the brackets. 1kB well spent. The JSON file is now a neat
+100kB, which is a 26% optimization compared to the initial 135kB.
+
+However, as I was coding JavaScript I realized that, since we're using the
+dataset as a JavaScript object, we don't have to play by JSON's rules.
+This means no more double quotes around keys! Each word object has
+6 double quotes, 6 times 1000 is… 6kB! That's right, we just shrank the
+dataset to 94kB. Now that's a 30% optimization. All by frugal management
+of whitespace.
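
Here is a sketch of the one-object-per-line layout and the percentage arithmetic. The function names are made up, and the output is still valid JSON (quoted keys); dropping the key quotes for `words.js` is a separate step that is not shown.

```python
import json


def dump_one_entry_per_line(entries):
    """Serialize a list compactly, one entry per line plus the two brackets."""
    if not entries:
        return "[\n]"
    body = ",\n".join(
        json.dumps(entry, separators=(",", ":"), ensure_ascii=False)
        for entry in entries
    )
    return "[\n" + body + "\n]"


def saving_percent(before_kb, after_kb):
    """Size reduction relative to the original size, in percent."""
    return 100 * (before_kb - after_kb) / before_kb


print(round(saving_percent(135, 100)))  # prettyprint removed, linebreaks kept
print(round(saving_percent(135, 94)))   # key quotes removed as well
```

For x entries the result has x + 2 lines, and `json.loads` still accepts it, so the same file stays friendly to both vim and scripts.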
+
+Later I found it would be better if I tagged each word with its _album_, but
+that would be super redundant. So instead, I placed lists of tracks in each
+album inside another JS file that is loaded alongside the words.
+
+JSON generator script: [`data/mkjson.py`](https://git.sr.ht/~fkfd/one_top_song/tree/main/item/data/mkjson.py)
+
+Datasets: [`data/words.json`](https://git.sr.ht/~fkfd/one_top_song/tree/main/item/data/words.json),
+[`words.js`](https://git.sr.ht/~fkfd/one_top_song/tree/main/item/words.js),
+and [`albums.js`](https://git.sr.ht/~fkfd/one_top_song/tree/main/item/albums.js)
+
+## Step 6: Design the game UI
+
+I thought I despised the "mobile first" approach, but it turns out what I hated
+was the "mobile only" garbage. [Mobile Wikipedia] actually works remarkably
+well on desktop. What I'm doing is so much simpler than Wikipedia. The page
+contains the following fundamental elements:
+
+- the word
+- textbox for user input
+- candidate list
+- controls
+
+I swear, the desktop version works just as smoothly as on mobile (although
+I failed to center a few elements).
+
+![Desktop UI](img/one_top_song/ui_desktop.png)
+
+![Mobile UI](img/one_top_song/ui_mobile.png)
+
+▲ Notice that "twenty øne piløts" are joined with non-breaking spaces
+
+And my absolute favorite thing here is the candidate list. I wouldn't expect
+anyone to type "House Of Gold" in its entirety, would I? Of course there should
+be some sort of search suggestion. The candidate list I implemented tries to
+match user input against the beginning of each song title, as well as acronyms.
+For example "hot" gives you _Holding On To You_. A hack was written for
+_Heavydirtysoul_ so that "hds" would match it.
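
The matching rule can be sketched in a few lines. The game itself is written in JavaScript; this is a Python illustration with made-up names (`ACRONYM_HACKS`, `acronym`, `candidates`), and it assumes the input only needs to be a prefix of the title or of its acronym.

```python
# Hand-written acronyms for titles that are a single word.
ACRONYM_HACKS = {"Heavydirtysoul": "hds"}


def acronym(title):
    """First letter of every word, lowercased, unless a hack overrides it."""
    if title in ACRONYM_HACKS:
        return ACRONYM_HACKS[title]
    return "".join(word[0] for word in title.split()).lower()


def candidates(query, titles):
    """Titles whose beginning or acronym starts with what the user typed."""
    q = query.strip().lower()
    if not q:
        return []
    return [
        title
        for title in titles
        if title.lower().startswith(q) or acronym(title).startswith(q)
    ]


titles = ["Holding On To You", "House Of Gold", "Heavydirtysoul", "Hometown"]
print(candidates("hot", titles))
print(candidates("hds", titles))
```

Typing "hot" matches no title prefix, but it is a prefix of the acronym "hoty", so _Holding On To You_ comes up; "hds" only works thanks to the hack table.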
+
+Oh, I almost forgot: the three buttons are twenty øne piløts-themed.
+
+![The classic |-/ logo: blue vertical bar, black dash, and red slash](img/one_top_song/top_logo.png)
+
+▲ Former tøp logo from the Regional at Best era
+
+## Step 7: Game logic
+
+From this point on there are no more repetitive chores, and I can finally
+focus on making a game. The concept is simple: the player tries to guess
+the song that a word came from.
+
+Let me enumerate the steps in which the player would interact with my game:
+
+1. Game shows random word taken from dataset
+2. Player types track title into textbar, confirms
+3. Game indicates correct answer, shows album and line
+4. Player clicks Next, go to 1
+
+The player might not be always right. In that case the flow would be:
+
+1. Game shows random word taken from dataset
+2. Player types track title into textbar, confirms
+3. Game indicates wrong answer
+4. Player tries again, go to 2; or clicks Next, go to 1
+
+We need some hint mechanism so a clueless player has a chance of recalling
+something.
+
+1. Game shows random word taken from dataset
+2. Player does nothing, or makes incorrect guesses
+3. Player clicks Hint
+4. Game reveals some information about correct answer unless hints are
+ depleted. Go to 2
+
+I wanted this game to be as pressure-free as possible. Therefore, players
+can skip words or show the answer at any time, and there are no
+scorekeeping counters or timers. After every 50 guesses, the game reminds
+players to take a rest.
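
The three flows above boil down to a small piece of state per word. Below is a Python sketch of that state, not a port of `index.js`: the class name `Round`, its fields, and the hint strings are all hypothetical, since this post does not spell out what each hint reveals.

```python
from dataclasses import dataclass, field


@dataclass
class Round:
    """State for one word: the answer, the available hints, and progress."""

    word: str
    track: str
    hints: list = field(default_factory=list)
    hints_shown: int = 0
    solved: bool = False

    def guess(self, title):
        """Check a guess; wrong guesses change nothing, so retrying is free."""
        correct = title.strip().lower() == self.track.lower()
        self.solved = self.solved or correct
        return correct

    def next_hint(self):
        """Reveal the next hint, or None once the hints are depleted."""
        if self.hints_shown >= len(self.hints):
            return None
        hint = self.hints[self.hints_shown]
        self.hints_shown += 1
        return hint


# Placeholder hints; the real game decides what to reveal.
current = Round("blankets", "Redecorate", ["placeholder hint 1", "placeholder hint 2"])
print(current.guess("Truce"), current.next_hint(), current.guess("redecorate"))
```

There is deliberately no score or timer in this state, which matches the pressure-free goal: a round only knows whether it has been solved and how many hints were used.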
+
+Source code: [`index.js`](https://git.sr.ht/~fkfd/one_top_song/tree/main/item/index.js)
+
+## Step 8: Debugging
+
+The game was designed to run offline. The server, if any, is there just to
+send you the HTML, stylesheet, and JavaScript for datasets and the game
+itself. This means it is possible to do everything in a `file://` browser
+tab.
+
+Because the web game is designed "mobile first" (but in a good way),
+I tested the UI extensively with and without DevTools mobile emulator, and
+on my phone. This way I figured out what interactions worked best on both
+keyboard and touchscreen.
+
+As to the JavaScript, I did not exactly enjoy writing it, but it wasn't
+hellish suffering either. I no longer "hate" JavaScript; I just want to
+stay away from it from now on. I would describe my code as *pretty*
+type-safe… until it isn't.
+
+## Step 9: Visualizing and having fun with the dataset
+
+No, it's not about fancy charts or scatter plots. I just thought it would be
+helpful if we could display all the words in a table, so I made a webpage
+for that. Fun fact: I gave up indentation for all the `<tr>` tags.
+Otherwise there would be 28\*1002 = 28kB of wasted data.
+
+![Table of a few tracks, words that only appear in each one, and respective
+lines](img/one_top_song/words_all.png)
+
+Then I thought, "hey, what if I pulled up a list of most frequently used
+English words and compared that to those I found?" So I downloaded a list from
+Wiktionary titled [Frequency
+lists/TV/2006/1-1000](https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/TV/2006/1-1000)
+which is the top 1000 words used in "a collection of TV and movie
+scripts/transcripts" as of 2006. This time though, I made more use of Unix
+tools. It worked like this (the 1000-word list was saved in file `1000`):
+
+```
+$ cut -f2 tracks_words | sort > /tmp/top # extract word from "track<tab>word"
+$ sort 1000 > /tmp/freq
+$ comm -12 /tmp/top /tmp/freq # find common words between the two files
+ahead
+anybody
+anyway
+...
+```
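
For anyone without these tools at hand, the same comparison is a set intersection in Python. This is a sketch with a made-up function name and toy input; the track names in the sample are placeholders, not real pairings from the dataset.

```python
def common_words(track_word_lines, frequent_words):
    """Words that are both in the track<TAB>word file and in the frequency list."""
    own_words = set()
    for line in track_word_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            own_words.add(parts[1])  # same field that `cut -f2` extracts
    # Like `comm -12`, keep only entries present on both sides, sorted.
    return sorted(own_words & set(frequent_words))


# Toy input: placeholder track names, and a tiny stand-in for the 1000-word list.
sample_lines = ["Track A\tahead\n", "Track B\tanyway\n", "Track C\tcheetah\n"]
sample_frequent = ["ahead", "anyway", "the"]
print(common_words(sample_lines, sample_frequent))
```

Like `comm`, the comparison is case-sensitive, so a capitalized proper noun such as "Monday" only matches if the frequency list capitalizes it too.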
+
+And here we have the most frequent 88 words:
+
+![Table of a few words, and the track they are in](img/one_top_song/words_frequent.png)
+
+I ran some more stupid analysis on the dataset and found that the only
+song that had absolutely no unique word is _Truce_ (a bad day to the _Truce_
+fans out there, eh?), and songs closest to zero are _Before You Start Your
+Day_ and _Trees_, contributing 2 each. The figures go all the way up to 51:
+_Neon Gravestones_, which is basically a rapped-out essay, has the most
+expansive vocabulary among all tøp songs. I wrote all my interesting findings
+in the trivia section for players to discover.
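
Counting unique words per song takes only a `Counter`. The sketch below uses a made-up function name and toy pairs with placeholder titles; the real numbers come from the full track-word list.

```python
from collections import Counter


def unique_word_counts(pairs, all_tracks):
    """Map every track to its number of unique words, including zeros."""
    counts = Counter(track for track, _word in pairs)
    # Counter returns 0 for missing keys, so tracks without any
    # unique word still appear in the result.
    return {track: counts[track] for track in all_tracks}


# Toy data with placeholder titles, not the real dataset.
toy_pairs = [("Song A", "alpha"), ("Song A", "beta"), ("Song B", "gamma")]
toy_tracks = ["Song A", "Song B", "Song C"]
counts = unique_word_counts(toy_pairs, toy_tracks)
print(counts)
print(min(counts, key=counts.get), max(counts, key=counts.get))
```

Passing the full track list separately matters: a song with zero unique words never appears in the pairs, so it would be invisible if the counts were built from the pairs alone.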
+
+The scripts I used to generate HTML:
+[`data/mkhtml_all.py`](https://git.sr.ht/~fkfd/one_top_song/tree/main/item/data/mkhtml_all.py),
+and [`data/mkhtml_freq.py`](https://git.sr.ht/~fkfd/one_top_song/tree/main/item/data/mkhtml_freq.py)
+
+The HTML: [`words.html`](https://git.sr.ht/~fkfd/one_top_song/tree/main/item/words.html)
+
+## Step 10: Deployment
+
+The only thing it takes to deploy a static website is `scp`. `rsync` if
+you have lots of data. Let's calculate how much data we have to transfer.
+
+File | Size (kB)
+------------|------------
+index.html | 9.6
+words.html | 88
+index.css | 1.7
+index.js | 6.5
+words.js | 94
+albums.js | 2.3
+img/\*.jpg | 115.4
+__Total__ | __317.5__
+
+Incidentally, this is how much my game will consume from a player's data
+plan. *I* think it's small enough for anyone.
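
The total is easy to double-check by summing the table. The sizes are copied from the rows above; the variable names are just for this snippet.

```python
# File sizes in kB, copied from the table above.
sizes_kb = {
    "index.html": 9.6,
    "words.html": 88,
    "index.css": 1.7,
    "index.js": 6.5,
    "words.js": 94,
    "albums.js": 2.3,
    "img/*.jpg": 115.4,
}

# Round to one decimal to hide floating-point noise in the sum.
total_kb = round(sum(sizes_kb.values()), 1)
print(total_kb)
```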
+
+## Results
+
+On April 19, 2022,
+I [published](https://www.reddit.com/r/twentyonepilots/comments/u68pzy/this_word_only_appears_in_one_twenty_%C3%B8ne_pil%C3%B8ts/)
+a version I thought was stable enough to r/twentyonepilots. It got
+reasonably popular. You can play it here:
+[One tøp song](https://fkfd.me/toys/one_top_song/)
+
+Here's a demo video (2.0 MiB):
+
+<video controls> <source src="../img/one_top_song/demo.mp4" /> </video>
+
+The source code (MIT) is [here](https://git.sr.ht/~fkfd/one_top_song). If
+you want, you can download lyrics to your favorite artists' songs and
+generate your own dataset to play with. A redditor considered Taylor
+Swift, and I'm looking forward to their progress.
+
+In conclusion, I think I did a pretty good job at extracting,
+representing, and toying with data, but the process left a lot to improve.
+NLP connoisseurs are gonna be mad at me for not using this and that
+library, and some Unix guru might be capable of rewriting my Python
+scripts with sed, awk, and jq. I do not care. The final product is one of
+my better interactive web designs, made with no framework and minimal
+assets. The game is not designed to be addictive, unlike
+$insertGameNameHere. It is, after all, just for fun; in the disclaimer
+I wrote that the game is "not a tool for gatekeeping." That's how things
+are supposed to work.