From 3785b2430a675c52bcb5f0cf6aac9d1ef7cca3c6 Mon Sep 17 00:00:00 2001
From: Frederick Yin <fkfd@fkfd.me>
Date: Mon, 11 Jul 2022 15:23:25 +0800
Subject: Write instructions for self about patching dataset

---
 README.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
diff --git a/README.md b/README.md
index d6ae9e6..84df4dc 100644
--- a/README.md
+++ b/README.md
@@ -109,3 +109,24 @@ I don't have any lawyer friends but what I know is no one can own
 non-trademarked words in the English language. On this ground, all words
 in the datasets are in the public domain, but the lyrics in the form of
 full lines are owned by TØP and/or FBR.
+
+## Patching the dataset
+
+The dataset I'm using right now is 70% machine-generated and 30% manual
+labor. Regenerating it and then redoing all the manual work is
+impractical. Too many times I have fixed the dataset by hand after a
+mistake I made, but forgotten to update the metadata to match.
+Therefore, I decided to put this checklist here for future me.
+
+### Procedure for deleting a word
+
+- Remove word from `words.js`
+- Remove word from `words.html`
+- Decrement rowspan of `<td>` element for track title
+- Search for word in most frequent list, remove row if present
+- Decrement word count in `<h2>` element(s) of `words.html`
+- Decrement word count in "How many words are there?" section of `index.html`
+- Decrement word count in `README.md`
+- `grep` for word in `data/`, remove all occurrences in `tracks_words`,
+  `words`, `words.json`, and `most_frequent`
+- `scp *.html words.js www@fkfd.me:www/toys/one_top_song/`
-- 
cgit v1.2.3