Diffstat (limited to 'README.md')
-rw-r--r-- README.md 21
1 files changed, 21 insertions, 0 deletions
diff --git a/README.md b/README.md
index d6ae9e6..84df4dc 100644
--- a/README.md
+++ b/README.md
@@ -109,3 +109,24 @@ I don't have any lawyer friends but what I know is no one can own
non-trademarked words in the English language. On this ground, all words
in the datasets are in the public domain, but the lyrics in the form of
full lines are owned by TØP and/or FBR.
+
+## Patching the dataset
+
+The dataset I'm using right now is 70% machine-generated and 30% manual
+labor. Regenerating it and then redoing all the manual work is way beyond
+practical. Many times I have had to fix the dataset by hand because of a
+mistake I made, and then forgot to update the metadata to match.
+Therefore, I decided to put this checklist here for future me.
+
+### Procedure for deleting a word
+
+- Remove word from `words.js`
+- Remove word from `words.html`
+- Decrement rowspan of `<td>` element for track title
+- Search for word in most frequent list, remove row if present
+- Decrement word count in `<h2>` element(s) of `words.html`
+- Decrement word count in "How many words are there?" section of `index.html`
+- Decrement word count in `README.md`
+- `grep` for word in `data/`, remove all occurrences in `tracks_words`,
+ `words`, `words.json`, and `most_frequent`
+- `scp *.html words.js www@fkfd.me:www/toys/one_top_song/`
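
The `grep` step in the checklist above could be sketched as a small script that reports every remaining occurrence of a deleted word, so nothing is missed before uploading. This is only a sketch and not part of the commit: the function name `find_leftovers` is hypothetical, and it assumes the files under `data/` are plain UTF-8 text, which the README does not state.

```python
import re
import sys
from pathlib import Path


def find_leftovers(word, root="data"):
    """Return (path, line_number, line) for every line under `root`
    that still contains `word` as a whole word (case-insensitive)."""
    # Lookarounds instead of \b so words with apostrophes or hyphens
    # at their edges are still matched as whole words.
    pattern = re.compile(
        r"(?<!\w)" + re.escape(word) + r"(?!\w)", re.IGNORECASE
    )
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Assumption: files are text; undecodable bytes are replaced.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_leftovers.py WORD [ROOT]
    for path, number, line in find_leftovers(*sys.argv[1:3]):
        print(f"{path}:{number}: {line}")
```

An empty result suggests the word is gone from `data/`. The rowspan and word-count decrements in the HTML files would still need to be checked by hand.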