Commit 65ecd26
2023-07-03 15:47:14
Changed files (1)
README.md
@@ -0,0 +1,40 @@
+## Calvin and Hobbes JSON
+
+## Sources
+
+Comic text sourced from [Calvin and Hobbes quotations](https://www.s-anand.net/calvinandhobbes.html) by [S Anand](https://www.s-anand.net/)
+
+See also:
+ - [Calvin & Hobbes Search Engine](https://michaelyingling.com/random/calvin_and_hobbes/)
+ - [Calvin And Hobbes Complete Digital Collection v1.2](https://archive.org/details/calvin-and-hobbes-complete-digital-collection/%281%29%20Calvin%20%26%20Hobbes%20-%20Bill%20Watterson/)
+
+#### Related projects and links
+ - [GoComics API wrapper - irahorecka/comics](https://github.com/irahorecka/comics)
+ - [Kumiko, the Comics Cutter - njean42/kumiko](https://github.com/njean42/kumiko)
+ - [Comic Book Panel Segmentation](https://maxhalford.github.io/blog/comic-book-panel-segmentation/)
+
+## Recreation
+
+### Collect quotes
+
+```bash
+curl 'https://www.s-anand.net/comic.calvin.jsz' --compressed --output quotes.tsv
+```
+
+### Collect images
+```bash
+cut -f1 quotes.tsv | grep -v id | xargs -I {} date --date={} +'%Y-%m-%d' | xargs -P10 -I {} ./dl.py {} | tee aria2c.list
+```
+
+### Panel detection
+
+```bash
+kumiko -i comic.jpg --min-panel-size-ratio=.15
+```
+
+
+## Ideas
+ - Image boundary detection to split the comics into separate panels
+ - Character identification to enumerate characters per panel
+ - OCR to split text into per-panel text
+ - Transform data into fine-tuning-style JSON
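
The last idea in the list, turning the collected quotes into fine-tuning data, might look roughly like the sketch below. It is not part of this commit. It assumes `quotes.tsv` has a header row with an `id` column (suggested by the `cut -f1 ... | grep -v id` step) and a transcript column, here hypothetically named `text`. The `prompt`/`completion` record shape is also an assumption, and the sample row is made up for illustration.

```python
import csv
import io
import json


def tsv_to_jsonl(tsv_text, id_field="id", text_field="text"):
    """Convert TSV rows into JSON Lines, one prompt/completion record per row.

    `id_field` and `text_field` are assumed column names; adjust them to
    match the real header of quotes.tsv.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in transcripts literally
    )
    lines = []
    for row in reader:
        text = (row.get(text_field) or "").strip()
        if not text:
            continue  # skip strips that have no transcript
        record = {
            "prompt": f"Calvin and Hobbes strip {row[id_field]}:",
            "completion": text,
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample row, standing in for the real contents of quotes.tsv.
    sample = "id\ttext\n1985-11-18\tplaceholder transcript\n"
    print(tsv_to_jsonl(sample))
```

For the real file, the same function could be fed `open("quotes.tsv", encoding="utf-8").read()`. Once panel detection and OCR are in place, the per-strip record could be split into per-panel records.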