Statistics for reading Japanese

Photo by Mohammadreza Charkhgard / Unsplash

Hi friends!

I just released my first ever web-dev project!

There's a tool called Game Sentence Miner which uses OCR to grab text from your screen, and it lets you look that text up in a dictionary.

GitHub - bpwhelan/GameSentenceMiner: An All-in-One immersion toolkit for learning Languages through games and other visual media.

It writes this to a database file:

  • The line of text
  • Where you read it
  • When you read it

I thought....

"What statistics can I learn from this data?"

So I set out to make my first ever web-dev project: a statistics page!

PLEASE remember I am not a web dev! I do SecDevOps! Pls dont judge me too hard for my terrible design 😂 💛s

Notably this has a few cool features:

  • Daily overview of your reading
  • An overview of the current game you're playing
  • An overview of all games you've played

Because I have both the line of text and when it came in, I can guess a lot of things about your reading.

If you read a line of text and the next line doesn't come in for more than 2 minutes, I assume you were AFK. If you press "next line of text" within 2 minutes, I assume you were actively reading, and that gap counts as reading time.

This lets me calculate a bunch of stuff.
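Here's a rough sketch of how that AFK rule could turn timestamps into reading time and a characters-per-hour figure. This is not the actual code from the project; the function names and the (timestamp, text) shape are just assumptions for illustration.

```python
from datetime import datetime, timedelta

# From the post: a gap longer than 2 minutes is treated as AFK.
AFK_THRESHOLD = timedelta(minutes=2)


def active_reading_time(timestamps, threshold=AFK_THRESHOLD):
    """Sum the gaps between consecutive lines, ignoring gaps above the threshold."""
    ordered = sorted(timestamps)
    total = timedelta(0)
    for prev, curr in zip(ordered, ordered[1:]):
        gap = curr - prev
        if gap <= threshold:  # short gap: assume active reading
            total += gap
    return total


def chars_per_hour(lines, threshold=AFK_THRESHOLD):
    """lines is a list of (timestamp, text) pairs (a hypothetical shape).

    Returns characters read per active hour, or 0.0 if no active time was measured.
    """
    active = active_reading_time([ts for ts, _ in lines], threshold)
    seconds = active.total_seconds()
    if seconds == 0:
        return 0.0
    chars = sum(len(text) for _, text in lines)
    return chars / (seconds / 3600)
```

The speed number is only an estimate: the time spent on the very last line is never measured, because there's no "next line" to close the gap.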

I made a GitHub style heatmap of your reading along with streaks and average time spent per day reading.
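Streaks are simple to work out once you know which calendar days had any reading on them. A minimal sketch, assuming you already have a list of dates (the function names here are made up, not the project's):

```python
from datetime import date, timedelta


def longest_streak(days):
    """Length of the longest run of consecutive calendar days."""
    best = run = 0
    prev = None
    for d in sorted(set(days)):
        if prev is not None and d - prev == timedelta(days=1):
            run += 1  # continues yesterday's run
        else:
            run = 1   # a gap (or the first day) starts a new run
        best = max(best, run)
        prev = d
    return best


def current_streak(days, today):
    """Consecutive days of reading ending on `today` (0 if nothing was read today)."""
    seen = set(days)
    streak = 0
    d = today
    while d in seen:
        streak += 1
        d -= timedelta(days=1)
    return streak
```

Whether a streak should survive until the end of today even if you haven't read yet is a design choice; this sketch takes the strict option.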

I have a bunch more stats like:

  • How fast do you read, over time?
  • How many hours per day do you spend reading?
  • How many characters of text do you read over time?

As well as a Kanji heatmap:

Every time you read a sentence that contains kanji, each kanji in it gets +1. The more times you read a kanji, the closer to cyan it gets.

If you read a kanji 500 times, I assume you know it really well, so it becomes fully cyan.
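A small sketch of how the counting and coloring could work. The 500 cap is from above; the starting color, the "+1 per sentence rather than per occurrence" reading, and all the names are my assumptions, not necessarily what the page really does.

```python
from collections import Counter

MAX_READS = 500             # from the post: 500 reads means fully cyan
START_RGB = (60, 60, 60)    # hypothetical color for a kanji with 0 reads
CYAN_RGB = (0, 255, 255)


def is_kanji(ch):
    # Only the main CJK Unified Ideographs block; rarer extension blocks are ignored.
    return "\u4e00" <= ch <= "\u9fff"


def count_kanji(sentences):
    """+1 for each distinct kanji in each sentence (assumed interpretation)."""
    counts = Counter()
    for sentence in sentences:
        counts.update({ch for ch in sentence if is_kanji(ch)})
    return counts


def heat_color(reads):
    """Linearly blend from START_RGB to cyan, capping at MAX_READS."""
    t = min(reads, MAX_READS) / MAX_READS
    return tuple(round(a + (b - a) * t) for a, b in zip(START_RGB, CYAN_RGB))
```

A linear blend is the simplest option; a log scale would make the first few dozen reads more visible, if that matters to you.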

I also added a screenshot feature using html2canvas so you can take nice screenshots like the one I took above :)

Anki Integration

I integrated with Anki too.

So if you see a kanji a lot while reading but it isn't in any of your Anki cards, you can see that here!
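The comparison itself is basically a set difference. Here's a sketch that assumes you already have the kanji read counts and the text of your Anki cards as plain strings. How the cards get fetched is left out, and `min_reads` is a made-up threshold for what "a lot" means.

```python
def kanji_missing_from_anki(read_counts, anki_card_texts, min_reads=10):
    """Return (kanji, reads) pairs seen at least `min_reads` times but absent from Anki.

    read_counts: dict mapping kanji -> number of times read
    anki_card_texts: iterable of strings taken from Anki cards (assumed input)
    """
    in_anki = {ch for text in anki_card_texts for ch in text}
    missing = [
        (kanji, reads)
        for kanji, reads in read_counts.items()
        if reads >= min_reads and kanji not in in_anki
    ]
    # Most-read first, so the biggest gaps are at the top.
    return sorted(missing, key=lambda pair: (-pair[1], pair[0]))
```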

And if you wonder "hmmm. When have I ever read 松?"

You can click on the kanji and go to a live search of every sentence you have ever read:

So you can see exactly where you have read this kanji before.

Data Cleanup

You can also deduplicate text: if the same sentence shows up again within 5 minutes in the same game, I assume it's a duplicate and you can delete it.
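A sketch of that duplicate check, assuming each row is a (game, timestamp, text) tuple. That shape and the function name are hypothetical, not the real database schema.

```python
from datetime import datetime, timedelta

# From the post: same sentence, same game, within 5 minutes.
DEDUP_WINDOW = timedelta(minutes=5)


def find_duplicates(lines, window=DEDUP_WINDOW):
    """Return the rows that repeat an earlier row of the same game within `window`."""
    last_seen = {}  # (game, text) -> timestamp of the most recent occurrence
    duplicates = []
    for game, ts, text in sorted(lines, key=lambda row: row[1]):
        key = (game, text)
        prev = last_seen.get(key)
        if prev is not None and ts - prev <= window:
            duplicates.append((game, ts, text))
        last_seen[key] = ts
    return duplicates
```

Note that this compares each line to the most recent copy, so a sentence repeated every 4 minutes keeps getting flagged. Comparing to the first copy instead would be an equally reasonable choice.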

You can also clean up text using Regex:

Or delete entire games you don't care about.
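For the regex cleanup, here's the kind of thing I mean. The pattern below (stripping a leading speaker tag in 【】 brackets) is just a made-up example, not a rule the tool ships with.

```python
import re

# Hypothetical pattern: a speaker tag like 【太郎】 at the start of a line.
SPEAKER_TAG = re.compile(r"^【[^】]*】\s*")


def clean_lines(lines, pattern=SPEAKER_TAG, replacement=""):
    """Apply the regex to every line and drop lines that end up empty."""
    cleaned = (pattern.sub(replacement, line).strip() for line in lines)
    return [line for line in cleaned if line]
```

Dropping lines that become empty is my own choice here; you might prefer to keep them so the line count stays the same.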

Web Dev Stuff

This is made using Flask, Jinja2 for templates, HTML, CSS and JS.

I do use Chart.js:

Chart.js
Simple yet flexible JavaScript charting library for the modern web

but tbh it's so simple it's hardly an advanced framework like Svelte or whatever.

The most advanced thing I use is Flexbox, but even that's not perfect.

If you look here:

The Kanji in these boxes are aligned horizontally but not vertically.

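I thought this was because Flexbox doesn't support vertical alignment, but it actually does (the align-items property handles the cross axis, which is vertical in a row layout). I just haven't got it working here.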

I Googled it and tbh every answer was a wall of archaic wizard poetry, and the answers that were not were something like "bro flexbox is so out of date bro use css grid if your capacitor is within the sync of the flux device, otherwise you need to dynamically load the static content via the HTML 5.0 spec bro"

like, I don't care that much. Someone else can fix this, it's open source