Learning 10,000 characters with Skritter

In Chinese by Olle Linge

I recently saw Skritter user Emil Persson mention that he had learnt 10,000 characters. This is certainly out of the ordinary, so I contacted him and asked him a few questions about his journey to 10,000 characters. Below, you can find his answers and the story of how he learnt 10,000 characters through Skritter.

First and foremost, please introduce yourself! Who are you?
Emil Persson. I live just outside of Stockholm with my wife and two sons. My wife is Chinese and my older son (4 years old) is bilingual. I expect my younger son to become bilingual too, but so far he’s not really speaking much at all, as he’s only 1.5 years old.

I’m a game developer. That’s my short work description. A slightly longer version is that I’m a graphics programmer, and for those who really care, I’m Head of Research at Avalanche Studios. I research rendering techniques that are relevant for games and frequently speak about them at industry conferences. I may be better known by my nickname Humus, which is also my Skritter name.

I am NOT a language nerd, as one might rightfully suspect, nor do I think I’m talented beyond the average. However, I can be very determined and focused, which I think helps an awful lot with learning Chinese, as it’s quite a steep uphill battle in the beginning.

When and why did you start learning Chinese? Was your interest in characters there from the beginning or was it something that came later?
I began studying Chinese in the fall of 2012. This wasn’t my own idea; my wife had to push me to get started. After all, she’s Chinese and had learned Swedish, so it was only fair that I try to learn her native language as well. So eventually I somewhat reluctantly agreed to take a beginner’s course at Stockholm University. It was an evening course at 50% speed, and I stuck with it to the end, so after that year I had basically the equivalent of half a year of full-time studies. And that’s actually still the only formal education I have in Chinese.

When did you start using Skritter and what role has it played in your learning?
Skritter was really the thing that made Chinese interesting for me. As I set out on that beginners’ course, I never really expected it to amount to much beyond learning to say hello and a handful of basic phrases. At best I hoped to be able to navigate the most basic social situations without embarrassing myself too much. After all, the goals for the course were also set fairly low.

After a year we were supposed to know 300 characters and 600 words. I knew that in order to read a newspaper, I would need to know over 3,000 characters, so at that speed it would take me a decade to reach that level. So my expectations were set accordingly, i.e. I didn’t expect to ever reach the point where I could meaningfully communicate in Chinese beyond the absolute basics.

So I was actually a fairly mediocre student to begin with. I did find it a bit interesting, but wasn’t spending too much time on it beyond the lectures, and was quite frankly lagging behind on the homework. I stuck with pinyin for quite a bit longer than I should have, even as we started to get deeper into characters during lectures.

Eventually, as I tried to catch up on homework, I encountered some practical problems. Our textbooks weren’t of the best quality, and the glossary was missing some characters that occurred in the text. That led me to google for a way to look up characters by hand-drawing them. After some searching I found nciku, and from there I soon got directed to practice strokes on Skritter, and shortly after I had the app on my phone.

This was a game-changer for me. At that point I was three months into the course, and I only really knew perhaps 20 characters. Then within a week I knew over 100, and after less than three weeks I knew 300, the number I was supposed to know at the end of the course.

When I realized I was learning characters and words an order of magnitude quicker than I had before, learning Chinese suddenly became something that was realistic and feasible, and therefore also much more interesting and fun. By the time I took the final test, I knew 2,700 characters, and I was making my first attempts at reading an actual book in Chinese.

Reading at the speed and accuracy of a 6-year-old of course, constantly referring to the dictionary, and in absolute need of the parallel English on the opposite page to not lose context and to make sure I really understood the text somewhat correctly, which of course in many cases I didn’t. But still, I was reading Chinese, which I never thought I would be able to.

When did you get the idea to aim for 10,000 characters?
When I passed 9,000 characters. 🙂

Seriously though, it’s been a moving target. The first goal I set was to get through the course glossary, so I made a custom list for that. That was quickly done, so I set a new goal of 1,000 characters. Once at that point, it was clear that it was realistic to shoot for the 3,000 characters where I was supposed to be able to read a newspaper.

Of course, I also had intermediate goals at every 500 characters, so there was always a short-term goal and a longer-term one. Once at 3,000 characters, which I reached within a year, I set a new goal at 4,000, because that’s the upper number typically quoted for the “able to read a newspaper” range.

Just to be clear, most of the time I wasn’t primarily studying characters per se, I was mostly studying words, but I still paid the most attention to my character stats. That’s because I had some sort of long-term target number there to shoot for, whereas it wasn’t clear just how many words I would need to know before I could read Chinese. So while I was primarily studying words, the number of characters still kept increasing in a fairly linear fashion for a long while.

Now, I was only studying simplified at that point. I really had no intention of doing anything else, to be honest. My wife is from mainland China, and I wasn’t expecting to go to Taiwan any time soon. But during a trip to China I came to realize that learning traditional characters was meaningful too. Even if simplified is what you’ll see in most bodies of text, the traditional forms are definitely still around on the mainland too. They are very popular in business names and on signs, in art and decoration, and especially in calligraphy.

So I changed my Skritter configuration from “simplified” to “both” and began crunching through the traditional forms. If you already know the simplified, it’s easy to learn the traditional variants too, so this period is when I learned characters the quickest. At that point I was at around 5,000, but I rushed to 7,400 within two months.

After that I didn’t really learn many new characters for a long while. At that point I felt like I had reached what was the meaningful set of characters to learn, already knowing both simplified and traditional, and having already studied a bunch of rare and obscure characters.

At some point I made another push and reached 9,000. Why? I can’t remember actually. Guess I just had too much commute time to fill. 🙂 Quite frankly, at times I was running out of material to study, and occasionally I have studied pure character lists. The words I accumulate from whatever I happen to look up in Pleco only go so far. And random Skritter lists from other people eventually get to the point where there are actually not that many new words in any given list. I found Jun Da’s enormous character frequency lists and ended up studying the list of classical Chinese characters. I guess that was the natural progression after learning traditional. And I guess part of the motivation also was to fill any gaps I might have.

Now the big question: Why learn 10,000 characters? 🙂
Once I was beyond 9,000, well, it was close enough that it would be unreasonable to not shoot for 10,000. 🙂 Of course, at that point, I wasn’t learning new characters because I expected to really have great use for them; it was increasingly just becoming a meta-game, just grabbing the next hi-score.

As it turned out, it wasn’t as easy to reach 10,000 as I had originally thought. At this range of the character frequency, the number of characters that are actually in Skritter’s database drops off drastically. I plateaued at 9,400, despite studying lists that themselves contained well over 10,000 unique characters.

I found a character list with all characters in the BIG5 character encoding standard. It had over 13,000 characters in it, and I was happy to once again gain characters at great speed. Confident I would reach 10,000 soon, I was extremely frustrated to finally exhaust the list and only reach 9,940.

So I did the most completionist thing I have ever done. I found a complete directory of all the characters that exist in Unicode, i.e. essentially all characters you can represent on a computer without resorting to image files. That turned out to be a bit over 20,000 characters, and I made one huge Skritter list of that. Should anyone else be crazy enough to try to repeat this, it is available here. That allowed me to finally pass 10,000 characters, but actually not by a whole lot. Having finished that list, I’m now at just 10,150 on my writing stats.

I (that’s Olle) have learned roughly 6,000 characters in Skritter (traditional only), but felt that few beyond 5,000 were actually useful. Do you agree? What’s your take on this?
I agree, generally speaking. There’s of course a point where the time spent learning new characters becomes greater than the total amount of time you’ll ever spend looking up characters at that frequency range, so one can easily argue that should you ever encounter a super rare character, you might as well just look it up that one time.

It’s obviously the case that I passed the break-even point a long time ago. I would say that if you only study one or the other, the break-even point is probably around 5,000 characters. If you study both and count both as separate, as Skritter does, then add another 2,000 to that for the variants of the other set.

When I was at 5,000 and still only studying simplified, I felt most characters were still useful. After learning all traditional and reaching 7,400, well, that’s probably the point where I would say the rest was more for bragging rights than for practical utility.

But with that said, I have also encountered a number of these rare characters in actual text, and it’s very satisfying to realize that the character I was just able to read is super rare. In the summer of 2014 I visited Chinatown in Vancouver and brought home a collection of short novels by a selection of Chinese-Canadian authors. It was written with traditional characters and was by far the hardest reading I have done so far, but it also gave me plenty of opportunities to put rare characters to use. It was as if on every other page there was another instance of a really rare character. Maybe it’s a Chinatown thing, where they may still use old language that has fallen out of fashion in China? But it did at least confirm to me that I wasn’t entirely wasting my time, and that characters in this range do after all occur in real text.

Of course, some characters at that point are just weird, like for instance 聝 (guó, to cut the left ears of the slain). Others are rare not because they are weird, but only because they are kind of specific, like 铹 (láo, lawrencium), which you may still encounter while reading something chemistry-related, but which sits in the 8,500 range in Jun Da’s frequency list of modern Chinese and is actually slightly rarer than 聝.

I recently encountered another chemistry-related character, 硫 (liú, sulfur), on board a Chinese military ship that was visiting Sweden and open for the public to see, and was able to outshine my native Chinese friends who didn’t recognize it, to my great satisfaction. 🙂 My wife, however, knew the character, and I’m still waiting for a real-world situation where I can beat her. 😉 Of course, this one isn’t nearly as rare, sitting in the 2,700 range.

And then of course you have all these characters for names of places. I visited 峨嵋山 (Éméi shān) with my wife in early 2014, and there you have 嵋, which I believe only ever occurs in the name of that mountain, sitting in the 3,800 range. It’s a famous mountain though, which probably explains why it’s not even further down the list. Btw, you should definitely go there if you ever get the chance, it’s really beautiful. 🙂

What was the biggest challenge with learning so many characters?
As I mentioned, the biggest challenge in the end was simply finding enough characters that actually exist in Skritter’s database and have stroke data, so that I could study their writing. This is not a problem the average Skritterer will run into. I might just be the only one to ever have that problem.

As for actually learning characters, that’s not so hard, at least if you use Skritter. And the more characters you know, the easier it is to learn more. Most of the rare ones are actually very standard fusions of a semantic and a pronunciation part, with very few surprises. In fact, I think you’ll find the most odd exceptions and weirdness among the really common characters. Take the most common character of all, 的: its components help with neither meaning nor pronunciation in modern Chinese.

Have you learnt any valuable lessons about learning characters that other students would benefit from, even if they didn’t aim for 10,000 characters?
Well, if you’re starting fresh, my recommendation is that once you’ve warmed up a bit and are perhaps past the first 100 characters or so, you may want to study a radicals list before continuing with more directly useful characters.

This will help speed up learning characters in the long run, and you learn to subdivide characters into their components instead of thinking about individual strokes. It really pays off in the long run.

Other than that, while it’s probably a good idea to stick to either simplified or traditional to begin with, I would also recommend that you eventually learn the other character set too. It’s actually a surprisingly quick and effortless thing to do. It takes a few weeks for the entire set.

Where do you go from here? Do you have any other crazy challenges in store?
Well, this may make you sad, but I actually just recently unsubscribed from Skritter. But then again, it’s perhaps also a great message to all the newbies out there that are just getting started, that there is an endpoint to this. Eventually you’ll reach the point where you accomplish what you set out to do.

It may look like a lot, but Skritter is an awesome tool for learning that much. It has served me really well. Three years ago I knew essentially no Chinese at all. Now I can read Chinese. I know more characters than I’ll ever need, and more than most native Chinese people. Vocabulary is certainly not a weak spot either.

My weak spot is the spoken language. Listening comprehension is OK. I can sit with Chinese people in a conversational setting and understand most of it. But I have trouble keeping up while watching TV, especially the news, and movies can be tricky too, depending on the dialog, dialects of the actors and what not. So that’s my next step where I want to improve. I will of course keep reviewing my due items in Skritter, to make sure I retain what I’ve learned, and eventually I’ll probably return to boost my vocabulary again, but for now it’ll be listening comprehension.

Finally, my spoken Chinese truly sucks. But that’s the aspect of the language that will always trail for me, because I’m a quiet guy and I feel uncomfortable speaking a language I don’t know very well, which in itself becomes an impediment for learning.

My hope is that improved listening comprehension will lower that threshold for me to the point where I’ll feel a bit more comfortable actually practicing speaking Chinese. So my shorter-term goal (the next 6 months to a year) is to be able to follow TV and fast movie dialogue, and the long-term goal (1-2 years) is to become moderately fluent in speaking.

A big thanks to Emil Persson for sharing the story of his journey to 10,000 characters in Skritter! This is not really something we recommend you try at home, but it’s still a great accomplishment. Let it inspire you, regardless of what character-learning goals you have!

