Orthography of Japanese compound verbs

Should one write ありえない or あり得ない? 勉強し始める or 勉強しはじめる? In other words, should the second part of a Japanese V+V compound verb be spelled in kanji or kana? Let the actual usage [1] answer!

[Figure: share of occurrences written in kana for the 100 most common V2 verbs, which cover 82 % of the 36,184 compound verb occurrences in the corpus]
The number of occurrences of each verb as a V2 in a compound verb is given in parentheses. Black circles (●) mark V2 verbs with predominantly kana spelling, empty circles (○) mark V2 verbs with predominantly kanji spelling. The overall result is in the last row.

As you can see, kana wins the battle by a narrow margin. In total, 58 % of the occurrences of the 100 most common V2 [2] verbs are written in kana. (It is not in the graph, but even if we took the less common verbs into account, the ratio averaged over occurrences would still be 56 %.)
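"Averaged over occurrences" means that frequent verbs weigh more than rare ones, which can give a different figure than averaging the per-verb percentages. The sketch below illustrates the difference; the counts in `toy_counts` are invented for the example and are not the corpus figures, and both function names are hypothetical.

```python
from typing import Dict, Tuple

# Invented toy counts, NOT the corpus data:
# V2 verb -> (occurrences in kana, occurrences in kanji)
toy_counts: Dict[str, Tuple[int, int]] = {
    "出す": (480, 520),
    "始める": (76, 24),
    "終わる": (3, 47),
}


def occurrence_weighted_share(counts: Dict[str, Tuple[int, int]]) -> float:
    """Kana share over all occurrences: frequent verbs dominate the result."""
    kana = sum(k for k, _ in counts.values())
    total = sum(k + j for k, j in counts.values())
    return kana / total


def per_verb_mean_share(counts: Dict[str, Tuple[int, int]]) -> float:
    """Plain mean of the per-verb kana shares: every verb counts equally."""
    shares = [k / (k + j) for k, j in counts.values()]
    return sum(shares) / len(shares)


weighted = occurrence_weighted_share(toy_counts)
unweighted = per_verb_mean_share(toy_counts)
print(f"weighted by occurrences: {weighted:.1%}")
print(f"mean over verbs:         {unweighted:.1%}")
```

With the toy numbers the two figures differ because the very frequent 出す pulls the occurrence-weighted share towards its own ratio.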

What is more interesting is which particular compounds and V2 verbs are written in kana and which in kanji:

The most common V2 verb 出す is divided between kana and kanji spelling very evenly (48 % vs. 52 %), but if we were to look at specific compounds, we would discover that the most common spelling differs based on V1:

  • most often written in kanji are: 歩き出す, 云い出す (いい出す/言い出す), 泣き出す, 笑い出す, とり出す, …
  • most often written in kana are: 思いだす, 引っぱりだす, やりだす, 持ちだす, …
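A per-V1 breakdown like the one above could be computed roughly as follows. This is only a sketch under simplifying assumptions: the input is a list of dictionary-form spellings with counts, the V1 part is taken to be whatever precedes the V2, and the counts are invented for illustration. Variant spellings of V1 (such as 云い, いい and 言い) would end up under separate keys unless they were normalised first. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def v2_script(surface: str, kanji_form: str, kana_form: str) -> Optional[str]:
    """Return 'kanji' or 'kana' depending on how the V2 suffix is spelled."""
    if surface.endswith(kanji_form):
        return "kanji"
    if surface.endswith(kana_form):
        return "kana"
    return None  # the surface form does not end in this V2


def predominant_by_v1(
    observations: Iterable[Tuple[str, int]], kanji_form: str, kana_form: str
) -> Dict[str, str]:
    """Map each V1 prefix to the more frequent spelling of the V2."""
    tallies = defaultdict(lambda: {"kana": 0, "kanji": 0})
    for surface, count in observations:
        script = v2_script(surface, kanji_form, kana_form)
        if script is None:
            continue
        suffix = kanji_form if script == "kanji" else kana_form
        v1 = surface[: -len(suffix)]
        tallies[v1][script] += count
    # On a tie, max() returns the first key, i.e. 'kana'.
    return {v1: max(t, key=t.get) for v1, t in tallies.items()}


# Invented toy counts, NOT the corpus data.
toy_observations = [
    ("歩き出す", 9), ("歩きだす", 1),
    ("思い出す", 2), ("思いだす", 8),
]
result = predominant_by_v1(toy_observations, "出す", "だす")
print(result)
```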

Is there any reason why 持ちだす is almost exclusively written in kana and とり出す almost exclusively in kanji? Perhaps to have at least one kanji per word? [3]

Why is はじめる most often written in kana (76 %), but 終わる most often in kanji (94 %!)? There seems to be no particular reason for this, except for custom.

Another peculiarity is that when V1 is 見る, which is reduced to 見 [み] in a compound, V2 tends to be written in kana. Why? Perhaps because spellings such as 見回す, 見比べる, 見上げる, 見守る, having no okurigana between the two components, could suggest an on’yomi reading at first glance?

Perhaps the choice of kanji vs. kana is also influenced by style. And as some verbs tend to be used in a particular style, they also tend to use a particular orthography (し得る tends to be written in kanji, but やりだす in kana).

You can also download the full data in tab-separated UTF-8 plain text, which includes all the spellings of particular verbs (not just kana vs. kanji comparison).
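For readers who want to tally kana versus kanji spellings from a tab-separated file themselves, here is a minimal sketch. The column layout is an assumption made for this example (V2 lemma, attested V2 spelling, count) and may not match the actual download; a spelling is treated as kana if it consists only of hiragana or katakana. The names `is_kana_only` and `kana_share_per_v2` are hypothetical.

```python
import csv
import io
import re
from collections import defaultdict
from typing import Dict, TextIO

# Hiragana (U+3041-U+309F) and katakana (U+30A0-U+30FF) blocks.
KANA_ONLY = re.compile(r"[\u3041-\u309F\u30A0-\u30FF]+")


def is_kana_only(text: str) -> bool:
    """True if the text contains nothing but hiragana/katakana characters."""
    return KANA_ONLY.fullmatch(text) is not None


def kana_share_per_v2(tsv_file: TextIO) -> Dict[str, float]:
    """Read rows of (lemma, spelling, count) and return the kana share per lemma.

    The three-column layout is assumed for illustration only.
    """
    totals = defaultdict(lambda: [0, 0])  # lemma -> [kana count, total count]
    for lemma, spelling, count in csv.reader(tsv_file, delimiter="\t"):
        n = int(count)
        totals[lemma][1] += n
        if is_kana_only(spelling):
            totals[lemma][0] += n
    return {lemma: kana / total for lemma, (kana, total) in totals.items()}


# Invented sample rows, NOT taken from the real data file.
sample = io.StringIO("始める\tはじめる\t3\n始める\t始める\t1\n")
shares = kana_share_per_v2(sample)
print(shares)
```

To run this on a real file, `open(path, encoding="utf-8", newline="")` can be passed in place of the `io.StringIO` sample, after adjusting the column unpacking to the file's actual layout.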

  1. There is a caveat, though: the corpus I have used consists of literary works written 50 to 70 years ago. Current usage, especially in informal communication, may have shifted.

  2. In a verb+verb compound the first constituent verb is often labelled as V1 and the second constituent verb as V2.

  3. Similarly, the V2 得る is most often written in kana, except for し得る (which also pulls the overall average for 得る towards kanji).