Japanese compound verbs quantitatively

Recently, I’ve been playing with Japanese V+V compound verbs (verbs consisting of two verbs V1 and V2, where V1 is in 連用形, i.e. the “-masu stem” or i-stem). Basically, I’ve extracted all occurrences of such verbs from post-war works¹ found in Aozora Bunko, and computed some quantitative information about them. If you want to know more details about what I’ve done, what data I used, why I did it, and what I’d like to do next, here’s a short report about it (in Japanese):

現代日本語のコーパス:複合動詞の研究に向けて (“A Corpus of Contemporary Japanese: Towards Research on Compound Verbs”, PDF)

In addition to the background information, the report also contains some interesting quantitative results not mentioned here. Hopefully I will write more about this in English later.
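The extraction step itself is described in the report, not here. Purely as a rough illustration, the following Python sketch assumes the text has already been run through a morphological analyser such as MeCab and turned into tokens carrying the surface form, part of speech, conjugation form and dictionary form. The `Token` type and `find_compounds` function are hypothetical names, and this is not the pipeline actually used. The sketch flags every verb in 連用形 that is directly followed by another verb, and lets a compound act as V1 of a longer compound, which is how 見回し始める yields two compound-verb tokens. It is deliberately naive: a 連用形 used as a clause-linking form (連用中止) that happens to precede a verb with no punctuation in between would be a false positive.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    surface: str    # form as it appears in the text, e.g. "回し"
    pos: str        # part of speech, e.g. "動詞" (verb)
    conj_form: str  # conjugation form, e.g. "連用形" (i-stem / -masu stem)
    lemma: str      # dictionary form, e.g. "回す"


def is_verb(tok: Token) -> bool:
    # startswith() also accepts finer-grained labels such as "動詞-一般"
    return tok.pos.startswith("動詞")


def find_compounds(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (compound, V1, V2) for every verb in 連用形 directly followed by a verb.

    Chains are handled, so 見 / 回し / 始める yields both
    見回す = 見る + 回す and 見回し始める = 見回す + 始める.
    """
    found = []
    stem = ""                 # surfaces of the current 連用形 chain, e.g. "見回し"
    v1: Optional[str] = None  # dictionary form of the chain so far, e.g. "見回す"
    for cur, nxt in zip(tokens, tokens[1:]):
        if is_verb(cur) and cur.conj_form.startswith("連用形") and is_verb(nxt):
            if v1 is None:        # start of a new chain
                v1 = cur.lemma
            stem += cur.surface
            compound = stem + nxt.lemma
            found.append((compound, v1, nxt.lemma))
            v1 = compound         # this compound is V1 of any longer compound
        else:
            stem, v1 = "", None   # chain broken: reset
    return found
```

For the tokens 見 / 回し / 始める (with lemmas 見る, 回す, 始める) this returns `[("見回す", "見る", "回す"), ("見回し始める", "見回す", "始める")]`. If the analyser lemmatises the し of する as 為る, as UniDic does, し始める comes out as 為る + 始める, matching the convention used in the lists below.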

Lists with the number of occurrences (tokens) follow. You can also browse all the sentences in which the compound verbs occur.

If you want to use or reproduce any of the data, please contact me.

50 most common V2 verbs

The numbers indicate the number of tokens (i.e. occurrences; only occurrences in the second position of a compound verb are counted). Note: while simple orthographic variants have been merged wherever possible using information from the EDICT2 dictionary, ambiguous forms are still counted separately to avoid confusion. The two most conspicuous cases are:

  • 〜切れる is mostly the potential form of 〜切る, but sometimes it really is the verb whose dictionary form is 〜切れる.
  • 〜上る often means 〜あがる (usually written 〜上がる), but sometimes it also means 〜のぼる.

(This also applies to the list of whole compound verbs.)

〜出す 3569, 〜始める 2373, 〜得る 2240, 〜込む 1678, 〜合う 1362, 〜続ける 1016, 〜切る 1011, 〜掛ける 951, 〜上げる 818, 〜上る 774, 〜過ぎる 764, 〜付ける 620, 〜切れる 604, 〜回る 532, 〜兼ねる 480, 〜掛かる 442, 〜回す 416, 〜取る 378, 〜出る 343, 〜付く 305, 〜終わる 293, 〜上がる 280, 〜去る 273, 〜遣る 228, 〜直す 228, 〜止める 220, 〜立つ 217, 〜立てる 211, 〜尽くす 208, 〜捨てる 206, 〜抜く 206, 〜落とす 192, 〜返す 190, 〜治す 188, 〜起こす 169, 〜寄せる 163, 〜合わせる 160, 〜入る 155, 〜入れる 146, 〜替える 145, 〜見る 133, 〜落ちる 131, 〜果てる 131, 〜殺す 121, 〜待つ 110, 〜集める 106, 〜歩く 105, 〜通す 102, 〜比べる 101, 〜合わす 101

Total token count: 35221 (all V2 verbs, not just the top 50). This is also the total token count of all compound verbs, since every compound verb token contains exactly one V2. Verbs made up of three or more components, such as 見回し始める, are counted two or more times: once as 見回す = 見る + 回す and once as 見回し始める = 見回す + 始める, both being valid occurrences of a compound verb.

Full list (text file, UTF-8).
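To make the counting rules above concrete, here is a minimal Python sketch; it is not the code behind these numbers. It assumes one (compound, V1, V2) record per compound-verb token, and a hypothetical `VARIANTS` table stands in for the EDICT2-based merging of orthographic variants. Ambiguous spellings such as 切れる and 上る are simply absent from that table, so they keep their own counts.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical, illustrative variant table (the real merging used EDICT2).
# Ambiguous forms such as 切れる and 上る are deliberately NOT listed,
# so they keep separate counts.
VARIANTS = {"だす": "出す", "こむ": "込む", "はじめる": "始める"}


def canonical(verb: str) -> str:
    """Map a spelling variant to its canonical form; unknown forms pass through."""
    return VARIANTS.get(verb, verb)


def count_v2(records: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count V2 tokens; records are (compound, V1, V2), one per compound-verb token."""
    return Counter(canonical(v2) for _compound, _v1, v2 in records)


# One occurrence of 見回し始める contributes two records (two compound tokens).
records = [
    ("見回す", "見る", "回す"),
    ("見回し始める", "見回す", "始める"),
    ("考えこむ", "考える", "こむ"),
    ("言い切れる", "言う", "切れる"),
]
v2_counts = count_v2(records)

# Each compound token has exactly one V2, so both totals are equal.
assert sum(v2_counts.values()) == len(records)
print(v2_counts["込む"], v2_counts["切れる"], v2_counts["切る"])  # 1 1 0
```

The final assertion is the same identity that makes 35221 both the V2 token count and the compound-verb token count, and 切れる stays separate from 切る because it is not in the variant table.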

50 most common compound (V+V) verbs

Note: All 〜する verbs are counted as a single verb 為る/し. The note for the previous list applies too.

Compound verb | V1.V2 | Number of tokens
有り得る | 有る.得る | 483
し得る | 為る.得る | 479
し始める | 為る.始める | 391
歩き出す | 歩く.出す | 303
言い出す | 言う.出す | 291
思い出す | 思う.出す | 239
泣き出す | 泣く.出す | 187
見回す | 見る.回す | 182
笑い出す | 笑う.出す | 178
考え込む | 考える.込む | 151
し合う | 為る.合う | 148
歩き回る | 歩く.回る | 135
起き上る | 起きる.上る | 127
し続ける | 為る.続ける | 127
し切る | 為る.切る | 126
言い掛ける | 言う.掛ける | 122
出来上る | 出来る.上る | 107
成り得る | 成る.得る | 106
引っ張り出す | 引っ張る.出す | 97
駆け付ける | 駆ける.付ける | 95
通り過ぎる | 通る.過ぎる | 93
分かり切る | 分かる.切る | 85
思い浮かべる | 思う.浮かべる | 85
疲れ切る | 疲れる.切る | 82
遣り出す | 遣る.出す | 82
辿り付く | 辿る.付く | 80
見比べる | 見る.比べる | 78
向かい合う | 向かう.合う | 76
見上げる | 見る.上げる | 73
駆け込む | 駆ける.込む | 71
し過ぎる | 為る.過ぎる | 71
成り掛ける | 成る.掛ける | 67
取り掛かる | 取る.掛かる | 67
黙り込む | 黙る.込む | 67
起ち上る | 起つ.上る | 67
言い切れる | 言う.切れる | 66
突き止める | 突く.止める | 66
見守る | 見る.守る | 66
為し得る | 為す.得る | 65
飛び上る | 飛ぶ.上る | 63
持ち出す | 持つ.出す | 61
し兼ねる | 為る.兼ねる | 60
し出す | 為る.出す | 60
取り出す | 取る.出す | 60
書き始める | 書く.始める | 59
有り過ぎる | 有る.過ぎる | 58
焼け残る | 焼ける.残る | 57
呼び止める | 呼ぶ.止める | 57
知り得る | 知る.得る | 57
照らし出す | 照らす.出す | 56

Full list (text file, UTF-8).

Compound verbs in context (usage examples)

Here are browsable pages with sentence contexts extracted from the Aozora Bunko works (including links to the respective works). The second file is a subset of the first one. Spaces (not present in the original works) indicate morpheme boundaries.


  • Compound Verb Lexicon at NINJAL, an effort led by Kageyama Tarō, provides classification, definitions and examples of common lexical compound verbs as well as background info (in English and Japanese). It has links to the BCCWJ corpus for most of them, but the database itself does not contain frequency information.

  • Morphological Analysis of Japanese at NINJAL: an online tool you can try yourself; it uses MeCab with various versions of UniDic or, alternatively, IPAdic.

  • Papers available online:

    • Asao Yoshihiko (2007) 複合語の生産性と文法的性質 (in Japanese, “Productivity and Grammatical Nature of Compound Verbs”).
    • Harald Baayen’s home page hosts most of his publications about morphological productivity.
    • Himeno Masako’s papers at the TUFS repository, on which a large part of her book on compound verbs (複合動詞の構造と意味用法 = The Structure, Meaning and Usage of Compound Verbs) is based.
    • Tamaoka Katsuo et al. (2004) Entropy and Redundancy of Japanese Lexical and Syntactic Compound Verbs.

See my report (Japanese, PDF) for more pointers to information and some discussion of the papers above.

  1. More specifically, works written in the new orthography (新仮名遣い) and first published in 1945 or later. There are 1185 of them, making up about 9% of Aozora Bunko, which concentrates mostly on older works. This is because Aozora Bunko contains only works that are out of copyright (mostly because the copyright has expired). Thankfully, in Japan copyright expires 50 years after the death of the author.