Japanese compound verbs quantitatively
Recently, I’ve been playing with Japanese V+V compound verbs (verbs consisting of two verbs, V1 and V2, where V1 is in the 連用形, i.e. the “-masu stem” or i-stem). I extracted all occurrences of such verbs from the post-war works [1] found in Aozora Bunko and computed some quantitative information about them. For more details about what I did, what data I used, why I did it, and what I’d like to do next, see this short report (in Japanese):
現代日本語のコーパス:複合動詞の研究に向けて (“A Corpus of Modern Japanese: Towards Research on Compound Verbs”, PDF)
In addition to the background information, the report also contains some interesting quantitative results not mentioned here. Hopefully I will write more about this in English later.
Lists with the number of occurrences (tokens) follow. You can also browse all the sentences in which the compound verbs occur.
If you want to use or reproduce any of the data, please contact me.
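To make the definition at the top of this page concrete (V1 in 連用形, directly followed by V2), here is a minimal Python sketch of how V+V candidates could be picked out of morphologically analysed text. It is not the code used for this page: the `Morpheme` record, its field names and the hand-written example sentence are assumptions for illustration, standing in for the output of an analyser such as MeCab. A real pipeline would most likely need further filtering.

```python
from typing import NamedTuple


class Morpheme(NamedTuple):
    # Hypothetical record; the field names are assumptions, not an analyser's API.
    surface: str  # form as it appears in the text
    lemma: str    # dictionary form
    pos: str      # part of speech, e.g. "動詞" (verb)
    form: str     # conjugation form, e.g. "連用形"


def find_vv_pairs(morphemes):
    """Return (V1 lemma, V2 lemma) for each verb in 連用形 directly followed by a verb."""
    pairs = []
    for first, second in zip(morphemes, morphemes[1:]):
        if first.pos == "動詞" and first.form == "連用形" and second.pos == "動詞":
            pairs.append((first.lemma, second.lemma))
    return pairs


# Hand-segmented example: 彼 は 泣き 出し た
sentence = [
    Morpheme("彼", "彼", "名詞", ""),
    Morpheme("は", "は", "助詞", ""),
    Morpheme("泣き", "泣く", "動詞", "連用形"),
    Morpheme("出し", "出す", "動詞", "連用形"),
    Morpheme("た", "た", "助動詞", "終止形"),
]

print(find_vv_pairs(sentence))  # [('泣く', '出す')]
```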
50 most common V2 verbs
The numbers are token counts (i.e. occurrences; only occurrences in the second position of a compound verb are counted). Note: Simple orthographic variants have been merged wherever possible using information from the EDICT2 dictionary, but ambiguous forms are still counted separately to avoid confusion. The following two cases are the most conspicuous:
- 〜切れる is mostly a potential form of 〜切る, but sometimes it actually is the verb with dictionary form 〜切れる.
- 〜上る often means 〜あげる (usually written 〜上げる), but sometimes it also means 〜のぼる.
(This also applies to the list of whole compound verbs.)
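The merging rule described in the note can be sketched roughly as follows: a written form is mapped to a headword only when exactly one headword is possible, and otherwise kept as written. The `VARIANTS` table is a toy example made up for this sketch (it is not EDICT2 data), and `normalise` is a hypothetical helper name.

```python
from collections import Counter

# Toy lookup: written form -> possible headwords.
# Illustrative only; a real table would be derived from a dictionary such as EDICT2.
VARIANTS = {
    "だす": {"出す"},
    "出す": {"出す"},
    "上げる": {"上げる"},
    "上る": {"上げる", "上る"},  # ambiguous: あげる or のぼる
}


def normalise(form):
    """Merge a form into its headword only if the headword is unambiguous."""
    candidates = VARIANTS.get(form, {form})
    if len(candidates) == 1:
        return next(iter(candidates))
    return form  # ambiguous forms are kept (and counted) as written


observed = ["だす", "出す", "上る", "上げる"]
counts = Counter(normalise(form) for form in observed)

print(counts["出す"])    # 2: だす and 出す are merged
print(counts["上る"])    # 1: kept separate because it is ambiguous
print(counts["上げる"])  # 1
```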
〜出す 3569, 〜始める 2373, 〜得る 2240, 〜込む 1678, 〜合う 1362, 〜続ける 1016, 〜切る 1011, 〜掛ける 951, 〜上げる 818, 〜上る 774, 〜過ぎる 764, 〜付ける 620, 〜切れる 604, 〜回る 532, 〜兼ねる 480, 〜掛かる 442, 〜回す 416, 〜取る 378, 〜出る 343, 〜付く 305, 〜終わる 293, 〜上がる 280, 〜去る 273, 〜遣る 228, 〜直す 228, 〜止める 220, 〜立つ 217, 〜立てる 211, 〜尽くす 208, 〜捨てる 206, 〜抜く 206, 〜落とす 192, 〜返す 190, 〜治す 188, 〜起こす 169, 〜寄せる 163, 〜合わせる 160, 〜入る 155, 〜入れる 146, 〜替える 145, 〜見る 133, 〜落ちる 131, 〜果てる 131, 〜殺す 121, 〜待つ 110, 〜集める 106, 〜歩く 105, 〜通す 102, 〜比べる 101, 〜合わす 101
Total token count: 35221 (all verbs, not just the top 50). This is both the total token count of all compound verbs and the token count of all V2 verbs occurring within them. Verbs made up of three or more components, such as 見回し始める, are counted two or more times: e.g. once as 見回す = 見る + 回す and once as 見回し始める = 見回す + 始める, both being valid occurrences of a compound verb.
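The counting of multi-component verbs can be illustrated with a short Python sketch. It assumes the components of one verb chain are already available as (surface, lemma) pairs; the function name `nested_compounds` and this input format are assumptions for the example, not a description of the actual extraction code.

```python
from collections import Counter


def nested_compounds(chain):
    """Split a chain of verb components into its nested V1+V2 compounds.

    chain: list of (surface, lemma) pairs for consecutive components.
    Returns a list of (compound, V1, V2) tuples in dictionary form.
    """
    results = []
    for k in range(1, len(chain)):
        # Surface forms of everything before the component that closes V1.
        prefix = "".join(surface for surface, _ in chain[: k - 1])
        v1 = prefix + chain[k - 1][1]  # V1 in dictionary form
        v2 = chain[k][1]               # V2 in dictionary form
        compound = prefix + chain[k - 1][0] + v2
        results.append((compound, v1, v2))
    return results


# 見回し始める = 見 + 回し + 始める
chain = [("見", "見る"), ("回し", "回す"), ("始める", "始める")]
tokens = nested_compounds(chain)
for token in tokens:
    print(token)
# ('見回す', '見る', '回す')
# ('見回し始める', '見回す', '始める')

# Every compound token contributes exactly one V2 token,
# so the two totals are always equal.
v2_counts = Counter(v2 for _, _, v2 in tokens)
print(sum(v2_counts.values()) == len(tokens))  # True
```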
Full list (text file, UTF-8).
50 most common compound (V+V) verbs
Note: All 〜する verbs are counted as a single verb, 為る/し. The note for the previous list applies here too.
Compound verb | V1.V2 | Number of tokens |
---|---|---|
有り得る | 有る.得る | 483 |
し得る | 為る.得る | 479 |
し始める | 為る.始める | 391 |
歩き出す | 歩く.出す | 303 |
言い出す | 言う.出す | 291 |
思い出す | 思う.出す | 239 |
泣き出す | 泣く.出す | 187 |
見回す | 見る.回す | 182 |
笑い出す | 笑う.出す | 178 |
考え込む | 考える.込む | 151 |
し合う | 為る.合う | 148 |
歩き回る | 歩く.回る | 135 |
起き上る | 起きる.上る | 127 |
し続ける | 為る.続ける | 127 |
し切る | 為る.切る | 126 |
言い掛ける | 言う.掛ける | 122 |
出来上る | 出来る.上る | 107 |
成り得る | 成る.得る | 106 |
引っ張り出す | 引っ張る.出す | 97 |
駆け付ける | 駆ける.付ける | 95 |
通り過ぎる | 通る.過ぎる | 93 |
分かり切る | 分かる.切る | 85 |
思い浮かべる | 思う.浮かべる | 85 |
疲れ切る | 疲れる.切る | 82 |
遣り出す | 遣る.出す | 82 |
辿り付く | 辿る.付く | 80 |
見比べる | 見る.比べる | 78 |
向かい合う | 向かう.合う | 76 |
見上げる | 見る.上げる | 73 |
駆け込む | 駆ける.込む | 71 |
し過ぎる | 為る.過ぎる | 71 |
成り掛ける | 成る.掛ける | 67 |
取り掛かる | 取る.掛かる | 67 |
黙り込む | 黙る.込む | 67 |
起ち上る | 起つ.上る | 67 |
言い切れる | 言う.切れる | 66 |
突き止める | 突く.止める | 66 |
見守る | 見る.守る | 66 |
為し得る | 為す.得る | 65 |
飛び上る | 飛ぶ.上る | 63 |
持ち出す | 持つ.出す | 61 |
し兼ねる | 為る.兼ねる | 60 |
し出す | 為る.出す | 60 |
取り出す | 取る.出す | 60 |
書き始める | 書く.始める | 59 |
有り過ぎる | 有る.過ぎる | 58 |
焼け残る | 焼ける.残る | 57 |
呼び止める | 呼ぶ.止める | 57 |
知り得る | 知る.得る | 57 |
照らし出す | 照らす.出す | 56 |
Full list (text file, UTF-8).
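If you have rows in the three-column layout of the table above, per-V2 totals can be recomputed with a few lines of Python. This is only a sketch: the layout is taken from the table on this page, the function name `v2_totals` is made up, and the linked full list may well be formatted differently.

```python
from collections import Counter

# A few rows copied from the table above.
ROWS = """\
有り得る | 有る.得る | 483 |
し得る | 為る.得る | 479 |
歩き出す | 歩く.出す | 303 |
言い出す | 言う.出す | 291 |
成り得る | 成る.得る | 106 |
"""


def v2_totals(text):
    """Sum token counts per V2 from 'compound | V1.V2 | tokens |' rows."""
    totals = Counter()
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        if len(cells) != 3:
            continue  # skip blank or malformed lines
        _compound, parts, tokens = cells
        _v1, v2 = parts.split(".")
        totals[v2] += int(tokens)
    return totals


totals = v2_totals(ROWS)
print(totals.most_common())  # [('得る', 1068), ('出す', 594)]
```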
Compound verbs in context (usage examples)
Here are browsable pages with sentence contexts extracted from the Aozora Bunko works (including links to the respective works). The second file is a subset of the first one. Spaces (not present in the original works) indicate morpheme boundaries.
- All verbs (HTML, 11.2 MB)
- “Directional” verbs (i.e. where V2 seems to indicate upward, downward, inward or outward direction) (HTML, 2.6 MB)
Related Links
- Compound Verb Lexicon at NINJAL, an effort led by Kageyama Tarō, provides classification, definitions and examples of common lexical compound verbs as well as background info (in English and Japanese). It has links to the BCCWJ corpus for most of them, but the database itself does not contain frequency information.
- Morphological Analysis of Japanese at NINJAL: you can try it online; it uses MeCab with various versions of UniDic or, alternatively, IPAdic.
- Papers available online:
  - Asao Yoshihiko (2007) 複合語の生産性と文法的性質 (in Japanese, “Productivity and Grammatical Nature of Compound Verbs”).
  - Harald Baayen’s home page hosts most of his publications about morphological productivity.
  - Himeno Masako’s papers at the TUFS repository, on which a large part of her book on compound verbs (複合動詞の構造と意味用法 = The Structure, Meaning and Usage of Compound Verbs) is based.
  - Tamaoka Katsuo et al. (2004) Entropy and Redundancy of Japanese Lexical and Syntactic Compound Verbs.

  See my report (Japanese, PDF) for more pointers to information and some discussion of the papers above.
[1] More specifically, works written in the new orthography (新仮名遣い) and first published in 1945 or later. There are 1185 of them, and they make up about 9% of Aozora Bunko, which concentrates mostly on older works. This is because it consists only of works that are out of copyright (i.e. mostly works whose copyright has expired). Thankfully, in Japan copyright expires 50 years after the death of the author.