An Extended Subcategorization Frames Dictionary
A.F. Gelbukh, Sofia N. Galicia-Haro
Information on syntactic government, or subcategorization, is an integral part of a language's lexicon. Without it, speakers make errors such as *to marry with Mary, which a French speaker might say, or *to marry on Mary and *to marry behind John, which a Russian speaker might say; conversely, an English speaker would find it difficult to choose the correct preposition in those languages, as in *se marier Marie. Intuition based on one's native language often serves a speaker poorly in another language, even a closely related one. Unfortunately, teaching practice and existing manuals pay little attention to a clear and systematic treatment of syntactic government. This paper presents a Spanish dictionary of extended subcategorization frames. Such a frame (a government pattern) for a word lists the means of expressing its valences and also records which valences, or which specific means of expressing them, are compatible with one another (e.g., the ill-formed Spanish combination *mover de ... hasta ...). We discuss a statistical algorithm for compiling such a dictionary from a large unprepared text corpus. The algorithm is unsupervised and produces a list of the prepositions (or grammatical cases) used with each word; at the same time, it resolves syntactic ambiguity in the corpus. A second, supervised algorithm is intended for computer-aided compilation of a human-oriented dictionary.
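
To make the notion of an extended frame concrete, the following is a minimal Python sketch of one possible representation; it is not the dictionary format used in the paper. The class names (`Valence`, `Frame`), the role labels ("source", "goal"), and the preposition sets given for mover are assumptions made for illustration. Only the incompatibility of de with hasta is taken from the example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Valence:
    """One valence of a word and the prepositions (or cases) that can express it."""
    role: str                    # hypothetical label, e.g. "source" or "goal"
    expressions: frozenset[str]  # allowed means of expression


@dataclass
class Frame:
    """An extended frame: valences plus pairs of expressions that cannot co-occur."""
    word: str
    valences: list[Valence]
    incompatible: set[frozenset[str]] = field(default_factory=set)

    def allows(self, used: dict[str, str]) -> bool:
        """Check a usage given as a mapping role -> preposition."""
        by_role = {v.role: v for v in self.valences}
        # Each role must exist and be expressed by one of its listed means.
        for role, prep in used.items():
            if role not in by_role or prep not in by_role[role].expressions:
                return False
        # No two chosen expressions may form a pair marked as incompatible.
        for a, b in combinations(used.values(), 2):
            if frozenset((a, b)) in self.incompatible:
                return False
        return True


# Illustrative entry (assumed data, except the de/hasta incompatibility).
mover = Frame(
    word="mover",
    valences=[
        Valence("source", frozenset({"de", "desde"})),
        Valence("goal", frozenset({"a", "hasta"})),
    ],
    incompatible={frozenset({"de", "hasta"})},
)

print(mover.allows({"source": "de", "goal": "a"}))      # de ... a
print(mover.allows({"source": "de", "goal": "hasta"}))  # *de ... hasta
```

Storing incompatibilities as unordered pairs keeps the list of valences itself simple; a real dictionary might instead need constraints over whole combinations of valences.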