National Polytechnic Institute
Center for Computing Research
Natural Language Processing Laboratory

Demo 2.1

User Manual

Alexander Gelbukh
Grigori Sidorov

Mexico City
1999


Contents

1. Welcome to Parser
1.1 The main screen
1.2 How Do I...
2. Screen Elements
2.1 Trees page
2.1.1 Constituency format
2.1.2 Dependency format
2.1.3 Graphical format
2.2 Morphology page
2.3 Dump page
2.4 Tracing page
2.5 Text area
3. Toolbar
3.1 Open button
3.2 Type button
3.3 Options button
3.4 Process All button
3.5 Zoom Picture button
3.6 View Additional Nodes button
3.7 Constituent Structure button
3.8 Dependency Structure button
3.9 Graphical Representation button
3.10 Help button
4. Options
4.1 Proportional Representation checkbox
4.2 Show Words Vertically checkbox
4.3 Morphology radiobuttons
4.4 Trace Parsed checkbox
4.5 Trace Not Parsed checkbox
4.6 Gap Between Levels setting
4.7 Gap On One Level setting
4.8 Check Rules Changes for Each Sentence checkbox
4.9 Only Number of Variants checkbox
4.10 Maximal Number of Variants in Input setting
5. Grammar
5.1 The grammar provided with the parser
5.2 The grammar compiler
6. Development team

 

1.     Welcome to Parser

The Parser Demo program allows you to investigate the syntactic and morphological structure of Spanish sentences using an Extended Context-Free grammar formalism. It is useful for learning the Extended Context-Free grammar formalism and for developing and testing grammars.

Namely, it allows you:

·                     To view the variants of the syntactic structure of sentences,

·                     To view the variants of morphological analysis of the words in sentences,

·                     To investigate the protocol of the parsing process, in order to understand the internal workings of the parser.

Please see the following topics:

            How Do I... (section 1.2)

            Screen Elements (section 2)

            Development team (section 6)

1.1     The main screen

This is a sample screen of the program:

The screen shows a variant of the syntactic structure of a Spanish sentence.

1.2     How Do I...

·                     To type a sentence to analyze, use the New Text button.

·                     To open a file with a text, use the Open button.

·                     To view the syntactic tree of the sentence, switch to the Trees tab and make sure the button for the desired tree format (Constituent Structure, Dependency Structure, or Graphical Representation) is pressed.

·                     To view the morphological structure of the sentence, switch to the Morphology tab.

·                     To view technical information about the parsing process, switch to the Dump or Tracing tab.

2.     Screen Elements

On the Parser Demo screen, you can choose one of the following four pages by clicking on the tabs at the top of the screen:

            Trees page (section 2.1).

            Morphology page (section 2.2).

            Dump page (section 2.3).

            Tracing page (section 2.4).

The Parser Demo screen also contains the following elements:

            Text area (section 2.5).

            Toolbar (section 3).

            Options (section 4).

2.1     Trees page

The Trees page allows you to investigate the syntactic structure of the sentence selected in the Text area.

Depending on which tree format button is pressed on the toolbar, it shows the tree in one of the following formats:

            Constituency format (section 2.1.1).

            Dependency format (section 2.1.2).

            Graphical format (section 2.1.3).

You can choose any variant of the syntactic tree from the list on the right side of the screen. The maximum number of variants shown is set with the Maximal Number of Variants in Input setting under the Options button. If a sentence has more variants than this limit, the rest are ignored.

2.1.1     Constituency format

In this mode, the syntactic trees are presented as constituency structures.

This format of the syntactic tree is selected with the Constituent Structure button.

2.1.2     Dependency format

In this mode, the syntactic trees are presented as dependency structures.

This format of the syntactic tree is selected with the Dependency Structure button.
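The two formats are related through the heads of the constituents. Below is a minimal sketch of the conversion, assuming a hypothetical tree representation in which each constituent records which child is its head (the element marked with @: in the grammar, see section 5.1). The lexical head of a constituent is found by following head children down to a word. Each non-head child then contributes one dependency link, from the constituent's head word to its own head word. The manual does not describe how the program itself performs this conversion. The class and function names below are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    word: str


@dataclass
class Node:
    label: str
    children: List["Tree"]
    head: int  # index of the head child (hypothetical encoding of the @: mark)


Tree = Union[Leaf, Node]


def lexical_head(t: Tree) -> str:
    """Follow head children down until a word is reached."""
    while isinstance(t, Node):
        t = t.children[t.head]
    return t.word


def to_dependencies(t: Tree) -> List[Tuple[str, str]]:
    """Return (governor, dependent) word pairs for a headed constituency tree."""
    if isinstance(t, Leaf):
        return []
    governor = lexical_head(t)
    deps: List[Tuple[str, str]] = []
    for i, child in enumerate(t.children):
        if i != t.head:
            # a non-head child depends on the head word of its parent constituent
            deps.append((governor, lexical_head(child)))
        deps.extend(to_dependencies(child))
    return deps


tree = Node("CLAUSE", [
    Node("NP", [Leaf("el"), Leaf("perro")], head=1),
    Node("VP", [Leaf("come"), Leaf("carne")], head=0),
], head=1)
print(to_dependencies(tree))  # [('come', 'perro'), ('perro', 'el'), ('come', 'carne')]
```

Words are used directly as node identifiers to keep the sketch short. A real implementation would use word positions, so that repeated words in a sentence stay distinct.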

2.1.3     Graphical format

In this mode, the syntactic trees are presented in graphical form. The tree is drawn as a constituency structure, and the dependency links (heads) are shown in red.

This format of the syntactic tree is selected with the Graphical Representation button.

2.2     Morphology page

The Morphology page allows you to investigate the morphological structure of sentences. For each word of the current sentence, it lists the possible normalized forms of the word together with their morphological codes.

2.3     Dump page

The Dump page presents all the variants of the syntactic structure at once, in text form.

2.4     Tracing page

The Tracing page allows you to investigate the process of parsing. It shows which rules were triggered and in what order, and also shows how the pieces of the structure were built.

2.5     Text area

This area shows the current text, with the current sentence selected. To choose a sentence to investigate, click on it in this area.

3.     Toolbar


The Toolbar provides access to the following settings and tools:

            Open button (section 3.1).

            Type button (section 3.2).

            Options button (section 3.3).

            Process All button (section 3.4).

            Zoom Picture button (section 3.5).

            View Additional Nodes button (section 3.6).

            Constituent Structure button (section 3.7).

            Dependency Structure button (section 3.8).

            Graphical Representation button (section 3.9).

            Help button (section 3.10).

3.1     Open button

 

The Open button allows you to open a file containing the sentences to process. The format of this file depends on the Morphology radiobuttons (section 4.3) under the Options button.

3.2     Type button

The Type button allows you to type the sentence to process. The format of the text depends on the Morphology radiobuttons (section 4.3) under the Options button.

3.3     Options button

 

The Options button provides access to the following settings:

            Proportional Representation checkbox (section 4.1).

            Show Words Vertically checkbox (section 4.2).

            Morphology radiobuttons (section 4.3).

            Trace Parsed checkbox (section 4.4).

            Trace Not Parsed checkbox (section 4.5).

            Gap Between Levels setting (section 4.6).

            Gap On One Level setting (section 4.7).

            Check Rules Changes for Each Sentence checkbox (section 4.8).

            Only Number of Variants checkbox (section 4.9).

            Maximal Number of Variants in Input setting (section 4.10).

3.4     Process All button

 

The Process All button processes all the sentences in the file automatically, in batch mode.

3.5     Zoom Picture button

 

The Zoom Picture button displays the tree picture or the lists in full-screen mode.

3.6     View Additional Nodes button

 

The View Additional Nodes button allows you to view the nodes that were automatically added to the grammar when it was converted into Chomsky normal form. When the button is pressed, these additional nodes are shown in the tree.
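Chomsky normal form allows at most two symbols on the right-hand side of a rule, so longer rules must be split using new, automatically named nonterminals. The sketch below shows only this binarization step; a full conversion also removes unit and empty rules. It also includes a companion function that hides the auxiliary nodes again, which is presumably what the display does when View Additional Nodes is not pressed. The naming scheme (CLAUSE_1, CLAUSE_2, and so on) and the tuple tree format are hypothetical. The manual does not say how the program names its additional nodes.

```python
from typing import List, Set, Tuple, Union

Rule = Tuple[str, List[str]]
Tree = Union[str, Tuple[str, list]]  # a word, or (label, children)


def binarize(lhs: str, rhs: List[str]) -> List[Rule]:
    """Split A -> X1 X2 ... Xn (n > 2) into a chain of binary rules.

    Auxiliary nonterminals are named <lhs>_<k> (hypothetical naming scheme).
    """
    rules: List[Rule] = []
    current = lhs
    k = 0
    while len(rhs) > 2:
        k += 1
        aux = f"{lhs}_{k}"
        rules.append((current, [rhs[0], aux]))
        current, rhs = aux, rhs[1:]
    rules.append((current, list(rhs)))
    return rules


def hide_auxiliary(tree: Tree, aux: Set[str]) -> Tree:
    """Splice the children of auxiliary nodes into their parents."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    flat: list = []
    for child in children:
        child = hide_auxiliary(child, aux)  # descendants are already flattened
        if not isinstance(child, str) and child[0] in aux:
            flat.extend(child[1])
        else:
            flat.append(child)
    return (label, flat)


for rule in binarize("CLAUSE", ["NP_PERS", "CONJ_SUB", "PPR", "VP"]):
    print(rule)
```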

3.7     Constituent Structure button

 

The Constituent Structure button shows the syntactic tree in the traditional constituent structure form.

3.8     Dependency Structure button

 

The Dependency Structure button shows the syntactic tree as a dependency tree.

3.9     Graphical Representation button

 

The Graphical Representation button shows the syntactic tree in graphical form: the constituent structure is drawn, and the dependencies (heads) are shown in red.

3.10     Help button

 

The Help button shows this guide.

4.     Options

4.1     Proportional Representation checkbox

The Proportional Representation checkbox affects how the nodes of the tree are centered in the graphical mode. We recommend checking this checkbox.

4.2     Show Words Vertically checkbox

The Show Words Vertically checkbox affects how the words are presented in the tree in the graphical mode. We recommend leaving this checkbox unchecked.

4.3     Morphology radiobuttons

The Morphology radiobuttons determine the expected format of the input text, whether it is opened from a file or typed.

·      Morphology in rules: all the word forms in the input are expected to be terminal nodes of the grammar.

·      Morphology in input: the input has a structured form, with morphological codes explicitly assigned to each word.

·      Morphological analysis: the input is plain text, and the program analyzes it morphologically.

We recommend the Morphological analysis option.

4.4     Trace Parsed checkbox

If the Trace Parsed checkbox is checked, successfully parsed sentences are traced in the Tracing page.

Unchecking this checkbox speeds up the program when the Tracing page is viewed. It also lets you see in the Tracing page only the sentences for which parsing failed.

4.5     Trace Not Parsed checkbox

If the Trace Not Parsed checkbox is checked, the sentences for which parsing failed are traced in the Tracing page.

Unchecking this checkbox speeds up the program when the Tracing page is viewed. It also lets you see in the Tracing page only the successfully parsed sentences.

 

4.6     Gap Between Levels setting

The Gap Between Levels setting controls the spacing between adjacent levels of the tree in the graphical mode. The recommended value is 40.

4.7     Gap On One Level setting

The Gap On One Level setting controls the spacing between neighboring nodes on the same level of the tree in the graphical mode. The recommended value is 10.
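To illustrate how two such gap parameters can drive a tree drawing, here is a minimal layered-layout sketch. It assumes that Gap Between Levels is the vertical distance between tree levels and that Gap On One Level is the horizontal space between neighboring words. The manual does not document the actual drawing algorithm or its units. Words are placed left to right, and each parent is centered between its first and last child. The node_width parameter is a hypothetical fixed box width.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple[str, list]]  # a word, or (label, children)


def layout(tree: Tree, gap_between_levels: int = 40, gap_on_one_level: int = 10,
           node_width: int = 30) -> List[Tuple[str, float, int]]:
    """Return (label, x, y) for every node; y grows downward with depth."""
    positions: List[Tuple[str, float, int]] = []
    next_x = 0.0  # left edge for the next word

    def place(node: Tree, depth: int) -> float:
        nonlocal next_x
        y = depth * gap_between_levels
        if isinstance(node, str):
            x = next_x
            next_x += node_width + gap_on_one_level
            label = node
        else:
            label, children = node
            xs = [place(child, depth + 1) for child in children]
            x = (xs[0] + xs[-1]) / 2  # center the parent over its children
        positions.append((label, x, y))
        return x

    place(tree, 0)
    return positions


np_tree = ("NP", [("DET", ["el"]), ("N", ["perro"])])
for label, x, y in layout(np_tree):
    print(label, x, y)
```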

4.8     Check Rules Changes for Each Sentence checkbox

The Check Rules Changes for Each Sentence checkbox lets you change the grammar without restarting the program. When this checkbox is checked, changes to the grammar take effect immediately, starting with the next sentence processed. However, this slightly slows down processing.

We recommend checking this checkbox.

4.9     Only Number of Variants checkbox

The Only Number of Variants checkbox allows you to skip loading the found variants into the program’s viewer. When this checkbox is checked, the program only counts the variants found for each sentence. This greatly speeds up processing; however, you cannot view the found variants.

We recommend leaving this checkbox unchecked when you want to view the results of parsing. Check it when you want to test the coverage of the grammar and the average ambiguity of parsing with the given grammar.
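Counting variants without building them is cheap when the grammar is in Chomsky normal form. The manual does not state which parsing algorithm the program uses, so what follows is an illustration only. A CKY-style chart can store, for each span and nonterminal, the number of distinct subtrees rather than the subtrees themselves. The count for the whole sentence then comes out without any tree being constructed, even when the number of trees grows exponentially. The toy grammar below is hypothetical and much smaller than the grammar in section 5.1.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def count_parses(words: List[str],
                 lexical: Dict[str, List[str]],
                 binary: List[Tuple[str, str, str]],
                 start: str) -> int:
    """Count parse trees of `words` rooted in `start` for a grammar in CNF.

    lexical maps a word to the preterminals that can cover it;
    binary lists rules A -> B C as tuples (A, B, C).
    """
    n = len(words)
    # chart[i][j][A] = number of distinct subtrees of category A over words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for a in lexical.get(word, []):
            chart[i][i + 1][a] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a, b, c in binary:
                    left, right = chart[i][k].get(b, 0), chart[k][j].get(c, 0)
                    if left and right:
                        chart[i][j][a] += left * right
    return chart[0][n].get(start, 0)


# Hypothetical toy grammar with the classic prepositional-phrase attachment ambiguity.
LEXICAL = {"yo": ["NP"], "vi": ["V"], "hombres": ["NP"], "con": ["P"], "telescopios": ["NP"]}
BINARY = [("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
          ("NP", "NP", "PP"), ("PP", "P", "NP")]

print(count_parses("yo vi hombres con telescopios".split(), LEXICAL, BINARY, "S"))  # prints 2
```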

4.10     Maximal Number of Variants in Input setting

The Maximal Number of Variants in Input setting allows you to avoid loading too many of the found variants into the program’s viewer: for each sentence, the program loads at most the given number of variants. This speeds up processing; however, you cannot view all of the found variants.
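A limit on loaded variants only saves time if variants are produced lazily, one at a time, so that enumeration can stop as soon as the limit is reached. The sketch below illustrates that idea with a Python generator and itertools.islice. The variant source is a hypothetical stand-in (every combination of two readings for each of ten words), not the program's actual enumerator. The counter shows that the source is never asked for more variants than the limit.

```python
from itertools import islice, product
from typing import Iterable, Iterator, List, Tuple

Variant = Tuple[str, ...]


def variant_source(counter: List[int], words: int = 10) -> Iterator[Variant]:
    """Hypothetical source: every combination of two readings per word (2**words variants)."""
    for combo in product(("a", "b"), repeat=words):
        counter[0] += 1  # record how many variants were actually generated
        yield combo


def load_variants(variants: Iterable[Variant], limit: int) -> List[Variant]:
    """Keep at most `limit` variants; the source is never asked for more."""
    return list(islice(variants, limit))


generated = [0]
loaded = load_variants(variant_source(generated), 3)
print(len(loaded), generated[0])  # prints 3 3
```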

5.     Grammar

5.1     The grammar provided with the parser

The parser is provided with the following grammar:

# -------------------------
# General rules and clauses
# -------------------------

S
        -> [BEG_S] @:S_SET END_S
        -> [BEG_S] [BEG_S] @:CLAUSE END_S
        -> [BEG_S] [BEG_S] @:ADVP END_S
        -> @:PP END_S
        -> [L_CONJ] @:LIS_NP END_S
        -> [BEG_S] @:NP(nmb,gnd) END_S

S_SET
        -> [L_CONJ] [SEP_O] [CIR] [SEP_O] [CIR] [SEP_O] @:CLAUSE
        -> @:S_SET LIS_CLAUSE
        -> LIS_CLAUSE @:S_SET
        -> [CIR] [SEP_O] [CIR] [SEP_O] @:CLAUSE [SEP_O] [CIR] [CIR]

LIS_CLAUSE
        -> SEP_O @:CLAUSE [LIS_CLAUSE]
        -> [SEP_O] @:L_CONJ LIS_CLAUSE

SEP_O
        -> ',' | ':' | ';' | '...' | '(' | '¿' | '¡' | '"' | '-' | ')'

END_S
        -> '-' | ')' | '!' | '"' | '?' | '.'

BEG_S
        -> '¿' | '-' | '¡'

L_CONJ
        -> CONJ
        -> CONJ_SUB
        -> 'que'
        -> '...'

CLAUSE
        -> [NP_PERS(nmb,gnd,pers)] [CONJ_SUB] [PPR] [PPR] @:VP(nmb,pers,mean) [ADVP]
        -> [NP_PERS(nmb,gnd,pers)] [SEP_O] [PPR] [PPR] @:VP(nmb,pers,mean) [ADVP]
        -> [NP_PERS(nmb,gnd,pers)] [SEP_O] [PPR] [PPR] @:V(nmb,pers,AUX) [CIR] [ADVP]
        -> ADVP [','] @:CLAUSE
        -> [','] L_CONJ [','] @:CLAUSE

CIR
        -> @:ADVP
        -> @:PP [LIS_PP]
        -> [L_CONJ] @:GER [LIS_NP]

# ---------------
# Nominal phrases
# ---------------

NP_PERS(nmb,gnd,3PRS)
        -> NP(nmb,gnd)

NP_PERS(SG,gnd,1PRS)
        -> 'yo'

NP_PERS(PL,gnd,1PRS)
        -> 'nosotros'

NP_PERS(SG,gnd,2PRS)
        -> 'tu'

NP_PERS(SG,gnd,3PRS)
        -> 'él'

NP(nmb,gnd)
        -> @:NP(nmb,gnd) ADV
        -> @:NP(nmb,gnd) AP(nmb,gnd)
        -> [DP(nmb,gnd)] @:NOM(nmb,gnd)
        -> @:NP(nmb,gnd) LIS_NP(nmb1,gnd1)

LIS_NP(nmb,gnd)
        -> ',' @:NP(nmb,gnd) [LIS_NP(nmb1,gnd1)]
        -> @:CONJ NP(nmb,gnd)

NOM(nmb,gnd)
        -> [NUM] @:N(nmb,gnd)
        -> @:NOM(nmb,gnd) AP(nmb,gnd)
        -> AP(nmb,gnd) @:NOM(nmb,gnd)
        -> @:NOM(nmb,gnd) PP
        -> @:AP(nmb,gnd)
        -> INFP
        -> PPR
        -> DATE
        -> NUM
        -> 'quien'

# -------------
# Miscellaneous
# -------------

DP(nmb,gnd)
        -> DET(nmb,gnd)
        -> ART(nmb,gnd)

LIS_PP
        -> ',' @:PP [LIS_PP]
        -> @:CONJ PP

PP
        -> @:PR NP(nmb,gnd)
        -> @:PR CLAUSE
        -> @:'que' NP(nmb,gnd)
        -> @:'que' CLAUSE

AP(nmb,gnd)
        -> @:ADJ(nmb,gnd) [AP(nmb,gnd)]
        -> ',' @:ADJ(nmb,gnd) [AP(nmb,gnd)]
        -> @:CONJ ADJ(nmb,gnd)

# --------------------
# Personal verb phrase
# --------------------

VP(nmb,pers,AUX)
        -> @:V(nmb,pers,AUX)

VP(nmb,pers,mean)
        -> @:VP_needs_NP(nmb,pers,mean) [NP(nmb1,gnd1)]
        -> [ADVP] [SEP_O] @:VP(nmb,pers,mean) PP
        -> @:V(nmb,pers,mean) GER

VP_needs_NP(nmb,pers,mean)
        -> [ADVP] @:VP_V(nmb,pers,mean)
        -> @:VP_needs_NP(nmb,pers,mean) PP

VP_V(nmb,pers,mean)
        -> @:V(nmb,pers,mean)
        -> @:'haber'(nmb,pers) [ADVP] PART(SG,MASC)
        -> @:'ser'(nmb,pers) [ADVP] PART(nmb,gnd)
        -> @:'ser'(nmb,pers) [ADVP] AP(nmb,gnd)
        -> @:'estar'(nmb,pers) [ADVP] PART(nmb,gnd)

ADVP
        -> [L_CONJ] @:ADV [NP(nmb,gnd)]
        -> [L_CONJ] @:PP
        -> 'hacer'(nmb,pers) @:NP(nmb,gnd)
        -> @:PR ADV

# -----------------------
# Infinitival verb phrase
# -----------------------

INFP
        -> [ADVP] @:VP_INF(nmb,gnd,mean) [ADV]
        -> [ADVP] @:V(INF,aux) [ADV]

VP_INF(nmb,gnd,mean)
        -> @:VP_needs_NP_INF(nmb,gnd,mean) [NP(nmb1,gnd1)]
        -> @:V(INF,mean) PP

VP_needs_NP_INF(nmb,gnd,mean)
        -> @:VP_V_INF(nmb,gnd,mean)
        -> @:VP_needs_NP_INF(nmb,gnd,mean) PP

VP_V_INF(nmb,gnd,mean)
        -> @:V(INF,mean)
        -> @:'haber'(INF) [ADVP] PART(SG,MASC)
        -> @:'ser'(INF) [ADVP] PART(nmb,gnd)

Lines beginning with # are comments. The @ sign marks the lexical heads, the | sign separates alternatives, and square brackets [] denote optional elements. The terminal symbols are defined in the file lexesp.mrk, where they are mapped to the symbols used by the morphological analyzer.
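One way to read the bracket and bar notation is as shorthand for a set of plain context-free rules. Each | separates a complete alternative, and each optional [X] doubles the number of expansions (X present or absent). The sketch below performs that expansion on a right-hand side written as in the grammar above. It assumes, as in the printed grammar, that symbols are separated by spaces and that no symbol contains spaces. It is not the grammar compiler's actual code, which the manual does not describe.

```python
from itertools import product
from typing import List, Optional


def expand_rhs(rhs: str) -> List[List[str]]:
    """Expand '|' alternatives and '[...]' optional elements into plain symbol sequences."""
    alternatives: List[List[str]] = [[]]
    for tok in rhs.split():
        if tok == "|":
            alternatives.append([])  # start a new complete alternative
        else:
            alternatives[-1].append(tok)

    results: List[List[str]] = []
    for alt in alternatives:
        choices: List[List[Optional[str]]] = []
        for tok in alt:
            if tok.startswith("[") and tok.endswith("]"):
                choices.append([None, tok[1:-1]])  # optional: absent or present
            else:
                choices.append([tok])
        for combo in product(*choices):
            results.append([t for t in combo if t is not None])
    return results


for seq in expand_rhs("[BEG_S] @:S_SET END_S"):
    print(" ".join(seq))
```

With this reading, the first S_SET rule, which has six optional elements, stands for 2^6 = 64 plain rules.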

The symbols in parentheses denote morphological characteristics of the lexical heads of the constituents. A variable that appears more than once in a rule must take the same value in each occurrence; that is, the corresponding constituents must agree. For example, in the rule

CLAUSE
        -> NP_PERS(nmb,gnd,pers) @:VP(nmb,pers,mean)

the noun phrase and the verb phrase must agree in number (nmb) and person (pers), while in the rule

VP(nmb,pers,mean)
        -> @:VP_needs_NP(nmb,pers,mean) NP(nmb1,gnd1)

the number nmb1 and gender gnd1 of the noun phrase do not have to agree with those of the verb.
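The agreement check can be sketched as a simple form of unification over the feature slots of a rule. This sketch assumes, based on the printed grammar, that slots written in lowercase (nmb, gnd, pers, mean) are variables and that everything else (SG, PL, MASC, 1PRS, AUX, and so on) is a constant value. The real value sets come from descvar.txt, and the function below is illustrative, not the program's code.

```python
from typing import Dict, Optional, Sequence


def agree(pattern: Sequence[Sequence[str]],
          values: Sequence[Sequence[str]]) -> Optional[Dict[str, str]]:
    """Match the feature slots of a rule against actual constituent features.

    Returns the variable bindings if all occurrences agree, otherwise None.
    """
    if len(pattern) != len(values):
        return None
    bindings: Dict[str, str] = {}
    for slots, vals in zip(pattern, values):
        if len(slots) != len(vals):
            return None
        for slot, val in zip(slots, vals):
            if slot[:1].islower():  # variable such as nmb or gnd (assumed convention)
                if bindings.setdefault(slot, val) != val:
                    return None  # same variable bound to two different values
            elif slot != val:  # constant such as SG or 1PRS must match exactly
                return None
    return bindings


# CLAUSE -> NP_PERS(nmb,gnd,pers) @:VP(nmb,pers,mean)
clause = [["nmb", "gnd", "pers"], ["nmb", "pers", "mean"]]
print(agree(clause, [["SG", "MASC", "3PRS"], ["SG", "3PRS", "AUX"]]))  # bindings
print(agree(clause, [["SG", "MASC", "3PRS"], ["PL", "3PRS", "AUX"]]))  # None: number clash
```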

The possible values of variables such as nmb and gnd are specified in the file descvar.txt.

To change the grammar, use the grammar compiler described in the next section.

5.2     The grammar compiler

To change the grammar, edit the file grammar.txt in the directory compile/data (and, if necessary, the files lexesp.mrk and descvar.txt in the same directory), and then run the program process.bat.

If you want to change the grammar without leaving the Parser Demo program, make sure the Check Rules Changes for Each Sentence checkbox under the Options button is checked.

6.     Development team     

 

This software is (C) Copyright by
the Center for Computing Research of the National Polytechnic Institute, Mexico.
It was developed by the Natural Language Processing Laboratory.

 

        

 

The Parser Demo development team:

Design: Dr. Alexander Gelbukh,
Programming: Dr. Grigori Sidorov,
Grammar: Sofía Galicia Haro.