
User talk:SunnyHUANGXiaolin


Studies on Those Hybrid Approaches to PP-attachment Disambiguation Before and After the Historical Periods of Statistical Machine Translation

(Parts 3.1-3.3; 6 pages: pages 7 to 12, plus the references on page 18) (From my own academic essay, by SUNNY HUANG Xiaolin: 15 pages, not counting the cover page, table of contents and references)

Stories Around “Transformation-Based Error-Driven Learning” (“Rule-Based Learning Approaches”) to PP-attachment Disambiguation

3.1 History of Minimal Attachment and Right Association

3.1.1 History of Minimal Attachment

According to Frazier (1978), "each lexical item (or other node) is to be attached into the phrase marker with the fewest possible number of non-terminal nodes linking it with the nodes which are already present", which is called minimal attachment. Minimal attachment therefore prefers the simplest syntactic structure. For example, in sentence (10) and sentence (11), it chooses “VP-attachment” instead of “NP-attachment”.

3.1.2 History of Right Association

There is another PP-attachment disambiguation principle, which is called “right association”. According to Hirst (1987), “right association” means choosing to attach words or phrases as far to the right in the parse tree as possible. For example, in sentence (10) and sentence (11), “right association” chooses “NP-attachment” instead of “VP-attachment”.

(Figure note: according to the “right association” principle, the structure on the left side is chosen.)

Scholars have found that “right association” works correctly for at least fifty percent of attachments (Hindle & Rooth, 1993); therefore, “right association” is often used when no other solutions or methods for PP-attachment disambiguation are available.
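
The two principles can be read as fixed baseline strategies over (V N1 P N2) tuples. The following is a minimal sketch, assuming (as the text does for sentences (10) and (11)) that minimal attachment corresponds to VP-attachment and right association to NP-attachment. The function names and the toy data are hypothetical and invented for illustration only; they are not taken from any of the cited works.

```python
from typing import Callable, List, Tuple

# A PP-attachment instance: (verb, noun1, preposition, noun2)
Quad = Tuple[str, str, str, str]


def minimal_attachment(quad: Quad) -> str:
    """Always attach the PP to the verb (VP-attachment)."""
    return "V"


def right_association(quad: Quad) -> str:
    """Always attach the PP to the nearest noun, N1 (NP-attachment)."""
    return "N"


def accuracy(strategy: Callable[[Quad], str],
             data: List[Tuple[Quad, str]]) -> float:
    """Fraction of instances where the strategy matches the gold label."""
    correct = sum(1 for quad, gold in data if strategy(quad) == gold)
    return correct / len(data)


# Hypothetical toy data (invented); "V" = VP-attachment, "N" = NP-attachment.
toy_data = [
    (("ate", "pizza", "with", "fork"), "V"),
    (("ate", "pizza", "with", "anchovies"), "N"),
    (("saw", "man", "in", "park"), "N"),
]

ra_accuracy = accuracy(right_association, toy_data)
ma_accuracy = accuracy(minimal_attachment, toy_data)
```

Because each baseline ignores the words entirely, its accuracy depends only on the proportion of NP- and VP-attachments in the data.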

3.2 Rule-Based Learning for PP-attachment Disambiguation 
Brill and Resnik (1994) first proposed and then experimented with “transformation-based error-driven learning” (“rule-based learning approaches”) for PP attachment.

Brill and Resnik (1994) describe the procedure of “transformation-based error-driven learning” as follows: “First, un-annotated text is passed through an initial-state annotator”; “Once text has been passed through the initial state annotator, it is then compared with truth”, “as indicated in a manually annotated corpus, and transformations are learned that can be applied to the output of the initial state annotator to make it better resemble the truth”.

In practice, their learner takes as input a corpus of 4-tuples of the form (V N1 P N2). Following the “right association” principle, every PP is attached to N1 at the beginning. According to Brill and Resnik (1994), “the training set is processed according to the start state annotator, in this case attaching all prepositional phrases low (attached to N1)”. The results are then compared with the “truth”, which is the gold standard. Using rule templates, the system compares the remaining errors against the “truth” in order to form new rules. According to Brill and Resnik (1994), “the best scoring transformation then becomes the first transformation in the learned list. It is applied to the training corpus, and the learning continues on the modified corpus.”

“Initial accuracy on the text is 64% when prepositional phrases are always attached to the object noun. After applying the transformations, accuracy increases to 80.8%.” (Brill & Resnik, 1994)
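
The learning loop described above can be sketched in a few lines. This is a simplified illustration rather than Brill and Resnik's implementation: it assumes a single hypothetical rule template ("if field f of the tuple is word w, set the attachment to L"), whereas the original work uses a richer set of templates. The names `apply_rule`, `candidate_rules` and `learn`, and the toy data, are invented for this sketch.

```python
from typing import List, Tuple

Quad = Tuple[str, str, str, str]   # (V, N1, P, N2)
Rule = Tuple[int, str, str]        # (field index, required word, new label)


def apply_rule(rule: Rule, quad: Quad, label: str) -> str:
    """Change the label to the rule's label if the rule's condition holds."""
    field, word, new_label = rule
    return new_label if quad[field] == word else label


def candidate_rules(quads: List[Quad]) -> List[Rule]:
    """Instantiate the single (assumed) template from the training tuples."""
    rules = set()
    for quad in quads:
        for field, word in enumerate(quad):
            for new_label in ("N", "V"):
                rules.add((field, word, new_label))
    return sorted(rules)  # sorted so that ties are broken deterministically


def learn(quads: List[Quad], gold: List[str], max_rules: int = 10):
    """Greedy error-driven learning of an ordered list of transformations."""
    labels = ["N"] * len(quads)    # initial-state annotator: attach low (N1)
    learned: List[Rule] = []
    for _ in range(max_rules):
        best_rule, best_gain = None, 0
        for rule in candidate_rules(quads):
            # Net gain = errors fixed minus errors introduced by this rule.
            gain = 0
            for quad, label, truth in zip(quads, labels, gold):
                new_label = apply_rule(rule, quad, label)
                gain += (new_label == truth) - (label == truth)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:      # no rule improves the corpus any further
            break
        learned.append(best_rule)
        # Learning continues on the corpus as modified by the chosen rule.
        labels = [apply_rule(best_rule, q, l) for q, l in zip(quads, labels)]
    return learned, labels


# Hypothetical toy corpus (invented for illustration).
quads = [
    ("ate", "pizza", "with", "fork"),
    ("ate", "pizza", "with", "anchovies"),
    ("put", "book", "on", "table"),
    ("saw", "man", "in", "park"),
]
gold = ["V", "N", "V", "N"]
learned, final_labels = learn(quads, gold)
```

The learned list is ordered: at annotation time the rules would be applied in the same sequence in which they were learned, starting again from the attach-low initial state.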
“Transformation-based error-driven learning” (“rule-based learning approaches”) has both advantages and disadvantages. The advantages are that these approaches can be adapted flexibly to new situations without much human intervention, and that there is no burden of manually creating many rules: the programs themselves can create the rules. The disadvantages are discussed in Section 3.3.

3.3 Pros and Cons in the Historical Approach of “Transformation-Based Error-Driven Learning” (“Rule-Based Learning Approaches”) to PP-attachment Disambiguation

One way to deal with PP-attachment disambiguation is to incorporate a certain degree of statistics into the parsing. By collecting statistics over various sets of example sentences, the rules or structures that are used more frequently can be preferred. Additionally, the statistical approaches may even reflect relationships which have already been established within linguistics.


As the discussion of “transformation-based error-driven learning” (“rule-based learning approaches”) above shows, such approaches can help the parser to recognize and handle rules. For example, Hirst (1987) claimed that, according to statistical approaches, into, onto and despite are rarely attached to NPs:

          NP → NP PP [Prep ∈ { into, onto, despite }]   (rarely applied)

However, statistical approaches also have disadvantages alongside these obvious advantages. The limited availability of appropriate statistics can be a fundamental issue: it is very difficult to claim that a statistical approach reliably resolves the PP attachment in a sample sentence if some of the underlying statistics have not been verified.

Besides word-based statistical approaches, there is another kind of statistical approach, based on the WordNet hypernym tree, which is discussed later in this article. By using a WordNet hypernym tree, the parser can relate words to their families or 'concepts' above them on the tree. Additionally, as there are only a limited number of trees, the statistics are far more useful in WordNet hypernym tree based statistical approaches than in word-based statistical approaches.
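
The idea of relating words to the 'concepts' above them can be sketched as counting attachments at every level of the hypernym chain and backing off from the word to its ancestors. This is only an illustrative sketch: the `HYPERNYM` table is a hypothetical, hand-written fragment and not real WordNet data, and the names `ancestors`, `train` and `predict` are invented here. A real system would read the hypernym links from WordNet.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical fragment of a hypernym tree (child -> parent), invented
# for illustration; it is not real WordNet data.
HYPERNYM: Dict[str, str] = {
    "fork": "cutlery", "spoon": "cutlery", "cutlery": "tool",
    "anchovies": "fish", "fish": "food", "cheese": "food",
}


def ancestors(word: str) -> List[str]:
    """Return the word followed by its hypernyms, most specific first."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def train(examples: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], Counter]:
    """Count attachment labels for (preposition, concept) at every level."""
    counts: Dict[Tuple[str, str], Counter] = {}
    for prep, n2, label in examples:
        for concept in ancestors(n2):
            counts.setdefault((prep, concept), Counter())[label] += 1
    return counts


def predict(counts: Dict[Tuple[str, str], Counter],
            prep: str, n2: str, default: str = "N") -> str:
    """Use the most specific concept with statistics; else attach low."""
    for concept in ancestors(n2):
        seen = counts.get((prep, concept))
        if seen:
            return seen.most_common(1)[0][0]
    return default


# Hypothetical training examples: (preposition, N2, gold attachment).
counts = train([("with", "fork", "V"), ("with", "anchovies", "N")])
spoon_label = predict(counts, "with", "spoon")     # unseen word, seen concept
cheese_label = predict(counts, "with", "cheese")   # unseen word, seen concept
hammer_label = predict(counts, "with", "hammer")   # nothing seen: default
```

In this sketch "spoon" never occurs in training, yet it inherits the statistics collected for "cutlery", which is how concept-level counts can compensate for sparse word-level counts.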

In summary, approaches to PP-attachment ambiguity such as “right association” (with “transformation-based error-driven learning” considered as one extended use of “right association”) and “minimal attachment” are often used when no other solutions or methods for PP-attachment disambiguation are available.

The limitations of these types of approaches (without semantic approaches using WordNet) are obvious: scholars can create only limited training corpora, which cannot keep up with the texts and statistics that increase (endlessly) every day. Therefore, semantic approaches and the objects of prepositions need to be taken into consideration in order to resolve PP-attachment ambiguities.

References

1, Brill, Eric & Resnik, Philip. 1994. A Rule-Based Approach to Prepositional Phrase Attachment Disambiguation. In COLING-1994: Proceedings of the 15th International Conference on Computational Linguistics. Vol. 2. Kyoto, Japan.

2, Frazier, Lyn. 1978. On Comprehending Sentences: Syntactic Parsing Strategies. PhD thesis, University of Connecticut.

3, Hindle, Donald & Rooth, Mats. 1993. Structural Ambiguity and Lexical Relations. Computational Linguistics, 19(1), 103-120. http://acl.ldc.upenn.edu/J/J93/J93-1005.pdf

4, Hirst, Graeme. 1987. Semantic Interpretation and the Resolution of Ambiguity. http://catdir.loc.gov/catdir/samples/cam034/85018978.pdf
