
Studies on Those Hybrid Approaches to PP-attachment Disambiguation Before and After the Historical Periods of Statistical Machine Translation

(Part 3.1-3.3) (6 pages: pages 7 to 12, plus the references on page 18) (From SUNNY HUANG Xiaolin's (my own) academic essay: 15 pages, excluding the cover page, table of contents and references)

Stories Around “Transformation-Based Error-Driven Learning”
(“Rule-Based Learning Approaches”) to PP-attachment Disambiguation

3.1 History of Minimal Attachment and Right Association

3.1.1 History of Minimal Attachment

    According to Frazier (1978), “each lexical item (or other node) is to be attached into the phrase marker with the fewest possible number of non-terminal nodes linking it with the nodes which are already present”; this principle is called minimal attachment. Minimal attachment therefore prefers the simplest syntactic structure. For example, in sentences (10) and (11), minimal attachment would choose “VP-attachment” rather than “NP-attachment”.
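As a rough illustration of the node-counting idea, the sketch below compares two candidate parses and keeps the one with fewer non-terminal nodes. It is a minimal sketch that rests on several assumptions. Trees use a hypothetical nested-tuple encoding `(label, *children)`. The sentence "saw the man with the telescope" stands in for sentences (10) and (11), which are not reproduced here. NP-attachment is drawn as adjunction under an extra NP node. Finally, counting every node of each complete parse stands in for Frazier's incremental count, which works because the nodes the two parses share cancel out.

```python
from typing import Dict, Tuple, Union

# Hypothetical tree encoding: a leaf is a word (str); an inner node is (label, *children).
Tree = Union[str, Tuple]


def count_nonterminals(tree: Tree) -> int:
    """Count every labelled (non-leaf) node in the tree."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_nonterminals(child) for child in tree[1:])


def minimal_attachment(candidates: Dict[str, Tree]) -> str:
    """Return the name of the candidate parse with the fewest non-terminal nodes."""
    return min(candidates, key=lambda name: count_nonterminals(candidates[name]))


# Illustrative parses of "saw the man with the telescope".
PP = ("PP", ("P", "with"), ("NP", ("Det", "the"), ("N", "telescope")))
VP_ATTACH = ("VP", ("V", "saw"), ("NP", ("Det", "the"), ("N", "man")), PP)
NP_ATTACH = ("VP", ("V", "saw"), ("NP", ("NP", ("Det", "the"), ("N", "man")), PP))

if __name__ == "__main__":
    print(count_nonterminals(VP_ATTACH), count_nonterminals(NP_ATTACH))  # 10 11
    print(minimal_attachment({"VP-attachment": VP_ATTACH, "NP-attachment": NP_ATTACH}))
```

The NP-attachment parse needs one extra NP node, so the sketch prefers VP-attachment, which matches the choice that minimal attachment makes in the text.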


3.1.2 History of Right Association

    Another PP-attachment disambiguation principle is called “right association”. According to Hirst (1987), “right association” means choosing to attach words or phrases “as far to the right in the parse tree as possible”. For example, in sentences (10) and (11), “right association” would choose “NP-attachment” rather than “VP-attachment”.

(It would choose the one on the left side according to “right association” principles.)
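One common way to operationalize "as far to the right in the parse tree as possible" is to attach the new phrase to the lowest possible host on the right edge of the parse built so far. The sketch below follows that reading. It is an assumption rather than Hirst's own formulation. It reuses a hypothetical nested-tuple tree encoding and treats only VP and NP nodes as possible hosts for a PP.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical tree encoding: a leaf is a word (str); an inner node is (label, *children).
Tree = Union[str, Tuple]


def right_frontier(tree: Tree) -> List[Tuple]:
    """Nodes on the right edge of a partial parse, listed from the root downwards."""
    spine = []
    node = tree
    while isinstance(node, tuple):
        spine.append(node)
        node = node[-1]  # follow the rightmost child
    return spine


def right_association(partial: Tree, hosts=("VP", "NP")) -> Optional[str]:
    """Attach to the lowest (rightmost) possible host on the right frontier."""
    open_hosts = [node[0] for node in right_frontier(partial) if node[0] in hosts]
    return open_hosts[-1] if open_hosts else None


# Partial parse of "saw the man", before the PP "with the telescope" is attached.
PARTIAL = ("VP", ("V", "saw"), ("NP", ("Det", "the"), ("N", "man")))

if __name__ == "__main__":
    print(right_association(PARTIAL))  # NP
```

The right frontier of the partial parse is VP, then NP, then N. The NP is the lowest possible host, so the sketch chooses NP-attachment, as right association does in the text.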

    Researchers have found that “right association” “works correctly for at least fifty percent of attachments” (Hindle & Rooth, 1993). Therefore, “right association” is often used as a fallback when no other PP-attachment disambiguation method is available.

3.2 Rule-Based Learning for PP-attachment Disambiguation

  Brill and Resnik (1994) set up, and then experimented with, “transformation-based error-driven learning” (“rule-based learning approaches”) for PP-attachment.

    Brill and Resnik (1994) describe the procedure of “transformation-based error-driven learning” as follows: “First, un-annotated text is passed through an initial-state annotator”. “Once text has been passed through the initial state annotator, it is then compared with truth”, “as indicated in a manually annotated corpus, and transformations are learned that can be applied to the output of the initial state annotator to make it better resemble the truth”.

   In practice, their learner takes as input a corpus of 4-tuples of the form (V N1 P N2). Following “right association” principles, every PP is initially attached to N1. According to Brill and Resnik (1994), “the training set is processed according to the start state annotator, in this case attaching all prepositional phrases low (attached to N1)”. The results are then compared with the “truth”, that is, the gold standard. Using rule templates, the system compares the remaining errors with the “truth” in order to form new sets of rules. According to Brill and Resnik (1994), “the best scoring transformation then becomes the first transformation in the learned list. It is applied to the training corpus, and the learning continues on the modified corpus”.

   “Initial accuracy on the text is 64% when prepositional phrases are always attached to the object noun. After applying the transformations, accuracy increases to 80.8%.”

                                                                       (Brill & Resnik, 1994)
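The learning loop described above can be sketched as follows. This is a simplified sketch, not Brill and Resnik's implementation. It assumes only single-word templates of the form "change the attachment from X to Y if v / n1 / p / n2 is W", which is a simplification chosen here. It scores each candidate by the number of errors fixed minus the number of errors introduced, and it stops when no candidate reaches a hypothetical `min_score` threshold. It runs on a tiny invented training set, so its accuracy figures are not comparable with the 64% and 80.8% reported above. Labels are "V" (attach to the verb) and "N" (attach to N1).

```python
from typing import List, Tuple

# A PP-attachment instance: (v, n1, p, n2), e.g. ("put", "book", "on", "shelf").
Quad = Tuple[str, str, str, str]
# Hypothetical rule encoding (from_label, to_label, slot, word):
# "change the attachment from from_label to to_label if quad[slot] == word".
Rule = Tuple[str, str, int, str]
SLOT_NAMES = ("v", "n1", "p", "n2")


def start_state(quads: List[Quad]) -> List[str]:
    """Initial-state annotator: right association, i.e. attach every PP low to N1."""
    return ["N"] * len(quads)


def apply_rule(rule: Rule, quads: List[Quad], labels: List[str]) -> List[str]:
    src, dst, slot, word = rule
    return [dst if lab == src and q[slot] == word else lab
            for q, lab in zip(quads, labels)]


def score(rule: Rule, quads: List[Quad], labels: List[str], gold: List[str]) -> int:
    """Errors the rule would fix minus errors it would introduce."""
    src, dst, slot, word = rule
    return sum(1 if g == dst else -1
               for q, lab, g in zip(quads, labels, gold)
               if lab == src and q[slot] == word)


def learn(quads: List[Quad], gold: List[str],
          min_score: int = 1, max_rules: int = 50) -> List[Rule]:
    labels = start_state(quads)
    learned: List[Rule] = []
    for _ in range(max_rules):
        # Error-driven: instantiate templates only from currently wrong instances;
        # sorting makes tie-breaking deterministic.
        candidates = sorted({(lab, g, slot, q[slot])
                             for q, lab, g in zip(quads, labels, gold) if lab != g
                             for slot in range(len(SLOT_NAMES))})
        if not candidates:
            break
        best = max(candidates, key=lambda r: score(r, quads, labels, gold))
        if score(best, quads, labels, gold) < min_score:
            break
        # The best-scoring transformation joins the learned list and is applied
        # to the training corpus; learning continues on the modified corpus.
        learned.append(best)
        labels = apply_rule(best, quads, labels)
    return learned


def annotate(quads: List[Quad], rules: List[Rule]) -> List[str]:
    """Label new data: start state first, then the learned rules in order."""
    labels = start_state(quads)
    for rule in rules:
        labels = apply_rule(rule, quads, labels)
    return labels


def accuracy(pred: List[str], gold: List[str]) -> float:
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def describe(rule: Rule) -> str:
    src, dst, slot, word = rule
    return f"change {src} to {dst} if {SLOT_NAMES[slot]} = {word}"


# Invented toy training data; "V" = attach to the verb, "N" = attach to N1.
TRAIN = [("put", "book", "on", "shelf"), ("put", "cup", "on", "table"),
         ("ate", "pizza", "with", "fork"), ("ate", "pizza", "with", "anchovies"),
         ("bought", "shirt", "with", "pockets"), ("sent", "letter", "to", "friend"),
         ("read", "book", "about", "history")]
GOLD = ["V", "V", "V", "N", "N", "V", "N"]

if __name__ == "__main__":
    rules = learn(TRAIN, GOLD)
    print("start-state accuracy:", round(accuracy(start_state(TRAIN), GOLD), 2))
    for rule in rules:
        print(describe(rule))
    print("accuracy after rules:", round(accuracy(annotate(TRAIN, rules), GOLD), 2))
```

On the toy data, the start state gets 3 of the 7 attachments right. The first rule learned is "change N to V if v = put". With `min_score=2`, only that first rule survives. This shows how the threshold trades coverage of the training data against over-fitting to single examples.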

    “Transformation-based error-driven learning” (“rule-based learning approaches”) has both advantages and disadvantages. One advantage is that these approaches can be adapted flexibly to new situations without much human intervention. Another is that there is no burden of creating many rules manually, because the programs create the rules themselves.

3.3 Pros and Cons of the Historical Approach of “Transformation-Based Error-Driven Learning” (“Rule-Based Learning Approaches”) to PP-attachment Disambiguation

    One option for dealing with PP-attachment disambiguation is to incorporate a certain degree of statistics into parsing. By drawing on various sets of example sentences, statistical approaches can implement the rules or structures that are used more frequently. Additionally, statistical approaches may even reflect relationships that have already been established within linguistics.


    As the discussion of “transformation-based error-driven learning” (“rule-based learning approaches”) above shows, such approaches can help the parser recognize and handle rules. For example, Hirst (1987) claimed that, according to statistical approaches, PPs headed by into, onto and despite are rarely attached to NPs:

          NP → NP PP   [Prep ∈ {into, onto, despite}]
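To make preposition-level statistics concrete, the sketch below estimates, for each preposition, how often it attaches to N1 in labelled data. It then attaches to the verb when that rate is low. It is a deliberately simple stand-in, not Hirst's (1987) or Hindle and Rooth's (1993) method. The training tuples, the 0.5 threshold and the fallback to right association for unseen prepositions are all assumptions made for illustration. The fallback also exposes the problem of limited statistics: when there are no counts for a preposition, the sketch can do no better than right association.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, str]  # (v, n1, p, n2)


def np_attachment_rates(quads: List[Quad], gold: List[str]) -> Dict[str, float]:
    """Fraction of each preposition's occurrences that are attached to N1 ("N")."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for (_, _, prep, _), label in zip(quads, gold):
        counts[prep][label] += 1
    return {prep: c["N"] / sum(c.values()) for prep, c in counts.items()}


def attach(quad: Quad, rates: Dict[str, float], threshold: float = 0.5) -> str:
    prep = quad[2]
    if prep not in rates:
        return "N"  # no statistics for this preposition: fall back to right association
    return "N" if rates[prep] >= threshold else "V"


# Invented toy data for illustration only ("V" = verb attachment, "N" = noun attachment).
TRAIN = [("threw", "ball", "into", "lake"), ("put", "toy", "into", "box"),
         ("drove", "car", "into", "garage"), ("ate", "pizza", "with", "fork"),
         ("ate", "pizza", "with", "anchovies"), ("bought", "shirt", "with", "pockets"),
         ("drank", "cup", "of", "tea")]
GOLD = ["V", "V", "V", "V", "N", "N", "N"]

if __name__ == "__main__":
    rates = np_attachment_rates(TRAIN, GOLD)
    print(attach(("pushed", "cart", "into", "aisle"), rates))  # V
    print(attach(("saw", "man", "near", "bank"), rates))       # N (unseen, fallback)
```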

    However, statistical approaches also have disadvantages alongside their obvious advantages. A shortage of appropriate statistics can be a serious issue for this approach. If some of the statistics are not reliable, it is very difficult to claim that the statistical approach genuinely supports the PP-attachment decision for a given sentence.

    Besides word-based statistical approaches, there is another kind of statistical approach, based on the WordNet hypernym tree, which will be discussed later in this essay. By using the WordNet hypernym tree, the parser can relate words to their families, or to the 'concepts' above them in the tree. Additionally, because the tree groups many words under a limited number of concepts, the statistics are far more useful in WordNet hypernym tree based statistical approaches than in word-based statistical approaches.
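The benefit of moving up a hypernym tree can be sketched with a hand-made miniature hierarchy. The sketch below is a toy built on assumptions; it is neither WordNet itself nor any specific published method. `HYPERNYM` is a hypothetical child-to-parent map standing in for WordNet. Statistics are collected for the preposition paired with N2 and with every concept above N2. A decision uses the most specific level that has any counts and falls back to right association otherwise. A word-only model has no counts for "spoon", whereas the concept "cutlery" lets it share statistics with "fork" and "knife".

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical miniature hypernym tree (child -> parent), standing in for WordNet.
HYPERNYM = {"fork": "cutlery", "spoon": "cutlery", "knife": "cutlery",
            "cutlery": "artifact", "anchovies": "food", "cheese": "food",
            "food": "substance"}

Quad = Tuple[str, str, str, str]  # (v, n1, p, n2)


def ancestors(word: str) -> List[str]:
    """The word itself followed by each concept above it, most specific first."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def train_counts(quads: List[Quad], gold: List[str]) -> Dict[Tuple[str, str], Counter]:
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for (_, _, prep, n2), label in zip(quads, gold):
        for concept in ancestors(n2):  # credit N2 and every concept above it
            counts[(prep, concept)][label] += 1
    return counts


def decide(quad: Quad, counts: Dict[Tuple[str, str], Counter]) -> str:
    _, _, prep, n2 = quad
    for concept in ancestors(n2):  # try the most specific level first
        if (prep, concept) in counts:
            return counts[(prep, concept)].most_common(1)[0][0]
    return "N"  # nothing known at any level: fall back to right association


# Invented toy data for illustration only.
TRAIN = [("ate", "pizza", "with", "fork"), ("ate", "salad", "with", "knife"),
         ("ate", "pizza", "with", "anchovies"), ("ate", "bread", "with", "cheese")]
GOLD = ["V", "V", "N", "N"]

if __name__ == "__main__":
    counts = train_counts(TRAIN, GOLD)
    print(decide(("ate", "soup", "with", "spoon"), counts))  # V, via "cutlery"
```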

      In summary, approaches to PP-attachment ambiguity such as “right association” and “minimal attachment”, together with “transformation-based error-driven learning” (which can be regarded as an extended use of “right association” and “minimal attachment”), are often used when no other PP-attachment disambiguation method is available.

    The limitations of these types of approaches (without semantic approaches using WordNet) are obvious. Training corpora are necessarily limited, and researchers cannot build ones that keep pace with the texts and statistics that grow endlessly every day. Therefore, semantic approaches and the objects of prepositions need to be taken into consideration in order to resolve PP-attachment ambiguities.

References

1. Brill, Eric & Resnik, Philip. 1994. A Rule-Based Approach to Prepositional Phrase Attachment Disambiguation. In COLING 1994: Proceedings of the 15th International Conference on Computational Linguistics, Vol. 2. Kyoto, Japan.

2. Frazier, Lyn. 1978. On Comprehending Sentences: Syntactic Parsing Strategies. PhD thesis, University of Connecticut.

3. Hindle, Donald & Rooth, Mats. 1993. Structural Ambiguity and Lexical Relations. Computational Linguistics, 19(1), 103-120. http://acl.ldc.upenn.edu/J/J93/J93-1005.pdf

4. Hirst, Graeme. 1987. Semantic Interpretation and the Resolution of Ambiguity. Cambridge: Cambridge University Press. http://catdir.loc.gov/catdir/samples/cam034/85018978.pdf
