Computational Linguistics in the Netherlands (CLIN) 2007
Abstract Keynote Speaker
In Search of the Optimal Combination of Knowledge-based and Data-based Techniques for Robust Parsing

Ted Briscoe (Computer Laboratory, University of Cambridge)

A decade or so ago, the consensus was that full syntactic parsing was too brittle to be viable, and most research was focussed on how to get by with chunking or other partial techniques in various applications. The Penn Treebank (PTB) ushered in a new era of data-driven approaches to full parsing, and it is now fairly common to see a PTB-derived parser integrated into applications. Another strand of research, exemplified by Parc's XLE and Groningen's Alpino systems, has focussed on combining deep feature-based grammars with statistical parse ranking models derived from treebanks, while Clark and Curran's CCG parser augments the PTB approach with a better syntactic (subcategorisation and unbounded dependency) model, and the RASP system deploys a shallower feature-based grammar with a simpler (mostly) unlexicalized parse ranking model.

Unfortunately, cross-system parser evaluation is still a mess, but recent results (notably on the Parc DepBank) do allow some comparison and consequent inferences about which parser and, more importantly, which approaches are working best. The best parsers still rely heavily on fully supervised training to estimate both the structural and lexical parameters of their data-driven components, and on very computationally expensive run-time discriminative techniques to rank parses. I'll describe recent successes and failures we've had with the RASP system in attempting to use semi-supervised techniques to train run-time-efficient ranking models, with both structural and lexical parameters, in the quest to develop more generic and portable parsing technology that relies less heavily on in-domain treebanks.