
CLIN 2007

Friday December 7, 2007
University of Nijmegen

CLIN 2007 is organized by the Language and Speech group of the Radboud University Nijmegen.

Abstract Keynote Speaker

In Search of the Optimal Combination of Knowledge-based and Data-based Techniques for Robust Parsing

Ted Briscoe (Computer Laboratory, University of Cambridge)

A decade or so ago, the consensus was that full syntactic parsing was too brittle to be viable, and most research focussed on how to get by with chunking or other partial techniques in various applications. The Penn Treebank (PTB) ushered in a new era of data-driven approaches to full parsing, and it is now fairly common to see a PTB-derived parser integrated into applications. Another strand of research, exemplified by PARC's XLE and Groningen's Alpino systems, has focussed on combining deep feature-based grammars with statistical parse ranking models derived from treebanks. Clark and Curran's CCG parser, in turn, augments the PTB approach with a better syntactic (subcategorisation and unbounded dependency) model, and the RASP system deploys a shallower feature-based grammar with a simpler (mostly) unlexicalized parse ranking model.
Unfortunately, cross-system parser evaluation is still a mess, but recent results (notably on the PARC DepBank) do allow some comparison and consequent inferences about which parsers and, more importantly, which approaches are working best. The best parsers still rely heavily on fully supervised training to estimate both the structural and lexical parameters of their data-driven components, and on very computationally expensive run-time discriminative techniques to rank parses. I'll describe recent successes and failures we've had with the RASP system in attempting to use semi-supervised techniques to train run-time-efficient ranking models, with both structural and lexical parameters, in the quest to develop more generic and portable parsing technology that relies less heavily on in-domain treebanks.