Call for participation


Wim Peters (W.Peters@dcs.shef.ac.uk)
Tue, 31 Mar 1998 11:28:45 BST


********************** Call for participation **********************

***********************************************
Distributing and Accessing Linguistic Resources
***********************************************

May 27th, 1998

This workshop is part of the First International Conference on Language
Resources and Evaluation at the University of Granada, May 26th to 30th
1998 (see http://ceres.ugr.es/~rubio/elra.html for details and how to
register).

The workshop will discuss ways to increase the efficacy of linguistic
resource distribution and programmatic access, and work towards the
definition of a new method for these tasks based on distributed
processing and object-oriented modelling with deployment on the WWW.

Organizers: Yorick Wilks, Wim Peters, Hamish Cunningham, Remi Zajac


Provisional Programme
---------------------

Panel discussion:

  Distributing and Accessing Linguistic Resources
    Khalid Choukri, Eduard Hovy, Judith Klavans, Yorick Wilks,
    Antonio Zampolli

Full papers:

  Common Formats of MT User Dictionaries and Environments for
  Exchanging Them as a Part of AAMT Activities
    S. Kamei, E. Itoh, M. Fujii, T. Hirai, Y. Saitoh, M. Takahashi,
    T. Hiyama, K. Muraki
    NEC/Toshiba/Sharp/Fujitsu/Kyushu Matsushita, Japan

  Distributed Thesaurus Storage and Access in a Cultural Domain
  Application
    S. Boutsis, B. Georgantopoulos, S. Piperidis
    Institute for Language and Speech Processing, Athens

  Linguistic Research Utilizing the EDR Electronic Dictionary as a
  Linguistic Resource
    T. Ogino
    EDR, Japan

  Corpus-based Research using the Internet
    D. Broeder, H. Brugman, A. Russel, P. Wittenburg, R. Piepenbrock
    Max Planck Institute for Psycholinguistics/CELEX Centre for
    Lexical Expertise, Nijmegen

  An Architecture for Distributed NLP Objects
    R. Zajac
    New Mexico State University

  A New Model for Language Resource Access and Distribution
    W. Peters, H. Cunningham, Y. Wilks, C. McCauley
    University of Sheffield

Posters:

  TRACTOR: TELRI Research Archive of Computational Tools and Resources
    R. Krishnamurthy
    University of Birmingham

  The CUE Corpus Access Tool
    O. Mason
    University of Birmingham

  Web-Surfing the Lexicon
    D. Cabrero, M. Vilares, L. Docampo, S. Sotelo
    Ramon Pineiro Research Centre/Universities of Coruna and Santiago

  Exploring Distributed MT
    O. Streiter, A. Schmidt-Wigger, U. Reuther, C. Pease
    IAI Saarbruecken

  A Proposal for an On-line Lexical Database
    P. Cassidy
    Micra, Inc.


Workshop Scope and Aims
-----------------------

In general the reuse of NLP data resources (such as lexicons or
corpora) has exceeded that of algorithmic resources (such as
lemmatisers or parsers). However, there are still two barriers to data
resource reuse:

1) each resource has its own representation syntax and corresponding
   programmatic access mode (e.g. SQL for CELEX, C or Prolog for
   WordNet, SGML for the BNC);

2) resources must generally be installed locally to be usable (and of
   course precisely how this happens, what operating systems are
   supported etc. varies from case to case).

The consequences of 1) are twofold. First, although resources share
some structure in common (lexicons are organised around words, for
example), this commonality is wasted when it comes to using a new
resource: the developer has to learn everything afresh each time.
Second, work which seeks to investigate or exploit commonalities
between resources (e.g. to link several lexicons to an ontology) first
has to build a layer of access routines on top of each resource. So,
for example, if we wish to do task-based evaluation of lexicons by
measuring the relative performance of an information extraction system
with different instantiations of lexical resource, we might end up
writing code to translate several different resources into SQL or SGML.

The consequence of 2) is that there is no way to "try before you buy":
no way to examine a data resource for its suitability for your needs
before licensing it.
Correspondingly, there is no way for a resource provider to expose
limited access to their products for advertising purposes, or to gain
revenue through piecemeal supply of sections of a resource.

This workshop will discuss ways to overcome these barriers. The
proposers will discuss a new method for distributing and accessing
language resources involving the development of a common programmatic
model of the various resource types, implemented in CORBA IDL and/or
Java, along with a distributed server for non-local access. This model
is being designed as part of the GATE project (General Architecture for
Text Engineering:
http://www.dcs.shef.ac.uk/research/groups/nlp/gate/) and goes under the
provisional title of an Active CREOLE Server. (CREOLE: Collection of
REusable Objects for Language Engineering. Currently CREOLE supports
only algorithmic objects, but will be extended to data objects.)

A common model of language data resources would be a set of inheritance
hierarchies making up a forest or set of graphs. At the top of the
hierarchies would be very general abstractions from resources (e.g.
lexicons are about words); at the leaves would be data items that are
specific to individual resources. Programmatic access would be
available at all levels, allowing the developer to select an
appropriate level of commonality for each application.

Note that although an exciting element of the work could be to provide
algorithms to dynamically merge common resources, what we're suggesting
initially is not to develop anything substantively new, but simply to
improve access to existing resources. This is NOT a new standards
initiative, but a way to build on previous initiatives.

Of course, the production of a common model that fully expressed all
the subtleties of all resources would be a large undertaking, but we
believe that it can be done incrementally, with useful results at each
stage.
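
As a rough illustration only, the idea of a common model with
programmatic access at several levels of generality could be sketched
in Java along the following lines. Every interface, class and method
name here is hypothetical and is not taken from GATE or CREOLE, and the
two "resources" are toy in-memory stand-ins rather than real lexicons.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: all names below are assumptions, not GATE/CREOLE APIs.
final class ResourceModelSketch {

    /** Top of a hierarchy: the most general abstraction over a data resource. */
    interface LanguageResource {
        String name();
    }

    /** General level: "lexicons are about words". */
    interface Lexicon extends LanguageResource {
        boolean contains(String word);
    }

    /** More specific level: lexicons that also record senses for each word. */
    interface SenseLexicon extends Lexicon {
        List<String> senses(String word);
    }

    /** Leaf: toy stand-in for a resource that stores senses. */
    static final class ToySenseLexicon implements SenseLexicon {
        private final Map<String, List<String>> entries = Map.of(
                "bank", List.of("financial institution", "river edge"),
                "parser", List.of("program that analyses syntax"));

        public String name() { return "toy-sense-lexicon"; }
        public boolean contains(String word) { return entries.containsKey(word); }
        public List<String> senses(String word) {
            return entries.getOrDefault(word, List.of());
        }
    }

    /** Leaf: toy stand-in for a plain word list with no sense information. */
    static final class ToyWordList implements Lexicon {
        private final Set<String> words = Set.of("bank", "corpus");

        public String name() { return "toy-word-list"; }
        public boolean contains(String word) { return words.contains(word); }
    }

    /** Client code written once against the general level; any Lexicon will do. */
    static long coverage(Lexicon lexicon, List<String> words) {
        return words.stream().filter(lexicon::contains).count();
    }

    /** Client code that needs the more specific level. */
    static int ambiguity(SenseLexicon lexicon, String word) {
        return lexicon.senses(word).size();
    }

    public static void main(String[] args) {
        List<String> probe = List.of("bank", "corpus", "parser");
        List<Lexicon> lexicons = List.of(new ToySenseLexicon(), new ToyWordList());
        // The same coverage routine runs unchanged over both resources.
        for (Lexicon lexicon : lexicons) {
            System.out.println(lexicon.name() + " covers "
                    + coverage(lexicon, probe) + " of " + probe.size() + " words");
        }
        System.out.println("senses of 'bank': "
                + ambiguity(new ToySenseLexicon(), "bank"));
    }
}
```

In this sketch a routine such as coverage() is written once against the
general Lexicon level and reused across resources, while ambiguity()
opts into a more specific level where one is available. That is the
sense in which a developer could select a level of commonality per
application.
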
Early versions will stop decomposing the object structure of resources
at a fairly high level, leaving the developer to handle the data
structures native to the resources at the leaves of the forest. There
should still be a substantial benefit in uniform access to higher-level
structures.


Program Committee
-----------------

Yorick Wilks
Hamish Cunningham
Wim Peters
Remi Zajac
Roberta Catizone
Paola Velardi
Maria Teresa Pazienza
Roberto Basili
Bran Boguraev
Sergei Nirenburg
James Pustejovsky
Ralph Grishman
Christiane Fellbaum



This archive was generated by hypermail 2.0b3 on Fri Dec 18 1998 - 20:38:18 PST