LSA.111 HPSG: Optional Lab
Lab Instructions
Preliminary notes
This material is usually taught in a hands-on lab setting with the
instructor present. Since I'm on the other side of the country, I
will be present "virtually" through the bulletin
board. I strongly encourage you to post questions there
frequently, and read the answers to all the questions posted. Grammar
engineering can be really cool, but to get to the cool part, it's
better not to spend hours chasing down a syntax error. My rule of
thumb is if you've pondered something for 10 minutes, then it's time
to post.
Note that the implemented system is different from the textbook
grammar in many ways. The underlying approach is the same, but do
not be surprised by differences in feature geometry, etc. When in
doubt, ask! (Some of these differences come from the exigencies
of implementation, some from the history of the resource we're working
with, and some from simplifications made for this lab. An example of
such a simplification is that we're not positing any lexical rules in
the lab, but rather just creating apparently unrelated entries in many cases.)
In addition, the LKB/Grammar Engineering FAQ might also
prove useful. I recommend reading the guide to TDL syntax. (TDL stands
for "type description language", and it's what LKB grammars are written in.)
This lab will step you through getting a starter-grammar from the
Matrix web site, and then adding case and/or agreement. If your
language has neither (overt) case nor agreement, please contact me
(ideally on the bulletin
board, or by email: ebender at u dot washington dot edu) and we
can work out something else to add.
Preparation
The relevant software has been loaded onto the PCs in the computer
classroom at MIT. The course
page has instructions on how to install it on your own computer
(Windows, Mac OS X, Linux).
The lab preparation instructions detail
the information you need to collect about the language you choose to
work with in order to complete the lab.
If you don't already know how to use emacs, I strongly encourage
you to spend an hour working with the emacs tutorial. Run emacs,
then select the tutorial from the help menu.
Download your starter package
As discussed in class, the Matrix contains a language-independent
core as well as a collection of modules which allow you to customize
it for certain properties of your language. Visit the Matrix
configuration page to create and download a customized version
of the Matrix.
Start up the LKB, and start parsing
- Start the LKB:
- On a Mac or a Windows machine, this should involve double-clicking
on the LKB icon.
- On a Linux machine, start emacs, then type M-x lkb.
(M-x is "meta x", which you type as either Esc followed by x, or Alt-x.)
- Load your grammar:
- From the LKB top menu (the LKB window on Windows/Linux, the
menu called LKB at the top of the screen on a Mac), select "Load > Complete Grammar".
- Navigate to your matrix directory, then to the lkb directory,
and choose the file script.
- Try parsing a sentence:
- From the LKB top menu, select "Parse > Sentence".
- If you filled out the "test sentences" part of the Matrix customization page, one of them should appear by default in the dialogue.
- The menu associated with that field ("prev" on Windows/Linux, just
arrows on a Mac) should show the other one as well.
- Click on "parse".
- Admire the parse tree that appears.
- Explore the menu options on the parse tree/individual nodes
of the parse tree. (On Windows/Linux, one menu option gets you
a window with a larger tree where the individual nodes are clickable.
On a Mac, the initial window has clickable nodes.)
- Try generating from the semantic representation of the sentence:
- Windows/Linux: Select "generate" from the pop-up window on the
small tree.
- Mac: Select "generate from edge" from the pop-up menu on the
S node at the top of the tree.
Add some vocabulary
- Using emacs, open the file lexicon.tdl in your matrix
directory.
- Copying the form of the entries that are there, add some lexical
entries, such as other case forms of the nouns, other agreement forms
of the verbs, etc. NB: sleep and sleeps should
have the same predicate name.
- Add one or two first, and then reload the grammar (LKB Top > Load > Reload Grammar).
- Look at the LKB window and observe any error messages that are printed
out there. Be sure to scroll up to catch any that may have scrolled off.
- Debug as necessary.
- If you used a lot of different words in your test suite file,
hold off on adding them in bulk until you've got case and agreement
in your grammar below (or you'll find yourself redoing a bunch of
work).
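As a rough illustration of the steps above, two agreement forms of one verb might look like the following in lexicon.tdl. This is only a sketch: the type name intransitive-verb-lex, the STEM feature, and the SYNSEM.LKEYS.KEYREL.PRED path are assumptions about what a Matrix starter lexicon typically contains, and the predicate name "_sleep_v_rel" is made up for the example. Follow the shape of the entries actually present in your own file.

```tdl
; Hypothetical entries; mirror the entries already in your lexicon.tdl.
sleep := intransitive-verb-lex &
  [ STEM < "sleep" >,
    SYNSEM.LKEYS.KEYREL.PRED "_sleep_v_rel" ].

; A second form of the same verb: different STEM, same predicate name.
sleeps := intransitive-verb-lex &
  [ STEM < "sleeps" >,
    SYNSEM.LKEYS.KEYREL.PRED "_sleep_v_rel" ].
```

Because the two entries share a predicate name, they contribute the same semantics, which is what lets the generator treat them as forms of one word.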
Parse again
- Reparse your original sentence, and generate from its semantic
representation.
- If you've added other case or agreement forms for the words
in that sentence, your grammar should now literally overgenerate.
We'll see how to constrain it below.
Try the batch parse facility
Add case to your grammar
In what follows, when you add or modify a type, the default
location is in your language-specific types file, which I will
call esperanto.tdl. If you need to modify another file,
the directions will say so explicitly.
If your language marks case via distinct forms of nouns
- Define a feature CASE which is
appropriate for the type noun, as follows:
noun :+ [CASE case].
(The notation :+ specifies that you're adding
information to a type already defined, in this case, in
matrix.tdl.)
- Now define the type case, and appropriate
subtypes for it:
case := avm.
nom := case.
...
(The notation := specifies that the type
named on the left is a subtype of the type named on the right.)
- Next create two subtypes of noun-lex, each with
a distinct value of SYNSEM.LOCAL.CAT.HEAD.CASE. If your
grammar already has subtypes of noun-lex to account for
different patterns of determiner obligatoriness/optionality, cross-classify
these types with the case types:
obl-spr-nom-noun-lex := obl-spr-noun-lex & nom-noun-lex.
- Edit your noun entries in lexicon.tdl to instantiate
the subtypes of noun-lex that you created.
- Reload the grammar and fix any syntax errors reported in the
LKB window.
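Putting the noun steps together, the new types and entries might look roughly like this. It is a sketch under assumptions: acc is a hypothetical second case value, the Esperanto forms hundo/hundon are used only because the types file is being called esperanto.tdl, and the STEM and SYNSEM.LKEYS.KEYREL.PRED lines assume your starter lexicon uses those features (check your own entries).

```tdl
; In esperanto.tdl, alongside the definitions of case and nom.
; acc is a hypothetical second case value.
acc := case.

nom-noun-lex := noun-lex &
  [ SYNSEM.LOCAL.CAT.HEAD.CASE nom ].

acc-noun-lex := noun-lex &
  [ SYNSEM.LOCAL.CAT.HEAD.CASE acc ].

; In lexicon.tdl: one entry per case form, sharing a predicate name.
; The predicate name is invented for this example.
hundo := nom-noun-lex &
  [ STEM < "hundo" >,
    SYNSEM.LKEYS.KEYREL.PRED "_dog_n_rel" ].

hundon := acc-noun-lex &
  [ STEM < "hundon" >,
    SYNSEM.LKEYS.KEYREL.PRED "_dog_n_rel" ].
```

If your nouns also differ in determiner obligatoriness, the entries would instead instantiate cross-classified types such as obl-spr-nom-noun-lex.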
If your language marks case via different determiners
- Define the feature CASE for the type noun
as above.
- Create two subtypes of determiner-lex which each bear a
different value for
SYNSEM.LOCAL.CAT.VAL.SPEC.FIRST.LOCAL.CAT.HEAD.CASE. (We're
thus treating the determiners not as bearing case themselves, but as
specifying that they only combine with nouns of the appropriate case.
Note that the SPEC feature allows specifiers and heads to mutually
select each other. To see how this works, examine
basic-head-spec-phrase in matrix.tdl.)
- In lexicon.tdl, modify your lexical entries for
determiners so that they instantiate your new subtypes, rather
than determiner-lex. (A determiner underspecified for
case could still instantiate determiner-lex.)
- Reload the grammar and fix any syntax errors reported in the
LKB window.
- If case is marked only on determiners (and not on nouns, but
possibly on adjectives), nothing further needs to be done for
the nouns.
- If case is also marked on nouns, follow the rest of the
directions above.
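The determiner steps could be sketched as follows. The type names nom-det-lex and acc-det-lex, the case value acc, the predicate name, and the lexical entry features are all assumptions for illustration. The German forms der/den are used as a simplified example that ignores gender and the other uses of those forms.

```tdl
; In your language-specific types file: each determiner type
; constrains the CASE of the noun it specifies.
nom-det-lex := determiner-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPEC.FIRST.LOCAL.CAT.HEAD.CASE nom ].

acc-det-lex := determiner-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPEC.FIRST.LOCAL.CAT.HEAD.CASE acc ].

; In lexicon.tdl (STEM/PRED layout assumed; copy your existing entries).
der := nom-det-lex &
  [ STEM < "der" >,
    SYNSEM.LKEYS.KEYREL.PRED "_def_q_rel" ].

den := acc-det-lex &
  [ STEM < "den" >,
    SYNSEM.LKEYS.KEYREL.PRED "_def_q_rel" ].
```

The determiner itself has no CASE value here; the constraint sits on the noun it selects through SPEC.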
If your language marks case via different adpositions
- Add a feature PFORM appropriate for the type
adp as follows:
adp :+ [PFORM pform].
(The notation :+ specifies that you're adding
information to a type already defined, in this case, in
matrix.tdl.)
- Now define the type pform, and appropriate
subtypes for it:
pform := avm.
subj := pform.
...
(The notation := specifies that the type
named on the left is a subtype of the type named on the right.)
- Create two subtypes of case-marker-p-lex, which
each bear a different value for SYNSEM.LOCAL.CAT.HEAD.PFORM.
- Modify lexicon.tdl so that your case marker lexical
entries instantiate the new subtypes you created.
- Reload the grammar and fix any syntax errors reported in the
LKB window.
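For the adposition route, the new types might look roughly like the sketch below. The value obj, the type names subj-p-lex and obj-p-lex, and the Japanese-style markers ga and o are illustrative assumptions. The entries also assume that case-marking adpositions are semantically empty in your starter grammar and so need only a STEM; if the existing entries in your lexicon.tdl carry more information, copy that instead.

```tdl
; In your language-specific types file, alongside pform and subj.
; obj is a hypothetical second PFORM value.
obj := pform.

subj-p-lex := case-marker-p-lex &
  [ SYNSEM.LOCAL.CAT.HEAD.PFORM subj ].

obj-p-lex := case-marker-p-lex &
  [ SYNSEM.LOCAL.CAT.HEAD.PFORM obj ].

; In lexicon.tdl (entry layout assumed).
ga := subj-p-lex &
  [ STEM < "ga" > ].

o := obj-p-lex &
  [ STEM < "o" > ].
```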
All languages with case
Test your grammar
- Try parsing a sentence with incorrect case on the nouns.
Does it still parse?
- Try parsing a sentence with correct case on the nouns. Does
it still parse? Is the parse tree as you would expect?
- Try generating from the semantic representation of this
sentence. Are you still overgenerating (with respect to case)?
- Debug as necessary.
Add agreement to your grammar
Case agreement was described in the previous section. In this
section, we'll focus on person/number/gender agreement. (If your
language has still other kinds of agreement, contact me.) Unlike
in the textbook, we'll be treating agreement as selectional restrictions
(i.e., we won't be using the feature AGR on verbs). Furthermore,
we'll be "housing" the agreement features on the value of INDEX,
inside CONT (the equivalent of SEM).
- Define features PER, NUM, GEND appropriate for
the type png, as needed. (If your language doesn't have
gender agreement between verbs and subjects/objects or nouns
and determiners, you don't need the feature GEND.)
png :+ [PER person,
NUM number,
GEND gender].
- Define the types person, number, and
gender (again, as necessary), and appropriate subtypes
for each.
gender := avm.
masc := gender.
- Define subtypes of noun-lex for nouns with different
agreement properties.
3sg-noun-lex := noun-lex &
[ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG [ PER third,
NUM sg ]].
- If these agreement properties cross-classify with specifier
optionality or case, create appropriate cross-classified subtypes:
obl-spr-3sg-nom-noun-lex := 3sg-noun-lex & obl-spr-noun-lex & nom-noun-lex.
- Note that for number, if only the determiner actually shows
the distinction, you don't need to create multiple noun entries.
Since gender and person are inherent properties of the noun,
you'll want types for these, even if there is no overt marker
for them on the noun itself.
- Modify your entries in lexicon.tdl to instantiate
the agreement-specifying subtypes.
- Reload your grammar and correct any syntax errors that are
noted in the LKB window.
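Taken together, the agreement types from the steps above might be fleshed out along these lines. This is a consolidated sketch, not something to paste on top of definitions you have already added (gender and masc are repeated from the text). The values first, second, pl, and fem, the type 3pl-noun-lex, and the lexical entry layout are assumptions for illustration; include only the distinctions your language actually makes.

```tdl
; Value types for the PNG features.
person := avm.
first := person.
second := person.
third := person.

number := avm.
sg := number.
pl := number.

gender := avm.
masc := gender.
fem := gender.

; A plural counterpart to 3sg-noun-lex.
3pl-noun-lex := noun-lex &
  [ SYNSEM.LOCAL.CONT.HOOK.INDEX.PNG [ PER third,
                                       NUM pl ] ].

; In lexicon.tdl (STEM/PRED layout and predicate name assumed).
dog := 3sg-noun-lex &
  [ STEM < "dog" >,
    SYNSEM.LKEYS.KEYREL.PRED "_dog_n_rel" ].

dogs := 3pl-noun-lex &
  [ STEM < "dogs" >,
    SYNSEM.LKEYS.KEYREL.PRED "_dog_n_rel" ].
```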
Determiner-noun agreement
Create subtypes of determiner-lex which constrain
the appropriate person/number/gender values inside their SPEC
feature. For example, French la, the feminine singular
determiner, says:
[ SYNSEM.LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.PNG [ NUM sg,
GEND fem ]] > ].
If you already created determiner subtypes for case, cross-classify
these with the png subtypes:
fem-sg-nom-det-lex := fem-sg-det-lex & nom-det-lex.
Modify your entries for determiners in lexicon.tdl
to inherit from your new types.
Reload your grammar and correct any syntax errors that are
noted in the LKB window.
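Written out as a full type plus a lexical entry, the French example might look like this. The type name fem-sg-det-lex comes from the cross-classification example above. The entry layout (STEM, SYNSEM.LKEYS.KEYREL.PRED) and the predicate name are assumptions, so model the entry on the determiners already in your lexicon.tdl.

```tdl
; In your language-specific types file.
fem-sg-det-lex := determiner-lex &
  [ SYNSEM.LOCAL.CAT.VAL.SPEC < [ LOCAL.CONT.HOOK.INDEX.PNG [ NUM sg,
                                                              GEND fem ] ] > ].

; In lexicon.tdl (layout and predicate name assumed).
la := fem-sg-det-lex &
  [ STEM < "la" >,
    SYNSEM.LKEYS.KEYREL.PRED "_def_q_rel" ].
```

With this in place, la should combine only with nouns whose INDEX.PNG is compatible with singular and feminine.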
Verb-subject or verb-object agreement
Verbs agreeing with their subjects and/or objects constrain
the PNG values of the items on their SUBJ and
COMPS lists.
Create subtypes of verb-lex or transitive-verb-lex
and intransitive-verb-lex which state the appropriate
constraints. For example, an English grammar might have:
3sg-verb-lex := verb-lex &
[ SYNSEM.LOCAL.CAT.VAL.SUBJ < [ LOCAL.CONT.HOOK.INDEX.PNG [ PER third,
NUM sg ]] > ].
And this would be cross-classified with the transitive/intransitive
distinction:
3sg-trans-verb-lex := 3sg-verb-lex & transitive-verb-lex.
(Note: The cross-classification gets a little awkward if you have
both subject and object agreement, suggesting that lexical rules
really are the way to go with this kind of phenomenon.)
Modify your verb entries in lexicon.tdl to instantiate
your new agreement types. (A verb that is underspecified for
agreement, like English slept, can still instantiate
a supertype, such as intransitive-verb-lex.)
Reload your grammar and correct any syntax errors that are
noted in the LKB window.
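As a sketch of how the verb types and entries could fit together: the type names 3sg-intrans-verb-lex and 3sg-obj-verb-lex, the entry layout, and the predicate name are assumptions for illustration, and the object-agreement type is only relevant if your language has object agreement.

```tdl
; Cross-classify subject agreement with intransitivity,
; parallel to 3sg-trans-verb-lex.
3sg-intrans-verb-lex := 3sg-verb-lex & intransitive-verb-lex.

; Object agreement constrains the item on COMPS instead of SUBJ.
3sg-obj-verb-lex := transitive-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ LOCAL.CONT.HOOK.INDEX.PNG [ PER third,
                                                               NUM sg ] ] > ].

; In lexicon.tdl (STEM/PRED layout assumed).
; sleeps requires a third person singular subject.
sleeps := 3sg-intrans-verb-lex &
  [ STEM < "sleeps" >,
    SYNSEM.LKEYS.KEYREL.PRED "_sleep_v_rel" ].

; slept is underspecified for agreement, so it uses the supertype.
slept := intransitive-verb-lex &
  [ STEM < "slept" >,
    SYNSEM.LKEYS.KEYREL.PRED "_sleep_v_rel" ].
```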
Test your grammar
- Try parsing a sentence with an agreement mismatch.
Does it still parse?
- Try parsing a sentence with correct agreement.
Does it still parse? Is the parse tree as you would expect?
- Try generating from the semantic representation of this
sentence. Are you still overgenerating (with respect to agreement)?
- Debug as necessary.
Build out your lexicon
You are now in a position to build out your lexicon to
full coverage of your test suite.
- Add entries, periodically reloading the grammar to
check for syntax errors.
- Consider whether you need to add any further types.
- Add types and debug as necessary.
Test your grammar
- Using the batch test facility, test your grammar against
your whole test suite.
- Examine the output file. Do the grammatical sentences all
parse? Do any ungrammatical strings parse?
- In any cases of overgeneration or undergeneration, consider
what might be behind them, and post about it on the bulletin
board.
Write up your results
This lab is not graded, but I would be very interested to see your
results. In addition, you may find that writing things up now is
useful for your own future reference. If you are so inclined, please
write up the following information, and submit it to me (ebender at u
dot washington dot edu) along with your grammar and test suite:
- A description of the relevant grammatical properties of
your language (as collected in the lab preparation).
- A description of the constraints and types you added to
your grammar.
- A description of any ways in which your grammar falls
short (even for this small fragment).
- Any other feedback you might have about the Matrix or
this lab.
The Matrix grows by being challenged by new languages.
I'll endeavor to send feedback on any lab write-ups I receive.