Bob Carpenter (bc10+@andrew.cmu.edu)
Fri, 1 Sep 1995 11:38:37 -0400 (EDT)
This message was inspired in part by a note from Bob Levine to me about the nature of phrase structure grammar, and in part, the note from David Dowty to the CG mailing list (those OSU people are awfully provocative these days!). [Note: I just started a CG mailing list and those who are interested should send me email about being added.] When I'm wearing my CG hat, I have a nice answer for you -- there is NO evidence for ANY phrase structure. I still tend to think of HPSG as a kind of first approximation to what's going on in CG with respect to valence (where the rules of use, like the subcat principle, are completed with rules of proof, like Lambek's abstraction scheme). The only kind of evidence I trust in theoretical linguistics is the form-meaning relation that gets generated. I don't see that we have good hard evidence for anything else. This says nothing about phrase structure at all. As we know from formal language theory, various kinds of transformations can be applied to grammars in order to get grammars that generate the same set of form/meaning pairs (usually just stated as generating the same language in a formal language book, but this amounts to the same thing). Let's take a recurring for-instance. In GPSG, slash was originally introduced by an empty category and then percolated by the output of the slash-passing metarule: C/C --> empty-cat if C0 --> C1...Cm...Cn then C0/C --> C1...Cm/C...Cn With just a simple grammar like S --> NP VP VP --> TV NP and the obvious lexical entries, we get: sandy likes ----- ----- ----- NP TV NP/NP ------------- VP/NP --------------------- S/NP Then it was 'discovered' that a metarule could be used to eliminate empty categories: if C0 --> C1...Cm...Cn then C0/Cm --> C1...C(m-1) C(m+1)...Cn Applying to the VP-->TV,NP rule yields VP/NP-->TV Then the derivation above goes through simplifies to just: sandy likes ----- ----- NP TV ------- VP/NP -------------- S/NP The thing to note about this change is that it is just an instance of what is known as 'partial evaluation' in the programming languages literature. That is, the combination of VP/NP --> TV NP/NP NP/NP --> empty gives us the information that VP/NP --> TV as a derived rule (just as we think of derived rules in logic). Very often, programs can often be greatly improved in terms of efficiency if they are partially evaluated. But the key fact is that a partially evaluated program is extensionally equivalent to the program it is derived from -- they compute the same function/determine the same relation. Somewhat surprisingly, this same notion was 'rediscovered' in Chapter 9 of the new HPSG book, when traces are again banished. But if we only consider the relations between expressions and categories (and meanings), nothing changes under partial evaluation. Another instance of partial evaluation would be as follows: S --> NP VP NP --> Det N lead to S --> Det N VP An idea like this led to the notion of 'liberation' in GPSG, but with one catch, the domains of LP constraints get changed. So you can't always guarantee tha partial evaluation of just the ID component of an ID/LP grammar will lead to an equivalent result. Thus if the above are read as ID schemes, we should NOT be able to do the partial evaluation as we could in an ordinary phrase structure grammar. Until we have better evidence for phrase structure itself, say in terms of psycholinguistic reality, I don't see that we will be able to argue in favor of a grammar or its partially evaluated form. In AI and psychology, the notion of partial evaluation often goes under the heading of 'chunking'. If you reason enough times using the inferences A->B and B->C then you will create a new primitive inference pattern A->C which will allow you to derive C from A in one step. Realizing that this is possible makes it rather difficult to try to count inference steps. We do this kind of chunking of reasoning steps all the time. Thus it might be the case that a child learns a grammar with an S->NP VP rule and a NP->Det N rule, but by the time they mature, they've compiled it into an S->Det N VP rule scheme. Or perhaps people have their language chunked up differently, leading to equivalent (or roughly equivalent) expression/meaning relationships, but wildly different underlying 'phrase structure.' We also know from computer science that there are often more than one way to write a program that does the same thing. Some of the work that Mark Johnson and Ed Stabler are doing on partial evaluation of GB grammars leads to phrase structure systems that look surprisingly like HPSG. For instance, in a (greatly simplified) tree like: Sandy likes (trace) ----- ---- ------- NP V NP --------- VP ----------- S the fact that there is an NP trace awaiting government (or whatever it's awaiting) can be represented on the S node, with a result being that the S node looks an awful lot like 'S with an NP waiting to be governed'. This looks strikingly familiar to S/NP to me. Johnson argues (or at least did two weeks ago in Barcelona) that HPSG is just doing all kinds of program/grammar transformations by hand that should be left to the compiler writer. That is, he thinks linguists would be happier talking about long-distance relationships over trees rather than worrying about slash-passing and decorating all of their nodes explicitly. I argued that it's six-of-one vs. half-a-dozen-of-the-other, but Johnson wasn't buying it. Of course, you can then try to pull an Ivan Sag and say that there are 15 different constructions in languages that are sensitive to whether they contain a gap, and Johnson will (actually did after my best Ivan impersonation) that this is why there are stopping-off points in SPEC position for traces. The linguistic issues get a bit subtle for me at this point, but from a formal point of view, I'll stand by my six-of-one/half-dozen-of-the-other argument. The basic moral is that a non-local grammar can be transformed into a local one using fairly standard techniques, and grammars with empty categories can be transformed into ones without also by standard techniques (pick up any book on formal languages and look at the Chomsky-Normal-Form transform that shows how to eliminate empty categories from any CFG to produce an equivalent one that also has only binary branching). - Bob Carpenter carp+@cmu.edu
This archive was generated by hypermail 2.0b3 on Fri Dec 18 1998 - 20:33:03 PST