The 2nd dimension.

Friday, January 6, 2017

Derived structure of the passive.

The former object becomes the new subject. That much is clear: the new subject has all the properties one could reasonably associate with a subject -- number agreement with the verb and subject raising from complement sentences, for instance.

What happens to the former subject is a more interesting question. Here are three theoretical answers:

I. It may not be represented overtly at all, in which case the original subject was an abstract thing called "UNSPEC" (short for unspecified). This is argued by McCawley in The Syntactic Phenomena of English.

II. It may become part of a manner adverb with "by". This is argued by Chomsky in Aspects of the Theory of Syntax. The reasoning here is that we need to explain the "fact" that only verbs that are subcategorized to co-occur with manner adverbs can be passivized. Unfortunately for Chomsky's proposal, this is not so. Cf. "Ohio is bounded on the north by Lake Erie."

III. In Relational Grammar (by Postal and Perlmutter in various articles), the former subject is a chomeur, meaning it no longer bears any grammatical relation in the clause. I like this theory, because of an interesting generalization discovered by McCawley (reported in the same reference as above).

McCawley undertakes a detailed examination of the structure of passive sentences, and finds that the passive by-phrase can turn up in several different places -- in fact, anywhere it will fit into the derived passive structure.

Now, for those theories that associate the internal structure of clauses with the order in which verb-functions apply to their arguments (like Categorial Grammar and HPSG) and further associate this order with grammatical relation (like HPSG and my theory, 2psg), McCawley's fact is predicted.

Since a by-phrase chomeur bears no particular grammatical relation to the verb, the order in which the verb-function applies to this argument is arbitrary, so you should get a number of different derived structures in the passive.

I believe I am the first to have noticed this convergence between McCawley's theory and Relational Grammar.

Wednesday, January 4, 2017

The vertical dimension of grammatical structure.

Substitution versus Replacement

Hans Reichenbach in his classic Elements of Symbolic Logic distinguishes between substitution for a variable and replacement of an expression. Substitution is uniform, in the sense that the same value must be substituted for each instance of a variable in a formula, while replacement is not uniform. When a symbol that occurs at several places in a formula is replaced, each instance can be replaced by something different.

In the classic theory of PSG (by which I mean context free phrase structure grammar), the abstract symbols are replaced, not substituted. In my own adaptation of PSG, however, substitution rather than replacement is used. In the formal systems we are all most familiar with, high-school algebra for instance, substitution is the rule as we proceed step by step to prove results in the system. The rule of substitution is connected with our notion of a variable representing some particular value: we can substitute various values for the variable that appears at one or more places in a formula.

It appears, at first glance, that in phrase structure grammar, it is replacement that we need, rather than substitution. For instance, suppose we begin with a phrase structure rule
  S -> NP gives NP NP
to describe the structure of a sentence "John gives Mary the book". If we used substitution to derive the particular NPs in the example, we could get
  John gives John John
  The book gives the book the book
  Mary gives Mary Mary
which is of course not what we want. Instead, in PSG, we use a rule of replacement, so we can replace the symbol NP with three different values.
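
To make the contrast concrete, here is a rough Python sketch (purely illustrative; the helper names are my own and not part of any grammar formalism) of the two operations applied to the rule above:

RULE = ["NP", "gives", "NP", "NP"]

def substitute(rule, symbol, value):
    """Uniform substitution: every instance of `symbol` receives the same value."""
    return [value if tok == symbol else tok for tok in rule]

def replace_once(rule, symbol, value):
    """Replacement: only one (here, the leftmost) instance of `symbol` is rewritten."""
    out = list(rule)
    out[out.index(symbol)] = value
    return out

print(" ".join(substitute(RULE, "NP", "John")))
# John gives John John  -- the unwanted result

step = replace_once(RULE, "NP", "John")
step = replace_once(step, "NP", "Mary")
step = replace_once(step, "NP", "the book")
print(" ".join(step))
# John gives Mary the book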

But wait! Isn't that hasty? In the general formula for an indirect object sentence, there aren't really three NPs, since the arguments of "gives" differ along the vertical dimension. Really, the general form here is:
  S -> NP1 gives NP3 NP2
using 1, 2, 3, in roughly the way they are used in Relational Grammar. (I should mention that the canonical treatment in RG actually treats the NP3 in my example as a 2 which has been advanced from an original 3.)

So I propose that PSG is an algebraic system of the familiar sort with constants (in place of the terminal symbols of ordinary PSG) and variables (in place of the non-terminal symbols of ordinary PSG). It looks at first like a different sort of system with "rewrite rules" only when we neglect the vertical dimension which distinguishes variables bearing different grammatical relations.

Because natural languages appear not to have any constructions like the hypothetical *"John gives John John" mentioned above, I also adopt as a general principle the Stratal Uniqueness Law (hereafter SUL) of Relational Grammar, which, in my version, does not permit multiple instances of the same variable with the same grammatical relation to occur at the same level of analysis (in the same "stratum", that is).

Weak Generative Capacity

The weak generative capacity of a grammatical theory is the set of strings it characterizes as sentences of the language the theory purports to describe. Not everyone thinks this is a matter of any importance, but nonetheless, that is what I am concerned with here.
A language which is generated by a PSG (i.e., a context free phrase structure grammar) is, by definition, a context free language. Whether natural languages are context free languages is controversial, but I think they are. The variety of generative grammar with a vertical dimension that I will describe below generates only context free languages.

However, if it were not for the SUL (stratal uniqueness law), this would not be so. A grammar with variables, constants, and assignments of strings to variables, can generate languages that are not context free. Below is a simple example. To distinguish assignments of string values to variables from phrase structure rules, I use an arrow "->" for a phrase structure rule (or "production") and an equals sign "=" for a string assignment rule.

This grammar,
  1. S = AA
  2. A = a
  3. A = aA
  4. A = b
  5. A = bA
generates the "copy language", each of whose sentences is some string of a's and b's followed by a copy of that string: {abab, bbabbbab, aaaabaaaab, ...}. The sentences generated are the strings assigned to the variable S.

For instance, from rule 3, using the equality in rule 4, I deduce that A = ab, then using this in rule 1, I can substitute ab for A to get S = abab. It works like a very primitive algebraic system.
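
Here is a small illustrative Python sketch (my own, with made-up helper names) that enumerates short sentences of this grammar by treating the "=" rules as string assignments: rules 2-5 assign A every nonempty string of a's and b's, and rule 1 substitutes the same value for both instances of A.

from itertools import product

def values_of_A(max_len):
    """All nonempty strings over {a, b} up to max_len, the values rules 2-5 assign to A."""
    for n in range(1, max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

def values_of_S(max_len):
    """Rule 1, S = AA, with the SAME value substituted for both instances of A."""
    for w in values_of_A(max_len):
        yield w + w

print(sorted(values_of_S(2)))
# ['aa', 'aaaa', 'abab', 'baba', 'bb', 'bbbb'] -- each sentence is a string followed by its copy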

The copy language is known not to be context free -- it cannot be generated by any PSG. (This fact is the basis of Shieber's famous demonstration that the Swiss-German dialect he studied has a non-context free construction.)

Thus, if the theory is to be context free, in the sense that only context free languages can be generated, grammars like the above example must be disallowed. The SUL does this, since it prohibits rule 1 of the above example. To conform to the SUL, a rule of the grammar may not have multiple instances of the same variable, so the two instances of the variable A in rule 1 are not legal.
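
As a quick illustration (a sketch of my own, not part of the formal theory), the SUL can be pictured as a filter on the right sides of rules:

VARIABLES = {"S", "A"}

def obeys_sul(right_side):
    """True if the right side contains at most one instance of each variable."""
    vars_used = [sym for sym in right_side if sym in VARIABLES]
    return len(vars_used) == len(set(vars_used))

print(obeys_sul(["A", "A"]))    # False: rule 1, S = AA, is excluded by the SUL
print(obeys_sul(["a", "A"]))    # True:  rule 3, A = aA, is allowed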

Vertical Levels

There are a lot of pieces to this theory, and it's hard to figure out what should come first, but I think it is time now to give an informal description of the various levels, before trying to be precise about formal aspects of the grammar. I begin by giving a name to this variation of phrase structure grammar, "2psg", which is short for 2 dimensional phrase structure grammar. The first dimension is the ordinary left-to-right, or earlier-to-later, ordering of symbols described by concatenation. The second dimension is the vertical dimension given in tree structures, with least embedded parts of a structure coming near the top of a tree and most embedded parts coming down closer to the leaves at the bottom of a tree diagram.

Every grammatical phrase type -- S, NP, Aux, Adv, PP, ... -- has five different flavors, which are associated with varying heights in a tree diagram. The flavors are 0, 1, 2, 3, Cho, where 0 is highest, 3 is lowest, and Cho (for the chomeur of Relational Grammar) has variable height.

There is a lawful relationship between the height of a variable and the string assigned to that variable: in ordinary constructions not involving raising, a variable cannot have as value a string containing variables with a greater height. For instance, an S2 (an infinitive or POSS-ing nominalization) cannot have as value (or "contain") an NP1 (a subject). I will defer formalizing this requirement, but I mention it here because it may be intuitively helpful in interpreting the following informal remarks about variable heights.
So here are notes about what I've worked out about the variables in this theory, for English grammar only, so far.

NP0 -- a vocative.
NP1 -- a subject.
NP2 -- a direct object.
NP3 -- an indirect object.
NP -- a chomeur (unclear, but perhaps the displaced original NP2 in a double object construction).
S0 -- a root sentence, in the sense of Emonds. Does not like to be embedded.
S1 -- an ordinary finite declarative clause. (Cannot contain NP0, Aux0, ...)
S2 -- an infinitive or gerund nominalized clause. (Cannot contain NP1, ...)
S3 -- a derived nominalization. (Cannot contain NP2, NP1, ...)
S -- an S chomeur, "that"-clause complement
Adv0 -- a performative adverb, referring to the speaker's act. (E.g. the adverb in "Frankly, my dear, I don't give a damn.")
Adv1 -- a "sentential" adverb, such as "probably/necessarily/..." that qualifies the truth of a sentence
Adv2 -- a moral adverb (Vendler's term), which expresses a judgment, approval, or assigns responsibility
Adv2 -- a manner adverb, which describes the way something happens
Adv3 -- a degree adverb, such as "completely", "almost"
Adv -- not known
Aux0 -- inverted auxiliary verb
Aux1 -- uninverted finite auxiliary (includes modal auxiliaries)
Aux2 -- nonfinite progressive "be/been"
Aux3 -- nonfinite passive "be/been"
Aux -- nonfinite perfect auxiliary "have" (chomeur displaced from past tense)
PPx -- It's not clear to me how to describe time and place PPs, so I'll skip these, except for the PP passive by-phrase chomeur, which works out especially nicely in this theory.
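
Using the flavor heights just listed, here is a rough sketch (my own encoding, not the official formalism) of the containment restriction mentioned before the list: a variable may not have as value a string containing variables of greater height.

def height(var):
    """Height of a variable like 'NP1' or 'S2'; None for a bare chomeur like 'NP'."""
    return int(var[-1]) if var[-1].isdigit() else None

def value_ok(var, variables_in_value):
    """True if no variable in the value is strictly higher (numerically smaller) than `var`."""
    h = height(var)
    if h is None:                      # chomeurs impose no height restriction here
        return True
    return all(height(v) is None or height(v) >= h for v in variables_in_value)

print(value_ok("S2", ["NP2", "NP3"]))   # True:  an infinitive may contain objects
print(value_ok("S2", ["NP1"]))          # False: it may not contain a subject (NP1)
print(value_ok("S3", ["NP2"]))          # False: a derived nominalization may not contain a direct object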

Derived Phrase Structure Rules

As PSG was classically presented, a demonstration that a sentence (or some other category) is generated consists in giving a sequence of strings, the first of which is S (or some other initial symbol), and in which each other string in the sequence is derived from the preceding one by using a phrase structure rule to replace some non-terminal (the rule's left side) with the string on the rule's right side. This is not very convenient for stating the part the vertical dimension plays in 2psg (my name for 2 dimensional PSG).
Instead, suppose we allow phrase structure rules derived from those already in the grammar to shorten phrase structure derivations. Here is the rule for deriving a new phrase structure rule: from a rule with a non-terminal symbol A in the string on its right side and another rule A -> x, replace A in the first rule with x to get the new rule (where x is a string of terminal and non-terminal symbols).
Here is a simple little example to illustrate. Given the PSG
  S -> NP VP  
  NP -> John, Mary
  VP -> V NP  
  V -> loves  
we can derive new phrase structure rules:
  VP -> loves NP  
  VP -> loves Mary
  S -> NP loves Mary
  S -> John loves Mary  
  ... and so on. 
 
In a more interesting example, an infinity of new phrase structure rules will be derived. If we want to know whether a sentence is generated by this grammar, we need only look among the phrase structure rules to see whether the sentence is on the right side of a phrase structure rule with S on its left side.
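
For concreteness, here is a toy Python sketch (illustrative only; the representation of rules is my own) that closes the little grammar above under rule derivation and then checks membership by looking for the sentence on the right side of an S rule:

RULES = {
    ("S",  ("NP", "VP")),
    ("NP", ("John",)),
    ("NP", ("Mary",)),
    ("VP", ("V", "NP")),
    ("V",  ("loves",)),
}
NONTERMINALS = {lhs for lhs, _ in RULES}

def derive_once(rules):
    """Splice the right side of some rule in for a non-terminal in another rule's right side."""
    new = set(rules)
    for lhs, rhs in rules:
        for i, sym in enumerate(rhs):
            if sym in NONTERMINALS:
                for lhs2, rhs2 in rules:
                    if lhs2 == sym:
                        new.add((lhs, rhs[:i] + rhs2 + rhs[i + 1:]))
    return new

rules = RULES
while True:                        # this grammar is finite, so iteration reaches a fixed point
    bigger = derive_once(rules)
    if bigger == rules:
        break
    rules = bigger

print(("S", ("John", "loves", "Mary")) in rules)   # True: the sentence is generated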

So, we no longer need the mechanism of a phrase structure derivation to describe how a PSG generates a sentence, given that we can derive new phrase structure rules. With this preliminary out of the way, I can state the condition which ensures in 2psg that constituents which are lower along the vertical direction are more deeply embedded in a constituent tree.

Derived Assignments
 
Here is the preceding example PSG, given earlier to illustrate derived phrase structure rules, but made into a 2psg by using assignments of string values to variables, by deriving new assignments through substituting previously assigned values for variables, and by making some other minor changes:

Basis (lexicon):
  1. S0 = S1  
  2. S1 = NP1 loves NP2  
  3. NP1 = John  
  4. NP2 = Mary  
  5. NP1 = Mary  
  6. NP2 = John 
 
Some other assignments derived by substituting values of variables:
  7. S1 = NP1 loves Mary (using the value assigned in 4 to substitute in 2)  
  8. S1 = John loves Mary (using the value assigned in 3 to substitute in 7)  
  9. S0 = John loves Mary (using the value assigned in 8 to substitute in 1) 
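
A minimal sketch of the derivation in 7.-9., done by uniform substitution of previously assigned values for variables (the code and helper names are my own illustration, not part of the theory):

def substitute(target, var, value):
    """Substitute `value` (a list of symbols) for every instance of `var` in `target`."""
    out = []
    for sym in target:
        out.extend(value if sym == var else [sym])
    return out

assignment_2 = ["NP1", "loves", "NP2"]                # assignment 2
step7 = substitute(assignment_2, "NP2", ["Mary"])     # assignment 4 substituted into 2
step8 = substitute(step7, "NP1", ["John"])            # assignment 3 substituted into 7
step9 = substitute(["S1"], "S1", step8)               # assignment 8 substituted into 1 (S0 = S1)
print(step9)                                          # ['John', 'loves', 'Mary']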
 
In principle, the constants in this example are phonemes, but since phonology is not part of this discussion, I have not written out the phonemic forms of the English words in the example. Please take the words and phrases in conventional orthography, in this and other examples, as standing for the appropriate strings of phonemes.

Earlier, I gave as my goal the formulation of a revision of PSG which describes the vertical dimension of natural language. So far, I'm on track. Corresponding to the above example of a 2psg, there is a PSG which generates the same language, which illustrates that 2psg is also a context free theory. I can find a PSG which describes the same set of sentences (well, there are only four) by replacing the "=" sign in the string assignments with arrows, referring to the variables as non-terminal symbols, referring to the constants (the phonemes) as terminal symbols, and treating the derivation in 7.-9. as giving derived phrase structure rules of the sort given earlier.

Other requirements of PSG carry over here, with terminological adjustments: the numbers of basic assignments, of variables, and of constants are all finite.

I have several revisions yet to make before arriving at a characterization of 2psg, but I shall try to preserve this property of the theory: it is essentially a variety of PSG, that is, context free phrase structure grammar.

Categorial Functions

In Categorial Grammar (CG), popular among logicians interested in natural language, the structures of language expressions are given as pronunciations (or spellings) together with their categories. A tree is built by combining the pronunciations of daughter nodes somehow, perhaps by concatenation, to get the pronunciation of the mother node, and by applying the category of one daughter, considered as a function, to the category of the other daughter, considered to be an argument of that function.
I am now just a heartbeat away from a version of this CG theory, which I need in order to formulate the notion of a constituent structure tree with a vertical dimension corresponding to the height of variables.

Using the example of the previous section, I begin by writing assignment statements as small trees, annotated as labeled bracketings with the variable to which a value is assigned as the mother node and written immediately after the left bracket. I call these "forms".

Basis (lexicon):
  1. [S0 S1]  
  2. [S1 NP1 loves NP2]  
  3. [NP1 John]  
  4. [NP2 Mary]  
  5. [NP1 Mary]  
  6. [NP2 John] 
 
And now I want to think of the derivation of new assignments, now called forms, as done by applying a function (the form in which the substitution is made) to an argument form, which says what string will be substituted for what variable. Using the usual "function(argument)" notation for the application of a function to an argument, the derivation of the earlier example now looks like this:
  7. [S1 NP1 loves Mary] = [S1 NP1 loves NP2]([NP2 Mary])  
  8. [S1 John loves Mary] = [S1 NP1 loves Mary]([NP1 John])  
  9. [S0 John loves Mary] = [S0 S1]([S1 John loves Mary]) 
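
Here is a small illustrative sketch (mine; the class name is invented for the illustration) of forms acting as functions, reproducing the derivation in 7.-9.:

class Form:
    def __init__(self, var, body):
        self.var, self.body = var, list(body)

    def __call__(self, arg):
        """Apply this form to an argument form: substitute the argument's body for its variable."""
        new_body = []
        for sym in self.body:
            new_body.extend(arg.body if sym == arg.var else [sym])
        return Form(self.var, new_body)

    def __repr__(self):
        return "[" + self.var + " " + " ".join(self.body) + "]"

f1 = Form("S0", ["S1"])                     # form 1
f2 = Form("S1", ["NP1", "loves", "NP2"])    # form 2
f3 = Form("NP1", ["John"])                  # form 3
f4 = Form("NP2", ["Mary"])                  # form 4

print(f1(f2(f4)(f3)))                       # [S0 John loves Mary]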
 
Putting the derivation in 7.-9. into tree form gives my approximation to a constituent structure tree of the usual sort:

   [S0 John loves Mary]  
      /            \  
9. [S0 S1]       [S1 John loves Mary]  
                    /              \  
8.         [S1 NP1 loves Mary]  [NP1 John]  
                  /         \   
7.      [S1 NP1 loves NP2]  [NP2 Mary] 
 
In CG, the category of a form is written separately from the pronunciation part, and while there is no apparent need to do that here, for the sake of comparing theories, I make the definitions:

The pronunciation part of a form (that is, the constants, which are phonemes) is a constituent, and the remainder of the form (the variable part) is the category.

For the case where the constant part of a form is a continuous string of phonemes (in general, it need not be), I use the notation [var ... __ ...] for a category, where the underline stands for the constituent. For instance, in the above example, for the form [S1 NP1 loves NP2], the category is [S1 NP1 __ NP2], abstracting away the constituent "loves".
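
As an illustration (a sketch of my own, assuming a fixed list of variable symbols for the example), the constituent and the category of a form can be read off mechanically:

VARIABLES = {"S0", "S1", "S2", "NP0", "NP1", "NP2", "NP3"}

def constituent(body):
    """The constant part of the form's body."""
    return " ".join(sym for sym in body if sym not in VARIABLES)

def category(var, body):
    """The form with each continuous run of constants abstracted to an underline."""
    out, in_constant_run = [], False
    for sym in body:
        if sym in VARIABLES:
            out.append(sym)
            in_constant_run = False
        elif not in_constant_run:
            out.append("__")
            in_constant_run = True
    return "[" + var + " " + " ".join(out) + "]"

body = ["NP1", "loves", "NP2"]
print(constituent(body))        # loves
print(category("S1", body))     # [S1 NP1 __ NP2]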

Cyclic Conditions on Substitution
  1. A variable of a form is not subject to substitution when the form has some other variable of lesser height (or, that is, with greater obliqueness). This condition is needed to make the connection between the height of a variable (0, 1, 2, 3) and the height of a constituent in the derivation tree. The condition corresponds to the requirement in transformational grammar that processing starts at the bottom of a constituent structure tree.
  2. A form to which any rules such as substitution are applicable may not be an argument of a substitution function. This condition corresponds to the requirement in transformational grammar that cyclic transformations start applying at the bottom of the constituent structure tree. It is required in 2psg to prevent violations of the SUL (stratal uniqueness law) from arising through the substitution of string values for variables.
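
Here is a rough sketch of my reading of Condition 1 (illustrative only; the helper names are mine): a variable may be substituted for only if no other variable of the form, including the variable labelling the form itself, has lesser height (greater obliqueness).

def height(var):
    return int(var[-1])      # 0 is highest, 3 is lowest (most oblique)

def substitutable(target, mother, other_body_vars):
    """May `target` be substituted for in the form labelled `mother`?"""
    others = [mother] + list(other_body_vars)
    return all(height(v) <= height(target) for v in others)

print(substitutable("NP2", "S1", ["NP1"]))   # True:  substitute for the object first
print(substitutable("NP1", "S1", ["NP2"]))   # False: NP2 is of lesser height
print(substitutable("NP1", "S2", []))        # False: the raising case [S2 NP1 to explode]
print(substitutable("NP0", "S0", []))        # True:  a topic can be resolved in an S0
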
Coordination

Most ordinary two-part coordinations can be described by this rule:
  • Constituents can be derived by putting "and" between two constituents of the same category, and the category of the new coordinate constituents is the same as each of the two original constituents.
In PSG, this is usually taken to characterize a schema of phrase structure rules, such as V -> V and V, for example. However, such an account cannot be carried over into 2psg: for one thing, a form [V V and V] would break the SUL, since there is more than one instance of the same variable in the form; for another, there is no grammatical type V in 2psg (nor is there a VP, V-bar, N, or N-bar); and for yet another, that phrase structure rule is wrong anyway.
The phrase structure rule is wrong, because we cannot coordinate verbs with different valences:
  1. John wiped the window.
  2. John disappeared into the mist.
  3. *John wiped and disappeared the window.
Accordingly, I take the above rule for coordination as a basic rule of 2psg (not a form), so that we can describe the coordination of verbs of the same valence, for instance, making use of the previously given definitions of "constituent" and "category" as respectively the constant and variable parts of a form.
Here is an example:
  1. Given forms [S1 NP1 wiped NP2] and [S1 NP1 broke NP2], since these have the same category [S1 NP1 __ NP2], we can
  2. form a constituent "wiped and broke" of this same category, [S1 NP1 __ NP2],
  3. which is the form [S1 NP1 wiped and broke NP2],
  4. and substituting in this using [NP2 the window] then [NP1 John], gives
  5. "John wiped and broke the window" of category [S1 __].
Eversion

When a finite clause, an S1, becomes more oblique, an S2 or S3, what happens to arguments it contains of lesser obliqueness? In raising-to-subject constructions, we would have a form containing an argument of lesser obliqueness than the form itself. For instance:
  [S2 NP1 to explode]
where the cyclic conditions on substitution I gave above prohibit substituting for the variable NP1, since the form has a more oblique variable, S2. When this form is made an argument of
  [S1 seemed S2]
the movement of the NP1 up into the higher S1, giving
  [S1 NP1 seemed [S2 to explode]]
can be thought of as a solution to this difficulty, since now the original subject of "explode" has found a home in the higher clause where it can be legally substituted for -- it resides in an S1, a finite clause, which is no more oblique than NP1, a subject.
I refer to such a change when part of a form must move outside it as "eversion" -- the form is turned partially inside out.

Topicalization

Several questions about why topicalization works the way it does can be answered in 2psg. I begin with:
  Why topics are raised
A topic gives what a sentence is about, and this concerns the performance of a speech act, so we expect topics to have a grammatical relation 0, like vocatives and performative adverbs.
 [S1 NP1 ate NP2 on Sundays] becomes by topicalization of the object of "ate":  
 [S1 NP1 ate NP0 on Sundays] and, applying this to the argument [NP1 we]:  
 [S1 we ate NP0 on Sundays]  
however, this does not give a pronounceable form, because the variable NP0 is less oblique than the variable S1. The form can still be an argument of the function [S0 S1],
  [S0 we ate NP0 on Sundays] = [S0 S1]([S1 we ate NP0 on Sundays]) by substitution  
and now, since NP0 is no less oblique than S0, it can be substituted for by, say, [NP0 beans]. This gives a constituent structure:

  [S0 we ate beans on Sundays]  
            |                \
[S0 we ate NP0 on Sundays]  [NP0 beans]  
    /           \
[S0 S1]   [S1 we ate NP0 on Sundays]  
                 |                  \
     [S1 NP1 ate NP0 on Sundays]     [NP1 we]  
                 |  by topicalization  
     [S1 NP1 ate NP2 on Sundays]
 
In this example, "on Sundays" is really an argument, but I suppressed some detail to simplify the example. Also, before substituting for NP0, the constituent "we ate ... on Sundays" is a discontinuous constituent, but I am not sure that is actually possible -- it may be that the NP0 has to be moved to the end or to the beginning, to make the remainder a continuous constituent:
  We ate on Sundays, beans.   
  Beans, we ate on Sundays. 
 
At any rate, the natural place for performative constituents in English is at the beginning of a clause, so at least the latter is an option.
So, topics are dependencies that can only be satisfied in root sentences, S0, and when an embedded argument becomes a topic, the topic has to be "everted" -- moved out of its embedded position in the sentence structure.

The lexical category of particles.


 "Particles" have no part of speech.


Earlier descriptions of subcategorization

In that first generation of great young descriptivists from MIT, Robert Lees gave arbitrary and artificial category symbols to express restrictions that tree neighbors place on heads. I'm not sure I've got the following example exactly right, and I suspect Lees wrote it tongue-in-cheek:
  • Vt32 -> "give" (Robert Lees, Grammar of English nominalization)
Realizing that CFG places no restriction on the names of non-terminal symbols (after all, he invented CFG), Chomsky proposed a more natural notation to say, within the names of the categories of heads, what near neighbors could be present:
  • V -> CS (Chomsky, Aspects of the Theory of Syntax)
means that the non-terminal "V" is replaced with a name made from a set of feature specifications saying what tree sisters could be present for the specific verb that replaced this non-terminal symbol.

However, this proposal of Chomsky's has an odd property that makes it seem to me to be artificial. A subcategorization restriction has to be said twice. You wind up with trees having, for instance, a transitive verb dominated by a symbol whose name says it's transitive and, also, a following NP object. We shouldn't have to do it twice.

A better way is simply not to use special categories like "V", "N", "P" for heads. There is no syntactic evidence from language that I know of for their existence (though there may be morphological evidence). Their only purpose is to let us keep rules concerning words separate from rules that describe phrase structure. I guess the reason for this is that dictionaries are customarily different books from grammars. But this is not evidence from language.

Categorial Grammar (CG) does not require category symbols for heads comparable to "V", "N", "P" and so on. This lets us avoid the issue of how to handle the subcategorization of "V", ...

Illustration of Categorial Grammar

Logicians concerned with the logic of natural language have a fondness for CG, invented by the logician Ajdukiewicz, because the grammatical structure of expressions corresponds in a straightforward way to their semantic structure. Grammatical categories are structures built up with the slash connective, which gives the category of a syntactic function after the slash and the category of the function value before the slash:

"I tidied up the room", S  
    /            \  
"I", NP    "tidied up the room", S/NP  
                    /               \  
         "tidied up", (S/NP)/NP    "the room", NP 
 
Each node of the CG tree for a syntactic structure consists of a constituent and its category. In place of a "V" for the category of "tidied up", we get the category "(S/NP)/NP", which can be read as meaning that this constituent has two NP dependencies which, when satisfied, will yield a constituent of category S. There is no "V" or "VP". In defense of the CG version of grammatical structure, notice that it correctly predicts that "and" cannot be used to conjoin a transitive verb, with category (S/NP)/NP, with an intransitive verb, which would have the different category S/NP, because only expressions of the same category can be conjoined.

This illustration I have given does not get right the linear order of a grammatical function expression and its argument expression. More careful formulations of CG can deal with that detail. Emmon Bach proposed a special version of the slash operation, "right wrap", which we could appeal to in order to get the NP "the room" to the left of the "up" when "tidied up" combines with "the room".
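
For comparison, here is a toy sketch (my own, with invented helper names) of plain CG function application by concatenation; as just noted, it does not get the linear order right, and no "right wrap" is attempted.

def apply(function, argument):
    """Apply an expression of category X/Y to an expression of category Y, yielding X."""
    fn_string, fn_cat = function
    arg_string, arg_cat = argument
    result_cat, needed_cat = fn_cat              # fn_cat is a pair (X, Y) standing for X/Y
    assert needed_cat == arg_cat, "category mismatch"
    return (fn_string + " " + arg_string, result_cat)

tidied_up = ("tidied up", (("S", "NP"), "NP"))   # (S/NP)/NP
the_room = ("the room", "NP")
i = ("I", "NP")

vp = apply(tidied_up, the_room)                  # ("tidied up the room", ("S", "NP")), i.e. S/NP
print(apply(vp, i))                              # ('tidied up the room I', 'S') -- wrong word order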

I find the slash operators that are required to make CG work to be artificial, but there is a way to adapt CFG to get the advantages of CG. Noticing some similarity between the information provided at each node of a CG tree and that provided in a rewrite rule of PSG, let us redo the CG tree above with PS rules at each node:

Categorial Grammar partially converted to PSG

S -> "I tidied up the room"  
    /            \  
NP -> "I"    S -> NP "tidied up the room"  
                    /                \  
         S -> NP "tidied up" NP    NP -> "the room"
 
In this form, what would be a derivation in a CFG has been expressed in tree form, with each non-leaf node produced by applying one daughter rewrite rule to the other daughter rewrite rule. Since what were the constituent and category at each node in the CG tree are no longer in two separate parts, if we still wish to refer to constituent and category, we must define those to be the pronunciation part and the non-pronunciation part of a PS rule, respectively.

This revision is unlike a real CG derivation tree in two respects. (1) There is no way to pick out which non-terminal in a PS rule represents the argument. (2) The rewrite operation of CFG is not technically a function, since applying it can give ambiguous results.
As for (1), in an earlier post, How can we describe the vertical dimension ..., I argued that the non-terminals of CFG do have a place along a height scale, and when information about grammatical relations is added, we can pick out the argument of a PS rule as being a lowest or most oblique non-terminal. In the above illustration, we now have

S1 -> NP1 "tidied up" NP2
 
where the NP2 is the lowest point and consequently the non-terminal representing the argument. As for (2), in that post I also described how the rewrite operation of CFG can be reinterpreted as a substitution function. So now we have a full reconstruction of CG.

Representation of discontinuous constituents

As a side benefit of this reworking of CG, it becomes possible to describe discontinuous constituents without making any special assumptions:

S1 -> NP1 "tidied" NP2 "up"
 
Note that "up" is given no category here. There is no longer anything corresponding to the "P" of CFG, nor does "up" have any category in the above suggested sense of "category" as meaning the non-pronunciation part of a CFG rule.

Monday, January 2, 2017

What 2psg is.


2psg is a new linguistic theory of mine. It is like context free phrase structure grammar (psg) except it has a 2nd, vertical, dimension.  Hence the "2" of the name "2psg".

The vertical dimension of language grammar corresponds closely to the grammatical relations of Postal and Perlmutter's theory of Relational Grammar. Specifically, the 1, 2, 3 grammatical relations are heights along this vertical dimension. They are characterized as degrees of obliqueness in the grammatical theory HPSG, but here they will play a more fundamental role.

Linguists will notice that I have taken a number of ideas from familiar grammatical theories.  The easiest place for me to begin is Categorial Grammar (cg), but I will have to delay discussion of the fundamental idea of a vertical dimension in grammar for a bit.  Please be patient.

2psg is a form of generative grammar which is not modular.  Although you can make definitions to distinguish between grammar, lexicon, and phonology if you like, for convenience of reference, those distinctions are not fundamental to the theory.  There is also no sequential application of grammatical rules in the usual sense.

I begin in the next post with a proposal to adapt Categorial Grammar to context free phrase structure grammar.