Substitution versus Replacement
Hans Reichenbach, in his classic Elements of Symbolic Logic,
distinguishes between substitution for a variable and replacement of an
expression. Substitution is uniform, in the sense that the same value
must be substituted for each instance of a variable in a formula, while
replacement is not uniform. When a symbol that occurs at several places
in a formula is replaced, each instance can be replaced by something
different.
In the classic theory of PSG (by which I mean context free phrase
structure grammar), the abstract symbols are replaced, not substituted.
In my own adaptation of PSG, however, substitution rather than
replacement is used. In the formal systems we're all most familiar with,
for instance high-school algebra, it is substitution rules that we apply
as we proceed step by step to prove results in the system. The rule of substitution
is connected with our notion of a variable representing some particular
value. We can substitute various values for the variable that appears
at one or more places in a formula.
It appears, at first glance, that in phrase structure grammar, it is
replacement that we need, rather than substitution. For instance,
suppose we begin with a phrase structure rule
S -> NP gives NP NP
to describe the structure of a sentence "John gives Mary the book".
If we used substitution to derive the particular NPs in the example, we
could get
John gives John John
The book gives the book the book
Mary gives Mary Mary
which is of course not what we want. Instead, in PSG, we use a rule
of replacement, so we can replace the symbol NP with three different
values.
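To make the contrast concrete, here is a tiny Python sketch (the
representation and the function names substitute and replace are mine,
purely for illustration). It applies the rule above both ways: uniform
substitution forces every NP to receive the same value, while replacement
lets each occurrence be rewritten differently.

# A sketch contrasting substitution and replacement (illustrative only).
# The right side of the rule "S -> NP gives NP NP" is a list of symbols.

rule = ["NP", "gives", "NP", "NP"]

def substitute(symbols, variable, value):
    # Uniform substitution: every occurrence of the variable gets the same value.
    return [value if s == variable else s for s in symbols]

def replace(symbols, variable, values):
    # Replacement: each occurrence may be replaced by something different.
    values = iter(values)
    return [next(values) if s == variable else s for s in symbols]

print(" ".join(substitute(rule, "NP", "John")))
# John gives John John
print(" ".join(replace(rule, "NP", ["John", "Mary", "the book"])))
# John gives Mary the book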
But wait! Isn't that hasty? In the general formula for an indirect
object sentence, there aren't really three NPs, since the arguments of
"gives" differ along the vertical dimension. Really, the general form
here is:
S -> NP1 gives NP3 NP2
using 1, 2, 3, in roughly the way they are used in Relational
Grammar. (I should mention that the canonical treatment in RG actually
treats the NP3 in my example as a 2 which has been advanced from an
original 3.)
So I propose that PSG is an algebraic system of the familiar sort
with constants (in place of the terminal symbols of ordinary PSG) and
variables (in place of the non-terminal symbols of ordinary PSG). It
looks at first like a different sort of system with "rewrite rules" only
when we neglect the vertical dimension which distinguishes variables
bearing different grammatical relations.
Because natural languages appear not to have any constructions like
the hypothetical *"John gives John John" mentioned above, I also adopt
as a general principle the Stratal Uniqueness Law (hereafter SUL) of
Relational Grammar, which, in my version, does not permit multiple
instances of the same variable with the same grammatical relation to
occur at the same level of analysis (in the same "stratum", that is).
Weak Generative Capacity
The weak generative capacity of a grammatical theory is the set of
strings it characterizes as sentences of the language the theory
purports to describe. Not everyone thinks this is a matter of any
importance, but nonetheless, that is what I am concerned with here.
A language which is generated by a PSG (i.e., a context free phrase
structure grammar) is, by definition, a context free language. Whether
natural languages are context free languages is controversial, but I
think they are. The variety of generative grammar with a vertical
dimension that I will describe below generates only context free
languages.
However, if it were not for the SUL (stratal uniqueness law), this
would not be so. A grammar with variables, constants, and assignments
of strings to variables, can generate languages that are not context
free. Below is a simple example. To distinguish assignments of string
values to variables from phrase structure rules, I use an arrow "->"
for a phrase structure rule (or "production") and an equals sign "=" for
a string assignment rule.
This grammar,
1. S = AA
2. A = a
3. A = aA
4. A = b
5. A = bA
generates the "copy language", each of whose sentences is some string
of a's and b's followed by a copy of that string: {abab, bbabbbab,
aaaabaaaab, ...}. The sentences generated are the strings assigned to
the variable S.
For instance, from rule 3, using the equality in rule 4, I deduce
that A = ab, then using this in rule 1, I can substitute ab for A to get
S = abab. It works like a very primitive algebraic system.
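For readers who like to see this run, here is a small Python sketch of the
same process (the bound on string length is mine, only there to keep the
enumeration finite): it builds up string values for A from rules 2-5 and
then substitutes each value uniformly into rule 1.

# A sketch of the copy-language grammar above, run by uniform substitution.
# The length bound keeps the enumeration finite for this illustration.

def values_of_A(max_len):
    # Rules 2 and 4 assign a and b; rules 3 and 5 prefix a or b to a value of A.
    vals = {"a", "b"}
    frontier = {"a", "b"}
    while frontier:
        new = {c + v for c in "ab" for v in frontier if len(v) + 1 <= max_len}
        new -= vals
        vals |= new
        frontier = new
    return vals

def sentences(max_len):
    # Rule 1: S = AA.  Substitution is uniform, so both occurrences of A
    # receive the same value -- a string followed by a copy of itself.
    return sorted(v + v for v in values_of_A(max_len))

print(sentences(2))
# ['aa', 'aaaa', 'abab', 'baba', 'bb', 'bbbb']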
The copy language is known not to be context free -- it cannot be
generated by any PSG. (This fact is the basis of Shieber's famous
demonstration that the Swiss-German dialect he studied has a non-context
free construction.)
Thus, if the theory is to be context free, in the sense that only
context free languages can be generated, grammars like the above example
must be disallowed. The SUL does this, since it prohibits rule 1 of
the above example. To conform to the SUL, a rule of the grammar may not
have multiple instances of the same variable, so the two instances of
variable A in rule 1 are not legal.
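A SUL check of this kind is easy to state as code. The sketch below (the
function name is mine) simply rejects any rule whose right side mentions
the same variable more than once.

from collections import Counter

def violates_sul(rhs, variables):
    # A rule may not have multiple instances of the same variable
    # on its right side.
    counts = Counter(s for s in rhs if s in variables)
    return any(n > 1 for n in counts.values())

print(violates_sul(["A", "A"], {"S", "A"}))   # True: rule 1 above is not legal
print(violates_sul(["a", "A"], {"S", "A"}))   # False: rule 3 above is fine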
Vertical Levels
There are a lot of pieces to this theory, and it's hard to figure out
what should come first, but I think it is time now to give an informal
description of the various levels, before trying to be precise about
formal aspects of the grammar. I begin by giving a name to this
variation of phrase structure grammar, "2psg", which is short for 2
dimensional phrase structure grammar. The first dimension is the
ordinary left-to-right, or earlier-to-later, ordering of symbols
described by concatenation. The second dimension is the vertical
dimension given in tree structures, with least embedded parts of a
structure coming near the top of a tree and most embedded parts coming
down closer to the leaves at the bottom of a tree diagram.
Every grammatical phrase type -- S, NP, Aux, Adv, PP, ... -- has five
different flavors, which are associated with varying heights in a tree
diagram. The flavors are 0, 1, 2, 3, Cho, where 0 is highest, 3 is
lowest, and Cho (for the chomeur of Relational Grammar) has variable
height.
There is a lawful relationship between the height of a variable and
the string assigned to that variable: in ordinary constructions not
involving raising, a variable cannot have as value a string containing
variables with a greater height. For instance, an S2 (an infinitive or
POSS-ing nominalization) cannot have as value (or "contain") an NP1 (a
subject). I will defer formalizing this requirement, but I mention it
here because it may be intuitively helpful in interpreting the following
informal remarks about variable heights.
So here are notes on what I have worked out so far about the variables in this theory, for English grammar only.
NP0 -- a vocative.
NP1 -- a subject.
NP2 -- a direct object.
NP3 -- an indirect object.
NP -- a chomeur (unclear, but perhaps the displaced original NP2 in a double object construction).
S0 -- a root sentence, in the sense of Emonds. Does not like to be embedded.
S1 -- an ordinary finite declarative clause. (Cannot contain NP0, Aux0, ...)
S2 -- an infinitive or gerund nominalized clause. (Cannot contain NP1, ...)
S3 -- a derived nominalization. (Cannot contain NP2, NP1, ...)
S -- an S chomeur, "that"-clause complement
Adv0 -- a performative adverb, referring to the speaker's act. (E.g. the adverb in "Frankly, my dear, I don't give a damn.")
Adv1 -- a "sentential" adverb, such as "probably/necessarily/..." that qualifies the truth of a sentence
Adv2 -- a moral adverb (Vendler's term), which expresses a judgment, approval, or assigns responsibility
Adv2 -- a manner adverb, which describes the way something happens
Adv3 -- a degree adverb, such as "completely", "almost"
Adv -- not known
Aux0 -- inverted auxiliary verb
Aux1 -- uninverted finite auxiliary (includes modal auxiliaries)
Aux2 -- nonfinite progressive "be/been"
Aux3 -- nonfinite passive "be/been"
Aux -- nonfinite perfect auxiliary "have" (chomeur displaced from past tense)
PPx -- It's not clear to me how to describe time and
place PPs, so I'll skip these, except for the PP passive by-phrase
chomeur, which works out especially nicely in this theory.
Derived Phrase Structure Rules
As PSG was classically presented, a demonstration that a sentence (or
some other category) is generated consists in giving a sequence of
strings the first of which is S (or some other initial symbol), and in
which each other string in the sequence is derived from the preceding by
using a phrase structure rule to replace some non-terminal on the left
of the rule with the string on the right part of the rule. This is not
very convenient for stating the part the vertical dimension plays in
2psg (my name for 2 dimensional PSG).
Instead, suppose we allow phrase structure rules derived from those
already in the grammar to shorten phrase structure derivations. Here is
the rule for deriving a new phrase structure rule: from a rule with a
non-terminal symbol A in the string on its right side and another rule A
-> x, replace A in the first rule with x to get the new rule (where x
is a string of terminal and non-terminal symbols).
Here is a simple little example to illustrate. Given the PSG
S -> NP VP
NP -> John, Mary
VP -> V NP
V -> loves
we can derive new phrase structure rules:
VP -> loves NP
VP -> loves Mary
S -> NP loves Mary
S -> John loves Mary
... and so on.
In a more interesting example, an infinity of new phrase structure
rules will be derived. If we want to know whether a sentence is
generated by this grammar, we need only look among the phrase structure
rules to see whether the sentence is on the right side of a phrase
structure rule with S on its left side.
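Here is a rough Python sketch of that procedure (the rule representation,
the bound on right-side length, and the names are all mine, for
illustration). It closes the example grammar under rule derivation and
then checks whether the sentence appears on the right side of a rule with
S on its left side.

# A sketch of deriving new phrase structure rules: from a rule with
# non-terminal A on its right side and a rule A -> x, form a new rule
# with that occurrence of A replaced by x.  The length bound keeps the
# closure finite for this illustration.

base_rules = [
    ("S",  ("NP", "VP")),
    ("NP", ("John",)),
    ("NP", ("Mary",)),
    ("VP", ("V", "NP")),
    ("V",  ("loves",)),
]

def derive_rules(base, nonterminals, max_rhs_len=5):
    rules = set(base)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in list(rules):
            for i, sym in enumerate(rhs):
                if sym not in nonterminals:
                    continue
                for lhs2, rhs2 in list(rules):
                    if lhs2 == sym:
                        new = rhs[:i] + rhs2 + rhs[i + 1:]
                        if len(new) <= max_rhs_len and (lhs, new) not in rules:
                            rules.add((lhs, new))
                            changed = True
    return rules

derived = derive_rules(base_rules, {"S", "NP", "VP", "V"})
print(("S", ("John", "loves", "Mary")) in derived)   # True: the sentence is generated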
So, we no longer need the mechanism of a phrase structure derivation
to describe how a PSG generates a sentence, given that we can derive new
phrase structure rules. With this preliminary out of the way, I can
state the condition which ensures in 2psg that constituents which are
lower along the vertical direction are more deeply embedded in a
constituent tree.
Derived Assignments
Here is the preceding example PSG given to illustrate derived phrase
structure rules, but made into a 2psg by using assignments of string
values to variables, deriving new assignments by substituting values
previously assigned to variables for those variables, and by making some
other minor changes:
Basis (lexicon):
1. S0 = S1
2. S1 = NP1 loves NP2
3. NP1 = John
4. NP2 = Mary
5. NP1 = Mary
6. NP2 = John
some other assignments derived by substituting values of variables:
7. S1 = NP1 loves Mary (using the value assigned in 4 to substitute in 2)
8. S1 = John loves Mary (using the value assigned in 3 to substitute in 7)
9. S0 = John loves Mary (using the value assigned in 8 to substitute in 1)
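The derivation in 7.-9. can be mimicked directly. The sketch below (my own
representation of an assignment as a variable paired with a list of
symbols standing in for strings of phonemes) derives the assignment
S0 = John loves Mary by the same three substitutions.

# A sketch of derived assignments by substitution.

def substitute_in(assignment, other):
    # Substitute the value assigned in 'other' for its variable wherever
    # that variable occurs in 'assignment', giving a derived assignment.
    var, value = assignment
    o_var, o_value = other
    new_value = []
    for sym in value:
        new_value.extend(o_value if sym == o_var else [sym])
    return (var, new_value)

a1 = ("S0", ["S1"])                      # 1. S0 = S1
a2 = ("S1", ["NP1", "loves", "NP2"])     # 2. S1 = NP1 loves NP2
a3 = ("NP1", ["John"])                   # 3. NP1 = John
a4 = ("NP2", ["Mary"])                   # 4. NP2 = Mary

a7 = substitute_in(a2, a4)               # 7. S1 = NP1 loves Mary
a8 = substitute_in(a7, a3)               # 8. S1 = John loves Mary
a9 = substitute_in(a1, a8)               # 9. S0 = John loves Mary
print(a9)                                # ('S0', ['John', 'loves', 'Mary'])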
In principle, the constants in this example are phonemes, but since
phonology is not part of this discussion for the time being, I have not
written out the phonemic forms of the English words in the example,
because it doesn't matter. Please take the words and phrases in
conventional orthography in this and other examples as standing for the
appropriate strings of phonemes.
Earlier, I gave as my goal the formulation of a revision to PSG
which describes the vertical dimension of natural language. So far, I'm
on track. Corresponding to the above example of a 2psg, there is a PSG
which generates the same language, which illustrates that 2psg is also a
context free theory. I can find a PSG which describes the same set of
sentences (well, there are only 4) by replacing the "=" sign in the
string assignments with arrows, referring to the variables as non-terminal
symbols, referring to the constants (the phonemes) as terminal symbols, and
treating the derivation in 7.-9. as giving derived phrase structure rules
of the sort that were given earlier.
Other requirements of PSG carry over here, with terminological
adjustments. The numbers of basic assignments, of variables, and of
constants, are all finite.
I have several revisions yet to make before arriving at a
characterization of 2psg, but I shall try to preserve this property of
the theory. It is essentially a variety of PSG, that is, context free
phrase structure grammar.
Categorial Functions
In Categorial Grammar (CG), popular among logicians interested in
natural language, the structures of language expressions are given as
pronunciations (or spellings) together with their categories, and a tree
is built by combining the pronunciations of daughter nodes somehow,
perhaps by concatenation, to get the pronunciation of the mother node,
and applying the category of one daughter, considered as a function, to
the category of the other daughter, considered to be an argument of that
function.
I am now just a heartbeat away from a version of this CG theory,
which I need in order to formulate the notion of a constituent structure
tree with a vertical dimension corresponding to the height of variables.
Using the example of the previous section, I begin by writing
assignment statements as small trees, annotated as labeled bracketings
with the variable to which a value is assigned as the mother node and
written immediately after the left bracket. I call these "forms".
Basis (lexicon):
1. [S0 S1]
2. [S1 NP1 loves NP2]
3. [NP1 John]
4. [NP2 Mary]
5. [NP1 Mary]
6. [NP2 John]
And now I want to think of the derivation of new assignments, now
called forms, as done by applying a function (the form in which the
substitution is made) to an argument form which says what string will be
substituted for what variable. Using the usual "function(argument)"
notation for the application of a function to an argument, the
derivation of the earlier example now looks like this:
7. [S1 NP1 loves Mary] = [S1 NP1 loves NP2]([NP2 Mary])
8. [S1 John loves Mary] = [S1 NP1 loves Mary]([NP1 John])
9. [S0 John loves Mary] = [S0 S1]([S1 John loves Mary])
Putting the derivation in 7.-9. into tree form gives my approximation to a constituent structure tree of the usual sort:
                  [S0 John loves Mary]
                   /                \
9.           [S0 S1]        [S1 John loves Mary]
                              /              \
8.               [S1 NP1 loves Mary]      [NP1 John]
                   /              \
7.       [S1 NP1 loves NP2]    [NP2 Mary]
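The function-argument view is only a small step beyond the earlier
substitution sketch. In the Python sketch below (again, the representation
is mine), a form carries its mother variable and its body, and applying
one form to another substitutes the argument's body for the occurrence of
the argument's mother variable; printing the three applications reproduces
7.-9.

# A sketch of forms applied as functions.

class Form:
    def __init__(self, mother, body):
        self.mother = mother            # e.g. "S1"
        self.body = list(body)          # e.g. ["NP1", "loves", "NP2"]

    def __call__(self, arg):
        # Substitute the argument's body for its mother variable.
        new_body = []
        for sym in self.body:
            new_body.extend(arg.body if sym == arg.mother else [sym])
        return Form(self.mother, new_body)

    def __repr__(self):
        return "[" + self.mother + " " + " ".join(self.body) + "]"

f1 = Form("S0", ["S1"])
f2 = Form("S1", ["NP1", "loves", "NP2"])
f3 = Form("NP1", ["John"])
f4 = Form("NP2", ["Mary"])

print(f2(f4))          # 7. [S1 NP1 loves Mary]
print(f2(f4)(f3))      # 8. [S1 John loves Mary]
print(f1(f2(f4)(f3)))  # 9. [S0 John loves Mary]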
In CG, the category of a form is written separately from the
pronunciation part, and while there is no apparent need to do that here,
for the sake of comparing theories, I make the definitions:
The pronunciation part of a form (that is, the constants, which are phonemes) is a
constituent, and the remainder of the form (the variable part) is the
category.
For the case where the constant part of a form is a continuous string
of phonemes (in general, it need not be), I use the notation [var ...
__ ...] for a category, where the underline stands for the constituent.
For instance, in the above example, for the form [S1 NP1 loves NP2],
the category is [S1 NP1 __ NP2], abstracting away the constituent
"loves".
Cyclic Conditions on Substitution
- A variable of a form is not subject to substitution when the form
has some other variable of lesser height (that is, of greater
obliqueness). This condition is needed to make the connection between
the height of a variable (0, 1, 2, 3) and the height of a constituent in
the derivation tree. The condition corresponds to the requirement in
transformational grammar that processing starts at the bottom of a
constituent structure tree.
- A form to which any rule, such as substitution, is still applicable
may not be an argument of a substitution function. This condition
corresponds to the requirement in transformational grammar that cyclic
transformations start applying at the bottom of the constituent
structure tree. It is required in 2psg to prevent violations of the SUL
(stratal uniqueness law) from arising through the substitution of
string values for variables.
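Treating the digit in a variable's name as its obliqueness, the first
condition can be sketched as a small check (the helper names are mine, and
chomeurs, which carry no digit, are left aside).

# A sketch of the first cyclic condition.  The digit in a variable's
# name is read as its obliqueness: 0 is highest, 3 is lowest.

def obliqueness(var):
    return int(var[-1])            # "NP1" -> 1, "S2" -> 2, and so on

def substitution_allowed(mother, body_vars, target):
    # Substitution for 'target' is blocked when the form has some other
    # variable (its mother included) of lesser height, that is, of
    # greater obliqueness than the target.
    others = [mother] + [v for v in body_vars if v != target]
    return all(obliqueness(v) <= obliqueness(target) for v in others)

# [S2 NP1 to explode]: NP1 may not be substituted for (S2 is more oblique)
print(substitution_allowed("S2", ["NP1"], "NP1"))          # False
# [S1 NP1 loves NP2]: NP2 may be substituted for; NP1 must wait
print(substitution_allowed("S1", ["NP1", "NP2"], "NP2"))   # True
print(substitution_allowed("S1", ["NP1", "NP2"], "NP1"))   # False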
Coordination
Most ordinary two-part coordinations can be described by this rule:
- Constituents can be derived by putting "and" between two
constituents of the same category, and the category of the new
coordinate constituent is the same as that of each of the two original
constituents.
In PSG, this is usually taken to characterize a schema of phrase
structure rules, such as V -> V and V, for example. However, it is
not possible to carry over such an account into 2psg: for one thing,
a form [V V and V] would break the SUL, because there is more than
one instance of the same variable in the form; for another thing,
there is no grammatical type V (nor is there a VP, V-bar, N, or N-bar);
and for yet another thing, that phrase structure rule is wrong anyway.
The phrase structure rule is wrong, because we cannot coordinate verbs with different valences:
- John wiped the window.
- John disappeared into the mist.
- *John wiped and disappeared the window.
Accordingly, I take the above rule for coordination as a basic rule
of 2psg (not a form), so that we can describe the coordination of verbs
of the same valence, for instance, making use of the previously given
definitions of "constituent" and "category" as respectively the constant
and variable parts of a form.
Here is an example:
- Given forms [S1 NP1 wiped NP2] and [S1 NP1 broke NP2], since these have the same category [S1 NP1 __ NP2], we can
- form a constituent "wiped and broke" of this same category, [S1 NP1 __ NP2],
- which is the form [S1 NP1 wiped and broke NP2],
- and substituting in this using [NP2 the window] then [NP1 John], gives
- "John wiped and broke the window" of category [S1 __].
Eversion
When a finite clause, an S1, becomes oblique, S2 or S3, what happens
to arguments it contains of lesser obliqueness? In Raising to subject
constructions, we would have a form containing an argument of less
obliqueness than the form itself. For instance:
[S2 NP1 to explode]
where the cyclic conditions on substitution I gave above prohibit
substituting for the variable NP1, since the form has a more oblique
variable, S2. When this form is made an argument of
[S1 seemed S2]
the movement of the NP1 up into the higher S1, giving
[S1 NP1 seemed [S2 to explode]]
can be thought of as a solution to this difficulty, since now the
original subject of "explode" has found a home in the higher clause
where it can be legally substituted for -- it resides in an S1, a finite
clause, which is no more oblique than NP1, a subject.
I refer to such a change, in which part of a form must move outside it, as "eversion" -- the form is turned partially inside out.
Topicalization
Several questions about why topicalization works the way it does can be answered in 2psg. I begin with:
Why topics are raised
A topic gives what a sentence is about, and this concerns the
performance of a speech act, so we expect topics to have a grammatical
relation 0, like vocatives and performative adverbs.
[S1 NP1 ate NP2 on Sundays] becomes, by topicalization of the object of "ate":
[S1 NP1 ate NP0 on Sundays] and, by applying this to the argument [NP1 we]:
[S1 we ate NP0 on Sundays]
however this does not give a pronounceable form, because the variable
NP0 is less oblique than the variable S1. The form can be an argument
of the function [S0 S1], however,
[S0 we ate NP0 on Sundays] = [S0 S1]([S1 we ate NP0 on Sundays]) by substitution
and now, since NP0 is no less oblique than S0, it can be substituted
for by, say, [NP0 beans]. This gives a constituent structure:
              [S0 we ate beans on Sundays]
               /                       \
   [S0 we ate NP0 on Sundays]       [NP0 beans]
      /                  \
 [S0 S1]      [S1 we ate NP0 on Sundays]
                /                  \
   [S1 NP1 ate NP0 on Sundays]   [NP1 we]
                |  by topicalization
   [S1 NP1 ate NP2 on Sundays]
In this example, "on Sundays" is really an argument, but I suppressed
some detail to simplify the example. Also, before substituting for
NP0, the constituent "we ate ... on Sundays" is a discontinuous
constituent, but I am not sure that is actually possible -- it may be
that the NP0 has to be moved to the end or to the beginning, to make the
remainder a continuous constituent:
We ate on Sundays, beans.
Beans, we ate on Sundays.
At any rate, the natural place for performative constituents in
English is at the beginning of a clause, so at least the latter is an
option.
So, topics are dependencies that can only be satisfied in root
sentences, S0, and when an embedded argument becomes a topic, the topic
has to be "everted" -- moved out of its embedded position in the
sentence structure.