To Not Be Or Not To Be, That is the Question
A Corpus-based Analysis of the Split Infinitive
by Scott C. Lucas
While not technically forbidden by grammarians, the split infinitive construction has a notorious place in the English language. Prescribed against by educators, editors, authors, journalists, and a number of manuals on expository writing and on the teaching of English as a foreign language, the split infinitive seems to have no place in written discourse. When grammatical texts do state exceptions, the 'rules' for using the split infinitive are vague at best: according to various guides, it is acceptable "if another construction would be ambiguous, awkward, or less emphatic," (1) when "the sentence will feel more natural," (2) or "when a split infinitive is less awkward than a preceding one," a concession the third guide follows with the contradictory warning that "to be on the safe side, however, you should not split such infinitives, especially in formal writing." (3) Despite the evidence these authors present for the practicality of the form, split infinitives are clearly stigmatized as something to be avoided in written discourse, if only because, as one author puts it, they "are not errors, but nevertheless annoy many people and therefore should be avoided." (4)
While this avoidance is instantiated in the prescriptive efforts to ban the split infinitive from the English language, the construction has not become obsolete. In fact, its appearance permeates every level of both spoken and written discourse, from the lowest registers to the most formal. The question is, when and how is the split infinitive to be used properly? Under what conditions is it "less awkward," "more natural," and far enough "on the safe side" to be considered acceptable?
Frequency of the Split Infinitive
In order to answer these questions, we will consult the CobuildDirect Interactive Corpus, examining the usage of the split infinitive across Cobuild's twelve corpora. Under the general query to+RB+VERB (a request that returns all instances of the word 'to' followed by an adverb followed by a base-form verb), we will see that the split infinitive, despite the heavy prescription against it, appears in sizable numbers in each of the genres of discourse provided. Table (1) describes the different corpora contained within Cobuild and gives the frequency of split infinitives in each. The table lists the corpora in descending order of frequency per million words, beginning with Npr at 135.5 tokens per one million words and ending with Times at only 27.4 per million. The far right column gives the raw number of tokens returned; because the corpora differ in size, a corpus with a higher rate of split infinitives per million words may have returned fewer tokens than corpora below it (Usephem, for instance, returned only 157 tokens at 128.2 per million, while Oznews returned 631 at 118.2 per million).
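The to+RB+VERB query amounts to a three-token pattern over part-of-speech-tagged text. The following is a minimal sketch of that pattern, not CobuildDirect's actual query engine: it assumes Penn Treebank-style tags (RB for adverb, VB for base-form verb), and the hand-tagged sentence is a hypothetical input modeled on example (a) discussed later in this paper.

```python
from typing import List, Tuple

# A token is a (word, tag) pair. The tags here are Penn Treebank-style
# (TO, RB, VB) and are an ASSUMPTION for illustration; CobuildDirect's own
# tagset and query syntax are not reproduced here.
Token = Tuple[str, str]


def find_split_infinitives(tokens: List[Token]) -> List[Tuple[int, str]]:
    """Return (start_index, phrase) for every 'to' + adverb + base-form verb."""
    hits = []
    for i in range(len(tokens) - 2):
        (w1, _), (w2, t2), (w3, t3) = tokens[i], tokens[i + 1], tokens[i + 2]
        # Match the literal word 'to', then an adverb (RB), then a base-form verb (VB).
        if w1.lower() == "to" and t2 == "RB" and t3 == "VB":
            hits.append((i, f"{w1} {w2} {w3}"))
    return hits


# Hypothetical hand-tagged sentence modeled on example (a).
sentence = [
    ("Mrs.", "NNP"), ("Thatcher", "NNP"), ("forced", "VBD"), ("unions", "NNS"),
    ("to", "TO"), ("regularly", "RB"), ("ballot", "VB"), ("members", "NNS"),
]

print(find_split_infinitives(sentence))  # prints [(4, 'to regularly ballot')]
```

Like the query it imitates, this pattern only catches a single intervening adverb; a split by a multi-word adverbial would need a looser pattern.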
1. Occurrence of the Split Infinitive Across the Corpora (5)
| Corpus and Description | Tokens/Million Wds. | Number of Tokens |
| Npr - National Public Radio (US) | 135.5 | |
| Ukspok - spoken English (UK) | 131.8 | 1,222 |
| Usephem - leaflets, adverts, etc. (US) | 128.2 | 157 |
| Oznews - newspapers (Australia) | 118.2 | 631 |
| Usbooks - fiction and nonfiction books (US) | 69.1 | 389 |
| Ukmags - magazines (UK) | 66.5 | 326 |
| Today - Today newspaper (UK) | 65.2 | 342 |
| Ukephem - leaflets, adverts, etc. (UK) | 57.6 | 180 |
| Sunnow - Sun newspaper (UK) | 54.3 | 316 |
| Bbc - BBC World Svc. radio broadcasts (UK) | 52.1 | 136 |
| Ukbooks - fiction and nonfiction books (UK) | 38.3 | 205 |
| Times - London Times newspaper (UK) | 27.4 | 158 |
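The gap between the two numeric columns of Table (1) is a matter of normalization: a rate per million words is the raw count divided by corpus size, scaled by one million. The sketch below states that formula and, under the assumption that the published rates are exact enough to invert, back-calculates approximate corpus sizes from a few rows of the table. The derived sizes are rough estimates only (the rates are rounded to one decimal place), and Npr is omitted because its raw count is missing.

```python
# Rows copied from Table (1): corpus -> (tokens per million words, raw token count).
TABLE_1 = {
    "Ukspok": (131.8, 1222),
    "Usephem": (128.2, 157),
    "Oznews": (118.2, 631),
    "Times": (27.4, 158),
}


def per_million(raw_count: int, corpus_words: int) -> float:
    """Normalize a raw hit count to occurrences per one million words."""
    return raw_count * 1_000_000 / corpus_words


def implied_corpus_millions(rate_per_million: float, raw_count: int) -> float:
    """Invert the normalization: approximate corpus size in millions of words.

    Because the published rates are rounded, this is only an estimate.
    """
    return raw_count / rate_per_million


for name, (rate, count) in TABLE_1.items():
    size = implied_corpus_millions(rate, count)
    print(f"{name}: {rate}/million from {count} tokens => about {size:.2f} million words")
```

Run on these rows, Usephem comes out at roughly 1.2 million words against roughly 5.3 million for Oznews, which is why the higher-rate corpus returned far fewer tokens.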
An examination of the above table brings several preliminary findings to light. First, the avoidance of the split infinitive seems to be strongest in the United Kingdom corpora, which make up the lower seven tiers of the table; only the low-register, unprepared spoken English corpus (Ukspok) reaches the upper portion, surrounded by Australian and United States corpora. Further, while one would expect register to play a strong role in the number of instances each corpus returned, its effect, though noticeable in our findings, is less pronounced than we might have guessed. While Times predictably demonstrated a strong avoidance of the split infinitive, it is interesting that Npr and Oznews, both news sources, showed high rates of split infinitive usage. From these results, it seems that the tendency to avoid the split infinitive in formal discourse is more closely tied to cultural circumstances than to a universal rule of register. (6)
Now that we can demonstrate that the split infinitive is instantiated across different registers of English, the next step is to determine how it is used; by taking a closer look at our results we can demonstrate several important functions of the split infinitive and outline some of the trends regarding its usage in both spoken and written English.
Qualities of the Split Infinitive
In examining the tokens returned through a search of the Cobuild corpus, the first thing that becomes clear is the dominance of a set of adverbs that are more likely than others to split the infinitive. While infinitives were split by a large assortment of words, the adverbs displayed in the chart below (2) were found to be among the primary interrupters of infinitives in every corpus. It bears mentioning that other, less frequent words would appear on this list if each word's splitting rate relative to its overall frequency were taken into account; however, doing so would also fill the results with potential artifacts, as words with very little usage in the corpora (such as 'intrinsically' or 'generationally') would skew the data with only a handful of instances as infinitive-splitters. The adverbs and corpora listed below are in no particular order.
2. Primary Infinitive Splitters
The data above suggest not only that certain adverbs are common infinitive-splitters, but also that the register and cultural location of a corpus influence how often these adverbs are used relative to others. (7) Note how closely the following table resembles (1) in its ordering; here we observe, in descending order, the percentage of tokens in each corpus in which the infinitive is split by one of the adverbs in Table (2). Npr again tops the list, with 63.9% of its split infinitives containing one of these adverbs, while Oznews has the lowest percentage of common infinitive-splitters at 13.6%.
3. Common Infinitive-Splitter Usage Percentage
As we see here, register and culture operate on the data in several ways: they play a role not only in the degree of prescription against the split infinitive, but also in the choice of words with which one is permitted to split it. The higher-register corpora appear toward the bottom of this list, possibly indicating that the more commonly used an infinitive-splitting adverb is, the less likely it is to be accepted in formal written discourse.
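The percentages behind Table (3) are a simple proportion: the number of split-infinitive tokens whose adverb belongs to the common set, divided by all split-infinitive tokens in that corpus. The sketch below computes that share. Because Table (2)'s actual adverb list is not reproduced here, the common set and the sample adverbs are hypothetical placeholders drawn from words mentioned in this paper; the output is not one of the paper's figures.

```python
from typing import Iterable, Set

# HYPOTHETICAL placeholder set: Table (2)'s actual list of primary
# infinitive splitters is not reproduced here, so these adverbs are
# borrowed from the example sentences purely for illustration.
COMMON_SPLITTERS: Set[str] = {"just", "regularly", "immediately"}


def common_splitter_share(splitting_adverbs: Iterable[str], common: Set[str]) -> float:
    """Percentage of split-infinitive tokens whose splitting adverb is in `common`."""
    adverbs = [a.lower() for a in splitting_adverbs]
    if not adverbs:
        return 0.0  # no split infinitives found, so there is no share to report
    hits = sum(1 for a in adverbs if a in common)
    return 100 * hits / len(adverbs)


# Hypothetical adverbs pulled from one corpus's split-infinitive tokens.
sample = ["just", "regularly", "just", "intrinsically", "fully", "immediately", "boldly", "just"]
print(f"{common_splitter_share(sample, COMMON_SPLITTERS):.1f}%")  # prints 62.5%
```

Counting every token (rather than each distinct adverb once) mirrors the paper's measure, which is a share of tokens, not of word types.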
Another element to take into account in examining the composition of the split infinitive is the use of preceding phrasal formulae that resist being interrupted by an adverb, so that the adverb splits the infinitive instead. Common phrasal forms such as "we/they have agreed to...," "the decision to...," "(to be) unable to...," and a wide array of others tend to keep 'to' attached to the preceding phrase, shifting the adverb so that it follows the intact phrase and precedes the verb. Similar evidence that 'to' bears a stronger connection to the preceding elements of the sentence is the appearance of the contracted forms "wanna" and "gonna," the former appearing in the Ukspok corpus 145 times and the latter 823 times. The very use of these words implies a strong connection between 'to' and the preceding verbs 'want' and 'going.' In fact, each word also occurred in Ukspok as part of a split infinitive, 'wanna' twice and 'gonna' seven times. The shape of these words makes any construction containing an adverb nearly impossible without a split infinitive, since the merger denies the 'proper' adverbial placement of formal discourse by eliminating the very space the adverb would occupy.
The Split Infinitive in Context
Now that we have seen the primary shapes that the split infinitive takes in the various corpora, we can begin to examine its function. When does the split infinitive "feel natural" in a sentence, and when is it "less awkward" than its alternatives? In the tokens produced in Cobuild, the majority of split infinitives (with the possible exception of Ukspok, in which the form was used more freely) performed the necessary function of eliminating ambiguity about what the adverb modifies. When the adverb is placed between 'to' and the base-form verb, its reference is not called into question: it modifies the infinitive verb. When it is placed before the intact infinitive, the writer risks leaving it unclear whether the adverb modifies the infinitive or the preceding verb. Consider the following examples from the Cobuild corpus (all italics added):
a) "Mrs. Thatcher forced unions to regularly ballot members on political donations, though this failed to produce any mutiny." (Today)
b) "I'm going to just look about and start to walk away." (Ukspok)
c) "Origins is that simple: just two targeted products that work inwardly to 'retrain' skin over time and outwardly to immediately relieve the problems you see and feel." (Usephem)
Here we find several cases in which the split infinitive is not only acceptable but vital to the clarity of the sentence. In (a), if we rearrange the sentence to keep the infinitive intact, as in 'Mrs. Thatcher forced unions regularly to ballot members...,' it is no longer clear which verb the adverb modifies: have the unions repeatedly balloted their members, or has Maggie just repeatedly badgered them to do so, possibly without result? By moving the adverb to an infinitive-splitting position, we are made aware of exactly what is going on in the sentence.
In the case of (b), the alternative phrasing not only brings the meaning of the sentence into question; it alters the meaning of the adverb itself. Had the sentence read 'I'm going just to look about and start to walk away,' we would lose the sense that 'just' carried in the original: that the speaker was 'simply' going to look around. In the modified version, 'just' more closely resembles 'only' or 'for the sole reason of' in meaning, disrupting the original intent of the utterance entirely.
Finally, the alternative phrasing of (c) would describe products that 'work inwardly to 'retrain' skin over time and outwardly immediately to relieve the problems you see and feel'; this not only leaves us referentially confused but also renders the sentence cluttered and unwieldy, muddling the status of both 'outwardly' and 'immediately.' Here the split infinitive works not only to establish which adverb modifies which verb, but also to space the sentence out and make it manageable as a single statement.
From these examples we can begin to understand the shape and operation of the split infinitive in both formal and casual discourse. We have seen that, despite the prescription against the construction, its usefulness keeps it present in every discursive category, and that the cultures and registers of those categories may lead the split infinitive to take the different shapes it does across the corpora. We have also begun to see what drives the use of the split infinitive as a clarifying and arranging element within the sentence, and some of the forms in which it operates. Through continued analysis, we may move closer to a thorough understanding of how the split infinitive operates and appears across the genres of the English language, and possibly diminish the prescription against its use as a whole, despite the 'annoyance' such a pursuit might cause.
Hacker, Diana. Rules for Writers. Boston: Bedford Books, 1996.
Hodges, John C. et al. Harbrace College Handbook. New York: Harcourt Brace, 1994.
Johnson, Edward D. The Handbook of Good English. New York: Facts on File Publications, 1982.
Schwab, William. Guide to Modern Grammar and Exposition. New York: Harper and Row, 1967.
1. Schwab 55.
2. Hodges et al. 250.
3. Hacker 113.
4. Johnson 260.
5. All data from the CobuildDirect Interactive Corpus.
6. It is important to note here that while both Npr and Oznews are news forums and thus considered high register, neither has the reputation for conservative formality that Times has, which may help explain the crowding of British corpora toward the bottom of the table. It is also necessary to point out that several of the tokens returned in the news-forum corpora were contained in quotations; these instances were not removed from the comparison because they were fairly evenly distributed across the corpora.
7. While the adverbs listed in the chart may seem uncommon compared to the large number of tokens in the "Others" category, it is important to recognize how many different adverbs make up that category. Of the hundreds listed for each corpus, very few made more than a handful of appearances, and those that did were not consistently present across every corpus.