Corpus Linguistics
 
 
Somebody [ME sum body, fr. sum som some + body]: one or some person of no certain or known identity: a person indeterminate.
 
Someone [ME sum oon, fr. sum som some + oon one]: some person: somebody.
--Webster’s Third New International Dictionary
 
Based on the above entries for “somebody” and “someone” it is clear that the conventional definitions for these two words are identical. In fact, “someone” refers to “somebody” in order to clarify its own meaning. The history of these words is nearly identical as well. Each was generated in Middle English, entering the language within two years of each other; “somebody” was first used in 1303, while “someone” was used in 1305.[1] Why are these words, ostensibly utilized in the exact same manner, coexistent in the English language? While Occam’s razor may rule the laws of the cosmos, it would seem not to apply to the English language where such inefficiency occurs. However, this is not the end of the story for the two words. By referencing a corpus and studying the exact patterns of usage for the words, the difference between “somebody” and “someone” can finally be exposed. In this paper I am interested in examining the usage of this word pair in four contexts: collocations, levels of specificity, agent of action and, finally, transitivity.
The bulk of this study is done through the COBUILD corpus of British spoken English. When “somebody” and “someone” are entered as a search query, the results are interesting:
Somebody: 4027 total instances (434.4/million).
Someone: 1600 total instances (172.6/million).
If these words were truly interchangeable, one would expect each to occur about as often as the other, yet this is not the case. “Somebody” is used roughly 2.5 times as often as “someone.” This variation might be due to differences in sound between the two words. For instance, “somebody” has three syllables, the first two carrying a rather heavy stress pattern, whereas “someone” is a two-syllable trochee. Perhaps “somebody” is used more often because speakers wish to emphasize their point. There are many speculations about sound and word use, but all are very difficult to prove. Instead, the corpus will give us empirical evidence about how the two words pattern, shedding light on their difference.
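The raw counts and per-million rates reported above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the function names are hypothetical and are not part of the COBUILD tools. It derives the corpus size implied by each count and rate, and the ratio between the two raw counts.

```python
def implied_corpus_size(count, per_million):
    """Corpus size in words implied by a raw count and its per-million rate."""
    return count / per_million * 1_000_000


def frequency_ratio(count_a, count_b):
    """How many times as often item A occurs as item B."""
    return count_a / count_b


# Figures reported for the British spoken corpus.
somebody_count, somebody_rate = 4027, 434.4
someone_count, someone_rate = 1600, 172.6

size_from_somebody = implied_corpus_size(somebody_count, somebody_rate)
size_from_someone = implied_corpus_size(someone_count, someone_rate)
ratio = frequency_ratio(somebody_count, someone_count)

print(f"implied corpus size: {size_from_somebody:,.0f} and {size_from_someone:,.0f} words")
print(f"somebody : someone = {ratio:.2f}")
```

Both rates imply nearly the same corpus size, which suggests the reported counts and rates are mutually consistent.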
The first test to determine whether or not these words are different is to simply examine the lists of strong collocations. This was done in the corpus using the “c” command for t-score collocations and the “C” command for Mutual Information. The results for the top five in each list are as follows:
“Somebody” UKSPOK corpus, t-score
who 15 3.534784
else 13 3.506295
if 13 2.734573
to 27 2.108609
somebody 5 2.080691

“Someone” UKSPOK corpus, t-score
who 15 3.534784
else 9 2.880709
to 32 2.820764
if 13 2.734573
s 27 2.593357
 
“Somebody” UKSPOK corpus, mutual information
 
else 13 5.183439
somebody 5 3.847505
talking 4 3.570671
who 15 3.517850
saying 3 2.679442

“Someone” UKSPOK corpus, mutual information
someone 4 4.857311
else 9 4.652871
who 15 3.517850
give 5 3.517755
twenty 3 3.091247
 
Clearly, the only information of real significance that the collocation test offers is that “somebody” and “someone” keep the same company. In fact, the similarities between the lists are striking.
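For readers unfamiliar with the two collocation measures, the sketch below shows the formulations commonly used in corpus work: mutual information as the base-2 logarithm of observed over expected co-occurrences, and the t-score as the difference between observed and expected divided by the square root of observed. Whether the COBUILD tools compute their scores in exactly this way (span size, any adjustments) is an assumption, and the frequencies in the example are hypothetical rather than taken from the corpus.

```python
import math


def expected_cooccurrences(node_freq, collocate_freq, corpus_size, span=1):
    """Co-occurrences expected by chance if the two words were independent."""
    return node_freq * collocate_freq * span / corpus_size


def mutual_information(observed, expected):
    """log2 of observed over expected; high for rare but exclusive pairings."""
    return math.log2(observed / expected)


def t_score(observed, expected):
    """(observed - expected) / sqrt(observed); favours frequent pairings."""
    return (observed - expected) / math.sqrt(observed)


# Hypothetical frequencies, for illustration only.
expected = expected_cooccurrences(node_freq=4000, collocate_freq=2000,
                                  corpus_size=9_000_000, span=4)
observed = 13
print(f"expected by chance: {expected:.2f}")
print(f"MI: {mutual_information(observed, expected):.2f}")
print(f"t-score: {t_score(observed, expected):.2f}")
```

The two measures reward different things, which is why a frequent function word such as “to” can rank on a t-score list while a rarer word such as “talking” appears only on a mutual information list.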
The next test to determine any differences between these words is to examine their levels of specificity. Perhaps the distinction between “body” and “one” is important to the usage of these words. “Body” is more tangible and specific while “one” is more existential, referring simply to a generic human existence. The test used to discover the level of specificity of each word is simple and involves ranking each instance of “somebody/one” based on the specificity of the entity that word is referring to. This was done on a five-point scale:
1: Somebody/one used as a replacement for a name or referring to a specific, definite person.
(1): He’s [Shelley] is someone who is trying to bring together various strands of radical philosophic thought.
2: A ranking of 2 was given when the referent is either part of a group of people who are present or a specific person as yet unknown.
(2): Somebody else say something
(3): See him waiting outside the newsagents for someone going to get him some fags.
3: This ranking is for a person in general, but qualified in some manner (e.g. “someone who/with…”).
(4): If a G P has somebody with learning disabilities do they go to a community learning disability team?
4: A score of 4 is for a referent that is “not you” but somebody else.
(5): So I mean if she goes and you’ve got to do English with someone else…
5: The most general referent, where somebody/one=anybody/one.
(6): You know somebody takes my car or anything like that you know.
The results are as follows (Sample=100):
Somebody Someone
1: 11 instances 1: 14 instances
2: 12 2: 11
3: 21 3: 17
4: 6 4: 4
5: 38 5: 44
X (Unknown): 12 X: 10
 
Again, the similarities are the most important piece of information gained from this test. The levels of specificity break down almost evenly between the two words, even down to the number of instances that were impossible to classify.
The third test explored whether the two words relate differently to the action of a sentence. Here, each clause containing “somebody/one” was examined and assigned one of two ranks:
S was given to clauses where somebody/one was doing all of the action.
(7): This really is someone having a laugh at our expense isn’t it.
N was given to clauses where something other than “somebody/one” did the action.
(8): Can’t you cut somebody then if they don’t put the phone down?
The results, again, show a profound similarity between these words (Sample=100):
Somebody Someone
S: 66 instances S: 63 instances
N: 27 N: 31
X (unknown): 7 X: 6
 
The final test used to examine “somebody” and “someone” revealed a surprising difference between these words. Drawing on the work of Hopper and Thompson (1980), I screened each of these words for transitivity. Hopper and Thompson describe transitivity, as traditionally understood, as “a global property of an entire clause, such that an activity is ‘carried over’ or ‘transferred’ from an agent to a patient.”[2] They then develop a system of criteria under which one clause can be “more or less” transitive than another.[3] Here are the criteria for determining transitivity:
Criterion                 High transitivity          Low transitivity
A: Participants           two or more (A and O)      1 participant
B: Kinesis                action                     nonaction
C: Aspect                 telic                      atelic
D: Punctuality            punctual                   nonpunctual
E: Volitionality          volitional                 nonvolitional
F: Affirmation            affirmative                negative
G: Mode                   realis                     irrealis
H: Agency                 A high in potency          A low in potency
I: Affectedness of O      O totally affected         O not affected
J: Individuation of O     O highly individuated      O nonindividuated[4]
 
It is clear that these criteria are not “black and white” issues. For instance, for criterion B, “thrashed” presents more action than “said,” which presents more action than “enjoy.” For the purposes of analyzing “somebody” and “someone,” though, the best approach is to give each criterion a value of “1” and to award each criterion, as best as possible, to the clauses that contain “somebody/one.” Thus, this system of determining transitivity closely resembles many utilitarian systems for calculating the greatest utility for the greatest number of people. In order to avoid the problems of scale, I interpreted these criteria based on a division between the physical and mental world. For instance, a clause would receive a point for criterion B if the subject kicked (physical) but not if he/she loved (mental). Similarly, for criterion I, a clause would receive a point if an object was shattered, but not if the object was merely looked at. Similar interpretations were made to delineate each criterion. The following are examples of clauses ranked by transitivity.
High transitivity:
(9): And someone’s put cigarettes here.
This clause receives a point for all the criteria, thus scoring a 10.
Low transitivity:
(10): I’m just going to go and find somebody else.
This clause only receives points for criteria A, B, E, F, and H, thus only receiving a 5. These are the results for the transitivity test (Sample=100):
Somebody
Criteria:
B: 64
C: 26
D: 45
E: 65
F: 79
G: 35
H: 79
I: 17
J: 36
X (unknown): 17
 
Average level of transitivity: 6.1
 
Someone (Sample=100)
Criteria:
A: 69 instances of criterion A
B: 44
C: 21
D: 29
E: 60
F: 75
G: 29
H: 80
I: 13
J: 20
X (unknown): 17
 
Average level of transitivity: 5.3
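The scoring scheme can be sketched in a few lines. The block below is an illustration, not the procedure actually used to tally the data: the function names are hypothetical, and the averaging rule (total points divided by the number of clauses that could be scored, that is, the sample minus the X instances) is an assumption. Under that assumption the “someone” counts above reproduce the reported average.

```python
CRITERIA = "ABCDEFGHIJ"


def clause_score(criteria_met):
    """One point for each of the ten criteria that the clause satisfies."""
    return len(set(criteria_met) & set(CRITERIA))


def average_transitivity(criterion_counts, sample_size, unknown):
    """Mean score per clause, ASSUMING unknown (X) clauses are excluded."""
    scorable = sample_size - unknown
    return sum(criterion_counts.values()) / scorable


# Example (9) meets every criterion; example (10) meets A, B, E, F and H.
print(clause_score("ABCDEFGHIJ"))  # 10
print(clause_score("ABEFH"))       # 5

# Criterion counts for "someone" in the British spoken corpus.
someone_spoken = {"A": 69, "B": 44, "C": 21, "D": 29, "E": 60,
                  "F": 75, "G": 29, "H": 80, "I": 13, "J": 20}
print(round(average_transitivity(someone_spoken, 100, 17), 1))  # 5.3
```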
 
The patterning of transitivity in this experiment indicates that “somebody” is used more transitively than “someone.” The difference in average level appears slight, only 0.8 of a point; however, the difference becomes clearer when looking at the distribution of the individual criteria. Most of the criteria are on very similar levels, differing by only a few points. This is not the case for criteria B, D and J. “Somebody” was awarded criteria D and J 16 more times each than “someone,” and criterion B 20 more times. Criteria B and D are especially significant because they relate to one another. B represents the level of physical action of the verb. D represents punctuality, that is, whether the action is completed at once rather than unfolding over time. Perhaps the “body” / “one” division does play a significant role, in that “somebody” is more apt to be found in highly transitive clauses involving physical processes.
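The per-criterion comparison can be laid out explicitly. The short sketch below subtracts the “someone” counts from the “somebody” counts for the spoken corpus and ranks the criteria by the size of the gap. Criterion A is left out because the “somebody” table does not list a count for it; the variable names are merely illustrative.

```python
# Criterion counts from the British spoken corpus (Sample=100 each).
somebody = {"B": 64, "C": 26, "D": 45, "E": 65, "F": 79,
            "G": 35, "H": 79, "I": 17, "J": 36}
someone = {"B": 44, "C": 21, "D": 29, "E": 60, "F": 75,
           "G": 29, "H": 80, "I": 13, "J": 20}

# Positive values mean the criterion was awarded more often to "somebody".
differences = {c: somebody[c] - someone[c] for c in somebody}

# Largest gaps first.
ranked = sorted(differences.items(), key=lambda item: item[1], reverse=True)
for criterion, gap in ranked:
    print(f"{criterion}: {gap:+d}")
```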
In order to strengthen the claim that there is a difference of transitivity between “someone” and “somebody” it is necessary to check these results in a corpus other than the British spoken English corpus. Here I chose to use the British books corpus for two primary reasons. Firstly, it was necessary to keep within the realm of British English so that no differences between American culture and British culture would skew the results. Secondly, the books corpus represents “pure” writing. That is, it avoids the danger of finding instances of the query in quoted speech, as often happens in newspapers. The only potential drawback of the British books corpus is that many of the books are fiction, which brings in the author’s conscious artistry in ways that might deviate from “natural” language. However, this drawback does not present any significant problems in discovering patterning of “somebody” and “someone”.[5]
The specificity test, the action test and the test for transitivity were run on the British books corpus. The results for specificity are as follows (Sample=50):
Somebody Someone
1: 5 1: 7
2: 10 2: 6
3: 4 3: 11
4: 11 4: 7
5: 19 5: 19
X: 1 X: 0
 
Again, the distribution of specificity is broadly even between the two words, with the largest difference in category 3 (4 instances against 11). The test to determine who is doing the action produced results similar to those of the British spoken English corpus (Sample=50):
Somebody Someone
S: 32 S: 25
N: 17 N: 23
X: 1 X: 2
 
Here, there is a difference between the two words in that “someone” is more likely to be acted upon in a sentence than is “somebody”. Is this a reflection of the prior findings about transitivity? Finally, the transitivity test was run (Sample=50):
Somebody
Criteria:
A: 37
B: 30
C: 19
D: 21
E: 40
F: 45
G: 24
H: 44
I: 15
J: 24
X: 3
 
Average transitivity: 6.4
 
Someone
Criteria:
A: 33
B: 21
C: 8
D: 12
E: 25
F: 42
G: 19
H: 40
I: 6
J: 12
X: 8
 
Average transitivity: 5.4
 
Again, “somebody” averages a point higher than does “someone” in terms of transitivity. Looking at the distribution of criteria between the two, the major differences reside in B, C, D, E, and I. In this smaller sample, the differences are now staggering. Each of these criteria deals directly with the aggressiveness of verbs and the degree to which these verbs affect the object. Criterion E, volitionality, did not appear as a very distinct difference in British spoken English, but here “somebody” emerges as representing entities with a high capacity to will their actions.
Differences between other word/phrase pairs are also revealed when transitivity is examined. To illustrate this point, I ran the transitivity test on the phrase pair “each other” / “one another” to determine if there was a difference between the two phrases. The results are as follows (Sample=50):
Each Other
Criteria:
A: 48
B: 33
C: 13
D: 11
E: 36
F: 43
G: 30
H: 44
I: 12
J: 47
X: 2
 
Average transitivity: 6.8
 
One Another
Criteria:
A: 47
B: 19
C: 3
D: 2
E: 23
F: 41
G: 29
H: 49
I: 6
J: 39
X: 2
 
Average transitivity: 5.1
 
Here is an astounding difference in transitivity. The difference enters through criteria B, C and D, again, the criteria that cover the aggressiveness of the verbs.
In conclusion, the dictionary entries for “somebody” and “someone” are misleading in that they present a picture of uniform similarity between the two words. This picture holds true through several examinations of the two words, including collocations, levels of specificity and agents of action. On closer inspection, however, there is a marked difference between the two words relating to transitivity. Using Hopper and Thompson’s system for determining the transitivity of a clause, it is evident that “somebody” is more often found in highly transitive clauses involving physical action than is “someone.” This is possibly due to the “body” / “one” division, with “body” lending itself to physical actions more easily than the more existential “one”. The major weakness of this study is that so much of the data collection relies on consistent interpretation of word usage. Not only does the fatigue of the interpreter come into play, but so do the limitations of the corpus. When a query is retrieved, only a small paragraph is available to examine. This small paragraph does not allow the viewer to understand the entire context within which each word is spoken, which leaves open the possibility of misinterpreting the data. Despite this drawback, the present study affords insight into how word/phrase pairs differ in transitivity.
 
Appendix A
"Someone"  From the British Spoken Corpus, Sample=100
you could take all day to transcribe a hundred words
<ZGY> <F01> And I
think it should be the equivalent of about five pounds an
hour providing <F03>
 Yeah. Well we
should out <F01> there's someone working at a reason
you know at
 a steady rate.
<F0X> Yeah. We should try transcribing some various people try
it. <F0X> Yeah. <F0X> Mm. <F0X> See how
long it takes them and then pay them
the
 
<tc text=laughs> I thought oh that describes people
so well. Erm <ZF1> it
<ZF0> it seemed to be the sort of word you needed
that up until then you hadn't
had the really good way of describing someone
who just lounges out <ZF1> and
<ZF0> and watches the video. And here you had got
it.  <ZF1> I <ZF0> I think
that you know when <ZF1> you <ZF0> you get a
word that's doing a terrific job
like
 
<F0X> That's right. That's what I feel <ZGY>
yeah <ZG0> <F0X> <ZF1> This this
<ZF0> this <M0X> <ZGY> <F0X> poem
strikes me as something that is opposed to
the first poem. It I almost strikes me as someone
read the poem last week and
though <ZGY> oh <ZGY> <M0X> <tc
text=laughs> <F0X> I'll now go and write the
other side. It really strikes me like that. <F0X>
Mm. Mm. <M0X> Yeah. <F0X> Mm.
<F0X> Mm.
 
you the details of <ZGY> <F0X> I would like
your phone number actually 'cos I
would like to know of a masseur or masseur I suppose you'd
call it <ZGY>
because my husband really would like to go to someone
<F01> Right. <F0X> on a
regular basis but <ZGY> <F01> Okay.  <ZF1> I <ZF0> I can do that as
well.  <tc
 
text=pause> <F0X> Thank you. <ZZ1> possibly
a break in the recording at 571.
For the rest
 then we'll have
<ZGY> time to do a few <ZG0> other things. Erm our area of
social interaction. We need to have people that we can
talk to.  <ZF1> It's
<ZF0> it's really important that er there is
someone in our lives and <ZF1> if
 you if <ZF0>
if you find it difficult to make friends then there are certain
groups and organizations <ZF1> or <ZF0> or
counsellors that you can express
your feelings
 
to it but I don't think that they should assume the worst
about us. <M01>
<000> No you're absolutely right. <F01> When I
see someone dressed <ZF1> in
<ZF0> in any way whatsoever I mean if I see someone
dressed conventionally I
don't think when they pick up a pen that they're going to
steal it. <M01> Of
course not. In fact probably more people who dress
conventionally steal pens
than anything
 
I <tc text=pause> offered her my love and she
accepted it. She had her own
mother. <tc text=pause> <ZF1> I <ZF0> I
wasn't envious of her own mother or
anything. All I could think of was there's
someone here to love my son when I'm
not here. <M01> Mm. <F04> And I think <tc
text=pause> you know <ZF1> this this
 
is the <ZF0> this is the <tc text=pause>
essence er isn't it you know. 
<ZF1>
Le <ZF0>
 
the test side and because that leaves my selection
problems to some degree. <tc
text=laughing> <M03> Erm if Graham's not picked
in the test side then we've
certainly got a wealth of batting talent someone's
going to be very
disappointed all year but over the course of the season
and certainly in the
bowling department I am sure all the bowlers'll get plenty
of cricket because
that's the way
 
from Malawi where I've been on behalf of Comic Relief
looking to see how Comic
 
Relief money is spent. I can guarantee that every penny
that you give is used
to help save somebody's life or to give someone's
life a bit more dignity.
Please cough up this year for Comic Relief. <M04>
Each week day afternoon
Leicester P M has news views information and
entertainment. <ZZ1> Scripted
announcement <ZZ0>
 
Ratners bought somebody but whether it was <ZF1> he
he <ZF0> he bought some
chain out but I don't think it was Samuels but somebody
will come on and tell
us. <F04> Ah so I was going to say I think someone
will because I think <ZF1>
it <ZF0> it's one of the jewellers that you can go
into and you can buy exactly
identical items and <ZF1> I <ZF0> I thought it
was Samuels. <M01> I don't know.
I think
 
Appendix B
"Somebody" From the British Spoken Corpus, Sample=100
my registration number which is my registration number.
<F0X> If you put the
phone down straightaway it actually doesn't cut them off
if they've rung you
does it. <F0X> No. <M0X> Can't you cut somebody
then if they don't put the
phone down. <F0X> It's only if they put the phone
down. <F0X> Yeah. Really.
<M0X> Yeah. <F0X> I think so.  <ZGY> <F0X> <ZGY> just put
the phone down if
you'll put it
 
bust is completely unrelated to another person going bust.
It's not going
to be because an outside event comes and wipes everybody
out it's just you know
that some amongst ten people there's somebody
who is going to foul the thing up
you know that's going to make a mistake. We don't know who
it is but <ZF1>
there's <ZF0> there's just a we know from experience
that not everybody is a
ace
 
it as a practitioner for client who has come for treatment
I am going to just
do it without asking any case history or anything. Usually
I do ask the case
history first before I do it. Also when somebody
come for treatment I always
ask them if they are on any medication. I would like to
know because sometimes
 
some symptoms could be due to their side effect of
medication as well. <F11>
<ZGY>
 
That was why <ZGY> <F0X> No this was this came
up recently in something else.
 <ZF1> It was
<ZF0> it was on Mastermind thing and it was the one he got wrong
 
as well. <F0X> It was something like somebody
Gold Brown or <ZGY> Dolly
something. <F0X> <ZF1> Dolly <ZF0> Dolly
Threadgold. <M0X> Dolly Threadgold.
<F0X> <tc text=laughs> <F0X> <tc
text=laughs> I don't know <ZGY> <F0X> The name
rings a
 
sound would we get? <M01> Well I d I the interesting
thing is when you hear
people sing they sing in a completely different accent to
the way they talk
very often. <M09> Very true. <M01> You take somebody
like Chris Rea <M09> <ZGY>
<M01> <ZF1> who <ZF0> who sings with a
sort of erm very powerful almost
Americany type sound to sing <M09> That's right.
<M01> and yet there he is with
a Middlesbrough
 
Gilbert was saying and er the problem I had <ZF1>
the <ZF0> the record company
 
was actually moving premises and <ZF1> they'd er
<ZF0> they'd got all the
studio in bits. So I had to go over to er
somebody else's studio and do some of
the recording but the record company erm said that they
must have er final say
 
<ZF1> on <ZF0> on how things happen so er we
ended up putting a makeshift
studio
 
<M01> <ZF1> This is <ZF0> this is erm
this that was a case of America doing
in Panama what it says it can't do <ZF1> in I
<ZF0> in Iraq. <F06> <tc
text=laughs> <M01> Which is go in and remove somebody.
<F06> Yes that's right.
 
I mean <ZF1> they <ZF0> they talked about war criminals
<tc text=pause> er
after this war in the Gulf. But I remember the U S A
<tc text=pause> er
dropping napalm on
 
into lectures and they've given us the real life
situations that you are
actually going to. <M01> <ZF1> You <ZF0>
you prefer being taught by somebody
who has actually done the job rather than somebody
who's studied the job if you
like. <M08> <ZF1> I <ZF0> I again a
personal opinion but I think a theory
aspect is very different than a practical aspect.
<M01> Yeah. <M08> Erm from my
point of
 
no more in but I had already paid nine hundred and sixty
odd pound. <M14> Mm.
<M16> <ZF1> And er <ZF0> and <ZF1>
when <ZF0> when they matured after twenty
years I naturally just a minute there's somebody
at the door. <M14> Oh. Oh well
<ZF1> we <ZF0> we have lost Reg for well there
we are well a <ZF1> beginning
<ZF0> beginnings of an interesting story of twenty
years of unit trusts. <M16>
 
Oh
 
outcry and er of course coming towards an election all the
politicians
wanted to do their best I think you have that situation
over here as well.
<M01> Let's hope we don't have to wait until somebody
dies because of it though
Peter that's the thing. <M05> Erm. Well I mean
<ZF1> I'd <ZF0> I'd say Ireland
 
is only a country of three and a half million and if somebody
does die like
that it does
[1] Oxford English Dictionary, 2nd Edition
[2] Hopper, P.J., and S.A. Thompson (1980) “Transitivity in Grammar and Discourse,” Language 56, 251.
[3] Hopper, P.J., and S.A. Thompson (1980) “Transitivity,” 253.
[4] Ibid., 252.
[5] When the two words were entered as queries into the British books corpus, an interesting phenomenon occurred. For UK books, “somebody” occurred only 342 times (63.9/million). On the other hand, “someone” occurred 1517 times (283.3/million). This is the opposite of the trend for British spoken English. I began to wonder if there was a register difference between the two words. Here are the numbers for all corpora:
Someone
Corpus    Total Number of Occurrences    Average Number per Million Words
ukbooks 1517 283.3/million
ukmags 1175 239.7/million
usbooks 1282 227.9/million
today 1117 212.8/million
sunnow 1206 207.1/million
ukspok 1600 172.6/million
npr 478 152.8/million
ukephem 434 138.9/million
oznews 730 136.8/million
usephem 157 128.2/million
times 723 125.4/million
bbc 124 47.5/million
Somebody
Corpus    Total Number of Occurrences    Average Number per Million Words
ukspok 4027 434.3/million
npr 344 109.9/million
usbooks 425 75.5/million
ukbooks 342 63.9/million
ukmags 200 40.8/million
today 182 34.7/million
times 191 33.1/million
sunnow 186 31.9/million
oznews 108 20.2/million
bbc 51 19.5/million
ukephem 48 15.4/million
usephem 12 9.8/million
 
However, no register difference immediately presents itself.
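The two tables can be compared corpus by corpus with a simple ratio of the per-million rates. The sketch below is only an illustration built from the figures listed above; the variable names are hypothetical. A ratio above 1 means “someone” is the more frequent word in that corpus, and a ratio below 1 means “somebody” is.

```python
# Per-million rates copied from the two tables above.
someone = {"ukbooks": 283.3, "ukmags": 239.7, "usbooks": 227.9,
           "today": 212.8, "sunnow": 207.1, "ukspok": 172.6,
           "npr": 152.8, "ukephem": 138.9, "oznews": 136.8,
           "usephem": 128.2, "times": 125.4, "bbc": 47.5}
somebody = {"ukspok": 434.3, "npr": 109.9, "usbooks": 75.5,
            "ukbooks": 63.9, "ukmags": 40.8, "today": 34.7,
            "times": 33.1, "sunnow": 31.9, "oznews": 20.2,
            "bbc": 19.5, "ukephem": 15.4, "usephem": 9.8}

# Ratio of "someone" to "somebody" in each corpus.
ratios = {corpus: someone[corpus] / somebody[corpus] for corpus in someone}

# Corpora in which "somebody" is the more frequent of the two words.
below_one = [corpus for corpus, ratio in ratios.items() if ratio < 1]

for corpus, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{corpus:8s} {ratio:5.2f}")
print("somebody more frequent in:", below_one)
```

On these figures the ratio falls below 1 only for ukspok.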