Tom Sweterlitsch

Corpus Linguistics

 

Someone vs. Somebody

Word/Phrase Pairs Exposed

 

Somebody [ME sum body, fr. sum som some + body]: one or some person of no certain or known identity: a person indeterminate.

 

Someone [ME sum oon, fr. sum som some + oon one]: some person: somebody.

                        --Webster’s Third New International Dictionary

 

           

Based on the above entries for “somebody” and “someone,” the conventional definitions of these two words are evidently identical.  In fact, the entry for “someone” uses “somebody” to clarify its own meaning.  The histories of the two words are nearly identical as well: both arose in Middle English and entered the language within two years of each other, “somebody” being first attested in 1303 and “someone” in 1305.[1]  Why do these words, ostensibly used in exactly the same manner, coexist in the English language?  While Occam’s razor may rule the laws of the cosmos, it would seem not to apply to the English language, where such inefficiency occurs.  However, this is not the end of the story for the two words.  By consulting a corpus and studying their exact patterns of usage, the difference between “somebody” and “someone” can finally be exposed.  In this paper I examine the usage of this word pair in four contexts: collocations, levels of specificity, agent of action and, finally, transitivity.

            The bulk of this study was conducted using the COBUILD corpus of British spoken English.  When “somebody” and “someone” are entered as search queries, the results are interesting:

                        Somebody: 4027 total instances (434.4/million).

                        Someone: 1600 total instances (172.6/million).

If these words were indeed equivalent, one would expect each to be used about as often as the other, yet this is not the case.  “Somebody” is used roughly 2.5 times as often as “someone.”  This variation might be due to differences in sound between the two words.  For instance, “somebody” has three syllables, the first two carrying a rather heavy stress pattern, whereas “someone” is a two-syllable trochee.  Perhaps “somebody” is used more often because speakers wish to emphasize their point.  There are many speculations about sound and word use, but these are all very difficult to prove.  Instead, the corpus will give us empirical evidence of how the two words pattern in use, shedding light on their difference.
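The size of this gap can be checked directly from the two counts reported above.  The following is only a minimal sketch: the variable names are illustrative, and the implied corpus size is derived from the reported figures rather than stated anywhere in the corpus documentation.  The counts give a ratio of about 2.5 to 1, which is to say that “somebody” occurs about 150% more often than “someone.”

```python
# Raw counts and per-million rates as reported for the UK spoken corpus.
counts = {"somebody": 4027, "someone": 1600}
per_million = {"somebody": 434.4, "someone": 172.6}

# How many times "somebody" occurs for each occurrence of "someone".
ratio = counts["somebody"] / counts["someone"]

# Corpus size (in millions of words) implied by each count/rate pair.
# This is a derived sanity check, not a figure given in the source.
implied_size = {word: counts[word] / per_million[word] for word in counts}

print(f"somebody : someone = {ratio:.2f} : 1")
for word, size in implied_size.items():
    print(f"{word}: implied corpus size is about {size:.2f} million words")
```

Both count/rate pairs should imply nearly the same corpus size, which is a quick way to confirm that the two rates were computed over the same body of text.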

            The first test to determine whether or not these words are different is to simply examine the lists of strong collocations.  This was done in the corpus using the “c” command for t-score collocations and the “C” command for Mutual Information.  The results for the top five in each list are as follows:

“Somebody” UKSPOK corpus, t-score

Collocate      Frequency      t-score

who                15         3.534784

else               13         3.506295

if                 13         2.734573

to                 27         2.108609

somebody            5         2.080691

 

“Someone” UKSPOK corpus, t-score

Collocate      Frequency      t-score

who                15         3.534784

else                9         2.880709

to                 32         2.820764

if                 13         2.734573

s                  27         2.593357

 

“Somebody” UKSPOK corpus, mutual information

Collocate      Frequency      MI score

else               13         5.183439

somebody            5         3.847505

talking             4         3.570671

who                15         3.517850

saying              3         2.679442

 

“Someone” UKSPOK corpus, mutual information

Collocate      Frequency      MI score

someone             4         4.857311

else                9         4.652871

who                15         3.517850

give                5         3.517755

twenty              3         3.091247

 

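Clearly, the only information of real significance that the collocation tests offer is that “somebody” and “someone” keep the same company.  In fact, the similarities between the lists are striking: four of the top five t-score collocates (“who,” “else,” “if” and “to”) are shared, and “else” and “who” appear in both mutual information lists.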
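For readers unfamiliar with the two measures, the sketch below shows the forms in which t-score and Mutual Information are commonly defined in corpus linguistics.  It is an illustration only: the span of eight word slots, the function names and the figures passed in are assumptions, and the corpus software may compute its scores somewhat differently.

```python
import math

def expected_cooccurrences(node_freq, collocate_freq, corpus_size, span=8):
    """Co-occurrences expected by chance within `span` word slots of the node.

    The span of 8 (four words either side) is an assumed default.
    """
    return node_freq * collocate_freq * span / corpus_size

def t_score(observed, expected):
    """Confidence that the pairing is not due to chance; favors frequent pairs."""
    return (observed - expected) / math.sqrt(observed)

def mutual_information(observed, expected):
    """Strength of association, in bits; favors rarer, more exclusive pairs."""
    return math.log2(observed / expected)

# Hypothetical figures for illustration; they are not taken from the corpus.
expected = expected_cooccurrences(
    node_freq=4027, collocate_freq=500, corpus_size=9_270_000
)
print(f"expected by chance: {expected:.2f}")
print(f"t-score:            {t_score(13, expected):.2f}")
print(f"MI:                 {mutual_information(13, expected):.2f}")
```

Because the t-score rewards sheer frequency while Mutual Information rewards exclusivity, grammatical words such as “to” and “if” tend to surface in t-score lists, and less frequent lexical words such as “talking” or “twenty” in Mutual Information lists.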

            The next test to determine any differences between these words is to examine their levels of specificity.  Perhaps the distinction between “body” and “one” is important to the usage of these words.  “Body” is more tangible and specific while “one” is more existential, referring simply to a generic human existence.  The test used to discover the level of specificity of each word is simple and involves ranking each instance of “somebody/one” based on the specificity of the entity that word is referring to.  This was done on a five-point scale:

1: Somebody/one used as a replacement for a name or referring to a specific, definite person. 

(1): He’s [Shelley] is someone who is trying to bring together various strands of radical philosophic thought.

2: A ranking of 2 was given when the referent is either one of a group of people who are present or a specific person as yet unknown.

(2): Somebody else say something

(3): See him waiting outside the newsagents for someone going to get him some fags.

3: This ranking is for a person in general, but qualified in some manner (e.g. “someone who/with…”).

(4): If a G P has somebody with learning disabilities do they go to a community learning disability team?

4: A score of 4 is for a referent that is “not you” but somebody else.

(5): So I mean if she goes and you’ve got to do English with someone else…

5: The most general referent, where somebody/one=anybody/one.

(6): You know somebody takes my car or anything like that you know.

            The results are as follows (Sample=100):

Rank               Somebody      Someone

1                     11            14

2                     12            11

3                     21            17

4                      6             4

5                     38            44

X (unknown)           12            10

 

Again, the similarities are the most important piece of information gained from this test.  The two words distribute almost identically across the levels of specificity, even in the number of instances that were impossible to classify.

            The third test was run to explore whether the two words relate differently to the action of a sentence.  Here, each clause containing “somebody/one” was examined and assigned one of two ranks:

S was given to clauses where somebody/one was doing all of the action.

(7): This really is someone having a laugh at our expense isn’t it.

N was given to clauses where something other than “somebody/one” did the action.

(8): Can’t you cut somebody then if they don’t put the phone down?

The results, again, show a profound similarity between these words (Sample=100):

Rank               Somebody      Someone

S                     66            63

N                     27            31

X (unknown)            7             6

 

            The final test used to examine “somebody” and “someone” revealed a surprising difference between these words.  Drawing on the work of Hopper and Thompson (1980), each of these words was screened for transitivity.  Hopper and Thompson describe transitivity, as traditionally understood, as “a global property of an entire clause, such that an activity is ‘carried over’ or ‘transferred’ from an agent to a patient.”[2]  They then develop a system of criteria for transitivity, under which clauses can be “more or less” transitive than other clauses.[3]  Here are the criteria for determining transitivity:

Criterion                    High transitivity           Low transitivity

A: Participants              two or more (A and O)       1 participant

B: Kinesis                   action                      nonaction

C: Aspect                    telic                       atelic

D: Punctuality               punctual                    nonpunctual

E: Volitionality             volitional                  nonvolitional

F: Affirmation               affirmative                 negative

G: Mode                      realis                      irrealis

H: Agency                    A high in potency           A low in potency

I: Affectedness of O         O totally affected          O not affected

J: Individuation of O        O highly individuated       O nonindividuated[4]

 

Clearly, none of these criteria is a “black and white” issue.  For instance, for criterion B, “thrashed” presents more action than “said,” which presents more action than “enjoy.”  For the purposes of analyzing “somebody” and “someone,” though, the best approach is to give each criterion a value of 1 and to award that point to each clause containing “somebody/one” that meets the criterion, as best as can be judged.  A clause’s transitivity score is thus the number of criteria it satisfies, from 0 to 10.  In this respect the system closely resembles many utilitarian systems for calculating the greatest utility for the greatest number of people.  In order to avoid the problems of scale, I interpreted these criteria based on a division between the physical and mental world.  For instance, a clause would receive a point for criterion B if the subject kicked (physical) but not if he/she loved (mental).  Similarly, for criterion I, a clause would receive a point if the object was shattered, but not if the object was merely looked at.  Similar interpretations were made to delineate each criterion.  Following are examples of clauses ranked by transitivity.

High transitivity:

(9):   And someone’s put cigarettes here.

This clause receives a point for all the criteria, thus scoring a 10. 

Low transitivity:

(10): I’m just going to go and find somebody else.

This clause only receives points for criteria A, B, E, F, and H, thus only receiving a 5.  These are the results for the transitivity test (Sample=100): 

Criterion          Somebody      Someone

A                     68            69

B                     64            44

C                     26            21

D                     45            29

E                     65            60

F                     79            75

G                     35            29

H                     79            80

I                     17            13

J                     36            20

X (unknown)           17            17

 

Average level of transitivity: Somebody 6.1, Someone 5.3

 

            The patterning of transitivity in this experiment indicates that “somebody” is used more transitively than “someone.”  The difference in average level appears slight, only 0.8 of a point; however, the difference becomes clearer when looking at the distribution of each of the criteria.  Most of the criteria are at very similar levels, differing by only a few points.  This is not the case for criteria B, D and J.  “Somebody” was awarded criteria D and J 16 more times each than “someone,” and criterion B 20 more times.  Criteria B and D are especially significant because they relate to one another.  B represents the level of physical action of the verb.  D represents punctuality, that is, whether the action is carried out with no obvious transitional phase between its inception and its completion.  Perhaps the “body” / “one” division does play a significant role, in that “somebody” is more apt to be found in highly transitive clauses involving physical processes.
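The scoring scheme and the per-criterion comparison can be sketched in a few lines.  This is an illustration rather than the procedure actually followed, since the scoring was done by hand: the helper name `clause_score` is hypothetical, and the tallies are copied from the spoken-corpus results above.

```python
CRITERIA = "ABCDEFGHIJ"

def clause_score(criteria_met):
    """Transitivity score of one clause: one point per criterion satisfied."""
    return len(set(criteria_met) & set(CRITERIA))

# Example (10) was awarded criteria A, B, E, F and H; example (9) all ten.
print(clause_score("ABEFH"))     # low-transitivity example
print(clause_score(CRITERIA))    # high-transitivity example

# Number of clauses (Sample=100 each) awarded each criterion, spoken corpus.
somebody = dict(zip(CRITERIA, [68, 64, 26, 45, 65, 79, 35, 79, 17, 36]))
someone = dict(zip(CRITERIA, [69, 44, 21, 29, 60, 75, 29, 80, 13, 20]))

# Positive values mean "somebody" was awarded the criterion more often.
differences = {c: somebody[c] - someone[c] for c in CRITERIA}

# The three criteria on which the two words diverge most.
largest = sorted(CRITERIA, key=lambda c: abs(differences[c]), reverse=True)[:3]

for c in CRITERIA:
    print(f"{c}: {differences[c]:+d}")
print("largest gaps:", ", ".join(largest))
```

Sorting by the absolute size of the gap singles out B, D and J, the same three criteria discussed above; every other criterion differs by six points or fewer.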

                In order to strengthen the claim that there is a difference of transitivity between “someone” and “somebody” it is necessary to check these results in a corpus other than the British spoken English corpus.  Here I chose to use the British books corpus for two primary reasons.  Firstly, it was necessary to keep within the realm of British English so that no differences between American culture and British culture would skew the results.  Secondly, the books corpus represents “pure” writing.  That is, it avoids the danger of finding instances of the query in quoted speech, as often happens in newspapers.  The only potential drawback of the British books corpus is that many of the books are fiction, which brings in the author’s conscious artistry in ways that might deviate from “natural” language.  However, this drawback does not present any significant problems in discovering patterning of “somebody” and “someone”.[5] 

            The specificity test, the action test and the test for transitivity were run on the British books corpus.  The results for specificity are as follows (Sample=50):

Rank               Somebody      Someone

1                      5             7

2                     10             6

3                      4            11

4                     11             7

5                     19            19

X (unknown)            1             0

 

Again, the distribution of specificity is even between the two words with a slight difference in category 3.  The test to determine who is doing the action produced results similar to those of the British spoken English corpus (Sample=50):

Rank               Somebody      Someone

S                     32            25

N                     17            23

X (unknown)            1             2

 

Here, there is a difference between the two words in that “someone” is more likely to be acted upon in a sentence than is “somebody”.  Is this a reflection of the prior findings about transitivity?  Finally, the transitivity test was run (Sample=50):

Criterion          Somebody      Someone

A                     37            33

B                     30            21

C                     19             8

D                     21            12

E                     40            25

F                     45            42

G                     24            19

H                     44            40

I                     15             6

J                     24            12

X (unknown)            3             8

 

Average transitivity: Somebody 6.4, Someone 5.4

 

Again, “somebody” averages a point higher than “someone” in terms of transitivity.  Looking at the distribution of criteria between the two, the major differences reside in B, C, D, E, and I.  In this smaller sample, the differences are now staggering.  Each of these criteria deals directly with the aggressiveness of verbs and the degree to which these verbs affect the object.  Criterion E, volitionality, did not appear as a very distinct difference in British spoken English, but here “somebody” emerges as representing entities with a high capacity to will their actions.

            Differences between other word/phrase pairs are also revealed when transitivity is examined.  To illustrate this point, I ran the transitivity test on the phrase pair “each other” / “one another” to determine if there was a difference between the two phrases.  The results are as follows (Sample=50):

Criterion          Each Other    One Another

A                     48            47

B                     33            19

C                     13             3

D                     11             2

E                     36            23

F                     43            41

G                     30            29

H                     44            49

I                     12             6

J                     47            39

X (unknown)            2             2

 

Average transitivity: Each Other 6.8, One Another 5.1

 

Here is an astounding difference in transitivity.  The difference enters through criteria B, C and D, again, the criteria that cover the aggressiveness of the verbs. 

            In conclusion, the dictionary entries for both “somebody” and “someone” are misleading in that they present a picture of uniform similarity between the two words.  This idea of uniform similarity holds true through several examinations of the two words, including collocations, levels of specificity and agents of action.  On closer inspection, however, there is a notable difference between the two words relating to transitivity.  Using Hopper and Thompson’s system for determining the transitivity of a clause, it is evident that “somebody” is more often found in highly transitive clauses involving physical action than is “someone.”  This is possibly due to the “body” / “one” division, with “body” lending itself to physical actions more easily than the more existential “one.”  The only major problem with this study is that so much of the data collection relies on consistent interpretation of word usage.  Not only does the fatigue of the interpreter come into play, but so do the limitations of the corpus.  When a query is retrieved, there is only a small paragraph available to examine.  This small paragraph does not allow the viewer to understand the entire context within which each word is spoken, which leaves open the possibility of misinterpreting the data.  Despite this drawback, the present study affords insight into the differences between word/phrase pairs that transitivity reveals.

 


Appendix A

"Someone"  From the British Spoken Corpus, Sample=100

you could take all day to transcribe a hundred words <ZGY> <F01> And I

think it should be the equivalent of about five pounds an hour providing <F03>

 Yeah. Well we should out <F01> there's someone working at a reason you know at

 a steady rate. <F0X> Yeah. We should try transcribing some various people try

it. <F0X> Yeah. <F0X> Mm. <F0X> See how long it takes them and then pay them

the

 

<tc text=laughs> I thought oh that describes people so well. Erm <ZF1> it

<ZF0> it seemed to be the sort of word you needed that up until then you hadn't

had the really good way of describing someone who just lounges out <ZF1> and

<ZF0> and watches the video. And here you had got it.  <ZF1> I <ZF0> I think

that you know when <ZF1> you <ZF0> you get a word that's doing a terrific job

like

 

<F0X> That's right. That's what I feel <ZGY> yeah <ZG0> <F0X> <ZF1> This this

<ZF0> this <M0X> <ZGY> <F0X> poem strikes me as something that is opposed to

the first poem. It I almost strikes me as someone read the poem last week and

though <ZGY> oh <ZGY> <M0X> <tc text=laughs> <F0X> I'll now go and write the

other side. It really strikes me like that. <F0X> Mm. Mm. <M0X> Yeah. <F0X> Mm.

<F0X> Mm.

 

you the details of <ZGY> <F0X> I would like your phone number actually 'cos I

would like to know of a masseur or masseur I suppose you'd call it <ZGY>

because my husband really would like to go to someone <F01> Right. <F0X> on a

regular basis but <ZGY> <F01> Okay.  <ZF1> I <ZF0> I can do that as well.

<tc text=pause> <F0X> Thank you. <ZZ1> possibly a break in the recording at 571.

For the rest

 

 then we'll have <ZGY> time to do a few <ZG0> other things. Erm our area of

social interaction. We need to have people that we can talk to.  <ZF1> It's

<ZF0> it's really important that er there is someone in our lives and <ZF1> if

 you if <ZF0> if you find it difficult to make friends then there are certain

groups and organizations <ZF1> or <ZF0> or counsellors that you can express

your feelings

 

to it but I don't think that they should assume the worst about us. <M01>

<000> No you're absolutely right. <F01> When I see someone dressed <ZF1> in

<ZF0> in any way whatsoever I mean if I see someone dressed conventionally I

don't think when they pick up a pen that they're going to steal it. <M01> Of

course not. In fact probably more people who dress conventionally steal pens

than anything

 

I <tc text=pause> offered her my love and she accepted it. She had her own

mother. <tc text=pause> <ZF1> I <ZF0> I wasn't envious of her own mother or

anything. All I could think of was there's someone here to love my son when I'm

not here. <M01> Mm. <F04> And I think <tc text=pause> you know <ZF1> this this

is the <ZF0> this is the <tc text=pause> essence er isn't it you know.  <ZF1>

Le <ZF0>

 

the test side and because that leaves my selection problems to some degree.

<tc text=laughing> <M03> Erm if Graham's not picked in the test side then we've

certainly got a wealth of batting talent someone's going to be very

disappointed all year but over the course of the season and certainly in the

bowling department I am sure all the bowlers'll get plenty of cricket because

that's the way

 

from Malawi where I've been on behalf of Comic Relief looking to see how Comic

Relief money is spent. I can guarantee that every penny that you give is used

to help save somebody's life or to give someone's life a bit more dignity.

Please cough up this year for Comic Relief. <M04> Each week day afternoon

Leicester P M has news views information and entertainment. <ZZ1> Scripted

announcement <ZZ0>

 

Ratners bought somebody but whether it was <ZF1> he he <ZF0> he bought some

chain out but I don't think it was Samuels but somebody will come on and tell

us. <F04> Ah so I was going to say I think someone will because I think <ZF1>

it <ZF0> it's one of the jewellers that you can go into and you can buy exactly

identical items and <ZF1> I <ZF0> I thought it was Samuels. <M01> I don't know.

I think

 

Appendix B

"Somebody" From the British Spoken Corpus, Sample=100

my registration number which is my registration number. <F0X> If you put the

phone down straightaway it actually doesn't cut them off if they've rung you

does it. <F0X> No. <M0X> Can't you cut somebody then if they don't put the

phone down. <F0X> It's only if they put the phone down. <F0X> Yeah. Really.

<M0X> Yeah. <F0X> I think so.  <ZGY> <F0X> <ZGY> just put the phone down if

you'll put it

 

bust is completely unrelated to another person going bust. It's not going

to be because an outside event comes and wipes everybody out it's just you know

that some amongst ten people there's somebody who is going to foul the thing up

you know that's going to make a mistake. We don't know who it is but <ZF1>

there's <ZF0> there's just a we know from experience that not everybody is a

ace

 

it as a practitioner for client who has come for treatment I am going to just

do it without asking any case history or anything. Usually I do ask the case

history first before I do it. Also when somebody come for treatment I always

ask them if they are on any medication. I would like to know because sometimes

some symptoms could be due to their side effect of medication as well. <F11>

<ZGY>

 

That was why <ZGY> <F0X> No this was this came up recently in something else.

 <ZF1> It was <ZF0> it was on Mastermind thing and it was the one he got wrong

as well. <F0X> It was something like somebody Gold Brown or <ZGY> Dolly

something. <F0X> <ZF1> Dolly <ZF0> Dolly Threadgold. <M0X> Dolly Threadgold.

<F0X> <tc text=laughs> <F0X> <tc text=laughs> I don't know <ZGY> <F0X> The name

rings a

 

sound would we get? <M01> Well I d I the interesting thing is when you hear

people sing they sing in a completely different accent to the way they talk

very often. <M09> Very true. <M01> You take somebody like Chris Rea <M09> <ZGY>

<M01> <ZF1> who <ZF0> who sings with a sort of erm very powerful almost

Americany type sound to sing <M09> That's right. <M01> and yet there he is with

a Middlesbrough

 

Gilbert was saying and er the problem I had <ZF1> the <ZF0> the record company

was actually moving premises and <ZF1> they'd er <ZF0> they'd got all the

studio in bits. So I had to go over to er somebody else's studio and do some of

the recording but the record company erm said that they must have er final say

<ZF1> on <ZF0> on how things happen so er we ended up putting a makeshift

studio

 

<M01> <ZF1> This is <ZF0> this is erm this that was a case of America doing

in Panama what it says it can't do <ZF1> in I <ZF0> in Iraq. <F06> <tc

text=laughs> <M01> Which is go in and remove somebody. <F06> Yes that's right.

I mean <ZF1> they <ZF0> they talked about war criminals <tc text=pause> er

after this war in the Gulf. But I remember the U S A <tc text=pause> er

dropping napalm on

 

into lectures and they've given us the real life situations that you are

actually going to. <M01> <ZF1> You <ZF0> you prefer being taught by somebody

who has actually done the job rather than somebody who's studied the job if you

like. <M08> <ZF1> I <ZF0> I again a personal opinion but I think a theory

aspect is very different than a practical aspect. <M01> Yeah. <M08> Erm from my

point of

 

no more in but I had already paid nine hundred and sixty odd pound. <M14> Mm.

<M16> <ZF1> And er <ZF0> and <ZF1> when <ZF0> when they matured after twenty

years I naturally just a minute there's somebody at the door. <M14> Oh. Oh well

<ZF1> we <ZF0> we have lost Reg for well there we are well a <ZF1> beginning

<ZF0> beginnings of an interesting story of twenty years of unit trusts. <M16>

Oh

 

outcry and er of course coming towards an election all the politicians

wanted to do their best I think you have that situation over here as well.

<M01> Let's hope we don't have to wait until somebody dies because of it though

Peter that's the thing. <M05> Erm. Well I mean <ZF1> I'd <ZF0> I'd say Ireland

is only a country of three and a half million and if somebody does die like

that it does



[1] Oxford English Dictionary, 2nd Edition

[2] Hopper, P.J., and S.A. Thompson (1980) “Transitivity in Grammar and Discourse,” Language 56, 251.

[3] Hopper, P.J., and S.A. Thompson (1980) “Transitivity,” 253.

[4] Ibid., 252.

[5] When the two words were entered as queries into the British books corpus, an interesting phenomenon occurred.  For UK books, “somebody” occurred only 342 times (63.9/million), whereas “someone” occurred 1517 times (283.3/million).  This is the opposite of the trend for British spoken English.  I began to wonder if there was a register difference between the two words.  Here are the numbers for all corpora:

              

Someone

Corpus         Total Number of       Average Number per

               Occurrences           Million Words

 

ukbooks            1517                283.3/million

ukmags             1175                239.7/million

usbooks            1282                227.9/million

today              1117                212.8/million

sunnow             1206                207.1/million

ukspok             1600                172.6/million

npr                 478                152.8/million

ukephem             434                138.9/million

oznews              730                136.8/million

usephem             157                128.2/million

times               723                125.4/million

bbc                 124                 47.5/million

 

 

Somebody

 

Corpus         Total Number of       Average Number per

               Occurrences           Million Words

 

ukspok             4027                434.3/million

npr                 344                109.9/million

usbooks             425                 75.5/million

ukbooks             342                 63.9/million

ukmags              200                 40.8/million

today               182                 34.7/million

times               191                 33.1/million

sunnow              186                 31.9/million

oznews              108                 20.2/million

bbc                  51                 19.5/million

ukephem              48                 15.4/million

usephem              12                  9.8/million

 

               However, no register difference immediately presents itself.
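One way to read the two tables side by side is to divide the per-million rate of “someone” by that of “somebody” in each corpus.  The sketch below is offered only as an aid to inspection: the ratio is not a measure used elsewhere in this study, the variable names are illustrative, and the figures are copied from the tables above.

```python
# Occurrences per million words, copied from the two tables above.
someone = {
    "ukbooks": 283.3, "ukmags": 239.7, "usbooks": 227.9, "today": 212.8,
    "sunnow": 207.1, "ukspok": 172.6, "npr": 152.8, "ukephem": 138.9,
    "oznews": 136.8, "usephem": 128.2, "times": 125.4, "bbc": 47.5,
}
somebody = {
    "ukspok": 434.3, "npr": 109.9, "usbooks": 75.5, "ukbooks": 63.9,
    "ukmags": 40.8, "today": 34.7, "times": 33.1, "sunnow": 31.9,
    "oznews": 20.2, "bbc": 19.5, "ukephem": 15.4, "usephem": 9.8,
}

# A ratio above 1 means "someone" is the more frequent word in that corpus.
ratio = {corpus: someone[corpus] / somebody[corpus] for corpus in someone}

# List the corpora from the most "somebody"-heavy to the most "someone"-heavy.
for corpus in sorted(ratio, key=ratio.get):
    print(f"{corpus:<8} someone/somebody = {ratio[corpus]:.2f}")
```

Ordering the corpora by this ratio makes it easier to see at a glance where each word is preferred.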