How Midterm I for ERMII was Graded

By: Uri Simonsohn

General Comments:

Each question was given full credit unless otherwise noted. A –x marks the number of points that were subtracted. If nothing is written, you had full credit.

I wrote many comments. If none are written down and you had a low score, check the key below; you probably made a common error.

If you see NA somewhere it just means you didn’t answer the question (No Answer).

Below you will find the proposed answers to most questions, and a brief summary of how they were graded.

Question 1.

a) Almost everybody had full credit. Any good argument was accepted.

b) A lot of people missed 2 points here, arguing that heterogeneous groups are good for having diverse data. You don’t need heterogeneous groups for that: you can have several groups, each of them homogeneous, and still achieve diverse data if the groups differ from one another.

You may also have missed a few points if your argument in favor of heterogeneous groups was incoherent (the reasoning didn’t lead to the proposed argument).

c) I think everyone got the first part of this question right. On the second part, a few forgot to answer, and others mentioned procedures that wouldn’t help or were unreasonable.

d) Many people got –5 here. The level of fantasizing was so high that I gave a 1-point premium to those honest enough to leave it blank. Most just mentioned the typical reservations that exist regarding focus groups (FG), but not the ones from the specific reading.

The reading emphasizes that subjects aren’t experts, that FG are good for generating ideas but not so much for evaluating them, and that it must not be forgotten that marketers shape preferences and shouldn’t just record them.

Question 2.

a) The hard part of writing a question about cigarette consumption is that it is a multidimensional construct. The usual drawback of the proposed solutions was that some dimension was disregarded. In some cases minimal rules of survey making, such as providing an exhaustive set of possible answers, were violated. Others used vague terms such as “do you smoke often?” or gave scales that were unrealistic for the question.

In general, 1 point was taken off for an answer that missed an important dimension (frequency or intensity), for example one that asked how often respondents smoke but not how much each time. Another point was taken off if some other minor flaw was present.

Some good solutions were the following (note that they combine frequency and intensity to some extent):

First solution:

1) Do you smoke? ___ Yes ___ No

2) How often do you smoke?
   ___ A couple of times a year
   ___ A couple of times a month
   ___ A couple of times a day

Second solution:

1) Do you smoke?
   ___ No
   ___ Yes, a couple of times a year
   ___ Yes, a couple of times a month
   ___ Yes, a couple of times a week
   ___ Yes, a couple of times a day

2) If you do smoke: when you smoke, how many cigarettes would you say it is typical for you to smoke? ___

b) An important limitation of all answers is that the respondent may not want to disclose smoking habits. If you failed to mention this limitation, or any that were specific to your solution in (a), 1 or 2 points were taken off. Even if limitations were mentioned, points were taken off if a very salient one from (a) wasn’t.

c) Polling 200 students is better. Selection bias is more important than sample size.

d) External validity (internal validity is not an issue in surveys).

e) Since the error is random and centered at zero, it would be a reliability problem (in plainer English, the errors of some will cancel those of others, so validity shouldn’t be a problem).
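As a rough illustration of the point in (e), here is a small sketch with made-up numbers (the true counts and the errors are assumptions for illustration, not exam data). The errors are chosen to sum to zero, so every individual report is off while the average is untouched.

```python
from statistics import mean

# Hypothetical true cigarettes per day for four respondents (illustrative only).
true_counts = [0, 5, 10, 20]
# Hypothetical reporting errors, chosen so that they are centered at zero.
errors = [2, -2, -1, 1]

reported = [t + e for t, e in zip(true_counts, errors)]

# Every individual report misses the true value: a reliability problem.
individual_misses = sum(1 for t, r in zip(true_counts, reported) if t != r)

# The errors cancel in the aggregate, so the average is still on target.
print(mean(true_counts), mean(reported), individual_misses)
```

With real survey data the errors would only cancel approximately, and better so with larger samples.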

f) The proposed question had to be general and easy to answer.

Question 3.

a1) Many people missed 2 points for not giving a reason or process behind their argument. If you missed 4 or the full 5 points you probably made an argument that was not coherent (or that was not understandable).

One reasonable answer is that you avoid “scaring” respondents with a tough personal question, as you would by inquiring about ecstasy consumption right away. Many other responses were accepted.

a2) Similar grading to (a1). A reasonable answer is that if you start off asking about ecstasy, then the survey-taker may perceive the other drugs as less “evil”, and be more open to answering truthfully.

b) Proposed Answer: “Towards the end, so that if people get scared they have already completed the survey. It is also a rather specific question, so it should go towards the end.”

If you just answered with no justification, you missed 2 points.

c1) Proposed Answer: If you have the general question first, you avoid having the survey taker think ONLY of the specific questions they just answered when giving the global response.

Most people got it either right or wrong; if the argumentation was weak or unclear, you may have missed 1 or 2 points.

c2) Proposed Answer: The specific questions may initiate thinking, making the response to this question more truthful or more representative of reality.

In both c1 and c2 there were several other arguments that would be considered correct answers.

A lot of people missed points because their arguments were incoherent. By this I mean that the reason they proposed for one ordering or the other was (sometimes) a true statement, but not really related to the ordering of survey questions.

For example, some argued that by having different questions on the same topic we can check for validity. Although this may be true, there is no clear interaction between the quality of the validity test and the ordering of the questions, and hence if you didn’t propose how such interaction may occur, you basically didn’t answer the question.

Others said that specific questions are more uncomfortable to answer, but again, if no interaction is proposed between the ordering of questions and the comfort of answering them, then the answer is not valid.

It may be true that responding about cocaine consumption is uncomfortable, but if you fail to propose a relationship between how tough it is to answer it, and the order of the questions, then you simply missed the point that was being asked.

d) Proposed Answer: It is a scale because it is very likely that if people do cocaine they do both marijuana and cigarettes. When such interactions occur, it is a scale. If no relationship between responses was expected, it would be an index.

Partial credit was given to those who answered Index and gave evidence of knowing what that was.

Most people missed this question, at least partially.

e) Proposed Answer: “You give the coin to the respondent and ask him/her to toss it. If it comes up tails they must answer yes; if heads, they answer truthfully.”

This may work because the respondent should feel more comfortable stating the truth, since there is no obvious way to tell from a “yes” whether they actually consumed drugs or not.

Note that you were asked HOW, therefore the process had to be explained in order to get full credit.
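The key above doesn’t spell out how the researcher gets a usable number back out of those answers, so here is a minimal sketch of one common way to do it. It assumes a fair coin (tails with probability 0.5), and the function name and parameters are hypothetical. Under that assumption the observed share of “yes” equals 0.5 + 0.5 × (true share), which can be inverted.

```python
def estimate_true_yes_rate(observed_yes_rate: float, p_forced_yes: float = 0.5) -> float:
    """Estimate the share of respondents whose truthful answer is 'yes'.

    observed_yes_rate: share of all respondents who said 'yes'.
    p_forced_yes: probability the coin forces a 'yes' (0.5 assumes a fair coin).
    """
    if not 0.0 <= observed_yes_rate <= 1.0:
        raise ValueError("observed_yes_rate must be between 0 and 1")
    if not 0.0 <= p_forced_yes < 1.0:
        raise ValueError("p_forced_yes must be at least 0 and below 1")
    # observed = p_forced_yes + (1 - p_forced_yes) * true_rate, solved for true_rate
    estimate = (observed_yes_rate - p_forced_yes) / (1.0 - p_forced_yes)
    # Sampling noise can push the raw estimate slightly outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))


# Hypothetical example: 60% of respondents said "yes" overall.
print(estimate_true_yes_rate(0.60))
```

The price of the added privacy is precision: only about half of the sample answers truthfully, so the estimate is noisier than a direct question would be with the same number of respondents.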

f) This was discussed in class. The authors propose that the social norms that rule everyday conversation are also present when reading a survey, and hence should be considered. Because of this, the way the survey is written (order of questions, options available, etc.) influences the outcome.

Most people didn’t give this more general response and just stated the conclusions (importance of order, etc.). Depending on the quality of the answers, these received 6-9 points.

Question 4.

a) It is pretty straightforward: ecstasy has low test-retest reliability, cigarettes have good reliability, and the rest are OK.

b) One way to measure the reliability of the overall scale is to split the data in two. Given the low correlations across drug consumption, this is likely to show low reliability, even if ecstasy were excluded (most students were rather far from this answer). Other answers, if coherent, were also acceptable.
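For those who want to see what splitting the data in two could look like, here is a minimal sketch. It assumes the split is across the items of the scale (alternating columns), which is only one way to read the answer above. The Spearman-Brown step is a standard textbook correction that I am adding here, and the scores and function names are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a half has no variance")
    return cov / sqrt(var_x * var_y)


def split_half_reliability(responses):
    """Return (correlation between halves, Spearman-Brown corrected value).

    responses: one row per respondent, each row a list of item scores.
    """
    # Alternate items between the two halves, then total each half per respondent.
    half_a = [sum(row[0::2]) for row in responses]
    half_b = [sum(row[1::2]) for row in responses]
    r = pearson(half_a, half_b)
    if r <= -1.0:
        raise ValueError("Spearman-Brown correction is undefined for r = -1")
    # Spearman-Brown: projects the half-length correlation to the full-length scale.
    return r, 2 * r / (1 + r)


# Hypothetical scores; columns could be cigarettes, marijuana, cocaine, ecstasy.
scores = [
    [1, 2, 0, 0],
    [2, 1, 0, 0],
    [2, 3, 1, 1],
    [3, 2, 1, 1],
]
half_r, corrected_r = split_half_reliability(scores)
print(half_r, corrected_r)
```

If the items barely correlate with each other, the two half totals will barely correlate either, which is the low reliability the answer above anticipates.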

c) Similar to (a)

d) Multidimensional means that there are several attributes related to the variable being measured, so that just one quantitative or qualitative question isn’t capable of measuring all of them. (Look at the proposed answer for question 2a).

In the case of drug consumption we have type of drug (hard or soft, legal or illegal, recreational or sensation oriented, etc.), frequency of use (once a day, weekly, before church, etc.), and intensity of use (5 joints at a time, push needles into myself till the orange juice I had for breakfast starts dripping off my arm, etc.).

Quite a few people confused a multidimensional construct with having several causes for drug consumption, and hence argued that different social background, economic situation, peer pressure, etc. may contribute to drug consumption. This is not what is being asked, and doesn’t really relate to the course material.

Depending on how far off the question these sorts of answers were, between 4 and 6 points were taken off.

Question 5.

a) Divergent validity refers to the correlation between variables that, for the tests to be valid, shouldn’t be particularly highly correlated. In this case we wouldn’t want the type of indicator of drug consumption (self-report or saliva) to affect the results. We find that the correlation between smoking self-report and marijuana self-report is high (0.75), and so is the one between both saliva measures (0.65).

b) Convergent validity refers to the correlation between variables that, for the tests to be valid, should be highly correlated. In this particular case we would like self-report smoking and saliva tests of smoking to have a high correlation, yet it is just 0.35, and similarly for marijuana it is just 0.23.
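If the terms keep getting mixed up, it may help to sort the four correlations quoted in (a) and (b) mechanically. The sketch below is just one way to lay them out; the labels and the helper name are mine, and only the four numbers come from the exam.

```python
# Keys are (trait A, method A, trait B, method B); values are the correlations
# quoted in (a) and (b).
correlations = {
    ("smoking", "self-report", "marijuana", "self-report"): 0.75,
    ("smoking", "saliva", "marijuana", "saliva"): 0.65,
    ("smoking", "self-report", "smoking", "saliva"): 0.35,
    ("marijuana", "self-report", "marijuana", "saliva"): 0.23,
}


def validity_type(pair):
    """Classify a pair of measures by what its correlation speaks to."""
    trait_a, method_a, trait_b, method_b = pair
    if trait_a == trait_b and method_a != method_b:
        return "convergent"  # same drug, different method: should be high
    if trait_a != trait_b and method_a == method_b:
        return "divergent"  # different drug, same method: should not be high
    return "other"


convergent = [r for pair, r in correlations.items() if validity_type(pair) == "convergent"]
divergent = [r for pair, r in correlations.items() if validity_type(pair) == "divergent"]

# The pattern we would like: every convergent correlation above every divergent one.
pattern_supports_validity = min(convergent) > max(divergent)
print(convergent, divergent, pattern_supports_validity)
```

Here the check fails: the same-method correlations are higher than the same-drug ones, which is the problem both answers point to.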

A lot of people were pretty confused about the terms. If you answered convergent when you were asked for divergent you got –2, and vice versa. If you just said “it’s low” or “it’s high” you were given no credit (–5). Regardless of being “correct” or not, we don’t want to have incentives for wild guessing.

c) Here the range of valid answers was pretty wide; any well-written and well-supported answer was given full credit.

Question 6.

I marked the right answer on each of your tests. You can probably just check your notes to look for the justifications.

Question 7.

a) Many different responses were accepted. Among the potential problems with this question are that slang is being used, that data may cluster on [_more], that the scale may be hinting to the respondent what the expected range is, that it is specific to just one week, etc.

Fixes are the logical solutions to the above complaints. A few people came up with changes to the question that didn’t really fix the problem they had pointed out, and of course didn’t get full credit for their answer.

b) The most important problem with this question is that it is a double-barreled question, basically two different questions. The obvious way to fix it is to split it in two. Other answers were accepted, but if this fact wasn’t pointed out (since it was so important and salient) 2 points were taken off the score. Again, if your fix didn’t address the problem you pointed out, you didn’t get full credit.

c) Many problems: it is hard for kids to come up with a percentage; the question asks for a fraction yet the answer is in %; “time” is not defined (does it include sleeping?); it again uses slang, which may be hard to understand or may bias the answer; etc.

d) Fairly straightforward.

Free knowledge!

Find this image interesting? Near Prague there is a town called Kutná Hora. Some guy put some dirt from Jerusalem in a church's cemetery, and for centuries royalty and high-class people wanted to be buried there. At the beginning of the 19th century they had as many as 20 people on top of each other in the graves, so they decided to get rid of them. They took the bones of over 20,000 people and decorated the church with them. If you click on the image you will go to a website with more info and plenty of images.

Traveling the world is cheaper than you think! Try it out.

Do you have any questions? Email me!