Controlling Method Effects in Self-Report Instruments
MARY E. MCLAUGHLIN
Department of Management
University of Texas at Arlington
marym@uta.edu
Method effects are a continuing source of debate and frustration in organizational research. They seem to be especially troubling in the most commonly used form of data collection -- self-report (survey or questionnaire) instruments. In this paper, I describe what may be a chief culprit in method variance of self-reports: the effects of a self-report question or item's context, both within the instrument and in the broader circumstances of measurement. After linking context effects with recent research on the cognitive processes underlying question-answering, I recommend a variety of ways to minimize their impact.
What are Item Context Effects, and Why are They Important?
Self-reports, and written self-reports in particular, are the most commonly used measurement mode in organizational research. Survey researchers from various disciplines have long recognized that seemingly trivial features of self-report instruments can have an impact on results, generating method variance rather than substantive variance. The context of an item, including format, question wording, response options, and presentation order of questions and response options, can have unintended (or intended) effects on respondents' answers. These item context effects are important because they can lead researchers to different conclusions. Context effects on univariate distributions can affect tracking data and the results of evaluation studies, because answers to the same question in different instruments may not be comparable. Context can even have an impact on observed relations between variables (e.g., Harrison & McLaughlin, 1996), in some instances possibly creating "self-generated validity" of theoretical propositions (Feldman & Lynch, 1988).
Although empirical demonstrations of context effects were historically inconsistent and often not replicable (Sudman, Bradburn, & Schwarz, 1996, pp. 4-11), research based on cognitive and communicative theory during the last two decades has yielded repeatable and useful results. Several effects can be predicted, created with experimental manipulations, and replicated. From the published research to date, I make a few recommendations to control measurement artifacts stemming from question context. I limit my recommendations to written or computer-administered self-reports, although many of them apply to interviews as well. But first, I review some conceptual background.
How Does Context Affect Answers?
A common assumption in item context research is that people often construct judgments (e.g., of attitude or behavior) on the spot, when needed (whether in daily life or as participants in research), using information that is accessible from memory at the time (e.g., Schwarz & Bless, 1992; Strack & Martin, 1987; Tourangeau, 1992). Information that is accessible every time an issue is brought to mind is referred to as chronically accessible. It is this chronically accessible information that makes judgments stable over time, and (as a source of variance in responses) most closely resembles the hypothetical "true-score" or "latent trait" of interest to researchers. Information that is not accessible every time an issue is considered is temporarily accessible. Temporarily accessible information may be provided to respondents inadvertently by the researcher via features of a self-report instrument, such as the presentation order of questions or the range of numerical response scales. This temporarily accessible information can influence the respondent during any of the stages of the response process, including question interpretation, retrieval of information from memory, judgment, and generation of a response. The effects of temporarily accessible information contribute to both random and systematic measurement error (the latter being a form of method variance). Thus, seemingly inconsequential features of an instrument can have substantial effects, in ways often unforeseen by the researcher.
The amount of chronically accessible information about an attitude target may correspond with the respondent's attitude strength (defined in terms of attitude persistence, stability, and impact by Krosnick, 1999). Weaker context effects might be expected among respondents with larger amounts of chronically accessible information about the attitude target or other issues addressed in a questionnaire. However, most studies of attitude strength and context effects have not provided any support for its moderating influence. The only exceptions are in a set of three studies conducted by Lavine, Huff, Wagner, and Sweeny (1998) in which six dimensions of attitude strength (importance, certainty, intensity, frequency of thought, extremity, and ambiguity) were operationalized, as distinguished from the single-item measures used in prior research. Attitude strength moderated context effects in five out of eight issues across all three studies. These results suggest the possible utility of including measures of attitude (or belief, opinion, or preference) strength in self-report instruments. Chronic accessibility doesn't guarantee retrieval, however, especially when contextual features of the self-report instrument are salient and readily accessible. A few methods for controlling the effects of contextual features on response processes and research results are described below.
How Can Item Context Effects be Controlled?
There are no easy formulas for preventing undesirable context effects in self-reports. However, a few research-based recommendations can be made for reducing the likelihood of those effects. Many of the recommendations I give are elaborated more thoroughly in Schwarz (1999) and Sudman et al. (1996). Some are standard, "textbook" guidelines for designing self-report instruments; others stem uniquely from research on context effects. Although they are useful in general, they carry no specific guarantees. The unique mix of stimuli and accompanying response processes in any given measurement context may make particular features operate differently than described below.
Item order. Prior questions can influence a respondent's interpretation of an item, retrieval of information from memory, judgment, and selection of a response to an item (Tourangeau & Rasinski, 1988). Variables affecting the likelihood and direction of item order effects on these stages of the response process are summarized well by Tourangeau (1999). The variables he summarizes, along with models of context effects, may be used to identify potential ordering effects in specific questionnaires.
One model of context effects, called the inclusion-exclusion model (Schwarz & Bless, 1992), predicts two effects that may be generated by the presentation order of items: assimilation and contrast effects. Assimilation occurs when information retrieved for answering a preceding question comes to mind and is used to form a temporary representation of the target of the current question. Contrast effects occur when the information retrieved for an earlier question is excluded from the respondent's temporary representation of the target. Sudman et al. (1996, pp. 100-129) describe the inclusion-exclusion model and how it may be used to predict assimilation and contrast effects on judgments given specific measurement contexts. If, based on the model, undesirable assimilation or contrast effects are likely, change the order of questions accordingly (if possible), and re-assess the questionnaire.
In general, the question or set of questions most vital to the research should be presented first, to avoid unwanted influence from preceding questions, due to their effects on information accessibility and current question interpretation. If possible, avoid using questions or targets within questions that produce extremely positive or negative judgments because they are likely to strongly influence answers to subsequent questions (Schwarz, 1999). If extreme targets or questions are included, present them last.
Rating scales. When using numerical scale anchors on rating scales, make sure that the meaning conveyed by the numbers reflects the underlying construct. If anchors range from negative to positive (e.g., -3 to +3), respondents are likely to interpret the dimension as bipolar, in which the two poles refer to the presence of opposite attributes; if the anchors are only positive numbers (e.g., 1 to 7), respondents will likely interpret the dimension as unipolar, referring to different degrees of the same attribute (see Schwarz, 1996). For example, in answering the question, "How successful would you say you have been in life?" only 13% of respondents who were presented with a scale ranging from -5 to +5 chose values between -5 and 0, yet 34% of respondents presented with a scale from 0 to 10 chose responses in the equivalent range of 0 to 5 (Schwarz, Knäuper, et al., 1991). Means and standard deviations also indicated that responses were displaced to the high end of the scale with the -5 to +5 response range. Respondents in the -5 to +5 condition may have interpreted the negative anchors (-5 to -1) to refer to degrees of failure, and the positive anchors (+1 to +5) to refer to degrees of success. Respondents presented with anchors from 0 to 10 most likely interpreted them as meaning degrees of success (Schwarz, Knäuper, et al., 1991).
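The comparison of "equivalent ranges" in the example above can be sketched in a few lines. This is only an illustration of the arithmetic: the function name share_in_lower_half and both response lists are hypothetical, invented here, and are not data from Schwarz, Knäuper, et al. (1991).

```python
def share_in_lower_half(responses, scale_min, scale_max):
    """Proportion of responses at or below the scale midpoint."""
    midpoint = (scale_min + scale_max) / 2
    return sum(r <= midpoint for r in responses) / len(responses)

# Hypothetical answers to the same question under two numeric formats
# (made-up values for illustration only).
bipolar = [-2, 0, 1, 3, 4, 5, 2, 3]   # scale anchored -5 to +5
unipolar = [3, 5, 6, 8, 4, 7, 2, 9]   # scale anchored 0 to 10

# -5..0 on the bipolar scale is formally equivalent to 0..5 on the unipolar one.
print(share_in_lower_half(bipolar, -5, 5))
print(share_in_lower_half(unipolar, 0, 10))
```

If the two formats were interchangeable, the two proportions should be similar in comparable samples; a marked difference would suggest that the numeric anchors changed how respondents read the dimension.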
Frequency questions. Use open-ended response formats for measures of frequency of behavior (or other events) and standardize after the data are collected. Avoid vague, verbal frequency anchors such as "always," "frequently," and "sometimes" (Schwarz, 1999). Different respondents use the same term for different objective frequencies of the same behavior, and the same term may indicate different frequencies in different domains (e.g., "frequently" in reference to drinking coffee may have a different meaning than in reference to drinking tequila).
Several studies have shown the effects that the range of frequencies in response options for these types of questions can have on the response process, and on answers to questions that follow (e.g., Menon, Raghubir, & Schwarz, 1995; Schwarz & Bienias, 1990; Schwarz, Hippler, Deutsch, & Strack, 1985). For example, in an experimental investigation of response option ranges in a somatic complaints scale, respondents who were presented with a high frequency range of responses (from "4 or less" to "9 or more") were much more likely to report feeling "low or emotionally depressed" on five or more occasions during the past month than respondents presented with the low range of response options (from "0" to "5 or more"; Harrison & McLaughlin, 1996). The response ranges in this example may have influenced respondents' interpretation of the intensity of emotional experience in this somewhat ambiguous item. Similar effects were observed in responses to the nine other items of the scale.
In frequency-based items, it is important to instruct respondents clearly on the time reference period they should consider (e.g., "today," "the past two weeks," "the past year") and specific units of the behavior or event (e.g., "minutes per day talking on the telephone" versus "hours per day talking on the telephone"). Consider the implications of the time reference period that respondents are asked to consider, as it may affect interpretation of a question. In a recent study in which the reference period was experimentally manipulated, respondents interpreted a question about how frequently they felt angry as referring to more intense experiences when the question referred to a longer (one year) rather than a shorter (one week) reference period. Correspondingly, respondents reported feeling angry at a more frequent rate over the period of a week than over a year (Winkielman, Knäuper, & Schwarz, 1998). Also, make the time period as short as possible so that respondents can recall the relevant events more easily.
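One way to read "standardize after the data are collected" is sketched below: open-ended answers that share a single reference period (here, per day) are converted to a common unit and then z-scored. The helper names, the conversion table, and the reports are all hypothetical. The sketch deliberately does not convert across reference periods, since the findings above suggest that answers given for different periods may not be comparable.

```python
from statistics import mean, stdev

# Assumed conversion table for this illustration: everything becomes minutes.
MINUTES_PER_UNIT = {"minutes": 1, "hours": 60}

def minutes_per_day(amount, unit):
    """Convert an open-ended 'time per day' answer to minutes per day."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return amount * MINUTES_PER_UNIT[unit]

def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical open-ended reports of daily telephone time: (amount, unit).
reports = [(30, "minutes"), (1.5, "hours"), (2, "hours"), (45, "minutes")]

minutes = [minutes_per_day(amount, unit) for amount, unit in reports]
print(minutes)            # common unit: minutes per day
print(z_scores(minutes))  # standardized after collection
```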
Item reversals. Avoid placing reverse-worded items (i.e., polar opposites or negative polar opposites) within a series of questions. Reverse-wording, or reversing the direction of evaluative connotation of a subset of items interspersed within a scale, has often been used to deter respondents' acquiescent response styles. However, any gains in controlling acquiescence may be more than offset by losses in psychometric quality and distortion of factor structure that accompany the use of item reversals (e.g., Harrison & McLaughlin, 1993; Schriesheim, Eisenbach, & Hill, 1991).
Scrutinize the instrument for potential context effects. Recognize that any and every irrelevant feature of the survey may be considered relevant and used by respondents to interpret the intended meaning of questions. Lessler and Forsyth (1996) describe a coding system for detecting question features that will likely affect response accuracy and measurement error, which also may be useful for identifying potential sources of unwanted context effects. Have an expert in cognition evaluate the instrument for potential context effects, if one is available.
Pretest. Pretest the instrument to identify problems. There are several procedures to choose from (see Sudman et al., 1996, pp. 258-260). One method is to use "think-aloud interviews" to determine how respondents interpret questions, retrieve information from memory, and form judgments. Think-aloud interviewing techniques based on Ericsson and Simon's (1993) verbal protocol methods are described in Schwarz and Sudman (1996; see also Harrison, McLaughlin, & Coalter, 1996). A cheaper alternative to think-aloud interviewing is to use focus groups to understand how focal concepts and issues are comprehended and retrieved (Sudman et al., 1996, pp. 45-46; see Krueger, 1994, for general information about focus groups).
Design and statistical strategies. Contend with context effects using statistical or design strategies. Include context manipulations in the design of your investigation. Randomize the presentation order of questions and response options (unless the order is dictated by an underlying dimension) across respondents, if possible. The widespread availability of computer technology makes this a more viable option than in the past. Randomization does not reduce context effects, but it converts them from systematic confounds into random, or at least estimable, error. Some context effects may be quantified and partitioned using structural equation modeling (Williams & Anderson, 1994).
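For a computer-administered instrument, per-respondent randomization of item order might look like the minimal sketch below. The function names, the item labels, and the study_seed parameter are hypothetical. Seeding the generator from the respondent identifier is one design choice among several; it makes each respondent's order reproducible, so the position at which an item appeared can be stored and later used to estimate order effects.

```python
import random

def presentation_order(item_ids, respondent_id, study_seed=0):
    """Return a random item order that is reproducible for a given respondent."""
    rng = random.Random(f"{study_seed}:{respondent_id}")  # string seed
    order = list(item_ids)   # copy, so the master list is left untouched
    rng.shuffle(order)
    return order

def item_positions(order):
    """Map each item to its 1-based position, for use as a covariate later."""
    return {item: pos for pos, item in enumerate(order, start=1)}

items = ["q1", "q2", "q3", "q4", "q5"]  # hypothetical item labels

for respondent_id in (101, 102, 103):
    order = presentation_order(items, respondent_id)
    print(respondent_id, order, item_positions(order))
```

The same approach could be applied to response options, except where their order reflects an underlying dimension (for example, a numeric rating scale).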
Finally, a few generally good recommendations. Some common suggestions for making good self-report instruments in general may also help to control item context effects. For example, do what you can to improve the clarity and specificity of instructions, questions, and response options, using the everyday language of those who will provide answers. When it comes to item context effects, respondents are less likely to use other, extraneous features of the questionnaire for interpretation if questions are unambiguous. Give respondents the resources they need to complete the instrument, removing environmental distractions and instructing them to take their time and provide thoughtful answers. This may lead them to retrieve more chronically accessible information from memory, and rely less on temporarily accessible information provided by the context. Assuring anonymity and confidentiality can help reduce reliance on contextual cues for information about what may be considered socially desirable or expected of a "normal" person, and editing of judgments for self-presentation during response generation.
Context effects are not always bad, and some may be desirable. With respect to question ordering, questions can be strategically placed earlier to encourage recall of information that the researcher would like the respondent to consider in subsequent questions (Sudman et al., 1996, p. 263). A researcher may group items that measure the same construct together, perhaps boxing them or using some other graphical method to separate them from other items, to facilitate comprehension of the questions and retrieval of information. This may result in greater discriminant validity of the construct measured with the grouped items (Harrison & McLaughlin, 1996).
Conclusion
The set of recommendations described here includes most of what can be taken from the extant empirical work on item context effects. Survey developers and users may wonder about possible effects of other contextual features that are not discussed here. Now that several cognitive theories of the question-answering process have been developed, we are likely to see empirical work soon in which the effects of other such features of surveys are evaluated. Interest in research on context effects will continue to grow because of its broad and expanding impact, with many disciplines relying on self-report methods of data collection.
References
Ericsson, K.A., & Simon, H.A. (1993). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.
Feldman, J.M., & Lynch, J.G., Jr. (1988). Self-generated validity and other effects of measurement on belief, attitude, intention, and behavior. Journal of Applied Psychology, 73: 421-435.
Harrison, D.A., & McLaughlin, M.E. (1993). Cognitive processes in self-report responses: Tests of item context effects in work attitude measures. Journal of Applied Psychology, 78: 129-140.
Harrison, D.A., & McLaughlin, M.E. (1996). Structural properties and psychometric qualities of organizational self-reports: Field tests of connections predicted by cognitive theory. Journal of Management, 22: 313-338.
Harrison, D.A., McLaughlin, M.E., & Coalter, T.M. (1996). Context, cognition, and common method variance: Psychometric and verbal protocol evidence. Organizational Behavior and Human Decision Processes, 68: 246-261.
Krosnick, J.A. (1999). Maximizing questionnaire quality. In J.P. Robinson & P.R. Shaver (Eds.), Measures of political attitudes, Ch. 2. San Diego, CA: Academic Press.
Krueger, R.A. (1994). Focus groups. Newbury Park, CA: Sage.
Lavine, H., Huff, J.W., Wagner, S.H., & Sweeny, D. (1998). The moderating influence of attitude strength on the susceptibility to context effects in attitude surveys. Journal of Personality and Social Psychology, 75: 359-373.
Lessler, J.T., & Forsyth, B.H. (1996). A coding system for appraising questionnaires. In N. Schwarz & S. Sudman (Eds.), Answering questions: Methodology for determining cognitive and communicative processes in survey research, Ch. 11. San Francisco: Jossey-Bass.
Menon, G., Raghubir, P., & Schwarz, N. (1995). Behavioral frequency judgments: An accessibility-diagnosticity framework. Journal of Consumer Research, 22: 212-228.
Schriesheim, C.A., Eisenbach, R.J., & Hill, K.D. (1991). The effect of negation and polar opposite item reversals on questionnaire reliability and validity: An experimental investigation. Educational and Psychological Measurement, 51: 67-78.
Schwarz, N. (1996). Cognition and communication: Judgmental biases, research methods, and the logic of conversation. Mahwah, NJ: Lawrence Erlbaum, pp. 43-46.
Schwarz, N. (1999). Self-reports: How the questions shape the answers. American Psychologist, 54: 93-105.
Schwarz, N., & Bienias, J. (1990). What mediates the impact of response alternatives on frequency reports of mundane behaviors? Applied Cognitive Psychology, 4: 61-72.
Schwarz, N., & Bless, H. (1992). Constructing reality and its alternatives: Assimilation and contrast effects in social judgment. In L.L. Martin & A. Tesser (Eds.), The Construction of Social Judgments. Hillsdale, NJ: Erlbaum.
Schwarz, N., Hippler, H.J., Deutsch, B., & Strack, F. (1985). Response scales: Effects of category range on reported behavior and comparative judgments. Public Opinion Quarterly, 49: 388-395.
Schwarz, N., Knäuper, B., Hippler, H.J., Noelle-Neumann, E., & Clark, F. (1991). Rating scales: Numeric values may change the meaning of rating scales. Public Opinion Quarterly, 55: 570-582.
Schwarz, N., & Sudman, S. (Eds.) (1996). Answering questions: Methodology for determining cognitive and communicative processes in survey research. San Francisco: Jossey-Bass.
Strack, F., & Martin, L.L. (1987). Thinking, judging, and communicating: A process account of context effects in attitude surveys. In H.J. Hippler, N. Schwarz, & S. Sudman (Eds.), Social information processing and survey methodology (pp. 123-148). New York: Springer-Verlag.
Sudman, S., Bradburn, N.M., & Schwarz, N. (1996). Thinking about answers: The application of cognitive processes to survey methodology. San Francisco: Jossey-Bass.
Tourangeau, R. (1992). Attitudes as memory structures: Belief sampling and context effects. In N. Schwarz & S. Sudman (Eds.), Context Effects in Social and Psychological Research (pp. 35-47). New York: Springer-Verlag.
Tourangeau, R. (1999). Context effects on answers to attitude questions. In M.G. Sirken, D.J. Hermann, S. Schechter, N. Schwarz, J.M. Tanur, & R. Tourangeau (Eds.), Cognition and Survey Research. New York: Wiley.
Tourangeau, R., & Rasinski, K.A. (1988). Cognitive processes underlying context effects in attitude measurement. Psychological Bulletin, 103: 299-314.
Williams, L.J., & Anderson, S.E. (1994). An alternative approach to method effects using latent-variable models: Applications in organizational behavior research. Journal of Applied Psychology, 79: 323-331.
Winkielman, P., Knäuper, B., & Schwarz, N. (1998). Looking back at anger: Reference periods change the interpretation of emotion frequency questions. Journal of Personality and Social Psychology, 75: 719-728.