MacNell, Driscoll, & Hunt (2014). What’s in a Name: Exposing Gender Bias in Student Ratings of Teaching.

MacNell, L., Driscoll, A., & Hunt, A. N. (2014). What’s in a Name: Exposing Gender Bias in Student Ratings of Teaching. Innovative Higher Education, 40(4), 291–303.

“Students rated the male identity significantly higher than the female identity, regardless of the instructor’s actual gender, demonstrating gender bias.” (p 291)

“Though far from perfect, student ratings of teaching provide valuable feedback about an instructor’s teaching effectiveness (Svinicki & McKeachie, 2010). They may be reliably interpreted as both a direct measure of student satisfaction with instruction and as an indirect [page break] measure of student learning (Marsh, 2007; Murray, 2007).” (p 292-293)

[“Gender Bias in Academia” (p 293) …]

“Gender then contributes to a hierarchal system of power relations that is embedded within the interactional and institutional levels of society and shapes gendered expectations and experiences in the workplace (Risman, 2004).” (p 293)

[“Gender Role Expectations” (p 294) …]

“Students often expect their male and female professors to behave in different ways or to respectively exhibit certain ‘masculine’ and ‘feminine’ traits. Commonly held masculine, or ‘effectiveness,’ traits include professionalism and objectivity; feminine, or ‘interpersonal,’ traits include warmth and accessibility. Students hold their instructors accountable to these gendered behaviors and are critical of instructors who violate these expectations (Bachen, McLoughlin, & Garcia, 1999; Chamberlin & Hickey, 2001; Dalmia, Giedeman, Klein, & Levenburg, 2005; Sprague & Massoni, 2005). Consequently, instructors who adhere to gendered expectations are viewed more favorably by their students (Andersen & Miller, 1997; Bennett, 1982).” (p 294)

[“Methodological Concerns with Previous Studies of Gender Bias” (p 294)]

[“Instrument” (p 297) …]

“… exploratory factor analysis …. Principal component factor analysis …” (p 297)

[“Analysis” (p 297) …]

“To test for the existence of gender bias in student ratings of teaching, we made two types of comparisons. First we compared across the actual gender of the assistant instructor, combining the two groups that had the female assistant instructor (one of which thought they had a male) into one category and doing the same with the two groups that had the male assistant instructor. Second, we compared across the perceived gender of the assistant instructor, combining the two groups that thought they had a female assistant instructor (one of which was actually a male) into one category and doing the same with the two groups that thought they had a male assistant instructor. … [page break] … A MANOVA allows a researcher to test a set of correlated dependent variables and conduct a single, overall comparison between the groups formed by categorical independent variables (Garson, 2012). This F-test of all means addresses the potential for false positive findings as the result of multiple comparisons.[3]” (p 297-298)
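The two comparisons described in this passage can be sketched in a few lines of Python. This is only an illustration of how the four experimental groups collapse into two categories, once by actual gender and once by perceived gender. The ratings, the `groups` structure, and the `pooled_means` helper are all hypothetical and are not taken from the study, and the sketch compares simple pooled means rather than running the MANOVA the authors used.

```python
from statistics import mean

# Hypothetical 1-5 ratings, invented purely for illustration.
# These are NOT the study's data.
groups = [
    {"actual": "female", "perceived": "female", "ratings": [4, 3, 4]},
    {"actual": "female", "perceived": "male",   "ratings": [5, 4, 4]},
    {"actual": "male",   "perceived": "female", "ratings": [3, 4, 3]},
    {"actual": "male",   "perceived": "male",   "ratings": [4, 5, 4]},
]

def pooled_means(groups, key):
    """Pool the four groups into two categories by `key`, then average."""
    pooled = {}
    for g in groups:
        pooled.setdefault(g[key], []).extend(g["ratings"])
    return {label: mean(ratings) for label, ratings in pooled.items()}

# Comparison 1: combine groups by the assistant instructor's actual gender.
by_actual = pooled_means(groups, "actual")

# Comparison 2: combine groups by the gender the students were told.
by_perceived = pooled_means(groups, "perceived")

print("by actual gender:   ", by_actual)
print("by perceived gender:", by_perceived)
```

The same four groups feed both comparisons; only the grouping key changes, which is what lets the design separate the effect of the instructor's actual gender from the effect of the gender students believed they had.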

[Footnotes …]

“[3] We acknowledge that the application of parametric analytical techniques (ANOVA, MANOVA, and _t_-tests) to ordinal data (the Likert scale responses) remains controversial among social scientists and statisticians. (See Knapp (1990) for a relatively balanced review of the debate.) We side with the arguments of Gaito (1980) and Armstrong (1981) and argue that it is appropriate to do so in our case as the concept being measured is interval, even if the data labels are not. This practice is common within higher education research. (e.g. Centra & Gaubatz [2000]; Young, Rush, & Shaw [2009]; Basow [1995]; and Knol et al. [2013])”

“[4] While we acknowledge that a significance level of .05 is conventional in social science and higher education research, we side with Skipper, Guenther, and Nass (1967), Labovitz (1968), and Lai (1973) in pointing out the arbitrary nature of conventional significance levels. Considering our study design, we have used a significance level of .10 for some tests where: 1) the results support the hypothesis and we are consequently more willing to reject the null hypothesis of no difference; 2) our hypothesis is strongly supported theoretically and by empirical results in other studies that use lower significance levels; 3) our small _n_ may be obscuring large differences; and 4) the gravity of an increased risk of Type I error is diminished in light of the benefit of decreasing the risk of a Type II error (Labovitz, 1968; Lai, 1973).” (p 294)
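The practical effect of the choice described in footnote 4 can be shown with a small sketch. The p-value below is hypothetical and is not a result from the paper; `rejects_null` is an illustrative helper, not part of any library.

```python
def rejects_null(p_value, alpha):
    """Return True if the null hypothesis is rejected at significance level alpha."""
    return p_value < alpha

# Hypothetical p-value chosen to fall between the two thresholds.
p = 0.07

at_05 = rejects_null(p, alpha=0.05)  # conventional level
at_10 = rejects_null(p, alpha=0.10)  # more lenient level discussed in footnote 4

print(f"p = {p}: reject at .05? {at_05}; reject at .10? {at_10}")
```

Any test with a p-value between .05 and .10 counts as significant only under the more lenient threshold, so raising the level accepts a higher risk of Type I error in exchange for a lower risk of Type II error.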

[“Results” (p 298) …]

“Our MANOVAs indicate that there is a significant difference in how students rated the perceived male and female instructors (p <0.05), but not the actual male and female instructors.” (p 298)

[“Discussion” (p 299) …]

“… students rated the instructors they perceived to be female lower than those they perceived to be male, regardless of teaching quality or actual gender of the instructor. The perceived female instructor received significantly lower ratings on six of the 12 metrics on the survey, as well as on the student ratings index.” (p 300)

[“Conclusions” (p 301) …]

“Our findings show that the bias we saw here is not a result of gendered behavior on the part of the instructors, but of actual bias on the part of the students. Regardless of actual gender or performance, students rated the perceived female instructor significantly more harshly than the perceived male instructor, which suggests that a female instructor would have to work harder than a male to receive comparable ratings.” (p 301)

“This study demonstrates that gender bias is an important deficiency of student ratings of teaching.” (p 301)

Selected References

  • Andersen, K., & Miller, E. D. (1997). Gender and student evaluations of teaching. PS: Political Science and Politics, 30, 216–219.
  • Armstrong, G. D. (1981). Parametric statistics and ordinal data: A pervasive misconception. Nursing Research, 30, 60–62.
  • Bachen, C. M., McLoughlin, M. M., & Garcia, S. S. (1999). Assessing the role of gender in college students’ evaluations of faculty. Communication Education, 48, 193–210.
  • Basow, S. A. (1995). Student evaluations of college professors: When gender matters. Journal of Educational Psychology, 87, 656–665.
  • Bennett, S. K. (1982). Student perceptions of and expectations for male and female instructors: Evidence relating to the question of gender bias in teaching evaluation. Journal of Educational Psychology, 74, 170–179.
  • Centra, J. A., & Gaubatz, N. B. (2000). Is there gender bias in student evaluations of teaching? Journal of Higher Education, 71, 17–33.
  • Chamberlin, M. S., & Hickey, J. S. (2001). Student evaluations of faculty performance: The role of gender expectations in differential evaluations. Educational Research Quarterly, 25, 3–14.
  • Dalmia, S., Giedeman, D. C., Klein, H. A., & Levenburg, N. M. (2005). Women in academia: An analysis of their expectations, performance and pay. Forum on Public Policy, 1, 160–177.
  • Gaito, J. (1980). Measurement scales and statistics: Resurgence of an old misconception. Psychological Bulletin, 87, 564–567.
  • Garson, G. D. (2012). General linear models: Multivariate GLM & MANOVA/MANCOVA. Asheboro, NC: Statistical Associates.
  • Knapp, T. R. (1990). Treating ordinal scales as interval scales: An attempt to resolve the controversy. Nursing Research, 39, 121–123.
  • Knol, M. H., Veld, R., Vorst, H. C. M., van Driel, J. H., & Mellenbergh, G. J. (2013). Experimental effects of student evaluations coupled with collaborative consultation on college professors’ instructional skills. Research in Higher Education, 54, 825–850.
  • Labovitz, S. (1968). Criteria for selecting a significance level: A note on the sacredness of .05. The American Sociologist, 3, 220–222.
  • Lai, M.K. (1973). The case against tests of statistical significance. Report from the Teacher Education Division Publication Series. Retrieved from
  • Marsh, H. W. (2007). Students’ evaluations of university teaching: Dimensionality, reliability, validity, potential biases and usefulness. In R. P. Perry & J. C. Smart (Eds.), The scholarship of teaching and learning in higher education: An evidence-based perspective (pp. 319–383). Dordrecht, The Netherlands: Springer.
  • Murray, H. G. (2007). Low-inference teaching behaviors and college teaching effectiveness: Recent developments and controversies. In R. P. Perry & J. C. Smart (Eds.), The scholarship of teaching and learning in higher education: An evidence-based perspective (pp. 145–183). Dordrecht, The Netherlands: Springer.
  • Risman, B. J. (2004). Gender as a social structure: Theory wrestling with activism. Gender & Society, 18, 429–450.
  • Skipper, J. K., Guenther, A. C., & Nass, G. (1967). The sacredness of .05: A note concerning the uses of statistical levels of significance in social science. The American Sociologist, 1, 16–18.
  • Sprague, J., & Massoni, K. (2005). Student evaluations and gendered expectations: What we can’t count can hurt us. Sex Roles, 53, 779–793.
  • Svinicki, M., & McKeachie, W. J. (2010). McKeachie’s teaching tips: Strategies, research, and theory for college and university teachers (13th ed.). Belmont, CA: Wadsworth.
  • Young, S., Rush, L., & Shaw, D. (2009). Evaluating gender bias in ratings of university instructors’ teaching effectiveness. International Journal of Scholarship of Teaching and Learning, 3, 1–14.