An Investigation of Rater Effects on L2 Translation Performance Scores

Nilufer Aybirdi, Turgay Han


This study examines the impact of L2 translation quality and rating experience on raters’ scoring behaviors and on the reliability and variability of scores. It also investigates the decision-making strategies that raters most frequently apply when scoring translation papers of differing quality produced by Turkish EFL students. In total, 80 translation papers (40 low-quality and 40 high-quality) obtained from the participating students were given to 10 raters, who scored them using a translation scoring rubric. Results revealed that less experienced raters were more lenient when scoring the students’ translation performances, assigning higher and more consistent scores to the papers.


Keywords: translation assessment, decision-making strategies, rating experience, translation quality, think-aloud protocols



Date of publication: 2024-10-07 11:52:38
Date of submission: 2023-12-23 19:42:41


Copyright (c) 2024 Nilufer Aybirdi, Turgay Han

This work is licensed under a Creative Commons Attribution 4.0 International License.