
Studies of Psychology and Behavior, 2026, Vol. 24, Issue (2): 151-160. DOI: 10.12139/j.1672-0628.2026.02.002
YANG Guandoudou1, TAN Jingwen1,2, LIU Miaomiao1,3, LI Hong1
Received: 2025-03-18
Online: 2026-03-20
Published: 2026-03-20
Corresponding author: LI Hong
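Abstract: Comparative judgment is a relatively holistic method of assessing text difficulty that can yield reliable results efficiently, but its effectiveness for Chinese texts remains to be explored. In this study, 80 assessors made pairwise comparisons among 80 texts, and we examined how the number of comparisons affects the reliability and validity of comparative judgment. The results showed that comparative judgment yielded highly reliable results that correlated significantly with both the textbook volume in which each text appears and the texts' readability scores. As the number of comparisons increased, reliability and validity rose gradually and then stabilized. No effect of assessor characteristics on the comparison results was found. These findings suggest that comparative judgment is reasonably reliable for assessing the difficulty of Chinese texts.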
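To illustrate how a set of pairwise judgments can be turned into a difficulty scale, the following is a minimal sketch that fits a Bradley-Terry model, a common choice for comparative judgment data. The abstract does not state which model or software the authors used, so the model choice, the function name `fit_bradley_terry`, and the toy judgments below are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def fit_bradley_terry(n_items, judgments, n_iter=500):
    """Return centered log-difficulty estimates from (harder, easier) pairs.

    Hypothetical helper, not the authors' code. Uses the standard
    minorization-maximization update for the Bradley-Terry model.
    Assumes every text appears in at least one comparison and was judged
    harder at least once and easier at least once; otherwise its
    estimate diverges or the update divides by zero.
    """
    wins = [0] * n_items
    pair_counts = defaultdict(int)  # unordered pair -> number of comparisons
    for harder, easier in judgments:
        wins[harder] += 1
        pair_counts[(min(harder, easier), max(harder, easier))] += 1

    strength = [1.0] * n_items
    for _ in range(n_iter):
        denom = [0.0] * n_items
        for (i, j), n in pair_counts.items():
            share = n / (strength[i] + strength[j])
            denom[i] += share
            denom[j] += share
        strength = [wins[i] / denom[i] for i in range(n_items)]
        # Rescale so the strengths stay in a stable numeric range.
        total = sum(strength)
        strength = [s * n_items / total for s in strength]

    logs = [math.log(s) for s in strength]
    mean = sum(logs) / n_items
    return [value - mean for value in logs]  # scale is centered at zero


# Toy data (made up): three texts, each pair compared four times.
toy_judgments = (
    [(2, 0)] * 3 + [(0, 2)]
    + [(2, 1)] * 3 + [(1, 2)]
    + [(1, 0)] * 3 + [(0, 1)]
)
toy_scores = fit_bradley_terry(3, toy_judgments)
ranking = sorted(range(3), key=lambda i: toy_scores[i])  # easiest first
print(ranking, [round(s, 3) for s in toy_scores])
```

In a study like the one described, the estimated scores could then be correlated with external indicators such as textbook volume or readability scores, and the fit could be repeated on growing subsets of the judgments to see how the estimates stabilize as the number of comparisons increases.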
YANG Guandoudou, TAN Jingwen, LIU Miaomiao, LI Hong. The Application of the Comparative Judgment in Chinese Text Difficulty Assessment[J]. Studies of Psychology and Behavior, 2026, 24(2): 151-160.