Setting Benchmarked Performance Standards: A Content Focused, Judgmental Approach, Procedures, and Some Empirical Results



  • Cognia
  • Creative Measurement Solutions LLC
  • Center for Assessment


Benchmarked Standards, External Benchmarks, Standard Setting


In this paper we describe practical, content-focused procedures for establishing benchmarked performance standards for educational and other tests. To implement these procedures, we identify one or more external benchmarks (e.g., NAEP or PISA performance levels, college and career readiness benchmarks); link the test to those benchmarks to locate the corresponding cut scores on the test's score scale; write aligned achievement level descriptors; and train standard setting panelists to review the benchmarked cut scores, retain or adjust them, and provide content-based rationales for the resulting recommendations. We provide rationales for benchmarking cut scores; define and characterize benchmarked cut scores; describe their history and evolution; review differences between our content-focused approach, which employs empirical analyses, and purely empirical benchmarking; and describe workshop procedures and selected results from establishing benchmarked cut scores and performance standards for two assessment programs.
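The abstract does not specify which linking method locates the benchmarked cut scores. As a hedged illustration only, the sketch below assumes a simple percentile-matching ("impact-matching") link: in a common linking sample, the test cut is placed at the observed score whose share of examinees at or above it is closest to the share meeting the external benchmark. The helper names `benchmarked_cut` and `share_at_or_above`, the scores, and the 30% target are hypothetical and not taken from the paper.

```python
from collections import Counter


def benchmarked_cut(scores, target_share):
    """Return the observed score whose at-or-above share is closest to target_share.

    Hypothetical helper: a simple percentile-matching link assumed for
    illustration, not necessarily the authors' method. Ties go to the higher score.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 <= target_share <= 1.0:
        raise ValueError("target_share must be between 0 and 1")
    n = len(scores)
    counts = Counter(scores)
    best_cut, best_gap = None, float("inf")
    at_or_above = 0
    # Walk distinct score values from highest to lowest, accumulating how many
    # examinees score at or above each candidate cut.
    for value in sorted(counts, reverse=True):
        at_or_above += counts[value]
        gap = abs(at_or_above / n - target_share)
        if gap < best_gap:  # strict "<" keeps the higher score on ties
            best_cut, best_gap = value, gap
    return best_cut


def share_at_or_above(scores, cut):
    """Proportion of examinees scoring at or above the cut (the cut's impact)."""
    return sum(1 for s in scores if s >= cut) / len(scores)


# Hypothetical scale scores from a linking sample, with a hypothetical 30% of
# that sample meeting the external benchmark.
scores = [10, 12, 15, 15, 18, 20, 22, 25, 28, 30]
cut = benchmarked_cut(scores, 0.30)
print(cut, share_at_or_above(scores, cut))  # 25 0.3
```

In the content-focused approach described above, a value like this would serve only as the empirical starting point that panelists review and then retain or adjust with content-based rationales. Operational linking designs are typically more elaborate (for example, accounting for smoothing and linking error).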








How to Cite

Ferrara, S., Lewis, D., & D’Brot, J. (2021). Setting Benchmarked Performance Standards: A Content Focused, Judgmental Approach, Procedures, and Some Empirical Results. Journal of Applied Testing Technology, 22(1), 52–73. Retrieved from





