24 Assessors and assessment validity

High-stakes assessments demand high standards for both the assessment and the assessors. Because these standards are difficult to meet, a portfolio of assessments may be preferable to a single high-stakes assessment.

Checklists and Global Rating Scales

Ilgen et al. showed that global rating scales had higher reliability, were more flexible, and were better able to capture nuances of performance. However, Labbe et al. showed that performance is highly variable, especially early in the learning process, so assessments need to be repeated over time to be valid.

Raters

Most studies suggest that assessors must not only follow good assessment practices and avoid common biases but must also have face validity for those being assessed (e.g., be experts or be seen as having expertise). Because there are many potential causes of rater error, raters should be trained, calibrated against other raters, review their past rating differences (errors), and discuss rating errors with others. At a minimum, inter-rater reliability estimates should be performed.
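
One common minimum check is an inter-rater reliability estimate such as Cohen's kappa for two raters making pass/fail judgments on the same candidates. The sketch below is only an illustration in plain Python with hypothetical ratings; depending on the design, other statistics (e.g., an intraclass correlation for multiple raters or continuous scores) may be more appropriate.

from collections import Counter

# Hypothetical pass/fail judgments by two raters on the same eight candidates.
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

n = len(rater_a)

# Observed agreement: proportion of candidates the two raters score identically.
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement, from each rater's marginal pass/fail rates.
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in ("pass", "fail"))

# Cohen's kappa: agreement beyond chance, scaled to the maximum possible.
kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"Observed agreement {p_observed:.2f}, chance agreement {p_chance:.2f}, kappa {kappa:.2f}")

A kappa near 0 indicates agreement no better than chance; a value near 1 indicates near-perfect agreement.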

Standard setting (cut-offs)

For many clinical programs, standard setting is appropriate and can be used effectively. For central venous catheter insertion, Barsuk et al. found the Mastery Angoff and Patient-Safety standard-setting methods more useful than traditional methods. In the Mastery Angoff method, experts evaluate each item on a checklist and set the standard based on the percentage of well-prepared students who would be able to perform that step (rather than judging who would pass the test as a whole). The Patient-Safety addition further asks experts to determine whether each step is related to patient safety, comfort, or clinical outcomes. The two are combined to set the passing standard.
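
The sketch below illustrates the arithmetic with hypothetical expert judgments: each item's expert estimates are averaged, the item means are averaged into an overall cut score, and items flagged for patient safety are tracked separately. The item names, estimates, and flags are invented for illustration, and the exact combination rule used by Barsuk et al. may differ.

# Hypothetical Mastery Angoff data: for each checklist item, three experts'
# estimates of the % of well-prepared learners who would perform the step
# correctly, plus a patient-safety flag.
items = {
    "Prepare sterile field":            {"estimates": [95, 90, 100], "safety": True},
    "Locate vein with ultrasound":      {"estimates": [85, 80, 90],  "safety": True},
    "Maintain control of guidewire":    {"estimates": [100, 95, 100], "safety": True},
    "Complete procedure documentation": {"estimates": [70, 75, 80],  "safety": False},
}

# Mastery Angoff: average the expert estimates for each item, then average the
# item means to obtain a minimum passing score for the whole checklist.
item_means = {name: sum(d["estimates"]) / len(d["estimates"]) for name, d in items.items()}
angoff_cut = sum(item_means.values()) / len(item_means)

# Patient-Safety addition (illustrative): flag safety-related steps so they can
# be treated as must-pass items when the final standard is set.
must_pass = [name for name, d in items.items() if d["safety"]]

print(f"Mastery Angoff cut score: {angoff_cut:.1f}%")
print("Safety-related (must-pass) items:", must_pass)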

Progress testing

If an assessment is used over time with the same cohort, improved scores should correlate with time in the program and with additional training.
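
As a minimal illustration of that expectation, the sketch below computes the Pearson correlation between months in the program and assessment scores for one cohort; the scores are hypothetical, and a strong positive correlation would support the progress-testing argument.

# Hypothetical repeated assessments of one cohort at increasing time points.
months_in_program = [3, 6, 9, 12, 18, 24]
mean_scores = [52, 58, 61, 66, 72, 80]

n = len(months_in_program)
mean_x = sum(months_in_program) / n
mean_y = sum(mean_scores) / n

# Pearson correlation between time in program and score.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(months_in_program, mean_scores))
var_x = sum((x - mean_x) ** 2 for x in months_in_program)
var_y = sum((y - mean_y) ** 2 for y in mean_scores)
r = cov / (var_x * var_y) ** 0.5

print(f"Correlation between time in program and score: r = {r:.2f}")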

Kane’s Validity framework

Scoring

  • Good tool characteristics with acceptable inter-item correlations
  • Strong item-total correlations (see the sketch after this list)
  • Low error variance for items and raters
  • Strong inter-rater reliability
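
As a minimal sketch of two of these scoring-level statistics, the hypothetical 0/1 checklist data below are used to compute corrected item-total correlations and Cronbach's alpha (one common summary of inter-item consistency); the candidates, items, and scores are invented for illustration.

# Hypothetical checklist scores: rows are candidates, columns are items (0/1).
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (variance(xs) * variance(ys)) ** 0.5

item_columns = [list(col) for col in zip(*scores)]   # one list of scores per item
totals = [sum(row) for row in scores]
k = len(item_columns)

# Corrected item-total correlation: each item against the total of the other items.
for idx, item in enumerate(item_columns, start=1):
    rest = [t - i for t, i in zip(totals, item)]
    print(f"Item {idx} corrected item-total r = {pearson(item, rest):.2f}")

# Cronbach's alpha: internal consistency across the checklist items.
alpha = k / (k - 1) * (1 - sum(variance(col) for col in item_columns) / variance(totals))
print(f"Cronbach's alpha = {alpha:.2f}")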

Generalizability

  • Assessment captures the majority of competencies, skills, and national directives
  • Good overall and inter-station reliability

Extrapolation

  • Negative or low correlation between assessment scores and errors in clinical practice

Implications

  • Low error rates in practice for candidates with passing scores

 

Resources

M Labbe et al. How Consistent Is Competent? Examining Variance in Psychomotor Skills Assessment. Acad Med 2020;95:771–776.

JH Barsuk et al. A Comparison of Approaches for Mastery Learning Standard Setting. Acad Med 2018;93(7).

RH DeMuth et al. Progress on a New Kind of Progress Test: Assessing Medical Students' Clinical Skills. Acad Med 2018;93(5):724–728.

KD Royal. Forty-Five Common Rater Errors in Medical and Health Professions Education. Educ Health Prof 2018;1:33–35.

W Tavares et al. Applying Kane's validity framework to a simulation based assessment of clinical competence. Adv Health Sci Educ 2018;23:323–338.

JS Ilgen et al. A systematic review of validity evidence for checklists versus global rating scales in simulation-based assessment. Med Educ 2015;49:161–173.

SS Sebok et al. Examiners and content and site: Oh My! A national organization's investigation of score variation in large-scale performance assessments. Adv Health Sci Educ 2015;20:581–594.

DW McKinley, JJ Norcini. How to set standards on performance-based examinations: AMEE Guide No. 85. Med Teach 2014;36:97–110.

SM Downing. Reliability: on the reproducibility of assessment data. Med Educ 2004;38:1006–1012.

C Violato. Assessing Competence in Medicine and Other Health Professions. Boca Raton: CRC Press; 2019.
