Beyond reliability: assessing rater competence when using a behavioural marker system

Abstract

Background: Behavioural marker systems are used across several healthcare disciplines to assess behavioural (non-technical) skills, but rater training is variable and inter-rater reliability is generally poor. Inter-rater reliability provides data about the tool, but not about the competence of individual raters. This study aimed to test the inter-rater reliability of a new behavioural marker system (PhaBS: pharmacists' behavioural skills) with clinically experienced faculty raters and near-peer raters. It also aimed to assess rater competence when using PhaBS after brief familiarisation, by assessing completeness, agreement with an expert rater, ability to rank performance, stringency or leniency, and avoidance of the halo effect.

Methods: Clinically experienced faculty raters and near-peer raters attended a 30-min PhaBS familiarisation session. This was immediately followed by a marking session in which they rated a trainee pharmacist's behavioural skills in three scripted immersive acute care simulated scenarios, demonstrating good, mediocre, and poor performances respectively. Inter-rater reliability in each group was calculated using the two-way random, absolute-agreement, single-measures intra-class correlation coefficient (ICC). Differences in individual rater competence in each domain were compared using Pearson's chi-squared test.

Results: The ICC for experienced faculty raters was good at 0.60 (0.48–0.72), and for near-peer raters it was poor at 0.38 (0.27–0.54). Of the experienced faculty raters, 5/9 were competent in all domains versus 2/13 near-peer raters (difference not statistically significant). There was no statistically significant difference between clinically experienced and near-peer raters in agreement with an expert rater, ability to rank performance, stringency or leniency, or avoidance of the halo effect. The only statistically significant difference between groups was the ability to complete the assessment (9/9 experienced faculty raters versus 6/13 near-peer raters, p = 0.0077).

Conclusions: Experienced faculty have acceptable inter-rater reliability when using PhaBS, consistent with other behavioural marker systems; however, not all raters are competent. Competence measures developed for other assessments can be helpfully applied to behavioural marker systems. Educators using behavioural marker systems for assessment should adopt such rater competence frameworks. This is important to ensure fair and accurate assessments for learners, to provide educators with information about rater training programmes, and to give individual raters meaningful feedback.

Bibliographic Details
Main Authors: Samantha Eve Smith (Centre for Medical Education, University of Dundee), Scott McColgan-Smith (NHS Education for Scotland), Fiona Stewart (NHS Education for Scotland), Julie Mardon (Scottish Centre for Simulation and Clinical Human Factors, NHS Forth Valley), Victoria Ruth Tallentire (Medical Education Directorate, NHS Lothian)
Format: Article
Language: English
Published: BMC, 2024-12-01
Series: Advances in Simulation
ISSN: 2059-0628
Online Access: https://doi.org/10.1186/s41077-024-00329-9
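
Illustrative note (not from the article itself): the reliability statistic named in the Methods, the two-way random, absolute-agreement, single-measures ICC, is commonly written ICC(2,1), and the completeness comparison in the Results is an ordinary Pearson chi-squared test on a 2x2 table. Below is a minimal Python sketch using the pandas, pingouin, and scipy libraries; the rating scores are invented for demonstration, while the 9/9 versus 6/13 completeness counts are taken from the abstract, and the chi-squared computation (without Yates' continuity correction) reproduces the reported p = 0.0077.

import pandas as pd
import pingouin as pg
from scipy.stats import chi2_contingency

# Hypothetical long-format data: three scripted scenario performances
# (the rating targets), each scored by three raters. Real PhaBS data
# would have one row per rater per behavioural marker per scenario.
ratings = pd.DataFrame({
    "performance": ["good"] * 3 + ["mediocre"] * 3 + ["poor"] * 3,
    "rater": ["r1", "r2", "r3"] * 3,
    "score": [4, 5, 4, 3, 3, 2, 1, 2, 1],
})

# ICC(2,1): two-way random effects, absolute agreement, single measures.
# pingouin reports this variant in the "ICC2" row of its output table.
icc = pg.intraclass_corr(data=ratings, targets="performance",
                         raters="rater", ratings="score")
print(icc.set_index("Type").loc["ICC2", ["ICC", "CI95%"]])

# Pearson chi-squared on assessment completeness, using the counts in the
# abstract: 9/9 experienced faculty vs 6/13 near-peer raters completed.
table = [[9, 0],   # faculty: completed, did not complete
         [6, 7]]   # near-peer: completed, did not complete
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")  # p is approximately 0.0077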