Beyond reliability: assessing rater competence when using a behavioural marker system

Abstract

Background: Behavioural marker systems are used across several healthcare disciplines to assess behavioural (non-technical) skills, but rater training is variable and inter-rater reliability is generally poor. Inter-rater reliability provides data about the tool, but not about the competence of individual raters. This study aimed to test the inter-rater reliability of a new behavioural marker system (PhaBS: pharmacists' behavioural skills) with clinically experienced faculty raters and near-peer raters. It also aimed to assess rater competence when using PhaBS after brief familiarisation, by assessing completeness, agreement with an expert rater, ability to rank performance, stringency or leniency, and avoidance of the halo effect.

Methods: Clinically experienced faculty raters and near-peer raters attended a 30-min PhaBS familiarisation session. This was immediately followed by a marking session in which they rated a trainee pharmacist's behavioural skills in three scripted immersive acute care simulated scenarios, demonstrating good, mediocre, and poor performances respectively. Inter-rater reliability in each group was calculated using the two-way random, absolute-agreement, single-measures intra-class correlation coefficient (ICC). Differences in individual rater competence in each domain were compared using Pearson's chi-squared test.

Results: The ICC for experienced faculty raters was good at 0.60 (0.48–0.72), and for near-peer raters it was poor at 0.38 (0.27–0.54). Of the experienced faculty raters, 5/9 were competent in all domains versus 2/13 near-peer raters (difference not statistically significant). There was no statistically significant difference between clinically experienced and near-peer raters in agreement with an expert rater, ability to rank performance, stringency or leniency, or avoidance of the halo effect. The only statistically significant difference between groups was the ability to complete the assessment (9/9 experienced faculty raters versus 6/13 near-peer raters, p = 0.0077).

Conclusions: Experienced faculty have acceptable inter-rater reliability when using PhaBS, consistent with other behavioural marker systems; however, not all raters are competent. Competence measures developed for other assessments can be helpfully applied to behavioural marker systems. Educators using behavioural marker systems for assessment should adopt such rater competence frameworks. This is important to ensure fair and accurate assessments for learners, to provide educators with information about rater training programmes, and to give individual raters meaningful feedback.

Bibliographic Details
Main Authors: Samantha Eve Smith (Centre for Medical Education, University of Dundee), Scott McColgan-Smith (NHS Education for Scotland), Fiona Stewart (NHS Education for Scotland), Julie Mardon (Scottish Centre for Simulation and Clinical Human Factors, NHS Forth Valley), Victoria Ruth Tallentire (Medical Education Directorate, NHS Lothian)
Format: Article
Language: English
Published: BMC, 2024-12-01
Series: Advances in Simulation
ISSN: 2059-0628
Online Access: https://doi.org/10.1186/s41077-024-00329-9
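
Illustrative note (not from the article itself): the reliability statistic named in the Methods, the two-way random, absolute-agreement, single-measures ICC, is commonly written ICC(2,1), and the completeness comparison in the Results is an ordinary Pearson chi-squared test on a 2x2 table. Below is a minimal Python sketch using the pandas, pingouin, and scipy libraries; the rating scores are invented for demonstration, while the 9/9 versus 6/13 completeness counts are taken from the abstract, and the chi-squared computation (without Yates' continuity correction) reproduces the reported p = 0.0077.

import pandas as pd
import pingouin as pg
from scipy.stats import chi2_contingency

# Hypothetical long-format data: three scripted scenario performances
# (the rating targets), each scored by three raters. Real PhaBS data
# would have one row per rater per behavioural marker per scenario.
ratings = pd.DataFrame({
    "performance": ["good"] * 3 + ["mediocre"] * 3 + ["poor"] * 3,
    "rater": ["r1", "r2", "r3"] * 3,
    "score": [4, 5, 4, 3, 3, 2, 1, 2, 1],
})

# ICC(2,1): two-way random effects, absolute agreement, single measures.
# pingouin reports this variant in the "ICC2" row of its output table.
icc = pg.intraclass_corr(data=ratings, targets="performance",
                         raters="rater", ratings="score")
print(icc.set_index("Type").loc["ICC2", ["ICC", "CI95%"]])

# Pearson chi-squared on assessment completeness, using the counts in the
# abstract: 9/9 experienced faculty vs 6/13 near-peer raters completed.
table = [[9, 0],   # faculty: completed, did not complete
         [6, 7]]   # near-peer: completed, did not complete
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")  # p is approximately 0.0077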