Beyond reliability: assessing rater competence when using a behavioural marker system
Main Authors: Samantha Eve Smith (Centre for Medical Education, University of Dundee), Scott McColgan-Smith (NHS Education for Scotland), Fiona Stewart (NHS Education for Scotland), Julie Mardon (Scottish Centre for Simulation and Clinical Human Factors, NHS Forth Valley), Victoria Ruth Tallentire (Medical Education Directorate, NHS Lothian)
Format: Article
Language: English
Published: BMC, 2024-12-01
Series: Advances in Simulation
ISSN: 2059-0628
Collection: DOAJ
Institution: Kabale University
Online Access: https://doi.org/10.1186/s41077-024-00329-9
Abstract

Background: Behavioural marker systems are used across several healthcare disciplines to assess behavioural (non-technical) skills, but rater training is variable and inter-rater reliability is generally poor. Inter-rater reliability provides data about the tool, but not about the competence of individual raters. This study aimed to test the inter-rater reliability of a new behavioural marker system (PhaBS: pharmacists' behavioural skills) with clinically experienced faculty raters and near-peer raters. It also aimed to assess rater competence when using PhaBS after brief familiarisation, by assessing completeness, agreement with an expert rater, ability to rank performance, stringency or leniency, and avoidance of the halo effect.

Methods: Clinically experienced faculty raters and near-peer raters attended a 30-min PhaBS familiarisation session. This was immediately followed by a marking session in which they rated a trainee pharmacist's behavioural skills in three scripted immersive acute care simulated scenarios, demonstrating good, mediocre, and poor performances respectively. Inter-rater reliability in each group was calculated using the two-way random, absolute agreement, single-measures intra-class correlation coefficient (ICC). Differences in individual rater competence in each domain were compared using Pearson's chi-squared test.

Results: The ICC for experienced faculty raters was good at 0.60 (0.48–0.72) and for near-peer raters was poor at 0.38 (0.27–0.54). Of the experienced faculty raters, 5/9 were competent in all domains versus 2/13 near-peer raters (difference not statistically significant). There was no statistically significant difference between clinically experienced and near-peer raters in agreement with an expert rater, ability to rank performance, stringency or leniency, or avoidance of the halo effect. The only statistically significant difference between groups was the ability to complete the assessment (9/9 experienced faculty raters versus 6/13 near-peer raters, p = 0.0077).

Conclusions: Experienced faculty have acceptable inter-rater reliability when using PhaBS, consistent with other behavioural marker systems; however, not all raters are competent. Competence measures from other assessments can be helpfully applied to behavioural marker systems. When using behavioural marker systems for assessment, educators should begin using such rater competence frameworks. This is important to ensure fair and accurate assessments for learners, to provide educators with information about rater training programmes, and to provide individual raters with meaningful feedback.
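The Methods name two specific statistics: the two-way random, absolute agreement, single-measures ICC and Pearson's chi-squared test. The sketch below shows one way to compute both in Python; it is not the study's analysis code. The ratings DataFrame, its item and rater labels, and the scores are invented for illustration, while the 2x2 completion table uses the 9/9 versus 6/13 counts reported in the Results (scipy's uncorrected Pearson test reproduces the reported p = 0.0077).

```python
# Minimal sketch of the two analyses named in the Methods; not the authors' code.
import pandas as pd
import pingouin as pg  # pip install pingouin
from scipy.stats import chi2_contingency

# Hypothetical long-format ratings: one row per rater per rated item
# (e.g. a behavioural marker within a scenario).
ratings = pd.DataFrame({
    "item":  [i for i in range(1, 7) for _ in range(3)],
    "rater": ["A", "B", "C"] * 6,
    "score": [4, 5, 4, 3, 3, 2, 1, 2, 1, 5, 4, 5, 2, 3, 2, 4, 4, 3],
})

# Two-way random effects, absolute agreement, single measures is ICC(2,1),
# which pingouin labels "ICC2" in its output table.
icc = pg.intraclass_corr(data=ratings, targets="item",
                         raters="rater", ratings="score")
icc2 = icc.set_index("Type").loc["ICC2"]
print(f"ICC(2,1) = {icc2['ICC']:.2f}, 95% CI = {icc2['CI95%']}")

# Pearson's chi-squared on completion of the assessment: 9/9 experienced
# faculty versus 6/13 near-peer raters. correction=False requests the
# uncorrected Pearson statistic, which gives p = 0.0077 for these counts.
table = [[9, 0],   # faculty: completed, did not complete
         [6, 7]]   # near-peer: completed, did not complete
chi2, p, dof, _ = chi2_contingency(table, correction=False)
print(f"chi2(df={dof}) = {chi2:.2f}, p = {p:.4f}")
```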