Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed

Published on Jan 1, 2011 in Journal of Clinical Epidemiology 4.65
· DOI: 10.1016/j.jclinepi.2010.03.002
Jan Kottner
Estimated H-index: 27
Laurent Audigé
Estimated H-index: 5
+ 6 Authors
David L. Streiner
Estimated H-index: 85
(U of T: University of Toronto)
Abstract

Objective: Results of reliability and agreement studies are intended to provide information about the amount of error inherent in any diagnosis, score, or measurement. The level of reliability and agreement among users of scales, instruments, or classifications is widely unknown. Therefore, there is a need for rigorously conducted interrater and intrarater reliability and agreement studies. Information about sample selection, study design, and statistical analysis is often incomplete. Because of inadequate reporting, interpretation and synthesis of study results are often difficult. Widely accepted criteria, standards, or guidelines for reporting reliability and agreement in the health care and medical field are lacking. The objective was to develop guidelines for reporting reliability and agreement studies.

Study Design and Setting: Eight experts in reliability and agreement investigation developed guidelines for reporting.

Results: Fifteen issues that should be addressed when reliability and agreement are reported are proposed. The issues correspond to the headings usually used in publications.

Conclusion: The proposed guidelines intend to improve the quality of reporting.
  • References (115)
  • Citations (683)
Part I: FOUNDATIONS OF NURSING RESEARCH AND EVIDENCE-BASED PRACTICE 1: Introduction to Nursing Research in an Evidence-Based Practice Environment 2: Translating Research Evidence into Nursing Practice: Evidence-Based Nursing 3: Generating Evidence: Key Concepts and Steps in Qualitative and Quantitative Research Part II: CONCEPTUALIZING A STUDY TO GENERATE EVIDENCE FOR NURSING 4: Conceptualizing Research Problems, Research Questions, and Hypotheses 5: Finding and Critiquing Evidence: Research Lit...
4,328 Citations
#1 Marieke Zegers, H-Index: 15
#2 Martine C. de Bruijne (VUmc: VU University Medical Center), H-Index: 17
Last. Henrica C.W. de Vet (VUmc: VU University Medical Center), H-Index: 93
(6 authors in total)
Abstract Objective To evaluate the inter-rater agreement of the record review process of the Dutch Adverse Event study, which we aimed to improve by the involvement of two independent physician reviewers per record instead of one including a consensus procedure in case of disagreement. Methods The inter-rater agreement within pairs of physicians (independent review between physician A+B) and between pairs of physicians (independent review between physician A+B and C+D) was measured to evaluate t...
61 Citations
#1 Jan Kottner (Charité), H-Index: 27
#2 Theo Dassen (Charité), H-Index: 27
Last. Antje Tannen (Charité), H-Index: 12
(3 authors in total)
Abstract Background The Waterlow scale is one of the pressure ulcer risk assessment scales which are frequently criticised for their low reliability. It is widely used in the United Kingdom, Europe and all over the world. Objectives The study objectives were to systematically review and evaluate inter- and intrarater reliability and/or agreement of the whole Waterlow scale and its single items. The overall aim was to find out if the Waterlow scale is applicable to daily clinical practice. Design...
30 Citations
#1 Jan Kottner, H-Index: 27
#2 Kathrin Raeder, H-Index: 3
Last. Theo Dassen, H-Index: 37
(4 authors in total)
Aims. To review systematically the interrater reliability of pressure ulcer classification systems to find out which classification should be used in daily practice. Background. Pressure ulcer classification systems are important tools in research and practice. They aim at providing accurate and precise communication, documentation and treatment decisions. Pressure ulcer classifications are criticised for their low degree of interrater reliability. Design. Systematic review. Methods. The data ba...
49 Citations
Abstract Background Adequate risk assessment is essential in pressure ulcer prevention. Assessment scales were designed to support practitioners in identifying persons at pressure ulcer risk. The Braden scale is one of the most extensively studied risk assessment instruments, although the majority of studies focused on validity rather than reliability. Objectives The first aim was to measure the interrater reliability of the Braden scale and its individual items. The second aim was to study diff...
46 Citations
There are many studies investigating psychometric properties of the Braden scale, a scale that predicts the risk for pressure ulcers. The main focus of these studies is validity as opposed to reliability. In order to estimate the degree of interrater reliability a literature review revealed that numerous statistical approaches and coefficients were used (Pearson's product-moment correlation, Cohen's kappa, overall percentage of agreement, intraclass correlation). These coefficients were calculat...
35 Citations
#1 R Nanda (James Cook University Hospital), H-Index: 3
#2 Sanjay Gupta (James Cook University Hospital), H-Index: 1
Last. Amar Rangan (James Cook University Hospital), H-Index: 14
(5 authors in total)
Over 30 separate clinical signs for the shoulder have been described, most with little evidence to support their accuracy and reliability. The aim of our study was to evaluate the accuracy and reliability of some of the commonly used tests for rotator cuff disease. Two clinicians, a consultant with an established shoulder practice and a registrar with an interest in shoulder surgery, examined 63 patients with history suggestive of rotator cuff disease. A set of pre-determined clinical tests for ...
17 Citations
#1 Naser Elkum, H-Index: 19
#2 M. M. Shoukri (UWO: University of Western Ontario), H-Index: 2
This paper proposes the use of signal-to-noise ratio (SNR) as another index of a measurement’s reproducibility. We derive its maximum likelihood estimation and discuss confidence interval construction within the framework of the one-way random effect model. We investigate the validity of the approximate normal confidence interval by Monte-Carlo simulations. The paper also derives the optimal allocation for the number of subject and the number of repeated measurements needed to minimize the varia...
3 Citations
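As a rough illustration of the signal-to-noise idea in the entry above, the sketch below estimates the variance components of a balanced one-way random effects model and reports the SNR alongside the intraclass correlation (ICC). It assumes SNR is defined as between-subject variance divided by error variance, and it swaps in simple ANOVA (method-of-moments) estimators in place of the maximum likelihood estimation the abstract mentions. The function name `snr_and_icc` and the input layout are hypothetical.

```python
from statistics import mean


def snr_and_icc(measurements):
    """Estimate SNR and ICC for a balanced one-way random effects layout.

    `measurements` is a list of per-subject lists, each holding the same
    number of repeated measurements (hypothetical input format).
    Assumption: ANOVA (method-of-moments) estimators, not maximum likelihood.
    """
    k = len(measurements)                      # number of subjects
    n = len(measurements[0]) if k else 0       # repeats per subject
    if k < 2 or n < 2 or any(len(row) != n for row in measurements):
        raise ValueError("need a balanced design with >= 2 subjects and >= 2 repeats")

    subject_means = [mean(row) for row in measurements]
    grand_mean = mean(subject_means)

    # Between-subject and within-subject mean squares
    ms_between = n * sum((m - grand_mean) ** 2 for m in subject_means) / (k - 1)
    ms_within = sum(
        (y - m) ** 2 for row, m in zip(measurements, subject_means) for y in row
    ) / (k * (n - 1))
    if ms_within == 0:
        raise ValueError("error variance is zero; SNR is undefined")

    var_error = ms_within
    # Truncate at zero so a negative moment estimate does not produce a negative SNR
    var_subject = max((ms_between - ms_within) / n, 0.0)

    snr = var_subject / var_error
    icc = var_subject / (var_subject + var_error)
    return snr, icc


if __name__ == "__main__":
    data = [[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]]  # made-up illustration data
    snr, icc = snr_and_icc(data)
    print(f"SNR = {snr:.3f}, ICC = {icc:.3f}")
```

Under this definition the two indices carry the same information, since SNR = ICC / (1 - ICC); the SNR simply is not bounded above by 1.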
Pi (π) and kappa (κ) statistics are widely used in the areas of psychiatry and psychological testing to compute the extent of agreement between raters on nominally scaled data. It is a fact that these coefficients occasionally yield unexpected results in situations known as the paradoxes of kappa. This paper explores the origin of these limitations, and introduces an alternative and more stable agreement coefficient referred to as the AC1 coefficient. Also proposed are new variance es...
458 Citations
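The "paradoxes of kappa" mentioned in the entry above can be made concrete with a small sketch. It computes percentage agreement, Cohen's kappa, Scott's pi, and the AC1 coefficient for two raters, using the commonly cited definitions of each chance-agreement term; these formulas are not given in the truncated abstract, so treat them as an assumption. The function name `agreement_coefficients` and the example ratings are hypothetical.

```python
from collections import Counter


def agreement_coefficients(pairs):
    """Two-rater agreement on nominal data.

    `pairs` is a list of (rating_by_A, rating_by_B) tuples (hypothetical format).
    All coefficients share the form (p_a - p_e) / (1 - p_e) and differ only
    in how the chance agreement p_e is defined (assumed standard definitions).
    """
    n = len(pairs)
    categories = {c for pair in pairs for c in pair}
    q = len(categories)
    if n == 0 or q < 2:
        raise ValueError("need at least one rated subject and two observed categories")

    # Observed proportion of subjects on which the raters agree
    p_a = sum(a == b for a, b in pairs) / n

    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    # Average of the two raters' marginal proportions per category
    avg = {c: (count_a[c] + count_b[c]) / (2 * n) for c in categories}

    pe_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    pe_pi = sum(p * p for p in avg.values())
    pe_ac1 = sum(p * (1 - p) for p in avg.values()) / (q - 1)

    def chance_corrected(p_e):
        return (p_a - p_e) / (1 - p_e)

    return {
        "percent_agreement": p_a,
        "cohen_kappa": chance_corrected(pe_kappa),
        "scott_pi": chance_corrected(pe_pi),
        "ac1": chance_corrected(pe_ac1),
    }


if __name__ == "__main__":
    # Made-up data with a very common "yes" category: 90 joint "yes",
    # 10 disagreements, and no joint "no".
    ratings = [("yes", "yes")] * 90 + [("yes", "no")] * 5 + [("no", "yes")] * 5
    for name, value in agreement_coefficients(ratings).items():
        print(f"{name}: {value:.3f}")
```

With these made-up ratings the raters agree on 90% of subjects, yet kappa and pi come out slightly negative because the skewed marginals push their chance-agreement terms above 0.9, while AC1 stays close to the raw agreement.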
#1 Toby Hall (Curtin University), H-Index: 27
#2 Kim Robinson (Curtin University), H-Index: 11
Last. Elizabeth A. Pyne, H-Index: 1
(5 authors in total)
Abstract Objective This article evaluates reliability and diagnostic validity of the cervical flexion-rotation test (FRT) to discriminate subjects with headache because of C1/2 dysfunction. In addition, this study evaluates agreement between experienced and inexperienced examiners. Methods These were 2 single blind comparative measurement study designs. In study 1, 2 experienced blinded examiners evaluated the FRT in 10 asymptomatic controls, 20 subjects with cervicogenic headache (CeH) where C1...
61 Citations
Cited By (683)
#1 Rachael Rietdijk (University of Sydney Faculty of Health Sciences)
#2 Emma Power (University of Sydney Faculty of Health Sciences), H-Index: 16
Last. Leanne Togher (University of Sydney Faculty of Health Sciences), H-Index: 1
(4 authors in total)
Abstract There is growing interest in using telehealth to work with people with traumatic brain injury (TBI). This study investigated whether established rating scales for evaluating conversations o...
2 Citations
#1 Raquel Costa (University of Porto), H-Index: 14
#2 Samantha Johnson (University of Leicester), H-Index: 36
Last. Jennifer Zeitlin (Paris V: Paris Descartes University), H-Index: 43
(7 authors in total)
Background: The Movement Assessment Battery for Children-Second Edition (Movement ABC-2) is widely used to assess children’s motor function, yet there is a lack of normative data for many countries. Aims: To assess the extent to which the application of different population reference norms for the Movement ABC-2 affects the classification and prevalence of motor impairment. Design: Data were obtained from two Portuguese regions participating in the Screening to Improve Health in Very Preterm Infan...
#1 Ivana H. Levy (UIUC: University of Illinois at Urbana–Champaign)
#2 Matthew C. Allender (UIUC: University of Illinois at Urbana–Champaign), H-Index: 14
Last. Krista A. Keller (UIUC: University of Illinois at Urbana–Champaign)
(3 authors in total)
Abstract Background Core body temperature is a crucial health parameter. Temperature aberrations can indicate certain infectious or inflammatory disorders, influence clinical management decisions, and serve as a prognostic indicator for patient recovery. Historically, rectal temperature measurements have been utilized in small companion zoological animals. However, there is a growing interest in less invasive methods including auricular and axillary measurements to determine core body temperatur...
#1 Troels Krarup Hansen (Aarhus University Hospital), H-Index: 28
#2 Else Marie Damsgaard (AU: Aarhus University), H-Index: 16
Last. Merete Gregersen (Aarhus University Hospital), H-Index: 7
(5 authors in total)
To examine the reproducibility and diagnostic accuracy of a comprehensive frailty assessment method based solely on the older medical inpatient’s electronic medical record. We found good reliability, high agreement, and considerable diagnostic accuracy when comparing the record-based method to the bedside method. The record-based MPI is highly desirable. It seems feasible, reproducible, accurate, and worth exploring in larger datasets and other settings. The comprehensive geriatric assessment (C...
#1 Marcus Brookshaw (UNB: University of New Brunswick), H-Index: 1
#2 Andrew Sexton (UNB: University of New Brunswick), H-Index: 3
Last. Chris A. McGibbon (UNB: University of New Brunswick), H-Index: 16
(3 authors in total)
Muscle strength is an important clinical outcome in rehabilitation and sport medicine, but options are limited to expensive but accurate isokinetic dynamometry (IKD) or inexpensive but less accurate hand-held dynamometers (HHD). A wearable, self-stabilizing, limb strength measurement device (LSMD) was developed to fill the current gap in portable strength measurement devices. The purpose of this study was to evaluate the reliability and validity of the LSMD in healthy adults. Twenty healthy adul...
#1 Nirmeen M. Hassan (La Trobe University)
#2 Andrew K. Buldt (La Trobe University), H-Index: 6
Last. Shannon E. Munteanu (La Trobe University), H-Index: 23
(6 authors in total)
Children and adolescents with Down syndrome have a distinctive foot shape (such as wide and flat feet) that often leads to difficulty with footwear fitting. 3-dimensional (3D) scanning can accurately measure the foot dimensions of individuals with Down syndrome, which may assist shoe fit. However, the reproducibility of measuring foot dimensions using 3D scans in children and adolescents with Down syndrome is unknown. The aim of this study was to determine the intra- and inter-rater reproducibil...
#1 Gorm Henrik Fogh Rasmussen (AAU: Aalborg University)
#2 Mathias Vedsø Kristiansen (AAU: Aalborg University), H-Index: 3
Last. Pascal Madeleine (AAU: Aalborg University), H-Index: 38
(5 authors in total)
Objective Breast cancer survivors (BCS) are often characterized by decreased pressure pain thresholds (PPT), range of motion (ROM) and strength in and around the shoulder affected by the treatment. This intra-rater reliability study was to establish the relative and absolute reliability of PPT’s, active ROM and maximal isokinetic muscle strength (MIMS) of the affected shoulder in BCS with persistent pain after treatment. Methods Twenty-one BCS participated in the study. The PPTs of 17 locations ...
#1 Camila Carvalho de Araújo (State University of Campinas), H-Index: 1
#2 Cássia Raquel Teatin Juliato (State University of Campinas), H-Index: 8
Last. Luiz Gustavo Oliveira Brito (State University of Campinas), H-Index: 1
(5 authors in total)
INTRODUCTION AND HYPOTHESIS Short questionnaires are important for validating the clinical diagnosis of urinary incontinence (UI). We sought to validate and culturally translate the Questionnaire for Urinary Incontinence Diagnosis (QUID) for the Brazilian Portuguese language. METHODS A cross-sectional study with 457 women (330 with urinary incontinence and 127 controls) was performed in a Southeastern Brazilian outpatient clinic. Patients answered a pilot-tested, notarized, six-item questionnair...
#1 L. Cattani (Katholieke Universiteit Leuven), H-Index: 1
#2 Dominique Van Schoubroeck (Katholieke Universiteit Leuven), H-Index: 35
Last. Jan Deprest (UCL: University College London), H-Index: 66
(7 authors in total)
Introduction and hypothesis Three-dimensional exoanal ultrasound imaging of the anal sphincter may be obtained transperineally with a convex probe, or at the introitus with a transvaginal probe. We hypothesised that introital acquisition would yield better quality and more reproducible evaluation.
1 Citation
#1 Sarah J.J. Adcock (UC Davis: University of California, Davis), H-Index: 2
#2 Cassandra B. Tucker (UC Davis: University of California, Davis), H-Index: 29
Wild ungulates can recognize certain predators without previous experience, but this innate ability may be relaxed under domestication. Using naive dairy calves, Bos taurus, we examined the effects of exposure to a predator odour (coyote, Canis latrans, urine) and two control odours (deer urine and water) on (1) latency to approach a milk food reward, (2) exploration, vigilance and locomotor play, (3) magnitude of the startle response to a sudden noise delivered upon arrival at the feeder and (4...