--- title: "Inter-rater reliability: gesture coding" author: "Marlou Rasenberg" date: '(this version: `r format(Sys.Date())`)' output: github_document: toc: true html_document: toc: true toc_float: true editor_options: chunk_output_type: console --- ```{r global_options, include=FALSE} knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE, fig.path='3_out/') ``` ```{r packages, results='hide',include=F} # Packages list.of.packages <- c("dplyr", "irr") new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])] if(length(new.packages)>0) install.packages(new.packages) lapply(list.of.packages, require, character.only=T) rm(list.of.packages,new.packages) ``` This script supports the paper 'The primacy of multimodal alignment in converging on shared symbols for novel referents' by Marlou Rasenberg, Asli Özyürek, Sara Bögels and Mark Dingemanse (for the CABB team). To increase readability, not all code chunks present in the .Rmd source are shown in the output. ## Aim To establish inter-rater reliability for gesture coding, we focused on the first two rounds of the interaction (where presumably the most (diverse) gestures would occur), where a second coder independently coded 15% of the trials (96 trials, N= 296 gestures). We inspect the inter-rater reliability for: * **identification** of co-speech gestures (see Staccato files in data folder for agreement on length/organization of those annotations) * coding of **gesture type** * coding of **gesture referent** * coding of **handedness** ## Data The input for this script is the file "ELANoutput_reliability_gesturecoding.txt". In ELAN, the gesture annotations of two coders (M (first author) and E (student assistant)) have been compared manually. The column 'reliability_identification' indicates whether coders agreed on the presence of co-speech gesture (1) or if only one coder identified a gesture while the other did not (0). We considered coders to agree when their annotations overlapped, where we disregarded differences in handedness, the length of the annotations and/or the number of segments (e.g., one stroke annotation from one coder spanning two stroke annotations of the other coder). In the columns 'reliability_type' and 'reliability_referent' the coding of the two coders are listed. ```{r read data} df <- read.delim2("1_data/referential_task/inter_rater_reliability/gesture_coding/ELANoutput_reliability_gesturecoding.txt") ``` ```{r initial cleaning} names(df)[7] <- "handedness" names(df)[8] <- "agree_presence" names(df)[5] <- "type" names(df)[6] <- "ref" names(df)[9] <- "pairnr" df[df==""]<-NA df$File.Path<-NULL df <- df[,c(9,1:4,8,5:7)] df$type_M <- gsub(" /.*", "", df$type) df$type_E <- gsub(".*/ ", "", df$type) df$ref_M <- gsub(" /.*", "", df$ref) df$ref_E <- gsub(".*/ ", "", df$ref) df$hands_M <- gsub(" /.*", "", df$handedness) df$hands_E <- gsub(".*/ ", "", df$handedness) df$type <- NULL df$ref <- NULL df$handedness <- NULL df <- df %>% mutate_if(is.character, as.factor) ``` ## Analysis The set of comparisons we work with here is **N=`r nrow(df)`** ### Gesture identification ```{r agreement identification} agreement_identification <- round(sum(df$agree_presence)/nrow(df)*100, digits=2) #89.2 ``` Inter-rater agreement on gesture identification was `r agreement_identification`%. 
### Gesture type

```{r gesture type}
table(df$type_M, df$type_E)
agreement_type <- agree(df[,7:8]) #95.1
kappa_type <- kappa2(df[,7:8]) #0.644
```

Inter-rater agreement for gesture type was substantial: agreement = `r round(agreement_type$value, digits=2)`%, Cohen's kappa = `r round(kappa_type$value, digits=2)`.

### Gesture referent

```{r gesture referent}
#table(df$ref_M, df$ref_E) # results in a massive table
agreement_referent <- agree(df[,9:10]) #92.8
kappa_referent <- kappa2(df[,9:10]) #0.926

# how many categories?
ref_levels <- c(levels(df$ref_E), levels(df$ref_M))
length(unique(ref_levels)) #65
```

Inter-rater agreement for gesture referent was high: agreement = `r round(agreement_referent$value, digits=2)`%, Cohen's kappa = `r round(kappa_referent$value, digits=2)`.

### Handedness

```{r gesture handedness}
table(df$hands_M, df$hands_E)
agreement_hands <- agree(df[,11:12]) #94.7
kappa_hands <- kappa2(df[,11:12]) #0.913
```

Inter-rater agreement for gesture handedness was high: agreement = `r round(agreement_hands$value, digits=2)`%, Cohen's kappa = `r round(kappa_hands$value, digits=2)`.

```{r session info, include=FALSE}
sessionInfo()
# R version 4.0.2 (2020-06-22)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 19043)
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C
# [5] LC_TIME=Dutch_Netherlands.1252
#
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base
#
# other attached packages:
# [1] irr_0.84.1     lpSolve_5.6.15 dplyr_1.0.0
#
# loaded via a namespace (and not attached):
#  [1] Rcpp_1.0.7      crayon_1.3.4    digest_0.6.25   R6_2.4.1        lifecycle_0.2.0 magrittr_1.5    evaluate_0.14   pillar_1.4.4    rlang_0.4.11
# [10] rstudioapi_0.11 ellipsis_0.3.1  vctrs_0.3.0     generics_0.0.2  rmarkdown_2.2   tools_4.0.2     glue_1.4.1      purrr_0.3.4     xfun_0.14
# [19] yaml_2.2.1      compiler_4.0.2  pkgconfig_2.0.3 htmltools_0.4.0 tidyselect_1.1.0 knitr_1.28      tibble_3.0.1
```
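### Chance agreement

Across the two coders, 65 unique referent categories were used (see above). With that many categories, the agreement expected by chance (p_e) is small, which is why kappa for referent (0.93) sits so close to the raw agreement percentage (92.8%); for gesture type, by contrast, a much larger p_e pulls kappa (0.64) well below the raw 95.1%. The sketch below (not evaluated when knitting) recovers p_o and p_e for the referent coding directly from the confusion table. It assumes the cleaned `df` from above and mirrors the textbook definition of unweighted kappa rather than the internals of `kappa2()`, so the result may deviate slightly if any codes are missing.

```{r chance agreement sketch, eval=FALSE}
# Align the two coders' referent levels so the confusion table is square
lev <- union(levels(df$ref_M), levels(df$ref_E))
p <- prop.table(table(factor(df$ref_M, levels = lev),
                      factor(df$ref_E, levels = lev)))
p_o <- sum(diag(p))                  # observed proportion of agreement
p_e <- sum(rowSums(p) * colSums(p))  # proportion expected by chance
c(p_o = p_o, p_e = p_e, kappa = (p_o - p_e) / (1 - p_e))
```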