--- title: "Supplementary analysis: gesture form similarity" author: "Marlou Rasenberg" date: '(this version: `r format(Sys.Date())`)' output: github_document: toc: true html_document: toc: true toc_float: true editor_options: chunk_output_type: console --- ```{r global_options, include=FALSE} knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE, fig.path='3_out/') ``` ```{r packages, results='hide',include=F} # Packages list.of.packages <- c("dplyr", "tidyr","ggplot2","ggpol", "cowplot") new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])] if(length(new.packages)>0) install.packages(new.packages) lapply(list.of.packages, require, character.only=T) rm(list.of.packages,new.packages) ``` This script supports the paper 'The primacy of multimodal alignment in converging on shared symbols for novel referents' by Marlou Rasenberg, Asli Özyürek, Sara Bögels and Mark Dingemanse (for the CABB team). To increase readability, not all code chunks present in the .Rmd source are shown in the output. ## Aim With this script, we analyse the degree of form similarity for referentially aligned gestures. That is, for iconic co-speech gestures which are produced by two different speakers, which refer to the same Fribble subpart, we assess the degree of form overlap in terms of five form features: handedness, handshape, movement, orientation and position. ## Data The input file for the script is [filename.csv], which includes the coding of form similarity for N=419 gesture comparisons (= all referentially aligned gestures for 8 dyads in round 1 and 2 of the interaction). There is one row per gesture comparison. The columns provide information about the two gestures of each gesture pair: when they are produced (round and exact time points), by whom (participant A or B) and depicting which referent (Fribble number and specific subparts). The key column for the current analysis is 'gesture_similarity_coding': this specifies whether the gestures of the gesture pair were similar (1) or different (0) with respect to each form feature. The code '9' is used for missing values, i.e., when it was not possible to code similarity. E.g., "H1P1S0M0R1" means there was overlap in handedness (H), position (P) and orientation (R), but not in handshape (S) and movement (M). ```{r read in data} #read in data df_original <- read.csv2(file="1_data/referential_task/gesture_form_similarity_coding.csv", header=T, sep=";", colClasses=c(A_trial="factor",B_trial="factor", gesture_similarity_coding="factor")) ``` ```{r initial check, results='hide',include=F} df <- df_original #check if order of features is persistent in the coding (should be HPSMR) list_scores <- levels(df$gesture_similarity_coding) list_scores <- gsub(0,'',list_scores) list_scores <- gsub(1,'',list_scores) list_scores <- gsub(9,'',list_scores) unique_list_scores <- unique(list_scores) unique_list_scores=="HPSMR" ``` ```{r initial processing, results='hide',include=F} #add new columns df$handedness <- "" df$position <- "" df$handshape <- "" df$movement <- "" df$orientation <- "" for (row in 1:nrow(df)) { original_text <- as.character(df[row,]$gesture_similarity_coding) splitted <- unlist(strsplit(original_text, split="[A-Z]+")) splitted <- splitted[splitted!=""] df[row,"handedness"] <- splitted[1] df[row,"position"] <- splitted[2] df[row,"handshape"] <- splitted[3] df[row,"movement"] <- splitted[4] df[row,"orientation"] <- splitted[5] } df[28:32] <- lapply(df[28:32], as.numeric) #reorder columns df <- df[,c(26,28:32,1:6,8:25,27)] ``` ### Subsetting data There are three steps: 1. Include only gesture comparisons where gestures have the same Fribble *subpart* as referent (our unit of analysis). That is, exclude gesture comparisons where one or both gestures have 'main' (=base shape of Fribbles) or '-WF' (=Wrong Fribble; when participant produced a gesture about a Fribble other than the target Fribble) 2. Exclude gesture comparisons with one or more missing form similarity scores (i.e, with '9' as code for one of the features) ==> Step 1 and 2 result in: df_complete 3. Remove feature 'position' (because of insufficient inter-rater reliability) ==> Step 3 results in df_4feat ```{r subsetting data} #subsetting data # exclude non-relevant comparisons (have already been marked as 0 in column 'include') df_complete <- subset(df, include==1) #exclude gesture comparisons with missing form similarity scores list_missing <- c() for (x in 1:nrow(df_complete)) { if (df_complete[x,]$handedness==9|df_complete[x,]$handshape==9|df_complete[x,]$movement==9|df_complete[x,]$orientation==9|df_complete[x,]$position==9) {list_missing <- c(list_missing,x)} } df_complete[list_missing,c(1:6)] #5 gesture comparisons with missing values for handshape, movement or orientation #exclude those df_complete <- df_complete[-list_missing,] #remove column position df_4feat <- df_complete[,-3] ``` ## Analysis The subset that we will be working with for the analyses includes **N=`r nrow(df_4feat)` gesture comparisons** ### Which features overlap? ```{r set up dfs: which features overlap} #df wide to long df_long <- gather(df_4feat, form_feature, score, handedness:orientation, factor_key=T) #summarize: how many times was their form similarity (score 1) per feature, per dyad? df_sums <- df_long %>% group_by(form_feature, pairnr) %>% summarise(n=sum(score)) #add relative frequencies #nr of rows per pair df_nrows <- df_4feat %>% count(pairnr) #compute proportional scores per pair df_sums$n_prop <- df_sums$n/df_nrows$n ``` ```{r half boxplot which features overlap - Fig B1a} #half boxplot with points for each dyad p_Fig_B1a_perfeature <- ggplot(data=df_sums, aes(x=form_feature, y=n_prop)) + #geom_violin()+ #geom_point(aes(colour=factor(pairnr)))+ geom_boxjitter(jitter.size=0.15, jitter.params=list(width=.05), lwd=0.2, fill="lightgrey", jitter.color="black")+ #stat_summary(fun.y=mean, geom="point", size=2, color="black")+ scale_y_continuous("proportion of gesture pairs with form overlap", labels = scales::percent, limits=c(0,1))+ scale_x_discrete("\nform features")+ theme_light()+ theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_blank(), axis.line.x = element_line(colour="grey", size=0.2), axis.line.y=element_line(colour="grey", size=0.2))+ ggtitle("Which features overlap?")+ theme(legend.position="none")+ theme(axis.text.y=element_text(size=6, colour="black"),axis.title.y=element_text(size=6),axis.text.x=element_text(size=6, colour="black"),axis.title.x=element_text(size=6), title=element_text(size=7)) p_Fig_B1a_perfeature ``` Summary per feature (collapse dyads) ```{r summary per feature} #means df_sums %>% group_by(form_feature) %>% summarize(mean=mean(n_prop), sd=sd(n_prop), min=min(n_prop), max=max(n_prop)) ``` ### How many features overlap? ```{r set up df how many features overlap} #first count per gesture comparison df_nroffeat <- df_4feat #count df_nroffeat$features_sum <- (df_nroffeat$handedness+df_nroffeat$handshape+df_nroffeat$movement+df_nroffeat$orientation) #summarize df_nroffeat_sums <- df_nroffeat %>% group_by(features_sum, .drop=F)%>% count(pairnr) #compute proportional scores per pair names(df_nrows)[2] <- "n_rows" df_nroffeat_sums <- merge(df_nroffeat_sums, df_nrows) df_nroffeat_sums$n_prop <- df_nroffeat_sums$n/df_nroffeat_sums$n_rows #check df_nroffeat_sums %>% group_by(pairnr)%>% summarize(sum=sum(n_prop)) #adds up to 1 for every dyad, OK ``` ```{r half boxplot how many features overlap - Fig B1b} p_Fig_B1b_sumfeature <- ggplot(data=subset(df_nroffeat_sums, !is.na(features_sum)), aes(x=as.factor(features_sum), y=n_prop)) + geom_boxjitter(outlier.color=NA, jitter.size=0.15, jitter.position = ggplot2::PositionJitter, jitter.params = list(width=0.05, height=0), lwd=0.2, fill="lightgrey", jitter.color="black")+ scale_y_continuous("proportion of gesture pairs", labels = scales::percent, limits=c(0,1))+ scale_x_discrete("\nnumber of overlapping form features")+ theme_light()+ theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_blank(), axis.line.x = element_line(colour="grey", size=0.2), axis.line.y=element_line(colour="grey", size=0.2))+ ggtitle("How many features overlap?")+ theme(legend.position="none")+ theme(axis.text.y=element_text(size=6, colour="black"),axis.title.y=element_text(size=6),axis.text.x=element_text(size=6, colour="black"),axis.title.x=element_text(size=6), title=element_text(size=7)) p_Fig_B1b_sumfeature ``` ```{r descriptives how many features overlap} #descriptives count_fulloverlap <- df_4feat %>% subset(handedness==1&handshape==1&movement==1&orientation==1) %>% nrow() count_partialoverlap <- df_4feat %>% subset(handedness==1|handshape==1|movement==1|orientation==1) %>% nrow() proportion_fulloverlap <- round(count_fulloverlap/nrow(df_4feat)*100, digits=2) proportion_partialoverlap <- round(count_partialoverlap/nrow(df_4feat)*100, digits=2) ``` Out of all gesture comparisons, **`r proportion_fulloverlap`%** have 'complete" form overlap (similar in all four form features: handedness, handshape, movement and orientation), while **`r proportion_partialoverlap`%** have partial form overlap (similar in one or more form features). ### Combination: heatmap ```{r heatmap - Fig B2} df_heatmap <- df_4feat df_heatmap$sum_features <- NA for (x in 1:nrow(df_heatmap)) { df_heatmap[x,]$sum_features <- sum(df_heatmap[x,]$handedness,df_heatmap[x,]$handshape,df_heatmap[x,]$movement,df_heatmap[x,]$orientation) } #wide to long (get features as rows) df_heatmap_long <- gather(df_heatmap, feature, score, handedness:orientation, factor_key=TRUE) df_heatmap_agg <- df_heatmap_long %>% subset(sum_features>0&&sum_features<4) %>% #exclude cases with 0 and full overlap, because does not make sense to display which features overlapped group_by(feature, sum_features) %>% summarise(count=sum(score)) p_Fig_B2_heatmap <- ggplot(subset(df_heatmap_agg, sum_features!=0&sum_features!=4), aes(x=sum_features, y=feature, fill=count))+ geom_tile()+ scale_fill_gradient(low="white",high="red")+ xlab("total number of overlapping features")+ ylab("overlap in feature")+ theme_light()+ theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_blank(), axis.line.x = element_line(colour="grey", size=0.2), axis.line.y=element_line(colour="grey", size=0.2))+ theme(axis.line.y=element_blank(), axis.line.x=element_blank())+ theme(axis.text.y=element_text(size=6, colour="black"),axis.title.y=element_text(size=6),axis.text.x=element_text(size=6, colour="black"),axis.title.x=element_text(size=6))+ theme(legend.text=element_text(size=6), legend.title=element_text(size=6))+ theme(legend.key.size = unit(0.5,"line")) p_Fig_B2_heatmap ``` ### Extra: including 'position' ```{r including position} #how much overlap in position #df wide to long df_long_5feat <- gather(df_complete, form_feature, score, handedness:orientation, factor_key=T) #summarize: how many times was their form similarity (score 1) per feature, per dyad? df_sums_5feat <- df_long_5feat %>% group_by(form_feature, pairnr) %>% summarise(n=sum(score)) #add relative frequencies #nr of rows per pair df_nrows <- df_complete %>% count(pairnr) #compute proportional scores per pair df_sums_5feat$n_prop <- df_sums_5feat$n/df_nrows$n mean_position <- df_sums_5feat %>% subset(form_feature=="position") %>% summarize(mean=mean(n_prop)) #how many features overlap when including position count_fulloverlap_5feat <- df_complete %>% subset(handedness==1&handshape==1&movement==1&orientation==1&position==1) %>% nrow() count_partialoverlap_5feat <- df_complete %>% subset(handedness==1|handshape==1|movement==1|orientation==1|position==1) %>% nrow() proportion_fulloverlap_5feat <- round(count_fulloverlap_5feat/nrow(df_complete)*100, digits=2) proportion_partialoverlap_5feat <- round(count_partialoverlap_5feat/nrow(df_complete)*100, digits=2) ``` When taking into account the feature 'position' as well, we get the following picture: On average across dyads, the proportion of gesture pairs with form overlap is **`r round(mean_position[[2]]*100, 2)`%**. Out of all gesture comparisons, **`r proportion_fulloverlap_5feat`%** have 'complete" form overlap (similar in all four form features: handedness, handshape, movement, orientation and position), while **`r proportion_partialoverlap_5feat`%** have partial form overlap (similar in one or more form features). ```{r export figures, results='hide',include=T} #combine Fig1A and Fig1B p_Fig_B1 <- plot_grid(p_Fig_B1a_perfeature, p_Fig_B1b_sumfeature, nrow=1, labels=c("A","B"), label_size=7) #Heatmap: add some white space to Fig2 (in order to save with width=13) p_Fig_B2 <- plot_grid(NULL, p_Fig_B2_heatmap, NULL, nrow=1, rel_widths=c(0.8,1.8,0.8)) ggsave(p_Fig_B1, file="2_figures/FigB1.png", bg="white", width=143,height=60,units="mm", dpi=600) ggsave(p_Fig_B1, file="2_figures/FigB1.pdf", bg="white", width=143,height=60,units="mm") ggsave(p_Fig_B2, file="2_figures/FigB2.png", bg="white", width=143,height=55,units="mm", dpi=600) ggsave(p_Fig_B2, file="2_figures/FigB2.pdf", bg="white", width=143,height=55,units="mm") ``` ```{r session info, include=F} sessionInfo() # R version 4.0.2 (2020-06-22) # Platform: x86_64-w64-mingw32/x64 (64-bit) # Running under: Windows 10 x64 (build 19043) # # Matrix products: default # # locale: # [1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C # [5] LC_TIME=Dutch_Netherlands.1252 # # attached base packages: # [1] stats graphics grDevices utils datasets methods base # # other attached packages: # [1] cowplot_1.0.0 ggpol_0.0.7 ggplot2_3.3.1 tidyr_1.1.0 dplyr_1.0.0 # # loaded via a namespace (and not attached): # [1] Rcpp_1.0.7 pillar_1.4.4 compiler_4.0.2 plyr_1.8.6 tools_4.0.2 digest_0.6.25 evaluate_0.14 lifecycle_0.2.0 tibble_3.0.1 # [10] gtable_0.3.0 pkgconfig_2.0.3 rlang_0.4.11 cli_2.0.2 rstudioapi_0.11 yaml_2.2.1 xfun_0.14 withr_2.2.0 knitr_1.28 # [19] generics_0.0.2 vctrs_0.3.0 grid_4.0.2 tidyselect_1.1.0 glue_1.4.1 R6_2.4.1 fansi_0.4.1 rmarkdown_2.2 purrr_0.3.4 # [28] farver_2.0.3 magrittr_1.5 scales_1.1.1 ellipsis_0.3.1 htmltools_0.4.0 assertthat_0.2.1 colorspace_1.4-1 labeling_0.3 utf8_1.1.4 # [37] munsell_0.5.0 crayon_1.3.4 ```