This script supports the paper ‘The primacy of multimodal alignment in converging on shared symbols for novel referents’ by Marlou Rasenberg, Asli Özyürek, Sara Bögels and Mark Dingemanse (for the CABB team). To increase readability, not all code chunks present in the .Rmd source are shown in the output.

Aim

This script is used to analyse and plot the emergence of lexical and gestural alignment in the referential task; besides frequencies of occurence, we inspect when alignment emerges in the respective modalities.

Data

The dataframe used in this script contains information about if and when lexical and/or gestural alignment is achieved. This is done for 10 (out of 20) dyads, and 8 (out of 16) Fribbles. The unit of analysis is “Fribble subparts”, i.e., the separate shapes that are attached to the base figure. In total there are 34 subparts for the 8 Fribbles together. Hence 34 Fribble subparts * 10 dyads yields 340 rows, with information about if/when/in which modality alignment is achieved for a particular subpart by a particular dyad.

Coding has been based on the audio/videos themselves and the speech and gesture annotations in ELAN (for these annotations, see folder data/referential_task/transcripts).

The codings for each subpart are then categorized as follows:
No alignment categories
* lex_nor_gest_both: both participants (lexically and/or gesturally) referred to a particular subpart, but without establishing alignment anywhere in the interaction
* lex_nor_gest_1pp: only one participant referred to a particular subpart
* no_ref_atall: neither of the participants referred to a particular subpart (i.e., subpart “ignored” in the whole interaction)
Alignment categories
* lex_only: only lexical alignment
* gest_only: only gestural alignment
* lex_first: there is alignment in both modalities, where lexical alignment precedes gestural alignment
* gest_first: there is alignment in both modalities, where gestural alignment precedes lexical alignment
* sim_withinturn: there is alignment in both modalities, which is achieved simultaneously, with the lexical and gestural components being produced within the same speech turn by both participants (they go “hand in hand” so to say)
* sim_multiturn: there is alignment in both modalities, which is achieved simultaneously, but with the lexical and gestural components separated across multiple turns for at least one participant

Initial processing

df <- read.csv2("1_data/referential_task/alignment_coding.csv")

Let’s get a first impression of the frequencies of the coding categories:

table(df$category)
## 
##        gest_first         gest_only         lex_first  lex_nor_gest_1pp lex_nor_gest_both          lex_only      no_ref_atall     sim_multiturn    sim_withinturn 
##                42                16                31                33                21                91                31                 8                67

We need to modify the dataframe for later analyses.

First, we want to make sure that the dataframe includes all 34 subparts for every dyad (with each individual subpart on a separate row), because for the analyses we use subparts as the unit of analysis. Sometimes dyads talk/gesture about multiple subparts jointly (resulting in codes such as ‘14A+14B’);here these ‘double subparts’ are split (‘14A+14B’ becomes ‘14A’ and ‘14B’).

#rename original subpart col to subpart_old
df <- rename(df, subpart_old = subpart)

#create a new column with simple name 'subpart' (to be used in mixed effects analyses later on)
df$subpart <- df$subpart_old

#cases with 'double subparts' (with 2 subparts, so e.g. '14A+14B') are already on two separate, identical rows, where the second row is marked with duplication==1
#we will generate  new subpart labels for those cases: the first row will get the first subpart (e.g. '14A'), the second, duplicated row will get the second part as subpart (e.g. '14B')
for (x in 1:nrow(df)){
  #find the duplicated 'second' rows, so duplication==1 (or not NA)
  if (!is.na(df[x,]$duplication)) {
    #store subpart label
    double_sub <- df[x,]$subpart_old
    #keep last part for this second row (marked with duplication==1)
    df[x,]$subpart <- gsub(".*\\+", "", double_sub)
    #and keep first part for the row above, which has the same partnr
    for (y in 1:nrow(df)){
      if (df[y,]$subpart_old == double_sub & is.na(df[y,]$duplication)) {
        df[y,]$subpart <- gsub("\\+.*", "", double_sub)
      }
    }
  }
}

#for 2 dyads it happened that they referred to 4 subparts simultaneously ("10A+10B+10E+10F"), we'll fix those manually: 
#find rownrs:
df[df$subpart_old == "10A+10B+10E+10F", ]
#generate new subpart labels
#pair13
df[176,]$subpart <- "10A"
df[177,]$subpart <- "10B"
df[178,]$subpart <- "10E"
df[179,]$subpart <- "10F"
#pair14
df[209,]$subpart <- "10A"
df[210,]$subpart <- "10B"
df[211,]$subpart <- "10E"
df[212,]$subpart <- "10F"

#set subpart to factor
df$subpart <- as.factor(df$subpart)
#check if it's all correct like this
length(levels(df$subpart))==34
unique(table(df$subpart, df$pairnr))==1 #every subpart should occur once for every dyad

Second, we add a column with info on when alignment is ‘achieved’ (i.e., when second pair part (the “repetition” of the “original”) was produced)

df$lex_align_round_achieved <- df$lex_align_when
df$gest_align_round_achieved <- df$gest_align_when

df$lex_align_round_achieved <- gsub(".*-", "", df$lex_align_round_achieved)
df$gest_align_round_achieved <- gsub(".*-", "", df$gest_align_round_achieved)

To inspect the frequency with which alignment emerges lexically, gesturally or in both modalities, we look at how often alignment occurs for subparts proportionally to the total nr of subparts that both participants referred to (i.e., for which it would have been possible to align on). Thus, we exclude subparts that were not referred to, or where only 1 participant referred to, and create a df with the 4 main categories of interest: lexical alignment, gestural alignment, multimodal alignment and no alignment

df_4groups <- 
  df %>%
  filter(category!="no_ref_atall"&category!="lex_nor_gest_1pp")

df_4groups$category_4 <- NA

for (x in 1:nrow(df_4groups)) {if (df_4groups[x,]$category=='sim_multiturn'|df_4groups[x,]$category=='sim_withinturn') {df_4groups[x,]$category_4 <- 'both'}}
for (x in 1:nrow(df_4groups)) {if (df_4groups[x,]$category=='lex_first'|df_4groups[x,]$category=='gest_first') {df_4groups[x,]$category_4 <- 'both'}}
for (x in 1:nrow(df_4groups)) {if (df_4groups[x,]$category=='lex_only') {df_4groups[x,]$category_4 <- 'lex_only'}}
for (x in 1:nrow(df_4groups)) {if (df_4groups[x,]$category=='gest_only') {df_4groups[x,]$category_4 <- 'gest_only'}}
for (x in 1:nrow(df_4groups)) {if (df_4groups[x,]$category=='lex_nor_gest_both') {df_4groups[x,]$category_4 <- 'no_alignment'}}

df_4groups$category_4 <- as.factor(df_4groups$category_4)
table(df_4groups$category_4)
## 
##         both    gest_only     lex_only no_alignment 
##          148           16           91           21
nrow(df_4groups) #276 remaining (from 340 in total)
## [1] 276

Now let’s aggregate the df, such that we get frequencies and proportions for 4 categories, per dyad

df_4groups_freq <-
  df_4groups %>%
  group_by(pairnr) %>%
  count(category_4, .drop=F)

df_4groups_freq <- df_4groups_freq %>% group_by(pairnr) %>% mutate(prop=n/sum(n))

names(df_4groups_freq)[3] <- 'freq'
names(df_4groups_freq)[2] <- 'classification'

df_4groups_freq$classification <- factor(df_4groups_freq$classification, levels=c("both", "lex_only", "gest_only", "no_alignment"))


#check: total n should be different across dyads (because some dyads have not referred to all subparts, see filtering above)
tapply(df_4groups_freq$freq, df_4groups_freq$pairnr, sum)
#but props should always equal to 1 for each dyad
unique(tapply(df_4groups_freq$prop, df_4groups_freq$pairnr, sum)[2])==1
#total should be 276
sum(df_4groups_freq$freq)==276

For the analyses and plots we are interested in the 3 categories of alignment (thus excluding ‘no alignment’); we’ll need an aggregated df for that too.

df_3groups <- 
  df %>%
  #also exclude no_alignment (when both pp referred to it, yet there was no alignment: 'lex_nor_gest_both')
  filter(category!="no_ref_atall"&category!="lex_nor_gest_1pp"&category!="lex_nor_gest_both")

df_3groups$category_3 <- NA

for (x in 1:nrow(df_3groups)) {if (df_3groups[x,]$category=='sim_multiturn'|df_3groups[x,]$category=='sim_withinturn') {df_3groups[x,]$category_3 <- 'both'}}
for (x in 1:nrow(df_3groups)) {if (df_3groups[x,]$category=='lex_first'|df_3groups[x,]$category=='gest_first') {df_3groups[x,]$category_3 <- 'both'}}
for (x in 1:nrow(df_3groups)) {if (df_3groups[x,]$category=='lex_only') {df_3groups[x,]$category_3 <- 'lex_only'}}
for (x in 1:nrow(df_3groups)) {if (df_3groups[x,]$category=='gest_only') {df_3groups[x,]$category_3 <- 'gest_only'}}
for (x in 1:nrow(df_3groups)) {if (df_3groups[x,]$category=='lex_nor_gest_both') {df_3groups[x,]$category_3 <- 'no_alignment'}}

df_3groups$category_3 <- as.factor(df_3groups$category_3)
table(df_3groups$category_3)

nrow(df_3groups)==nrow(subset(df_4groups, category_4!="no_alignment"))
nrow(df_3groups) #255 remaining

#aggregate for plotting
df_3groups_freq <-
  df_3groups %>%
  group_by(pairnr) %>%
  count(category_3, .drop=F)

df_3groups_freq <- df_3groups_freq %>% group_by(pairnr) %>% mutate(prop=n/sum(n))

names(df_3groups_freq)[3] <- 'freq'
names(df_3groups_freq)[2] <- 'classification'

df_3groups_freq$classification <- factor(df_3groups_freq$classification, levels=c("both", "lex_only", "gest_only"))


#check: total n should be different across dyads (because some dyads have not referred to all subparts, see filtering above)
tapply(df_3groups_freq$freq, df_3groups_freq$pairnr, sum)
#but props should always equal to 1 for each dyad
unique(tapply(df_3groups_freq$prop, df_3groups_freq$pairnr, sum)[1])==1
#total should be 255
sum(df_3groups_freq$freq)==255

Now, we modify the main df, such that it contains all necessary information for the plots later on. Importantly, we want to show when (i.e., in which round of the interaction) alignment emerged. But if lexical and gestural alignment emerged in the same round, we want to be able to show whether this was simultaneously or successively in dumbbell plots. For the latter cases, we use round=.9 and round=1.1 etc. instead of 1, so we can plot the modalities next to each other later on.

df$lex_align_round_achieved <- as.numeric(gsub('R', '', df$lex_align_round_achieved)) #convert R1 etc. into integers
df$gest_align_round_achieved <- as.numeric(gsub('R', '', df$gest_align_round_achieved))

#add col to indicate 'datapoint' (used to plot one datapoint per row in dumbbell plots later on)
df$datapoint <- ""
for (x in 1:nrow(df)) {df[x,]$datapoint <- x}
df$datapoint <- as.factor(df$datapoint)
#more informative labels instead of 'datapoint'
df$datapoint_label <- paste(df$pairnr,df$subpart,df$subpart_duplicated, sep="_", collapse=NULL)
df$datapoint_label <- gsub("_NA", "", df$datapoint_label)

#compute column 'with order (w_order), for cases of successive emergence (lex_first / gest_first):
df$gest_align_round_achieved_w_order <- ""
df$lex_align_round_achieved_w_order <- ""

for (x in 1:nrow(df)) {
  if (df[x,]$category=="lex_first"&df[x,]$gest_align_round_achieved==df[x,]$lex_align_round_achieved) {
    df[x,]$lex_align_round_achieved_w_order <- (df[x,]$lex_align_round_achieved - 0.1)
    df[x,]$gest_align_round_achieved_w_order <- (df[x,]$gest_align_round_achieved + 0.1)
  }
  else if (df[x,]$category=="gest_first"&df[x,]$gest_align_round_achieved==df[x,]$lex_align_round_achieved) {
    df[x,]$gest_align_round_achieved_w_order <- (df[x,]$gest_align_round_achieved - 0.1)
    df[x,]$lex_align_round_achieved_w_order <- (df[x,]$lex_align_round_achieved + 0.1)
  }
  else {df[x,]$lex_align_round_achieved_w_order <- df[x,]$lex_align_round_achieved
        df[x,]$gest_align_round_achieved_w_order <- df[x,]$gest_align_round_achieved}
}

df$gest_align_round_achieved_w_order <- as.numeric(df$gest_align_round_achieved_w_order)
df$lex_align_round_achieved_w_order <- as.numeric(df$lex_align_round_achieved_w_order)

#sort by label
df <- df[order(df[,'pairnr']), ]

We have 5 patterns of interest, for which we need (wide) dfs and variables for plots later on

### Unimodal vs. multimodal:

#lex only
df_lex_only <- df %>%
  filter(category=="lex_only")
df_lex_only$datapoint_count <- c(1:nrow(df_lex_only))
lexonly_N = nrow(df_lex_only)

#gest only
df_gest_only <- df %>%
  filter(category=="gest_only")
df_gest_only$datapoint_count <- c(1:nrow(df_gest_only))
gestonly_N = nrow(df_gest_only)

#all multi
df_all_multi <- df %>%
  filter(category=="sim_multiturn"|category=="sim_withinturn"|category=="lex_first"|category=="gest_first")
df_all_multi$datapoint_count <- c(1:nrow(df_all_multi))
allmulti_N = nrow(df_all_multi)

df_all_multi$first_align_round_achieved <- NA

#which round alignment first achieved?
for (x in 1:nrow(df_all_multi)) {
  if (df_all_multi[x,]$lex_align_round_achieved == df_all_multi[x,]$gest_align_round_achieved) 
    {df_all_multi[x,]$first_align_round_achieved <- df_all_multi[x,]$lex_align_round_achieved}
  else if (df_all_multi[x,]$lex_align_round_achieved < df_all_multi[x,]$gest_align_round_achieved) 
  {df_all_multi[x,]$first_align_round_achieved <- df_all_multi[x,]$lex_align_round_achieved}
  else if (df_all_multi[x,]$gest_align_round_achieved < df_all_multi[x,]$lex_align_round_achieved) 
  {df_all_multi[x,]$first_align_round_achieved <- df_all_multi[x,]$gest_align_round_achieved}
}


### Simultaneous vs. successive emergence
#sim
df_sim <- df %>%
  filter(category=="sim_multiturn"|category=="sim_withinturn")
df_sim$datapoint_count <- c(1:nrow(df_sim))
sim_N = nrow(df_sim)

#lex first
df_lex_first <- df %>%
  filter(category=="lex_first")
df_lex_first$datapoint_count <- c(1:nrow(df_lex_first))
lexfirst_N = nrow(df_lex_first)

#gest first
df_gest_first <- df %>%
  filter(category=="gest_first")
df_gest_first$datapoint_count <- c(1:nrow(df_gest_first))
gestfirst_N = nrow(df_gest_first)

To further inspect cases of multimodal alignment, we need a df with (relative) frequencies for simultatenous, lex_first and gest_first emergence:

df_lexandgest <- 
  df %>%
  filter(lex_align==1&gest_align==1)

df_lexandgest$category_3 <- df_lexandgest$category
df_lexandgest$category_3 <- gsub('sim_multiturn', 'sim', df_lexandgest$category_3)
df_lexandgest$category_3 <- gsub('sim_withinturn', 'sim', df_lexandgest$category_3)

#count
df_lexandgest_freq <-
  df_lexandgest %>%
  count(category_3) %>%
  mutate(prop=n/sum(n))

names(df_lexandgest_freq)[2] <- 'freq'
names(df_lexandgest_freq)[1] <- 'classification'

df_lexandgest_freq$classification <- factor(df_lexandgest_freq$classification, levels=c("lex_first", "gest_first", "sim"))
df_lexandgest_freq$proportion <- round(df_lexandgest_freq$freq/sum(df_lexandgest_freq$freq)*100, 1)

Prevalence of alignment in the interactive task (§3.2 in ms)

means <- df_4groups_freq %>%
  group_by(classification) %>%
  summarize(mean = mean(prop))

means
## # A tibble: 4 x 2
##   classification   mean
##   <fct>           <dbl>
## 1 both           0.519 
## 2 lex_only       0.343 
## 3 gest_only      0.0568
## 4 no_alignment   0.0817

We find that on average, the case of “no alignment” is rare: it happened for only 8.17% of Fribble subparts, that were referred to by both members of a dyad. Thus, on average across dyads, alignment emerged in at least one modality at some point in the interaction for 91.83% of Fribble subparts that had been referred to by both members of a dyad.

Here we’ll zoom in on those subparts for which alignment did emerge and inspect how frequent lexical, gestural and multimodal alignment are. We plot the variation in relative frequencies across dyads (boxplot):

#note that for this boxplot, we plot 3 categories (multi, lex_only, gest_only and no_alignment), and the frequencies plotted are relative to the total nr of fribble subparts for which alignment emerged. Hence, the values per dyad (e.g., yellow dot) add up to 100%

p_frequencies_boxplot <- ggplot(data=subset(df_3groups_freq), aes(x=classification, y=prop)) +
  geom_boxplot(lwd=0.2, fill="lightgrey", alpha=.2)+
  geom_jitter(aes(colour=as.factor(pairnr)), size=1, width=.05, height=0)+
  scale_color_manual(values=mycolors)+
  theme(legend.position='none')+
  scale_y_continuous(labels=scales::percent)+
  expand_limits(y=c(0,1))+
  ylab("Proportion of aligned Fribble subparts")+
  scale_x_discrete(labels = c("multimodal\nalignment","only lexical\nalignment","only gestural\nalignment","no alignment"))+
  theme_light()+
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              axis.line.x = element_line(colour="grey", size=0.2), 
              axis.line.y=element_line(colour="grey", size=0.2))+
  theme(legend.position="none")+
  theme(plot.margin = unit(c(0,0,0,0), "lines"))+
  theme(axis.text.y=element_text(size=6, colour="black"), axis.title.y=element_text(size=6), axis.text.x=element_text(size=6, colour="black"))+
  theme(axis.title.x=element_blank())

p_frequencies_boxplot

p_Fig6 <- p_frequencies_boxplot

To test the hypotheses about frequencies of alignment in various modalities, we use mixed effects models.

We use the Fribble subparts as the unit of analysis; for each subpart (for which alignment emerged), we should have an alignment score for each modality level. That is, we have three alignment types: multimodal (multi), lexical alignment only (lex_only) and gestural alignment only (gest_only).

Our hypothesis is: (multi, lex only) > gest only. This is prediction 2 in the ms. Thus we predict both multimodal alignment and lexical alignment only to be more frequent than gestural alignment only, but we have no predictions about the difference between multimodal alignment and lexical alignment.

Note that we have only one variabele (alignment type), and no independent variables. We wish to compare gest_only against the two other alignment types (lex_only, multi) We therefore use binomial intercept-only mixed effects models, where alignment type is the DV. We set up two models for the two comparisons: - multi (1) vs. gest_only (0) - lex_only (1) vs. gest_only (0)

#first take subset of data where alignment emerged in at least one modality
df_aligned <- subset(df, category!="lex_nor_gest_1pp"&category!="lex_nor_gest_both"&category!="no_ref_atall")
nrow(df_aligned)==255

df_aligned$category <- droplevels(df_aligned$category)
levels(df_aligned$category)

#nemame categories, to yield 'lex_only', 'gest_only' and 'multi'
df_aligned$category <- gsub("sim_multiturn", "multi", df_aligned$category)
df_aligned$category <- gsub("sim_withinturn", "multi", df_aligned$category)
df_aligned$category <- gsub("lex_first", "multi", df_aligned$category)
df_aligned$category <- gsub("gest_first", "multi", df_aligned$category)

#create two seperate dfs, one for each contrast

#multi vs gestonly
df_multi_vs_gestonly <- subset(df_aligned, category=="multi"|category=="gest_only") %>% dplyr::select(pairnr, subpart, category)
#lexonly vs gestonly
df_lexonly_vs_gestonly <- subset(df_aligned, category=="lex_only"|category=="gest_only") %>% dplyr::select(pairnr, subpart, category)

#create binary 'alignment_type' DV

#multi=1 vs. gestonly=0
df_multi_vs_gestonly$alignment_type <- NA
for (x in 1:nrow(df_multi_vs_gestonly)){
  if (df_multi_vs_gestonly[x,]$category=="multi") {df_multi_vs_gestonly[x,]$alignment_type <- 1}
  else if (df_multi_vs_gestonly[x,]$category=="gest_only") {df_multi_vs_gestonly[x,]$alignment_type <- 0}
}

#lexonly=1 vs. gestonly=0
df_lexonly_vs_gestonly$alignment_type <- NA
for (x in 1:nrow(df_lexonly_vs_gestonly)){
  if (df_lexonly_vs_gestonly[x,]$category=="lex_only") {df_lexonly_vs_gestonly[x,]$alignment_type <- 1}
  else if (df_lexonly_vs_gestonly[x,]$category=="gest_only") {df_lexonly_vs_gestonly[x,]$alignment_type <- 0}
}
## multi vs gest only ##

#maximal model --> results in singular fit
multi_vs_gestonly_1 <- glmer(alignment_type ~ 1 + (1|pairnr) + (1|subpart), data = df_multi_vs_gestonly, family=binomial,
                                       control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))

#we remove the derivative calculations
#see e.g. Brown (2021): https://doi.org/10.1177/2515245920960351 
#and: http://decision-lab.org/resources/ > Standard Operating Procedures For Using Mixed-Effects Models
#this allows us to inspect the random effects of pairnr and subpart
multi_vs_gestonly_2 <- glmer(alignment_type ~ 1 + (1|pairnr) + (1|subpart), data = df_multi_vs_gestonly, family=binomial,
                                       control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5), calc.derivs = FALSE))
summary(multi_vs_gestonly_2)

#the estimated random effect of pairnr is 0, and thus including this random effect may bias parameter estimates and standard errors
#we therefore use a simplified model with only a random effect for subpart, and see if this changes the parameters

multi_vs_gestonly_3 <- glmer(alignment_type ~ 1 + (1|subpart), data = df_multi_vs_gestonly, family=binomial,
                                       control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))

summary(multi_vs_gestonly_3)



## lex only vs gest only ##

#maximal model
lexonly_vs_gestonly_1 <- glmer(alignment_type ~ 1 + (1|pairnr) + (1|subpart), data = df_lexonly_vs_gestonly, family=binomial,
                                       control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))

summary(lexonly_vs_gestonly_1)
#final models reported in paper
summary(multi_vs_gestonly_3)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: alignment_type ~ 1 + (1 | subpart)
##    Data: df_multi_vs_gestonly
## Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##    107.5    113.7    -51.7    103.5      162 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0397  0.2356  0.2403  0.3290  0.5384 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subpart (Intercept) 0.9762   0.988   
## Number of obs: 164, groups:  subpart, 33
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.5324     0.4584   5.525  3.3e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lexonly_vs_gestonly_1)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: alignment_type ~ 1 + (1 | pairnr) + (1 | subpart)
##    Data: df_lexonly_vs_gestonly
## Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##     81.4     89.4    -37.7     75.4      104 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.8999  0.1172  0.1688  0.3112  0.8593 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subpart (Intercept) 1.270    1.127   
##  pairnr  (Intercept) 5.356    2.314   
## Number of obs: 107, groups:  subpart, 32; pairnr, 10
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)    2.076      1.052   1.974   0.0484 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Though we did not have a prediction about whether multimodal alignment and lexical alignment only would occur equally frequently or not, we have tested this comparison exploratively, and added it as a footnote in the ms.

#set up df

#multi vs lexonly
df_multi_vs_lexonly <- subset(df_aligned, category=="multi"|category=="lex_only") %>% dplyr::select(pairnr, subpart, category)

#create binary 'alignment_type' DV
#multi=1 vs. lexonly=0
df_multi_vs_lexonly$alignment_type <- NA
for (x in 1:nrow(df_multi_vs_lexonly)){
  if (df_multi_vs_lexonly[x,]$category=="multi") {df_multi_vs_lexonly[x,]$alignment_type <- 1}
  else if (df_multi_vs_lexonly[x,]$category=="lex_only") {df_multi_vs_lexonly[x,]$alignment_type <- 0}
}

#run mixed effects model
multi_vs_lexonly_1 <- glmer(alignment_type ~ 1 + (1|pairnr) + (1|subpart), data = df_multi_vs_lexonly, family=binomial,
                                       control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))
summary(multi_vs_lexonly_1)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: alignment_type ~ 1 + (1 | pairnr) + (1 | subpart)
##    Data: df_multi_vs_lexonly
## Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##    234.1    244.5   -114.0    228.1      236 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.9430 -0.5405  0.3034  0.3980  1.8451 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subpart (Intercept) 0.08434  0.2904  
##  pairnr  (Intercept) 5.01803  2.2401  
## Number of obs: 239, groups:  subpart, 34; pairnr, 10
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   0.6980     0.7443   0.938    0.348

Temporal distribution of unimodal and multimodal alignment (§3.3 in ms)

#lex only
p_lexonly_dotplot <- ggplot(data=df_lex_only, aes(x=lex_align_round_achieved))+
  geom_dotplot(binwidth=1, dotsize=.04, method="histodot", fill=purple, colour=purple, stackratio=1.5)+
  #geom_dotplot(method="histodot", fill=purple, colour=purple)+
  labs(x="round of first emergence", y="count", 
             title=paste0("Only lexical alignment  \n(*n* = ",lexonly_N,")"))+
  xlim("1","2","3","4","5","6")+
  ylim(0,1)+
  theme_light()+
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              axis.line.x = element_line(colour="grey", size=0.2), 
              axis.line.y=element_line(colour="grey", size=0.2))+
  theme(axis.text.y= element_blank())+
  #theme(panel.background=element_rect(fill="transparent", colour=NA), plot.background=element_rect(fill="transparent", colour=NA))+
  theme(plot.margin = unit(c(1,1,0,1), "lines"))+
  theme(plot.title = element_text(hjust = 0.5, size=7))+
  theme(axis.title.x=element_text(size=6), axis.text.x=element_text(size=6, colour="black"), axis.title.y=element_text(size=6), title=element_text(size=6))+
  theme(plot.title = ggtext::element_markdown()) #necessary to get 'n' in italics in title

#gest only
p_gestonly_dotplot <- ggplot(data=df_gest_only, aes(x=gest_align_round_achieved))+
  geom_dotplot(binwidth=1, dotsize=.04, method="histodot", fill=orange, colour=orange, stackratio=1.5)+
  labs(x="round of first emergence", y="count", 
             title=paste0("Only gestural alignment  \n(*n* = ",gestonly_N,")"))+
  xlim("1","2","3","4","5","6")+
  ylim(0,1)+
  theme_light()+
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              axis.line.x = element_line(colour="grey", size=0.2), 
              axis.line.y=element_line(colour="grey", size=0.2))+
  theme(axis.text.y= element_blank())+
  #theme(panel.background=element_rect(fill="transparent", colour=NA), plot.background=element_rect(fill="transparent", colour=NA))+
  theme(plot.margin = unit(c(1,1,0,1), "lines"))+
  theme(plot.title = element_text(hjust = 0.5, size=7))+
  theme(axis.title.x=element_text(size=6), axis.text.x=element_text(size=6, colour="black"), axis.title.y=element_text(size=6), title=element_text(size=6))+
  theme(plot.title = ggtext::element_markdown())

#allmulti
p_allmulti_dotplot <- ggplot(data=df_all_multi, aes(x=first_align_round_achieved))+
  geom_dotplot(binwidth=1, dotsize=.04, method="histodot", fill=lightgrey, colour=lightgrey, stackratio=1.5)+
  labs(x="round of first emergence", y="count", 
             title=paste0("Multimodal alignment  \n(*n* = ",allmulti_N,")"))+
  xlim("1","2","3","4","5","6")+
  ylim(0,1)+
  expand_limits(x=c(1,6))+
  theme_light()+
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              axis.line.x = element_line(colour="grey", size=0.2), 
              axis.line.y=element_line(colour="grey", size=0.2))+
  theme(axis.text.y= element_blank())+
  #theme(panel.background=element_rect(fill="transparent", colour=NA), plot.background=element_rect(fill="transparent", colour=NA))+
  theme(plot.margin = unit(c(1,1,0,1), "lines"))+
  theme(plot.title = element_text(hjust = 0.5, size=7))+
  theme(axis.title.x=element_text(size=6), axis.text.x=element_text(size=6, colour="black"), axis.title.y=element_text(size=6), title=element_text(size=6))+
  theme(plot.title = ggtext::element_markdown())


p_Fig7 <- plot_grid(p_allmulti_dotplot, p_lexonly_dotplot, p_gestonly_dotplot, nrow=1, labels=c('A','B','C'), label_size=7)
p_Fig7

To investigate whether the distributions of the round in which alignment emerged varies across multimodal vs. lexical only vs gestural only alignment, we use 2-sided Kolmogorov-Smirnov tests:

#multimodal vs unimodal (lexonly+gestonly)
multi <- df_all_multi$first_align_round_achieved
lex_only <- df[df$category=="lex_only",]$lex_align_round_achieved
gest_only <- df[df$category=="gest_only",]$gest_align_round_achieved

ks.test(multi, lex_only)
## Warning in ks.test(multi, lex_only): p-value will be approximate in the presence of ties
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  multi and lex_only
## D = 0.20463, p-value = 0.01784
## alternative hypothesis: two-sided
ks.test(multi, gest_only)
## Warning in ks.test(multi, gest_only): cannot compute exact p-value with ties
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  multi and gest_only
## D = 0.41892, p-value = 0.01259
## alternative hypothesis: two-sided
ks.test(lex_only, gest_only)
## Warning in ks.test(lex_only, gest_only): cannot compute exact p-value with ties
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  lex_only and gest_only
## D = 0.21429, p-value = 0.5597
## alternative hypothesis: two-sided
#yields warning message due to ties (because we have discrete values: 1, 2, 3 etc. -present in both samples)
#we apply ks.boot to be sure if we can rely on the 'approximate p-value' of the ks. test
#"This function [ks.boot] executes a bootstrap version of the univariate Kolmogorov-Smirnov test which provides correct coverage even when the distributions being compared are not entirely continuous. Ties are allowed with this test unlike the traditional Kolmogorov-Smirnov test."


ks.boot(multi, lex_only)
## $ks.boot.pvalue
## [1] 0
## 
## $ks
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  Tr and Co
## D = 0.20463, p-value = 0.01784
## alternative hypothesis: two-sided
## 
## 
## $nboots
## [1] 1000
## 
## attr(,"class")
## [1] "ks.boot"
ks.boot(multi, gest_only)
## $ks.boot.pvalue
## [1] 0
## 
## $ks
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  Tr and Co
## D = 0.41892, p-value = 0.01259
## alternative hypothesis: two-sided
## 
## 
## $nboots
## [1] 1000
## 
## attr(,"class")
## [1] "ks.boot"
ks.boot(lex_only, gest_only)
## $ks.boot.pvalue
## [1] 0.123
## 
## $ks
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  Tr and Co
## D = 0.21429, p-value = 0.5597
## alternative hypothesis: two-sided
## 
## 
## $nboots
## [1] 1000
## 
## attr(,"class")
## [1] "ks.boot"
#yields the same result

This shows that the distributions are significantly different.

Note that the gest_only category is rather small (n=16); however when running the ks-test for multimodal versus unimodal (thus collapsing lex_only and gest_only), we also find a significant difference in the temporal distribution of emergence.

uni <- c(df[df$category=="lex_only",]$lex_align_round_achieved, df[df$category=="gest_only",]$gest_align_round_achieved)

ks.test(multi, uni)
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  multi and uni
## D = 0.23668, p-value = 0.001903
## alternative hypothesis: two-sided
ks.boot(multi, uni)
## $ks.boot.pvalue
## [1] 0
## 
## $ks
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  Tr and Co
## D = 0.23668, p-value = 0.001903
## alternative hypothesis: two-sided
## 
## 
## $nboots
## [1] 1000
## 
## attr(,"class")
## [1] "ks.boot"
prop_multi_r1r2 <- round((nrow(subset(df_all_multi, first_align_round_achieved==1|first_align_round_achieved==2))/nrow(df_all_multi))*100, digits=2)
prop_lexonly_r1r2 <- round((nrow(subset(df_lex_only, lex_align_round_achieved==1|lex_align_round_achieved==2))/nrow(df_lex_only))*100, digits=2)
prop_gestonly_r1r2 <- round((nrow(subset(df_gest_only, gest_align_round_achieved==1|gest_align_round_achieved==2))/nrow(df_gest_only))*100, digits=2)

Early emergence especially holds in case of multimodal alignment: the first instance of alignment emerged in the first or second round in 91.89% of the multimodally aligned subparts.
This is less frequent in case of unimodal alignment, i.e. when participants aligned only lexically (71.43%) or gesturally (50%) on a subpart.

Order of emergence in multimodal alignment (§3.4 in ms)

Let’s first inspect how frequent simultaneous versus successive (lex_first / gest_first) emergence is. We create a stacked barplot (will be combined with other plots for Fig. 8 in ms) and a boxplot to inspect variation across dyads

#text for legend
list_legend <- c("lexical alignment first","gestural alignment first","multimodal emergence")


stacked_lexandgestorder <- ggplot(data=df_lexandgest_freq, aes(x='', y=proportion, fill=classification))+
  geom_col(width=.6)+
  scale_fill_manual(values=c(purple, orange, lightgrey), name="Order of emergence", labels=c(list_legend[1:3]))+
  ylab("proportion of multimodally aligned Fribble subparts")+
  xlab("modality in which\nalignment emerged first")+
  ggtitle("")+
  scale_y_continuous(labels = function(x) paste0(x, "%"))+
  geom_text(data=df_lexandgest_freq, aes(x='', y=proportion, label = paste0(proportion, "%")),size=2,
            position = position_stack(vjust = 0.5), colour="black") +
  theme_light()+
  theme(panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      panel.border = element_blank(),
      axis.line.x = element_line(colour="grey", size=0.2), 
      axis.line.y=element_line(colour="grey", size=0.2),
      axis.ticks=element_line(colour="grey", size=0.2))+
  theme(legend.position="none")+
  theme(axis.title.y=element_text(size=5.5), axis.text.y=element_text(size=6, colour="black"), axis.title.x=element_text(size=6))+
  theme(axis.ticks.x=element_blank(), axis.text.x=element_blank())+
  theme(axis.title.x = element_text(vjust=3.3))+ #move x axis title a bit higher (such that it matches dumbbell plots later on)
  theme(plot.margin = unit(c(1,1,-0.5,1), "lines"))
#first take subset of data where alignment emerged in both modalities
df_order <- subset(df, lex_align==1&gest_align==1)
nrow(df_order)==148
## [1] TRUE
df_order$category <- droplevels(df_order$category)
levels(df_order$category)
## [1] "gest_first"     "lex_first"      "sim_multiturn"  "sim_withinturn"
#nemame categories, to yield 'lex_first', 'gest_first' and 'sim'
df_order$category <- gsub("sim_multiturn", "sim", df_order$category)
df_order$category <- gsub("sim_withinturn", "sim", df_order$category)

df_order$category <- as.factor(df_order$category)

#aggregate for plotting
df_orders_freq <-
  df_order %>%
  group_by(pairnr, category, .drop=F) %>%
  count()

df_orders_freq <- df_orders_freq %>% group_by(pairnr) %>% mutate(prop=n/sum(n))

names(df_orders_freq)[3] <- 'freq'
names(df_orders_freq)[2] <- 'classification'

df_orders_freq$classification <- factor(df_orders_freq$classification, levels=c("sim", "gest_first", "lex_first"))

#remove empty rows for dyad 18 (no multimodally aligned subparts)
df_orders_freq <- subset(df_orders_freq, pairnr!="18")


#check: total n should be different across dyads (because some dyads have not referred to all subparts, see filtering above)
tapply(df_orders_freq$freq, df_orders_freq$pairnr, sum)
##  4  5  7  9 11 13 14 16 18 20 
## 27  5 16  9  8 25 22 14 NA 22
#but props should always equal to 1 for each dyad
tapply(df_orders_freq$prop, df_orders_freq$pairnr, sum)
##  4  5  7  9 11 13 14 16 18 20 
##  1  1  1  1  1  1  1  1 NA  1
#total should be 148
sum(df_orders_freq$freq)
## [1] 148
p_frequencies_orders_boxplot <- ggplot(data=subset(df_orders_freq), aes(x=classification, y=prop)) +
    geom_boxplot(outlier.shape=NA)+
  geom_jitter(aes(colour=as.factor(pairnr)), size=3, width=.03, height=0)+
  scale_color_manual(values=mycolors)+
  theme(legend.position='none')+
  scale_y_continuous(labels=scales::percent)+
  expand_limits(y=c(0,1))+
  ylab("proportion of multimodally aligned Fribble subparts")+
  scale_x_discrete(labels = c("multimodal\nemergence","gestural\nalignment first","lexical\nalignment first"))+
  theme_classic()+
  theme(legend.position="none")+
  theme(axis.text.y=element_text(size=6), axis.title.y=element_text(size=6), axis.text.x=element_text(size=6))+
  theme(axis.title.x=element_blank())

p_frequencies_orders_boxplot

We expect that gestural alignment will either emerge simultaneously with lexical alignment or precede it, and we expect that least frequently of all, gestural alignment follows lexical alignment (prediction 3): (gest_first, sim) > lex_first We use the same kind of analysis as was used for prediction 2, that is, for every Fribble subpart (for which alignment emerged in both modalities), we assess whether there are differences in how frequently a particular modality emerges first (gest_first, lex_first or sim).

Note again that we have only one variable (order), and no independent variables. We wish to compare lex_first against the two other orders (gest_first, sim) We therefore use binomial intercept-only mixed effects models, where order is the DV. We set up two models for the two comparisons: - sim (1) vs. lex_first (0) - gest_first (1) vs. lex_first (0)

#create two seperate dfs, one for each contrast

#sim vs lexfirst
df_sim_vs_lexfirst <- subset(df_order, category=="sim"|category=="lex_first") %>% dplyr::select(pairnr, subpart, category)
#gestfirst vs lexfirst
df_gestfirst_vs_lexfirst <- subset(df_order, category=="gest_first"|category=="lex_first") %>% dplyr::select(pairnr, subpart, category)

#create binary 'order' DV

#sim=1 vs. lexfirst=0
df_sim_vs_lexfirst$order <- NA
for (x in 1:nrow(df_sim_vs_lexfirst)){
  if (df_sim_vs_lexfirst[x,]$category=="sim") {df_sim_vs_lexfirst[x,]$order <- 1}
  else if (df_sim_vs_lexfirst[x,]$category=="lex_first") {df_sim_vs_lexfirst[x,]$order<- 0}
}

#gestfirst=1 vs. lexfirst=0
df_gestfirst_vs_lexfirst$order <- NA
for (x in 1:nrow(df_gestfirst_vs_lexfirst)){
  if (df_gestfirst_vs_lexfirst[x,]$category=="gest_first") {df_gestfirst_vs_lexfirst[x,]$order <- 1}
  else if (df_gestfirst_vs_lexfirst[x,]$category=="lex_first") {df_gestfirst_vs_lexfirst[x,]$order <- 0}
}
## sim vs lexfirst ##

#maximal model --> singular fit
sim_vs_lexfirst_1 <- glmer(order ~ 1 + (1|pairnr) + (1|subpart), data = df_sim_vs_lexfirst, family=binomial,
                                       control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))
#drop computation of derivatives (see chunk 19)
sim_vs_lexfirst_2 <- glmer(order ~ 1 + (1|pairnr) + (1|subpart), data = df_sim_vs_lexfirst, family=binomial,
                                       control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5), calc.derivs = FALSE))

summary(sim_vs_lexfirst_2) 
# random effect for subpart estimated at 0
# simplify model by using only random effect for pairnr

sim_vs_lexfirst_3 <- glmer(order ~ 1 + (1|pairnr), data = df_sim_vs_lexfirst, family=binomial,
                                       control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))

summary(sim_vs_lexfirst_3) 

## gestfirst vs lexfirst ##

#maximal model --> singular fit
gestfirst_vs_lexfirst_1 <- glmer(order ~ 1 + (1|pairnr) + (1|subpart), data = df_gestfirst_vs_lexfirst, family=binomial,
                                       control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))
#drop computation of derivatives
gestfirst_vs_lexfirst_2 <- glmer(order ~ 1 + (1|pairnr) + (1|subpart), data = df_gestfirst_vs_lexfirst, family=binomial,
                                       control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5), calc.derivs = FALSE))
summary(gestfirst_vs_lexfirst_2)
# random effect for subpart estimated at 0
# simplify model by using only random effect for pairnr

gestfirst_vs_lexfirst_3 <- glmer(order ~ 1 + (1|pairnr), data = df_gestfirst_vs_lexfirst, family=binomial,
                                       control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))
summary(gestfirst_vs_lexfirst_3)
#final models
summary(sim_vs_lexfirst_3) 
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: order ~ 1 + (1 | pairnr)
##    Data: df_sim_vs_lexfirst
## Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##    129.9    135.3    -63.0    125.9      104 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4772 -1.0593  0.5112  0.6777  0.9440 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  pairnr (Intercept) 0.4849   0.6964  
## Number of obs: 106, groups:  pairnr, 9
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   0.9402     0.3437   2.736  0.00622 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(gestfirst_vs_lexfirst_3)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: order ~ 1 + (1 | pairnr)
##    Data: df_gestfirst_vs_lexfirst
## Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##    103.1    107.6    -49.5     99.1       71 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4228 -1.0376  0.7028  0.8242  1.0020 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  pairnr (Intercept) 0.226    0.4754  
## Number of obs: 73, groups:  pairnr, 9
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   0.3334     0.3046   1.095    0.274

Thus, we find that simultaneous emergence is more frequent than lexical alignment first, but we find no difference between lexical alignment first and gestural alignment first.

Though we did not have a prediction about whether simultaneous emergence and gestural alignment first would occur equally frequently or not, we have tested this comparison exploratively, and added it as a footnote in the ms.

#set up df

#sim vs gestfirst
df_sim_vs_gestfirst <- subset(df_order, category=="sim"|category=="gest_first") %>% dplyr::select(pairnr, subpart, category)

#create binary 'order' DV

#sim=1 vs. gestfirst=0
df_sim_vs_gestfirst$order <- NA
for (x in 1:nrow(df_sim_vs_gestfirst)){
  if (df_sim_vs_gestfirst[x,]$category=="sim") {df_sim_vs_gestfirst[x,]$order <- 1}
  else if (df_sim_vs_gestfirst[x,]$category=="gest_first") {df_sim_vs_gestfirst[x,]$order<- 0}
}

#mixed effects model
sim_vs_gestfirst <- glmer(order ~ 1 + (1|pairnr) + (1|subpart), data = df_sim_vs_gestfirst, family=binomial,
                                       control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))

summary(sim_vs_gestfirst)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: order ~ 1 + (1 | pairnr) + (1 | subpart)
##    Data: df_sim_vs_gestfirst
## Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##    158.1    166.4    -76.1    152.1      114 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4869 -1.2265  0.6740  0.7096  0.9164 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  subpart (Intercept) 0.06679  0.2584  
##  pairnr  (Intercept) 0.13229  0.3637  
## Number of obs: 117, groups:  subpart, 33; pairnr, 9
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   0.5895     0.2533   2.327     0.02 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note also that the category of simultaneous emergence consist of two subcategories, of which “multiturn” (i.e., gesture-speech ensemble repeated as a whole within turn) is most frequent:

#additional: count sim_multiturn and sim_withinturn separately
df_lexandgest %>%
  subset(category=="sim_multiturn"|category=="sim_withinturn")%>%
  count(category)
##         category  n
## 1  sim_multiturn  8
## 2 sim_withinturn 67

Let’s plot the rounds in which lexical and gestural alignment emerge for the simultaneous and successive cases:

#SIMULTANEOUS
df_sim <- df_sim[order(df_sim$lex_align_round_achieved),]
df_sim$datapoint_count <- c(1:nrow(df_sim))

p_sim <- ggplot(data=df_sim, aes(x=gest_align_round_achieved, xend=lex_align_round_achieved, y=datapoint_count), na.rm=T) +
        #create a thick line between x and xend instead of using default provided by geom_dubbell
        geom_segment(aes(x=gest_align_round_achieved, 
                         xend=lex_align_round_achieved, 
                         y=datapoint_count, 
                         yend=datapoint_count), 
                     color="#c9c9c9", size=.02, alpha=.4)+
        geom_dumbbell(color="#c9c9c9", 
                      size=0.15,
                      size_x=0.15, 
                      size_xend = 0.15,
                      #Note: there is no US:'color' for UK:'colour' 
                      # in geom_dumbbel unlike standard geoms in ggplot()
                      colour_x=lightgrey, 
                      colour_xend = lightgrey)+
        labs(x="round of first emergence", y="datapoint", 
              title=paste0("Multimodal emergence  \n(*n* = ",sim_N,")"))+
        scale_x_continuous(breaks=seq(1,6,1), expand=c(0,0.6))+
        scale_y_continuous(limits=c(1,75),breaks=seq(1,75,1))+
        expand_limits(x=c(1,6))+
        theme_light()+
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              axis.line.x = element_line(colour="grey", size=0.2), 
              axis.line.y=element_line(colour="grey", size=0.2), 
              axis.ticks=element_line(size=0.2),axis.ticks.length=unit(0.5, "mm"))+
        theme(axis.text.y= element_blank(), axis.title.y = element_blank())+
        theme(plot.margin = unit(c(1,0.5,-0.5,0), "lines"))+
        theme(plot.title = element_text(hjust = 0.5, size=7))+
        theme(axis.title.x=element_text(size=6), axis.text.x=element_text(size=6, colour="black"), title=element_text(size=6))+
  theme(plot.title = ggtext::element_markdown()) #necessary to get 'n' in italics in title

#GEST FIRST
df_gest_first <- df_gest_first[order(df_gest_first$gest_align_round_achieved, df_gest_first$lex_align_round_achieved),]
df_gest_first$datapoint_count <- c(1:nrow(df_gest_first))

p_gestfirst <- ggplot(data=df_gest_first, aes(x=gest_align_round_achieved_w_order, xend=lex_align_round_achieved_w_order, y=datapoint_count), na.rm=T) +
        #create a thick line between x and xend instead of using default provided by geom_dubbell
        geom_segment(aes(x=gest_align_round_achieved_w_order, 
                         xend=lex_align_round_achieved_w_order, 
                         y=datapoint_count, 
                         yend=datapoint_count), 
                     color="#c9c9c9", size=.02, alpha=.4)+
        geom_dumbbell(color="#c9c9c9", 
                      size=0.15,
                      size_x=0.15, 
                      size_xend = 0.15,
                      colour_x=orange, 
                      colour_xend = purple)+
        labs(x="round of first emergence", y="datapoint", 
              title=paste0("Gestural alignment first  \n(*n* = ",gestfirst_N,")"))+
        scale_x_continuous(breaks=seq(1,6,1), expand=c(0,0.6))+
        scale_y_continuous(limits=c(1,75),breaks=seq(1,42,1))+
        expand_limits(x=c(1,6))+
        theme_light()+
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              axis.line.x = element_line(colour="grey", size=0.2), 
              axis.line.y=element_line(colour="grey", size=0.2), 
              axis.ticks=element_line(size=0.2),axis.ticks.length=unit(0.5, "mm"))+
        theme(axis.text.y= element_blank(), axis.title.y = element_blank())+
        theme(plot.margin = unit(c(1,0.5,-0.5,0), "lines"))+
        theme(plot.title = element_text(hjust = 0.5, size=7))+
        theme(axis.title.x=element_text(size=6), axis.text.x=element_text(size=6, colour="black"), title=element_text(size=6))+
        theme(plot.title = ggtext::element_markdown())

# LEX FIRST
df_lex_first <- df_lex_first[order(df_lex_first$lex_align_round_achieved, df_lex_first$gest_align_round_achieved),]
df_lex_first$datapoint_count <- c(1:nrow(df_lex_first))

p_lexfirst <- ggplot(df_lex_first, aes(x=gest_align_round_achieved_w_order, xend=lex_align_round_achieved_w_order, y=datapoint_count), na.rm=T) +
        #create a thick line between x and xend instead of using default provided by geom_dubbell
        geom_segment(aes(x=gest_align_round_achieved_w_order, 
                         xend=lex_align_round_achieved_w_order, 
                         y=datapoint_count, 
                         yend=datapoint_count), 
                     color="#c9c9c9", size=.02, alpha=.4)+
        geom_dumbbell(color="#c9c9c9", 
                      size=0.15,
                      size_x=0.15, 
                      size_xend =0.15,
                      colour_x=orange, 
                      colour_xend = purple)+
                labs(x="round of first emergence", y="datapoint", 
             title=paste0("Lexical alignment first  \n(*n* = ",lexfirst_N,")"))+
        scale_x_continuous(breaks=seq(1,6,1), expand=c(0,0.6))+
        scale_y_continuous(limits=c(1,75),breaks=seq(1,31,1))+
        expand_limits(x=c(1,6))+
        theme_light()+
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              axis.line.x = element_line(colour="grey", size=0.2), 
              axis.line.y=element_line(colour="grey", size=0.2), 
              axis.ticks=element_line(size=0.2),axis.ticks.length=unit(0.5, "mm"))+
        theme(axis.text.y= element_blank(), axis.title.y = element_blank())+
        theme(plot.margin = unit(c(1,1,-0.5,0), "lines"))+
        theme(plot.title = element_text(hjust = 0.5, size=7))+
        theme(axis.title.x=element_text(size=6), axis.text.x=element_text(size=6, colour="black"), title=element_text(size=6))+
        theme(plot.title = ggtext::element_markdown()) #necessary to get 'n' in italics in title

To make the distributions of when alignment emerged first more comprehensible, we create density plots:

p_density_sim <- ggplot(data=df_sim, aes(x=lex_align_round_achieved, ..scaled..))+
  geom_density(color=lightgrey, fill=lightgrey, alpha=.8, size=.2) +
  scale_x_discrete(name ="round", limits=c("1","2","3","4","5","6"))+
  scale_y_reverse(limits=c(1,0))+
  theme_light()+
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              axis.line.y=element_line(colour="grey", size=0.2), axis.ticks=element_line(colour="grey", size=0.2),axis.ticks.length=unit(0.5, "mm"))+
  theme(axis.text.y= element_blank(), axis.title.y = element_blank())+
  theme(axis.text.x= element_blank(), axis.title.x = element_blank(), axis.line.x=element_blank(), axis.ticks.x=element_blank())+
  theme(plot.margin = unit(c(0,0.5,0,0), "lines"))

p_density_gestfirst <- ggplot(data=df_gest_first, aes(x=gest_align_round_achieved, ..scaled..))+
  geom_density(color=orange, fill=orange, alpha=.8, size=.2) +
  scale_x_discrete(name ="round", limits=c("1","2","3","4","5","6"))+
  scale_y_reverse(limits=c(1,0))+
  theme_light()+
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              axis.line.y=element_line(colour="grey", size=0.2), axis.ticks=element_line(colour="grey", size=0.2),axis.ticks.length=unit(0.5, "mm"))+
  theme(axis.text.y= element_blank(), axis.title.y = element_blank())+
  theme(axis.text.x= element_blank(), axis.title.x = element_blank(), axis.line.x=element_blank(), axis.ticks.x=element_blank())+
  theme(plot.margin = unit(c(0,0.5,0,0), "lines"))

p_density_lexfirst <- ggplot(data=df_lex_first, aes(x=lex_align_round_achieved, ..scaled..))+
  geom_density(color=purple, fill=purple, alpha=.8, size=.2) +
  scale_x_discrete(name ="round", limits=c("1","2","3","4","5","6"))+
  scale_y_reverse(limits=c(1,0))+
  theme_light()+
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),
              axis.line.y=element_line(colour="grey", size=0.2), axis.ticks=element_line(colour="grey", size=0.2),axis.ticks.length=unit(0.5, "mm"))+
  theme(axis.text.y= element_blank(), axis.title.y = element_blank())+
  theme(axis.text.x= element_blank(), axis.title.x = element_blank(), axis.line.x=element_blank(), axis.ticks.x=element_blank())+
  theme(plot.margin = unit(c(0,1,0,0), "lines"))

#plot_grid(p_density_sim, p_density_gestfirst, p_density_lexfirst, nrow=1)

Let’s put that all together into one plot (Figure 8 in ms)

#Fig8
p_legend <- get_legend(
  ggplot(data=df_lexandgest_freq, aes(x=1, y=freq, fill=classification)) +
  geom_bar(stat="identity")+
  theme(axis.line.y=element_blank(), axis.ticks.y=element_blank(), axis.text.y=element_blank())+
  scale_fill_manual(values=c(lightgrey, orange, purple), labels=c("multimodal alignment", "gestural alignment", "lexical alignment"))+
  theme(legend.key.size = unit(0.5,"line"))+
  theme(legend.key=element_blank(),legend.background=element_blank())+
  theme(legend.direction="vertical", legend.text=element_text(size=6), legend.title=element_blank())+
  theme(plot.margin = unit(c(0, 0, 0, 0), "lines"))
)

p_order_row1 <- plot_grid(stacked_lexandgestorder, p_sim, p_gestfirst, p_lexfirst, ncol=4, align="h", labels=c('A', 'B', 'C', 'D'), label_size=7)
p_order_row2 <- plot_grid(p_legend, p_density_sim, p_density_gestfirst, p_density_lexfirst, ncol=4, align="v", axis="b")
p_Fig8 <- plot_grid(p_order_row1, p_order_row2, nrow=2, align="hv", rel_heights=c(5.5,1))
p_Fig8

To investigate whether the distributions of the round in which alignment emerged varies across simultaneous vs. lex_first vs. gest_first emergence, we use 2-sided Kolmogorov-Smirnov tests:

sim <- df_sim$lex_align_round_achieved
lexfirst <- df_lex_first$lex_align_round_achieved
gestfirst <- df_gest_first$gest_align_round_achieved

ks.test(lexfirst, gestfirst)
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  lexfirst and gestfirst
## D = 0.096774, p-value = 0.9962
## alternative hypothesis: two-sided
ks.test(sim, lexfirst)
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  sim and lexfirst
## D = 0.21376, p-value = 0.2688
## alternative hypothesis: two-sided
ks.test(sim, gestfirst)
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  sim and gestfirst
## D = 0.25524, p-value = 0.05992
## alternative hypothesis: two-sided
ks.boot(lexfirst, gestfirst)
## $ks.boot.pvalue
## [1] 0.424
## 
## $ks
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  Tr and Co
## D = 0.096774, p-value = 0.9962
## alternative hypothesis: two-sided
## 
## 
## $nboots
## [1] 1000
## 
## attr(,"class")
## [1] "ks.boot"
ks.test(sim, lexfirst)
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  sim and lexfirst
## D = 0.21376, p-value = 0.2688
## alternative hypothesis: two-sided
ks.test(sim, gestfirst)
## 
##  Two-sample Kolmogorov-Smirnov test
## 
## data:  sim and gestfirst
## D = 0.25524, p-value = 0.05992
## alternative hypothesis: two-sided

This shows that the distributions are not significantly different.

#add some white space to left and right of Fig6 (boxplot), to still save with width=143 mm
p_Fig6_centered <- plot_grid(NULL, p_Fig6, NULL, nrow=1, rel_widths=c(0.8,1.7,0.8))


ggsave(p_Fig6_centered, file="2_figures/Fig6.png", bg="white", width=143, height=55, units="mm", dpi=600)
ggsave(p_Fig6_centered, file="2_figures/Fig6.pdf", bg="white", width=143, height=55, units="mm")

ggsave(p_Fig7, file="2_figures/Fig7.png", bg="white", width=143, height=55, units="mm", dpi=600)
ggsave(p_Fig7, file="2_figures/Fig7.pdf", bg="white", width=143, height=55, units="mm")

ggsave(p_Fig8, file="2_figures/Fig8.png", bg="white", width=143, height=75,units="mm", dpi=600)
ggsave(p_Fig8, file="2_figures/Fig8.pdf", bg="white", width=143, height=75,units="mm")