After linking the daily DASA scores with our outcome data (restraints, incidents), we add the diagnosis data, now that it is ready. We link on encounter number, as there appears to be one diagnosis recorded per encounter (a single client can have several diagnoses across encounters). Fortunately, most encounters in the DASA data have an associated diagnosis:

Diagnosis available for DASA encounter?       N
Yes                                       19173
No                                          333
R code
##LINKING DASA + OUTCOMES TO DIAGNOSIS#####
list.files("/genome/scratch/Neuroinformatics/mmaslej2/Risk_Assessment_Dat")
diag <- read.csv("/genome/scratch/Neuroinformatics/mmaslej2/Risk_Assessment_Dat/diag_marta_09.15.2022.csv")
dasa <- read.csv("/genome/scratch/Neuroinformatics/mmaslej2/Risk_Assessment_Dat/dasa_inc_res.csv")

length(unique(diag$research_subject_id)) #33638 unique clients
length(unique(diag$encounter_number)) #for 114944 encounters
#so while we have multiple diagnoses for a single client, we have only a single diagnosis for each encounter, which we can use to link to DASA

diag_encs <- diag$encounter_number #so 114944 encounters for diagnosis
dasa_encs <- unique(dasa$EncounterNumber) #we have 19506 unique encounters for dasa

plyr::count(dasa_encs %in% diag_encs) #count() comes from the plyr package
#x  freq
#1 FALSE   333
#2  TRUE 19173
#this is great, looks like we have diagnoses for most encounters (just 333 are missing)

#we don't need all vars from diag so we retain encounter_number (which we use to link), Diagnosis_Final (the coded diagnosis), and substance_induced
diag <- diag[,c(3,5,6)]

#left join: keep every DASA row, add diagnosis where one exists
dasa_diag <- merge(dasa, diag, by.x="EncounterNumber", by.y="encounter_number", all.x=TRUE, all.y=FALSE)

colnames(dasa_diag)
#clean up vars
dasa_diag <- dasa_diag[,c(1,6:9,15:54)]

write.csv(dasa_diag, "dasa_diag.csv")
write.csv(dasa_diag, "/genome/scratch/Neuroinformatics/mmaslej2/Risk_Assessment_Dat/dasa_diag.csv")
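
The merge above relies on each encounter appearing only once in the diagnosis file; comparing counts of unique IDs does not by itself confirm this. Below is a minimal sketch of how that assumption and the join could be checked. The toy data frames and their values are hypothetical; only the column names follow the code above.

```r
# Toy stand-ins for diag and dasa (values are made up for illustration)
diag_toy <- data.frame(
  encounter_number  = c(101, 102, 103),
  Diagnosis_Final   = c("Anxiety disorder", "Depressive disorder", "Other"),
  substance_induced = c(FALSE, FALSE, TRUE)
)
dasa_toy <- data.frame(
  EncounterNumber  = c(101, 101, 102, 104),
  DASA.Total.Score = c(2, 1, 0, 3)
)

# 1) Is every encounter listed only once in the diagnosis data?
#    anyDuplicated() returns 0 when no value is repeated.
one_diag_per_enc <- anyDuplicated(diag_toy$encounter_number) == 0

# 2) Left join: keep every DASA row, add diagnosis where available
merged_toy <- merge(dasa_toy, diag_toy,
                    by.x = "EncounterNumber", by.y = "encounter_number",
                    all.x = TRUE, all.y = FALSE)

# 3) If (1) holds, the join cannot change the number of DASA rows
same_rows <- nrow(merged_toy) == nrow(dasa_toy)

# 4) DASA rows left without a diagnosis (encounter 104 in this toy data)
n_missing <- sum(is.na(merged_toy$Diagnosis_Final))
```

If `one_diag_per_enc` were FALSE on the real data, the merge would duplicate DASA rows for the affected encounters, so the row-count comparison in step 3 is a useful guard to run after linking.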


The resulting data (dasa_diag.csv) can be found on the Y Drive (Preliminary Modelling folder). The diagnosis information is as follows. Counts are per DASA row, so each encounter's diagnosis repeats once for every DASA recorded during that encounter:

Diagnosis_Final                             N
Anxiety disorder                         4692
Bipolar mood disorder                   28742
Depressive disorder                     21411
Neurocognitive disorders                 2987
Neurodevelopmental disorders             3311
Other                                    5011
Personality disorder                     7963
Primary psychotic disorder             106935
Remission                                   1
Substance-related disorder              17052
Trauma and stressor related disorder     3587
<NA>                                      908

Substance induced?                          N
True                                     6135
False                                  195557
NA                                        908
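
Because these counts are per DASA row, diagnoses attached to longer stays weigh more heavily. A per-encounter tabulation could look like the sketch below. It uses a small hypothetical data frame in place of dasa_diag and assumes the column names from the linking code.

```r
# Hypothetical stand-in for dasa_diag: several DASA rows per encounter
toy <- data.frame(
  EncounterNumber = c(1, 1, 1, 2, 2, 3),
  Diagnosis_Final = c("Anxiety disorder", "Anxiety disorder", "Anxiety disorder",
                      "Other", "Other", NA),
  stringsAsFactors = FALSE
)

# Counts per DASA row (the kind of count reported in the tables above)
per_row <- table(toy$Diagnosis_Final, useNA = "ifany")

# Keep the first row of each encounter, then count again
one_per_enc <- toy[!duplicated(toy$EncounterNumber), ]
per_encounter <- table(one_per_enc$Diagnosis_Final, useNA = "ifany")
```

In this toy example the anxiety encounter contributes three rows to `per_row` but only one to `per_encounter`.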


As a check, we can plot trajectories of DASA scores stratified by diagnosis.

R CODE (visualization)

library(ggplot2)
library(EnvStats) #provides stat_n_text()

#one line per encounter
p <- ggplot(data = dasa_diag, aes(x = Appt, y = DASA.Total.Score, group = EncounterNumber))

#fun replaces fun.y, which is deprecated as of ggplot2 3.3.0
p + labs(title="DASA score by diagnosis") + geom_line() +
  stat_smooth(method="loess", aes(group = 1)) +
  stat_summary(aes(group = 1), geom = "point", fun = mean, shape = 21, size = 3, color = "blue", fill = "grey") +
  facet_grid(. ~ Diagnosis_Final) + stat_n_text(size=2.75) +
  xlab("Day") + ylab("DASA score")

Preliminarily, it looks like DASA scores are lowest for anxiety/depressive disorders (which makes sense, as these illnesses may be considered less severe).
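
A numeric summary by diagnosis could complement the plot. The sketch below assumes the column names used in the plotting code and runs on a hypothetical toy data frame rather than dasa_diag; the scores are made up.

```r
# Hypothetical stand-in for dasa_diag
toy <- data.frame(
  Diagnosis_Final  = c("Anxiety disorder", "Anxiety disorder",
                       "Primary psychotic disorder", "Primary psychotic disorder"),
  DASA.Total.Score = c(0, 1, 2, 4)
)

# Mean DASA score per diagnosis
means <- aggregate(DASA.Total.Score ~ Diagnosis_Final, data = toy, FUN = mean)

# Number of DASA rows behind each mean
counts <- aggregate(DASA.Total.Score ~ Diagnosis_Final, data = toy, FUN = length)
```

Note that the formula interface of `aggregate()` drops rows with a missing diagnosis or score by default, so the 908 rows without a diagnosis would not appear in such a summary.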
