Linking DASA to diagnoses (v2 cohort)

After linking the daily DASA scores with our outcome data (restraints, incidents), we now add the diagnosis data, which has become available. We link on encounter number, since each encounter appears to have a single recorded diagnosis (a client with several encounters can therefore have several diagnoses). Most encounters in the DASA data have an associated diagnosis (19173 of 19506):

Diagnosis available for DASA encounter?	N
Yes	19173
No	333

R code

##LINKING DASA + OUTCOMES TO DIAGNOSIS#####
list.files("/genome/scratch/Neuroinformatics/mmaslej2/Risk_Assessment_Dat")
diag <- read.csv("/genome/scratch/Neuroinformatics/mmaslej2/Risk_Assessment_Dat/diag_marta_09.15.2022.csv")
dasa <- read.csv("/genome/scratch/Neuroinformatics/mmaslej2/Risk_Assessment_Dat/dasa_inc_res.csv")

length(unique(diag$research_subject_id)) #33638 unique clients
length(unique(diag$encounter_number)) #for 114944 encounters
#so while we have multiple diagnoses for a single client, we have only a single diagnosis for each encounter, which we can use to link to DASA

diag_encs <- diag$encounter_number #so 114944 encounters for diagnosis
dasa_encs <- unique(dasa$EncounterNumber) #we have 19506 unique encounters for dasa

library(plyr) #count() on a vector comes from plyr
count(dasa_encs %in% diag_encs)
#x  freq
#1 FALSE   333
#2  TRUE 19173
#this is great, looks like we have diagnoses for most encounters (just 333 are missing)

#we don't need all vars from diag so we retain encounter_number (which we use to link), Diagnosis_Final (the coded diagnosis), and substance_induced
diag <- diag[,c(3,5,6)]

#left join: keep every DASA row, add the diagnosis where one exists
dasa_diag <- merge(dasa, diag, by.x="EncounterNumber", by.y="encounter_number", all.x=TRUE, all.y=FALSE)

colnames(dasa_diag)
#clean up vars
dasa_diag <- dasa_diag[,c(1,6:9,15:54)]

write.csv(dasa_diag, "dasa_diag.csv")
write.csv(dasa_diag, "/genome/scratch/Neuroinformatics/mmaslej2/Risk_Assessment_Dat/dasa_diag.csv")
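
The one-diagnosis-per-encounter assumption matters for the merge above: if an encounter number repeated in the diagnosis table, a left merge would duplicate DASA rows. A minimal sketch of that check, using small invented data frames (the values and the names dasa_toy, diag_toy and merged_toy are made up for illustration; only the column names come from the code above):

```r
# Toy stand-ins for the real tables (values are invented)
dasa_toy <- data.frame(EncounterNumber = c(1, 1, 2, 3),
                       DASA.Total.Score = c(2, 1, 0, 4))
diag_toy <- data.frame(encounter_number = c(1, 2),
                       Diagnosis_Final = c("Anxiety disorder", "Depressive disorder"))

# 0 means no encounter appears more than once in the diagnosis table
n_dup <- anyDuplicated(diag_toy$encounter_number)

# Left join: keep every DASA row, attach a diagnosis where one exists
merged_toy <- merge(dasa_toy, diag_toy,
                    by.x = "EncounterNumber", by.y = "encounter_number",
                    all.x = TRUE, all.y = FALSE)

# With a unique key, the merge must not change the number of DASA rows
rows_preserved <- nrow(merged_toy) == nrow(dasa_toy)

# DASA rows left without a diagnosis (encounter 3 in this toy example)
n_unlinked <- sum(is.na(merged_toy$Diagnosis_Final))
```

On the real data, the same two checks would be anyDuplicated(diag$encounter_number) and a comparison of nrow(dasa_diag) with nrow(dasa).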

The resulting data (dasa_diag.csv) can be found on the Y Drive (Preliminary Modelling folder). The diagnosis information is as follows (repeating for each encounter as there are multiple DASAs for each encounter):

Diagnosis_Final	N
Anxiety disorder	4692
Bipolar mood disorder	28742
Depressive disorder	21411
Neurocognitive disorders	2987
Neurodevelopmental disorders	3311
Other	5011
Personality disorder	7963
Primary psychotic disorder	106935
Remission	1
Substance-related disorder	17052
Trauma and stressor related disorder	3587
<NA>	908

Substance induced?	N
True	6135
False	195557
NA	908
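
Because each encounter contributes one row per DASA, the counts above are row-level and weight longer stays more heavily than encounter-level counts would. A small sketch of the difference using base R's table() on invented data (the values are made up; the column names follow dasa_diag, and this is not necessarily how the tables above were produced):

```r
# Invented example: encounter 1 has three DASAs, encounter 3 has no diagnosis
toy <- data.frame(
  EncounterNumber = c(1, 1, 1, 2, 3, 3),
  Diagnosis_Final = c("Anxiety disorder", "Anxiety disorder", "Anxiety disorder",
                      "Depressive disorder", NA, NA)
)

# Row-level counts (one row per DASA), keeping missing diagnoses visible
row_counts <- table(toy$Diagnosis_Final, useNA = "ifany")

# Encounter-level counts: keep the first row of each encounter, then tabulate
per_enc <- toy[!duplicated(toy$EncounterNumber), ]
enc_counts <- table(per_enc$Diagnosis_Final, useNA = "ifany")
```

In this toy example anxiety accounts for three rows but only one encounter, which is the kind of gap to keep in mind when reading the row-level tables.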

As a check, we can plot the trajectories of DASA scores stratified by diagnosis.

R code (visualization)

library(ggplot2)
library(EnvStats) #for stat_n_text()

#one line per encounter, one panel per diagnosis
p <- ggplot(data = dasa_diag, aes(x = Appt, y = DASA.Total.Score, group = EncounterNumber))

p + labs(title="DASA score by diagnosis") +
  geom_line() +
  stat_smooth(method="loess", aes(group = 1)) + #smoothed trend within each panel
  stat_summary(aes(group = 1), geom = "point", fun = mean, shape = 21, size = 3, color = "blue", fill = "grey") + #mean per day; fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
  facet_grid(. ~ Diagnosis_Final) +
  stat_n_text(size=2.75) + #sample size labels
  xlab("Day") + ylab("DASA score")

Preliminarily, DASA scores appear lowest for anxiety and depressive disorders, which is plausible, as these illnesses may be considered less severe.
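
A visual impression like this can be cross-checked with a simple numeric summary, for example the mean DASA score per diagnosis. A sketch on invented data (the values and the names toy and mean_by_diag are made up; on the real data the same call would take dasa_diag as input):

```r
# Invented scores for two diagnosis groups
toy <- data.frame(
  Diagnosis_Final = c("Anxiety disorder", "Anxiety disorder",
                      "Primary psychotic disorder", "Primary psychotic disorder"),
  DASA.Total.Score = c(0, 1, 2, 4)
)

# Mean DASA score per diagnosis; with the formula interface,
# rows with a missing score or diagnosis are dropped by default
mean_by_diag <- aggregate(DASA.Total.Score ~ Diagnosis_Final, data = toy, FUN = mean)
```

These are row-level means, so encounters with more DASA days contribute more to each group's average.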
