Having linked the daily DASA scores with our outcome data (restraints, incidents), we now add the diagnosis data, which has become available. We link on encounter number, because each encounter appears to have a single recorded diagnosis (whereas a single client can have multiple diagnoses across encounters). Fortunately, most encounters in the DASA data (19173 of 19506, about 98%) have an associated diagnosis:
Diagnosis available for DASA encounter? | N |
---|---|
Yes | 19173 |
No | 333 |
```r
## LINKING DASA + OUTCOMES TO DIAGNOSIS ##
library(plyr)  # count() on a vector, which returns the x/freq table shown below

list.files("/genome/scratch/Neuroinformatics/mmaslej2/Risk_Assessment_Dat")
diag <- read.csv("/genome/scratch/Neuroinformatics/mmaslej2/Risk_Assessment_Dat/diag_marta_09.15.2022.csv")
dasa <- read.csv("/genome/scratch/Neuroinformatics/mmaslej2/Risk_Assessment_Dat/dasa_inc_res.csv")

length(unique(diag$research_subject_id))  # 33638 unique clients
length(unique(diag$encounter_number))     # for 114944 encounters
# so while we have multiple diagnoses for a single client, we have only a
# single diagnosis for each encounter, which we can use to link to DASA

diag_encs <- diag$encounter_number         # so 114944 encounters for diagnosis
dasa_encs <- unique(dasa$EncounterNumber)  # we have 19506 unique encounters for dasa

count(dasa_encs %in% diag_encs)
#       x  freq
# 1 FALSE   333
# 2  TRUE 19173
# this is great, looks like we have diagnoses for most encounters (just 333 are missing)

# we don't need all vars from diag, so we retain encounter_number (which we use
# to link), Diagnosis_Final (the coded diagnosis), and substance_induced
diag <- diag[, c(3, 5, 6)]

# left join: keep every DASA row, attach the diagnosis where one exists
dasa_diag <- merge(dasa, diag,
                   by.x = "EncounterNumber", by.y = "encounter_number",
                   all.x = TRUE, all.y = FALSE)

colnames(dasa_diag)
# clean up vars
dasa_diag <- dasa_diag[, c(1, 6:9, 15:54)]

write.csv(dasa_diag, "dasa_diag.csv")
write.csv(dasa_diag, "/genome/scratch/Neuroinformatics/mmaslej2/Risk_Assessment_Dat/dasa_diag.csv")
```
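The linking step relies on the encounter number being unique in the diagnosis table. A minimal sketch of how that assumption could be checked before and after a left join is shown here. The data frames `diag_toy` and `dasa_toy` and all their values are invented for illustration and are not taken from the real files.

```r
# Toy stand-ins for the real tables; every value here is invented for illustration.
diag_toy <- data.frame(
  encounter_number = c(101, 102, 103, 103),  # 103 appears twice on purpose
  Diagnosis_Final  = c("Anxiety disorder", "Other", "Depressive disorder", "Other"),
  stringsAsFactors = FALSE
)
dasa_toy <- data.frame(
  EncounterNumber  = c(101, 101, 102, 103, 104),
  DASA.Total.Score = c(0, 2, 1, 3, 0)
)

# Number of diagnosis rows whose encounter number was already seen.
# Zero would mean the key is unique.
n_dup_keys <- sum(duplicated(diag_toy$encounter_number))

# Left join, as in the linking step: keep all DASA rows.
merged <- merge(dasa_toy, diag_toy,
                by.x = "EncounterNumber", by.y = "encounter_number",
                all.x = TRUE, all.y = FALSE)

# With a unique key the join keeps exactly nrow(dasa_toy) rows;
# any extra rows mean a DASA record matched more than one diagnosis.
n_extra_rows <- nrow(merged) - nrow(dasa_toy)
```

In this toy case the duplicated encounter adds one extra row to the merged data. If the same two checks return zero on the real tables, the one-diagnosis-per-encounter assumption holds for the join.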
The resulting data (dasa_diag.csv) can be found on the Y Drive (Preliminary Modelling folder). The diagnosis information is summarized below. Counts are per DASA record rather than per encounter: because each encounter has multiple DASAs, its diagnosis is counted once for every DASA recorded during that encounter.
Diagnosis_Final | N |
---|---|
Anxiety disorder | 4692 |
Bipolar mood disorder | 28742 |
Depressive disorder | 21411 |
Neurocognitive disorders | 2987 |
Neurodevelopmental disorders | 3311 |
Other | 5011 |
Personality disorder | 7963 |
Primary psychotic disorder | 106935 |
Remission | 1 |
Substance-related disorder | 17052 |
Trauma and stressor related disorder | 3587 |
<NA> | 908 |
Substance induced? | N |
---|---|
True | 6135 |
False | 195557 |
NA | 908 |
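Because the counts above are per DASA record, diagnoses attached to long encounters weigh more heavily. A minimal sketch of how per-encounter counts could be derived is given here, keeping one row per encounter before tabulating. The data frame `dasa_diag_toy` and its values are invented for illustration; only the column names follow the merged data.

```r
# Toy stand-in for the merged data; values are invented for illustration.
dasa_diag_toy <- data.frame(
  EncounterNumber = c(1, 1, 1, 2, 2, 3),
  Diagnosis_Final = c("Anxiety disorder", "Anxiety disorder", "Anxiety disorder",
                      "Primary psychotic disorder", "Primary psychotic disorder", NA),
  stringsAsFactors = FALSE
)

# Counts per DASA record (the level at which the tables above are reported).
per_record <- table(dasa_diag_toy$Diagnosis_Final, useNA = "ifany")

# Keep the first row of each encounter, then count again.
one_per_enc   <- dasa_diag_toy[!duplicated(dasa_diag_toy$EncounterNumber), ]
per_encounter <- table(one_per_enc$Diagnosis_Final, useNA = "ifany")
```

Comparing `per_record` with `per_encounter` shows how much of a diagnosis group's size comes from the number of DASAs per encounter rather than from the number of encounters.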
As a check, we can plot trajectories of DASA scores stratified by diagnosis to examine whether they differ between diagnostic groups.
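R code (visualization):

```r
library(ggplot2)
library(EnvStats)  # stat_n_text() comes from EnvStats (assumed to be the source used here)

p <- ggplot(data = dasa_diag,
            aes(x = Appt, y = DASA.Total.Score, group = EncounterNumber))

p + labs(title = "DASA score by diagnosis") +
  geom_line() +                                    # one trajectory per encounter
  stat_smooth(method = "loess", aes(group = 1)) +  # overall smoothed trend per panel
  # mean score at each appointment; `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
  stat_summary(aes(group = 1), geom = "point", fun = mean,
               shape = 21, size = 3, color = "blue", fill = "grey") +
  facet_grid(. ~ Diagnosis_Final) +                # one panel per diagnosis
  stat_n_text(size = 2.75)                         # annotate sample sizes
```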
Preliminarily, it looks like DASA scores are lowest for anxiety/depressive disorders (which makes sense, as these illnesses may be considered less severe).