PsyLaP Pipeline
Description:
An independent multi-dimensional feature extraction pipeline that can be used for research projects that rely on unstructured textual data.
Background/History:
“Personality is the characteristic sets of behaviors, cognitions, and emotional patterns that evolve from biological and environmental factors.” (Wikipedia)
Well-known personality tests such as MBTI and OCEAN assign a score to each aspect of a person's personality. The aim of my projects was to find patterns linking individuals' personalities to their writing. I used a specific type of writing called stream of consciousness (or expressive writing), in which you write whatever comes to mind without censoring or filtering anything. For more information: https://scholar.google.com/citations?user=XskUR5oAAAAJ&hl=en
Purpose
In statistical analysis and ML, text (and unstructured data in general) is hard to work with directly. These approaches require numeric input or at least structured categories (for example, tweets grouped by topic).
I call the process of extracting structured information from unstructured data feature extraction.
Benefits
The goal is to make predictions about the mental state of users, patients, or other individuals by using textual features.
How
The features cover different aspects of the text, such as sentiment, the topic of the document, and the number of words that fall into several psycholinguistic categories.
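As a rough illustration of the category-counting idea, here is a minimal Python sketch. It is not the pipeline's actual code: the `LEXICON` categories, the word lists, and the function names `tokenize` and `extract_features` are all hypothetical, and a real project would load a validated psycholinguistic dictionary instead of this toy one.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; these categories and
# word lists are assumptions, not the dictionary used by the pipeline.
LEXICON = {
    "positive_emotion": {"happy", "calm", "glad", "rested"},
    "negative_emotion": {"sad", "tired", "anxious", "afraid"},
    "sleep": {"sleep", "slept", "dream", "awake", "nap"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text):
    """Turn one document into a flat dict of numeric features."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    features = {"word_count": total}
    for category, words in LEXICON.items():
        hits = sum(counts[word] for word in words)
        # Share of tokens in this category (0.0 for an empty document).
        features[category] = hits / total if total else 0.0
    return features


features = extract_features("I slept badly and woke up tired and anxious.")
print(features)
```

Each document becomes one row of numbers, so a collection of journals or search queries can be arranged as a table that statistical or ML tools can consume.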
The next step is to find how these features relate to the labels of the dataset, which might be values indicating the mental state of an individual.
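One simple way to look at the relation between a single feature and a numeric label is a Pearson correlation. The sketch below is only an assumption about how such a check could look, not the analysis used in the projects. The helper `pearson` is hypothetical, and the feature values and sleep-quality scores are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant feature or label has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: one feature per journal entry and one label per entry.
negative_emotion = [0.05, 0.10, 0.20, 0.15]
sleep_quality = [8, 7, 4, 5]

r = pearson(negative_emotion, sleep_quality)
print(f"correlation: {r:.2f}")
```

A correlation like this only describes a linear association in the sample. It is a first screening step before fitting a predictive model, not evidence that a feature predicts the label.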
Further reading
Additional information
I am currently using the pipeline on two datasets. The first contains Google search data from people experiencing suicidal ideation. The second consists of morning and evening journals from the sleep health project, which uses textual features to look for patterns that predict sleep quality and the other labels provided in the dataset.
Resource Stack
Resource | Location |
---|---|
Point of Contact | |
GDrive | https://drive.google.com/drive/folders/1QdrZPVTFj9lbHpg2Ga4whwiZBO73YSty?usp=sharing |
Project Tracking | https://github.com/aid4mh/sleepHealth_dataAnalysis/projects/1 |
Code Repo | |