PsyLaP Pipeline


Description:

An independent multi-dimensional feature extraction pipeline that can be used for research projects that rely on unstructured textual data.

Purpose

Statistical analysis and machine learning (ML) methods cannot work directly on raw text or, more generally, on unstructured data. They need numbers, or at least structured categories, such as tweets grouped by topic.

I call the process of extracting structured information from unstructured data "feature extraction".

Benefits

The goal is to predict the mental state of users, patients, or other individuals from textual features.

How

The features cover different aspects of the text, such as its sentiment, the topic of the document, and the number of words that fall into several psycholinguistic categories.
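As a rough illustration of the word-counting features mentioned above, here is a minimal sketch using only the Python standard library. It is not the pipeline's actual implementation: the category names, the word lists in `LEXICON`, and the `extract_features` function are all hypothetical, and the `polarity` value is only a crude lexicon-based stand-in for a real sentiment model.

```python
import re
from collections import Counter

# Hypothetical toy lexicon. The categories and word lists are illustrative
# assumptions, not the psycholinguistic categories used by the pipeline.
LEXICON = {
    "negative_emotion": {"sad", "tired", "anxious", "hopeless"},
    "positive_emotion": {"happy", "calm", "rested", "hopeful"},
    "sleep": {"sleep", "slept", "nap", "awake", "insomnia"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text):
    """Turn one document into a flat dictionary of numeric features."""
    tokens = tokenize(text)
    counts = Counter(tokens)

    features = {"n_tokens": len(tokens)}
    # One feature per category: how many tokens belong to its word list.
    for category, words in LEXICON.items():
        features[category] = sum(counts[word] for word in words)

    # Crude polarity score in [-1, 1]: (positive - negative) / total tokens.
    if tokens:
        balance = features["positive_emotion"] - features["negative_emotion"]
        features["polarity"] = balance / len(tokens)
    else:
        features["polarity"] = 0.0
    return features


if __name__ == "__main__":
    print(extract_features("I slept badly and woke up tired and anxious."))
```

Each document becomes one row of numbers, which is the structured form that statistical and ML methods need.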

The next step is to relate these features to the labels of the dataset, which might be a value indicating the mental state of an individual.
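One simple way to look at such a relation is a correlation coefficient between a single feature and a numeric label. The sketch below computes a Pearson correlation with the standard library only. It is an assumption about how this step could be done, not a description of the pipeline's method; the `pearson` helper is hypothetical, and the numbers in the usage example are made-up toy values, not results from either dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    # The correlation is undefined when either sequence is constant.
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up toy values for illustration only: one feature value and one
    # label per document.
    feature_values = [0, 1, 2, 3]
    label_values = [8, 6, 4, 2]
    print(pearson(feature_values, label_values))
```

A correlation only captures linear association between one feature and one label; a predictive model over all features would be a separate step.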

Further reading

Additional information

I am currently using the pipeline on two datasets. The first contains the Google searches of people dealing with suicidal ideation. The second consists of morning and evening journals from the sleep health project, which looks for patterns in textual features to predict sleep quality and the other labels provided in the dataset.



Resource Stack

Resource | Location | Point of Contact
GDrive | https://drive.google.com/drive/folders/1QdrZPVTFj9lbHpg2Ga4whwiZBO73YSty?usp=sharing |
Project Tracking | https://github.com/aid4mh/sleepHealth_dataAnalysis/projects/1 |
Code Repo | https://github.com/aid4mh/sleepHealth_dataAnalysis |


Reference Papers: