PsyLaP Pipeline

Description

A standalone, multi-dimensional feature extraction pipeline for research projects that rely on unstructured textual data.

Purpose

Statistical analysis and machine learning (ML) methods cannot work directly with raw text, or with unstructured data in general. They need numerical inputs, or at least structured categories (for example, tweets grouped by topic).

I call the process of extracting structured information from unstructured data feature extraction.
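As a minimal illustration of what feature extraction means here, the sketch below turns each document into a vector of word counts over a shared vocabulary (a bag-of-words representation). This is a generic textbook technique chosen for illustration, not necessarily the representation PsyLaP uses; the function names `tokenize` and `bag_of_words` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Turn each document into a vector of word counts over a shared, sorted vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # One column per vocabulary word; missing words count as 0.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


vocab, X = bag_of_words(["I slept well", "I did not sleep well at all"])
print(vocab)
print(X)
```

Each row of `X` is now a fixed-length numeric vector that standard statistical or ML tools can consume, which is exactly what raw text lacks.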

Benefits

The goal is to predict the mental state of users, patients, or other individuals from textual features.

How

The features cover different aspects of the text, including its sentiment, the topic of the document, and the number of words that fall into various psycholinguistic categories.
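A rough sketch of two of these feature types (psycholinguistic category counts and a lexicon-based sentiment score) might look like the following. The `CATEGORIES` lexicon is a hypothetical toy dictionary invented for this example; a real pipeline would load a validated psycholinguistic dictionary, and the simple "positive share minus negative share" sentiment score is an assumption, not the pipeline's actual sentiment model.

```python
import re

# Hypothetical toy lexicon for illustration only; a real pipeline would load
# a validated psycholinguistic dictionary instead.
CATEGORIES = {
    "positive_emotion": {"happy", "calm", "good", "rested"},
    "negative_emotion": {"sad", "tired", "anxious", "bad"},
    "sleep": {"sleep", "slept", "nap", "bed", "woke"},
    "first_person": {"i", "me", "my", "mine"},
}


def extract_features(text: str) -> dict[str, float]:
    """Return per-category word proportions, a naive sentiment score, and the word count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens) or 1  # avoid division by zero on empty text
    features = {
        name: sum(tok in words for tok in tokens) / n
        for name, words in CATEGORIES.items()
    }
    # Naive lexicon-based sentiment: share of positive minus share of negative words.
    features["sentiment"] = features["positive_emotion"] - features["negative_emotion"]
    features["word_count"] = float(len(tokens))
    return features


print(extract_features("I woke up tired and anxious, but my nap was good."))
```

Normalising counts by document length keeps short and long journal entries comparable; raw counts could be used instead if length itself is considered informative.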

The next step is to find how these features relate to the labels of each dataset, which may be values indicating an individual's mental state.
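One simple way to start relating features to labels is to compute the Pearson correlation between each feature column and the label, then rank features by the strength of that correlation. This is only an assumed first-pass analysis (the source does not specify the method), and the helper names `pearson_r` and `rank_features` as well as the toy values below are hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows: list[dict[str, float]], labels: list[float]) -> list[tuple[str, float]]:
    """Correlate every feature column with the label; strongest |r| first, NaN last."""
    results = [(name, pearson_r([row[name] for row in rows], labels)) for name in rows[0]]
    return sorted(
        results,
        key=lambda item: -1.0 if math.isnan(item[1]) else abs(item[1]),
        reverse=True,
    )


# Toy, invented values: one feature dict per journal entry and a
# hypothetical self-reported sleep-quality label for each entry.
rows = [
    {"sentiment": 0.2, "sleep": 0.0},
    {"sentiment": -0.1, "sleep": 0.1},
    {"sentiment": 0.3, "sleep": 0.05},
]
labels = [4.0, 2.0, 5.0]
print(rank_features(rows, labels))
```

Correlation only captures linear, one-feature-at-a-time relationships; a regression or classification model over all features would be the natural next step once promising features are identified.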

Further reading

Additional information

I am currently applying the pipeline to two datasets. The first contains the Google search histories of people experiencing suicidal ideation. The second consists of morning and evening journals from the sleep health project, which aims to find patterns in textual features that predict sleep quality and the other labels provided in the dataset.

Resource Stack

Resource	Location
Point of Contact	amir_kazemeinizadeh
GDrive	https://drive.google.com/drive/folders/1QdrZPVTFj9lbHpg2Ga4whwiZBO73YSty?usp=sharing
Project Tracking	https://github.com/aid4mh/sleepHealth_dataAnalysis/projects/1
Code Repo	https://github.com/aid4mh/sleepHealth_dataAnalysis

Reference Papers: