What is a Data Mart?
A Data Mart is the access layer of the Neuroinformatics Platform that is used to provide users with data. It is a relational database that stores data in columns and rows, making it easy to organize and access.
The key objective of the Data Mart is to give users direct access to specific data relevant to their project or group.
Requesting a new Data Mart
A project request form must be completed for any new study. In this form, the project team can select the desired tools for data collection and management (REDCap, XNAT, LabKey, etc.)
Additionally, in the same form, the project team can request a Data Mart, which will contain integrated data from all data collection tools used in this project. The data will be stored in a relational database (PostgreSQL). Access is granted to the project lead indicated in the project request form. Any additional access is requested by submitting a user access request form.
Project requests are processed by the KCNI OPS team. Your project will be created in each of the selected systems, and access will be granted accordingly.
Requesting access to an existing Data Mart
A user access request form must be completed for any new user wishing to obtain access to a project. Users can request access to the Data Mart, as well as to any of the data collection tools used in the project. The request form will be automatically routed to the project lead for approval.
Once the request is approved, the KCNI OPS team will grant the user project-level access in the selected systems. Data Mart connection parameters will be sent via email, and the user will be able to authenticate using CAMH credentials.
Data Mart structure
Data Marts are created in a relational database (PostgreSQL). Each Data Mart is a separate PostgreSQL schema, which allows it to have a distinct set of tables and permissions. The name of the schema corresponds to the Project ID as per the Neuroinformatics Platform convention.
The table names correspond to each data collection tool used in the project (e.g. redcap_data, xnat_data).
Connecting to the Data Mart
Users can connect to the Data Mart with a number of tools that support database connections, e.g. DBeaver, Tableau, Jupyter Notebook, and many more.
Connection information is sent to the user when their request gets approved and completed.
The user can then enter the connection information and their CAMH credentials into the tool of their choice and connect to explore the data, build dashboards, and perform analysis.
Connection using Python
import pandas as pd from sqlalchemy import create_engine from sqlalchemy.types import String, Date # read postgres engine postgres_engine_credentials = 'postgresql+psycopg2://[username]:[password]@[server name].camhres.ca/[db name]' # replace '[...]' with appropriate values # establish a postgres connection engine = create_engine(postgres_engine_credentials) # create a test query and submit against postgres test_query = "select * from [schema name].[table name]" # replace '[...]' with schema name and table names test_query_results_df = pd.read_sql(test_query, engine) # see results test_query_results_df
Connection using R
## Required packages are DBI and odbc, install using the following (required once): install.packages("DBI") install.packages("odbc") ##Load the packages library(DBI) library(odbc) ##Create the connection with the database con <- dbConnect(RPostgres::Postgres(), dbname = '[database name]', host = '[host address]', port = [port number], user = '[windows user name]', password = '[windows pw]') #Setup the query: Query = "SELECT * FROM [schema].[table]" #Send the query to the database res <- dbSendQuery(con, Query) #Fetch data coming out from that query in R Dataframe format df = dbFetch(res)
Connection using DBeaver
- Launch DBeaver and go to add connections and click on Postgres
2. Enter the Host, Database and Port information provided by the NIP team along with your Username and Password
3. Click on Test Connection to make sure a connection has been established and then press OK to close the settings.
Connection using Tableau
- Launch Tableau and go to Connect To a Server and enter the information provided by the NIP team below along with the Username and Password:
2. Once the connection is established, the available data tables will appear under the Database pane on the Data Source sheet:
3. Drag the table of interest from the data tables to access the data:
Backups and Replicas
The PostgreSQL relational database is backed up daily to the research storage.
Contact Information
For any questions, please contact researchit@camh.ca