What is a Data Mart?

A Data Mart is the access layer of the Neuroinformatics Platform that is used to provide users with data. It is a relational database that stores data in columns and rows, making it easy to organize and access.

The key objective of the Data Mart is to give users direct access to specific data relevant to their project or group. 

Data Mart Data Flow


  • Data from source systems are merged into the data warehouse on a nightly basis
  • Data from the warehouse is then filtered and placed into individual data marts for study teams to access
  • the data from the data marts can be accessed using a variety of tools depending on the teams comfort; the user will likely need to be connected to CAMH VPN to access the tools/data
  • Finally, in some cases, study teams are allowed to write data back into their data marts
    • the data will not be merged with the data warehouse
    • it should be in tables that are not auto-created by the warehouse→ data mart process otherwise the data will be overwritten

Requesting a new Data Mart

A project request form must be completed for any new study. In this form, the project team can select the desired tools for data collection and management (REDCap, XNAT, LabKey, etc.)

Additionally, in the same form, the project team can request a Data Mart, which will contain integrated data from all data collection tools used in this project. The data will be stored in a relational database (PostgreSQL). Access is granted to the project lead indicated in the project request form. Any additional access is requested by submitting a user access request form. 

Project requests are processed by the KCNI OPS team. Your project will be created in each of the selected systems, and access will be granted accordingly.

Requesting access to an existing Data Mart 

A user access request form must be completed for any new user wishing to obtain access to a project. Users can request access to the Data Mart, as well as to any of the data collection tools used in the project. The request form will be automatically routed to the project lead for approval.

Once the request is approved, the KCNI OPS team will grant the user project-level access in the selected systems. Data Mart connection parameters will be sent via email, and the user will be able to authenticate using CAMH credentials.

Data Mart structure

Data Marts are created in a relational database (PostgreSQL). Each Data Mart is a separate PostgreSQL schema, which allows it to have a distinct set of tables and permissions. The name of the schema corresponds to the Project ID as per the Neuroinformatics Platform convention.

The table names correspond to each data collection tool used in the project (e.g. redcap_data, xnat_data).

Connecting to the Data Mart

Users can connect to the Data Mart with a number of tools that support database connections, e.g. DBeaver, Tableau, Jupyter Notebook, and many more.

Connection information is sent to the user when their request gets approved and completed. 

The user can then enter the connection information and their CAMH credentials into the tool of their choice and connect to explore the data, build dashboards, and perform analysis.

Connection using Python

import pandas as pd
  
from sqlalchemy import create_engine
from sqlalchemy.types import String, Date
  
# read postgres engine
postgres_engine_credentials = 'postgresql+psycopg2://[username]:[password]@[server name].camhres.ca/[db name]'  # replace '[...]' with appropriate values
  
# establish a postgres connection
engine = create_engine(postgres_engine_credentials)
  
# create a test query and submit against postgres
test_query = "select * from [schema name].[table name]" # replace '[...]' with schema name and table names
test_query_results_df = pd.read_sql(test_query, engine)
  
# see results
test_query_results_df

Connection using R

## Required packages are DBI and odbc, install using the following (required once):
 
install.packages("DBI")
 
install.packages("odbc")
 
 
 
##Load the packages
 
library(DBI)
 
library(odbc)
 
 
 
##Create the connection with the database
 
con <- dbConnect(RPostgres::Postgres(),
 
                 dbname = '[database name]',
 
                 host = '[host address]',
 
                 port = [port number],
 
                 user = '[windows user name]',
 
                 password = '[windows pw]')
 
 
 
#Setup the query:
 
Query = "SELECT * FROM [schema].[table]"
 
 
 
#Send the query to the database
 
res <- dbSendQuery(con, Query)
 
 
 
##Fetch data coming out from that query in R Dataframe format
 
df = dbFetch(res)

Connection using DBeaver


DBeaver Usage Note

To use DBeaver to access KCNI's postgres instances, you need to be using a CAMH laptop or be on the CAMH network.

  1. If you don't have DBeaver installed, the community edition can be installed from here: https://dbeaver.io/download/
  2. Launch DBeaver and go to add connections  and click on Postgres

       2. Enter the Host, Database and Port information provided by the NIP team along with your Username and Password

                                       

      3. Click on Test Connection to make sure a connection has been established and then press OK to close the settings.


Connection using Tableau


  1. Launch Tableau and go to Connect To a Server and enter the information provided by the NIP team below along with the Username and Password:


                                     


     2. Once the connection is established, the available data tables will appear under the Database pane on the Data Source sheet:

                                   

                                    

     3. Drag the table of interest from the data tables to access the data:

                                     

                                     

Backups and Replicas

The PostgreSQL relational database is backed up daily to the research storage. 

Contact Information

For any questions, please contact researchit@camh.ca

  • No labels