NC TraCS: Technical Resources

This page contains a number of educational resources (videos, slides, and Jupyter notebooks) on various parts of the data science lifecycle, geared towards medical research using OHDSI's OMOP Common Data Model. If there is a specific topic you would like covered, please submit a request.

Data Science Workflow

Video  Slides

Exploratory Data Analysis

Video   Notebook   R Source Code   Data

SQL Tutorials

Introduction to SQL   More Advanced Queries   Sample SQL

Statistical Concepts

In healthcare and biomedical research, statistical methods are essential for drawing reliable conclusions from data, assessing the effectiveness of treatments, and making informed clinical decisions. From hypothesis testing in clinical trials to evaluating predictive models in diagnostics, a solid understanding of key statistical principles enhances the ability to interpret findings and apply evidence-based practices. This tutorial series introduces fundamental statistical concepts essential for understanding and conducting research in healthcare.

Hypothesis Testing

Hypothesis testing is a statistical method used to evaluate assumptions about a population based on sample data. It plays a crucial role in clinical trials and medical research; for example, it can be used to assess whether a new treatment provides greater benefits than the current standard of care. A hypothesis is a statement about a measurable population parameter, such as the effect of smoking on specific health outcomes. Hypothesis testing then assesses whether the sample data provide sufficient evidence to reject that statement in favor of an alternative hypothesis.

In general, there are two hypotheses: a null hypothesis (H_0) and an alternative hypothesis (H_1 or H_a). The null hypothesis typically represents the assumption that there is no effect or difference. For example, a null hypothesis could be: “There is no difference in health outcomes between smokers and non-smokers.” The alternative hypothesis typically states the opposite, namely that there is an effect or difference. For example: “Smokers have worse health outcomes than non-smokers.”
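To make the two hypotheses concrete, here is a minimal sketch of a one-sided two-proportion z-test using only the Python standard library. It assumes a binary outcome (for example, whether a patient had an adverse event) and groups large enough for the normal approximation. The counts are hypothetical, not data from these tutorials, and `two_proportion_z_test` is an illustrative helper rather than part of the course code.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided z-test of H0: p1 = p2 against H1: p1 > p2.

    x1, x2: number of patients with the adverse outcome in each group
    n1, n2: group sizes
    Returns (z statistic, one-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)  # P(Z >= z) when H0 is true
    return z, p_value

# Hypothetical counts: 60 of 200 smokers vs 40 of 200 non-smokers with the outcome
z, p = two_proportion_z_test(60, 200, 40, 200)
alpha = 0.05
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p = {p:.4f}: {decision}")
```

Failing to reject H_0 does not show that the groups are the same; it only means these data are not strong enough evidence against H_0 at the chosen significance level.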

Our video tutorials present some key concepts in hypothesis testing:

Rejection Region

Learn what the rejection region is in hypothesis testing and how it is determined. This tutorial uses real-world examples to guide you through the calculations and statistical reasoning step by step.
Video   Slides   Python source code   R source code
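As a rough illustration of how a rejection region is constructed, the sketch below computes critical values for a z-test with a known population standard deviation. The function name `z_rejection_region` and the blood-pressure numbers are hypothetical and were chosen only for this example; the tutorial's own Python and R code may take a different approach.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # distribution of the z statistic when H0 is true

def z_rejection_region(alpha, alternative="two-sided"):
    """Return (critical value, rule) where rule(z) is True if z lies in the rejection region."""
    if alternative == "greater":  # H1: mu > mu0, reject in the upper tail
        crit = STD_NORMAL.inv_cdf(1 - alpha)
        return crit, lambda z: z >= crit
    if alternative == "less":  # H1: mu < mu0, reject in the lower tail
        crit = STD_NORMAL.inv_cdf(alpha)
        return crit, lambda z: z <= crit
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided: alpha split across both tails
    return crit, lambda z: abs(z) >= crit

# Hypothetical example: H0: mean systolic blood pressure = 120 mmHg, known sigma = 15,
# and a sample of n = 36 patients with mean 125 mmHg.
mu0, sigma, n, xbar = 120, 15, 36, 125
z = (xbar - mu0) / (sigma / n ** 0.5)  # = 2.0
crit, in_region = z_rejection_region(0.05)
print(f"reject H0 if |z| >= {crit:.3f}; observed z = {z:.2f}; reject: {in_region(z)}")
```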

Type I and Type II Errors

Learn about Type I and Type II errors in hypothesis testing, including their meanings, implications, and how they affect statistical decision-making.
Video  Slides   Python source code   R source code
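One way to see both error types is to compute the Type II error probability (beta) analytically for a one-sided z-test and to estimate the Type I error rate by simulating data for which H_0 is actually true. This sketch assumes normally distributed data with known sigma; the helper names and parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """beta for the one-sided z-test of H0: mu = mu0 vs H1: mu > mu0 when the true mean is mu1."""
    crit = STD_NORMAL.inv_cdf(1 - alpha)  # reject H0 when z >= crit
    shift = (mu1 - mu0) / (sigma / sqrt(n))  # mean of the z statistic when the true mean is mu1
    return STD_NORMAL.cdf(crit - shift)  # P(z < crit): failing to reject a false H0

def simulated_type_i_rate(mu0, sigma, n, alpha, trials=10_000, seed=1):
    """Fraction of samples simulated under H0 that the same test wrongly rejects."""
    rng = random.Random(seed)
    crit = STD_NORMAL.inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z >= crit:
            rejections += 1
    return rejections / trials

beta = type_ii_error(mu0=120, mu1=125, sigma=15, n=36, alpha=0.05)
type_i = simulated_type_i_rate(mu0=120, sigma=15, n=36, alpha=0.05)
print(f"Type II error (beta) = {beta:.3f}; simulated Type I error rate = {type_i:.3f} (alpha = 0.05)")
```

The simulated Type I error rate should land close to alpha, since alpha is by construction the probability of rejecting a true H_0.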

Power Function

Understand the power function in hypothesis testing and its relationship with sample size, significance level, effect size, and Type II errors.
Video  Slides   Python source code   R source code
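A minimal sketch of the power function for a one-sided z-test, assuming normal data with known sigma, together with the standard sample-size formula n = ((z_{1-alpha} + z_{1-beta}) * sigma / delta)^2, rounded up. The function names and numbers are illustrative and are not taken from the tutorial.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power(mu, mu0, sigma, n, alpha):
    """Power function of the one-sided z-test H0: mu = mu0 vs H1: mu > mu0.

    Returns P(reject H0) when the true mean is mu; at mu = mu0 this equals alpha,
    and for mu > mu0 it equals 1 - beta (one minus the Type II error probability).
    """
    crit = STD_NORMAL.inv_cdf(1 - alpha)
    shift = (mu - mu0) / (sigma / sqrt(n))
    return 1 - STD_NORMAL.cdf(crit - shift)

def required_n(delta, sigma, alpha, target_power):
    """Smallest n giving at least target_power against an effect of size delta = mu - mu0."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(target_power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Power grows with the effect size and with the sample size (hypothetical values)
for n in (16, 36, 64):
    print(n, [round(power(mu, 120, 15, n, 0.05), 3) for mu in (120, 122.5, 125, 127.5)])
print("n for 80% power to detect a 5 mmHg increase:", required_n(5, 15, 0.05, 0.80))
```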

Significance Level

Learn about the significance level (alpha) in hypothesis testing, its role in defining the rejection region, and its impact on Type I and Type II errors.
Video  Slides
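The effect of the significance level on the rejection region and on Type II errors can be sketched for a two-sided z-test as follows, assuming normal data with known sigma. The effect size, sigma, and n are hypothetical. Lowering alpha pushes the critical value outward, which reduces Type I errors at the cost of more Type II errors.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def alpha_tradeoff(alpha, effect, sigma, n):
    """For a two-sided z-test at level alpha, return (critical value, Type II error).

    effect is the true difference mu - mu0 under the alternative being considered.
    """
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # reject H0 when |z| >= crit
    shift = effect / (sigma / sqrt(n))  # mean of z under the alternative
    # beta = P(-crit < z < crit) when z ~ N(shift, 1)
    beta = STD_NORMAL.cdf(crit - shift) - STD_NORMAL.cdf(-crit - shift)
    return crit, beta

for alpha in (0.10, 0.05, 0.01):
    crit, beta = alpha_tradeoff(alpha, effect=5, sigma=15, n=36)
    print(f"alpha={alpha:.2f}  reject if |z| >= {crit:.3f}  Type II error={beta:.3f}")
```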

P-values

Learn about the p-value in hypothesis testing, how it helps determine statistical significance, and how it relates to the rejection region and the significance level.
Video  Slides   Python source code   R source code
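A short sketch of computing z-test p-values with Python's `statistics.NormalDist`, and of checking that the rule "reject when p <= alpha" gives the same decision as "reject when the statistic falls in the rejection region". The z values are arbitrary examples and `z_test_p_value` is a hypothetical helper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_test_p_value(z, alternative="two-sided"):
    """Probability under H0 of a z statistic at least as extreme as the observed one."""
    if alternative == "greater":
        return 1 - STD_NORMAL.cdf(z)
    if alternative == "less":
        return STD_NORMAL.cdf(z)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

alpha = 0.05
crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
for z in (1.5, 2.0, 2.8):
    p = z_test_p_value(z)
    # The p-value rule and the rejection-region rule lead to the same decision
    print(f"z={z}: p={p:.4f}, reject by p-value={p <= alpha}, in rejection region={abs(z) >= crit}")
```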

Confidence and Prediction Intervals

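In medical (or any) research, it is important to quantify the uncertainty around a point estimate. Confidence and prediction intervals are statistical tools for this purpose: a confidence interval expresses uncertainty about a population parameter such as a mean, while a prediction interval gives a range in which a new individual observation is likely to fall.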
  Python source code   R source code
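A minimal sketch of both intervals for a mean, assuming approximately normal data and using the normal quantile as a large-sample approximation. For small samples, a t quantile with n - 1 degrees of freedom (for example from `scipy.stats.t.ppf`) should replace z. The data are synthetic values generated for this example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

def mean_intervals(values, level=0.95):
    """Return (confidence interval for the mean, prediction interval for one new value)."""
    n = len(values)
    xbar, s = fmean(values), stdev(values)  # stdev uses the n - 1 denominator
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    ci_half = z * s / sqrt(n)  # uncertainty in the estimated mean only
    pi_half = z * s * sqrt(1 + 1 / n)  # adds the spread of a single new observation
    return (xbar - ci_half, xbar + ci_half), (xbar - pi_half, xbar + pi_half)

rng = random.Random(42)
sbp = [rng.gauss(125, 15) for _ in range(200)]  # synthetic systolic blood pressures (mmHg)
ci, pi = mean_intervals(sbp)
print(f"95% CI for the mean: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"95% prediction interval: ({pi[0]:.1f}, {pi[1]:.1f})")
```

The confidence interval shrinks as n grows, while the prediction interval cannot shrink below roughly plus or minus z times the population standard deviation, because individual patients still vary.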

Bias and Variance

Explore bias and variance in statistical estimation and the trade-off between them when optimizing predictive accuracy.
Video   Slides   R source code   Test Data   Train Data
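The decomposition MSE = bias^2 + variance can be sketched with a small Monte Carlo experiment. This example is an assumption, not the tutorial's own material: it compares two estimators of a normal variance. Dividing by n - 1 is unbiased, while dividing by n is biased downward but has lower variance, and for normal samples of size 10 it ends up with the lower mean squared error. Because both calls use the same seed, the estimators are evaluated on identical simulated samples.

```python
import random
from statistics import fmean, pvariance, variance

def bias_variance(estimator, true_value, n, trials=20_000, seed=0):
    """Monte Carlo bias, variance, and MSE of an estimator applied to N(0, 1) samples of size n."""
    rng = random.Random(seed)
    estimates = [estimator([rng.gauss(0, 1) for _ in range(n)]) for _ in range(trials)]
    mean_est = fmean(estimates)
    bias = mean_est - true_value
    var = pvariance(estimates, mu=mean_est)
    return bias, var, bias ** 2 + var  # MSE = bias^2 + variance

# Estimating sigma^2 = 1 from samples of size n = 10
unbiased = bias_variance(variance, 1.0, 10)  # divides by n - 1
biased = bias_variance(pvariance, 1.0, 10)  # divides by n
for name, (b, v, mse) in (("n - 1", unbiased), ("n", biased)):
    print(f"divide by {name}: bias={b:+.3f}  variance={v:.3f}  MSE={mse:.3f}")
```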

ROC Curves

In healthcare analytics, predictive models are used for tasks such as diagnosing diseases or predicting patient outcomes, and it is critical to evaluate how well these models perform using appropriate metrics. Commonly used metrics include the ROC (Receiver Operating Characteristic) curve and the AUC (Area Under the Curve), which measure a model’s ability to distinguish between positive and negative cases across classification thresholds. Precision, recall, and the F1 score assess classification performance at a single threshold: precision is the share of predicted positives that are truly positive (penalizing false positives), recall is the share of actual positives the model identifies (penalizing false negatives), and the F1 score is the harmonic mean of the two.
Video   Slides   R source code
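A dependency-free sketch of how ROC points, the AUC (via the trapezoidal rule), and precision, recall, and F1 at a single threshold can be computed. The labels and scores are made-up toy values, with 1 marking the positive class. In practice, scikit-learn's `sklearn.metrics.roc_curve`, `roc_auc_score`, and `precision_recall_fscore_support` provide tested implementations.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

def precision_recall_f1(y_true, scores, threshold):
    """Metrics when cases with score >= threshold are predicted positive."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0  # flagged cases that are truly positive
    recall = tp / (tp + fn) if tp + fn else 0.0  # positive cases that were flagged
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print("AUC:", auc(roc_points(y_true, scores)))
print("precision, recall, F1 at 0.5:", precision_recall_f1(y_true, scores, 0.5))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; moving the threshold trades precision against recall without changing the AUC.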

R-Squared and Adjusted R-Squared

When building predictive models for healthcare data, it is crucial to understand how well the model fits the data without overfitting or underfitting. This tutorial introduces R-Squared and Adjusted R-Squared and discusses the bias-variance trade-off.
Video  Slides   Python source code   R source code
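A minimal sketch of R-squared, R^2 = 1 - SS_res / SS_tot, and adjusted R-squared, 1 - (1 - R^2)(n - 1) / (n - p - 1) for a model with p predictors, applied to a simple least-squares line. The toy data and function names are hypothetical and were chosen so the arithmetic is easy to check by hand.

```python
from statistics import fmean

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values."""
    ybar = fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((a - ybar) ** 2 for a in y)  # total sum of squares
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalizes R^2 for the number of predictors p (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]
r2 = r_squared(y, y_hat)
print(f"fit: y = {b0:.2f} + {b1:.2f}x, R^2 = {r2:.3f}, adjusted R^2 = {adjusted_r_squared(r2, len(x), 1):.3f}")
```

Plain R-squared never decreases when a predictor is added, even a useless one; the adjusted version can decrease, which is why it is preferred when comparing models with different numbers of predictors.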

Note: All provided data is synthetic - no real patient data is available on this page.