Learn various methods of analysis including: unsupervised clustering, gene-set enrichment analyses, Bayesian integration, network visualization, and supervised machine learning applications to LINCS data and other relevant Big Data from high-content molecular and phenotype profiling of human cells.
The Library of Integrative Network-based Cellular Signatures (LINCS) is an NIH Common Fund project that was recently expanded to its second phase. The idea is to perturb different types of human cells with many kinds of perturbations, such as drugs and other small molecules; genetic manipulations such as knockdown or overexpression of genes; and manipulation of the extracellular microenvironment, for example by growing cells on different surfaces, among others. These perturbations are applied to various types of human cells, including induced pluripotent stem cells from patients that have been differentiated into lineages such as neurons or cardiomyocytes. Then, to better understand the molecular networks affected by these perturbations, changes in the levels of many different variables are measured, including mRNAs, proteins, and metabolites, as well as cellular phenotypic changes such as changes in cell morphology. In most cases, this life sciences data is collected at the genome-wide scale across different regulatory layers.
The BD2K-LINCS Data Coordination and Integration Center (DCIC) is commissioned to organize, analyze, visualize and integrate LINCS data with other relevant publicly available resources. In this course we will introduce the various Centers that collect data for LINCS, describing the experimental procedures and the various data types. We will then cover the design and collection of metadata and how metadata is linked to ontologies, followed by basic data processing and normalization methods used to clean and harmonize LINCS data. This will be followed by a discussion of how the data is served through RESTful APIs and JSON, for which we will cover concepts from client-server computing. Most importantly, the course will focus on various bioinformatics methods of analysis including: unsupervised clustering, gene-set enrichment analyses, Bayesian integration, network visualization, and supervised machine learning applications to LINCS data and other relevant Big Data from molecular biomedicine. The course will be taught by members of the Ma'ayan Lab at the Icahn School of Medicine at Mount Sinai, the Medvedovic Lab at the University of Cincinnati, the Schurer Lab at the University of Miami, and other members of the BD2K-LINCS DCIC Team, as well as members of other NIH-funded BD2K centers.
- Overview of the NIH Common Fund LINCS Program
- Overview of the BD2K-LINCS Data Coordination and Integration Center
- Overview of the Data and Signature Generation Centers (experiments and data)
- Metadata and Ontologies
- Data Normalization
- Unsupervised Learning Methods: Data Clustering
- Supervised Learning Methods
- Enrichment Analyses
- Network Analysis and Network Visualization
- Serving data through RESTful APIs and JSON
- Interactive Data Visualization of LINCS Data
- Crowdsourcing Projects
Basic courses in statistics and molecular biology are useful but not required. Ability to write short scripts in languages such as Python would be useful but not necessary.
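To give a sense of the kind of short script meant here, the following is a minimal sketch of a gene-set over-representation test, one simple form of the enrichment analyses listed in the syllabus. It is an illustration only and is not taken from the course materials: the function name, the gene identifiers, and the background size are all hypothetical, and it assumes every gene in both lists belongs to the same background set.

```python
from math import comb


def enrichment_p_value(hits, gene_set, background_size):
    """Return (overlap, p), where p is the one-sided hypergeometric
    probability of seeing at least `overlap` genes shared between
    `hits` and `gene_set` by chance.

    Assumes both collections are drawn from one background of
    `background_size` genes (an assumption of this sketch).
    """
    hits = set(hits)
    gene_set = set(gene_set)
    overlap = len(hits & gene_set)
    n = len(hits)        # number of genes drawn (e.g. differentially expressed)
    k = len(gene_set)    # number of annotated genes in the background
    total = comb(background_size, n)
    # Sum the upper tail: P(X >= overlap)
    tail = sum(
        comb(k, i) * comb(background_size - k, n - i)
        for i in range(overlap, min(n, k) + 1)
    )
    return overlap, tail / total


# Hypothetical example: 3 perturbed genes, a 4-gene pathway, 20 genes measured.
perturbed = ["GENE_A", "GENE_B", "GENE_C"]
pathway = ["GENE_A", "GENE_B", "GENE_D", "GENE_E"]
overlap, p = enrichment_p_value(perturbed, pathway, background_size=20)
print(f"overlap={overlap}, p={p:.4f}")
```

A small p-value suggests that the overlap is larger than expected by chance. Real analyses would also correct for testing many gene sets at once, which this sketch leaves out.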
Review articles and selected original research articles will be discussed in the lectures and can enhance understanding, but these are not required to complete the course. All materials will be from open access journals or will be provided as links to e-reprints, so there will be no cost to the student.
The class will consist of video lectures, which are between 8 and 12 minutes in length. The course will be divided into segments where each segment will have a quiz and a homework assignment. For evaluation, students will be graded through their participation in the assignments and quiz completion.
Will I get a Statement of Accomplishment after completing this class?
Yes. Students who successfully complete the course will receive a Statement of Accomplishment signed by the Course Director.
What are the pre-requisites for the class?
The course is designed to accommodate students from diverse backgrounds. Specifically, background in molecular biology, statistics, and computer science is most helpful, but such background is not assumed or required.
How difficult is the class?
The class can be easy if the student is only concerned with playing a relatively passive role. However, students are encouraged to engage in the course and take initiative and exercise their creativity. This may require more time and effort but would be more fun and rewarding.