BMMB 852: Applied Bioinformatics (Fall, 2016)
Lectures will appear below as they are presented. Homework assignments are specified in each handout.
Course information, homework and project information, introduction to computing, setting up your computer,
basic unix command line usage, organizing your projects. Homework 1.
Data formats, data analysis concepts, basic Unix commands:
input and output streams, piping commands, processing a tabular file with UNIX tools. Homework 2.
What do words mean? Sequence Ontology and Gene Ontology, Gene Set Enrichment. Homework 3.
Accessing data from scientific publications, the GenBank format,
automated download of data from NCBI, installing and using Entrez Direct. Homework 4.
FASTA and FASTQ formats, Phred quality scores, encodings. Homework 5.
Handling compressed file archives, sequencing concepts,
FASTQ quality control, the fastqc tool. Homework 6.
Sequencing concepts, sequencing depth and coverages,
more details on the fastq quality control plots. Homework 7.
The short read archive, downloading data for the ebola project. Homework 8.
Quality control of sequencing data. Trimming reads,
removing adapters. Homework 9.
Automating tasks, writing scripts, bash concepts. Homework 10.
The basics of alignments, global, local and semiglobal alignments,
scoring matrices, pairwise alignments, EMBOSS. Homework 11.
Lecture 12: Sequence patterns
Advanced pattern matching. K-mers. Catching up with materials presented in
previous lectures: More on alignments, quality control and sequence adapters.
Lecture 13: Basic Local Alignment Search Tool, BLAST
Installing and Using BLAST, search strategies,
BLAST settings and configuration. Homework 13.
Lecture 14: BLAST Databases
Using BLAST. Interacting with BLAST databases. Homework 14.
Lecture 15: Short Read Aligners
Short read alignments: bwa, bowtie
Lecture 16: SAM Format
Sequence Alignment Maps: The SAM format
Lecture 17: Working with SAM/BAM files.
Working with SAM/BAM files, samtools
Lecture 18: Analyzing SAM files.
Analyzing SAM/BAM files with samtools
Lecture 19: Some programming required
Programming skills, writing simple scripts with AWK.
Lecture 20: Data Visualization
Genomic data visualization, IGV, IGB.
Lecture 21: Visualizing Genomic Variation
Visualize large scale genomic reorganization.
Get your pen and paper ready. There will be drawing involved...
Lecture 22: Genomic Variation
Genomic variation, pileups, definition of SNPs, SNVs and other
Lecture 23: The Variant Call Format
What is the variant call format, understand the fields and their
Lecture 24: Variant Calling in Practice
What makes variant calling difficult.
Lecture 25: Multi sample variant calling
Multi sample variant calling. Variant effect prediction.
Lecture 26: Introduction to RNA-Seq
Introduction to RNA-Seq analysis.
Lecture 27: Differential Expression with RNA-Seq
Differential expression with RNA-Seq data.
Lecture 27: Differential Expression with RNA-Seq
How to perform RNA-Seq data analysis.
Instructor: Istvan Albert
Course records: PSU ELion
Course registration: BMMB 852 - Applied Bioinformatics
The purpose of this course is to introduce students to the
various applications of high-throughput sequencing, including ChIP-Seq,
RNA-Seq, SNP calling, metagenomics, de novo assembly and others.
The course material will concentrate on presenting complete data analysis scenarios
for each of these domains of applications and will introduce students to a wide
variety of existing tools and techniques. We expect that by the end of the
coursework, students will:
- understand common bioinformatics data formats and standards
- become familiar with the practice of analyzing short-read sequencing data from various instruments:
  - Illumina HiSeq/MiSeq sequencers, PacBio* sequencer, MinION platforms
- develop the computationally oriented thinking that is necessary to take on large-scale data analysis projects
- understand the data analysis principles of methodologies such as:
  - short read and long read alignments
  - ChIP-Seq analysis and peak calling
  - interval query and manipulation
  - SNP calling and genomic variation detection
  - genome assembly with open source tools
  - metagenomics analysis
- filter, extract and combine data with scripting languages
- automate tasks with shell scripts to create reusable data pipelines
- plot and visualize results with R and other packages
Access to a Mac or Linux computer is necessary to perform the homework.
Only Mac OS X (Tiger/Leopard) and Linux operating systems are supported.
Note: Computers running the Windows operating system must have Linux installed
(unfortunately, due to the wide variety of Windows hardware, we are unable
to assist with this task).
Grading and Homework
This course will have a total of 30 homework assignments that are given out at the end of each lecture
and are due by the first lecture (Tuesday) each week.
The final grade will be an average of the grades obtained on the homework.
For more details please refer to the information presented during the first lecture.
We want to emphasize that the primary goal of this coursework is to improve
students' ability to handle and interpret data sets. Therefore, the evaluation process
is relative to each student's initial aptitude. We aim to focus on developing
permanent skills and talents that are not just immediately useful but
also provide the foundation for a further, more in-depth understanding of
informatics in general.
All Penn State policies regarding ethics and honorable
behavior apply to this course.