John Ramey Statistics and Machine Learning

Installing Python Data Science Stack on Yosemite

I was attempting to install the Python data-science stack in a fresh virtual environment on my Mac running OS X 10.10.1 (Yosemite) but encountered various frustrating errors. Below, I have logged the steps that eventually yielded a successful installation.

MLB Rankings Using the Bradley-Terry Model

Today, I take my first shot at ranking Major League Baseball (MLB) teams. I see my efforts at prediction and ranking as an ongoing process in which my models improve, the data I incorporate become more meaningful, and ultimately my predictions become largely accurate. For the first attempt, let’s rank MLB teams using the Bradley-Terry (BT) model.
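
For readers unfamiliar with the BT model, here is a minimal sketch of the idea; it is not the code from the post. The model gives each team a positive strength, and the probability that team i beats team j is strength_i / (strength_i + strength_j). The sketch fits the strengths with the standard iterative (MM) update. The function name `fit_bradley_terry`, the teams A, B, and C, and the game results are all made up for illustration, and the sketch assumes every team has at least one win and one loss.

```python
from collections import defaultdict


def fit_bradley_terry(games, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper for illustration. Assumes every team has at
    least one win and one loss, so the estimates stay finite.
    """
    wins = defaultdict(int)         # total wins per team
    pair_counts = defaultdict(int)  # games played between each pair
    teams = set()
    for winner, loser in games:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        teams.update((winner, loser))

    # Start from equal strengths that sum to 1.
    strength = {team: 1.0 / len(teams) for team in teams}
    for _ in range(iterations):
        updated = {}
        for i in teams:
            # MM update: wins_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in teams
                if j != i
            )
            updated[i] = wins[i] / denom
        # Renormalize so the strengths sum to 1.
        total = sum(updated.values())
        strength = {team: s / total for team, s in updated.items()}
    return strength


# Made-up results: each tuple is (winner, loser).
games = (
    [("A", "B")] * 3 + [("B", "A")] * 1
    + [("B", "C")] * 2 + [("C", "B")] * 1
    + [("A", "C")] * 2 + [("C", "A")] * 1
)
strength = fit_bradley_terry(games)
ranking = sorted(strength, key=strength.get, reverse=True)
print(ranking)
```

Sorting teams by estimated strength gives the ranking. The strengths are only identified up to a common scale, which is why the sketch renormalizes them to sum to 1 after every pass.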

A Brief Look at Mixture Discriminant Analysis

Lately, I have been working with finite mixture models for my postdoctoral work on data-driven automated gating. Although I had barely scratched the surface with mixture models in the classroom, I am becoming increasingly comfortable with them. With this in mind, I wanted to explore their application to classification because there are times when a single class is clearly made up of multiple subclasses that are not necessarily adjacent.

High-Dimensional Microarray Data Sets in R for Machine Learning

Much of my research in machine learning is aimed at small-sample, high-dimensional bioinformatics data sets. For instance, here is a paper of mine on the topic.

How to Download Kaggle Data with Python and

Recently, I started playing with Kaggle. I quickly became frustrated that I had to use their website to download their data; I would rather download the data programmatically. After some Googling, the best recommendation I found was to use lynx. My friend Anthony suggested that I write a Python script instead.