01 Feb 2015
Apache Spark is a great tool for performing
large-scale data processing. Lately, I have begun working with PySpark, a
way of interfacing with Spark through Python. After a discussion with a
coworker, we were curious whether PySpark could run from within an IPython
Notebook. It turns out that this is fairly
straightforward to do by setting up an IPython profile.
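As a rough sketch of what a startup script in such a profile might do, assuming Spark is installed locally and the `SPARK_HOME` environment variable points at it: the helper name `pyspark_paths` is hypothetical, and the layout it expects (a `python` directory with a py4j source archive under `python/lib`) is an assumption about the Spark distribution, so adjust it to match your install.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the entries to add to sys.path so that `import pyspark` can work.

    Assumes <spark_home>/python holds the pyspark package and that a py4j
    source archive sits under <spark_home>/python/lib. The archive's exact
    file name varies between Spark releases, so it is matched with a glob.
    """
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


# Only touch sys.path when SPARK_HOME is actually set.
spark_home = os.environ.get("SPARK_HOME")
if spark_home:
    for path in pyspark_paths(spark_home):
        if path not in sys.path:
            sys.path.insert(0, path)
```

Placed in the profile's `startup` directory, a script along these lines runs each time a notebook kernel starts, so `import pyspark` has a chance of resolving without further setup.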
24 Jan 2015
I was attempting to install the Python data-science stack within a fresh virtual
environment on my Mac with OS X 10.10.1 (Yosemite) but encountered various
frustrating errors. I logged my steps below that eventually yielded a successful
31 Aug 2013
Today, I take my first shots at ranking Major League Baseball (MLB) teams. I see
my efforts at prediction and ranking as an ongoing process so that my models
improve, the data I incorporate are more meaningful, and ultimately my
predictions are largely accurate. For the first attempt, let’s rank MLB teams
using the Bradley-Terry (BT) model.
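To make the idea concrete, here is a minimal sketch of the BT model in plain Python. Each team i gets a positive strength s_i, and the probability that team i beats team j is s_i / (s_i + s_j). The fitting routine below uses a simple iterative (minorization-maximization) update, which is one common way to estimate the strengths and not necessarily the procedure used in the post. The function names and the win counts are made up for illustration; they are not real MLB results.

```python
def bt_win_prob(strength_i, strength_j):
    """Bradley-Terry probability that team i beats team j."""
    return strength_i / (strength_i + strength_j)


def fit_bradley_terry(wins, n_iter=200):
    """Estimate BT strengths from a square matrix of head-to-head wins.

    wins[i][j] is the number of times team i beat team j (diagonal is 0).
    Assumes every team has won and lost at least one game; otherwise the
    estimates are degenerate. Returned strengths are normalized to sum to 1.
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Games played against each opponent, weighted by current strengths.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        total = sum(updated)
        strengths = [s / total for s in updated]
    return strengths


# Toy head-to-head record for three hypothetical teams (illustrative only).
toy_wins = [
    [0, 3, 4],
    [1, 0, 3],
    [0, 1, 0],
]
toy_strengths = fit_bradley_terry(toy_wins)
# Rank teams from strongest to weakest by estimated strength.
toy_ranking = sorted(range(len(toy_strengths)), key=lambda i: -toy_strengths[i])
```

Sorting teams by their fitted strengths gives the ranking, and `bt_win_prob` turns any pair of strengths into a predicted head-to-head win probability.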
02 Jul 2013
Lately, I have been working with finite mixture models for my postdoctoral work
on data-driven automated gating.
Although I had barely scratched the surface of mixture models in the
classroom, I am becoming increasingly comfortable with them. With this in mind,
I wanted to explore their application to classification because there are times
when a single class is clearly made up of multiple subclasses that are not
29 Dec 2012
Much of my research in machine learning is aimed at small-sample, high-dimensional bioinformatics data sets. For instance, here is a paper of mine on the topic.