John Ramey: Statistics and Machine Learning


  • itertools2 - An R package that ports Python’s excellent itertools module to R for efficient looping. It is intended as a replacement for the existing itertools R package.

  • noncensus - An R package that provides regional information determined by the U.S. Census Bureau, along with demographic data.

  • clusteval - An R package that provides a suite of tools to evaluate clustering algorithms, clusterings, and individual clusters.

  • sparsediscrim - An R package that provides a collection of sparse and regularized classification models intended for small-sample, high-dimensional data sets. The models include the well-known Regularized Discriminant Analysis classifier and generalizations of the Naive Bayes classifier.

  • pocketknife - A collection of useful utility functions in R.

  • activelearning - An R package that implements active learning, a machine learning paradigm for optimally choosing unlabeled observations in a training data set to query for their true labels.

  • datamicroarray - An R package that provides a collection of scripts to download, process, and load small-sample, high-dimensional microarray data sets for assessing machine learning algorithms and models. For each data set, we include a small set of scripts that automatically download, clean, and save the data set. Additionally, the package’s wiki contains thorough descriptions of and additional information about each microarray data set. Most of the microarray data sets included in the package are cancer-related.

  • sortinghat - An R package that provides a variety of error-rate estimation methods for supervised classification. To assess classification performance, I have provided several widely known estimators, including random-split (Monte Carlo) cross-validation, cross-validation, bootstrap, .632, .632+, apparent, and bolstering/smoothed error rates.
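For reference, the .632 estimator listed above is usually defined (following Efron) as a weighted average of the apparent (resubstitution) error and the leave-one-out bootstrap error. The block below gives that textbook definition, not necessarily sortinghat's exact implementation. The notation is assumed for illustration: n training observations, B bootstrap samples, C_i the set of bootstrap samples that do not contain observation i, \hat f^{*b} the classifier trained on bootstrap sample b, and L the 0-1 loss.

```latex
% Leave-one-out bootstrap error: each observation is scored only by
% classifiers whose bootstrap sample did not include it.
\widehat{\mathrm{Err}}^{(1)} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{|C_i|}
  \sum_{b \in C_i} L\big(y_i, \hat{f}^{*b}(x_i)\big)

% .632 estimator: blend the optimistic apparent error \overline{\mathrm{err}}
% with the pessimistic leave-one-out bootstrap error.
\widehat{\mathrm{Err}}^{(.632)} = 0.368\,\overline{\mathrm{err}} + 0.632\,\widehat{\mathrm{Err}}^{(1)}
```

The weight 0.632 is approximately 1 - e^{-1}, the limiting probability that a given observation appears in a bootstrap sample of size n. The .632+ estimator adjusts this weight further when the classifier overfits heavily.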