Machine learning Archives - Ramhise

High-Dimensional Microarray Data Sets in R for Machine Learning

Much of my machine learning research involves small-sample, high-dimensional bioinformatics datasets, and a significant portion of my work focuses on new methodologies tailored to them. For example, I’ve published a paper on exactly this topic.

Many studies in the field of machine learning rely heavily on two prominent datasets: the Alon colon cancer dataset and the Golub leukemia dataset. Despite their popularity, both were introduced in papers published back in 1999, so new methods are routinely benchmarked on data that predates more than a decade of advances in data collection technology. Moreover, the Golub dataset isn’t ideal as a benchmark in the first place: its classes are so well separated that most methods achieve nearly perfect classification on it.

To address this gap, I embarked on a mission to discover alternative datasets that could serve as valuable resources for researchers like myself. What initially started as a small-scale project quickly evolved into something more substantial. As a result, I’ve curated a collection of datasets and packaged them conveniently for easy access and analysis. This effort culminated in the creation of the datamicroarray package, which is now available on my GitHub account.

Each dataset included in the package comes with a script for downloading, cleaning, and storing the data as a named list. For detailed instructions on data storage and usage, refer to the README file provided with the package. Currently, the datamicroarray package comprises 20 datasets specifically tailored for assessing machine learning algorithms and models in the context of small-sample, high-dimensional data.
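
For instance, here’s a minimal sketch of installing the package and inspecting one dataset; the repository name ramhiser/datamicroarray and the alon dataset name are assumptions on my part, so check the README for the exact calls:

bash
# install devtools, then the package from GitHub, then peek at the Alon colon cancer data
Rscript -e 'install.packages("devtools", repos = "https://cloud.r-project.org")'
Rscript -e 'devtools::install_github("ramhiser/datamicroarray")'
Rscript -e 'library(datamicroarray); data("alon", package = "datamicroarray"); str(alon, max.level = 1)'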

Additionally, I’ve supplemented the package with a comprehensive wiki hosted on the GitHub repository. This wiki serves as a valuable resource, offering detailed descriptions of each dataset along with additional information, including links to the original papers for reference.

One challenge I’ve encountered is the large file size of the R package, primarily due to storing an RData file for each dataset. To mitigate this issue, I’m actively exploring alternative approaches for dynamically downloading data. I welcome any suggestions or contributions from the community in this regard. Additionally, I must acknowledge that some data descriptions within the package are incomplete, and I would greatly appreciate assistance in enhancing them.

Researchers are encouraged to leverage any of the datasets provided in the datamicroarray package for their work. However, it’s essential to ensure proper data processing before conducting analysis and incorporating the results into research endeavors.

Installing TensorFlow on an AWS EC2 Instance with GPU Support

Here’s a guide on installing TensorFlow 0.6 on an Amazon EC2 Instance with GPU Support. Additionally, a Public AMI (ami-e191b38b) is provided with the configured setup for convenience.

Note: Updated on Jan 28, 2016, to reflect the requirement of Bazel 0.1.4 and to export environment variables in ~/.bashrc.

The installation includes:

  • Essentials
  • CUDA Toolkit 7.0
  • cuDNN 6.5 (v2)
  • Bazel 0.1.4 (requires Java 8)
  • TensorFlow 0.6

To begin, it’s recommended to request a spot instance to save costs. Launch a g2.2xlarge instance using the Ubuntu Server 14.04 LTS AMI.
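
If you’d rather script the request than click through the console, it looks roughly like the following sketch; the AMI ID, key pair name, and bid price are placeholders to replace with your own values:

bash
# ami-xxxxxxxx and my-key are hypothetical placeholders
aws ec2 request-spot-instances \
  --spot-price "0.25" \
  --instance-count 1 \
  --type "one-time" \
  --launch-specification '{"ImageId": "ami-xxxxxxxx", "InstanceType": "g2.2xlarge", "KeyName": "my-key"}'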

After instance launch, install essentials:

bash
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y build-essential git python-pip libfreetype6-dev libxft-dev libncurses-dev libopenblas-dev gfortran python-matplotlib libblas-dev liblapack-dev libatlas-base-dev python-dev python-pydot linux-headers-generic linux-image-extra-virtual python-numpy swig python-pandas python-sklearn unzip wget pkg-config zip g++ zlib1g-dev
sudo pip install -U pip

Next, install CUDA Toolkit 7.0:

bash
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1410/x86_64/cuda-repo-ubuntu1410_7.0-28_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1410_7.0-28_amd64.deb
rm cuda-repo-ubuntu1410_7.0-28_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda

Next, install cuDNN 6.5 (v2). NVIDIA requires a (free) registered developer account to download cuDNN, so fetch cudnn-6.5-linux-x64-v2.tgz from NVIDIA’s site in a browser and scp it to the instance first, then install:

bash
tar -zxf cudnn-6.5-linux-x64-v2.tgz && rm cudnn-6.5-linux-x64-v2.tgz
sudo cp -R cudnn-6.5-linux-x64-v2/lib* /usr/local/cuda/lib64/
sudo cp cudnn-6.5-linux-x64-v2/cudnn.h /usr/local/cuda/include/

Reboot the instance:

bash
sudo reboot
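
Once the instance is back up, it’s worth confirming the driver sees the GPU:

bash
# should list a single GRID K520
nvidia-smi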

Set environment variables:

bash
echo "export CUDA_HOME=/usr/local/cuda" >> ~/.bashrc
echo "export CUDA_ROOT=/usr/local/cuda" >> ~/.bashrc
echo "export PATH=$PATH:$CUDA_ROOT/bin" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64" >> ~/.bashrc
source ~/.bashrc
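
As a quick sanity check, nvcc should now be on the PATH and report release 7.0:

bash
nvcc --version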

Install Java 8 and Bazel 0.1.4:

bash
sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
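# note: the oracle-java8-installer will prompt you to accept Oracle's license agreement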
sudo apt-get install -y oracle-java8-installer
sudo apt-get install -y pkg-config zip g++ zlib1g-dev
wget https://github.com/bazelbuild/bazel/releases/download/0.1.4/bazel-0.1.4-installer-linux-x86_64.sh
chmod +x bazel-0.1.4-installer-linux-x86_64.sh
./bazel-0.1.4-installer-linux-x86_64.sh --user
rm bazel-0.1.4-installer-linux-x86_64.sh
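
One gotcha: the --user install places bazel in ~/bin, which isn’t necessarily on your PATH yet. Append it to ~/.bashrc as before, then verify the install:

bash
echo "export PATH=\$PATH:\$HOME/bin" >> ~/.bashrc
source ~/.bashrc
bazel version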

Clone TensorFlow repo:

bash
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
cd tensorflow

Build TensorFlow with GPU support:

bash
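# TF_UNOFFICIAL_SETTING=1 exposes unsupported options, such as building for compute capability 3.0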
TF_UNOFFICIAL_SETTING=1 ./configure

During configuration, enable GPU support and specify 3.0 when asked for the Cuda compute capability; the GRID K520 on the g2.2xlarge supports compute capability 3.0, which is why the unofficial-settings flag is required. Then, build TensorFlow and its pip package:

bash
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip install --upgrade /tmp/tensorflow_pkg/tensorflow-0.6.0-cp27-none-linux_x86_64.whl

Congratulations! TensorFlow is now installed with GPU support. Test the installation by running Python code that utilizes TensorFlow. You should see GPU-related messages indicating successful setup.
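
For example, a one-line smoke test (any small graph will do):

bash
# the startup log should mention a device like /gpu:0 backed by the GRID K520
python -c "import tensorflow as tf; print(tf.Session().run(tf.constant('Hello from the GPU build')))"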

This guide is a compilation of instructions from various sources, with credits and thanks to the original contributors. For more information and options, refer to TensorFlow’s official installation instructions.

Serverless API around Google Cloud Vision with the Serverless Framework

Recently, the Serverless Framework reached version 1.0 (beta) after about a year of development. This framework has rapidly matured, offering developers a way to build scalable applications without the hassle of managing servers. It simplifies deployment through:

  • An easy-to-use CLI tool
  • Customizable deployment via config files
  • Automation of tedious tasks
  • Extensibility through plugins

Although the Serverless Framework doesn’t yet support Google Cloud Functions, it’s designed to work with various event-driven compute services, including AWS Lambda and, eventually, Google Cloud Functions. If you’re unfamiliar with serverless computing, I recommend starting with Martin Fowler’s overview.

So, why use a framework instead of cobbling together a bunch of bash scripts? The answer is simple: the Serverless Framework handles AWS IAM Roles, streamlining the deployment process. Additionally, as we’ll see below, it simplifies the inclusion of Python dependencies along with Lambda functions.

I’ve been keen on building a serverless app and, combining that goal with the desire to make Google Cloud Vision more convenient to use, I developed a serverless API wrapper around Google Cloud Vision using AWS API Gateway and AWS Lambda. Despite concerns about integrating services from both Amazon and Google, the Serverless Framework ensured a seamless experience. While I focused on AWS Lambda for this project, I may explore Google’s offering once it matures.

What Does the App Do?

In a nutshell, I created a microservice via API Gateway that accepts an image URL and triggers a Lambda function. This function ingests the image from the URL and sends it to Google Cloud Vision for standard image recognition tasks (e.g., facial detection, OCR, etc.). It returns a JSON response, allowing me to generate a new image with bounding boxes around the detected faces (in this case, my son and me).

Beyond facial detection, Google Cloud Vision supports various image recognition tasks, including LABEL_DETECTION, TEXT_DETECTION, SAFE_SEARCH_DETECTION, FACE_DETECTION, LANDMARK_DETECTION, LOGO_DETECTION, and IMAGE_PROPERTIES.

How to Get Started?

Now, let’s walk through setting up and deploying the project in your own cloud environment.

Google Cloud Vision API

First, let’s set up the Google Cloud Vision API. To access the Cloud Vision API, you’ll need a Google Cloud Platform account. Luckily, Google offers a free 60-day trial with $300 credit.

After creating a Service Account Key, download the JSON file with your application credentials and save it as cloudvision/google-application-credentials.json inside the repository you’ll clone below.

AWS

We’re mixing cloud providers here, which might not sit well with everyone. However, while AWS lacks a polished API for image recognition, its cloud offerings are robust.

First, create an AWS account. Quick disclaimer: AWS isn’t free, but for our purposes, AWS Lambda is quite cost-effective.

Next, create a default AWS profile on your local machine. Install the aws-cli and run:

bash
aws configure
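
The prompts look like this; choosing us-east-1 matches the deployment output shown later:

plaintext
AWS Access Key ID [None]: AKIA...
AWS Secret Access Key [None]: ...
Default region name [None]: us-east-1
Default output format [None]: json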

For more details, Serverless provides an AWS overview along with a video walkthrough on YouTube.

Serverless Framework to Deploy the App on AWS

Once your AWS account is set up, ensure you have Node.js 4.0+ installed and install the Serverless Framework:

bash
npm install serverless -g

Clone the app repository:

bash
git clone git@github.com:ramhiser/serverless-cloud-vision.git
cd serverless-cloud-vision

One of the highlights of the Serverless Framework is its ability to install Python dependencies locally, ensuring they’re deployed along with the app. To achieve this, install the Python dependencies specified in requirements.txt to the cloudvision/vendored folder:

bash
pip install -t cloudvision/vendored/ -r requirements.txt

After installing the Python requirements, deploy the app to AWS:

bash
serverless deploy

This command creates IAM roles for Lambda and API Gateway (only done once), zips Python code and uploads it to S3, creates the Lambda function, and sets up the API Gateway endpoint that triggers the Lambda function.

Upon successful deployment, Serverless provides useful information, including the API endpoint you’ll need to use your microservice. For example:

plaintext
Service Information
service: cloudvision
stage: dev
region: us-east-1
endpoints:
  POST - https://some-api-gateway.execute-api.us-east-1.amazonaws.com/dev/detect_image
functions:
  lambda-cloudvision: arn:aws:lambda:us-east-1:1234567890:function:lambda-cloudvision

The endpoint provided by API Gateway is automatically generated by AWS and will differ in your implementation.
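
If you misplace this output later, the CLI can reprint it (this assumes the v1 CLI’s info command):

bash
serverless info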

Now you have a simple API for basic image recognition tasks. For instance, you can send an image URL of my son and me to the API with something like the following:

bash
# hypothetical image URL; the image_url payload key is assumed, so check the repo's README for the exact request format
curl -X POST https://some-api-gateway.execute-api.us-east-1.amazonaws.com/dev/detect_image \
  --data '{"image_url": "http://example.com/my-son-and-me.jpg"}'
