What?

System Environment

System Libraries

Bioinformatics Applications

What is OpenStack and private cloud computing?

xvfb

App Runtime Environment

virtualenv

Poorly Defined Boundaries

Cron Jobs

Galaxy App

Init Scripts

Proxy Webserver

API

Tool Wrappers

Workflow Engine

Job Scheduler

User Interface

Log Management

Display Application

Bio Data

A Different Paradigm

Common Scenario

Two people or teams need to be intimately familiar with Galaxy and must frequently communicate.

Application VMs (Web and Compute)

(Infrastructure Engineer?)

(Application Engineer?)

Admin

Developer

Replace e-mail with API calls, documentation with scripts.

DB Server

File Server

Repository

Cloud Infrastructure

(OpenStack)

App Server

Repository

DB Server

File Server

Galaxy is not the code in the repository; it is the whole stack on the application server.

(at least)

Private Cloud Computing is Coming

&

Reuse

Redistribute

Opportunity to reduce workload by building Galaxy from common community templates.

Businesses large and small are flocking to Amazon et al. because they are cheap.

There is no GalaxyAdmin walkthrough; each large Galaxy installation represents numerous innovations.

Big And Small

CloudBioLinux

System Environment

https://github.com/chapmanb/cloudbiolinux

"A fully automated infrastructure installs software and data, with packages specified in simple configuration files."

Don't waste effort installing software manually; automate it.

App Runtime Environment

Why?

Building more of the stack on community templates provides a way to share more innovations back with the community.

CloudMan

"CloudMan is a cloud manager that orchestrates all of the steps required to provision a complete compute cluster environment on a cloud infrastructure; subsequently, it allows one to manage the cluster, all through a web browser. "

Reasons research institutions might not immediately switch to Amazon

However...

Galaxy App

The rest of this presentation describes contributions I have made along these lines.

  • Storage Costs
  • High Utilization
  • Data Access Policies

Saving money, however, is not the only reason to employ cloud computing. As I will argue for the specific case of Galaxy, cloud computing can also help manage complexity.

Enter OpenStack

Deploy open source cloud infrastructure on your own hardware.

Galaxy

OpenStack

Python

Open Source

Vibrant Community

Why deploy Galaxy in a (private) cloud?

Production Instances

http://bit.ly/prodcloudman

Creating Cloud Images

  • Launch vanilla Ubuntu instance
  • Install CloudMan via CloudBioLinux
  • Create image from configured instance
  • Launch new instance from this image
  • CloudMan runs, configures Galaxy

User-Data

http://wiki.galaxyproject.org/CloudMan/UserData

A block of YAML text used to configure the VM at launch time.

CloudMan uses it to configure the virtual machine: Galaxy, nginx, NFS, and arbitrary other files.
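Because user-data is just YAML, a script can generate it and keep it under version control instead of editing it by hand. The sketch below is stdlib-only Python. It assumes flat keys whose values are scalars, lists of scalars, or lists of flat mappings, which covers the examples in this talk. A real deployment script would more likely call PyYAML's `yaml.safe_dump`. The function names are hypothetical.

```python
import json


def to_yaml_scalar(value):
    # JSON scalars (quoted strings, true/false, numbers, null) are also valid YAML scalars.
    return json.dumps(value)


def render_user_data(data):
    """Render a flat dict as CloudMan-style user-data YAML (a sketch, not CloudMan code)."""
    lines = []
    for key, value in data.items():
        if not isinstance(value, list):
            lines.append(f"{key}: {to_yaml_scalar(value)}")
            continue
        if not value:
            lines.append(f"{key}: []")
            continue
        lines.append(f"{key}:")
        for item in value:
            if isinstance(item, dict) and item:
                # The first key of a mapping item opens the "- " entry; the rest align under it.
                for index, (k, v) in enumerate(item.items()):
                    prefix = "- " if index == 0 else "  "
                    lines.append(f"{prefix}{k}: {to_yaml_scalar(v)}")
            else:
                lines.append(f"- {to_yaml_scalar(item)}")
    return "\n".join(lines) + "\n"


print(render_user_data({
    "services": [{"name": "Galaxy"}, {"name": "GalaxyReports"}],
    "galaxy_universe_require_login": True,
}))
```

Booleans come out as lowercase `true`. YAML 1.1 parsers such as PyYAML treat that the same as the `True` used on the slides.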

Configure with CloudMan

Full details @ ReadTheDocs

Configure CloudBioLinux

How?

External Authentication

Splitting Galaxy into Multiple Processes

...i.e., enable load balancing

galaxy_conf_dir: /mnt/galaxyTools/galaxy-central/conf.d
galaxy_universe_use_remote_user: True
galaxy_universe_remote_user_maildomain: <domain_name>
galaxy_universe_remote_user_logout_href: https://logout@<galaxy_url>/
galaxy_universe_require_login: True
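The `galaxy_universe_` keys above follow a naming convention. My understanding is that CloudMan strips the prefix and uses the remainder as an option in the `[app:main]` section of Galaxy's `universe_wsgi.ini`. Assuming that convention, here is a sketch that turns those keys into one file that could be dropped into the `galaxy_conf_dir` directory. The function names are hypothetical.

```python
import configparser
import io

PREFIX = "galaxy_universe_"  # assumed CloudMan naming convention


def universe_overrides(user_data):
    """Map galaxy_universe_<option> keys to plain Galaxy option names."""
    return {
        key[len(PREFIX):]: value
        for key, value in user_data.items()
        if key.startswith(PREFIX)
    }


def render_conf_d_file(options, section="app:main"):
    """Render options as an INI fragment for a conf.d-style directory."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser[section] = {name: str(value) for name, value in options.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


user_data = {
    "galaxy_conf_dir": "/mnt/galaxyTools/galaxy-central/conf.d",
    "galaxy_universe_use_remote_user": True,
    "galaxy_universe_require_login": True,
}
print(render_conf_d_file(universe_overrides(user_data)))
```

`galaxy_conf_dir` itself is not prefixed, so it is left out of the generated fragment.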

configure_multiple_galaxy_processes: True
web_thread_count: 2
handler_thread_count: 2
galaxy_conf_dir: /mnt/galaxyTools/galaxy-central/conf.d
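Load balancing means running several Galaxy web processes behind nginx, plus separate job handler processes. This sketch shows how `web_thread_count` and `handler_thread_count` might expand into `[server:...]` sections of `universe_wsgi.ini` and a matching nginx `upstream` block. The section names, the starting port 8080 and the upstream name are hypothetical, not necessarily what CloudMan generates.

```python
def galaxy_server_sections(web_count, handler_count, base_port=8080):
    """Assign consecutive ports: web processes first, then job handlers (hypothetical layout)."""
    sections = {}
    port = base_port
    for role, count in (("web", web_count), ("handler", handler_count)):
        for index in range(count):
            sections[f"server:{role}{index}"] = port
            port += 1
    return sections


def render_server_ini(sections):
    """Emit one Paste HTTP server section per process."""
    lines = []
    for name, port in sections.items():
        lines += [f"[{name}]", "use = egg:Paste#http", "host = 127.0.0.1", f"port = {port}", ""]
    return "\n".join(lines)


def render_nginx_upstream(sections, name="galaxy_app"):
    """Only web processes serve HTTP traffic, so only they join the upstream pool."""
    servers = [
        f"    server 127.0.0.1:{port};"
        for section, port in sections.items()
        if section.startswith("server:web")
    ]
    return "\n".join([f"upstream {name} {{", *servers, "}"])


sections = galaxy_server_sections(web_count=2, handler_count=2)
print(render_server_ini(sections))
print(render_nginx_upstream(sections))
```

The nginx `location /` block would then `proxy_pass http://galaxy_app;`, and nginx spreads requests across the web processes.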

User-Data

user-data

CloudBioLinux must compile nginx with LDAP support

nginx.conf

fabricrc

nginx_enable_module_ldap = true
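A `fabricrc` file is a plain list of `key = value` lines in Fabric's settings-file format. This sketch reads one to confirm the LDAP module is enabled before nginx is built. Treating `true`, `yes` and `1` as enabled is an assumption about how the flag is interpreted, not documented CloudBioLinux behavior. The function names are hypothetical.

```python
def parse_fabricrc(text):
    """Parse Fabric-style settings: 'key = value' lines; '#' comments and blank lines ignored."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def is_enabled(settings, key):
    """Assumed truthy spellings; anything else, or a missing key, counts as disabled."""
    return settings.get(key, "").lower() in {"true", "yes", "1"}


fabricrc = """
# nginx build options
nginx_enable_module_ldap = true
"""
settings = parse_fabricrc(fabricrc)
print(is_enabled(settings, "nginx_enable_module_ldap"))
```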

galaxy_conf_dir

https://bitbucket.org/galaxy/galaxy-central/pull-request/44/

Very useful in non-cloud contexts as well. Allows universe_wsgi.ini to be split into a directory of files (à la /etc/sudoers.d or /etc/apache/conf.d).
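Conceptually, the files in a `galaxy_conf_dir` are merged into one configuration. Here is a minimal sketch of that merge with Python's `configparser`. It assumes files are read in lexical order, so later files override earlier ones, which is the usual `conf.d` convention. Galaxy's actual loader may differ in details such as which file extensions it reads.

```python
import configparser
from pathlib import Path


def load_conf_dir(conf_dir):
    """Merge all *.ini files in conf_dir; a later file (by name) overrides an earlier one."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve option-name case
    paths = sorted(Path(conf_dir).glob("*.ini"))
    parser.read(paths)  # files are applied in list order
    return parser
```

For example, the repository could ship a hypothetical `00_base.ini` while the admin's runtime environment adds `99_local.ini`. Only the options set in `99_local.ini` change; everything else keeps its repository value.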

location / {
    ...
    location /admin/jobs {
        proxy_pass http://localhost:8079;
    }
}

Benefits

Setup nginx.conf

Allows some properties to be set in the repository and others in the runtime environment.

Easier for configuration management tools such as Puppet or Chef to work with.

Separate development/production properties and/or developer/admin properties.

  • Specify LDAP connection
  • Modify root location to require authorization
  • Create /api location where auth is optional.

Hack to fix admin panel.

http://bit.ly/prodcloudman-auth
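Why does the nested `/admin/jobs` block take effect? For plain prefix locations, nginx picks the longest prefix that matches the request URI. Below is a small Python model of that rule using the locations from this setup. It deliberately ignores regex locations and the `=` and `^~` modifiers. The function name is hypothetical.

```python
def select_location(uri, prefixes):
    """Return the longest prefix location matching uri, or None if none match."""
    matches = [prefix for prefix in prefixes if uri.startswith(prefix)]
    return max(matches, key=len) if matches else None


locations = ["/", "/api", "/admin/jobs"]
for uri in ["/history/list", "/api/histories", "/admin/jobs", "/apiary"]:
    print(uri, "->", select_location(uri, locations))
```

Note that `/apiary` also lands in the `/api` location. If only API paths should skip authentication, `location /api/` is the stricter prefix.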

SSL

Galaxy Reports Application

conf_files:
  - path: /usr/nginx/conf/key
    content: <base64 encoding of key>
  - path: /usr/nginx/conf/cert
    content: <base64 encoding of cert>

A powerful tool that provides a wealth of valuable data on every job Galaxy has run, as well as disk usage accounting, etc.

user-data

Implemented a CloudMan "service" for this...

Configure arbitrary config files on the VM
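The `content` of each `conf_files` entry is base64-encoded. This sketch covers both halves of the process:

- encoding a local key or certificate into an entry for user-data;
- decoding entries back onto disk, which is roughly what the VM side has to do.

The function names, the `root` argument (for testing outside `/`) and the 0600 file mode are my assumptions, not CloudMan parameters.

```python
import base64
from pathlib import Path


def conf_file_entry(target_path, local_file):
    """Build one conf_files entry from a local file."""
    encoded = base64.b64encode(Path(local_file).read_bytes()).decode("ascii")
    return {"path": target_path, "content": encoded}


def write_conf_files(entries, root="/"):
    """Decode each entry and write it under root; files are made owner-only (0600)."""
    written = []
    for entry in entries:
        target = Path(root) / entry["path"].lstrip("/")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(base64.b64decode(entry["content"]))
        target.chmod(0o600)  # a private key should not be world-readable
        written.append(target)
    return written
```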

nginx.conf

server {
    listen 80;
    server_name galaxyp.msi.umn.edu;
    rewrite ^ https://$server_name$request_uri? permanent;
}

server {
    listen 443 default_server ssl;
    ssl_certificate /usr/nginx/conf/cert;
    ssl_certificate_key /usr/nginx/conf/key;
    ....
}

services:
  - name: Galaxy
  - name: GalaxyReports
  - name: Postgres

user-data

  • Load Balancing
  • External Authentication
  • SSL
  • Advanced Reporting
  • Utilize External Resources
      • Databases
      • File Servers
      • Compute

Mounting External File Systems

External Database Server

No Postgres service

master_prestart_commands:
  - "mkdir -p /mnt/galaxyData"
  - "mount -t nfs4 -o sec=sys spider.msi.umn.edu:/export/galaxyp /mnt/galaxyData/"
  - "mkdir -p /project/db"
  - "mount -t nfs4 -o ro buzzard.msi.umn.edu:/zprod2/misc/db /project/db/"
worker_prestart_commands:
  - "mkdir -p /mnt/galaxyData"
  - "mount -t nfs4 -o sec=sys spider.msi.umn.edu:/export/galaxyp /mnt/galaxyData/"
  - "mkdir -p /project/db"
  - "mount -t nfs4 -o ro buzzard.msi.umn.edu:/zprod2/misc/db /project/db/"
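The master and worker command lists are identical, so generating both from one list of exports keeps them from drifting apart. The sketch below reproduces these commands, minus the trailing slash on the mount points. The function names and the tuple layout are hypothetical.

```python
import json
import shlex


def nfs_mount_commands(mounts):
    """mounts: (export, mount_point, options) tuples -> mkdir + mount commands."""
    commands = []
    for export, mount_point, options in mounts:
        target = shlex.quote(mount_point)
        commands.append(f"mkdir -p {target}")
        commands.append(f"mount -t nfs4 -o {options} {export} {target}")
    return commands


def render_prestart_commands(commands):
    """Emit identical master and worker command lists as user-data YAML."""
    lines = []
    for key in ("master_prestart_commands", "worker_prestart_commands"):
        lines.append(f"{key}:")
        # json.dumps yields double-quoted strings, which YAML also accepts.
        lines.extend(f"- {json.dumps(command)}" for command in commands)
    return "\n".join(lines) + "\n"


mounts = [
    ("spider.msi.umn.edu:/export/galaxyp", "/mnt/galaxyData", "sec=sys"),
    ("buzzard.msi.umn.edu:/zprod2/misc/db", "/project/db", "ro"),
]
print(render_prestart_commands(nfs_mount_commands(mounts)))
```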

services:
  - name: Galaxy
  - name: GalaxyReports

galaxy_universe_database_connection: postgres://user:password@host:port/schema
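A typo in this URL only surfaces when Galaxy fails to start, so it is worth sanity-checking before launch. This sketch uses only the standard library. The host and credentials in the example are made up. The last path component is what the slide calls `schema`; for PostgreSQL it is the database name. Passwords containing characters such as `@` or `/` must be percent-encoded. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def describe_db_url(url):
    """Split a SQLAlchemy-style URL; .port raises ValueError if the port is not numeric."""
    parts = urlsplit(url)
    return {
        "dialect": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


print(describe_db_url("postgres://galaxy:secret@db.example.org:5432/galaxy"))
```

Run against the slide's placeholder URL, it raises `ValueError`, because `port` is not a number.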

user-data

Run arbitrary commands on master and worker nodes at startup...

Accessing External Compute Resources

... via the LWR

https://lwr.readthedocs.org/

How to build production Galaxy instances for the cloud.

https://bitbucket.org/jmchilton/lwr/

Run normal Galaxy jobs from normal tools on a remote server without requiring shared file systems.

Run jobs on *nix or Windows.

galaxy_tool_runner_proteinpilot: "lwr://https://<secretkey>@remotehost:8913/"

user-data

Any tool id

Any job runner URL, not just LWR URLs
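Each `galaxy_tool_runner_<tool id>` key presumably maps one tool id to a job runner URL, following the same prefix convention as `galaxy_universe_`. The sketch below extracts those mappings and splits an LWR URL into its secret key and remote endpoint. The parsing rules are my reading of the URL shape on the slide, not Galaxy's LWR runner code. The key `s3cret` and the function names are made up.

```python
from urllib.parse import urlsplit

RUNNER_PREFIX = "galaxy_tool_runner_"  # the tool id follows the prefix


def tool_runners(user_data):
    """Map tool ids to runner URLs from galaxy_tool_runner_<tool id> keys."""
    return {
        key[len(RUNNER_PREFIX):]: url
        for key, url in user_data.items()
        if key.startswith(RUNNER_PREFIX)
    }


def split_lwr_url(runner_url):
    """Split 'lwr://https://<key>@host:port/' into (secret key, endpoint URL)."""
    runner, sep, remote = runner_url.partition("://")
    if runner != "lwr" or not sep:
        raise ValueError(f"not an LWR runner URL: {runner_url!r}")
    parts = urlsplit(remote)
    host_and_port = parts.netloc.rpartition("@")[2]  # drop the secret key from the netloc
    return parts.username, f"{parts.scheme}://{host_and_port}{parts.path}"


runners = tool_runners({"galaxy_tool_runner_proteinpilot": "lwr://https://s3cret@remotehost:8913/"})
print(runners)
print(split_lwr_url(runners["proteinpilot"]))
```

Because any runner URL is accepted, `split_lwr_url` rejects non-LWR URLs rather than guessing at their structure.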

An ephemeral cloud VM submitting jobs to dedicated compute resources: a little backwards, but it works.

Deploying Production Galaxy Instances with CloudBioLinux and CloudMan

Thanks...

My Ongoing Work

Finer grain grid engine configuration

Related Work

Other Large CloudMan Deployments

Future Work

CloudBioLinux + Galaxy without CloudMan (formerly galaxy-vm-launcher)

CloudBioLinux Deployer

NBIC

http://galaxy.nbic.nl/

Pull Request Acceptors

http://www.nbic.nl/about-nbic/news-press/bioinformatics-news/detail/article/galaxy-on-cloud/

https://github.com/jmchilton/cloudbiolinux/tree/deploy

Enis Afgan, Ph.D.

https://bitbucket.org/galaxy/cloudman

Monitoring

Brad Chapman, Ph.D.

https://github.com/chapmanb/cloudbiolinux

How does one do Nagios in the cloud?

Nate Coraor and Dannon Baker

CloudBioLinux + Chef

https://bitbucket.org/galaxy/galaxy-central

"production"-y innovations by others

loggly.com Integration

https://github.com/jmchilton/cloudbiolinux/tree/chef

libcloud-based deployment scripts

Minnesota Supercomputing Institute Galaxy and Cloud Teams

https://bitbucket.org/galaxy/cloudman/pull-request/23/added-optional-loggly-based-cloud-logging/diff

Proteomics, Proteomics, Proteomics...

Jim Johnson; Pratik Jagtap, Ph.D.; Daniel Debertin; Kevin Silverstein, Ph.D.; Anne-Françoise Lamblin, Ph.D.; Benjamin Lynch, Ph.D.

noVNC for web-accessible console access

https://bitbucket.org/galaxyp/galaxyp-central

CloudMan+CloudBioLinux Alternatives

https://github.com/jmchilton/cloudbiolinux/tree/proteomics

https://github.com/jmchilton/proteomics-wine-env

Principal Investigator

Configured with Chef recipes; Globus Online integration.

Globus Provision

Timothy Griffin, Ph.D.

Deploy Galaxy from git

http://www.globus.org/provision/

John Chilton

Minnesota Supercomputing Institute

http://www.cse.buffalo.edu/faculty/tkosar/datacloud2012/papers/datacloud2012_paper_6.pdf

Funding

Ability to package tool shed installs

Good for cloud & traditional staging/production environments.

This work was funded by Minnesota Partnership for Biotechnology and Medical Genomics and the National Science Foundation.