BioSLAX
BioSLAX is a Live CD/Live DVD/Live USB comprising a suite of more than 300 bioinformatics tools and application suites. It has been released by the Bioinformatics Resource Unit of the Life Sciences Institute (LSI), National University of Singapore (NUS), is bootable from any PC that allows a CD/DVD or USB boot option, and runs Slax, a compressed Slackware flavour of the Linux operating system (OS). Slax was created by Tomáš Matějíček in the Czech Republic using the Linux Live Scripts, which he also developed. The BioSLAX derivative was created by Mark De Silva, Lim Kuan Siong and Tan Tin Wee.

BioSLAX was first released to the NUS Life Science Curriculum in April 2006.

History

In January 2003, APBioNet received a research grant from the Pan Asia Networking (PAN) Programme of IDRC (Canada) to build an APBioBox of commonly used bioinformatics applications and packages with grid-computing software, as part of its effort to build an APBioGrid. The platform chosen was the then-ubiquitous Red Hat Linux. In March of that same year, APBioNet launched an industry partnership scheme (AIPS) and partnered with Sun Microsystems to build BioBox for the Solaris platform. Six months later, beta versions of APBioBox and Sun's BioBox, by then called Bio-Cluster Grid, were released for testing among selected parties. The packages included Globus Grid Toolkit Version 2.0 and Sun Grid Engine respectively.

On 4 December 2003, the BioBox software packages, now named APBioBox (Red Hat Linux) and BioCluster Grid (Sun Solaris), were field-tested at a Bioinformatics Workshop conducted at the Advanced Science and Technology Institute (ASTI), Department of Science and Technology (DOST), Philippines, on the occasion of the 70th Anniversary of the National Research Council of the Philippines (NRCP). Ten Pentium machines and a couple of Sun servers were successfully inducted into the APBioGrid. The workshop and the software tested were sponsored by Sun Microsystems and partially funded by IDRC.

In July 2004, Dr Derek Kiong introduced Knoppix as a stable, powerful and small-footprint Unix (Debian-based) platform to A/Prof Tan Tin Wee in a workshop organised by the Institute of Systems Science (ISS), NUS. By September 2004, through Mr Ong Guan Sin, the team had created a Knoppix remaster template, building the software in APBioBox plus other useful applications into a prototype, APBioKnoppix, as a project for the practical course of the LSM2104 module of the Department of Biochemistry, NUS. It was subsequently upgraded based on Knoppix 4.02 and released as APBioKnoppix2. While APBioKnoppix was widely used, it was found not to be easily expandable. All applications had to be in place prior to remastering, and this made the distribution highly inflexible.

In June 2005, Mr. Mark De Silva of the Bioinformatics Resource Unit of the Life Sciences Institute (LSI) suggested using Slax as the base for a new bio-based live CD because of its modular system. The same base system could be reused, and tools or changes could be layered on top of it simply by adding single modules containing all the relevant application files or changes. This eliminated the need to remaster the entire system every time new software or changes emerged, as was the case with Knoppix.

By April 2006, the first version of BioSLAX was released with several editions:
  • Standard User Edition (530 MBytes)
  • Developer Edition (700 MBytes)
  • Server Edition (470 MBytes)


BioSLAX was subsequently used in the bioinformatics teaching module within NUS under the Life Science Curriculum as well as in several events that were organized under the umbrella of the Asia Pacific Bioinformatics Network (APBioNet). APBioNet is a regional affiliate of the International Society for Computational Biology (ISCB). Customized versions were built to cater for both NUS and APBioNet.

In August 2007, in collaboration with APBioNet, a customized BioSLAX was used to set up the Bioinformatics Resource Node of Vietnam at Bio-IBT, the Bioinformatics Resource Server of the Institute of Biotechnology, Vietnam Academy of Science and Technology, Hanoi, Vietnam. The Bio-IBT node offered:
  • BioMirrors repository of biological databases
  • NCBI BLAST mirrored resource
  • Web access to EBI EMBOSS applications
  • Web access to CLUSTALW multiple sequence alignment
  • Web access to the T-Coffee multiple sequence alignment
  • Web access to the PHYLIP Phylogenetic Inference Package
  • Web access to the Sequence Manipulation Suite, SMS2


Users with SSH access to the server also had access to many more command line based bio/life science applications.

The entire project was done in collaboration with the 1st UNESCO-IUBMB-FAOBMB-APBioNet Bioinformatics Workshop in Vietnam, held from 20 to 31 August 2007 as a satellite event of the 6th International Conference on Bioinformatics (InCoB) 2007, which took place in Hong Kong, Hanoi and Nansha.

Some versions of BioSLAX deployed in international institutions under APBioNet were fitted with a small tool that allowed them to map their IP addresses to a dynamically created apbionet.org domain name, giving each machine a fully qualified domain name (FQDN) and a presence on the Internet.

Modularity

Because Slax worked by overlaying "application modules" on top of the base Linux OS, the entire distribution was modular. The ability to deploy these modules while the system was already running made Slax even more appealing, and the inclusion of the GUI-based "BioSLAX Module Manager" made dynamically adding and removing modules easier still.

Users were able to test updates to software or new versions and "roll back" to previous versions if they wished. This was especially effective if Slax/BioSLAX was installed on a writable medium such as a USB drive.
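
The layering and rollback behaviour described above can be illustrated with a small model. The following Python sketch is only an illustration and is not Slax or BioSLAX code: the `ModuleStack` class, the module names, the file paths and the version strings are all hypothetical. It treats each module as a mapping from file path to content and resolves a path by taking the most recently added module that provides it, which is roughly how a union filesystem presents stacked layers.

```python
class ModuleStack:
    """Toy model of modules layered over a base system (hypothetical, for illustration)."""

    def __init__(self, base):
        # Each layer is a (name, files) pair; the base system is the bottom layer.
        self.layers = [("base", dict(base))]

    def add_module(self, name, files):
        # Adding a module puts its files on top of everything below it.
        self.layers.append((name, dict(files)))

    def remove_module(self, name):
        # Removing a module exposes whatever the lower layers provide ("rollback").
        if name == "base":
            raise ValueError("the base layer cannot be removed")
        remaining = [layer for layer in self.layers if layer[0] != name]
        if len(remaining) == len(self.layers):
            raise KeyError(name)
        self.layers = remaining

    def resolve(self, path):
        # Search from the top layer down; the first layer that has the path wins.
        for _name, files in reversed(self.layers):
            if path in files:
                return files[path]
        raise FileNotFoundError(path)


# Hypothetical file contents standing in for two versions of the same tool.
stack = ModuleStack({"/usr/bin/blastall": "old version"})
stack.add_module("blast-update", {"/usr/bin/blastall": "new version"})
updated = stack.resolve("/usr/bin/blastall")      # the newer module wins
stack.remove_module("blast-update")
rolled_back = stack.resolve("/usr/bin/blastall")  # back to the base version
```

In this model a rollback is nothing more than removing the top layer, which is why no remastering of the base system is needed.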

Versions

To date, there have been two versions of BioSLAX: BioSLAX 5.x, based on Slax 5, and BioSLAX 7.x, based on Slax 6. While BioSLAX 5.x followed the version numbers of Slax 5, BioSLAX 7 adopted a new numbering scheme that is one higher than the Slax version it is based on. The latest versions can be downloaded from the BioSLAX website.

BioSLAX 5.x

BioSLAX 5.x was largely based on the 5.1.8 version of Slax, running earlier versions of the 2.6 Linux kernel and KDE 3.4, with unionfs.
Standard User Edition

This edition runs the KDE X Window GUI and comes with all the tools and application suites, but includes neither compiler tools nor the Linux kernel source code and headers. It is mainly suited to users who only need to use the tools and application suites. Its small size makes it easy to download and particularly convenient for regions where internet bandwidth is an issue.
Developer Edition

This edition runs the KDE X Window GUI and comes with all the tools and application suites, plus a full set of development and compiler tools and the Linux kernel source code and headers. It is aimed at the power user who, in addition to using the various tools and applications, might also want to compile new applications or create new application modules for BioSLAX.
Server Edition

This edition does not include any X Window GUI, compilation tools, Linux kernel source or kernel headers. It is primarily meant to be used as a remote server, where users either SSH in to use the command line applications or connect to the server via the web to access the available web-based portals to popular bio applications.
NUS LSM Edition

This edition is the Developer Edition, customized for use by the NUS Life Science Curriculum for the teaching of bioinformatics.
Taverna Edition

This edition is the Developer Edition with Taverna included. The Taverna Project aims to provide a language and software tools to facilitate easy use of workflow and distributed compute technology.

BioSLAX 7.x

BioSLAX 7.x is based on Slax 6 and features later releases of the 2.6 Linux kernel, KDE 3.5, aufs and LZMA compression. The biggest change is that this version can be used as either a client or a server. The distribution was also moved from CD to DVD, allowing more applications to be included that had been left out of version 5.x because of space constraints. Slax 6 also introduced the ability to boot from a FAT- or EXT-formatted USB drive, so BioSLAX 7.x has this feature too; it enables persistent file storage, which is unavailable on CD/DVD because those media are not (re-)writable.

BioSLAX 8

Versions of BioSLAX after 7.x have been delayed because the developer of the base distribution (Slax), Tomáš Matějíček, held off on a new version, partly because of family commitments. His primary reason, however, was that he was waiting for SquashFS and LZMA to be integrated into the Linux kernel by default, so that users would no longer need to apply separate patches. As of kernel 2.6.38 the integration was finally done, and this has prompted Tomáš Matějíček to look at a new version of Slax, which will therefore result in a new version of BioSLAX in the coming months. His thoughts on the new version of Slax can be followed on his blog.

Standard Tools

BioSLAX features the Linux Slackware 12.1 operating system with updated drivers for various network adapters including support for a large variety of wireless cards. It also has many useful basic tools and applications such as:
  • Perl (including BioPerl modules)
  • PHP
  • Apache 2
  • MySQL
  • OpenOffice.org
  • KPDF Reader
  • Mozilla Firefox
  • Mozilla Thunderbird
  • gFTP
  • ProFTPD
  • OpenSSH
  • Kopete Instant Messenger
  • VNC Viewer
  • Remote Desktop Services

Bioinformatics Tools

The bioinformatics tools and applications are subdivided into three main categories.

Console Apps

  • BLAST
  • BlastCL3
  • BioGrep
  • ClustalW
  • EMBOSS
  • Genesplicer
  • GlimmerHMM
  • HMMER
  • Modeller
  • PAML
  • PHYLIP
  • Primer3
  • R programming language & Bioconductor
  • T-Coffee

Desktop Apps

  • ACT
  • Artemis
  • ClustalX (GUI-based ClustalW)
  • JAligner
  • Jalview
  • jEMBOSS (Java EMBOSS suite)
  • Jmol
  • NJPlot
  • PyMOL
  • ReadSeq
  • TreeView
  • Weka (machine learning)

Web Apps

  • Web BLAST
  • Web ClustalW
  • Web PHYLIP
  • Web T-Coffee
  • wEMBOSS (web-based EMBOSS suite)
  • Sequence Manipulation Suite (SMS)

Installing to Hard Disk

One of the more intriguing features of Slax-based distributions is how easily the live OS can be converted into a full-fledged Linux system installed on the hard drive of any PC, where it takes up roughly 3.5 GBytes of space.

A tool called the "BioSLAX Installer", written with the KDE Kommander toolkit, is provided so that users can easily convert their live OS to a full Linux installation. By using modules to customize the distribution and then running the installer, users can rapidly deploy fully installed, customized clients.
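
A pre-installation check along the following lines could confirm that a target filesystem has room for the roughly 3.5 GBytes mentioned above. This Python sketch is an assumption made purely for illustration; it is not part of the BioSLAX Installer, and the function name and default threshold are hypothetical.

```python
import shutil

# Roughly 3.5 GBytes, the installed size quoted in the text above.
REQUIRED_BYTES = int(3.5 * 1024**3)


def has_room_for_install(mount_point, required=REQUIRED_BYTES):
    """Return True if the filesystem at mount_point has at least `required` free bytes."""
    usage = shutil.disk_usage(mount_point)  # named tuple: total, used, free
    return usage.free >= required


# Example: check the root filesystem of the machine running the script.
enough = has_room_for_install("/")
```

The threshold is passed as a parameter so that a customized build with extra modules could be checked against a larger figure.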

BioSLAX Updates

BioSLAX will be updated as newer Slackware (or Slax) versions are released. The tools and application suites will also be monitored for significant changes and upgraded as necessary. Some tools may be removed to make way for others that do the same job with added functionality and better efficiency. More web-based portals are being considered; for example, portals to ReadSeq, Primer3 and Genesplicer are in the pipeline.

Grid Deployment

The developers were also looking at integrating various Grid computing platforms with BioSLAX. Because BioSLAX can be booted immediately from any CD/DVD/USB, it can be used as a rapidly deployable Grid-enabled operating system. One such Grid platform was the Univa Grid platform. In a talk given by Tan Tin Wee at GridAsia 2009, it was shown that the Univa Grid MP agent, once modularized on BioSLAX, can be used to Grid-enable machines in any location as slave nodes to a master node located elsewhere, effectively creating a "global-wide grid".

BioSLAX on the CLOUD

In a proof-of-concept endeavour, the developers successfully deployed BioSLAX as instances on a pool of resources using both VMware's ESXi and Citrix's Xen hypervisors. Their aim was to create a "BioSLAX CLOUD" in which students and staff could instantiate any number of BioSLAX servers dynamically for research and education. The instances could be used to conduct bioinformatics practical labs, with students connecting to the servers via suitable X Window clients such as X-Win32, VNC, Exceed and NoMachine NX, or, in conjunction with the UD Grid mpagent, to form a cluster for processing large jobs.

The proof-of-concept was deployed with great success for research and education in the Life Science Curriculum at NUS, and in 2011 a number of the BioSLAX cloud instances, on both VMware's vSphere and Citrix Xen servers, were used in the APBioNet project BioDB100. The backend controls and automation were created and implemented by Mr. Mark De Silva using the various APIs for vSphere and Xen.

Developers were also in talks with Amazon from 2009 to 2010 to deploy similar BioSLAX cloud images on Amazon's EC2, hoping to move some of their research and education machines over to Amazon and so cut hardware costs. Discussions fell through, however, when it became clear that Amazon was not going to support full hardware virtualization, which was required in order to run BioSLAX images on the cloud. Supporting only para-virtualization is, in fact, the stance of most commercial cloud providers using Citrix Xen hypervisors. Until that stance changes, only private clouds running Citrix Xen hypervisors configured for full hardware virtualization, or VMware vSphere clouds, will be capable of running BioSLAX.
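
On a Linux host, one precondition for full hardware virtualization is that the CPU advertises the Intel VT-x (`vmx`) or AMD-V (`svm`) flag in `/proc/cpuinfo`. The Python sketch below parses cpuinfo-style text for those flags. It is an illustration only: the function name is hypothetical, it says nothing about how any cloud provider configures its hypervisors, and a guest machine may not see the flags even when the underlying host has them.

```python
def hardware_virt_flags(cpuinfo_text):
    """Return the hardware-virtualization flags ('vmx', 'svm') found in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux the CPU feature list is on lines whose key is "flags".
        if sep and key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in ("vmx", "svm"))
    return found


# A shortened, made-up cpuinfo excerpt used as sample input.
sample = "processor\t: 0\nflags\t\t: fpu vme de pse vmx sse2\n"
flags = hardware_virt_flags(sample)
```

On a real machine the same function could be given the contents of `/proc/cpuinfo`; an empty result suggests that hardware-assisted virtualization is absent or disabled in firmware.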

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.