1Professor,
High-performance computing (HPC) has become more essential in bioinformatics data processing as a result of new computational difficulties. Work is usually distributed over a cluster of computers that connect to a shared file system housed on a storage area network. The Message Passing Interface (MPI) and, more recently, Hadoop's MapReduce API have been used to achieve work parallelization. Cloud computing is another computer architecture/service model that is currently being investigated. In a nutshell, cloud computing is HPC with a web interface plus the flexibility to scale up and down quickly for on-demand usage. Remote clients upload potentially large data sets for analysis in the Hadoop framework or other parallelized environments running in the data center, with the server side deployed in data centers working on clusters. The present use of Hadoop, a toplevel Apache Software Foundation project, and related open source software projects in the bioinformatics field is discussed. The principles underlying Hadoop and the HBase project are explained, as well as the existing bioinformatics software that uses Hadoop. The emphasis is on next-generation sequencing, which is now the most popular application area.
API, Hadoop, HBase, Map Reduce, Pig