GENETAL: A New Technique for Feature Extraction from Large Set of Biological Sequences and Its Use in Classification

Ulavappa. B. Angadi; M. Venkatesulu

Year: 2010
Volume: 3
Issue: 1

Total Page Count: 13
Page Number: 49 to 61

Department of Computer Applications, Kalasalingam University, Krishnankoil, Srivilliputtur (via), Tamil Nadu, India, 626 190. E-mail: venkatesulu_m2000@yahoo.com

*Corresponding author E-mail: angadiub@gmail.com

Abstract

In bioinformatics, an enormous amount of biological data is accumulating as a result of genome sequencing projects all over the globe. The compelling need to transform this data into useful information and knowledge has become an important and challenging task for both computer scientists and biologists. One of the problems arising in the analysis of biological sequences is the discovery of similar motifs/features in a set of sequences. Such motifs usually correspond to residues conserved during evolution because of an important structural or functional role. In this paper, we develop a new algorithm, GENETAL, based on genetic theory, for the discovery of motifs/features in biological sequences and text documents. Our algorithm is able to produce all motifs that appear in at least a user-defined minimum number of sequences. It is very efficient compared with other existing algorithms on large data sets, with respect to both space and time complexity. We also demonstrate clustering of DNA/protein sequences and text documents, using GENETAL as the feature extraction algorithm together with a simple incremental clustering technique and a Jaccard coefficient dissimilarity measure.

Keywords

Motif discovery, Clustering, DNA/Protein sequences, Pattern recognition
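
The clustering step mentioned in the abstract (incremental clustering with a Jaccard coefficient dissimilarity) can be illustrated with a minimal sketch. This is not the GENETAL algorithm: the abstract does not describe its internals, so overlapping k-mers are used here as a stand-in feature set. The function names `kmer_features`, `jaccard_dissimilarity`, and `incremental_clustering`, the leader-based assignment rule, and the `threshold` parameter are all assumptions made for illustration.

```python
from typing import List, Set


def kmer_features(sequence: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers (stand-in features, not GENETAL motifs)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_dissimilarity(a: Set[str], b: Set[str]) -> float:
    """Compute 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def incremental_clustering(feature_sets: List[Set[str]],
                           threshold: float) -> List[List[int]]:
    """Single-pass clustering (assumed leader variant).

    Each item joins the closest cluster whose leader (first member) is
    within `threshold`; otherwise it starts a new cluster.
    """
    leaders: List[Set[str]] = []
    clusters: List[List[int]] = []
    for index, features in enumerate(feature_sets):
        best_cluster = None
        best_distance = threshold
        for cluster_id, leader in enumerate(leaders):
            distance = jaccard_dissimilarity(features, leader)
            if distance <= best_distance:
                best_cluster, best_distance = cluster_id, distance
        if best_cluster is None:
            leaders.append(features)
            clusters.append([index])
        else:
            clusters[best_cluster].append(index)
    return clusters


# Hypothetical toy input: two DNA-like pairs that share most of their 3-mers.
sequences = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
clusters = incremental_clustering([kmer_features(s) for s in sequences], 0.5)
print(clusters)
```

With this toy input and a threshold of 0.5, the first two sequences fall into one cluster and the last two into another. A single pass over the data keeps memory use proportional to the number of clusters, which is one plausible reason to pair an incremental scheme with large sequence sets.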
