International Journal of Bioinformatics and Biological Science
  • Year: 2019
  • Volume: 7
  • Issue: 1and2

The Closed Sequence Patterns for DNA Data without Candidate Generation

1Assistant Professor, Department of Computer Science and Applications, Sri Krishna Arts and Science College, Coimbatore-8, Tamil Nadu, India

2Assistant Professor, PG & Research Department of Computer Science, Government Arts College, Coimbatore-18, Tamil Nadu, India

*Corresponding author: shivamjawahar@gmail.com

Online published on 11 July, 2020.

Abstract

Sequential pattern mining is a technique that efficiently determines frequent patterns from small datasets. Traditional sequential pattern mining algorithms mine short sequences efficiently but are inefficient at mining long sequence patterns. They rely on candidate generation, which enlarges the search space and increases running time. Biological DNA sequences are long sequences over a small alphabet. Such data can be mined to find co-occurring biological sequences, which are important for biological data analysis and data mining. Closed sequential pattern mining is suited to long sequences because the closed patterns it returns are fewer in number than the full set of frequent patterns. This paper proposes a Closed Sequential Pattern Mining (CSPAM) algorithm that mines closed sequential patterns without candidate generation. The algorithm uses two pruning methods: BackScan pruning and a frequent occurrence check. The former prunes the search space, and the latter detects closed sequential patterns early in the run. The proposed algorithm is compared with the PrefixSpan algorithm and achieves better scalability and interpretability. Experimental results on sample DNA datasets show that the proposed algorithm outperforms the other algorithms in efficiency, memory usage, and running time.
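To make the terms "pattern growth without candidate generation" and "closed pattern" concrete, the following is a minimal Python sketch. It is not the CSPAM algorithm: the abstract does not specify how BackScan pruning or the frequent occurrence check are implemented, so this sketch substitutes a PrefixSpan-style prefix projection followed by a simple post-hoc closedness filter. The function names (`frequent_subsequences`, `closed_patterns`, `is_subsequence`) and the toy input are assumptions made for illustration only.

```python
from collections import defaultdict


def frequent_subsequences(sequences, min_support):
    """Grow patterns by prefix projection (PrefixSpan-style).

    Illustrative only; this is NOT the paper's CSPAM implementation.
    Patterns are extended one symbol at a time using only symbols that
    actually occur in the projected suffixes, so no candidate set is built.
    """
    results = {}

    def grow(prefix, projected):
        # projected holds one (sequence index, suffix start) pair per
        # sequence that contains the current prefix as a subsequence.
        occurrences = defaultdict(list)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                sym = seq[pos]
                if sym not in seen:  # keep only the leftmost occurrence
                    seen.add(sym)
                    occurrences[sym].append((seq_id, pos + 1))
        for sym, new_projected in sorted(occurrences.items()):
            support = len(new_projected)
            if support >= min_support:
                pattern = prefix + sym
                results[pattern] = support
                grow(pattern, new_projected)

    grow("", [(i, 0) for i in range(len(sequences))])
    return results


def is_subsequence(small, large):
    """Return True if `small` occurs in `large` in order, gaps allowed."""
    it = iter(large)
    return all(ch in it for ch in small)


def closed_patterns(frequent):
    """Keep patterns with no longer super-sequence of equal support.

    This post-hoc filter is a stand-in (an assumption) for the early
    closedness detection that the abstract attributes to CSPAM.
    """
    closed = {}
    for pattern, support in frequent.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_support == support
            and is_subsequence(pattern, other)
            for other, other_support in frequent.items()
        )
        if not absorbed:
            closed[pattern] = support
    return closed


# Hypothetical toy input, not taken from the paper's datasets.
toy_dna = ["ACGT", "ACT", "AGT"]
toy_frequent = frequent_subsequences(toy_dna, min_support=2)
toy_closed = closed_patterns(toy_frequent)
```

On this toy input with a minimum support of 2, eleven frequent subsequences are found but only three are closed: `AT` (support 3), `ACT` (support 2), and `AGT` (support 2). Patterns such as `A` or `AC` are dropped because a longer pattern has the same support, which illustrates why a closed pattern set is more compact.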

Keywords

Sequential pattern mining (SPM), DNA, Closed sequential patterns, PrefixSpan, CSPAM