Journal of Animal Research
  • Year: 2024
  • Volume: 14
  • Issue: 2

Gene prediction in rumen metagenomic reads of cattle using machine learning based approach

  • Authors: Safeer M. Saifudeen1,*, K. Anilkumar1, T.V. Aravindakshan1, Jamuna Valsalan2, K. Ally3, V.L. Gleeja4
  • Total Page Count: 6
  • Published Online: Feb 25, 2025
  • Page Number: 125 to 130

1Department of Animal Genetics and Breeding, Kerala Veterinary and Animal Sciences University, India

2Centre for Advanced Studies in Animal Genetics and Breeding, Kerala Veterinary and Animal Sciences University, India

3Department of Animal Nutrition, Kerala Veterinary and Animal Sciences University, India

4Department of Statistics, Kerala Veterinary and Animal Sciences University, India

*Corresponding author: SM Saifudeen; E-mail: safeermsaifudeen@gmail.com

Abstract

The present study aimed to build a predictive model for protein-coding genes from rumen metagenomic data using promising machine learning (ML) tools. Sequence reads were classified into coding and non-coding sequences and converted into k-mers of various sizes (k = 3 to 6), and k-mer counts were extracted as features representative of the reads. ML classifiers were trained using 16 genomes, comprising 13 bacterial and 3 archaeal genomes, selected from diverse environments and various systems. Among the five ML models for gene prediction, the artificial neural network (ANN) performed best, with a maximum accuracy of 89 per cent for k = 3. Logistic regression and support vector machine (SVM) required only reasonable computational time compared with ANN. DNA was isolated from the rumen liquor of crossbred cattle and used for metagenomic sequencing. Annotated rumen metagenomic sequences were used to validate the ML models. On this validation set, logistic regression performed best, with 85 per cent accuracy using the minimum feature count (unigram) for k = 4. Out of 8718 coding sequences provided to the logistic regression classifier, 8073 were correctly predicted as genes (true positives) and the remaining 645 were predicted as non-coding (false negatives). We conclude that the ML models created, namely artificial neural network, support vector machine and logistic regression, show a strong and robust ability to classify coding and non-coding sequences and represent a promising avenue for predicting rumen metagenomic genes.
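
The k-mer count features and the true-positive/false-negative figures described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kmer_counts` and `sensitivity`, the use of overlapping k-mers over the alphabet A/C/G/T, and the example read are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Return a fixed-length vector of overlapping k-mer counts.

    Assumption: k-mers are taken with a sliding window of step 1 and
    only k-mers over A/C/G/T are counted (others, e.g. with N, are ignored).
    """
    seq = seq.upper()
    # All 4**k possible k-mers in a fixed (lexicographic) order.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[kmer] for kmer in vocab]


def sensitivity(true_pos, false_neg):
    """Fraction of actual coding sequences predicted as coding."""
    return true_pos / (true_pos + false_neg)


# Hypothetical short read, used only to show the feature shape.
features = kmer_counts("ATGGCGTAA", 3)
print(len(features), sum(features))

# Figures reported in the abstract for logistic regression.
print(round(sensitivity(8073, 645), 3))
```

For k = 3 the vector has 4^3 = 64 entries, one per possible k-mer. The reported 8073 true positives and 645 false negatives correspond to a sensitivity of about 0.926 on the coding sequences; this is a different measure from the 85 per cent overall accuracy quoted in the abstract.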

  • Classification of sequences into coding and non-coding based on k-mers.

  • Machine learning models for gene prediction in metagenomic DNA fragments.

  • Validation of the models using bovine rumen metagenomic sequences.

Keywords

Rumen metagenomics, Sequence, Gene prediction, K-mer, Machine learning