Year of Publication: 2013
Page Numbers: 180-190
Authors: Chadi Kallab
Conference Name: The Second International Conference on e-Technologies and Networks for Development (ICeND2013)
- Malaysia


This paper tackles one of the most known problems in biology: search for an optimal sequences tree topology. Bio-informatics researchers and scientists have categorized this problem as Non-Polynomial Hard (NP-Hard). The main idea behind this problem is to minimize differences between given (leaf) and/or derived (parent) sequences. The count of potential solutions goes exponentially proportional to the number of given leaf sequences. This leads researchers to look for optimization methods that would provide an acceptable optimal topology. Despite the efforts of research being done to group species and gene families, many popular methods used to attempt solving the problem remain heavy in terms of the computational power needed (ex: parsimony and maximum likelihood), inferring the quasi-impossibility of finding an exact solution for more than 20 leaf sequences. Depending on the number and length of sequences to be compared, such as Pairwise Alignment or Multiple Sequence Alignment (MSA) or even alignment of an entire genome, different types of algorithms can be used to help reach an optimal topology: dynamic programming, linear programming based or heuristicbased or a combination. The higher the number of given sequences is, the more advisable and efficient it would be to go towards heuristics as they would provide a close-enough solution faster, as for instance greedy algorithms, hill climbing, simulated annealing and genetic algorithms do. Thus, as part of a larger research in Heuristics and phylogenies, this paper aims to suggest a general way to encode the problem into instances of different heuristic algorithms.