Comparative analyses of animal, plant and viral DNA size distributions via non-additive statistics.
DNA, Non-additive Statistics, Cucurbits, Bayesian Inference.
The Human Genome Project (HGP) was launched in 1990 with the aim of identifying all human genes and completely sequencing the DNA bases present in the genome, DNA being the molecule that carries the genetic information of living beings. Today, thirty years later, one hundred percent of the human genome has been sequenced.
Many papers have analyzed human DNA, among them \cite{provata1997,provata2007power,provata2008,marcone2019,marcone2020}, which investigate the distributions of base pairs present in nucleotides. Costa, M. \textit{et al.} (2019)\cite{marcone2019} and Silva, R. \textit{et al.} (2020)\cite{marcone2020} proposed probabilistic models based on generalizations of Boltzmann-Gibbs statistics. Building on these works, we analyze DNA length distributions via the non-additive statistics proposed by Tsallis\cite{tsallis1988}, not only for humans but also for plants, other animals and viruses, using exons (the parts of the DNA that encode proteins) and introns (the parts of the DNA that do not code for proteins but play an important role in gene regulation).
Given the importance of this type of analysis for the development of techniques for the genetic improvement of plants, and the strong economic and social impact of fruit farming, especially in the western region of Rio Grande do Norte state, we chose to analyze the distribution of DNA lengths in cucurbits, specifically melon (\textit{Cucumis melo}) and cucumber (\textit{Cucumis sativus}).
Bayesian inference provides strong evidence that the distribution of base pair lengths for the analyzed species can be explained by a Tsallis-type $q$-exponential summation model. The parameter $q$ carries information about the correlations between the base lengths. Preliminary results point to a possible universality in the behavior of $q$ for both exons and introns, independent of the subspecies. This analysis is described in detail in this work.
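For orientation, the standard Tsallis $q$-exponential underlying such models can be written as below. The sum form that follows is only an illustrative sketch: reading the ``summation model'' as a weighted sum of $q$-exponential terms is an assumption, and the number of terms $n$, the weights $a_i$, the characteristic lengths $\lambda_i$ and the length variable $\ell$ are hypothetical symbols introduced here, not notation taken from this work.
\begin{equation}
e_q(x) = \left[ 1 + (1-q)\,x \right]_{+}^{\frac{1}{1-q}}, \qquad \lim_{q \to 1} e_q(x) = e^{x},
\end{equation}
where $[y]_{+} = \max(y, 0)$. Under the stated assumptions, a length distribution built from such terms would take the form
\begin{equation}
P(\ell) \propto \sum_{i=1}^{n} a_i \, e_q\!\left( -\frac{\ell}{\lambda_i} \right).
\end{equation}
For $q > 1$ each term decays asymptotically as a power law, $e_q(-\ell/\lambda_i) \sim \ell^{-1/(q-1)}$, while $q \to 1$ recovers the ordinary exponential decay of Boltzmann-Gibbs statistics.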