Abstract
Algorithm development for finding quasiperiodic regions in sequences is at the core of many problems arising in biological sequence analysis. We solve an important problem in this area. Let {\rm A} be an alphabet of size \eta and {\rm A}^\iota denote the set of sequences of length \iota over {\rm A}. Given a sequence S=s_1s_2\cdots s\iota \in {\rm A}^\iota, a positive integer p is called a period of S if s_i=s_{i+p} for 1\le i\le \iota -p. S is called p-periodic if it has a minimum period p. Let \Omega _\iota (p) denote the set of p-periodic sequences in {\rm A}^\iota. A natural measure of "nearness to p-periodcity" for S is the average Hamming distance to the nearest p-periodic sequence: D(S)=\min _{{\rm T}\in \Omega \iota (p)}D(S,T). If T is a sequence \in \Omega \iota (p) such that D(S,T)=D(S), then T is called a nearest p-periodic sequence of S and S is called p-quasiperiodic associated with the score D(S). This paper develops an efficient algorithm for finding a nearest p-periodic sequence of S by means of its modulo-p \beta =(\underbrace {q+1,\cdots ,q+1_r, \underbrace q,\cdots ,q}_{p-r}, where \iota =\alpha _1+\alpha _2+\cdots +\alpha _n is a partition of \iota and q is the quotient and r is the remainder when \iota is devided by p. This paper shows that there exists a sequence in {\rm A}^\iota whose modulo-p incidence matrix has row sum vector \alpha and column sum vector \beta.