TY - JOUR
T1 - Some properties of matrix product and its applications in nonnegative tensor decomposition
AU - 
JO - Journal of Information and Computing Science
VL - 4
SP - 243
EP - 257
PY - 2024
DA - 2024/01
SN - 3
DO - http://doi.org/
UR - https://global-sci.org/intro/article_detail/jics/22764.html
KW - databases, approximate string matching, DNA, mutation, similarity search 

AB - In DNA-related research, mutations, defined as heritable changes in the DNA sequence, occur
frequently under varying environmental conditions. Approximate string matching is therefore applied to
answer queries that search for mutations. The approximate string matching problem is as follows: given a
user-specified parameter k, find the positions in the database sequences where substrings with at most k
errors relative to the query sequence occur. In this paper, we use a new index structure to support
the proposed method for approximate string matching. In the proposed index structure, EII, we map each
overlapping q-gram of the database sequence to an index key and record the positions at which the q-gram
occurs in the corresponding index entry. In the proposed method, EOB, we first generate all possible
mutations for each gram in the query sequence. Then, using the information recorded in the EII
structure, we check both the local order (i.e., the order of characters in a gram) and the global order
(i.e., the order of grams in an interval) of these mutations. The final answers can be determined
directly, without the dynamic programming used in traditional filter methods for approximate string
matching. Our experimental results show that the method can outperform the (k + s) q-samples filter, a
well-known method for approximate string matching, in processing time under various conditions for
short query sequences.