Monday, 27 June 2016

Boyer - Moore - Horspool Algorithm

This algorithm optimize the string pattern searching , by skipping many character as much as possible which is unmatched. We generate bad match table.

This is a two stage algorithm:

 Stage 1:
  1. Preprocess the string being searched for to build bad table that contains the length to shift when a bad match occurs.
 2. Create a good suffix table.

Stage 2:
 1. The String to find is searched from the last character to the first.
 2. The bad match table is used to skip characters when a mismatch occurs.

Bad Match Table

Bad match decide how much we have to slide when mismatch occurs in patern matching.

Algorithm:
 1. Store the length of the search string as the default shift length.
 2. For - Each character  in the search string
     - Set the index for the current character value

Example Pattern : "TRUTH"

public void BadMatchTable(string pattern)
{
  _defaultValue = pattern.Length;
  _distances = new Dictionary<int,int>();

  for(int i=0 ; i<pattern.Length -1; i++)
  {
    _distance[pattern[i]] = pattern.Length -i-1 ;  
  }  
}

Table:

Index    Value
?              5
T             1
R             3
U             2


By default , first will be length of serach string.
T , value will be change becasue of, its coming twice and last value will be updated. last character is not there in above list ,because its not repeating and we have to not consider last character length.

WE HOLD THESE TRUTHS TO BE SELF- EVIDENT

TRUTH

So let starts comparing "truth" from starting, here we does not starts matching from left to right , but will start from right to left. So first character is 'H' with 'O'  , does not macth look into table default value is 5 ,so move 5 character forward because there is no such character 'H' in table.

Now again start from 'L' , start comparing from right to left again. now 'U' does not match with space character , again look into bad tables there is no value for space character so default value is 5 again move 5 character ahead. and so on.....



No comments:

Post a Comment