Saturday, September 15, 2012

How to find Z values of a string

             Finding Z value of a string is requires for Z algorithm. Z algorithm is one of the linear string matching algorithm. As pre-processing of Z algorithm we find Z value of given string. We will denote string by S an arbitrary string. Let S be a string and i >1 one of its positions. Starting at i, we consider all the substrings S[i..j], where i ≤ j ≤ |S|, so that S[i..j] matches a prefix of S itself. Among all the possible values j may take, select the greatest one, max(j) and denote the number [max(j) − i + 1] by Zi(S). If S[i] is different from S[1], then such a j does not exist and Zi(S) is set to 0.
              I think it is very hard to understand. Lets consider finding Z value using an example. Finding Z value starts from 2 to length of the string. That means if our string is "ABCDEF", we only find Z2 to Z6 except Z1. Because we can not find value for Z1.
Lets get our string as S = AABCDAABCXYAABCDAABCDX
Zi is equal to the length of the longest common prefix of S string and S[i to |S|] string. 

Finding Z2
        Devide string in to two sub-strings (first letter as a sub-string and rest as another sub-string)
             S1 =  AABCDAABCXYAABCDAABCDX
             S2 =  ABCDAABCXYAABCDAABCDX
        Then find length of the longest common prefixes of these two substrings. Look How to find prefixes of  a string if you are not well with prefixes.

Prefix of S1 sub-string    -  A, AA , AAB , AABC , AABCD , AABCDA , AABCDAA , AABCDAAB ,        AABCDAABC , AABCDAABCX , AABCDAABCXY , AABCDAABCXYA , AABCDAABCXYAA , AABCDAABCXYAAB , AABCDAABCXYAABC , AABCDAABCXYAABCD , AABCDAABCXYAABCDA , AABCDAABCXYAABCDAA , AABCDAABCXYAABCDAAB , AABCDAABCXYAABCDAABC , AABCDAABCXYAABCDAABCD , AABCDAABCXYAABCDAABCDX

Prefix of S2 substring - A , AB , ABC , ABCD , ABCDA , ABCDAA , ABCDAAB , ABCDAABC , ABCDAABCX , ABCDAABCXY , ABCDAABCXYA , ABCDAABCXYAA , ABCDAABCXYAAB , ABCDAABCXYAABC , ABCDAABCXYAABCD , ABCDAABCXYAABCDA , ABCDAABCXYAABCDAA , ABCDAABCXYAABCDAAB , ABCDAABCXYAABCDAABC , ABCDAABCXYAABCDAABCD , ABCDAABCXYAABCDAABCDX

Highlighted by green are the common prefixes of these two sub-strings. Length of the longest common prefix become as Z value of that position. Therefore, Z2 = 1
Finding Z3
        Devide string in to two sub-strings (first two letters as a sub-string and rest as another sub-string)
             S1 =  AABCDAABCXYAABCDAABCDX
             S2 =  BCDAABCXYAABCDAABCDX


        Then find length of the longest common prefixes of these two substrings.

Prefix of S1 sub-string    -  A, AA , AAB , AABC , AABCD , AABCDA , AABCDAA , AABCDAAB ,        AABCDAABC , AABCDAABCX , AABCDAABCXY , AABCDAABCXYA , AABCDAABCXYAA , AABCDAABCXYAAB , AABCDAABCXYAABC , AABCDAABCXYAABCD , AABCDAABCXYAABCDA , AABCDAABCXYAABCDAA , AABCDAABCXYAABCDAAB , AABCDAABCXYAABCDAABC , AABCDAABCXYAABCDAABCD , AABCDAABCXYAABCDAABCDX


Prefix of second substring - B , BC ,BCD , BCDA , BCDAA , BCDAAB , BCDAABC , BCDAABCX , BCDAABCXY , BCDAABCXYA , BCDAABCXYAA , BCDAABCXYAAB , BCDAABCXYAABC , BCDAABCXYAABCD , BCDAABCXYAABCDA , BCDAABCXYAABCDAA , BCDAABCXYAABCDAAB , BCDAABCXYAABCDAABC , BCDAABCXYAABCDAABCD , BCDAABCXYAABCDAABCDX

There are no any common prefixes in these two sub-strings. Therefore, Z3 = 0
Likewise we can find other Z values.
Finding Z4
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  CDAABCXYAABCDAABCDX                            Z4=0

Finding Z5
          S1 =  AABCDAABCXYAABCDAABCDX
          S2 =  DAABCXYAABCDAABCDX                                 Z5=0

Finding Z6
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 = AABCXYAABCDAABCDX                                    Z6=4

Finding Z7
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  ABCXYAABCDAABCDX                                      Z7=1

Finding Z8
           S1 =  AABCDAABCXYAABCDAABCDX
           S2 =  BCXYAABCDAABCDX                                          Z8=0

Finding Z9
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  CXYAABCDAABCDX                                            Z9=0

Finding Z10
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  XYAABCDAABCDX                                               Z10=0

Finding Z11
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  YAABCDAABCDX                                                 Z11=0

Finding Z12
           S1 =  AABCDAABCXYAABCDAABCDX
           S2 =  AABCDAABCDX                                                     Z12=9

Finding Z13
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  ABCDAABCDX                                                       Z13=1

Finding Z14
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  BCDAABCDX                                                         Z14=0

Finding Z15
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  CDAABCDX                                                            Z15=0

Finding Z16
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  DAABCDX                                                               Z16=0

Finding Z17
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  AABCDX                                                                  Z17=5

Finding Z18
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  ABCDX                                                                     Z18=1

Finding Z19
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  BCDX                                                                        Z19=0

Finding Z20
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 = CDX                                                                            Z20=0

Finding Z21
           S1 =  AABCDAABCXYAABCDAABCDX
           S2 =  DX                                                                               Z21=0

Finding Z22
            S1 =  AABCDAABCXYAABCDAABCDX
            S2 =  X                                                                                Z22=0

1 comments: