Wednesday, 13 November 2013

Palindrome String Detection by R

A palindrome is a word, phrase, number, or other sequence of symbols or elements, whose meaning may be interpreted the same way in either forward or reverse direction. The famous palindrome - "able was i ere i saw elba" is attributed to  Napoleon. 

In some context, it would be an interesting problem to check whether a given string is "Palindrome" or not. The iterative solution ( through loop) would do, but, may be a few more lines of code. This problem would have a nice and easy solution through "Recursive" process or algorithm." The problem would be broken in to a smaller problem, apply the same process until problem reaches to trivial level or "Base Case". The base case for a string would a Null or Single Character.

Let us look at an example:  "able was i ere i saw elba", first we need to check whether first and last characters are equal or not, if yes, solution lies in finding whether "*ble was i ere i saw elb*" is a palindrome or not. The process goes on, until we reduce the problem to Null or Single Character. During the process if any  iteration if the condition is not met , we can conclude that it is not a "Palindrome".

User defined R Function is coded below:

First Function - "palindromecal" is a function which deals with recursive solution and 2nd function is a calling function which would be do some house keeping stuff before calling the recursive function.

# Palindrome Recursive Function 
palindromecal <- function(str.refined) {
ans <<- FALSE
#print(str.refined)

if (nchar(str.refined) %in% c(0,1)){

ans <<- TRUE
#print('BASE CASE')
#print(ans)
}
else if(substr(str.refined,1,1)==substr(str.refined,nchar(str.refined),nchar(str.refined))) {

#print('RECURSIVE CASE')
#print(ans)
str.refined.small <- substr(str.refined,2,nchar(str.refined)-1)
palindromecal(str.refined.small)
}

"<<-"  has been used to set the reference in the global environment so as to avoid any scoping issues in recursive calls.

# Palindrome Calling Function  

 palindrome <- function(astr) {
   
  # house keepging removing blanks and punctuation etc through a pattern
  # make all to lower case 
 
  regex <- "^[ \t]+|[ \t]+$|\\.|\\'|[ \t]+|,|!|-"
  astr.clean <- gsub(regex,'',astr)
  str.refined <- tolower(astr.clean)
  shortstring <- palindromecal(str.refined)
  return(shortstring)
}


Test cases are shown for reference:
# Test1 :: True

palindrome('aa')
# > palindrome('aa')
# [1] TRUE
palindrome('A but tuba.')
# > palindrome('A but tuba.')
# [1] TRUE
palindrome('A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal-Panama!')
# > palindrome('A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal-Panama!')
# [1] TRUE
palindrome(' ')
# > palindrome(' ')
# [1] TRUE
palindrome('aaaabbaaaa')
# > palindrome('aaaabbaaaa')

# Test2 :: False

palindrome('..a.....b')
# > palindrome('..a.....b')
# [1] FALSE
palindrome('abcdefghijklmnopqrstuvwxyz')
# > palindrome('abcdefghijklmnopqrstuvwxyz')
# [1] FALSE


Lastly any R discussion would be incomplete without a reference to the vectorization, this function can be used with other standard vector functions such as "lapply and sapply".

list.string <- list('abc',"aa",'aaa','ab')
list.new  <- lapply(list.string,palindrome)
vec.new  <- sapply(list.string,palindrome)

# > vec.new
# [1] FALSE  TRUE  TRUE FALSE

This  piece may not have any use in any standard data analytics situations, but it can be concluded that R can be used for advanced text manipulations which is done by other languages such as Python, Perl, C or Java.