Java RegEx API "Look-behind group does not have an obvious maximum length near index ..."


I'm working on SQL clause parsing and designed a working regex to find a column outside of string literals, using "Rad Software Regular Expression Designer", which is built on the .NET API. To make sure the regex works with Java too, I tested it with the Java API, of course (1.5 and 1.6). Guess what, it won't work. I got the message

"Look-behind group does not have an obvious maximum length near index 28".

The string I'm trying to get parsed is

column_1='test''the''stuff''all''day''long' , column_2='000' ,  theverycolumniwanttofind      =    'column_1=''test''''the''''stuff''''all''''day''''long'' , column_2=''000'' ,  theverycolumniwanttofind   =    ''   theverycolumniwanttofind   =    '' , (column_3 null or column_3 = ''not interesting'') , ''1'' = ''1''' , (column_3 null or column_3 = 'still not interesting') , '1' = '1' 

As you may have guessed, I tried to create a kind of worst case to ensure the regex won't fail on more complicated SQL clauses.

The regex looks like this

(?i:(?<!=\s*'(?:[^']|(?:''))*)((?<=\s*)theverycolumniwanttofind(?=(?:\s+|=)))) 

I'm not sure if there is a more elegant regex (there probably is one), but that's not important right now, as this one does the trick.

To explain the regex in a few words: if it finds the column I'm after, it uses a negative look-behind to figure out whether the column name is used inside a string literal. If so, it won't match. If not, it will match.

Back to the question. As mentioned before, it won't work with Java. What will work and result in what I want?
I found out that Java does not seem to support unlimited look-behinds, but I still couldn't get it to work.
Isn't a look-behind already limited to the range between the start of the input and the current search position? That would result in something like "position - offset", wouldn't it?
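For anyone who wants to stay with a look-behind: a bounded variant is one thing to try, since the error message complains about a missing maximum length. The following is only a minimal sketch, not the regex from above. The caps `MAX_GAP` and `MAX_LITERAL_UNITS`, the class name, and the short test string are assumptions made for illustration, and a literal longer than the cap would defeat the check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedLookBehindSketch {
    // Assumed upper bounds (not from the original regex): they give the
    // look-behind a maximum length that the Java engine can compute.
    static final int MAX_GAP = 10;            // whitespace between '=' and the opening quote
    static final int MAX_LITERAL_UNITS = 200; // characters or doubled quotes inside the literal

    static Pattern columnOutsideLiterals(String column) {
        // Same idea as the unbounded look-behind, with '*' replaced by {0,n}.
        String insideLiteral = "=\\s{0," + MAX_GAP + "}'(?:[^']|''){0," + MAX_LITERAL_UNITS + "}";
        return Pattern.compile(
                "(?<!" + insideLiteral + ")" + Pattern.quote(column) + "(?=\\s|=)",
                Pattern.CASE_INSENSITIVE);
    }

    // Returns the start index of every match that is not inside a string literal.
    static List<Integer> findStarts(String sql, String column) {
        List<Integer> starts = new ArrayList<Integer>();
        Matcher m = columnOutsideLiterals(column).matcher(sql);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // The second "col" sits inside a string literal and should be skipped.
        String sql = "a='x' , col = 'col = ''1''' , b=2";
        System.out.println(findStarts(sql, "col")); // prints [8]
    }
}
```

The trade-off is that the caps are arbitrary, which is one reason to prefer an approach that does not rely on look-behind length at all.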

I found a solution, and because I asked the question here, I'll share it, of course.

    import java.util.ArrayList;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Matches one SQL string literal, including doubled ('') quotes inside it.
    private static final String SQL_STRING_LITERALS_REGEX = "'(?:(?:[^']|(?:''))*)'";
    private static final char DOT = '.';

    // Returns [start, end) index pairs of all parts that are NOT inside a string literal.
    private ArrayList<int[]> getNonStringLiteralRegions(String exclusion) {
        ArrayList<int[]> regions = new ArrayList<int[]>();

        int lastEnd = 0;
        Matcher m = Pattern.compile(SQL_STRING_LITERALS_REGEX).matcher(exclusion);
        while (m.find()) {
            regions.add(new int[] {lastEnd, m.start()});
            lastEnd = m.end();
        }
        if (lastEnd < exclusion.length())
            // We didn't cover the last part of the exclusion yet.
            regions.add(new int[] {lastEnd, exclusion.length()});

        return regions;
    }

    protected final String getFixedExclusion(String exclusion, String[] columns, String alias) {
        if (alias == null)
            throw new NullPointerException("Alias must not be null.");
        else if (alias.charAt(alias.length() - 1) != DOT)
            alias += DOT;

        StringBuilder b = new StringBuilder(exclusion);
        ArrayList<int[]> regions = getNonStringLiteralRegions(exclusion);
        for (int i = regions.size() - 1; i >= 0; --i) {
            // Reverse iteration to keep valid indices for lower regions.
            int start = regions.get(i)[0], end = regions.get(i)[1];
            String s = exclusion.substring(start, end);
            for (String column : columns)
                // Only match whole column names: the neighbours must be non-word characters or the region boundary.
                s = s.replaceAll("(?<=^|[\\W&&\\D])(?i:" + column + ")(?=[\\W&&\\D]|$)", alias + column);
            b.replace(start, end, s);
        }

        return b.toString();
    }

This time the trick is to find the SQL string literals and simply avoid them when replacing the columns with "alias.columnname". It is important to ensure that only whole column names are matched when replacing. So if I want to replace the column "column_1" in a clause like

where column_1 = column_2 and column_11 = column_22

"column_11" will be left untouched. (I think this is important to keep in mind, which is why I mention it here for anyone who faces a similar problem.)
Still, I think this is just a workaround, and if you can avoid the need for such logic, it is best to do so.
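To illustrate the whole-name point in isolation, here is a small sketch. It is not the code from above: it swaps the character-class look-arounds for the simpler `(?<!\w)` and `(?!\w)`, quotes the column name and the replacement, and deliberately ignores string literals. The class and method names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeColumnReplaceSketch {
    // Prefixes every whole-name occurrence of `column` with "alias.".
    // Assumption: the input contains no string literals (those would need
    // the region splitting shown earlier).
    static String qualify(String clause, String column, String alias) {
        Pattern wholeName = Pattern.compile(
                "(?<!\\w)" + Pattern.quote(column) + "(?!\\w)",
                Pattern.CASE_INSENSITIVE);
        // quoteReplacement keeps '$' and '\' in the names from being
        // interpreted as group references.
        return wholeName.matcher(clause)
                .replaceAll(Matcher.quoteReplacement(alias + "." + column));
    }

    public static void main(String[] args) {
        String clause = "where column_1 = column_2 and column_11 = column_22";
        // prints: where t.column_1 = column_2 and column_11 = column_22
        System.out.println(qualify(clause, "column_1", "t"));
    }
}
```

Because "column_11" continues with a word character after "column_1", the negative look-ahead rejects it and only the exact column name is qualified.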

OK, thanks anyway. I'd be glad to answer any upcoming questions you may have.

