Java RegEx API "Look-behind group does not have an obvious maximum length near index ..."
I'm working on SQL clause parsing and designed a working regex to find a column outside of string literals, using "RAD Software Regular Expression Designer", which uses the .NET API. To make sure the designed regex works in Java too, I tested it with the Java API, of course (1.5 and 1.6). Guess what, it won't work. I got the message
"Look-behind group does not have an obvious maximum length near index 28".
The string I'm trying to parse is:
column_1='test''the''stuff''all''day''long' and column_2='000' and theverycolumniwanttofind = 'column_1=''test''''the''''stuff''''all''''day''''long'' and column_2=''000'' and theverycolumniwanttofind = '' theverycolumniwanttofind = '' and (column_3 is null or column_3 = ''not interesting'') and ''1'' = ''1''' and (column_3 is null or column_3 = 'still not interesting') and '1' = '1'
As you may have guessed, I tried to create a kind of worst case to make sure the regex won't fail on more complicated SQL clauses.
The regex looks like this:
(?i:(?<!=\s*'(?:[^']|(?:''))*)((?<=\s*)theverycolumniwanttofind(?=(?:\s+|=))))
I'm not sure whether there is a more elegant regex (there probably is one), but that's not important right now as long as it does the trick.
To explain the regex in a few words: when it finds the column I'm after, the negative look-behind checks whether the column name is used inside a string literal. If so, it won't match. If not, it will match.
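The error can be reproduced with a small sketch like the one below (class and method names are hypothetical; the pattern is the one above with its backslashes doubled for a Java string literal). Counting characters, index 28 is the `*` after the group `(?:[^']|(?:''))` inside the negative look-behind, i.e. the unbounded repetition for which Java cannot work out a maximum length.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LookBehindDemo {
    // The .NET-designed pattern from the question, escaped for a Java string literal.
    static final String REGEX =
        "(?i:(?<!=\\s*'(?:[^']|(?:''))*)((?<=\\s*)theverycolumniwanttofind(?=(?:\\s+|=))))";

    /** Returns the error description, or null if the pattern compiles. */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // getDescription() is the message without the index/pattern decoration.
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(compileError(REGEX));
    }
}
```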
Back to the question. As mentioned before, it won't work in Java. How can I make it work and get the result I want?
As I found out, Java does not seem to support unbounded look-behinds, but I still couldn't get it to work.
Isn't it right that a look-behind puts a limit on the search offset from the current search position? Wouldn't that result in "position - offset"?
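Roughly, yes: as far as I understand the JDK implementation, a look-behind is checked by trying to match the group so that it ends at the current position, starting from positions at most "maximum length" characters back. That is why the maximum must be known when the pattern is compiled. A bounded quantifier on a single character class gives Java that maximum. The sketch below only illustrates the bounded form (the cap of 3 and the names are arbitrary examples); it does not fix the question's pattern by itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookBehind {
    // Bounded look-behind: '=' followed by at most 3 whitespace characters
    // must end right before the 'x'. The cap of 3 is an arbitrary example value.
    static final Pattern X_AFTER_EQUALS = Pattern.compile("(?<==\\s{0,3})x");

    /** Start offsets of every 'x' accepted by the bounded look-behind. */
    static List<Integer> starts(String input) {
        List<Integer> result = new ArrayList<Integer>();
        Matcher m = X_AFTER_EQUALS.matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // Only the first 'x' qualifies; the second one is 4 spaces away from '='.
        System.out.println(starts("a = x, b =    x")); // [4]
    }
}
```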
I found a solution, and because I asked the question here, I'll share it, of course.
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

    private static final String SQL_STRING_LITERALS_REGEX = "'(?:(?:[^']|(?:''))*)'";
    private static final char DOT = '.';

    // Returns [start, end) index pairs of the parts of exclusion that lie outside string literals.
    private ArrayList<int[]> getNonStringLiteralRegions(String exclusion) {
        ArrayList<int[]> regions = new ArrayList<int[]>();
        int lastEnd = 0;
        Matcher m = Pattern.compile(SQL_STRING_LITERALS_REGEX).matcher(exclusion);
        while (m.find()) {
            regions.add(new int[] {lastEnd, m.start()});
            lastEnd = m.end();
        }
        if (lastEnd < exclusion.length()) // didn't cover the last part of the exclusion yet.
            regions.add(new int[] {lastEnd, exclusion.length()});
        return regions;
    }

    protected final String getFixedExclusion(String exclusion, String[] columns, String alias) {
        if (alias == null)
            throw new NullPointerException("alias must not be null.");
        else if (alias.charAt(alias.length() - 1) != DOT)
            alias += DOT;
        StringBuilder b = new StringBuilder(exclusion);
        ArrayList<int[]> regions = getNonStringLiteralRegions(exclusion);
        // Reverse iteration keeps the indices of the lower regions valid.
        for (int i = regions.size() - 1; i >= 0; --i) {
            int start = regions.get(i)[0], end = regions.get(i)[1];
            String s = exclusion.substring(start, end);
            for (String column : columns)
                s = s.replaceAll("(?<=^|[\\W&&\\D])(?i:" + column + ")(?=[\\W&&\\D]|$)", alias + column);
            b.replace(start, end, s);
        }
        return b.toString();
    }
This time the trick is to find the SQL string literals and skip them when replacing the columns with "alias.columnname". It is important to make sure that only whole column names are replaced. If you want to replace the column "column_1" in the clause
WHERE column_1 = column_2 AND column_11 = column_22
"column_11" has to be left untouched. (I think this is important to keep in mind, which is why I mention it here for anyone who faces a similar problem.)
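A small, self-contained check of the whole-name boundary pattern from the solution above, using the non-word classes `[\W&&\D]` on both sides. The class name `WholeColumnDemo` and method `qualify` are hypothetical, and `Pattern.quote` / `Matcher.quoteReplacement` are precautions added in this sketch that the original code does not use.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeColumnDemo {
    /** Prefixes whole, case-insensitive occurrences of column with alias. */
    static String qualify(String clause, String column, String alias) {
        // The look-behind/look-ahead require a non-word character or the string edge
        // on both sides, so "column_1" inside "column_11" is not touched.
        String regex = "(?<=^|[\\W&&\\D])(?i:" + Pattern.quote(column) + ")(?=[\\W&&\\D]|$)";
        return clause.replaceAll(regex, Matcher.quoteReplacement(alias + column));
    }

    public static void main(String[] args) {
        System.out.println(qualify("WHERE column_1 = column_2 AND column_11 = column_22", "column_1", "t."));
        // prints: WHERE t.column_1 = column_2 AND column_11 = column_22
    }
}
```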
Still, I think this is a workaround, and if you can avoid needing this logic, it's best to do so.
OK, anyway, I'd be glad to answer any upcoming questions you may have.
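For comparison, here is a minimal sketch of a different technique (not the solution above): one regex whose first alternative consumes a complete string literal and whose second alternative captures the column name, so a column is only reported when it is found outside a literal, and no unbounded look-behind is needed. It assumes well-formed, single-quoted literals with '' as the escaped quote; the names `ColumnOutsideLiterals` and `findColumn` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColumnOutsideLiterals {
    /**
     * Start offsets of whole-word, case-insensitive occurrences of column that lie
     * outside single-quoted SQL string literals ('' is an escaped quote inside one).
     */
    static List<Integer> findColumn(String sql, String column) {
        // Alternative 1 swallows a complete literal, so text inside it never reaches
        // alternative 2. Group 1 is only set when the column name itself matched.
        Pattern p = Pattern.compile(
                "'(?:[^']|'')*'|(?<!\\w)(" + Pattern.quote(column) + ")(?!\\w)",
                Pattern.CASE_INSENSITIVE);
        List<Integer> hits = new ArrayList<Integer>();
        Matcher m = p.matcher(sql);
        while (m.find()) {
            if (m.group(1) != null) {
                hits.add(m.start(1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The "name" inside 'it''s name' is skipped; the other two are reported.
        System.out.println(findColumn("name = 'it''s name' or NAME = 'x'", "name")); // [0, 23]
    }
}
```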