regex - Python regular expression with wiki text


I'm trying to change wikitext into normal text using Python regular expression substitution. There are two formatting rules for wiki links:

  • [[name of page]]
  • [[name of page | text display]]

    (http://en.wikipedia.org/wiki/wikipedia:cheatsheet)
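To make the two forms concrete, here is a minimal sketch (the names `LINK_RX` and `parse_links` are hypothetical, not from the question) that pulls each link apart into its page name and the text a reader actually sees. It assumes links are not nested and that neither part contains `]`.

```python
import re

# Matches [[target]] or [[target|display]]; group 2 is None for the plain form.
LINK_RX = re.compile(r'\[\[([^|\]]+)(?:\|([^\]]+))?\]\]')


def parse_links(text):
    """Return (page name, displayed text) for every wiki link in text."""
    links = []
    for m in LINK_RX.finditer(text):
        target = m.group(1).strip()
        # Without a pipe, the page name is also what the reader sees.
        display = (m.group(2) or m.group(1)).strip()
        links.append((target, display))
    return links


print(parse_links("[[name of page]] and [[name of page | text display]]"))
```

For the plain form the displayed text is just the page name, which is why only the piped form needs special handling.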

Here is the text that gives me a headache:

The CD is composed entirely of [[cover version]]s of [[The Beatles]] songs that George Martin [[record producer|produced]] originally.

The text above should be changed into:

The CD is composed entirely of cover versions of The Beatles songs that George Martin produced originally.

The conflict between the [[ ]] and [[ | ]] syntax is the main problem. I don't need one complex regular expression; applying multiple (maybe two) regular expression substitutions in sequence is OK.
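The two-substitution approach the question allows can be sketched like this (the names `PIPED_RX`, `PLAIN_RX` and `strip_links_two_pass` are hypothetical). The key assumption is ordering: piped links are rewritten first, so that by the time the plain pattern runs, no `|` remains inside any `[[ ]]` pair.

```python
import re

# Step 1: [[page|display]] -> display (only links that contain a pipe).
PIPED_RX = re.compile(r'\[\[[^|\]]*\|([^\]]*)\]\]')
# Step 2: [[page]] -> page (whatever links remain have no pipe).
PLAIN_RX = re.compile(r'\[\[([^\]]*)\]\]')


def strip_links_two_pass(text):
    # Order matters: running PLAIN_RX first would turn
    # [[record producer|produced]] into "record producer|produced".
    text = PIPED_RX.sub(r'\1', text)
    return PLAIN_RX.sub(r'\1', text)


print(strip_links_two_pass(
    "The CD is composed entirely of [[cover version]]s of [[The Beatles]] "
    "songs that George Martin [[record producer|produced]] originally."))
```

Because `[^|\]]*` cannot cross a `]`, the first pattern never spans two separate links such as `[[a]] and [[b|c]]`.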

Please enlighten me on this problem.

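A single substitution handles both forms. The optional non-capturing group (?:[^|\]]*\|)? consumes "name of page|" when a pipe is present, and the capturing group ([^\]]+) keeps the displayed text:

    import re
    wikilink_rx = re.compile(r'\[\[(?:[^|\]]*\|)?([^\]]+)\]\]')
    plain_text = wikilink_rx.sub(r'\1', the_string)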
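Wrapped in a function and run on the sentence from the question, the same pattern gives the expected output. This is a sketch: the function name `unwikify` is hypothetical, and the one change from a plain `r'\1'` replacement is a callable that trims spaces, so that `[[name of page | text display]]` yields `text display` rather than ` text display`.

```python
import re

# Same pattern as above: the optional non-capturing group swallows
# "page name|" when a pipe is present, and group 1 keeps the displayed text.
WIKILINK_RX = re.compile(r'\[\[(?:[^|\]]*\|)?([^\]]+)\]\]')


def unwikify(text):
    # Hypothetical helper: a function replacement lets us trim the spaces
    # that "[[page | text]]" leaves around the displayed text.
    return WIKILINK_RX.sub(lambda m: m.group(1).strip(), text)


sample = ("The CD is composed entirely of [[cover version]]s of "
          "[[The Beatles]] songs that George Martin "
          "[[record producer|produced]] originally.")
print(unwikify(sample))
# prints: The CD is composed entirely of cover versions of The Beatles songs that George Martin produced originally.
```

Like the original one-liner, this assumes links are not nested and contain no `]` characters; templates such as `{{...}}` are left untouched.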

Example: http://ideone.com/7oxuz

Note: you may also find MediaWiki parsers at http://www.mediawiki.org/wiki/alternative_parsers.

