regex - Python regular expression with wiki text
I'm trying to change wikitext into normal text using Python regular expression substitution. There are two formatting rules regarding wiki links:
- [[name of page]]
- [[name of page | text to display]]
(http://en.wikipedia.org/wiki/wikipedia:cheatsheet)
Here is the text that gives me a headache:
The CD is composed entirely of [[cover version]]s of [[The Beatles]] songs George Martin [[record producer|produced]] originally.
The text above should be changed into:
The CD is composed entirely of cover versions of The Beatles songs George Martin produced originally.
The conflict between the [[ ]] and [[ | ]] grammars is the main problem. I don't need one complex regular expression; applying multiple (maybe two) regular expression substitutions in sequence is OK.
Please enlighten me on this problem.
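The two-substitution idea mentioned in the question can be sketched as follows. This is only one possible approach, not a full wikitext parser: the names `piped_rx`, `plain_rx`, and `unlink_two_pass` are made up for illustration, and the patterns assume links are not nested and contain no stray brackets.

```python
import re

# Pass 1 (assumed pattern): piped links [[page|label]] -> label
piped_rx = re.compile(r'\[\[[^\[\]|]*\|([^\[\]]*)\]\]')
# Pass 2 (assumed pattern): plain links [[page]] -> page
plain_rx = re.compile(r'\[\[([^\[\]|]*)\]\]')

def unlink_two_pass(text):
    """Hypothetical helper: strip wiki link markup in two sequential passes."""
    text = piped_rx.sub(r'\1', text)   # handle [[ | ]] first
    return plain_rx.sub(r'\1', text)   # then the remaining [[ ]]

sample = ("the cd composed entirely of [[cover version]]s of [[the beatles]] "
          "songs george martin [[record producer|produced]] originally.")
print(unlink_two_pass(sample))
```

Running the piped pattern first means the second pass only ever sees links without a pipe, which sidesteps the conflict between the two forms.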
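import re

# The optional "page|" prefix is skipped; the group keeps the display text (or the page name).
wikilink_rx = re.compile(r'\[\[(?:[^|\]]*\|)?([^\]]+)\]\]')

def strip_wikilinks(the_string):  # function name is illustrative; the original wrapper was not shown
    return wikilink_rx.sub(r'\1', the_string)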
Example: http://ideone.com/7oxuz
Note: you may find MediaWiki parsers at http://www.mediawiki.org/wiki/alternative_parsers.
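One detail worth checking: with the pattern above, a link written with spaces around the pipe, such as [[name of page | text display]], keeps the leading space of the label. A possible variant that trims that whitespace is sketched below; `wikilink_ws_rx` and `strip_wikilinks_trimmed` are hypothetical names, and trimming is an assumption about the desired output rather than part of the original answer.

```python
import re

# Variant of the pattern above (an assumption, not the original answer):
# \s* on both sides of a lazy capture group drops whitespace next to the
# pipe and the closing brackets.
wikilink_ws_rx = re.compile(r'\[\[(?:[^|\]]*\|)?\s*([^\]]+?)\s*\]\]')

def strip_wikilinks_trimmed(text):
    """Hypothetical helper: replace each wiki link with its trimmed display text."""
    return wikilink_ws_rx.sub(r'\1', text)

print(strip_wikilinks_trimmed("see [[name of page | text display]] here"))
```

Links without extra whitespace, such as [[record producer|produced]], are handled the same way as before.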