regex - Remove non-ASCII characters from a string using Python / Django


I have a string of HTML stored in a database. Unfortunately it contains characters such as ®, and I want to replace these characters with their HTML equivalents, either in the DB itself or with a find-and-replace in my Python / Django code.

Any suggestions on how I can do this?

ASCII characters are the first 128 code points, so you can get the number of each character with ord and strip the character if it is out of range:

# -*- coding: utf-8 -*-

def strip_non_ascii(string):
    '''Returns the string without non-ASCII characters.'''
    # Keep code points 1..126; this also drops NUL (0) and DEL (127).
    stripped = (c for c in string if 0 < ord(c) < 127)
    return ''.join(stripped)

test = u'éáé123456tgreáé@€'
print(test)
print(strip_non_ascii(test))

Result:

éáé123456tgreáé@€
123456tgre@

Please note that @ is included because, well, after all it's an ASCII character. If you want to keep only a particular subset (like numbers and uppercase and lowercase letters), you can limit the range by looking at an ASCII table.
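The subset idea can be sketched as follows. This is a minimal illustration, not part of the original answer: the names ALLOWED and keep_alphanumeric are hypothetical, and the whitelist (ASCII letters and digits) is just one possible choice.

```python
import string

# Hypothetical whitelist: ASCII letters and digits only.
ALLOWED = set(string.ascii_letters + string.digits)

def keep_alphanumeric(text):
    """Return text with every character outside ALLOWED removed."""
    return ''.join(c for c in text if c in ALLOWED)

print(keep_alphanumeric(u'éáé123456tgreáé@€'))  # prints 123456tgre
```

Using an explicit set of allowed characters avoids having to work out numeric ranges by hand, and @ is now stripped along with the accented letters.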

Edited: after reading your question again, maybe you need to escape your HTML code so that those characters appear correctly once rendered. You can use the escape filter in your templates.
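If the goal is to replace characters such as ® with an HTML equivalent rather than drop them, one possible approach in plain Python is the built-in xmlcharrefreplace error handler, which turns each non-ASCII character into a numeric character reference. The sketch below assumes the input is a Unicode string; the function name to_html_entities is hypothetical.

```python
def to_html_entities(text):
    """Replace each non-ASCII character with a numeric reference such as &#174;."""
    # Encoding to ASCII with 'xmlcharrefreplace' substitutes &#NNN; for
    # anything that ASCII cannot represent; decode to get a string back.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')

print(to_html_entities(u'Acme\u00ae widgets'))  # prints Acme&#174; widgets
```

Note that this leaves ASCII characters such as < and & untouched, so it complements HTML escaping and does not replace it.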

