regex - Remove non-ASCII characters from a string using Python / Django
I have a string of HTML stored in a database. Unfortunately it contains characters such as ®, and I want to replace these characters with their HTML equivalent, either in the DB itself or using a find-and-replace in my Python / Django code.
Any suggestions on how I can do this?
You can use the fact that ASCII characters are the first 128 ones: get the number of each character with ord, and strip it if it's out of range.
    # -*- coding: utf-8 -*-

    def strip_non_ascii(string):
        '''Returns the string without non-ASCII characters.'''
        # Keep code points 1..126; this also drops NUL (0) and DEL (127).
        stripped = (c for c in string if 0 < ord(c) < 127)
        return ''.join(stripped)

    test = u'éáé123456tgreáé@€'
    print(test)
    print(strip_non_ascii(test))
Result:

    éáé123456tgreáé@€
    123456tgre@
Please note that @ is included because, well, after all it's an ASCII character. If you want to strip down to a particular subset (like numbers, uppercase, and lowercase letters), you can limit the range by looking at an ASCII table.
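The subset idea could be sketched like this. This is only an illustration, assuming Python 3; the helper name keep_ascii_alnum is made up here, and it builds the allowed set from the standard string module rather than comparing ord ranges by hand.

```python
import string

# Hypothetical helper for illustration; the name is an assumption,
# not something from the original answer.
ALLOWED = set(string.ascii_letters + string.digits)

def keep_ascii_alnum(text):
    """Return text with everything except ASCII letters and digits removed."""
    return ''.join(c for c in text if c in ALLOWED)

print(keep_ascii_alnum(u'éáé123456tgreáé@€'))  # 123456tgre
```

Here @ is dropped as well, since it is neither a letter nor a digit.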
Edited: after reading your question again, maybe you need to escape your HTML code, so that all those characters appear correctly once rendered. You can use the escape filter on your templates.
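If the goal is to keep characters such as ® but make the stored HTML ASCII-only, one option that may help is Python's built-in xmlcharrefreplace error handler, which swaps each non-ASCII character for a numeric character reference. A minimal sketch, assuming Python 3; the helper name to_ascii_html is hypothetical and not part of Django:

```python
def to_ascii_html(markup):
    """Replace non-ASCII characters with numeric character references.

    ASCII text, including existing tags, is left untouched.
    """
    # encode() emits &#NNN; for every character outside ASCII;
    # decode() turns the resulting bytes back into a str.
    return markup.encode('ascii', 'xmlcharrefreplace').decode('ascii')

print(to_ascii_html(u'<p>Acme® caf\u00e9</p>'))  # <p>Acme&#174; caf&#233;</p>
```

Note that this leaves <, > and & alone, so it suits strings that already contain markup you want rendered. The escape filter does the opposite: it escapes those HTML special characters and does not convert characters like ®.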