Comparing strings in Python with difflib
August 12, 2009 | Tags: python | Author: vpsee
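Writing programs in shell is still awkward, so today I rewrote yesterday's Bash script in Python and ran into two small string-related problems:
1. Produce diff-like output that roughly shows where two strings differ. The difflib module handles this.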
#!/usr/bin/python
import difflib

text1 = """http://www.vpsee.com is a website which is dedicated for
building scalable websites on cloud platforms. The keywords are:
Linux, Mac, Cloud Computing, C, Python, MySQL, Nginx, VPS, Performance, Scalability,
Architecture, ..., etc. Have fun!"""
text1_lines = text1.splitlines()

text2 = """http://VPSee.com is a website which is dedicated for
building scalable websites on cloud platforms. The keywords are:
Linux, Mac, Cloud Computing, C, Python, MySQL, Nginx, VPS, Performance, Scalability,
Programming, Optimisation, Architecture, ... , etc. Have fun !"""
text2_lines = text2.splitlines()

# Differ.compare() yields one entry per line: "- " (only in text1),
# "+ " (only in text2), "  " (common to both), or "? " (a hint line
# marking which characters changed within a similar pair of lines)
d = difflib.Differ()
diff = d.compare(text1_lines, text2_lines)
print('\n'.join(diff))
The program prints:
- http://www.vpsee.com is a website which is dedicated for
?        ^^^^^^^

+ http://VPSee.com is a website which is dedicated for
?        ^^^

  building scalable websites on cloud platforms. The keywords are:
  Linux, Mac, Cloud Computing, C, Python, MySQL, Nginx, VPS, Performance, Scalability,
- Architecture, ..., etc. Have fun!
+ Programming, Optimisation, Architecture, ... , etc. Have fun !
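If you would rather have output in the familiar "diff -u" format than Differ's line-by-line listing, the same module also provides difflib.unified_diff. The sketch below is an alternative that the original script does not use; it assumes Python 3, and the shortened sample strings and the "text1"/"text2" labels are illustrative only.

```python
import difflib

# Shortened, hypothetical versions of the two texts above
old = "http://www.vpsee.com\nLinux, Mac, Python\nHave fun!"
new = "http://VPSee.com\nLinux, Mac, Python\nHave fun !"

# unified_diff works on sequences of lines; lineterm="" stops it from
# appending "\n" to the header and hunk lines it generates itself
diff = difflib.unified_diff(
    old.splitlines(),
    new.splitlines(),
    fromfile="text1",
    tofile="text2",
    lineterm="",
)
output = "\n".join(diff)
print(output)
```

This prints a "--- text1" / "+++ text2" header, a "@@ -1,3 +1,3 @@" hunk header, and then each line prefixed with "-", "+" or a space. Unlike Differ, it does not add "?" hint lines for changes inside a line.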
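2. Compare two strings while ignoring differences in case and in the kind or amount of whitespace (spaces, tabs, newlines and so on). This is easy: convert each string to lowercase, split() it, and join the pieces back together with a single space as the separator. Called without arguments, str.split() splits on any run of whitespace and discards leading and trailing whitespace, so both strings are reduced to the same canonical form.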
#!/usr/bin/python
a = " \t\n\n a B C d\t\n\n\n"
b = "\t\t\n\n a b c D\n\n\n\n"

# Lowercase, then split on any run of whitespace and rejoin with single spaces
s1 = a.lower()
s1 = ' '.join(s1.split())
s2 = b.lower()
s2 = ' '.join(s2.split())

if s1 == s2:
    print("==")
else:
    print("!=")
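The same normalization can be wrapped in a small reusable helper. This is a minimal sketch assuming Python 3; the names normalize and equal_ignoring_case_and_space are hypothetical, chosen here for illustration.

```python
def normalize(s):
    """Lowercase s and collapse every run of whitespace into one space."""
    return " ".join(s.lower().split())


def equal_ignoring_case_and_space(x, y):
    """Compare two strings after normalizing case and whitespace."""
    return normalize(x) == normalize(y)


print(equal_ignoring_case_and_space(" \t\n a B C d\t\n", "\t\ta b c D\n\n"))  # True
print(equal_ignoring_case_and_space("a b c d", "abcd"))  # False
```

The second call shows the limit of this approach: whitespace between words is collapsed, not removed, so "a b c d" and "abcd" still differ. If all whitespace should be ignored, joining with "" instead of " " would do that. For non-ASCII text, Python 3's str.casefold() is a more aggressive alternative to lower().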