Comparing strings in Python with difflib

Shell is still awkward for writing programs, so today I rewrote yesterday's Bash script in Python and ran into two small string-related problems:

1. Produce a diff-like result that roughly shows where two strings differ. The difflib module handles this.

#!/usr/bin/python
import difflib

text1 = """http://www.vpsee.com is a website which is dedicated for 
building scalable websites on cloud platforms. The keywords are: Linux, Mac,
Cloud Computing, C, Python, MySQL, Nginx, VPS, Performance, Scalability,
Architecture, ..., etc. Have fun!"""
text1_lines = text1.splitlines()

text2 = """http://VPSee.com is a website which is dedicated for 
building scalable websites on cloud platforms. The keywords are: Linux, Mac,
Cloud Computing, C, Python, MySQL, Nginx, VPS, Performance, Scalability,
Programming, Optimisation, Architecture, ... , etc. Have fun !"""
text2_lines = text2.splitlines()

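# Differ compares two lists of lines; each output line is prefixed with
# '- ' (only in text1), '+ ' (only in text2), '  ' (in both) or '? ' (hint)
d = difflib.Differ()
diff = d.compare(text1_lines, text2_lines)
print('\n'.join(diff))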

The program prints the following:

- http://www.vpsee.com is a website which is dedicated for 
?        ^^^^^^^

+ http://VPSee.com is a website which is dedicated for 
?        ^^^

  building scalable websites on cloud platforms. The keywords are: Linux, Mac,
  Cloud Computing, C, Python, MySQL, Nginx, VPS, Performance, Scalability,
- Architecture, ..., etc. Have fun!
+ Programming, Optimisation, Architecture, ... , etc. Have fun !
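If a patch-style view is preferred, difflib also provides `unified_diff`, which produces the same `---`/`+++`/`@@` layout as `diff -u`. The following is a minimal sketch. The input lines are shortened versions of the text above, chosen for illustration, and the labels `text1`/`text2` are arbitrary names passed as `fromfile`/`tofile`.

```python
import difflib

# Shortened sample lines (illustrative only), already split into lists
text1_lines = ["http://www.vpsee.com is a website", "Linux, Mac,", "Have fun!"]
text2_lines = ["http://VPSee.com is a website", "Linux, Mac,", "Have fun !"]

# lineterm="" because these lines carry no trailing newlines of their own
diff_lines = list(difflib.unified_diff(text1_lines, text2_lines,
                                       fromfile="text1", tofile="text2",
                                       lineterm=""))
print("\n".join(diff_lines))
```

Unlike `Differ`, `unified_diff` shows only changed regions plus a few lines of context (3 by default, set with `n=`), and it emits no `?` hint lines.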

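2. Compare two strings while ignoring differences in case, spaces, tabs, newlines, and so on. This one is easy: convert each string to lowercase, split it on whitespace, then join the pieces back together with a single space as the separator.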

#!/usr/bin/python

a = "  \t\n\n a    B C   d\t\n\n\n"
b = "\t\t\n\n a    b c   D\n\n\n\n"

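# Lowercase, split on any run of whitespace (spaces, tabs, newlines),
# then rejoin the words with single spaces
s1 = a.lower()
s1 = ' '.join(s1.split())
s2 = b.lower()
s2 = ' '.join(s2.split())

if s1 == s2:
    print("==")
else:
    print("!=")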
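The same normalization can be wrapped in a small reusable function. This is a sketch. The names `normalize` and `same_ignoring_case_and_space` are hypothetical, and the code uses `str.casefold()` (Python 3.3+) instead of `lower()`. `casefold()` also folds some non-ASCII case pairs, such as German "ß" to "ss".

```python
def normalize(s):
    """Casefold the string and collapse every run of whitespace
    (spaces, tabs, newlines) into one space, trimming both ends."""
    return " ".join(s.casefold().split())

def same_ignoring_case_and_space(a, b):
    """Hypothetical helper: True if a and b differ only in case/whitespace."""
    return normalize(a) == normalize(b)

a = "  \t\n\n a    B C   d\t\n\n\n"
b = "\t\t\n\n a    b c   D\n\n\n\n"
print("==" if same_ignoring_case_and_space(a, b) else "!=")
```

Collapsing whitespace this way treats "a b" and "a  b" as equal but still treats "ab" and "a b" as different. To ignore whitespace entirely, join with `""` instead of `" "`.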
