Getting Started with Python (IV) - 小攻城師的戰場筆記

【Using database features】

Python 2.5 has built-in SQLite support (the sqlite3 module).
On Debian, typing python starts Python 2.4,
so when you want Python 2.5, run python2.5 instead.

【Library: subprocess】

1. Practice: running Linux commands from Python

#For use on Linux
#Import subprocess whenever you need to call an external program
import subprocess

#Run the command cat /etc/passwd
subprocess.call("cat /etc/passwd".split())
subprocess.call(["cat","/etc/passwd"])
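#The two lines above do the same thing: both pass an argument list, with no shell involved
#For readability, you can also write it as the following two lines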
cmd = "cat /etc/passwd"
subprocess.call(cmd.split())

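#To run cmd = "ls ~username/*"
#the tilde (~) and wildcard (*) are shell syntax, not something ls understands,
#so without a shell they would reach ls literally; shell=True lets the shell expand them
subprocess.call("ls ~username/*", shell=True)
#sort reads its data from stdin and sorts it
subprocess.call("cat /etc/passwd|sort", shell=True)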
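If the ~ and * come from user input, shell=True also lets the shell interpret anything else in that string. Below is a minimal Python 3 sketch of an alternative, assuming you only need tilde and wildcard expansion. It expands them in Python with os.path.expanduser and glob.glob, then hands the resulting list to subprocess without a shell. The helper name list_files is made up for this example, and a temporary directory stands in for ~username.

```python
import glob
import os
import tempfile


def list_files(pattern):
    """Expand ~ and shell-style wildcards in Python instead of using shell=True."""
    # glob does not guarantee any order, so sort for stable output
    return sorted(glob.glob(os.path.expanduser(pattern)))


# A temporary directory stands in for ~username in this sketch.
with tempfile.TemporaryDirectory() as d:
    for name in ("a.txt", "b.txt", "c.log"):
        open(os.path.join(d, name), "w").close()
    txt = [os.path.basename(p) for p in list_files(os.path.join(d, "*.txt"))]

print(txt)
# With a real home directory (check that the list is not empty first):
# subprocess.call(["ls"] + list_files("~username/*"))
```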

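mycat = subprocess.Popen("cat /etc/passwd".split(), stdout=subprocess.PIPE)
#A cat process is now running in the background, writing into a pipe instead of the screen
mysort = subprocess.Popen(["sort"], stdin=mycat.stdout, stdout=subprocess.PIPE)
#sort reads cat's pipe as its stdin; close our copy so cat gets SIGPIPE if sort exits early
mycat.stdout.close()
for i in mysort.stdout:
    #each line already ends with a newline; the trailing comma stops print from adding another
    print i,
mysort.wait()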
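Here is the same cat | sort pipeline as a reusable Python 3 function. To keep it runnable anywhere, two tiny Python child processes (started via sys.executable) stand in for cat and sort. run_pipeline is a hypothetical helper name. The pattern follows the "replacing a shell pipeline" recipe in the subprocess documentation:

- connect the first process's stdout to the second's stdin,
- close the parent's copy of that pipe,
- collect the output with communicate().

```python
import subprocess
import sys


def run_pipeline(first_cmd, second_cmd):
    """Run first_cmd | second_cmd without a shell; return second_cmd's stdout as text."""
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout, stdout=subprocess.PIPE)
    # Drop the parent's handle so `first` is not kept alive by us if `second` exits early.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    return output.decode()


# Stand-ins for `cat` and `sort`: two small Python child processes.
producer = [sys.executable, "-c", "print('pear'); print('apple'); print('fig')"]
sorter = [sys.executable, "-c", "import sys; sys.stdout.write(''.join(sorted(sys.stdin)))"]

result = run_pipeline(producer, sorter)
print(result)
```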

【Library: database applications】
#sqlite3 is only built in from Python 2.5 onward (on 2.4 it has to be installed separately), so run this with python2.5
import sqlite3

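#There are two ways to open a database connection (use one or the other; the second line below replaces the first)
#1. Pass a file name: the database lives in that file, which is created if it does not exist
conn = sqlite3.connect('/tmp/xxx.db')
#2. Keep the database in memory: it disappears when the program ends
conn = sqlite3.connect(':memory:')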

#Get a cursor, which is used to execute SQL statements and fetch their results
c = conn.cursor()

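#Create a table
c.execute('''create table test (name text, email text)''')
#To inspect the table, go back to the Linux command line,
#run sqlite3 /tmp/xxx.db, and type .dump at the sqlite> prompt (this only works with the file-based connection, not :memory:)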

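#Insert records; the ? placeholders let sqlite3 quote the values safely
c.execute('insert into test (name, email) values (?, ?)', ('testname', 'test@email.com'))
c.execute('insert into test (name, email) values (?, ?)', ('username', 'user@email.com'))
#Python's sqlite3 module opens a transaction implicitly before data-modifying statements, so you must commit
conn.commit()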
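This Python 3 sketch does the same inserts in one call with executemany. It runs against an in-memory database, so it leaves no file behind. The sample rows are the ones from the post, and fetchall() returns the stored rows as a list of tuples.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("create table test (name text, email text)")

# executemany runs the same INSERT once per tuple.
rows = [
    ("testname", "test@email.com"),
    ("username", "user@email.com"),
]
c.executemany("insert into test (name, email) values (?, ?)", rows)
conn.commit()

# fetchall() returns every result row as a list of tuples.
c.execute("select name, email from test order by name")
stored = c.fetchall()
for name, email in stored:
    print(name + " => " + email)
conn.close()
```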

c.execute('select * from test')
for i in c:
    print i[0] + "=>" + i[1]

#Change the email of the row whose name is testname
c.execute('update test set email = ? where name = ?', ('hahaha@email.com', 'testname'))
conn.commit()
new_row = c.execute('select * from test')
for i in new_row:
    print i[0], i[1]

#Delete the row whose email is user@email.com
c.execute('delete from test where email = ?', ('user@email.com',))
conn.commit()
new_row = c.execute('select * from test')
for i in new_row:
    print i[0], i[1]
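To confirm that an UPDATE or DELETE actually changed something, you can check cursor.rowcount. It reports how many rows the last statement affected. In this Python 3 sketch with an in-memory database, the first UPDATE filters on an address no row has yet, so it touches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("create table test (name text, email text)")
c.executemany("insert into test (name, email) values (?, ?)",
              [("testname", "test@email.com"), ("username", "user@email.com")])
conn.commit()

# rowcount = number of rows the last UPDATE/DELETE touched; 0 means the WHERE matched nothing.
c.execute("update test set name = ? where email = ?", ("testname", "hahaha@email.com"))
missed = c.rowcount
c.execute("update test set email = ? where name = ?", ("hahaha@email.com", "testname"))
updated = c.rowcount
c.execute("delete from test where email = ?", ("user@email.com",))
deleted = c.rowcount
conn.commit()

remaining = c.execute("select name, email from test").fetchall()
print(missed, updated, deleted, remaining)
conn.close()
```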

【Library: parsing a configuration file】
For a complete tutorial, see
http://williewu.blogspot.com/2007/10/python-configparser.html

1. First prepare a file named test.conf with the following content
[admin]

root = www.google.com
age = 100

2. The program:

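from ConfigParser import SafeConfigParser
#Take SafeConfigParser from the ConfigParser module (Python 2; in Python 3 the module is named configparser)

parser = SafeConfigParser()
parser.read('test.conf')
#Read and parse the config file
parser.get('admin','root')
#From the parsed sections, get the value of root in the [admin] section
parser.getint('admin','age')
#Get the value of age directly as an integer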
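Here is a Python 3 version of the test.conf example. In Python 3 the module is configparser (lowercase), and its ConfigParser class is the usual choice. The sketch uses read_string instead of read so it needs no file on disk.

```python
import configparser

# Same content as test.conf above, inlined so the sketch needs no file.
conf_text = """
[admin]

root = www.google.com
age = 100
"""

parser = configparser.ConfigParser()
parser.read_string(conf_text)

root = parser.get("admin", "root")    # always returns a string
age = parser.getint("admin", "age")   # converts the value to int
print(root, age + 1)
```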

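【Library: sending mail with smtplib】

import smtplib
#If a mail server is already set up on this machine, you can use it directly
s = smtplib.SMTP('localhost')
#The third argument is the raw message text; headers such as Subject: must be part of it
s.sendmail('sender_mail','recipient_mail','content')
s.quit()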
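sendmail() sends its third argument as-is, so any Subject or other headers must already be in that text. This Python 3 sketch builds a message with headers using email.message.EmailMessage. build_message is a hypothetical helper and the addresses are placeholders. The actual send is commented out because it needs a running SMTP server.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a message with proper headers for smtplib to send."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)  # plain-text body
    return msg


# Placeholder addresses for illustration only.
msg = build_message("sender@example.com", "staff@example.com", "Test", "hello world!")
print(msg.as_string())

# Sending needs a reachable SMTP server, so it is left commented out:
# import smtplib
# with smtplib.SMTP("localhost") as s:
#     s.send_message(msg)
```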

【Practice: parse a config file and send mail to the person listed in it】

1. First prepare a file named motify.conf with the following content
[admin]

source = xxxmail
staff = ooomail
msg = hello world!

2. The program:

from ConfigParser import SafeConfigParser
import smtplib
parser = SafeConfigParser()
parser.read('motify.conf')
li = [parser.get('admin','source'), parser.get('admin','staff'), parser.get('admin', 'msg')]
s = smtplib.SMTP('localhost')
s.sendmail(li[0],li[1],li[2])
s.quit()

【Regular expressions】

Testing tool
http://osteele.com/tools/rework/

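1. In a regular expression, a dot (.) stands for any single character.
e.g.
a...e means exactly three arbitrary characters between a and e
ae ← no match
a12e ← no match, only two characters in between
abcde ← match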
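You can check the dot examples with a short Python 3 sketch. The examples describe whole strings, so it uses re.fullmatch (Python 3.4+), which succeeds only if the entire text matches. re.search, by contrast, would also report a match inside a longer string. matches is a hypothetical helper.

```python
import re


def matches(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


results = {s: matches(r"a...e", s) for s in ["ae", "a12e", "abcde"]}
print(results)
```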

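2. Square brackets list specific characters; that position matches if any one of them appears there.
e.g.
a[abc]e
Matches: aae, abe, ace
No match: aze

3. Ranges inside brackets: a-z for lowercase letters, A-Z for uppercase letters, 0-9 for digits
e.g.
a[0-9a-zA-Z]e
matches a, then exactly one letter or digit, then e (e.g. a5e, aBe); ae, a12e, and a#e do not match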
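This Python 3 sketch checks the bracket examples from items 2 and 3 with re.fullmatch, so the whole string must match. The extra test strings (a5e, aBe, a12e, a#e) are illustrative.

```python
import re

# Each bracket expression matches exactly one character.
cases = {
    "aae": r"a[abc]e",
    "aze": r"a[abc]e",
    "a5e": r"a[0-9a-zA-Z]e",
    "aBe": r"a[0-9a-zA-Z]e",
    "a12e": r"a[0-9a-zA-Z]e",
    "a#e": r"a[0-9a-zA-Z]e",
}
results = {text: re.fullmatch(pat, text) is not None for text, pat in cases.items()}
print(results)
```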

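4. Inside square brackets, a dot (.) is just a literal dot.
e.g.
a[.]e
only a.e matches

5. When ^ is the first character inside the brackets, it means "not"
e.g.
a[^0-9a-zA-Z]e matches when the single character between a and e is anything other than a letter or digit

6. When ^ appears anywhere else inside the brackets, it stands for the ^ character itself
e.g.
a[0-9^a-zA-Z]e matches when a and e have either a ^ or one letter or digit between them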
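This Python 3 sketch runs the same kind of check for items 4 to 6: a literal dot, a negated class, and a literal ^. check is a hypothetical helper using re.fullmatch.

```python
import re


def check(pattern, texts):
    """Map each text to whether the whole text matches pattern."""
    return {t: re.fullmatch(pattern, t) is not None for t in texts}


literal_dot = check(r"a[.]e", ["a.e", "abe"])
negated = check(r"a[^0-9a-zA-Z]e", ["a#e", "a5e"])
caret = check(r"a[0-9^a-zA-Z]e", ["a^e", "a5e", "a#e"])
print(literal_dot, negated, caret)
```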

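7. Quantifiers:
An asterisk (*) means zero or more of the preceding item
e.g.
a.*z matches a word that starts with a and ends with z
ab*z matches a, then zero or more b's, then z (az, abz, abbbz)
ab.*z matches a word that starts with ab and ends with z

A question mark (?) means zero or one
w.?e
Matches: we, wie
No match: willie

A plus sign (+) means one or more
e.g.
a.+z matches when one or more characters appear between a and z

To require one or more arbitrary characters that are not uppercase letters between a and z,
write: a[^A-Z]+z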
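This Python 3 sketch tests the quantifier examples with re.fullmatch, so each pattern has to cover the whole test string. Note that ab*z also accepts az, because * allows zero b's.

```python
import re


def check(pattern, texts):
    """Map each text to whether the whole text matches pattern."""
    return {t: re.fullmatch(pattern, t) is not None for t in texts}


star = check(r"ab*z", ["az", "abz", "abbbz", "acz"])
question = check(r"w.?e", ["we", "wie", "willie"])
plus = check(r"a.+z", ["az", "abz"])
not_upper = check(r"a[^A-Z]+z", ["abcz", "aBz"])
print(star, question, plus, not_upper)
```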

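8. Curly braces specify exactly how many times the preceding item repeats
ab{5}z ← only abbbbbz matches
ab{1,5}z ← b appears at least once and at most five times
a[A-Z]{1,5}z ← one to five uppercase letters in between

9. Equivalent regular expressions:
b{1,} = b+
b{0,} = b*
b{0,1} = b?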
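This Python 3 sketch checks the brace examples with re.fullmatch. The last expression confirms, on a few strings, that b{1,} and b+ behave the same.

```python
import re


def check(pattern, texts):
    """Map each text to whether the whole text matches pattern."""
    return {t: re.fullmatch(pattern, t) is not None for t in texts}


exact = check(r"ab{5}z", ["abbbbz", "abbbbbz", "abbbbbbz"])          # 4, 5, 6 b's
ranged = check(r"ab{1,5}z", ["az", "abz", "abbbbbz", "abbbbbbz"])    # 0, 1, 5, 6 b's
upper = check(r"a[A-Z]{1,5}z", ["aXYz", "az", "aABCDEFz"])           # 2, 0, 6 letters
# b{1,} and b+ accept exactly the same strings.
same = all(
    (re.fullmatch(r"ab{1,}z", t) is None) == (re.fullmatch(r"ab+z", t) is None)
    for t in ["az", "abz", "abbz"]
)
print(exact, ranged, upper, same)
```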

10. To match a special character literally, put a backslash in front of it
a\.b
a\[b
a\\b
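In Python source code, regex patterns are normally written as raw strings (r"...") so the backslashes reach the regex engine unchanged. This Python 3 sketch tries the three escapes above. It also shows re.escape(), which inserts the backslashes automatically when the literal text comes from elsewhere.

```python
import re

# Raw strings (r"...") keep the backslash for the regex engine.
dot = re.fullmatch(r"a\.b", "a.b") is not None
dot_not_any = re.fullmatch(r"a\.b", "axb") is not None
bracket = re.fullmatch(r"a\[b", "a[b") is not None
backslash = re.fullmatch(r"a\\b", "a\\b") is not None   # the text is: a, backslash, b

# re.escape() adds the needed backslashes to arbitrary literal text.
pattern = re.escape("1+1=2?")
escaped_ok = re.fullmatch(pattern, "1+1=2?") is not None
print(dot, dot_not_any, bracket, backslash, escaped_ok)
```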

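11. Grouping: wrap part of the pattern in parentheses
a(abc)*z ← starts with a, ends with z, with "abc" repeated any number of times in between
Parentheses also remember (capture) the text they matched
e.g.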
import re
m = re.search('it is (fine (today))', 'it is fine today')
m.group(0)
m.group(1)
m.group(2)
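#In the interactive interpreter these show, in order: the whole match 'it is fine today', the first group counting opening parentheses from the left 'fine today', and the second group 'today'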

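12. Shorthand character classes:
http://www.amk.ca/python/howto/regex/
\w = [a-zA-Z0-9_] (the default in Python 2)
\s = [ \t\n\r\f\v] (the first character is a space)
\d = [0-9]
So the IP pattern can be rewritten as \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
and shortened further to \d{1,3}(\.\d{1,3}){3}

The uppercase versions mean the opposite:
for example, \D is any non-digit, \W is any character other than a letter, digit, or underscore, and \S is any non-whitespace character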
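This Python 3 sketch applies both IP patterns to a made-up log line. It swaps the capturing group in the short form for a non-capturing (?:...) group, since the group is only needed for the {3} repeat. As the last check shows, \d{1,3} limits only the number of digits, not their value.

```python
import re

long_form = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
# (?:...) groups without capturing, so {3} repeats ".ddd" without creating group 1.
short_form = r"\d{1,3}(?:\.\d{1,3}){3}"

line = "Failed password for root from 200.1.23.45 port 22"   # made-up sample line
found_long = re.search(long_form, line).group()
found_short = re.search(short_form, line).group()

# \d{1,3} only counts digits, so an impossible address like this still matches.
loose = re.fullmatch(short_form, "999.999.999.999") is not None
print(found_long, found_short, loose)
```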

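13. \b matches a word boundary: the position between a word character and a non-word character (such as a space), or the start or end of the text
So in "Will and Willie are good friends."
Will\b finds Will without also matching the start of Willie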
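Here is a quick Python 3 check of the Will\b example using re.findall and re.finditer.

```python
import re

text = "Will and Willie are good friends."
without_boundary = re.findall(r"Will", text)       # also hits the start of "Willie"
with_boundary = re.findall(r"Will\b", text)        # only the standalone word
positions = [m.start() for m in re.finditer(r"Will\b", text)]
print(without_boundary, with_boundary, positions)
```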

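14. Regular expressions are "greedy" by default

import re
reg = re.search("t.*d", "today is fine")
reg.group()

Here .* first runs all the way to the end of the text, then backtracks until it finds a d, and finally returns 'tod'.
That backtracking costs time. When the text contains several d's, greedy matching also returns the longest possible match rather than the shortest.
Adding ? after a quantifier makes it non-greedy (lazy):
.*? ← any character, any number of times, non-greedy
.+? ← any character, one or more times, non-greedy

An explanation of non-greedy matching
http://www.gais.com.tw/article.php?u=DeeR&i=20080225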
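In this Python 3 sketch, greedy and non-greedy matching return different text because the input contains more than one d.

```python
import re

text = "today is a good day"
greedy = re.search(r"t.*d", text).group()    # runs to the last d it can reach
lazy = re.search(r"t.*?d", text).group()     # stops at the first d
print(greedy)
print(lazy)
```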

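15. Splitting a string that has no obvious delimiter, then joining the pieces back together
e.g.
import re
text = "willie123good456"
"".join(re.split(r"\d+", text))
#re.split gives ['willie', 'good', ''], so the result is 'williegood'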

16. Parsing XML and HTML with packages other people have written

HTML:
Beautiful Soup
http://www.crummy.com/software/BeautifulSoup/

Notes on parsing HTML
http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing%20HTML

XML:
ElementTree
http://effbot.org/zone/element-index.htm
Notes on parsing XML
http://docs.python.org/lib/module-xml.etree.ElementTree.html

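【Practice: find suspicious IPs in a log】
#!/usr/bin/python
#Assume that IPs starting with 200 are the suspicious ones

import re
f = open('/tmp/auth.log')
for i in f:
        regex = re.search(r'200\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}', i)
        if regex:
                print regex.group()
f.close()

#re.search scans the input for the first place the regular expression matches and returns None if there is none
#The first argument is the regular expression, the second is the text to search
#regex.group() defaults to group 0, the whole match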
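This Python 3 sketch does the same scan as a generator, so it can be tested without /tmp/auth.log. It makes three changes that are additions, not the post's method:

- finditer reports every 200.x.x.x address on a line, not just the first.
- \b keeps the pattern from starting in the middle of a longer number.
- The sample log lines are invented.

```python
import re

SUSPICIOUS_IP = re.compile(r"\b200\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")


def suspicious_ips(lines):
    """Yield every 200.x.x.x address found in the given lines."""
    for line in lines:
        for match in SUSPICIOUS_IP.finditer(line):
            yield match.group()


# Invented auth.log lines standing in for /tmp/auth.log.
sample = [
    "sshd[101]: Failed password for root from 200.12.3.4 port 22 ssh2",
    "sshd[102]: Accepted password for willie from 192.168.1.5 port 22 ssh2",
    "sshd[103]: Failed password for admin from 200.99.8.7 port 22 ssh2",
]
found = list(suspicious_ips(sample))
print(found)

# With the real file, the same generator works on the open file object:
# with open("/tmp/auth.log") as f:
#     for ip in suspicious_ips(f):
#         print(ip)
```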

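tips:
Python remembers (captures) whatever is wrapped in parentheses,
so if you only want the last part of the IP, rewrite it like this:

import re
f = open('/tmp/auth.log')
for i in f:
        regex = re.search(r'200\.[0-9]{1,3}\.[0-9]{1,3}\.([0-9]{1,3})', i)
        #Parentheses added around the last part
        if regex:
                print regex.group(1)
                #Pass the index of the parenthesized group (counting from 1)
f.close()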

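【Practice: find the accounts that invalid users tried to break into, from the log】

#!/usr/bin/python

import re
f = open('/tmp/auth.log')
for i in f:
        #(\S+) stops before the newline (or a following " port ..."), so print needs no trailing comma
        regex = re.search(r'Invalid user ([^ ]+) from (\S+)', i)
        if regex:
                print regex.group(1) + " => " + regex.group(2)
f.close()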

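【Practice: find the accounts that invalid users tried to break into, from the log (better performance)】

#!/usr/bin/python

import re
f = open('/tmp/auth.log')
#Compile the pattern once, before the loop (re also caches recently used patterns, so the gain is usually modest)
rec = re.compile(r'Invalid user ([^ ]+) from (\S+)')
for i in f:
        regex = rec.search(i)
        if regex:
                print regex.group(1) + " => " + regex.group(2)
f.close()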

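【Practice: find the accounts that invalid users tried to break into, from the log (shorter regular expression)】

#!/usr/bin/python

import re
f = open('/tmp/auth.log')
#\w+ is shorter, but lines whose username contains characters such as . or - will no longer match
rec = re.compile(r'Invalid user (\w+) from (\S+)')
for i in f:
        regex = rec.search(i)
        if regex:
                print regex.group(1) + " => " + regex.group(2)
f.close()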
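This Python 3 sketch turns the invalid-user scan into a function that returns (username, ip) pairs. invalid_attempts and the sample lines are hypothetical. It uses \S+ for both fields, so a username such as john.doe, which \w+ cannot match, is still captured, and the IP never includes the trailing newline.

```python
import re

INVALID_USER = re.compile(r"Invalid user (\S+) from (\S+)")


def invalid_attempts(lines):
    """Return (username, ip) pairs from sshd 'Invalid user' lines."""
    pairs = []
    for line in lines:
        match = INVALID_USER.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


# Hypothetical log lines; the second has a dotted username and a trailing port.
sample = [
    "sshd[201]: Invalid user oracle from 200.1.2.3\n",
    "sshd[202]: Invalid user john.doe from 10.0.0.8 port 51000\n",
    "sshd[203]: Accepted publickey for willie from 192.168.1.5\n",
]
pairs = invalid_attempts(sample)
for user, ip in pairs:
    print(user + " => " + ip)
```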

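【Practice: extract parts of an HTML page】

#Beautiful Soup 3 (Python 2); with Beautiful Soup 4 the import is from bs4 import BeautifulSoup
from BeautifulSoup import BeautifulSoup
f = open('test.htm')
html = f.read()
f.close()
soup = BeautifulSoup(html)
soup.html.body.span.string                  #text inside the first span tag
soup.html.body.a.string                     #by default, the text inside the first a tag found
soup.html.body('div')[1].a.string           #the a tag inside the second div
soup.html.body.div.a['href']                #the href attribute of the a tag in the first div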

【Practice: extract parts of an XML document】(for Python 2.5)

from xml.etree.ElementTree import XML
myxml = open('test.xml').read()
seek = XML(myxml)
seek.getchildren()                          #list the child nodes of seek (list(seek) in later versions)
seek.find('staff').find('name').text        #text of name inside the first staff child
for i in seek.findall('staff'):             #find all staff children
        print i.find('name').text           #print the name inside each staff
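The original test.xml is not shown, so this Python 3 sketch parses a made-up document with the same staff/name layout. getchildren() no longer exists in current Python (it was removed in 3.9). Iterating over the element, as below, lists its children instead.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for test.xml.
xml_text = """
<company>
  <staff><name>Willie</name><age>30</age></staff>
  <staff><name>Alice</name><age>25</age></staff>
</company>
""".strip()

root = ET.fromstring(xml_text)
children = [child.tag for child in root]             # replaces getchildren()
first_name = root.find("staff").find("name").text    # first staff's name
all_names = [s.findtext("name") for s in root.findall("staff")]
print(children, first_name, all_names)
```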

Posted by 小攻城師 on PIXNET
