台灣美食家: August 2017

Tuesday, August 29, 2017

[Apache] HttpOnly: keeping JavaScript from reading cookies

If a site turns out to have an XSS vulnerability, an attacker can use JavaScript to steal the session ID stored in your cookie and then hijack the account.

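Cookies are issued through the Set-Cookie header of an HTTP(S) response. Adding the HttpOnly attribute to Set-Cookie hides the cookie from JavaScript, and adding the Secure attribute ensures the cookie is only sent over HTTPS, never over plain HTTP.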

For PHP sessions, the simplest way is a php.ini setting, which makes PHP add HttpOnly to the session cookie's Set-Cookie header:
session.cookie_httponly = 1
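
To make the two attributes concrete, here is a minimal Python sketch that builds a Set-Cookie header line carrying both flags, using the standard library's http.cookies module. The cookie name PHPSESSID, the value abc123, and the helper name build_session_cookie are illustrative assumptions, not something the server configuration above produces verbatim.

```python
from http.cookies import SimpleCookie


def build_session_cookie(name, value):
    """Return a Set-Cookie header line with HttpOnly and Secure set."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["httponly"] = True  # hide the cookie from document.cookie
    cookie[name]["secure"] = True    # only send the cookie over HTTPS
    return cookie.output(header="Set-Cookie:")


# Hypothetical session cookie, for illustration only.
header_line = build_session_cookie("PHPSESSID", "abc123")
print(header_line)
```

A browser that receives a header like this keeps the cookie out of reach of page scripts and leaves it out of plain HTTP requests.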

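You can also set the header through apache2 (run sudo apache2 -v to check your Apache version). The Header directive needs mod_headers to be enabled. Edit the security configuration file: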

vim /etc/apache2/conf-available/security.conf

For Apache 2.2.4 or later:

Header edit Set-Cookie ^(.*)$ $1;HttpOnly;Secure

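For Apache versions earlier than 2.2.4, which lack Header edit (note that Header set replaces the whole Set-Cookie value instead of appending to it, so check the resulting response headers before relying on this):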

Header set Set-Cookie HttpOnly;Secure

sudo service apache2 restart  # restart Apache to load the new settings

After this, JavaScript can no longer read the cookie, and the cookie is no longer sent over insecure HTTP connections.
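
The effect of the Header edit rule can be sketched in Python. This is only an approximation of what the directive does to one header value, assuming a sample cookie string; the helper names add_cookie_flags and has_flag are hypothetical and are not part of Apache.

```python
import re


def add_cookie_flags(set_cookie_value):
    """Mimic `Header edit Set-Cookie ^(.*)$ $1;HttpOnly;Secure` on one value."""
    return re.sub(r"^(.*)$", r"\1;HttpOnly;Secure", set_cookie_value, count=1)


def has_flag(set_cookie_value, flag):
    """Return True if the cookie's attribute list contains the given flag."""
    attributes = [part.strip().lower() for part in set_cookie_value.split(";")[1:]]
    return flag.lower() in attributes


# Sample header value, assumed for illustration.
original = "PHPSESSID=abc123; path=/"
rewritten = add_cookie_flags(original)
print(rewritten)
```

The same has_flag check can be pointed at a Set-Cookie value copied from a real response to confirm that the new configuration took effect.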

Saturday, August 19, 2017

[Python] Google Alerts crawler tutorial in Python

Target page: Google Alerts (https://www.google.com.tw/alerts?hl=zh-tw)

Goal: use a crawler to fetch the search results that Google Alerts shows for a given keyword.

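1. Determine whether the target page is static or dynamic. A static page has its data written directly in the HTML source returned when the page first loads; such pages are easier to crawl because the data can be fetched without simulating a browser. Google Alerts, however, is a dynamic page, which takes more work: you first have to drive a simulated browser to the Google Alerts screen and type in the keyword before the result data becomes available.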

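2. Use the Python package Beautiful Soup to parse the page's HTML DOM structure. Beautiful Soup is a solid, widely used crawling tool, so answers to technical problems are easy to find online. Installation differs between Python 2 and Python 3 (see the official website); on Windows you can install the package with pip install beautifulsoup4.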

3. Use Python (through Selenium WebDriver) to drive Chrome and open the Google Alerts page. Example code:

from selenium import webdriver

# Launch a Chrome window and open the Google Alerts page.
driver = webdriver.Chrome()
driver.get("https://www.google.com.tw/alerts")

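4. Type the keyword into the Google Alerts page automatically. query_div is the id of the element that wraps the page's search input, send_keys simulates typing, and keyword is the term you want to search for. Once the keyword is entered, the page displays the search results.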

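from selenium.webdriver.common.by import By

sbox = driver.find_element(By.ID, "query_div")
# The leading "." keeps the XPath search inside sbox instead of the whole page.
search_input = sbox.find_element(By.XPATH, ".//input[1]")
search_input.send_keys(keyword)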

5. Parse the result data and store it in the database.

import re
from bs4 import BeautifulSoup

SPACE_RE = re.compile(r"\s+")  # assumption: the original post does not show this pattern

soup = BeautifulSoup(driver.page_source, "html.parser")
for obj in soup.find_all("li", attrs={"class": "result"}):
    result_title_link = obj.find("a", attrs={"class": "result_title_link"})
    print(SPACE_RE.sub("", result_title_link.text))  # title
    print(result_title_link["href"])  # url

    result_source_data = ""
    result_source = obj.find("div", attrs={"class": "result_source"})
    if result_source is not None:
        result_source_data = SPACE_RE.sub("", result_source.text)
        print(result_source_data)  # source

    img_data = ""
    img = obj.find("img")
    if img is not None:
        img_data = img["src"]
        print(img_data)  # image

    snippet = ""
    snippet_tag = obj.find("span", attrs={"class": "snippet"})
    if snippet_tag is not None:
        snippet = SPACE_RE.sub("", snippet_tag.text)
        print(snippet)  # summary

The code above prints each result's title, link, source, image, and summary.

Storing that data in the database finishes the job.
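
The extraction step can be tried without a browser on a small static sample. The sketch below swaps Beautiful Soup for the standard library's html.parser so that it runs with no extra packages. The sample HTML, the SPACE_RE pattern, and the names TitleLinkParser and extract_title_links are all assumptions made for illustration, and unlike the loop above it collapses whitespace to single spaces rather than removing it.

```python
import re
from html.parser import HTMLParser

SPACE_RE = re.compile(r"\s+")  # assumption: collapse runs of whitespace

# Hypothetical markup that imitates the class names used in the tutorial.
SAMPLE_HTML = """
<ul>
  <li class="result">
    <a class="result_title_link" href="https://example.com/a"> First  title </a>
    <div class="result_source">Example News</div>
  </li>
  <li class="result">
    <a class="result_title_link" href="https://example.com/b">Second title</a>
  </li>
</ul>
"""


class TitleLinkParser(HTMLParser):
    """Collect (title, url) pairs from <a class="result_title_link"> tags."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None  # set while inside a matching <a> tag
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "result_title_link" in classes:
            self._href = attrs.get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = SPACE_RE.sub(" ", "".join(self._text)).strip()
            self.results.append((title, self._href))
            self._href = None


def extract_title_links(html):
    """Return every (title, url) pair found in the given HTML string."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.results


print(extract_title_links(SAMPLE_HTML))
```

Running the same function on driver.page_source would be one way to check the class names before writing the database code.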

6. Python MySQL database example

import datetime
import re
import time

import mysql.connector

config = {
    'user': 'db_user',
    'password': 'db_password',
    'host': 'host_url',
    'database': 'your_database',
    'charset': 'utf8',
    'use_unicode': True,
}

cnx = mysql.connector.connect(**config)
cursor = cnx.cursor(buffered=True)

# Read every keyword first, so the cursor can be reused for the INSERTs below.
cursor.execute("select name from google_alert_keywords")
keywords = [row[0] for row in cursor.fetchall()]

for keyword in keywords:
    # Scrape the results for this keyword ... (omitted)

    # Store one result in the database.
    ts = time.time()  # assumption: ts is the current Unix timestamp; the original does not define it
    data_obj = {
        'keyword': keyword,
        'title': re.sub(SPACE_RE, '', result_title_link.text),
        'source': result_source_data,
        'img': img_data,
        'url': result_title_link["href"],
        'detail': snippet,
        'time': datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S'),
    }
    try:
        cursor.execute(
            "INSERT INTO google_alert_keywords_crawler "
            "(keyword, title, source, img, url, detail, time) "
            "VALUES (%(keyword)s, %(title)s, %(source)s, %(img)s, %(url)s, %(detail)s, %(time)s)",
            data_obj,
        )
        cnx.commit()  # mysql.connector does not autocommit by default
    except mysql.connector.Error:
        print(cursor.statement)  # show the statement that failed
        raise
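
The named-parameter insert can be tried without a MySQL server. The sketch below uses the standard library's sqlite3 as a stand-in, so the placeholders are written :name instead of %(name)s. The table schema, the sample row, and the helper name save_result are assumptions, since the original post does not show the table definition.

```python
import datetime
import sqlite3
import time

# Hypothetical schema: the original post does not show the table definition.
SCHEMA = """
CREATE TABLE google_alert_keywords_crawler (
    keyword TEXT, title TEXT, source TEXT, img TEXT,
    url TEXT, detail TEXT, time TEXT
)
"""

INSERT_SQL = (
    "INSERT INTO google_alert_keywords_crawler "
    "(keyword, title, source, img, url, detail, time) "
    "VALUES (:keyword, :title, :source, :img, :url, :detail, :time)"
)


def save_result(conn, data_obj):
    """Insert one scraped result using named parameters, then commit."""
    conn.execute(INSERT_SQL, data_obj)
    conn.commit()


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(SCHEMA)

ts = time.time()
save_result(conn, {
    "keyword": "python",
    "title": "Example title",
    "source": "Example News",
    "img": "",
    "url": "https://example.com/a",
    "detail": "Example summary",
    "time": datetime.datetime.fromtimestamp(ts).strftime("%Y-%m-%d %H:%M:%S"),
})

rows = conn.execute(
    "SELECT keyword, title, url FROM google_alert_keywords_crawler"
).fetchall()
print(rows)
```

Passing the values as a dictionary, rather than formatting them into the SQL string, leaves quoting and escaping to the database driver.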

[Tutorial] HiCloud S3

1. First obtain your HiCloud S3 access key ID and secret key; both are available after logging in to the official website.

2. Follow the S3 JavaScript SDK. An example:

AWS.config.update({
    accessKeyId: 'accessKeyId',         // replace with your access key ID
    secretAccessKey: 'secretAccessKey', // replace with your secret key
    sslEnabled: false,
    endpoint: 's3.hicloud.net.tw',
    s3ForcePathStyle: false,
    region: 'us'
});

var s3 = new AWS.S3();

Fill in the access key ID and secret key and point endpoint at the HiCloud S3 server, and the S3 connection is ready to use.

3. List the files and folders in an S3 bucket.

var params = {
    Bucket: 'Your_Bucket_Name',
    Prefix: ''
};

s3.listObjects(params, function (err, data) {
    if (err) throw err;
    console.log(data);
});

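In params, set Bucket to the bucket you want to read. Prefix is for buckets that contain folders: to list only a particular folder, pass its path as the prefix. An empty string lists everything in the bucket, starting from its root.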
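
How a prefix narrows a listing can be simulated in a few lines of Python. This is a sketch of the matching rule only, not a call to the S3 API: the key names and the function list_objects are made up for illustration. The optional delimiter argument mirrors the S3 listing parameter Delimiter, which the example above does not use.

```python
def list_objects(all_keys, prefix="", delimiter=None):
    """Simulate an S3 listing: return (matching keys, rolled-up folder prefixes)."""
    contents = []
    common_prefixes = set()
    for key in sorted(all_keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Keys below a sub-folder are grouped under that folder's prefix.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common_prefixes)


# Hypothetical bucket contents.
keys = ["a.txt", "photos/1.jpg", "photos/2017/2.jpg", "docs/readme.md"]

everything = list_objects(keys)
photos_only = list_objects(keys, prefix="photos/")
photos_top_level = list_objects(keys, prefix="photos/", delimiter="/")
print(everything, photos_only, photos_top_level)
```

S3 has no real folders; a folder is just a shared key prefix, which is why listing by prefix behaves like listing a directory.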

4. Upload a file to the S3 server.

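var objKey = 'upload_file_name';

var params = {
    Bucket: 'Your_Bucket_Name',
    Key: 'save_path/' + objKey,
    ContentType: file.type, // MIME type of the file
    Body: file,             // the file object being uploaded
    ACL: 'private'          // access permission
};

s3.putObject(params, function (err, data) {
    if (err) {
        console.error(err); // report the failed upload
    }
});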

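Bucket is the bucket that stores the file, Key is the file's path inside the bucket, and ACL sets the file's permission: private keeps it non-public, public-read makes it publicly readable, and so on.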

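5. S3 CORS settings

Every bucket can have a CORS configuration. It controls which origins (web sites) may access your files through cross-origin browser requests, and with which HTTP methods. It is configured as follows.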

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<AllowedMethod>DELETE</AllowedMethod>

</CORSRule>

</CORSConfiguration>
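
For reference, a complete CORS document of this shape can be generated with a short Python sketch. The element names follow the S3 CORS format, but the origin, method list, and allowed header below are assumed sample values and not the settings from the original post; build_cors_config is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_cors_config(origins, methods, headers=("*",)):
    """Build a single-rule CORSConfiguration document as an XML string."""
    root = ET.Element("CORSConfiguration")
    rule = ET.SubElement(root, "CORSRule")
    for origin in origins:
        ET.SubElement(rule, "AllowedOrigin").text = origin
    for method in methods:
        ET.SubElement(rule, "AllowedMethod").text = method
    for header in headers:
        ET.SubElement(rule, "AllowedHeader").text = header
    return ET.tostring(root, encoding="unicode")


# Sample values, assumed for illustration.
xml_text = build_cors_config(
    origins=["https://www.example.com"],
    methods=["GET", "PUT", "DELETE"],
)
print(xml_text)
```

Listing one specific origin, instead of a wildcard, limits cross-origin access to pages served from that site.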
