Recently our websites have been hacked and injected with malicious code frequently and on a large scale. After replacing the compromised files we need to keep checking whether a site gets reinfected. Since there are many sites, this script detects whether a site has been hacked again, saving the work of opening each site and inspecting its TDK (title, description, keywords) with F12 every time.
A multi-threaded Python script:
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
import threading
import queue
import time

# Read the URLs to check, one per line.
with open('url.txt') as f:
    l = f.readlines()

# Serialize writes to the report file across worker threads.
write_lock = threading.Lock()

def btdk(url):
    """Fetch a page and return its title, keywords and description."""
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        # On a failed request, fall back to an empty TDK skeleton so the
        # URL still appears in the report.
        html = ('<html><title>%s</title>'
                '<meta name="keywords" content="" />'
                '<meta name="description" content="" /></html>' % url)
    # Lower-casing the markup makes the attribute matching below
    # case-insensitive (it also lower-cases the extracted text).
    soup = BeautifulSoup(html.lower(), 'html.parser')
    t = soup.title.text if soup.title else ''
    try:
        k = soup.find(attrs={"name": "keywords"})['content']
    except (TypeError, KeyError):
        k = ''
    try:
        d = soup.find(attrs={"name": "description"})['content']
    except (TypeError, KeyError):
        d = ''
    return t, k, d

class MyThread(threading.Thread):
    def __init__(self, q):
        threading.Thread.__init__(self)
        self.q = q

    def run(self):
        while True:
            url = self.q.get()
            t, k, d = btdk(url)
            # One line per URL: url#title#keywords#description
            with write_lock:
                with open('tdk.txt', 'a+', encoding='utf-8') as s:
                    s.write(url + '#' + t + '#' + k + '#' + d + '\n')
            self.q.task_done()

def test(l, ts=4):
    ll = [i.rstrip() for i in l]
    # Start the worker threads as daemons so they exit with the main thread.
    for j in range(ts):
        t = MyThread(q)
        t.daemon = True
        t.start()
    for url in ll:
        q.put(url)
    q.join()

if __name__ == '__main__':
    q = queue.Queue()
    start = time.time()
    test(l, 4)
    end = time.time()
    print('Total time: %s seconds' % (end - start))
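After a run, each line of tdk.txt has the form url#title#keywords#description. A hypothetical line might look like this (the title comes out lower-cased because the script lower-cases the markup):

http://example.com#example domain##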
Usage:
Create a file named url.txt in the same directory as the script, then add the URLs of the sites to check to it, one domain per line.
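For example, url.txt might contain (hypothetical domains):

http://example.com
http://example.org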
How to package as an exe
Reference: https://www.xugj520.cn/archives/python-exe.html
If you need to move it to another machine, copy these four items:
__pycache__, build, dist, and tdk.spec
The .exe program is inside the dist folder.
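The build, dist, and tdk.spec artifacts listed above match PyInstaller's output; assuming the script is saved as tdk.py, the packaging commands would be roughly:

pip install pyinstaller
pyinstaller -F tdk.py

The -F flag builds a single-file executable, which ends up at dist/tdk.exe.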
Hi, thanks for this code, but when I run it I only capture the site's title, with no keywords or description.
Two things to check:
1. Does the page actually have keywords and description meta tags?
2. Does the site's markup match what BeautifulSoup is filtering for?
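For point 2, here is a minimal sketch to see what BeautifulSoup actually finds on one page, using the same requests/BeautifulSoup setup as the script above (the test URL is a placeholder; substitute one of your own sites):

from bs4 import BeautifulSoup
import requests

# Fetch one page and print the raw tags the script would match.
html = requests.get('http://example.com', timeout=10).text
soup = BeautifulSoup(html.lower(), 'html.parser')
print('title      :', soup.title)
print('keywords   :', soup.find(attrs={"name": "keywords"}))
print('description:', soup.find(attrs={"name": "description"}))

If the last two lines print None, the page either has no keywords/description meta tags or marks them up differently.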