How can a Python crawler scrape a block of code from an HTML page?
Asked by a user
Posted: 2022-04-26 15:41
2 answers
Helpful user
Time: 2022-04-18 09:18
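import urllib.request
import re

# Target page. The original answer never defines url; set it before running.
url = ''

def getHtml(url):
    # Download the page and decode it (this answer assumes a GBK-encoded page).
    page = urllib.request.urlopen(url)
    html = page.read()
    html = html.decode('GBK')
    return html

def getMeg(html):
    # '******' is the answer's placeholder; replace it with a real pattern for the block you want.
    reg = re.compile(r'******')
    meglist = re.findall(reg, html)
    # Append every match to out.txt, one per line.
    with open('out.txt', mode='a', encoding='utf-8') as file:
        for meg in meglist:
            file.write('%s\n' % meg)

if __name__ == "__main__":
    html = getHtml(url)
    getMeg(html)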
Helpful user
Time: 2022-04-18 10:36
Make the pattern match a wider range, something like this:
re.findall('(<div class="moco-course-wrap".*?</div>)',source,re.S)
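To see why the re.S flag matters in the call above, here is a minimal sketch run against a made-up page fragment. The sample string and the function name find_course_blocks are assumptions for illustration; only the pattern itself comes from the answer.

```python
import re

def find_course_blocks(source):
    # re.S (DOTALL) lets "." match newlines, so a block spanning several lines is captured.
    # ".*?" is non-greedy: each match stops at the first "</div>" after the opening tag.
    pattern = re.compile(r'(<div class="moco-course-wrap".*?</div>)', re.S)
    return pattern.findall(source)

# Hypothetical HTML fragment, standing in for the downloaded page source.
sample = '''<div class="moco-course-wrap">
  <h3>Course A</h3>
</div>
<div class="moco-course-wrap">
  <h3>Course B</h3>
</div>'''

blocks = find_course_blocks(sample)
for block in blocks:
    print(block)
    print('---')
```

Without re.S the same pattern finds nothing in this sample, because the opening tag and the closing tag sit on different lines. Note that the non-greedy match ends at the first closing div tag, so a block containing nested div elements would be cut short.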
You can also take a look at this:
http://blog.csdn.net/tangdou5682/article/details/52596863