Self-Improvement

python BeautifulSoup


JoGeun 2018. 10. 21. 13:03

https://www.crummy.com/software/BeautifulSoup/bs4/doc/ 

*import

from bs4 import BeautifulSoup 
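The package is installed from PyPI as beautifulsoup4 (pip install beautifulsoup4) but imported as bs4. The following is a minimal sketch showing that the constructor accepts either a string of markup or any object with a read() method. io.StringIO stands in for a real file here as an illustrative assumption; a file opened with open() works the same way.

```python
import io

from bs4 import BeautifulSoup

# The markup can be passed as a plain string...
soup_from_str = BeautifulSoup("<p>Hello</p>", "html.parser")

# ...or as a file-like object; BeautifulSoup calls its read() method
soup_from_file = BeautifulSoup(io.StringIO("<p>Hello</p>"), "html.parser")

print(soup_from_str.p.string)   # Hello
print(soup_from_file.p.string)  # Hello
```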

*Basic usage 1

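soup = BeautifulSoup(resp.text, 'html.parser')  # resp: assumed to be a requests response object
print(soup.prettify())  # prints the response HTML re-indented so it is easy to read

soup.title             # the <title> tag
soup.title.name        # 'title'
soup.title.string      # the text inside <title>
soup.p                 # the first <p> tag
soup.p['class']        # class attribute of the first <p>, returned as a list
soup.a                 # only the first <a> tag
soup.find_all('a')     # every <a> tag, as a list
soup.find(id="link3")  # the first element whose id is "link3"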
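The lines above run against a downloaded page held in resp. To try them without a network request, the sketch below parses a hard-coded sample instead; the html_doc string is an assumption, abridged from the "three sisters" example in the Beautiful Soup documentation, and the printed values apply only to that sample.

```python
from bs4 import BeautifulSoup

# Sample page standing in for a downloaded response (illustrative only)
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

print(soup.title.name)               # title
print(soup.title.string)             # The Dormouse's story
print(soup.p["class"])               # ['title'] (first <p> only)
print(soup.a["id"])                  # link1 (first <a> only)
print(len(soup.find_all("a")))       # 3
print(soup.find(id="link3").string)  # Tillie
```

Note that find() returns None when nothing matches, while find_all() returns an empty list.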


*Basic usage 2

# <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
# <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
# <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;

# Find every a tag in the markup above and print only its href value
for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie
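find_all() can also filter on attributes, which is handy when some a tags lack an href. A minimal sketch, assuming the same three links plus one extra hypothetical anchor without an href: class_ (trailing underscore, since class is a Python keyword) matches a CSS class, and href=True keeps only tags that have the attribute at all.

```python
from bs4 import BeautifulSoup

# The last <a> is a hypothetical extra tag with no href, added for illustration
html_doc = """<p>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
<a class="sister">No link</a>
</p>"""

soup = BeautifulSoup(html_doc, "html.parser")

# Only <a class="sister"> tags that actually carry an href attribute
hrefs = [a["href"] for a in soup.find_all("a", class_="sister", href=True)]
print(hrefs)

# link.get('href') returns None instead of raising KeyError when href is missing
for link in soup.find_all("a"):
    print(link.get_text(), link.get("href"))
```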


*Parser libraries

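1. Python's built-in html.parser
 - BeautifulSoup(markup, "html.parser")
 - not used here: lxml is faster (html.parser needs no extra install, though)

2. lxml's HTML parser
 - BeautifulSoup(markup, "lxml")
 - separate install: pip install lxml

3. lxml's XML parser
 - BeautifulSoup(markup, "lxml-xml")
 - BeautifulSoup(markup, "xml")

4. html5lib
 - BeautifulSoup(markup, "html5lib")
 - separate install: pip install html5lib; parses pages the way a web browser does, but is slow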
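The parsers mostly differ on invalid markup. The sketch below, an illustrative check rather than a benchmark, feeds the same broken fragment to each one and records the resulting tree. lxml and html5lib are optional packages, so a missing one is caught via bs4's FeatureNotFound instead of crashing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately invalid markup: a stray closing </p> inside <a>
markup = "<a></p>"

results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        results[parser] = str(BeautifulSoup(markup, parser))
    except FeatureNotFound:
        # Raised when the requested parser library is not installed
        results[parser] = "not installed"

for parser, tree in results.items():
    print(f"{parser:12} {tree}")
```

html.parser simply drops the stray tag and yields <a></a>; when installed, lxml and html5lib also add the missing <html>/<body> wrappers, and html5lib repairs the fragment the way a browser would.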


*Tag 

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "lxml")
tag = soup.b
print(type(tag))
# <class 'bs4.element.Tag'>

# e.g. soup.a, soup.p and soup.title are Tag objects too
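A Tag exposes its name, its attributes (like a dict) and its text, and it can be modified in place. A minimal sketch of those properties; html.parser is swapped in for lxml here only so the example runs without the extra install.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b

print(isinstance(tag, Tag))  # True
print(tag.name)              # b
print(tag["class"])          # ['boldest']
print(tag.attrs)             # {'class': ['boldest']}
print(tag.string)            # Extremely bold

# Tags are mutable: renaming the tag or adding attributes changes the tree
tag.name = "blockquote"
tag["id"] = "verybold"
print(tag)  # <blockquote class="boldest" id="verybold">Extremely bold</blockquote>
```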


*Multi-valued attributes 

css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
css_soup.p['class']
# ['body', 'strikeout']
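Only attributes that HTML defines as space-separated lists (such as class and rel) are split this way. A minimal sketch contrasting them with a single-valued attribute like id, showing how list values are joined again on output, and how the multi_valued_attributes=None constructor option switches the splitting off.

```python
from bs4 import BeautifulSoup

# class is multi-valued, so its value comes back as a list
css_soup = BeautifulSoup('<p class="body strikeout"></p>', "html.parser")
print(css_soup.p["class"])  # ['body', 'strikeout']

# id is not multi-valued, so the whitespace stays inside one string
id_soup = BeautifulSoup('<p id="my id"></p>', "html.parser")
print(id_soup.p["id"])                     # my id
print(id_soup.p.get_attribute_list("id"))  # ['my id'] (always a list)

# When the tree is turned back into a string, list values are joined with spaces
rel_soup = BeautifulSoup('<p>Back to the <a rel="index first">homepage</a></p>', "html.parser")
rel_soup.a["rel"] = ["index", "contents"]
print(rel_soup.p)  # <p>Back to the <a rel="index contents">homepage</a></p>

# multi_valued_attributes=None keeps every attribute value as a plain string
flat_soup = BeautifulSoup('<p class="body strikeout"></p>', "html.parser", multi_valued_attributes=None)
print(flat_soup.p["class"])  # body strikeout
```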

