1. Python libraries for web crawling
Requests: the most user-friendly HTTP library for web crawling
- Provides a simple, easy-to-use API whose functions mirror the methods of the HTTP protocol
- Supports connection pooling, SSL, cookies, HTTP(S) proxies, and more
- The main Python library for page-level crawling
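import requests

# Send a GET request with HTTP basic authentication
r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
r.status_code              # HTTP status code of the response
r.headers['content-type']  # value of the Content-Type response header
r.encoding                 # text encoding Requests will use to decode the body
r.text                     # response body decoded as text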
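The connection pooling and cookie handling mentioned in the bullets are tied to the Session object in Requests. The following is a minimal sketch, assuming Requests is installed. The URL, the query parameter, and the User-Agent string are illustrative placeholders, and the request is only prepared, not sent, so nothing touches the network.

```python
import requests

# A Session reuses connections (pooling) and carries headers and cookies
# across requests. The header value below is a made-up placeholder.
session = requests.Session()
session.headers.update({'User-Agent': 'example-crawler/0.1'})

# Build a request without sending it; URL and params are illustrative only.
request = requests.Request('GET', 'https://example.com/search',
                           params={'q': 'python'})
prepared = session.prepare_request(request)

print(prepared.method)                 # HTTP method that would be sent
print(prepared.url)                    # URL with the encoded query string
print(prepared.headers['User-Agent'])  # header inherited from the session

# To actually send it: response = session.send(prepared, timeout=10)
```

Preparing the request first is a convenient way to inspect exactly what a crawler would send before any traffic is generated.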
Scrapy: an excellent web crawling framework
- Provides the framework and semi-finished components for building a web crawler system
- Supports batch and scheduled crawling, a data processing pipeline, and more
- Python's most important and most professional web crawling framework
pyspider: a powerful web page crawling system
- Provides everything needed to build a complete web page crawling system
- Supports database backends, message queues, priorities, a distributed architecture, and more
- An important third-party Python library for web crawling
2. Python libraries for web information extraction
Beautiful Soup: a parsing library for HTML and XML
- Parses web content such as HTML and XML
- Also known as beautifulsoup4 or bs4; it can load a variety of parsing engines
- Often used together with web crawling libraries such as Scrapy and Requests
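A short sketch of parsing with Beautiful Soup may help here, assuming the beautifulsoup4 package is installed. The HTML snippet is made up for illustration, and the parsing engine chosen is Python's bundled `html.parser`; other engines can be named in its place.

```python
from bs4 import BeautifulSoup

# Illustrative HTML; in practice this would be the body of a fetched page.
html = '''
<html><head><title>Example page</title></head>
<body>
  <a href="/docs">Docs</a>
  <a href="/blog">Blog</a>
</body></html>
'''

# 'html.parser' is the engine bundled with Python; 'lxml' or 'html5lib'
# can be named instead if those packages are installed.
soup = BeautifulSoup(html, 'html.parser')

title = soup.title.string
links = [(a.get_text(), a['href']) for a in soup.find_all('a')]
print(title)
print(links)
```

Combined with a crawler, the `html` string would typically come from something like the `text` attribute of a Requests response.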
re: the regular expression parsing and processing library
- Provides a set of general-purpose functions for defining and matching regular expressions
- Usable in many scenarios, including targeted extraction of information from web pages
- One of Python's most important standard libraries; no installation required
Main functions: re.search(), re.match(), re.findall(), re.split(), re.finditer(), re.sub()
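The six functions listed above can be compared side by side on one string. This is a minimal sketch using only the standard library; the sample text, SKU codes, and prices are invented for illustration.

```python
import re

# Illustrative text; the SKU codes and prices are made up.
text = 'SKU-1001 costs $19.99, SKU-1002 costs $5.00; SKU-1003 is free.'

first = re.search(r'SKU-(\d+)', text)          # first match anywhere in the string
at_start = re.match(r'SKU-\d+', text)          # match only at the start of the string
prices = re.findall(r'\$(\d+\.\d{2})', text)   # list of all captured groups
parts = re.split(r'[,;]\s*', text)             # split on commas and semicolons
positions = [m.start() for m in re.finditer(r'SKU', text)]  # iterate over match objects
masked = re.sub(r'\d{4}', 'XXXX', text)        # replace every four-digit run

print(first.group(1))
print(at_start.group(0))
print(prices)
print(parts)
print(positions)
print(masked)
```

The practical difference to remember is that `re.match()` anchors at the beginning of the string, while `re.search()` scans the whole string for the first match.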
Python Goose: a specialized library for extracting content from article-type web pages
- Extracts metadata such as article information and videos from web pages
- Targets one specific type of web page, but that type covers a wide range of applications
- One of Python's main libraries for web information extraction
from goose import Goose

url = 'http://www.elmundo.es/elmundo/2012/10/28/espana/1351388909.html'
g = Goose({'use_meta_language': False, 'target_language': 'es'})
article = g.extract(url=url)
article.cleaned_text[:150]
3. Python libraries for website development
Django: the most popular web application framework
- Provides the basic application framework for building web systems
- MTV pattern: model, template, and view
- Python's most important web application framework, though somewhat complex
Pyramid: a medium-scale web application framework
- Provides a simple, convenient application framework for building web systems
- Medium in size and scale; suited to building applications quickly and extending them moderately
- A production-grade Python web application framework that is easy to start with and scales well
from wsgiref.simple_server import make_server
from pyramid.config import Configurator
from pyramid.response import Response

def hello_world(request):
    return Response('Hello World!')

if __name__ == '__main__':
    with Configurator() as config:
        config.add_route('hello', '/')
        config.add_view(hello_world, route_name='hello')
        app = config.make_wsgi_app()
    server = make_server('0.0.0.0', 6543, app)
    server.serve_forever()
Flask: a micro framework for web application development
- Provides the simplest application framework for building web systems
- Features: simple, small-scale, fast
- In scale and complexity: Django > Pyramid > Flask
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'
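One way to try a Flask route without starting a server is Flask's built-in test client. The sketch below assumes Flask is installed and repeats the same hello-world application so that it runs on its own.

```python
from flask import Flask

app = Flask(__name__)


@app.route('/')
def hello_world():
    return 'Hello, World!'


# The test client issues requests directly against the app object,
# so no port is opened and no server process is needed.
with app.test_client() as client:
    response = client.get('/')
    status = response.status_code
    body = response.get_data(as_text=True)

print(status, body)
```

For interactive development, the same application is usually served with the `flask run` command instead.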
4. Python libraries for network application development
WeRoBot: a development framework for WeChat official accounts
- Parses messages from the WeChat server and sends reply messages
- An important technical tool for building WeChat bots
# Reply "Hello World!" to every WeChat message
import werobot

robot = werobot.WeRoBot(token='tokenhere')

@robot.handler
def hello(message):
    return 'Hello World!'
aip: the Baidu AI Open Platform interface
- Provides Python function interfaces for accessing Baidu AI services
- Covers speech, face recognition, OCR, NLP, knowledge graphs, image search, and other fields
- The main way to build Baidu AI applications in Python
MyQR: a third-party library for generating QR codes
- Provides a set of functions for generating QR codes
- Supports basic, artistic, and animated QR codes