Advanced Usage of the Python requests Library
JD Cloud Tech Team – June 16, 2023

1. Background

The previous article introduced the basics of the requests module—covering HTTP methods (GET, PUT, POST, etc.), payload formats (data, json), common headers (headers, cookies), and how to handle responses. This article explores advanced features.


2. Advanced Examples

2.1 requests.request()

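  • method: HTTP method (GET, POST, PUT, DELETE, and so on)
  • url: Target URL
  • **kwargs: 14 optional control parameters (params, data, json, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert)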

The commonly used parameters (params, data, json, headers, cookies) were covered earlier. Below we demonstrate other parameters.
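
As a rough sketch of how method, url, and the keyword arguments fit together, the snippet below builds a request without sending it and inspects the result. The header name X-Demo and the parameter values are placeholders chosen for illustration; a real call through requests.request() would additionally merge in the session's default headers.

```python
import requests

# Build (but do not send) a request from method, url and two keyword arguments.
req = requests.Request(
    method="GET",
    url="http://127.0.0.1:8080/example/request",
    params={"k1": "v1", "k2": "v2"},
    headers={"X-Demo": "1"},  # hypothetical header, for illustration only
)
prepared = req.prepare()

print(prepared.method)  # GET
print(prepared.url)     # http://127.0.0.1:8080/example/request?k1=v1&k2=v2
```

Preparing a request this way is a convenient means of checking what would go over the wire before any server is involved.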


2.1.1 files – Uploading Files

Python

import requests

# Upload a file; the "with" block closes the file handle after the request
data = {"name": "upload file"}

with open("favicon.ico", "rb") as fh:
    requests.request(
        method="POST",
        url="http://127.0.0.1:8080/example/request",
        data=data,
        files={"files": fh},
    )

The relative path favicon.ico is resolved against the current working directory (usually the directory the script is run from); otherwise, provide an absolute path.
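
To see what the files parameter produces, the following sketch prepares an upload without sending it. It assumes an in-memory placeholder instead of a real favicon.ico and uses the (filename, content, content type) tuple form, which lets the filename and MIME type be set explicitly.

```python
import requests

# In-memory stand-in for the file contents (placeholder bytes, not a real icon).
fake_icon = b"\x00\x00\x01\x00"

req = requests.Request(
    method="POST",
    url="http://127.0.0.1:8080/example/request",
    data={"name": "upload file"},
    # Tuple form: (filename, content, content type)
    files={"files": ("favicon.ico", fake_icon, "image/x-icon")},
)
prepared = req.prepare()

content_type = prepared.headers["Content-Type"]
print(content_type.split(";")[0])                   # multipart/form-data
print(b'filename="favicon.ico"' in prepared.body)   # True
```

The form field from data and the file part both end up in the same multipart/form-data body.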


2.1.2 auth – Basic & Digest Authentication

Python

import requests
from requests.auth import HTTPBasicAuth, HTTPDigestAuth

# 1) Basic Auth
res = requests.request(
    method="GET",
    url="http://127.0.0.1:8080/example/request",
    auth=HTTPBasicAuth("username", "password")
)
print(res.status_code)  # 200

# 2) Digest Auth
res = requests.request(
    method="GET",
    url="http://127.0.0.1:8080/example/request",
    auth=HTTPDigestAuth("username", "password")
)

  • Basic Auth is simple but sends the credentials as base64-encoded text, which is trivially decoded, so it should only be used over HTTPS.
  • Digest Auth uses challenge-response hashing, so the password is never sent in plaintext, yet it remains vulnerable to replay attacks. Neither is as secure as HTTPS client certificates.
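
The point about Basic Auth being easily decoded can be illustrated with a small sketch. It prepares a request locally (nothing is sent) and decodes the Authorization header that HTTPBasicAuth attaches; the credentials are the same placeholders used above.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

prepared = requests.Request(
    method="GET",
    url="http://127.0.0.1:8080/example/request",
    auth=HTTPBasicAuth("username", "password"),
).prepare()

# The header has the form "Basic <base64 of user:password>".
header = prepared.headers["Authorization"]
scheme, token = header.split(" ", 1)
decoded = base64.b64decode(token).decode("utf-8")

print(scheme)   # Basic
print(decoded)  # username:password
```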

2.1.3 timeout – Connection / Read Timeouts

Python

import requests

# Single value: 1 s applies to the connect phase and to the read phase separately (it is not a total)
requests.request(
    method="POST",
    url="http://127.0.0.1:8080/example/request",
    json={"k1": "v1", "k2": "v2"},
    timeout=1
)

# Separate connect and read timeouts (connect = 5 s, read = 15 s)
requests.request(
    method="POST",
    url="http://127.0.0.1:8080/example/request",
    json={"k1": "v1", "k2": "v2"},
    timeout=(5, 15)
)

# Wait indefinitely (also the default when timeout is omitted)
requests.request(
    method="POST",
    url="http://127.0.0.1:8080/example/request",
    json={"k1": "v1", "k2": "v2"},
    timeout=None
)

# Catch timeout exceptions; Timeout covers both ConnectTimeout and ReadTimeout
from requests.exceptions import Timeout
try:
    res = requests.get("http://127.0.0.1:8080/example/request", timeout=0.1)
except Timeout:
    print("Timeout caught")
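
A read timeout can be reproduced locally with a deliberately slow server. The sketch below is only an illustration: SlowHandler is a hypothetical test handler built on the standard library, and the delays are arbitrary. The session ignores proxy environment variables so that the request really goes to the local server.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from requests.exceptions import ReadTimeout


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers only after a one-second delay."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"late")
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

try:
    # Connecting is fast, but the body takes longer than the 0.2 s read timeout.
    session.get(url, timeout=(2, 0.2))
    outcome = "completed"
except ReadTimeout:
    outcome = "read timeout"
finally:
    server.shutdown()
    server.server_close()

print(outcome)  # read timeout
```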

2.1.4 allow_redirects – Redirect Control

Python

import requests

# Default: follow redirects
r = requests.get("http://github.com")
print(r.url)          # https://github.com/
print(r.history)      # [<Response [301]>]

# Disable redirects
r = requests.get("http://github.com", allow_redirects=False)
print(r.status_code)  # 301
print(r.history)      # []

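With allow_redirects=False, a redirect can also be followed manually, for example after posting a login form to GitHub: inspect the status code and the Location header of the response, then issue the next request yourself. With redirects disabled, response.history stays empty.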
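
Manual redirect handling can be sketched against a small local server rather than GitHub. RedirectHandler and the paths /old and /new are hypothetical and exist only for this illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urljoin

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /old redirects to /new, everything else returns 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

try:
    # Step 1: stop at the redirect and read where it points.
    first = session.get(base + "/old", allow_redirects=False)
    next_url = urljoin(first.url, first.headers["Location"])
    # Step 2: follow it by hand.
    second = session.get(next_url, allow_redirects=False)
    # For comparison: let requests follow the redirect automatically.
    followed = session.get(base + "/old")
finally:
    server.shutdown()
    server.server_close()

print(first.status_code, first.headers["Location"])  # 302 /new
print(second.status_code)                            # 200
print(followed.history)                              # [<Response [302]>]
```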


2.1.5 proxies – HTTP & SOCKS Proxy Support

Python

# Proxy URLs should include the scheme
proxies = {
    "http":  "http://120.25.253.234:8123",
    "https": "http://163.125.222.244:8123"
}
requests.get("http://example.com", proxies=proxies)

# SOCKS5 (install with `pip install requests[socks]`)
proxies = {
    "http":  "socks5://user:pass@host:port",
    "https": "socks5://user:pass@host:port"
}
requests.get("http://example.com", proxies=proxies)

Proxies help circumvent IP bans, CAPTCHAs, or rate-limiting.
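
How the keys of the proxies dictionary are matched against a URL can be checked without any network traffic. The sketch below uses select_proxy, a helper from requests.utils; the per-host entry for internal.example and its proxy address are hypothetical additions for illustration.

```python
from requests.utils import select_proxy

proxies = {
    "http": "http://120.25.253.234:8123",
    "https": "http://163.125.222.244:8123",
    # Hypothetical per-host entry: takes precedence over the scheme-wide "http" proxy
    "http://internal.example": "http://10.0.0.1:3128",
}

urls = (
    "http://example.com/",
    "https://example.com/",
    "http://internal.example/api",
)
for url in urls:
    print(url, "->", select_proxy(url, proxies))
```

A key of the form scheme://hostname is more specific than a bare scheme, so it wins when both match.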


2.1.6 hooks – Response Callbacks

Python

import requests

# Hook called with each response; here it attaches a custom "status" attribute
def verify_res(res, *args, **kwargs):
    print("URL:", res.url)
    res.status = "PASS" if res.status_code == 200 else "FAIL"

res = requests.get("http://www.baidu.com", hooks={"response": verify_res})
print(res.status)  # PASS
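
Hooks can also be registered once on a Session so that they run for every response. The following sketch assumes a throwaway local server (StatusHandler is hypothetical) and a hook named record_status that simply collects the path and status code of each response.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class StatusHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /missing returns 404, every other path returns 200."""

    def do_GET(self):
        status = 404 if self.path == "/missing" else 200
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass


seen = []


def record_status(res, *args, **kwargs):
    # Called by requests after each response arrives.
    seen.append((res.request.path_url, res.status_code))


server = HTTPServer(("127.0.0.1", 0), StatusHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo
session.hooks["response"].append(record_status)

try:
    session.get(base + "/ok")
    session.get(base + "/missing")
finally:
    server.shutdown()
    server.server_close()

print(seen)  # [('/ok', 200), ('/missing', 404)]
```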

2.1.7 stream – Streaming Downloads

Python

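import requests

r = requests.get("http://www.baidu.com", stream=True)

# Iterate by lines
for line in r.iter_lines():
    print("line:", line)

# A streamed body can only be consumed once, so request it again
r = requests.get("http://www.baidu.com", stream=True)

# Iterate by chunks (1 KB each)
for chunk in r.iter_content(chunk_size=1024):
    print("chunk:", chunk)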

Useful for large files to avoid loading everything into memory.
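
A typical use of stream=True is writing a download to disk chunk by chunk. The sketch below assumes a local test server (FileHandler and the 5000-byte payload are made up for the demonstration) and a temporary target path.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAYLOAD = b"x" * 5000  # placeholder standing in for a large file


class FileHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves PAYLOAD for any GET request."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), FileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/big.bin"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

target = os.path.join(tempfile.mkdtemp(), "big.bin")
chunks = 0
try:
    # The "with" block releases the connection even if writing fails.
    with session.get(url, stream=True, timeout=5) as r:
        r.raise_for_status()
        with open(target, "wb") as fh:
            for chunk in r.iter_content(chunk_size=1024):
                fh.write(chunk)
                chunks += 1
finally:
    server.shutdown()
    server.server_close()

print(os.path.getsize(target))  # 5000
```

Only one chunk is held in memory at a time, which is what keeps memory use flat for large downloads.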


2.1.8 verify – SSL Certificate Verification

Python

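import requests
import urllib3

# 1) Suppress the InsecureRequestWarning that verify=False triggers
urllib3.disable_warnings()

# 2) Skip certificate verification (avoid outside of testing);
#    verify can also be set to the path of a CA bundle
resp = requests.get("https://www.12306.cn", verify=False)
print(resp.status_code)  # 200

# 3) Present a client-side certificate and key (cert is separate from verify)
resp = requests.get("https://www.12306.cn",
                    cert=("/path/server.crt", "/path/key"))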

2.2 Exception Handling

2.2.1 raise_for_status()

Raises HTTPError when the response status code is 4xx or 5xx; 2xx and 3xx responses do not raise.

Python

import requests
from requests.exceptions import HTTPError

try:
    res = requests.post("http://127.0.0.1:8080/example/post")
    res.raise_for_status()
except HTTPError:
    print("HTTP error")
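
Which status codes trigger the exception can be checked without a server. This sketch sets status_code on a bare Response object; the helper name raises_http_error is made up for the example.

```python
import requests
from requests.exceptions import HTTPError


def raises_http_error(status_code):
    """Return True if raise_for_status() raises for this status code."""
    res = requests.Response()  # bare Response object, no network involved
    res.status_code = status_code
    try:
        res.raise_for_status()
    except HTTPError:
        return True
    return False


for code in (200, 204, 301, 404, 500):
    print(code, raises_http_error(code))
# 200 False
# 204 False
# 301 False
# 404 True
# 500 True
```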

2.2.2 ReadTimeout

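Raised when the server accepts the connection but does not send data within the read timeout. Failing to connect in time raises ConnectTimeout instead.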

Python

import requests
from requests.exceptions import ReadTimeout
try:
    res = requests.get("http://127.0.0.1:8080/example/post", timeout=0.5)
except ReadTimeout:
    print("Read timeout")

2.2.3 RequestException

Base class for any request-related error.

Python

import requests
from requests.exceptions import RequestException
try:
    res = requests.get("http://127.0.0.1:8080/example/post")
except RequestException:
    print("Request failed")
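
Because the exception classes form a hierarchy, the order of the checks matters: specific classes first, RequestException last. The sketch below shows one possible ordering; classify is a hypothetical helper, and the exceptions are constructed directly rather than raised by real requests.

```python
from requests.exceptions import ConnectionError as RequestsConnectionError
from requests.exceptions import (
    ConnectTimeout,
    HTTPError,
    ReadTimeout,
    RequestException,
    Timeout,
)


def classify(exc):
    """Map an exception to a coarse label, most specific check first."""
    if isinstance(exc, Timeout):  # covers ConnectTimeout and ReadTimeout
        return "timeout"
    if isinstance(exc, RequestsConnectionError):
        return "connection error"
    if isinstance(exc, HTTPError):
        return "http error"
    if isinstance(exc, RequestException):
        return "other request error"
    return "not a requests error"


print(classify(ReadTimeout()))              # timeout
print(classify(ConnectTimeout()))           # timeout
print(classify(RequestsConnectionError()))  # connection error
print(classify(HTTPError()))                # http error
print(classify(RequestException()))         # other request error
```

ConnectTimeout inherits from both ConnectionError and Timeout, which is why the Timeout check comes first here.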

3. Summary

Compared with urllib2 (urllib.request in Python 3), the requests library offers:

  • HTTP keep-alive & connection pooling
  • Cookie & session persistence
  • File upload support
  • Automatic content decoding & charset handling
  • Internationalized URLs & automatic encoding
  • Human-friendly API—hence the slogan: “HTTP for Humans.”
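
The cookie and session persistence listed above can be sketched with a Session object. The header name, cookie name, and values are placeholders, and the request is only prepared, not sent.

```python
import requests

session = requests.Session()
session.headers.update({"X-Demo": "1"})  # hypothetical default header
session.cookies.set("token", "abc123")   # hypothetical cookie

# Every request prepared through the session inherits its headers and cookies.
req = requests.Request("GET", "http://example.com/profile")
prepared = session.prepare_request(req)

print(prepared.headers["X-Demo"])  # 1
print(prepared.headers["Cookie"])  # token=abc123
```

Requests sent through the same Session also reuse pooled connections, which is where the keep-alive benefit comes from.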