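Deployment

Preface

During development and debugging, a Flask app is usually started directly with app.run(), but Flask's built-in WSGI server is not built for performance. In production, gunicorn is the usual choice. If a legacy project does not need much throughput and relies heavily on variables shared inside a single process, gunicorn's multiple worker processes would each hold their own copy of that state and break communication between sessions; in that case, running directly on gevent is worth trying.

Before Docker became popular, Flask projects in production were mostly deployed with virtualenv + gunicorn + supervisor. With Docker, the usual setup became gunicorn + Docker. Without a container orchestration service, there is typically an nginx in front of the backend acting as a reverse proxy. With Kubernetes, the usual choice is Service + Ingress (or Istio and the like).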

Ways to run

Flask's built-in WSGI server

This is the usual way to run the app during development.

# main.py
from flask import Flask
from time import sleep

app = Flask(__name__)

@app.get("/test")
def get_test():
    sleep(0.1)
    return "ok"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=10000)

Run:

python main.py

gevent

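Running Flask on gevent requires installing gevent first:

python -m pip install -U gevent

The code needs a small change.

Note that monkey.patch_all() must be placed at the very top of the entry-point file, before any other import, so that the monkey patch takes effect for every module imported afterwards.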

# main.py
from gevent import monkey
monkey.patch_all()
import time

from flask import Flask
from gevent.pywsgi import WSGIServer


app = Flask(__name__)


@app.get("/test")
def get_test():
    time.sleep(0.1)
    return "ok"


if __name__ == "__main__":
    server = WSGIServer(("0.0.0.0", 10000), app)
    server.serve_forever()

Run:

python main.py
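
To compare the built-in server with the gevent one, it helps to send the /test endpoint a batch of concurrent requests and time the whole batch. The following is a minimal sketch using only the standard library; fetch_status and measure are hypothetical helper names, and the default totals are arbitrary assumptions rather than a proper benchmark setup.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Issue one GET request and return the HTTP status code.

    urlopen raises an exception for 4xx/5xx responses and connection errors.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def measure(url: str, total: int = 50, concurrency: int = 10) -> tuple[float, list[int]]:
    """Send `total` requests from `concurrency` client threads.

    Returns the elapsed wall-clock time in seconds and the list of status codes.
    """
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(fetch_status, [url] * total))
    return time.perf_counter() - started, statuses
```

With either version of main.py running, calling measure("http://127.0.0.1:10000/test") returns the elapsed time and the status codes, so the same call can be repeated against each server. For serious measurements a dedicated load-testing tool is the better choice.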

gunicorn+gevent

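If an existing project relies heavily on in-memory variables shared within a single process, switching to gunicorn without care can leave different sessions seeing inconsistent data, because each worker process keeps its own copy of those variables.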
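
The effect can be reproduced without gunicorn. The sketch below is only an illustration: it forks two child processes with os.fork (POSIX only), roughly the way a pre-fork server creates its workers, and lets each one update a module-level counter. The names counter, handle_request and run_in_child are hypothetical.

```python
import os

# Module-level state, standing in for an in-process cache or session table
counter = {"hits": 0}


def handle_request() -> int:
    """Pretend to serve one request and return this process's hit count."""
    counter["hits"] += 1
    return counter["hits"]


def run_in_child(n_requests: int) -> int:
    """Fork a child, serve n_requests there, and return the child's final count."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process: works on its own copy of `counter`
        os.close(read_fd)
        result = 0
        for _ in range(n_requests):
            result = handle_request()
        os.write(write_fd, str(result).encode())
        os.close(write_fd)
        os._exit(0)
    # parent process: read the child's result and reap the child
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return int(data)


worker_a = run_in_child(3)     # this "worker" counts 3 hits
worker_b = run_in_child(2)     # this one starts again from 0 and counts 2
parent_view = counter["hits"]  # the parent never sees either update
print(worker_a, worker_b, parent_view)
```

Each process only ever sees its own counter, which is why state that must be shared between gunicorn workers usually has to move to something external such as Redis or a database.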

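The dependencies need to be installed first here as well.

python -m pip install -U gunicorn gevent

Unlike running gevent on its own, this approach needs no code changes: gunicorn's gevent worker applies the gevent monkey patch automatically.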

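gunicorn's startup options can be given on the command line, but I usually prefer a gunicorn configuration file: because it is plain Python, some settings can be computed dynamically, and the log format can be customized.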
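
As one illustration of a dynamically computed setting, the worker count can be derived from the CPU count and capped. This is a sketch under assumptions: pick_workers is a hypothetical helper, and MAX_WORKERS is a hypothetical environment variable, not something gunicorn itself reads.

```python
import os
from multiprocessing import cpu_count


def pick_workers(cpus: int, cap: int = 8) -> int:
    """Apply the common 2 x CPU + 1 rule, limited to `cap` and never below 1."""
    return max(1, min(cpus * 2 + 1, cap))


# In gunicorn.conf.py this value would be assigned to the `workers` setting.
# MAX_WORKERS is an assumed, project-specific variable used only for the cap.
workers = pick_workers(cpu_count(), cap=int(os.environ.get("MAX_WORKERS", "8")))
```

A cap is useful because cpu_count() may report the host's CPUs rather than the CPU limit of a container, which can otherwise lead to far more workers than intended.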

An example gunicorn.conf.py:

# Gunicorn configuration file
from pathlib import Path
from multiprocessing import cpu_count
import gunicorn.glogging
from datetime import datetime

class CustomLogger(gunicorn.glogging.Logger):
    def atoms(self, resp, req, environ, request_time):
        """
        Override atoms() to customize the access-log placeholders.
        """
        # Start from all the default placeholder values
        atoms = super().atoms(resp, req, environ, request_time)

        # Custom format for 't' (the timestamp)
        now = datetime.now().astimezone()
        atoms['t'] = now.isoformat(timespec="seconds")
        
        return atoms
    

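# Preload the application code before the workers are forked
preload_app = True

# Number of worker processes: commonly 2 x CPU cores + 1
# workers = int(cpu_count() * 2 + 1)
workers = 4

# Use the gevent async worker class, which suits I/O-bound applications
# Note: the gevent worker does not use the threads setting; concurrency comes from coroutines (greenlets)
worker_class = "gevent"

# Maximum number of simultaneous connections each gevent worker can handle
worker_connections = 2000

# Bind address and port
bind = "127.0.0.1:10001"

# Process name
proc_name = "flask-dev"

# PID file path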
pidfile = str(Path(__file__).parent / "tmp" / "gunicorn.pid")

logger_class = CustomLogger
access_log_format = (
    '{"@timestamp": "%(t)s", '
    '"remote_addr": "%(h)s", '
    '"protocol": "%(H)s", '
    '"host": "%({host}i)s", '
    '"request_method": "%(m)s", '
    '"request_path": "%(U)s", '
    '"status_code": %(s)s, '
    '"response_length": %(b)s, '
    '"referer": "%(f)s", '
    '"user_agent": "%(a)s", '
    '"x_tracking_id": "%({x-tracking-id}i)s", '
    '"request_time": %(L)s}'
)

# Access log path
accesslog = str(Path(__file__).parent / "logs" / "access.log")

# Error log path
errorlog = str(Path(__file__).parent / "logs" / "error.log")

# Log level
loglevel = "debug"

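Run it. gunicorn's default configuration file name is gunicorn.conf.py, looked up in the current directory; if the file is named differently, pass it with the -c option.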

gunicorn main:app
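
Since the access_log_format above is meant to produce one JSON object per line, it is worth checking that a rendered line really parses. The sketch below imitates the rendering step with plain %-formatting over a mapping and a shortened format string; the sample values are made up, and render and is_valid_json are hypothetical helpers, not part of gunicorn.

```python
import json
from datetime import datetime

# Shortened version of the access_log_format from the configuration above
access_log_format = (
    '{"@timestamp": "%(t)s", '
    '"host": "%({host}i)s", '
    '"request_path": "%(U)s", '
    '"status_code": %(s)s, '
    '"response_length": %(b)s, '
    '"request_time": %(L)s}'
)


def render(atoms: dict) -> str:
    """Fill the format string from a mapping of placeholder values."""
    return access_log_format % atoms


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Made-up sample values for a single request
sample_atoms = {
    "t": datetime.now().astimezone().isoformat(timespec="seconds"),
    "{host}i": "flask.example.com",
    "U": "/test",
    "s": "200",
    "b": "2",
    "L": "0.101234",
}

line = render(sample_atoms)
record = json.loads(line)

# An unquoted field that renders as "-" (a placeholder for "no value")
# makes the whole line invalid JSON.
broken_line = render({**sample_atoms, "b": "-"})
print(record["status_code"], is_valid_json(broken_line))
```

If the log pipeline must never see an unparsable line, one option is to wrap the numeric fields in quotes as well and convert them downstream.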

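Traditional process management: automatic restart

On a traditional server deployment, the common ways to restart a backend process automatically after it exits abnormally are:

Configure crontab plus a shell script that periodically checks whether the process is there and starts it if not.
Configure supervisor.
Configure systemd.

supervisor has to be installed separately, and following the principle of "use what is already there and install as little as possible", I generally do not use it, so this article does not cover supervisor.

On a server deployment, a dedicated Python virtual environment is usually created for the project as well.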

# Use Python's built-in venv to create a virtual environment directory, .venv, in the current directory
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r ./requirements.txt

# With uv, running `uv sync` is enough

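crontab + shell script (not recommended for production)

When I was new to the job and unfamiliar with systemd, I often used crontab plus a shell script to keep processes alive. Looking back, this approach is a poor fit: it depends heavily on how well the shell script is written, and there are many details to take care of:

First make sure user-level crontab is available. Some production environments disable it and also do not allow arbitrary changes to the system-level crontab.
crontab has one-minute granularity, so the service may stay down for up to a minute.
If the process logs to the console, log redirection has to be handled by hand, along with log file rotation.
If the ulimit is low, the script has to manage ulimit as well.
Zombie processes show up often, so the shell script ends up full of state-checking logic.

If you only need something simple, here is an example: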

#!/bin/bash

# Environment configuration
export FLASK_ENV="production"
export DATABASE_URL="postgresql://user:pass@localhost:5432/mydb"
export REDIS_URL="redis://localhost:6379/0"

script_dir=$(cd "$(dirname "$0")" && pwd)
app_name="gunicorn"  # The actual process name is gunicorn, not the Flask app
wsgi_module="wsgi:app"  # Replace with your WSGI entry point
socket_path="${script_dir}/myapp.sock"  # Unix socket path (kept out of /run, which is cleared on reboot)
log_file="${script_dir}/app.log"
pid_file="${script_dir}/gunicorn.pid"   # PID file used to track the process

# Process detection
is_running() {
    if [ -f "$pid_file" ]; then
        pid=$(cat "$pid_file")
        # /proc/<pid>/cmdline is NUL-separated, so turn NULs into spaces before matching
        if ps -p "$pid" > /dev/null 2>&1 && tr '\0' ' ' 2>/dev/null < /proc/"$pid"/cmdline | grep -q "gunicorn.*${wsgi_module}"; then
            echo "Gunicorn (PID: $pid) is running"
            return 0
        else
            rm -f "$pid_file"  # Remove the stale PID file
            echo "Stale PID file found, cleaned up"
            return 1
        fi
    else
        # Fallback detection: socket file plus process name
        if [ -S "$socket_path" ] && pgrep -f "gunicorn.*${wsgi_module}" > /dev/null 2>&1; then
            echo "Gunicorn is running (detected by socket)"
            return 0
        fi
        echo "Gunicorn is not running"
        return 1
    fi
}

# Start the application
start_app() {
    is_running
    if [ $? -eq 0 ]; then
        echo "Already running, skip start"
        return 0
    fi

    echo "Starting Gunicorn at $(date)"
    echo "Socket: $socket_path"
    echo "Log: $log_file"

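    # Make sure the socket directory exists
    mkdir -p "$(dirname "$socket_path")"

    # Start command (important: no --daemon; nohup keeps the process running in the background)
    # --pid makes gunicorn write the PID file. Comments cannot follow a trailing backslash,
    # so none are placed inside the command itself.
    cd "$script_dir" || exit 1
    nohup "$script_dir/venv/bin/gunicorn" \
        --workers 3 \
        --bind "unix:$socket_path" \
        --pid "$pid_file" \
        --access-logfile "$log_file" \
        --error-logfile "$log_file" \
        --log-level info \
        "$wsgi_module" > /dev/null 2>&1 &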

    # Wait for startup to finish
    sleep 2
    if is_running; then
        echo "✓ Start success (PID: $(cat "$pid_file" 2>/dev/null))"
        return 0
    else
        echo "✗ Start failed, check $log_file"
        return 1
    fi
}

# Stop the application
stop_app() {
    is_running
    if [ $? -eq 1 ]; then
        echo "Not running, skip stop"
        return 0
    fi

    pid=$(cat "$pid_file" 2>/dev/null)
    echo "Stopping Gunicorn (PID: $pid) gracefully..."

    # Send SIGTERM first (graceful stop)
    kill -15 "$pid" 2>/dev/null || true
    sleep 5

    # Check whether it is still running
    if ps -p "$pid" > /dev/null 2>&1; then
        echo "Still running after 5s, force killing..."
        kill -9 "$pid" 2>/dev/null || true
        sleep 2
    fi

    # Clean up leftovers
    rm -f "$pid_file" "$socket_path"
    echo "✓ Stopped"
}

# Restart the application
restart_app() {
    echo "Restarting Gunicorn..."
    stop_app
    sleep 1
    start_app
}

# Entry point
main() {
    # Check that gunicorn exists
    if [ ! -f "$script_dir/venv/bin/gunicorn" ]; then
        echo "ERROR: Gunicorn not found at $script_dir/venv/bin/gunicorn"
        echo "Hint: Did you activate virtualenv? (source venv/bin/activate)"
        exit 1
    fi

    local action=${1:-start}  # Default action: start

    case "$action" in
        start)
            start_app
            ;;
        stop)
            stop_app
            ;;
        restart)
            restart_app
            ;;
        status)
            is_running
            ;;
        cron-check)
            # Designed for crontab: only check and restart, without noisy output
            if ! is_running > /dev/null 2>&1; then
                echo "[$(date '+%F %T')] CRON: Gunicorn down, auto-restarting..." >> "$log_file"
                start_app >> "$log_file" 2>&1
            fi
            ;;
        *)
            echo "Usage: $0 {start|stop|restart|status|cron-check}"
            echo "  cron-check: Silent mode for crontab (logs to app.log only)"
            exit 1
            ;;
    esac
}

main "$@"
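
For readers more comfortable with Python than shell, the PID-file check at the heart of is_running() can be sketched as follows. This is a simplified, POSIX-only illustration: pid_alive and is_running are hypothetical helpers, and unlike the shell version it does not verify that the PID actually belongs to gunicorn.

```python
import os
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists.

    Signal 0 performs the existence and permission checks without
    delivering any signal to the target process.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def is_running(pid_file: Path) -> bool:
    """Check the PID file and remove it when it points at a dead process."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False
    if pid_alive(pid):
        return True
    pid_file.unlink(missing_ok=True)  # stale PID file
    return False
```

PIDs get reused by the operating system, which is why the shell script additionally matches the command line before trusting the PID file.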

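Manual test run

bash app_ctl.sh start

Configure crontab

# Edit the current user's crontab
crontab -e

# Add the following line (checks once a minute; the script must be executable, e.g. chmod +x app_ctl.sh)
* * * * * /opt/myflaskapp/app_ctl.sh cron-check >/dev/null 2>&1

Configure logrotate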

# /etc/logrotate.d/myflaskapp
/opt/myflaskapp/app.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    # copytruncate: truncate the log in place so gunicorn does not lose its file handle
    copytruncate
}

systemd (recommended for production)

Create the systemd service file

sudo vim /etc/systemd/system/myflaskapp.service

An example:

[Unit]
Description=Gunicorn instance for Flask App
After=network.target

[Service]
User=www-data
Group=www-data
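# systemd does not support trailing comments on a setting line, so every comment sits on its own line
# Flask project root directory
WorkingDirectory=/path/to/your/app
# Virtual environment bin directory
Environment="PATH=/path/to/venv/bin"
# Binding to a Unix socket is worth considering when nginx sits in front
# wsgi:app is the startup module and variable (e.g. app = Flask(__name__))
# Do not add --daemon: systemd needs to supervise the main process directly
ExecStart=/path/to/venv/bin/gunicorn \
          --workers 4 \
          --bind unix:/run/myapp.sock \
          --access-logfile - \
          --error-logfile - \
          wsgi:app

# Restart only on abnormal exit (non-zero exit code, killed by a signal, and so on)
Restart=on-failure
# Wait 5 seconds before restarting
RestartSec=5s
# At most 5 restarts within 60 seconds, to avoid a restart storm
# (newer systemd versions document this pair as StartLimitIntervalSec= and StartLimitBurst= under [Unit])
StartLimitInterval=60s
StartLimitBurst=5
# Wait up to 30 seconds when stopping (graceful shutdown)
TimeoutStopSec=30

# Security hardening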
PrivateTmp=true
NoNewPrivileges=true
ProtectSystem=strict
ReadWritePaths=/run /var/log/myapp

[Install]
WantedBy=multi-user.target

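Enable start on boot and start the service

sudo systemctl daemon-reload
sudo systemctl enable myflaskapp    # Start on boot
sudo systemctl start myflaskapp

Try stopping the backend process with kill -9 and check whether it gets started again.

Note that kill -15 (SIGTERM) counts as a normal stop, not an abnormal exit, so Restart=on-failure does not bring the service back in that case.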
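
The difference between the two signals can be observed from Python. The sketch below starts a throwaway child process, kills it with a given signal, and classifies the result in the way Restart=on-failure is documented to behave. It is a simplification: restarts_on_failure and run_and_kill are hypothetical helpers, and timeouts, watchdog events and settings such as SuccessExitStatus= are ignored.

```python
import signal
import subprocess
import sys

# Signals that systemd treats as a clean exit for Restart=on-failure
CLEAN_SIGNALS = {
    int(signal.SIGHUP),
    int(signal.SIGINT),
    int(signal.SIGTERM),
    int(signal.SIGPIPE),
}


def restarts_on_failure(returncode: int) -> bool:
    """Approximate whether Restart=on-failure would restart the service.

    `returncode` follows the subprocess convention: a negative value -N
    means the process was terminated by signal N.
    """
    if returncode == 0:
        return False
    if returncode < 0:
        return -returncode not in CLEAN_SIGNALS
    return True  # non-zero exit code


def run_and_kill(sig: int) -> int:
    """Start a sleeping child process, send it `sig`, and return its return code."""
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    proc.send_signal(sig)
    return proc.wait()


killed = run_and_kill(signal.SIGKILL)      # like kill -9
terminated = run_and_kill(signal.SIGTERM)  # like kill -15
print(killed, restarts_on_failure(killed))
print(terminated, restarts_on_failure(terminated))
```

If the service should come back after any stop that was not requested through systemctl, Restart=always is the setting to look at instead.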

Docker deployment

Dockerfile. A pure-Python project usually gains little from a multi-stage build, so a single stage is enough.

FROM python:3.11-slim-bookworm

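# Security hardening
## Create a non-root user (avoid nobody, whose permissions are too restrictive)
## Then install only the system libraries needed at runtime (no build tools):
##   libgomp1      OpenMP (needed by pandas/numpy)
##   libpq5        PostgreSQL client library
##   libsqlite3-0  SQLite
## Trailing comments are not allowed after a line-continuation backslash, so they are listed here instead
RUN useradd -m -u 1000 appuser && \
    apt-get update && apt-get install -y --no-install-recommends \
        libgomp1 \
        libpq5 \
        libsqlite3-0 \
        && rm -rf /var/lib/apt/lists/* && \
    apt-get autoremove -y && \
    apt-get clean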

# Python tuning
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

WORKDIR /app

# Make use of Docker layer caching: copy requirements first
COPY requirements.txt .
RUN pip install --no-cache-dir --prefer-binary -r requirements.txt \
    # Remove any pip cache (already disabled by --no-cache-dir, but just to be safe)
    && rm -rf /root/.cache

# Application code
COPY --chown=appuser:appuser . .

# Run as a non-root user
USER appuser

# Startup
EXPOSE 8000
CMD ["gunicorn", "--config", "config/gunicorn.conf.py", "wsgi:app"]

Write docker-compose.yaml

version: "3"
services:
  web:
    image: myflaskapp:latest
    container_name: flask_web
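    # Port mapping
    ## If nginx also runs in Docker on the same network, the port mapping can be omitted
    ports:
      - "8000:8000"
    # Environment variables
    environment:
      - FLASK_ENV=production
      - DATABASE_URL=postgresql://user:pass@db:5432/mydb
      - REDIS_URL=redis://redis:6379/0
    # Health check
    ## python:3.11-slim does not ship curl, so the check uses the Python interpreter already in the image
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=3)"]
      interval: 30s      # Check every 30 seconds
      timeout: 5s        # Time out after 5 seconds
      start_period: 15s  # 15-second grace period after start, giving the app time to initialize
      retries: 3         # Mark the container unhealthy after 3 consecutive failures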
    
    # Automatic restart policy
    restart: unless-stopped  # always / on-failure / unless-stopped

    # Resource limits
    deploy:
      resources:
        limits:
          cpus: '2'        # At most 2 CPUs
          memory: 1G       # At most 1 GB of memory
        reservations:
          cpus: '0.5'      # Reserve 0.5 CPU
          memory: 256M     # Reserve 256 MB of memory

    # ulimit settings (guard against resource abuse)
    ulimits:
      nproc: 65535       # Maximum number of processes
      nofile:
        soft: 65535      # Soft limit on open files
        hard: 65535      # Hard limit on open files
      core: 0            # Disable core dumps

    # Security hardening
    security_opt:
      - no-new-privileges:true  # Forbid privilege escalation

    # Read-only root filesystem (except /tmp)
    read_only: true
    tmpfs:
      - /tmp:rw,noexec,nosuid,size=100m

    # Volume mounts (logs, temporary files)
    volumes:
      - ./logs:/app/logs:rw
      # - ./static:/app/static:ro  # Static files (optional)
    
    # Networks
    networks:
      - app-network

# Network configuration
networks:
  app-network:
    driver: bridge

# Volume configuration
volumes:
  db_data:
    driver: local
  redis_data:
    driver: local
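
The health check above assumes the application exposes a /health route, which the sample main.py does not define. When the check logic grows beyond a one-liner, it can live in a small script inside the image. The following is a sketch using only the standard library; is_healthy and main are hypothetical names, and the default URL simply mirrors the one used in the compose file.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers HTTP error statuses, refused connections and timeouts
        return False


def main(argv: list[str]) -> int:
    """Return a process exit code: 0 when healthy, 1 otherwise."""
    url = argv[1] if len(argv) > 1 else "http://localhost:8000/health"
    return 0 if is_healthy(url) else 1
```

To use it as the container's health check, save it as, say, healthcheck.py, end the file with `import sys; raise SystemExit(main(sys.argv))`, and set test to ["CMD", "python", "healthcheck.py"].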

Kubernetes deployment

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app
  namespace: default
  labels:
    app: flask-app
    tier: backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: flask-app
  template:
    metadata:
      labels:
        app: flask-app
        tier: backend
    spec:
      securityContext:
        runAsNonRoot: true      # Refuse to run as root
        runAsUser: 1000         # Run as a non-root user
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault  # Enable the runtime's default seccomp profile
      containers:
      - name: flask-app
        image: myregistry.com/myflaskapp:1.0.0
        imagePullPolicy: IfNotPresent  # Always is recommended for production
        ports:
        - name: http
          containerPort: 8000
          protocol: TCP
        env:
        - name: FLASK_ENV
          value: "production"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: flask-app-secrets
              key: database-url
        - name: REDIS_URL
          valueFrom:
            secretKeyRef:
              name: flask-app-secrets
              key: redis-url
        - name: SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: flask-app-secrets
              key: secret-key
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"   # Exceeding this gets the container OOM-killed
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 30  # Start checking 30 seconds after startup
          periodSeconds: 10        # Check every 10 seconds
          timeoutSeconds: 3        # Time out after 3 seconds
          successThreshold: 1
          failureThreshold: 3      # Restart the container after 3 failures
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 10  # Start checking 10 seconds after startup
          periodSeconds: 5         # Check every 5 seconds
          timeoutSeconds: 2
          successThreshold: 1
          failureThreshold: 3      # Remove the Pod from the Service after 3 failures
        startupProbe:
          httpGet:
            path: /health
            port: 8000
            scheme: HTTP
          failureThreshold: 30     # Retry at most 30 times
          periodSeconds: 5         # Every 5 seconds, so a slow start is tolerated for up to 150 seconds
          timeoutSeconds: 3
        securityContext:
          allowPrivilegeEscalation: false  # Forbid privilege escalation
          readOnlyRootFilesystem: true     # Read-only root filesystem
          capabilities:
            drop:
            - ALL                          # Drop all Linux capabilities
          privileged: false
        volumeMounts:
        - name: tmp-volume
          mountPath: /tmp
        - name: config-volume
          mountPath: /app/config
          readOnly: true
      imagePullSecrets:
      - name: registry-secret  # Needed when pulling from a private registry
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - flask-app
              topologyKey: kubernetes.io/hostname  # Avoid scheduling all Pods onto the same node
      volumes:
      - name: tmp-volume
        emptyDir:
          medium: Memory  # Memory-backed volume, faster
          sizeLimit: 100Mi
      - name: config-volume
        configMap:
          name: flask-app-config
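
The Deployment injects DATABASE_URL, REDIS_URL and SECRET_KEY from a Secret as environment variables. On the application side it is convenient to read them once at startup and fail fast if any is missing, so that a misconfigured Pod is caught by the startup probe rather than at the first request. The sketch below shows one way to do that; Settings and load_settings are hypothetical names and not part of Flask.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables the Deployment above is expected to provide
REQUIRED_VARS = ("DATABASE_URL", "REDIS_URL", "SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    secret_key: str
    flask_env: str = "production"


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        database_url=environ["DATABASE_URL"],
        redis_url=environ["REDIS_URL"],
        secret_key=environ["SECRET_KEY"],
        flask_env=environ.get("FLASK_ENV", "production"),
    )
```

Calling load_settings() in the module that creates the Flask app makes the process exit immediately when a Secret key is missing or misnamed.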

Service

apiVersion: v1
kind: Service
metadata:
  name: flask-app-service
  namespace: default
  labels:
    app: flask-app
    tier: backend
spec:
  type: ClusterIP
  selector:
    app: flask-app
  ports:
  - name: http
    port: 80        # Service port
    targetPort: 8000  # Pod port
    protocol: TCP

ingress-nginx

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flask-app-ingress
  namespace: default
  annotations:
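    # ==================== Nginx configuration ====================
    # Note: this annotation is deprecated; on current Kubernetes versions spec.ingressClassName: nginx is the preferred form
    kubernetes.io/ingress.class: "nginx"
    
    # Enable HTTPS redirect
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    
    # Rate limiting (10 requests per second, burst of 20)
    nginx.ingress.kubernetes.io/limit-rps: "10"
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "2"
    
    # Client real IP (in ingress-nginx these are normally controller ConfigMap options; check that your controller version honors them as annotations)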
    nginx.ingress.kubernetes.io/enable-real-ip: "true"
    nginx.ingress.kubernetes.io/proxy-real-ip-cidr: "0.0.0.0/0"
    
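    # Connection timeouts
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    
    # Buffer sizes
    nginx.ingress.kubernetes.io/proxy-buffering: "on"
    nginx.ingress.kubernetes.io/proxy-buffer-size: "16k"
    nginx.ingress.kubernetes.io/proxy-buffers-number: "4"
    
    # Gzip compression (in ingress-nginx gzip is normally configured through the controller ConfigMap; check that your controller version honors these as annotations)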
    nginx.ingress.kubernetes.io/enable-gzip: "true"
    nginx.ingress.kubernetes.io/gzip-level: "6"
    nginx.ingress.kubernetes.io/gzip-min-length: "1024"
    nginx.ingress.kubernetes.io/gzip-types: "text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript"
    
    # Security headers
    nginx.ingress.kubernetes.io/configuration-snippet: |
      add_header X-Frame-Options "SAMEORIGIN" always;
      add_header X-Content-Type-Options "nosniff" always;
      add_header X-XSS-Protection "1; mode=block" always;
      add_header Referrer-Policy "strict-origin-when-cross-origin" always;
    
    # Authentication
    # nginx.ingress.kubernetes.io/auth-type: basic
    # nginx.ingress.kubernetes.io/auth-secret: flask-app-basic-auth
    # nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
    
    # Custom error pages
    # nginx.ingress.kubernetes.io/custom-http-errors: "404,500,502,503,504"
    # nginx.ingress.kubernetes.io/default-backend: custom-error-pages
    
    # Rewrite target
    # nginx.ingress.kubernetes.io/rewrite-target: /$1
    
    # WAF (if ModSecurity is installed)
    # nginx.ingress.kubernetes.io/enable-modsecurity: "true"
    # nginx.ingress.kubernetes.io/modsecurity-snippet: |
    #   SecRuleEngine On
    #   SecRequestBodyAccess On

spec:
  tls:
  - hosts:
    - flask.example.com
    secretName: flask-app-tls-secret  # Secret holding the TLS certificate

  rules:
  - host: flask.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: flask-app-service
            port:
              number: 80
