YouTube Audio Download & Chinese Subtitle Generation (Ubuntu + pyenv + faster-whisper): A Complete Guide
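Use this guide when:
- The YouTube video has no subtitles at all
- You need to generate high-quality Chinese subtitles locally (SRT/TXT)
- You are preparing transcripts of finance interviews or a corpus for AI analysis
- You manage multiple Python versions on Ubuntu with pyenv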
I. System requirements
1. Operating system
- Ubuntu 20.04 / 22.04 / 24.04 (verified)
2. Required system packages (APT)
    sudo apt update
    sudo apt install -y ffmpeg git curl wget ca-certificates \
        build-essential pkg-config libssl-dev zlib1g-dev libbz2-dev \
        libreadline-dev libsqlite3-dev llvm libncursesw5-dev xz-utils \
        tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
Notes:
- ffmpeg: audio decoding (required)
- The remaining packages are build dependencies for pyenv / compiling Python
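Before moving on, it can help to confirm that the required tools are actually on PATH. The following is a minimal sketch using only the Python standard library; the function name `missing_commands` and the list of tools checked are illustrative assumptions, not part of the original workflow.

```python
import shutil


def missing_commands(names):
    """Return the commands from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # ffmpeg is required for audio decoding; git and curl come from the APT list above.
    missing = missing_commands(["ffmpeg", "git", "curl"])
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means every listed tool was found; anything reported as missing can be installed with the APT command above.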
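II. Python environment (pyenv)
1. Use pyenv (the setup you are already on)
Example:
    pyenv install 3.10.14
    pyenv local 3.10.14
Confirm that Python comes from pyenv (the path should point into the pyenv shims directory, typically ~/.pyenv/shims/python):
    which python
III. Required Python packages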
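1. faster-whisper (core)
    pip install faster-whisper
Notes:
- Local Whisper inference (no network needed once the model has been downloaded)
- Supports CPU / GPU
- large-v3 is the most reliable model for spoken Chinese finance content
(Optional) Other common packages
    pip install torch numpy
Note: these are not required for CPU use. Only on a GPU (e.g. an RTX 4090) do you need to pay attention to CUDA versions: the torch CUDA build if torch is installed, and the CUDA libraries used by faster-whisper's CTranslate2 backend.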
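To check that the install landed in the pyenv-managed interpreter rather than some other Python, you can ask that interpreter whether the module is importable. This is a small sketch, assuming only the standard library; `missing_modules` is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that are not importable."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The PyPI package "faster-whisper" is imported as "faster_whisper".
    missing = missing_modules(["faster_whisper"])
    if missing:
        print("not installed:", ", ".join(missing))
    else:
        print("faster_whisper is importable")
```

Run it with the same `python` that `which python` reported, so the check applies to the interpreter you will use for transcription.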
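IV. yt-dlp (YouTube download tool)
1. Installation
    sudo apt install yt-dlp
Or, for the latest version (the APT package often lags behind upstream and may not support newer options):
    pip install -U yt-dlp
2. Required options (important)
Because of YouTube's anti-bot measures, it is strongly recommended to always pass:
- Browser cookies
- EJS remote components
    --cookies-from-browser chrome
    --remote-components ejs:github
Close the browser first; otherwise its cookie database may be locked.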
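If you drive yt-dlp from Python, the two recommended options can be assembled as an argument list, which avoids shell quoting problems. The sketch below is illustrative only: `ytdlp_base_args` and its parameters are hypothetical names, and the URL is a placeholder.

```python
import shlex


def ytdlp_base_args(browser="chrome", use_cookies=True, use_remote_components=True):
    """Build the yt-dlp options recommended above as an argument list."""
    args = ["yt-dlp"]
    if use_cookies:
        args += ["--cookies-from-browser", browser]
    if use_remote_components:
        args += ["--remote-components", "ejs:github"]
    return args


if __name__ == "__main__":
    url = "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(ytdlp_base_args() + [url]))
```

The list can be passed directly to `subprocess.run`; printing it first is a cheap way to confirm what will be executed.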
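V. Audio download (audio only, no video)
1. Recommended format (m4a, format ID 140)
    yt-dlp --cookies-from-browser chrome --remote-components ejs:github -f 140 -x --audio-format m4a "https://www.youtube.com/watch?v=VIDEO_ID"
2. Automatic fallback selection (more robust)
The selector tries each alternative from left to right and uses the first one that is available:
    yt-dlp --cookies-from-browser chrome --remote-components ejs:github -f "140/bestaudio[ext=m4a]/bestaudio[ext=webm]/bestaudio/best" -x --audio-format m4a "https://www.youtube.com/watch?v=VIDEO_ID"
VI. Generating Chinese subtitles (faster-whisper)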
1. Single-file transcription (example script)
    from faster_whisper import WhisperModel
    import os

    audio = "example.m4a"
    base = os.path.splitext(audio)[0]

    model = WhisperModel(
        "large-v3",
        device="cpu",  # change to "cuda" if you have a GPU
        compute_type="int8",
    )

    # segments is a lazy generator: transcription runs as it is iterated
    segments, info = model.transcribe(
        audio,
        language="zh",
        beam_size=5,
        vad_filter=True,
    )

    def ts(t):
        """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
        ms = int(round(t * 1000))  # integer milliseconds, so seconds never print as 60
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    with open(base + ".zh.srt", "w", encoding="utf-8") as f:
        i = 1
        for seg in segments:
            text = seg.text.strip()
            if not text:
                continue
            f.write(f"{i}\n{ts(seg.start)} --> {ts(seg.end)}\n{text}\n\n")
            i += 1
Output files:
- xxx.zh.srt (subtitles)
- Optionally, xxx.zh.txt as plain text
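Because the segments returned by `model.transcribe()` form a generator that can only be consumed once, the SRT and the optional TXT file are best written in the same pass. The following is a sketch of that idea under stated assumptions: `write_srt_and_txt` and `srt_timestamp` are hypothetical helper names, and the stand-in segments merely mimic the `start`, `end`, and `text` attributes used in the script above.

```python
import os
import tempfile
from types import SimpleNamespace


def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm using integer milliseconds."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def write_srt_and_txt(segments, base):
    """Consume `segments` once, writing base.zh.srt and base.zh.txt; return the cue count."""
    count = 0
    with open(base + ".zh.srt", "w", encoding="utf-8") as srt, \
         open(base + ".zh.txt", "w", encoding="utf-8") as txt:
        for seg in segments:
            text = (seg.text or "").strip()
            if not text:
                continue  # skip empty segments so cue numbers stay contiguous
            count += 1
            srt.write(f"{count}\n{srt_timestamp(seg.start)} --> "
                      f"{srt_timestamp(seg.end)}\n{text}\n\n")
            txt.write(text + "\n")
    return count


if __name__ == "__main__":
    # Stand-in segments; in the real script these come from model.transcribe().
    fake = (SimpleNamespace(start=s, end=e, text=t)
            for s, e, t in [(0.0, 1.5, " 你好 "), (1.5, 3.0, "   "), (3.0, 4.25, "世界")])
    with tempfile.TemporaryDirectory() as d:
        n = write_srt_and_txt(fake, os.path.join(d, "example"))
        print("cues written:", n)
```

Writing both files inside one loop avoids holding the whole transcript in memory and avoids iterating an already exhausted generator a second time.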
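VII. The one-click script (yt_asr.sh)
Features
- Input: a YouTube URL or a local audio file
- Automatically:
  - downloads the audio (m4a)
  - generates Chinese subtitles (SRT + TXT)
- Compatible with pyenv / CPU / GPU
Key pitfalls (ones you have already hit)
- A Bash function's return value (its stdout) must be clean
- In download_audio():
  - logs must go to stderr
  - stdout must contain only the final audio path
- Otherwise the caller captures the log text as the path and fails with:
    ❌ Audio not found: ⬇️ Downloading audio ...
Correct approach:
    echo xxx >&2
    yt-dlp ... >&2
    printf '%s\n' "$audio_path"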
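The same contract applies if the download step is ever driven from Python: the caller should capture stdout for the result and treat stderr as log output. Below is a self-contained sketch of that separation; the child program is a stand-in that only prints, and `run_and_capture_path` is a hypothetical name.

```python
import subprocess
import sys

# Stand-in for a download step: logs go to stderr, only the result goes to stdout.
CHILD = r'''
import sys
print("Downloading audio ...", file=sys.stderr)
print("out/example.m4a")
'''


def run_and_capture_path(code=CHILD):
    """Run a child process and return (path_from_stdout, log_from_stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip(), proc.stderr.strip()


if __name__ == "__main__":
    path, logs = run_and_capture_path()
    print("captured path:", path)
```

If the child printed its log line to stdout instead, the captured "path" would contain the log text, which is the same failure as the `Audio not found` error shown above.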
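VIII. FAQ quick reference
Q1: YouTube shows "no subtitles"?
A: You have to run ASR yourself; yt-dlp cannot download subtitles that do not exist.
Q2: bestaudio fails?
A: Run --list-formats first, then pick format 140.
Q3: YouTube asks for bot verification?
A:
    --cookies-from-browser chrome
    --remote-components ejs:github
Q4: Spoken Chinese finance content is transcribed inaccurately?
A:
- Use large-v3
- Enable vad_filter=True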
IX. Recommended directory layout
    YoutubeLearnAStock/
    ├── yt_asr.sh
    ├── out/
    │   ├── *.m4a
    │   ├── *.zh.srt
    │   ├── *.zh.txt
    │   └── logs/
    └── README.md
X. What you can do now
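- ✅ Work without relying on YouTube subtitles
- ✅ Batch-generate high-quality Chinese subtitles
- ✅ Feed the output directly into:
  - A-share interview reviews
  - AI analysis / RAG
  - Long-term corpus building
This is a professional-grade workflow, not a "subtitle download trick".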
XI. The script
One-click script:
    #!/usr/bin/env bash
    # yt_asr.sh - Final version + urls.txt batch mode
    # Supports:
    #   1) URLs / local audio files as args
    #   2) --urls-file urls.txt (one URL per line)
    # Robust for Ubuntu + pyenv + faster-whisper

    set -euo pipefail

    # -----------------------------
    # Defaults
    # -----------------------------
    OUTDIR="${OUTDIR:-./out}"
    ASR_LANG="${ASR_LANG:-zh}"      # zh / en / ja / ...
    MODEL="${MODEL:-large-v3}"
    DEVICE="${DEVICE:-cpu}"         # cpu | cuda
    COMPUTE="${COMPUTE:-int8}"
    BROWSER="${BROWSER:-chrome}"
    USE_COOKIES="${USE_COOKIES:-1}"
    USE_REMOTE_COMPONENTS="${USE_REMOTE_COMPONENTS:-1}"
    KEEP_AUDIO="${KEEP_AUDIO:-1}"
    FORMAT_SELECT="${FORMAT_SELECT:-140/bestaudio[ext=m4a]/bestaudio[ext=webm]/bestaudio/best}"
    AUDIO_FORMAT="${AUDIO_FORMAT:-m4a}"
    VAD_FILTER="${VAD_FILTER:-1}"

    URLS_FILE=""

    # -----------------------------
    # Helpers
    # -----------------------------
    die() { echo "❌ $*" >&2; exit 1; }
    log() { echo "👉 $*" >&2; }

    need_cmd() { command -v "$1" >/dev/null 2>&1 || die "Missing command: $1"; }

    ensure_python_pkg() {
      local mod="$1"
      local pkg="${2:-$1}"
      log "🔎 python: $(command -v python)"
      if python - <<PY >/dev/null 2>&1
    import importlib.util, sys
    sys.exit(0 if importlib.util.find_spec("$mod") else 1)
    PY
      then
        log "✅ Python package OK: $mod"
      else
        log "⬇️ Installing Python package: $pkg"
        python -m pip install -U "$pkg"
      fi
    }

    is_url() { [[ "$1" =~ ^https?:// ]]; }

    build_ytdlp_args() {
      local -a a=()
      [[ "$USE_COOKIES" == "1" ]] && a+=(--cookies-from-browser "$BROWSER")
      [[ "$USE_REMOTE_COMPONENTS" == "1" ]] && a+=(--remote-components ejs:github)
      # Emit nothing when no option is enabled; a bare printf would emit one empty argument
      if (( ${#a[@]} > 0 )); then printf '%s\0' "${a[@]}"; fi
    }
    download_audio() {
      local input="$1" outdir="$2" logf="$3"
      mkdir -p "$outdir"

      local -a ytdlp=()
      while IFS= read -r -d '' x; do ytdlp+=("$x"); done < <(build_ytdlp_args)

      local tmpl="$outdir/%(title).200B [%(id)s].%(ext)s"
      log "⬇️ Downloading audio: $input"

      local filepath
      filepath="$(
        yt-dlp "${ytdlp[@]}" \
          -f "$FORMAT_SELECT" \
          -x --audio-format "$AUDIO_FORMAT" \
          -o "$tmpl" \
          --print after_move:filepath \
          "$input" \
          2>>"$logf"
      )" || die "yt-dlp failed: $input"

      filepath="$(printf '%s\n' "$filepath" | sed '/^[[:space:]]*$/d' | tail -n 1)"
      [[ -f "$filepath" ]] || die "Audio not found after download: $filepath"
      printf '%s\n' "$filepath"
    }
    transcribe_audio() {
      local audio="$1" lang="$2" model="$3" device="$4" compute="$5" vad="$6" logf="$7"
      [[ -f "$audio" ]] || die "Audio not found: $audio"

      local base="${audio%.*}"
      local srt="${base}.${lang}.srt"
      local txt="${base}.${lang}.txt"
      local json="${base}.${lang}.json"
      local tsv="${base}.${lang}.tsv"

      log "🧠 Transcribing: $audio"
      python - "$audio" "$lang" "$model" "$device" "$compute" "$vad" \
        "$srt" "$txt" "$json" "$tsv" >>"$logf" 2>&1 <<'PY'
    import sys, json, os
    from faster_whisper import WhisperModel

    audio, lang, model_name, device, compute, vad, srt_p, txt_p, json_p, tsv_p = sys.argv[1:]
    vad = vad == "1"

    model = WhisperModel(model_name, device=device, compute_type=compute)
    segments, info = model.transcribe(audio, language=lang, beam_size=5, vad_filter=vad)

    def ts(t):
        ms = int(round(t * 1000))  # integer milliseconds, so seconds never print as 60
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    rows = []
    with open(srt_p, "w", encoding="utf-8") as srt, open(txt_p, "w", encoding="utf-8") as txt:
        i = 1
        for seg in segments:
            text = (seg.text or "").strip()
            if not text:
                continue
            srt.write(f"{i}\n{ts(seg.start)} --> {ts(seg.end)}\n{text}\n\n")
            txt.write(text + "\n")
            rows.append({"i": i, "start": float(seg.start), "end": float(seg.end), "text": text})
            i += 1

    with open(json_p, "w", encoding="utf-8") as f:
        json.dump({"audio": os.path.basename(audio), "lang": lang, "segments": rows},
                  f, ensure_ascii=False, indent=2)

    with open(tsv_p, "w", encoding="utf-8") as f:
        f.write("i\tstart\tend\ttext\n")
        for r in rows:
            f.write(f"{r['i']}\t{r['start']:.3f}\t{r['end']:.3f}\t{r['text']}\n")
    PY
    }
    usage() {
      cat <<'USAGE'
    Usage:
      ./yt_asr.sh [options] <url_or_audio> [more...]
      ./yt_asr.sh --urls-file urls.txt

    Options:
      --urls-file FILE          Read URLs from file (one per line, # for comments)
      -o, --outdir DIR
      --lang LANG               zh / en / ja (default: zh)
      --model MODEL
      --device cpu|cuda
      --compute TYPE
      --browser NAME
      --no-cookies
      --no-remote-components
      --keep-audio | --no-keep-audio
      --no-vad
      -h, --help
    USAGE
    }

    # -----------------------------
    # Parse args
    # -----------------------------
    ARGS=()
    while [[ $# -gt 0 ]]; do
      case "$1" in
        --urls-file) URLS_FILE="$2"; shift 2;;
        -o|--outdir) OUTDIR="$2"; shift 2;;
        --lang) ASR_LANG="$2"; shift 2;;
        --model) MODEL="$2"; shift 2;;
        --device) DEVICE="$2"; shift 2;;
        --compute) COMPUTE="$2"; shift 2;;
        --browser) BROWSER="$2"; shift 2;;
        --no-cookies) USE_COOKIES=0; shift;;
        --no-remote-components) USE_REMOTE_COMPONENTS=0; shift;;
        --keep-audio) KEEP_AUDIO=1; shift;;
        --no-keep-audio) KEEP_AUDIO=0; shift;;
        --no-vad) VAD_FILTER=0; shift;;
        -h|--help) usage; exit 0;;
        *) ARGS+=("$1"); shift;;
      esac
    done
    # -----------------------------
    # Preflight
    # -----------------------------
    need_cmd yt-dlp
    need_cmd ffmpeg
    need_cmd python
    ensure_python_pkg faster_whisper faster-whisper

    mkdir -p "$OUTDIR" "$OUTDIR/logs"

    # -----------------------------
    # Collect inputs
    # -----------------------------
    ITEMS=()

    if [[ -n "$URLS_FILE" ]]; then
      [[ -f "$URLS_FILE" ]] || die "urls file not found: $URLS_FILE"
      # The "|| [[ -n ... ]]" keeps a final line that lacks a trailing newline
      while IFS= read -r line || [[ -n "$line" ]]; do
        line="$(echo "$line" | sed 's/#.*//g' | xargs)"
        [[ -z "$line" ]] && continue
        ITEMS+=("$line")
      done < "$URLS_FILE"
    fi

    ITEMS+=("${ARGS[@]}")
    [[ ${#ITEMS[@]} -ge 1 ]] || { usage; exit 1; }
    # -----------------------------
    # Main loop
    # -----------------------------
    for item in "${ITEMS[@]}"; do
      echo "============================================================" >&2
      log "INPUT: $item"

      safe_id="$(echo "$item" | sed 's#[^A-Za-z0-9._-]#_#g' | cut -c1-80)"
      ts="$(date +%Y%m%d_%H%M%S)"
      logf="$OUTDIR/logs/${ts}_${safe_id}.log"
      : > "$logf"

      audio=""
      downloaded=0

      if is_url "$item"; then
        audio="$(download_audio "$item" "$OUTDIR" "$logf")"
        downloaded=1
      else
        [[ -f "$item" ]] || die "Not a URL or file: $item"
        audio="$item"
      fi

      transcribe_audio "$audio" "$ASR_LANG" "$MODEL" "$DEVICE" "$COMPUTE" "$VAD_FILTER" "$logf"

      if [[ "$downloaded" == "1" && "$KEEP_AUDIO" == "0" ]]; then
        rm -f -- "$audio"
      fi

      log "📝 Log saved: $logf"
    done

    log "🎉 All done."
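The --urls-file handling in the script (one URL per line, # for comments) can also be sketched in Python. Note that this is a variation rather than a line-for-line port: the script's sed expression drops everything after any #, whereas the hypothetical `parse_urls` below treats # as a comment only at the start of a line or after whitespace, so a #fragment inside a URL survives.

```python
import re


def parse_urls(text):
    """Return the URLs in `text`, one per line, ignoring blank lines and # comments."""
    urls = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        # A "#" preceded by whitespace starts a trailing comment.
        line = re.split(r"\s+#", line, maxsplit=1)[0].strip()
        if line:
            urls.append(line)
    return urls


if __name__ == "__main__":
    sample = "# finance interviews\nhttps://www.youtube.com/watch?v=VIDEO_ID  # placeholder\n"
    print(parse_urls(sample))
```

Keeping the parsing in a pure function makes it easy to test against a sample urls.txt before starting a long batch run.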
https://jkwei.com/posts/knowledge/youtube_asr_workflow_ubuntu_pyenv/ Last updated 2026-01-15