Agentic RAG

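In the Modular RAG stage, we decomposed the pipeline into retrieval, reranking, and generation modules and added query rewriting and multi-path recall, which removed the linear bottleneck of basic RAG. Even in a modular architecture, however, the system still executes a preset flow passively. In deeply specialized verticals such as legal consultation and medical decision-making, Modular RAG still faces a semantic gap: when the user's question is colloquial and the knowledge base is written in formal terminology, a modular rewrite alone may not help, and if the initial retrieval score falls below the threshold the system tends to give up or hallucinate.

This document walks you through an Agentic RAG optimization strategy. By adding an autonomous decision loop and a self-correction mechanism on top of Modular RAG, we moved from a fixed module flow to agent-driven reasoning in complex logic tests on the Civil Code of the People's Republic of China (《中华人民共和国民法典》) dataset.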

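I. Architecture Overview

Data layer:
- Core dataset: the Civil Code of the People's Republic of China (《中华人民共和国民法典》). Its strict logical hierarchy and dense professional terminology make it a good test of the agent's query rewriting and legal reasoning.
- Private data: the system also accepts PDFs such as industry standards and expert consensus documents. Files are transferred over an SSH channel (SCP), so you can switch seamlessly from public knowledge to private domain knowledge.
- Parsing strategy: mechanical slicing fragments meaning, so the documents are parsed with SentenceSplitter in combination with the article hierarchy of the statute. This keeps each chunk semantically complete and gives the agent high-quality material to reason over.
Inference layer:
- Core engine: Qwen3-8B served by vLLM, which handles legal analysis and query rewriting.
Storage and retrieval layer:
- Vector store: Milvus Lite in embedded mode, loading a local .db file directly from Python.
- Embedding model: Qwen3-Embedding-8B, loaded through the built-in library; its 4096-dimensional embedding space captures relations between legal concepts.
Orchestration layer:
- Autonomous decision workflow: instead of the fixed order of Modular RAG, an event-driven architecture closes the loop of judge, reflect, correct, and search again.
- Iteration loop:
  - Confidence evaluation: BGE-Reranker-v2-m3 scores the retrieved chunks. Results are no longer accepted blindly; the rerank score (threshold = 0.35) decides whether the retrieved knowledge is sufficient to answer.
  - Self-correcting rewrite: if retrieval quality falls short, the agent reflects, converts the colloquial question into standard legal terminology, and triggers retrieval again.
  - Chain-of-thought generation: the "think" tags in the output expose the agent's internal reasoning.
Evaluation and traceability layer:
- Experimental validation: for the correct-and-retry step that is specific to the Agentic architecture, compare the scores of the original retrieval and the rewritten retrieval to quantify how well the agent recovers from a failed search.
- Transparent evidence chain: alongside the answer, the system always shows the original article text and its rerank confidence score, and streams the "think" tags to show the reasoning path, which keeps hallucination in check.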
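
The parsing strategy in the data layer pairs SentenceSplitter with the article hierarchy of the statute. As a minimal sketch of the hierarchy part only, the function below cuts plain text at article markers of the form 第…条, so that each chunk holds one whole article. The regex, the name `split_by_article`, and the sample text are assumptions made for illustration; this is not the parser used by the script later in this guide.

```python
import re

# Matches an article marker such as "第七百零三条" at the start of a line.
# Assumption: in the extracted text every article begins on its own line.
ARTICLE_RE = re.compile(r"^第[零一二三四五六七八九十百千]+条", re.MULTILINE)

def split_by_article(text):
    """Return a list of (article_label, article_text) pairs."""
    matches = list(ARTICLE_RE.finditer(text))
    chunks = []
    for i, m in enumerate(matches):
        # An article runs until the next marker, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append((m.group(0), text[m.start():end].strip()))
    return chunks

# Placeholder text that only imitates the layout of a statute.
sample = (
    "第一条 示例条文甲。\n"
    "续行内容。\n"
    "第二条 示例条文乙。\n"
)
articles = split_by_article(sample)
for label, body in articles:
    print(label, "->", len(body), "chars")
```

Each pair could then be passed through SentenceSplitter only when a single article exceeds the chunk size, so that short articles are never cut in the middle.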

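II. Optimization Steps

To address the two pain points of vertical domains, retrieval that finds nothing and answers that cannot be trusted, the code implements three Agentic strategies:

1: Reranker-based confidence gating

In Agentic RAG the reranker is no longer just a sorting tool; it selects which path the system takes.

Implementation logic:
- Score-based quality check: BGE-Reranker-v2-m3 runs immediately after the retriever's first-pass recall.
- Decision gate: 0.35 was chosen empirically as the decision threshold. If the top-1 chunk scores below it, the system judges its knowledge insufficient and holds back the generation request.
- Result: this mechanism filtered out 99% of irrelevant noise.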
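
The gate described above reduces to a comparison between the best rerank score and the threshold. The sketch below shows that decision in isolation; `GateDecision` and `confidence_gate` are hypothetical names, and the scores in the usage lines are illustrative inputs rather than measurements.

```python
from dataclasses import dataclass

@dataclass
class GateDecision:
    passed: bool      # True when generation may proceed
    top_score: float  # best rerank score seen (0.0 when nothing was retrieved)

def confidence_gate(scores, threshold=0.35):
    """Decide whether the reranked results are good enough to answer from."""
    top = max(scores) if scores else 0.0
    return GateDecision(passed=top >= threshold, top_score=top)

print(confidence_gate([0.07, 0.03]))  # below the threshold: hold back generation
print(confidence_gate([0.99, 0.41]))  # above the threshold: proceed
print(confidence_gate([]))            # empty recall is treated as score 0.0
```

Treating an empty recall as score 0.0 keeps a single code path for both "nothing found" and "nothing relevant found".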

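2: Self-correction and query-rewrite loop

This is the defining difference between Agentic RAG and Modular RAG: the retrieval path is repaired dynamically.

Typical case:
- User input: "租房子没到期房东想涨房租怎么办？" ("My lease hasn't expired and the landlord wants to raise the rent. What can I do?")
- Initial retrieval: the keyword "租房" (renting) matches only an unrelated article in the Civil Code, with a score of 0.07 (fails the gate).
- Agent action: trigger the correction logic, infer the underlying legal concept, and run a second retrieval automatically.
- Second retrieval: confidence rises to 0.99 (match succeeds).
Implementation logic: the rewrite is wrapped in a while loop, so the LLM reconsiders the search terms based on the reranker's feedback until the score passes the threshold or the retry limit is reached.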
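
The loop can be sketched independently of any retrieval library. In the example below, `retrieve` and `rewrite` are injected callables that stand in for the retriever plus reranker and for the LLM; `FAKE_INDEX` and the rewritten query string are hypothetical, and only the two scores (0.07 and 0.99) come from the case described above.

```python
def agentic_search(query, retrieve, rewrite, threshold=0.35, max_retries=1):
    """Retrieve, check the top score, and rewrite the query until it passes.

    retrieve(query) -> list of (text, score), best first
    rewrite(original_query) -> reformulated query string
    Returns (final_query, results, history); history logs (query, top_score).
    """
    current = query
    history = []
    results = []
    for attempt in range(max_retries + 1):
        results = retrieve(current)
        top = results[0][1] if results else 0.0
        history.append((current, top))
        if top >= threshold:
            break                      # confident enough: stop searching
        if attempt < max_retries:
            current = rewrite(query)   # always rewrite from the original wording
    return current, results, history

# Stubs standing in for the real components (hypothetical data).
FAKE_INDEX = {
    "租房子没到期房东想涨房租怎么办？": [("unrelated article", 0.07)],
    "租赁合同 租金变更": [("lease article", 0.99)],
}

def fake_retrieve(q):
    return FAKE_INDEX.get(q, [])

def fake_rewrite(q):
    return "租赁合同 租金变更"

final_query, results, history = agentic_search(
    "租房子没到期房东想涨房租怎么办？", fake_retrieve, fake_rewrite
)
for q, score in history:
    print(f"{score:.2f}  {q}")
```

Rewriting from the original question rather than from the previous rewrite keeps the user's intent from drifting across rounds; this mirrors the script below, which also builds its rewrite prompt from `user_query`.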

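3: Observable reasoning and evidence tracing

In demanding legal or technical consultation scenarios, how trustworthy a result is matters as much as how accurate it is.

Implementation logic:
- Chain-of-thought display: Qwen3 is a thinking model, which lets the system show in real time how the agent moves from a colloquial description to a legal judgment.
- Evidence tracing: at the end of the final reply, the system always outputs the retrieved source_nodes:
  - Original text: the Civil Code article as parsed from the PDF.
  - Confidence score: the rerank score of each piece of evidence.
- Conclusion: this turns black-box generation into a transparent evidence chain and makes the system considerably more persuasive when deployed in a vertical industry.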
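
The script later in this guide strips the think tags with `clean_think_tag` before printing. If you want to display the reasoning instead of discarding it, one option is to split it from the answer. The sketch below shows that split together with a small evidence formatter; `split_think` and `format_evidence` are hypothetical helpers, and the sample strings are placeholders.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL | re.IGNORECASE)

def split_think(text):
    """Separate the model's reasoning from its final answer."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

def format_evidence(items):
    """Render (text, score) pairs; score may be None when no rerank score exists."""
    lines = []
    for i, (text, score) in enumerate(items, start=1):
        score_text = f"{score:.4f}" if score is not None else "N/A"
        lines.append(f"[Evidence {i}] (rerank confidence: {score_text})")
        lines.append(text.strip())
    return "\n".join(lines)

raw = "<think>map the question to lease law</think>Final answer text."
reasoning, answer = split_think(raw)
print("reasoning:", reasoning)
print("answer:", answer)
print(format_evidence([("article text A", 0.99), ("article text B", None)]))
```

Formatting the score only when it is present avoids applying a float format to a missing value.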

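III. MetaX (沐曦) Deployment Guide

This chapter applies to MetaX accelerator cards such as the Xiyun C500 (曦云 C500).

1. Hardware and base environment

Accelerator: Xiyun C500 (64GB) × 1
Hosts:
- jiajia-mxc: vLLM 0.11.0 / Python 3.10 / maca 3.3.0.11
- suanfeng-mxc: vLLM 0.13.0 / Python 3.10 / maca 3.3.0.303

2. Basic steps

Enter the compute container. After the instance starts, click JupyterLab to open the workbench.

3. Implementation steps

3.1 Install LlamaIndex and Milvus Lite

Open a terminal window (Terminal).

Run: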

pip install --target /data/llama_libs --no-deps -i https://mirrors.aliyun.com/pypi/simple/ -U \
"pymilvus==2.6.6" milvus-lite orjson minio pathspec python-dateutil pytz six \
llama-index-core llama-index-readers-file llama-index-llms-openai llama-index-llms-openai-like \
llama-index-embeddings-huggingface llama-index-vector-stores-milvus llama-index-postprocessor-sbert-rerank  \
llama-index-instrumentation llama-index-workflows llama-index-utils-workflow  \
llama-index-retrievers-bm25 rank-bm25 bm25s  PyStemmer  \
sentence-transformers pypdf docx2txt nest-asyncio ujson grpcio google-api-core protobuf banks griffe sqlalchemy dataclasses-json marshmallow typing-inspect fsspec filetype deprecated wrapt dirtyjson tenacity jinja2 pyyaml \
pandas numpy nltk tiktoken requests charset-normalizer urllib3 certifi idna sniffio anyio h11 httpcore httpx mypy_extensions typing_extensions scikit-learn scipy joblib threadpoolctl tqdm pyarrow \
ragas langchain-core langchain-openai langsmith requests_toolbelt "numpy<2.0" uuid_utils tenacity regex appdirs instructor docstring_parser langchain_community llama-index-llms-huggingface
pip install griffe -t /data/llama_libs
pip install accelerate

When the installation finishes, open a new terminal.

3.2 Start the vLLM inference server

In the new terminal, run:

python -m vllm.entrypoints.openai.api_server \
    --model /mnt/moark-models/Qwen3-8B \
    --gpu-memory-utilization 0.4 \
    --port 8000

When the terminal prints INFO: Application startup complete, vLLM is ready.
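
Instead of watching the log, you can also poll the server from Python. The sketch below queries the model list route of the OpenAI-compatible API until it answers. `wait_for_models` and `fetch_json` are hypothetical helpers, and the retry count and delay are arbitrary choices.

```python
import json
import time
import urllib.error
import urllib.request

def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def wait_for_models(base_url, fetch=fetch_json, retries=30, delay=2.0):
    """Poll <base_url>/models until it answers; return the served model ids."""
    for _ in range(retries):
        try:
            payload = fetch(f"{base_url}/models")
            return [m["id"] for m in payload.get("data", [])]
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            time.sleep(delay)  # server not up yet: wait and try again
    raise RuntimeError(f"no response from {base_url} after {retries} tries")

# Example call once the server from step 3.2 is starting:
# print(wait_for_models("http://localhost:8000/v1"))
```

The returned id should match the `model` value passed to the client in the script of step 3.3, assuming no separate served model name was configured.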

3.3 Create and run the Python script

Click Python File.

Enter the following code:

import sys, os, asyncio, re

# 1. Environment setup: add the private library directory to sys.path
#    before importing anything that was installed with --target.
PRIVATE_LIB = "/data/llama_libs"
if PRIVATE_LIB not in sys.path:
    sys.path.insert(0, PRIVATE_LIB)

import nest_asyncio
nest_asyncio.apply()

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai_like import OpenAILike
from llama_index.vector_stores.milvus import MilvusVectorStore
from llama_index.core.node_parser import SentenceSplitter
from llama_index.postprocessor.sbert_rerank import SentenceTransformerRerank
from llama_index.core.schema import QueryBundle

# 2. Paths and parameters
CONFIG = {
    "pdf_path": "/mnt/moark-models/Agentic_RAG/minfadian.pdf",
    "embed_path": "/mnt/moark-models/Qwen3-Embedding-8B",
    "llm_api_base": "http://localhost:8000/v1",
    "rerank_path": "/mnt/moark-models/bge-reranker-v2-m3",
    "db_path": "./agentic_civil_code.db",
    "threshold": 0.35,   # minimum top-1 rerank score accepted as a match
    "max_retries": 1,    # number of rewrite-and-retry rounds
    "rag_device": "cuda:0"
}

def clean_think_tag(text):
    """Strip the reasoning tags Qwen emits so they do not interfere with later steps."""
    text = str(text).strip()
    # Remove <think>...</think> (or <思考>...</思考>) blocks
    pattern = re.compile(r'<think>.*?</think>|<思考>.*?</思考>', re.DOTALL | re.IGNORECASE)
    cleaned = re.sub(pattern, '', text)
    return cleaned.strip()

# ==========================================
# Core agent logic
# ==========================================
async def agentic_law_query(user_query, retriever, reranker):
    current_query = user_query
    attempt = 0
    final_nodes = []

    while attempt <= CONFIG["max_retries"]:
        print(f"  └─ [think] Round {attempt+1} of legal article matching...")

        raw_nodes = retriever.retrieve(current_query)

        # An empty recall counts as score 0, so it also goes through the rewrite path
        ranked_nodes = reranker.postprocess_nodes(
            raw_nodes, query_bundle=QueryBundle(current_query)
        ) if raw_nodes else []

        top_score = ranked_nodes[0].score if ranked_nodes else 0

        if top_score >= CONFIG["threshold"]:
            print(f"  └─ [decision] Match found (confidence: {top_score:.2f})")
            # Keep the top 5 articles for a fuller context
            final_nodes = ranked_nodes[:5]
            break
        else:
            if attempt < CONFIG["max_retries"]:
                print(f"  └─ [decision] Relevance too low ({top_score:.2f}), rewriting the query...")
                # The prompt stays in Chinese to match the corpus. It asks the LLM to
                # turn the question into one precise legal-term keyword.
                rewrite_prompt = f"请将该法律咨询转化为1个精准的法律术语关键词：{user_query}"
                res = await Settings.llm.acomplete(rewrite_prompt)
                current_query = clean_think_tag(res.text)
                attempt += 1
            else:
                # Retries exhausted: fall back to the best available articles
                final_nodes = ranked_nodes[:5]
                break

    # Build the answer prompt (Chinese, to match the corpus): cite the articles,
    # restate the question, and ask for a legal analysis with a conclusion.
    context_str = "\n\n".join([n.get_content() for n in final_nodes])
    final_prompt = (
        f"依据以下法条：\n{context_str}\n\n"
        f"回答问题：{user_query}\n"
        f"请进行专业法理分析并给出结论。"
    )

    response = await Settings.llm.acomplete(final_prompt)
    return clean_think_tag(response.text), final_nodes

# ==========================================
# Main program
# ==========================================
async def main():
    print("\n" + "="*60)
    print(">>> Starting the Civil Code advisor system...")
    print(">>> Loading the LLM, embedding model, and reranker, please wait...")

    # Configure the LLM (served by vLLM through its OpenAI-compatible API)
    Settings.llm = OpenAILike(
        model="/mnt/moark-models/Qwen3-8B",
        api_base=CONFIG["llm_api_base"],
        api_key="fake",
        is_chat_model=True,
        timeout=300.0
    )

    # Configure the embedding model
    Settings.embed_model = HuggingFaceEmbedding(
        model_name=CONFIG["embed_path"],
        device=CONFIG["rag_device"],
        trust_remote_code=True,
        embed_batch_size=10
    )

    # Configure the reranker
    reranker = SentenceTransformerRerank(
        model=CONFIG["rerank_path"],
        top_n=5,  # keep more reranked results so complete articles are available
        device=CONFIG["rag_device"]
    )

    # ==========================================
    # Knowledge base management
    # ==========================================
    db_file = CONFIG["db_path"]
    needs_indexing = not os.path.exists(db_file)

    if needs_indexing:
        print(">>> [system] No knowledge base file found, mode: rebuild index.")
    else:
        print(f">>> [system] Knowledge base file found: {db_file}, mode: load index.")

    vector_store = MilvusVectorStore(uri=CONFIG["db_path"], dim=4096, overwrite=False)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    if needs_indexing:
        print(">>> Indexing the Civil Code into the knowledge base...")
        try:
            documents = SimpleDirectoryReader(input_files=[CONFIG["pdf_path"]]).load_data()

            node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=100)

            nodes = node_parser.get_nodes_from_documents(documents)
            vector_index = VectorStoreIndex(nodes, storage_context=storage_context, embed_model=Settings.embed_model)
            print(">>> [system] Knowledge base built.")
        except Exception as e:
            print(f">>> [error] Indexing failed: {e}")
            # Remove the partially written database so the next run rebuilds it
            if os.path.exists(db_file):
                try:
                    if os.path.isfile(db_file):
                        os.remove(db_file)
                    else:
                        import shutil
                        shutil.rmtree(db_file)
                except OSError:
                    pass
            sys.exit(1)
    else:
        print(">>> Existing knowledge base detected, loading it directly...")
        print(">>> [hint] If the cited articles still look incomplete, delete the .db file to rebuild with the larger chunk size.")
        try:
            vector_index = VectorStoreIndex.from_vector_store(
                vector_store,
                storage_context=storage_context,
                embed_model=Settings.embed_model
            )
        except Exception as e:
            print(f">>> [error] Loading failed: {e}")
            sys.exit(1)

    # Recall more candidates for the reranker to choose from
    retriever = vector_index.as_retriever(similarity_top_k=10)

    # Print the welcome banner (sample questions stay in Chinese to match the corpus)
    print("\n" + "="*60)
    print("  👋 Hello! I am Teacher Luo (罗老师), a Civil Code legal assistant built on Agentic RAG.")
    print("  Besides retrieving legal articles, I can correct myself and think again when I misunderstand.")
    print("  💡 Ask me anything about the Civil Code, for example:")
    print("     - '我被张三家的疯狗追杀，为了躲命把李四家价值十万的古董花瓶撞碎了，李四非要我赔钱，我该不该赔？'")
    print("     - '租房子没到期房东想涨房租怎么办？'")
    print("     - '吃了上错桌的菜需要付钱吗？'")
    print("  Type 'quit', 'exit', or '退出' to end the conversation.")
    print("="*60 + "\n")
    while True:
        try:
            user_input = input("User >> ").strip()
            if user_input.lower() in ['quit', 'exit', '退出']: break
            if not user_input: continue

            answer, source_nodes = await agentic_law_query(user_input, retriever, reranker)

            print("\nTeacher Luo >>")
            print(f"**Legal analysis:**\n{answer}")

            print("\n" + "-"*30 + " Legal sources " + "-"*30)
            if not source_nodes:
                print(" [hint] No highly relevant article was found in the knowledge base.")
            else:
                for i, node in enumerate(source_nodes):
                    # Format the score only when present; a float format on "N/A" would raise
                    score_text = f"{node.score:.4f}" if node.score is not None else "N/A"

                    print(f"\n[Evidence {i+1}] (rerank confidence: {score_text})")
                    # Print the full content without truncation
                    content = node.get_content().strip()
                    print(content)
                    print("-" * 40)
            print("\n" + "-" * 74)

        except Exception as e:
            print(f"\n>>> [interaction error] {e}")
            import traceback
            traceback.print_exc()

if __name__ == "__main__":
    asyncio.run(main())

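Press Ctrl + S to save the file and name it test.py. Open another terminal and run python test.py to enter the Agentic RAG system. A session shows three parts: the dialogue prompt, the advisor's reply, and the source tracing.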

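IV. Enflame (燧原) Deployment Guide

This chapter applies to Enflame accelerator cards such as the Enflame S60 (燧原 S60).

1. Hardware and base environment

Accelerator: Enflame S60 (48GB) × 2
Host: bd-suiyuan: Ubuntu 22.04 / Python 3.13 / ef 1.5.0.604

2. Basic steps

Enter the compute container. After the instance starts, click JupyterLab to open the workbench.

3. Implementation steps

3.1 Install LlamaIndex and Milvus Lite

Open a terminal window (Terminal).

Run: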

pip install --target /data/llama_libs --no-deps -i https://mirrors.aliyun.com/pypi/simple/ -U \
"pymilvus==2.6.6" milvus-lite orjson minio pathspec python-dateutil pytz six \
llama-index-core llama-index-readers-file llama-index-llms-openai llama-index-llms-openai-like \
llama-index-embeddings-huggingface llama-index-vector-stores-milvus llama-index-postprocessor-sbert-rerank  \
llama-index-instrumentation llama-index-workflows llama-index-utils-workflow  \
llama-index-retrievers-bm25 rank-bm25 bm25s  PyStemmer  \
sentence-transformers pypdf docx2txt nest-asyncio ujson grpcio google-api-core protobuf banks griffe sqlalchemy dataclasses-json marshmallow typing-inspect fsspec filetype deprecated wrapt dirtyjson tenacity jinja2 pyyaml \
pandas numpy nltk tiktoken requests charset-normalizer urllib3 certifi idna sniffio anyio h11 httpcore httpx mypy_extensions typing_extensions scikit-learn scipy joblib threadpoolctl tqdm pyarrow \
ragas langchain-core langchain-openai langsmith requests_toolbelt "numpy<2.0" uuid_utils tenacity regex appdirs instructor docstring_parser langchain_community llama-index-llms-huggingface jsonpatch
pip install griffe -t /data/llama_libs
pip install accelerate

When the installation finishes, open a new terminal.

3.2 Start the vLLM inference server

In the new terminal, run:

CUDA_VISIBLE_DEVICES=0 vllm serve /mnt/moark-models/Qwen3-8B --gpu-memory-utilization 0.7 --port 8000

When the terminal prints INFO: Application startup complete, vLLM is ready.

3.3 Create and run the Python script

Click Python File.

Enter the following code:

import sys, os, asyncio, re
import torch
import torch_gcu
from torch_gcu import transfer_to_gcu  # Enflame GCU support for PyTorch

# 1. Environment setup: add the private library directory to sys.path
#    before importing anything that was installed with --target.
PRIVATE_LIB = "/data/llama_libs"
if PRIVATE_LIB not in sys.path:
    sys.path.insert(0, PRIVATE_LIB)

import nest_asyncio
nest_asyncio.apply()

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai_like import OpenAILike
from llama_index.vector_stores.milvus import MilvusVectorStore
from llama_index.core.node_parser import SentenceSplitter
from llama_index.postprocessor.sbert_rerank import SentenceTransformerRerank
from llama_index.core.schema import QueryBundle

# 2. Paths and parameters
CONFIG = {
    "pdf_path": "/mnt/moark-models/Agentic_RAG/minfadian.pdf",
    "embed_path": "/mnt/moark-models/Qwen3-Embedding-8B",
    "llm_api_base": "http://localhost:8000/v1",
    "rerank_path": "/mnt/moark-models/bge-reranker-v2-m3",
    "db_path": "./agentic_civil_code.db",
    "threshold": 0.35,   # minimum top-1 rerank score accepted as a match
    "max_retries": 1,    # number of rewrite-and-retry rounds
    "rag_device": "cuda:1"  # second card; the first one is used by vLLM
}

def clean_think_tag(text):
    """Strip the reasoning tags Qwen emits so they do not interfere with later steps."""
    text = str(text).strip()
    # Remove <think>...</think> (or <思考>...</思考>) blocks
    pattern = re.compile(r'<think>.*?</think>|<思考>.*?</思考>', re.DOTALL | re.IGNORECASE)
    cleaned = re.sub(pattern, '', text)
    return cleaned.strip()

# ==========================================
# Core agent logic
# ==========================================
async def agentic_law_query(user_query, retriever, reranker):
    current_query = user_query
    attempt = 0
    final_nodes = []

    while attempt <= CONFIG["max_retries"]:
        print(f"  └─ [think] Round {attempt+1} of legal article matching...")

        raw_nodes = retriever.retrieve(current_query)

        # An empty recall counts as score 0, so it also goes through the rewrite path
        ranked_nodes = reranker.postprocess_nodes(
            raw_nodes, query_bundle=QueryBundle(current_query)
        ) if raw_nodes else []

        top_score = ranked_nodes[0].score if ranked_nodes else 0

        if top_score >= CONFIG["threshold"]:
            print(f"  └─ [decision] Match found (confidence: {top_score:.2f})")
            # Keep the top 3 articles as context
            final_nodes = ranked_nodes[:3]
            break
        else:
            if attempt < CONFIG["max_retries"]:
                print(f"  └─ [decision] Relevance too low ({top_score:.2f}), rewriting the query...")
                # The prompt stays in Chinese to match the corpus. It asks the LLM to
                # turn the question into one precise legal-term keyword.
                rewrite_prompt = f"请将该法律咨询转化为1个精准的法律术语关键词：{user_query}"
                res = await Settings.llm.acomplete(rewrite_prompt)
                current_query = clean_think_tag(res.text)
                attempt += 1
            else:
                # Retries exhausted: fall back to the best available articles
                final_nodes = ranked_nodes[:3]
                break

    # Build the answer prompt (Chinese, to match the corpus): cite the articles,
    # restate the question, and ask for a legal analysis with a conclusion.
    context_str = "\n\n".join([n.get_content() for n in final_nodes])
    final_prompt = (
        f"依据以下法条：\n{context_str}\n\n"
        f"回答问题：{user_query}\n"
        f"请进行专业法理分析并给出结论。"
    )

    response = await Settings.llm.acomplete(final_prompt)
    return clean_think_tag(response.text), final_nodes

# ==========================================
# Main program
# ==========================================
async def main():
    print("\n" + "="*60)
    print(">>> Starting the Teacher Luo (罗老师) Civil Code advisor system...")
    print(">>> Loading the LLM, embedding model, and reranker, please wait...")

    # Configure the LLM (served by vLLM through its OpenAI-compatible API)
    Settings.llm = OpenAILike(
        model="/mnt/moark-models/Qwen3-8B",
        api_base=CONFIG["llm_api_base"],
        api_key="fake",
        is_chat_model=True,
        timeout=300.0
    )

    # Configure the embedding model
    Settings.embed_model = HuggingFaceEmbedding(
        model_name=CONFIG["embed_path"],
        device=CONFIG["rag_device"],
        trust_remote_code=True,
        embed_batch_size=20
    )

    # Configure the reranker
    reranker = SentenceTransformerRerank(
        model=CONFIG["rerank_path"],
        top_n=5,  # keep more reranked results so complete articles are available
        device=CONFIG["rag_device"]
    )

    # ==========================================
    # Knowledge base management
    # ==========================================
    db_file = CONFIG["db_path"]
    needs_indexing = not os.path.exists(db_file)

    if needs_indexing:
        print(">>> [system] No knowledge base file found, mode: rebuild index.")
    else:
        print(f">>> [system] Knowledge base file found: {db_file}, mode: load index.")

    vector_store = MilvusVectorStore(uri=CONFIG["db_path"], dim=4096, overwrite=False)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    if needs_indexing:
        print(">>> Indexing the Civil Code into the knowledge base (GPU 1)...")
        try:
            documents = SimpleDirectoryReader(input_files=[CONFIG["pdf_path"]]).load_data()

            node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=100)

            nodes = node_parser.get_nodes_from_documents(documents)
            vector_index = VectorStoreIndex(nodes, storage_context=storage_context, embed_model=Settings.embed_model)
            print(">>> [system] Knowledge base built (large-chunk mode).")
        except Exception as e:
            print(f">>> [error] Indexing failed: {e}")
            # Remove the partially written database so the next run rebuilds it
            if os.path.exists(db_file):
                try:
                    if os.path.isfile(db_file):
                        os.remove(db_file)
                    else:
                        import shutil
                        shutil.rmtree(db_file)
                except OSError:
                    pass
            sys.exit(1)
    else:
        print(">>> Existing knowledge base detected, loading it directly...")
        print(">>> [hint] If the cited articles still look incomplete, delete the .db file to rebuild with the larger chunk size.")
        try:
            vector_index = VectorStoreIndex.from_vector_store(
                vector_store,
                storage_context=storage_context,
                embed_model=Settings.embed_model
            )
        except Exception as e:
            print(f">>> [error] Loading failed: {e}")
            sys.exit(1)

    # Recall more candidates for the reranker to choose from
    retriever = vector_index.as_retriever(similarity_top_k=10)

    # Print the welcome banner (sample questions stay in Chinese to match the corpus)
    print("\n" + "="*60)
    print("  👋 Hello! I am Teacher Luo (罗老师), a Civil Code legal assistant built on Agentic RAG.")
    print("  Besides retrieving legal articles, I can correct myself and think again when I misunderstand.")
    print("  💡 Ask me anything about the Civil Code, for example:")
    print("     - '我被张三家的疯狗追杀，为了躲命把李四家价值十万的古董花瓶撞碎了，李四非要我赔钱，我该不该赔？'")
    print("     - '租房子没到期房东想涨房租怎么办？'")
    print("     - '吃了上错桌的菜需要付钱吗？'")
    print("  Type 'quit', 'exit', or '退出' to end the conversation.")
    print("="*60 + "\n")
    while True:
        try:
            user_input = input("User >> ").strip()
            if user_input.lower() in ['quit', 'exit', '退出']: break
            if not user_input: continue

            answer, source_nodes = await agentic_law_query(user_input, retriever, reranker)

            print("\nAdvisor reply >>")
            print(f"**Legal analysis:**\n{answer}")

            print("\n" + "-"*30 + " Legal sources " + "-"*30)
            if not source_nodes:
                print(" [hint] No highly relevant article was found in the knowledge base.")
            else:
                for i, node in enumerate(source_nodes):
                    # Format the score only when present; a float format on "N/A" would raise
                    score_text = f"{node.score:.4f}" if node.score is not None else "N/A"

                    print(f"\n[Evidence {i+1}] (rerank confidence: {score_text})")
                    # Print the full content without truncation
                    content = node.get_content().strip()
                    print(content)
                    print("-" * 40)
            print("\n" + "-" * 74)

        except Exception as e:
            print(f"\n>>> [interaction error] {e}")
            import traceback
            traceback.print_exc()

if __name__ == "__main__":
    asyncio.run(main())

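Press Ctrl + S to save the file and name it test.py. Open another terminal and run python test.py to enter the Agentic RAG system. A session shows three parts: the dialogue prompt, the advisor's reply, and the source tracing.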

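V. Appendix: Raw Evaluation Data

This experiment compares Modular RAG (single-round retrieval) with Agentic RAG (a self-correction loop gated at threshold = 0.35, the value used in the scripts above) on complex legal consultation questions and obtained the following measurements:

Assessment: all metrics were excellent.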
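
The score jump between the first-pass retrieval and the rewritten retrieval can be summarized with a few lines of Python. This is a sketch under assumptions: `summarize_lift` is a hypothetical helper, and the only input pair used here is the 0.07 to 0.99 case described in section II, not the full evaluation set.

```python
def summarize_lift(pairs, threshold=0.35):
    """Summarize (first_pass_score, rewritten_score) pairs, one per question."""
    failed_first = [(a, b) for a, b in pairs if a < threshold]
    rescued = [(a, b) for a, b in failed_first if b >= threshold]
    lifts = [b - a for a, b in failed_first]
    return {
        "questions": len(pairs),
        "failed_first_pass": len(failed_first),
        "rescued_by_rewrite": len(rescued),
        # Mean score gain over the questions that failed the first pass
        "mean_lift": sum(lifts) / len(lifts) if lifts else 0.0,
    }

summary = summarize_lift([(0.07, 0.99)])
print(summary)
```

Restricting the lift to questions that failed the first pass keeps already confident matches from diluting the figure.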
