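Deploying Image Generation Models (Diffusers)

This guide describes a general deployment approach based on Hugging Face Diffusers. It addresses the adaptation issues of domestic Chinese accelerators so that you can run text-to-image models efficiently across different hardware architectures.

Inference Framework Overview

Hugging Face Diffusers: the most fundamental and flexible library for image generation, well suited to rapid prototyping, algorithm research, LoRA fine-tuning, and local inference.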

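Prerequisites

Resources:
- Built-in models: use the platform's built-in model library (path /mnt/moark-models/). Models load directly from the mounted disk, with no download wait and no traffic cost.
Environment consistency:
- Image matching: domestic chips place strict requirements on the underlying driver and the compiler toolchain (toolkit). Create your instance with exactly the image version specified in each chapter; a mismatched image leads to an ImportError or leaves the accelerator unusable.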
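Before loading a pipeline, it can be useful to confirm that the built-in model is actually mounted. The following is a minimal sketch using only the standard library; the helper name find_model_dir is hypothetical, and it assumes the checkpoint follows the standard Diffusers layout, where the pipeline folder contains a model_index.json file.

```python
from pathlib import Path

# Built-in model library path given in this guide
MODEL_ROOT = Path("/mnt/moark-models")


def find_model_dir(name, root=MODEL_ROOT):
    """Return the model directory if it looks like a Diffusers checkpoint.

    Assumption: a standard Diffusers pipeline folder holds model_index.json.
    """
    model_dir = Path(root) / name
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model directory not found: {model_dir}")
    if not (model_dir / "model_index.json").is_file():
        raise FileNotFoundError(f"no model_index.json in {model_dir}")
    return model_dir


# Example (run inside a platform instance):
# model_name = str(find_model_dir("Qwen-Image"))
```

A FileNotFoundError here usually points to a wrong model name or an instance without the model disk mounted, which is quicker to diagnose than a failure inside from_pretrained.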

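I. MetaX (沐曦) Deployment Guide

This chapter applies to MetaX accelerator cards such as the 曦云 C500.

1. General Environment Setup

All MetaX model deployments are based on the following environment:

Accelerator: 曦云 C500 (64GB)
Image: PyTorch / 2.6.0 / Python 3.10 / maca 3.2.1.3

Basic steps:

Enter the workbench: after the instance starts, click JupyterLab to enter the container environment.
Create a script: click the "Notebook" icon to create a new .ipynb file.

2. Model Deployment in Practice

Pick the example code that matches the model you need.

2.1 Qwen-Image

This example shows how to load the platform's built-in model on a 曦云 C500 and generate a cyberpunk-style image.

Run the inference code: create a new Notebook cell and run the following.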

# 1. Install the required dependency
!pip install diffusers

from diffusers import DiffusionPipeline
import torch

# 2. Reference the built-in model library (no download needed)
# The path points to Qwen-Image on the mounted disk
model_name = "/mnt/moark-models/Qwen-Image"

# 3. Detect the compute device automatically
if torch.cuda.is_available():
    torch_dtype = torch.bfloat16
    device = "cuda"
else:
    torch_dtype = torch.float32
    device = "cpu"

# 4. Load the generation pipeline
pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
pipe = pipe.to(device)

# 5. Define the prompt (a cyber-aesthetic social profile card)
prompt = '''一张 9:16 竖版逼真的赛博美学未来社交资料卡照片：一只手轻握一张竖直半透明的亚克力质感卡片，
占据画面视觉中心。卡片呈现未来社交平台“模力方舟”的个人主页界面，设计极简，无冗余装饰。
卡片边缘圆润柔和，泛着粉紫色与冰蓝色的渐变霓虹光晕，背景深邃模糊，进一步突显卡片本身如水晶般清澈的质感。
界面信息仿佛微雕其中，立体而清晰，依次展示：
头像（居中悬浮，带全息环绕特效）
用户名与顶部“认证会员”动态徽章
名称 模力方舟(MoArk)算力体验官
关注数 2777
被关注数 1.2w
加入时间： 2025/11/7
关注按钮（呈现可点击的柔光质感）
手指轻触处反射出柔和光影，整体氛围既富有电影感，又充满高科技终端的沉浸体验。'''

negative_prompt = ""

# Set width and height (9:16)
width, height = 928, 1664

# 6. Run generation
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_inference_steps=30,
    true_cfg_scale=4.0,
    # Follow the detected device so the CPU fallback also works
    generator=torch.Generator(device=device).manual_seed(42)
).images[0]

# 7. Save the result
image.save("qwen-image.png")
print("Image generated and saved as qwen-image.png")

Viewing the result: when the code finishes, find qwen-image.png in the file browser on the left and double-click it to view the generated image.
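The example above hard-codes 928 x 1664 for a 9:16 image. If you want other aspect ratios at a similar pixel budget, a small helper like the sketch below can derive the size. The helper name size_for_aspect is hypothetical, and the defaults (a budget equal to 928 x 1664 pixels, sides rounded to a multiple of 16) are assumptions inferred from that example, not values taken from the model card.

```python
import math


def size_for_aspect(ratio_w, ratio_h, target_pixels=928 * 1664, multiple=16):
    """Return (width, height) near target_pixels for the given aspect ratio.

    Each side is rounded to the nearest multiple of `multiple`, so the
    result only approximates the requested ratio.
    """
    scale = math.sqrt(target_pixels / (ratio_w * ratio_h))
    width = max(multiple, round(ratio_w * scale / multiple) * multiple)
    height = max(multiple, round(ratio_h * scale / multiple) * multiple)
    return width, height


print(size_for_aspect(9, 16))   # portrait, matches the example above
print(size_for_aspect(16, 9))   # landscape
```

The returned pair can be passed as the width and height arguments of the pipeline call.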

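II. Enflame (燧原) Deployment Guide

This chapter applies to Enflame accelerator cards such as the S60. Because the underlying architecture differs, the adapter library must be imported: import torch_gcu and from torch_gcu import transfer_to_gcu.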
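Because each vendor ships its own PyTorch adapter, it can help to confirm which one the current image provides before running a model. The sketch below uses only the standard library. The helper name detect_adapter is hypothetical; the module names torch_gcu and torch_npu are the ones imported in this guide.

```python
import importlib.util


def detect_adapter(candidates=("torch_gcu", "torch_npu")):
    """Return the name of the first installed adapter module, or None.

    find_spec only locates a top-level module; it does not import it,
    so no driver bindings are loaded by this check.
    """
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


print("adapter found:", detect_adapter())
```

If this prints None on an accelerator instance, the image most likely does not match the one specified for the chapter.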

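1. General Environment Setup

All Enflame model deployments are based on the following environment:

Accelerator: Enflame S60 (48GB)
Image: Ubuntu / 22.04 / Python 3.13 / ef 1.5.0.604

Basic steps:

Enter the workbench: after the instance starts, click JupyterLab to enter the container.
Create a script: click the Notebook icon to create a new .ipynb file.

2. Model Deployment in Practice

Pick the example code that matches the model you need.

2.1 FLUX.1-Krea-dev

This example shows how to load the platform's built-in model on an Enflame S60 and generate a cyberpunk-style image.

Run the inference code: create a new Notebook cell and run the following.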

import torch
import torch_gcu  # import the torch_gcu adapter library
from torch_gcu import transfer_to_gcu  # one-line migration of CUDA code
from diffusers import FluxPipeline

# Reference the built-in model library on the Enflame S60
model_name = "/mnt/moark-models/FLUX.1-Krea-dev"

if torch.cuda.is_available():
    torch_dtype = torch.bfloat16
    device = "cuda"
else:
    torch_dtype = torch.float32
    device = "cpu"

pipe = FluxPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
pipe = pipe.to(device)

prompt = '''A 9:16 vertical, realistic cyber-aesthetic future social profile card photo: A hand gently holds a vertically semi-transparent acrylic card, occupying the visual center of the picture. The card presents the personal homepage interface of the future social platform "MoArk", with a minimalist design and no redundant decorations. The edges of the card are rounded and soft, with a gradient neon halo of pink-purple and ice blue. The background is deep and blurry, further highlighting the crystal-clear texture of the card itself. The interface information seems to be micro-engraved, three-dimensional and clear, displayed in sequence:
Avatar (suspended in the center, with a holographic surround effect)
Username and the dynamic "Verified Member" badge at the top
Name: MoArk (MoArk) Computing Power Experience Officer
Followers: 2,777
Following: 12,000
Join Date: 11/7/2025
Follow button (presenting a soft light touchable effect)
A soft light and shadow is reflected at the point where the finger touches, creating an atmosphere that is both cinematic and immersive, like a scene from a near-future live-action game.'''

image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=30,
    max_sequence_length=512,
    # Follow the detected device so the CPU fallback also works
    generator=torch.Generator(device=device).manual_seed(42)
).images[0]

# Save the result
image.save("flux-dev.png")

Viewing the result: after the run completes, find flux-dev.png in the file browser on the left.

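Out-of-Memory (OOM) Solution

If you hit an OutOfMemoryError when running a large model such as FLUX:

Open a terminal (Terminal).
Run efsmi -pmon to view device memory usage (press Ctrl+C to exit).
Find the PID of the stale process and run kill -9 <PID> to force-release its memory.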
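kill -9 gives the process no chance to clean up. A gentler variant, sketched here as an assumption rather than a platform-recommended procedure, sends SIGTERM first and falls back to SIGKILL only if the process is still alive after a grace period. The helper name release_process is hypothetical; it targets Linux containers and a process started elsewhere (for example, a leftover notebook kernel).

```python
import os
import signal
import time


def release_process(pid, grace_seconds=5.0):
    """Ask a process to exit, then force-kill it if it does not comply."""
    try:
        os.kill(pid, signal.SIGTERM)  # polite request to exit
    except ProcessLookupError:
        return "not running"
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 only checks that the PID exists
        except ProcessLookupError:
            return "terminated"
        time.sleep(0.1)
    try:
        os.kill(pid, signal.SIGKILL)  # same effect as kill -9 <PID>
    except ProcessLookupError:
        return "terminated"
    return "killed"


# Example: release_process(12345), using the PID shown by efsmi -pmon
```

After the call returns, running efsmi -pmon again should show the memory freed.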


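2.2 LongCat-Image

This example shows how to load the platform's built-in model on an Enflame S60 and generate an animal-themed image.

Run the inference code: create a new Notebook cell and run the following.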

!cp -r /mnt/moark-models/github/LongCat-Image/longcat_image .  # copy the core code package

import torch
import torch_gcu
from torch_gcu import transfer_to_gcu
from transformers import AutoProcessor
from longcat_image.models import LongCatImageTransformer2DModel
from longcat_image.pipelines import LongCatImagePipeline

device = torch.device('gcu')
checkpoint_dir = '/mnt/moark-models/LongCat-Image'
text_processor = AutoProcessor.from_pretrained(checkpoint_dir, subfolder='tokenizer')
transformer = LongCatImageTransformer2DModel.from_pretrained(
    checkpoint_dir,
    subfolder='transformer',
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
).to(device)
pipe = LongCatImagePipeline.from_pretrained(
    checkpoint_dir,
    transformer=transformer,
    text_processor=text_processor,
)
pipe.to(device, torch.bfloat16)

prompt = "仅一只三花小猫与仅一只小伯恩山犬并排坐在公园草地上，彼此靠近，画面中除它们外不出现任何其他动物或人物。中距离视角，午后暖阳，三花毛色黑白橙；伯恩山三色黑白棕，眉心棕斑、胸前白毛清晰。柔和自然光，浅景深，构图简洁、干净。"
negative_prompt = "第二只狗, 两只狗, 群狗, 两只猫, 群猫, 其他动物, 人物, 拥挤, 重复, 克隆, 多主体, 文字, Logo, 水印, 砖墙"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    height=768,
    width=1344,
    guidance_scale=4.5,
    num_inference_steps=40,
    num_images_per_prompt=1,
    generator=torch.Generator(device=device).manual_seed(43),
    enable_cfg_renorm=True,
    enable_prompt_rewrite=True,  # reuse the text encoder as a built-in prompt rewriter
).images[0]

image.save('./t2d_example.png')

Viewing the result: after the run completes, find t2d_example.png in the file browser on the left.

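III. Ascend (昇腾) Deployment Guide

This chapter applies to Huawei accelerator cards such as the Ascend 910B. Because the underlying architecture differs, the adapter library must be imported: import torch_npu and from torch_npu.contrib import transfer_to_npu.

1. General Environment Setup

All Ascend model deployments are based on the following environment:

Accelerator: Ascend 910B (64GB)
Image: PyTorch / 2.8.0 / Python 3.11 / CANN 8.3.RC2

Basic steps:

Enter the workbench: after the instance starts, click JupyterLab to enter the container.
Create a script: click the Notebook icon to create a new .ipynb file.

2. Model Deployment in Practice

Pick the example code that matches the model you need.

2.1 Qwen-Image-Edit-2511

This example shows how to load the platform's built-in model on an Ascend 910B and edit an input image into a humorous 2025 year-in-review poster.

Run the inference code: create a new Notebook cell and run the following.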

!pip install diffusers
!pip install transformers
!pip install accelerate
import os
import torch
import torch_npu
from PIL import Image
from diffusers import QwenImageEditPlusPipeline
from torch_npu.npu import amp  # import the AMP module
from torch_npu.contrib import transfer_to_npu  # enable automatic migration of CUDA calls to the NPU

pipeline = QwenImageEditPlusPipeline.from_pretrained("/mnt/moark-models/Qwen-Image-Edit-2511", torch_dtype=torch.bfloat16)
print("pipeline loaded")

pipeline.enable_model_cpu_offload()
pipeline.set_progress_bar_config(disable=None)
image1 = Image.open("/mnt/moark-models/input1.jpg").convert('RGB')
prompt = """请根据图片中的角色，设计一张风格诙谐、荒诞、有表演感的2025年度总结的海报。
#版式要求
1.设计一组 3:4 竖版比例、适合小红书传播的梗图
#文案设计
1.文案内容
"2025年"
"攒了很多想买的，但没攒钱"
2.文字放置在画面上方，要清晰可读，并且与画面留有足够呼吸空间
3.不与角色主体重叠
#角色设计
1.保持角色的一致性
2.更改角色的肢体动作，呈现出跑步的姿态，以配合文案传达情绪
#背景设计
1.干净、简洁背景，纯色或浅灰背景
2.不添加复杂装饰，不要场景叠加
#辅助元素
1.可少量加入辅助性的购物元素修饰，仅用于强化情绪，不可喧宾夺主"""
inputs = {
    "image": [image1],
    "prompt": prompt,
    "generator": torch.manual_seed(0),
    "true_cfg_scale": 4.0,
    "negative_prompt": " ",
    "num_inference_steps": 40,
    "guidance_scale": 0.9,
    "num_images_per_prompt": 1,
}
with torch.inference_mode():
    output = pipeline(**inputs)
    output_image = output.images[0]
    output_image.save("2025_happy_ending.png")
    print("image saved at", os.path.abspath("2025_happy_ending.png"))

Viewing the result: after the run completes, find 2025_happy_ending.png in the file browser on the left.

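Out-of-Memory (OOM) Solution

If you hit an OutOfMemoryError when running a large model such as Qwen-Image:

Open a terminal (Terminal).
Run npu-smi info to view device memory usage and the processes occupying it.
Find the PID of the stale process and run kill -9 <PID> to force-release its memory.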

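IV. Local Access and Service Wrapping (General)

For security reasons, the platform does not currently expose ports to the public internet. If you wrap a Diffusers model as an API service (for example, a FastAPI app listening on port 8188), access it locally through an SSH tunnel.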
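To make the idea of wrapping a model as a service concrete, here is a minimal sketch of an HTTP endpoint on port 8188. It uses Python's standard http.server instead of FastAPI to avoid extra dependencies; a FastAPI app would follow the same shape. The route /v1/generate, the handler class, and the generate_image stub are all hypothetical; in a real service the stub would call your loaded Diffusers pipeline.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def generate_image(prompt):
    """Placeholder for the real Diffusers call, e.g. pipe(prompt).images[0]."""
    return {"prompt": prompt, "status": "stub"}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/generate":  # hypothetical route
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "expected a JSON object")
            return
        body = json.dumps(generate_image(payload.get("prompt", ""))).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep notebook output quiet


def make_server(port=8188):
    # Bind to loopback only; the SSH tunnel forwards to 127.0.0.1 in the container
    return HTTPServer(("127.0.0.1", port), InferenceHandler)


# To start serving inside the container: make_server().serve_forever()
```

Binding to 127.0.0.1 matches the tunnel target used later in this chapter, so the service is reachable only through SSH.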

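1. Set Up the SSH Tunnel

1.1 Set a Password

Find the instance in the workbench, click "Set Password" (设置密码), and remember the password.

1.2 Open a Local Terminal

Windows: open "This PC", type powershell or cmd in the address bar, and press Enter.

1.3 Get the Connection Details

Copy the login command from the workbench (it contains the ssh command, username, IP, and port).

1.4 Run the Port-Forwarding Command

Run the following command in your local terminal (replace the placeholders with your actual details):

# Format: ssh -CNg -L <local port>:127.0.0.1:<container port> <username>@<IP> -p <SSH port>
ssh -CNg -L 8188:127.0.0.1:8188 root+vm-xxxxx@xxx.xx.xxx.xxx -p xxxxx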

(Note: after you enter the password the terminal stays silent; just keep the window open.)
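Because the terminal gives no feedback, a quick way to confirm the forward is listening is to try a TCP connection to the local port. The sketch below uses only the standard library; the helper name port_is_open is hypothetical, and 8188 is the local port from the command above.

```python
import socket


def port_is_open(host="127.0.0.1", port=8188, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("tunnel listening:", port_is_open())
```

A True result only shows that ssh is listening locally. If the service inside the container is not running, requests through the tunnel will still fail.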

2. Local Calls

Once the tunnel is up, you can call the cloud inference service from local code or a browser at http://localhost:8188/v1.
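As an illustration of a local call, the sketch below builds and sends a JSON request with the standard library. The base URL is the one given above; the /generate route, the payload shape, and both helper names are hypothetical and must be adapted to whatever API your own service exposes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8188/v1"  # local end of the SSH tunnel


def build_request(prompt, base_url=BASE_URL):
    """Build a POST request carrying the prompt as JSON (hypothetical schema)."""
    data = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",  # hypothetical route
        data=data,
        headers={"Content-Type": "application/json"},
    )


def call_service(prompt, timeout=300):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())


# Example (requires the tunnel and the remote service to be running):
# result = call_service("a cat sitting on a park lawn")
```

The generous timeout reflects that image generation can take a while; tune it to your model and step count.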
