OCR 图文识别

基本介绍

OCR（Optical Character Recognition，光学字符识别）技术是指通过人工智能模型，从图像中提取文本信息的技术。它的核心任务是将图像中的文字内容转化为可编辑的文本数据，广泛应用于文档数字化、信息提取、数据录入等领域。

目前模型广场已上线的 OCR 模型包括：GOT-OCR2_0

GOT-OCR2_0

GOT-OCR2_0 提供功能强大的 OCR 解决方案，能够高精度、快速、全面的提取图像中的文本信息，支持多种语言，适用于各种场景，例如提取身份证、银行卡、PDF 文档、表格、车牌、手写文字、设备铭牌、数学公式等图像信息。

使用方法

您可以点击 GOT-OCR2_0 在线免费体验。以下是代码调用示例。

Bash
Python

bash showLineNumbers curl https://ai.gitee.com/v1/images/ocr \ -X POST \ -H "Authorization: Bearer 私人令牌" \ -F "model=GOT-OCR2_0" -F "image=@path/to/image.jpg" -F "response_format=text"

python
import requests
API_URL = "https://ai.gitee.com/v1/images/ocr"
HEADERS = {
"Authorization": "Bearer 私人令牌",
}
def query(image_path, model="GOT-OCR2_0", response_format="text"):
with open(image_path, "rb") as image_file:
response = requests.post(
API_URL,
headers=HEADERS,
files={"image": (image_path, image_file)},
data={"model": model, "response_format": response_format},
)
return response.json()

    output = query("test.jpg")
    print(output) # {"text": "xxx"}

参数说明：

私人令牌：用于验证调用身份，点击私人令牌获取
model：填写 GOT-OCR2_0 指定使用 OCR 大模型
image：需要进行 OCR 的图片文件
- 支持png, jpg, jpeg, webp, gif 格式的图片
- 最大分辨率支持 4096X4096
- 文件不超过 3MB
response_format：格式化类型
- 值为 text 返回纯文本内容
- 值为 format 返回 mathpix-markdown 格式内容，建议带数学公式的图片使用该参数

使用示例

使用图片：

执行上文中的代码后将会响应：

{
  "text": "美迪兰（南京）医疗设备有限公司\n名称：..."
}

GOT-OCR2_0在线体验效果如下：

更多示例代码您可参考模力方舟示例代码仓库。

基本介绍​

GOT-OCR2_0​

使用方法​

参数说明：​

使用示例​

基本介绍

GOT-OCR2_0

使用方法

参数说明：

使用示例