Course Code: ftvlms
Duration: 14 hours
Prerequisites:
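  • An understanding of deep learning for vision and natural language processing
  • Experience with PyTorch and transformer-based models
  • Familiarity with multimodal model architectures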

Audience

  • Computer vision engineers
  • AI developers
Overview:

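Fine-tuning Vision-Language Models (VLMs) is a specialized skill used to enhance multimodal AI systems that process both visual and textual inputs for real-world applications.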

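This instructor-led, live training (online or onsite) is aimed at advanced-level computer vision engineers and AI developers who wish to fine-tune VLMs such as CLIP and Flamingo to improve performance on industry-specific visual-text tasks.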

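By the end of this training, participants will be able to:

  • Understand the architecture and pretraining methods of vision-language models.
  • Fine-tune VLMs for classification, retrieval, captioning, or multimodal question answering.
  • Prepare datasets and apply PEFT (parameter-efficient fine-tuning) strategies to reduce resource usage.
  • Evaluate customized VLMs and deploy them in production environments.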

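Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.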
Course Outline:

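Introduction to Vision-Language Models

  • Overview of VLMs and their role in multimodal AI
  • Popular architectures: CLIP, Flamingo, BLIP, etc.
  • Use cases: search, captioning, automated systems, content analysis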

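Preparing the Fine-Tuning Environment

  • Setting up OpenCLIP and other VLM libraries
  • Dataset formats for image-text pairs
  • Preprocessing pipelines for vision and language inputs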
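
As a rough illustration of the dataset-format topic above, image-text pairs are often stored as a manifest with one record per line. The sketch below assumes a JSON Lines layout with hypothetical `image` and `caption` fields; the file paths and field names are illustrative, not a format prescribed by this course or by any particular library.

```python
import json

# Hypothetical manifest: one JSON object per line, pairing an image path with a caption.
MANIFEST = """\
{"image": "images/0001.jpg", "caption": "a red ceramic mug on a wooden desk"}
{"image": "images/0002.jpg", "caption": "  Two cyclists   crossing a bridge  "}
{"image": "images/0003.jpg", "caption": ""}
"""


def load_pairs(text):
    """Parse a JSONL manifest into (image_path, caption) tuples.

    Captions are whitespace-normalized; records with empty captions are skipped.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the manifest
        record = json.loads(line)
        caption = " ".join(record["caption"].split())  # collapse runs of whitespace
        if caption:
            pairs.append((record["image"], caption))
    return pairs


pairs = load_pairs(MANIFEST)
```

In a real pipeline the image path would then be opened and passed through the model's image transform, and the caption through its tokenizer.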

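Fine-Tuning CLIP and Similar Models

  • Contrastive loss and joint embedding spaces
  • Hands-on: fine-tuning CLIP on a custom dataset
  • Handling domain-specific and multilingual data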
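
To make the contrastive-loss topic above concrete, here is a minimal sketch of a symmetric, CLIP-style contrastive loss over a batch of paired embeddings. It is written in plain Python to stay dependency-free; in practice this would be done with PyTorch tensors. The function names and the temperature value of 0.07 are assumptions chosen for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over the image-text similarity matrix.

    Pair k (image k, text k) is the positive; all other pairings in the
    batch act as negatives.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Image-to-text direction: each row should peak on the diagonal.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: each column should peak on the diagonal.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Toy batch: correctly aligned pairs versus the same texts in swapped order.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

The aligned batch yields a loss near zero, while the swapped batch is penalized heavily, which is the pressure that pulls matching images and texts together in the joint embedding space.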

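Advanced Fine-Tuning Techniques

  • Using LoRA and adapter-based methods for efficiency
  • Prompt tuning and visual prompt injection
  • Zero-shot vs. fine-tuned evaluation trade-offs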
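
The efficiency argument behind LoRA can be sketched in a few lines: instead of updating a full weight matrix W, one trains two small matrices A and B and uses W + (alpha / r) * B A as the effective weight. The example below is a plain-Python illustration under assumed toy dimensions; the helper names are hypothetical, and real projects would typically rely on a library such as Hugging Face PEFT.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the LoRA rank (rows of A)."""
    rank = len(A)
    scale = alpha / rank
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy sizes, chosen only for illustration.
d_out, d_in, rank, alpha = 8, 6, 2, 4
rng = random.Random(0)

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen pretrained weight
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable, small random init
B = [[0.0] * rank for _ in range(d_out)]                              # trainable, zero init

full_params = d_out * d_in           # parameters updated by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters updated by LoRA

W_eff = lora_effective_weight(W, A, B, alpha)
```

Because B starts at zero, the effective weight equals the pretrained weight before any training step, so fine-tuning begins from the original model's behavior. The parameter saving grows with layer size: here it is 28 versus 48, but for a 4096 x 4096 layer at rank 8 it would be 65,536 versus roughly 16.8 million.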

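Evaluation and Benchmarking

  • Metrics for VLMs: retrieval accuracy, BLEU, CIDEr, recall
  • Visual-text alignment diagnostics
  • Visualizing embedding spaces and misclassifications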
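
Of the metrics listed above, retrieval recall is the simplest to compute by hand. The following sketch computes Recall@K from a query-by-candidate similarity matrix, assuming (for illustration only) that the correct candidate for query i is candidate i; the function name and the toy scores are hypothetical.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose ground-truth candidate appears in the top k.

    similarity[i][j] is the score of query i against candidate j, and the
    ground truth for query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy text-to-image similarity scores for three queries.
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct candidate ranked first
    [0.2, 0.4, 0.8],  # query 1: correct candidate ranked second
    [0.5, 0.6, 0.7],  # query 2: correct candidate ranked first
]

r_at_1 = recall_at_k(sim, 1)
r_at_2 = recall_at_k(sim, 2)
```

Comparing Recall@1 with Recall@5 or Recall@10 on the same matrix indicates whether a fine-tuned model is failing outright or merely ranking the right item slightly too low.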

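Deployment and Practical Applications

  • Exporting models for inference (TorchScript, ONNX)
  • Integrating VLMs into pipelines or APIs
  • Resource considerations and model scaling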

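Case Studies and Application Scenarios

  • Media analysis and content moderation
  • Search and retrieval in e-commerce and digital libraries
  • Multimodal interaction in robotics and automated systems

Summary and Next Steps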

Sites Published:

United Arab Emirates - Fine-Tuning Vision-Language Models (VLMs)

Qatar - Fine-Tuning Vision-Language Models (VLMs)

Egypt - Fine-Tuning Vision-Language Models (VLMs)

Saudi Arabia - Fine-Tuning Vision-Language Models (VLMs)

South Africa - Fine-Tuning Vision-Language Models (VLMs)

Brasil - Fine-Tuning Vision-Language Models (VLMs)

Canada - Fine-Tuning Vision-Language Models (VLMs)

中国 - Fine-Tuning Vision-Language Models (VLMs)

香港 - Fine-Tuning Vision-Language Models (VLMs)

澳門 - Fine-Tuning Vision-Language Models (VLMs)

台灣 - Fine-Tuning Vision-Language Models (VLMs)

USA - Fine-Tuning Vision-Language Models (VLMs)

Österreich - Fine-Tuning Vision-Language Models (VLMs)

Schweiz - Fine-Tuning Vision-Language Models (VLMs)

Deutschland - Fine-Tuning Vision-Language Models (VLMs)

Czech Republic - Fine-Tuning Vision-Language Models (VLMs)

Denmark - Fine-Tuning Vision-Language Models (VLMs)

Estonia - Fine-Tuning Vision-Language Models (VLMs)

Finland - Fine-Tuning Vision-Language Models (VLMs)

Greece - Fine-Tuning Vision-Language Models (VLMs)

Magyarország - Fine-Tuning Vision-Language Models (VLMs)

Ireland - Fine-Tuning Vision-Language Models (VLMs)

Luxembourg - Fine-Tuning Vision-Language Models (VLMs)

Latvia - Fine-Tuning Vision-Language Models (VLMs)

España - Fine-Tuning Vision-Language Models (VLMs)

Italia - Fine-Tuning Vision-Language Models (VLMs)

Lithuania - Fine-Tuning Vision-Language Models (VLMs)

Nederland - Fine-Tuning Vision-Language Models (VLMs)

Norway - Fine-Tuning Vision-Language Models (VLMs)

Portugal - Fine-Tuning Vision-Language Models (VLMs)

România - Fine-Tuning Vision-Language Models (VLMs)

Sverige - Fine-Tuning Vision-Language Models (VLMs)

Türkiye - Fine-Tuning Vision-Language Models (VLMs)

Malta - Fine-Tuning Vision-Language Models (VLMs)

Belgique - Fine-Tuning Vision-Language Models (VLMs)

France - Fine-Tuning Vision-Language Models (VLMs)

日本 - Fine-Tuning Vision-Language Models (VLMs)

Australia - Fine-Tuning Vision-Language Models (VLMs)

Malaysia - Fine-Tuning Vision-Language Models (VLMs)

New Zealand - Fine-Tuning Vision-Language Models (VLMs)

Philippines - Fine-Tuning Vision-Language Models (VLMs)

Singapore - Fine-Tuning Vision-Language Models (VLMs)

Thailand - Fine-Tuning Vision-Language Models (VLMs)

Vietnam - Fine-Tuning Vision-Language Models (VLMs)

India - Fine-Tuning Vision-Language Models (VLMs)

Argentina - Fine-Tuning Vision-Language Models (VLMs)

Chile - Fine-Tuning Vision-Language Models (VLMs)

Costa Rica - Fine-Tuning Vision-Language Models (VLMs)

Ecuador - Fine-Tuning Vision-Language Models (VLMs)

Guatemala - Fine-Tuning Vision-Language Models (VLMs)

Colombia - Fine-Tuning Vision-Language Models (VLMs)

México - Fine-Tuning Vision-Language Models (VLMs)

Panama - Fine-Tuning Vision-Language Models (VLMs)

Peru - Fine-Tuning Vision-Language Models (VLMs)

Uruguay - Fine-Tuning Vision-Language Models (VLMs)

Venezuela - Fine-Tuning Vision-Language Models (VLMs)

Polska - Fine-Tuning Vision-Language Models (VLMs)

United Kingdom - Fine-Tuning Vision-Language Models (VLMs)

South Korea - Fine-Tuning Vision-Language Models (VLMs)

Pakistan - Fine-Tuning Vision-Language Models (VLMs)

Sri Lanka - Fine-Tuning Vision-Language Models (VLMs)

Bulgaria - Fine-Tuning Vision-Language Models (VLMs)

Bolivia - Fine-Tuning Vision-Language Models (VLMs)

Indonesia - Fine-Tuning Vision-Language Models (VLMs)

Kazakhstan - Fine-Tuning Vision-Language Models (VLMs)

Moldova - Fine-Tuning Vision-Language Models (VLMs)

Morocco - Fine-Tuning Vision-Language Models (VLMs)

Tunisia - Fine-Tuning Vision-Language Models (VLMs)

Kuwait - Fine-Tuning Vision-Language Models (VLMs)

Oman - Fine-Tuning Vision-Language Models (VLMs)

Slovakia - Fine-Tuning Vision-Language Models (VLMs)

Kenya - Fine-Tuning Vision-Language Models (VLMs)

Nigeria - Fine-Tuning Vision-Language Models (VLMs)

Botswana - Fine-Tuning Vision-Language Models (VLMs)

Slovenia - Fine-Tuning Vision-Language Models (VLMs)

Croatia - Fine-Tuning Vision-Language Models (VLMs)

Serbia - Fine-Tuning Vision-Language Models (VLMs)

Bhutan - Fine-Tuning Vision-Language Models (VLMs)

Nepal - Fine-Tuning Vision-Language Models (VLMs)

Uzbekistan - Fine-Tuning Vision-Language Models (VLMs)