Course Code: ftvlms
Duration: 14 hours
Prerequisites:
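  • Understanding of deep learning for vision and natural language processing
  • Experience with PyTorch and transformer-based models
  • Familiarity with multimodal model architectures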

Audience

  • Computer vision engineers
  • AI developers
Overview:

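Fine-tuning vision-language models (VLMs) is a specialized skill for improving multimodal AI systems that process both visual and textual inputs in real-world applications.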

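This instructor-led, live training (online or onsite) is aimed at advanced-level computer vision engineers and AI developers who wish to fine-tune VLMs such as CLIP and Flamingo to improve performance on industry-specific visual-text tasks.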

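By the end of this training, participants will be able to:

  • Understand the architecture and pretraining methods of vision-language models.
  • Fine-tune VLMs for classification, retrieval, captioning, or multimodal question answering.
  • Prepare datasets and apply PEFT (parameter-efficient fine-tuning) strategies to reduce resource usage.
  • Evaluate and deploy customized VLMs in production environments.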

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to arrange.
Course Outline:

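Introduction to Vision-Language Models

  • Overview of VLMs and their role in multimodal AI
  • Popular architectures: CLIP, Flamingo, BLIP, and others
  • Use cases: search, captioning, automated systems, content analysis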

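Preparing the Fine-Tuning Environment

  • Setting up OpenCLIP and other VLM libraries
  • Dataset formats for image-text pairs
  • Preprocessing pipelines for visual and language inputs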
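As a rough illustration of the dataset-format and preprocessing topics above, the sketch below reads image-text pairs from a tab-separated manifest and normalizes raw pixel arrays. It assumes the manifest has a header row with `filepath` and `title` columns (as far as we know these match the default column names of OpenCLIP's CSV training loader, but check the version you use), and it applies the per-channel mean and standard deviation commonly used for CLIP image inputs. Resizing and center cropping, which a real pipeline also performs, are left out; `load_pairs` and `normalize_image` are hypothetical helper names.

```python
import csv
import io

import numpy as np

# Per-channel RGB mean/std commonly used to normalize CLIP image inputs.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)


def load_pairs(tsv_text, img_key="filepath", caption_key="title"):
    """Read (image path, caption) pairs from a tab-separated manifest.

    The column names are assumptions; adjust them to your own data.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(row[img_key], row[caption_key].strip()) for row in reader]


def normalize_image(pixels_uint8):
    """Map an HxWx3 uint8 image to a 3xHxW float32 array, normalized per channel."""
    x = pixels_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - CLIP_MEAN) / CLIP_STD               # broadcast over the channel axis
    return x.transpose(2, 0, 1)                  # channels-first, as PyTorch models expect


pairs = load_pairs("filepath\ttitle\nimg/cat.jpg\ta cat on a sofa\n")
print(pairs)
```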

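Fine-Tuning CLIP and Similar Models

  • Contrastive loss and joint embedding spaces
  • Hands-on: fine-tuning CLIP on a custom dataset
  • Handling domain-specific and multilingual data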
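The contrastive objective behind CLIP can be sketched in NumPy as follows: image and text embeddings are L2-normalized, every image in the batch is scored against every text, and a cross-entropy loss treats the matching pair on the diagonal as the correct class in both directions (image to text and text to image). This is a minimal sketch under assumptions: the function name is hypothetical, and the temperature is fixed at 0.07 here, whereas CLIP learns it as a parameter and only uses 0.07 as its initial value.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of N matching image-text pairs.

    image_emb, text_emb: arrays of shape (N, D); row i of each is a matching pair.
    """
    # L2-normalize so dot products become cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N): row i scores image i against all texts
    idx = np.arange(len(logits))
    # Image-to-text: softmax over each row; text-to-image: softmax over each column.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return (loss_i2t + loss_t2i) / 2


eye = np.eye(4)
print(clip_contrastive_loss(eye, eye))        # aligned pairs: loss close to 0
print(clip_contrastive_loss(eye, eye[::-1]))  # mismatched pairs: large loss
```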

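Advanced Fine-Tuning Techniques

  • Using LoRA and adapter-based methods for efficiency
  • Prompt tuning and visual prompt injection
  • Trade-offs between zero-shot and fine-tuned evaluation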
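A minimal sketch of the LoRA idea, assuming a single linear layer written in NumPy: the pretrained weight W stays frozen and only two low-rank matrices A and B are trained, so the layer computes x W^T + (alpha / r) x A^T B^T. B starts at zero, so the adapted layer initially reproduces the base model exactly, and after training the low-rank update can be merged back into W for inference. `LoRALinear` and its methods are hypothetical names for illustration, not the API of a library such as Hugging Face PEFT.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (hypothetical helper)."""

    def __init__(self, weight, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = weight                      # frozen, shape (out_features, in_features)
        out_f, in_f = weight.shape
        self.A = rng.normal(0.0, 0.01, size=(rank, in_f))  # trainable, small random init
        self.B = np.zeros((out_f, rank))                    # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (batch, in_features) -> (batch, out_features)
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # Fold the low-rank update into a single matrix for deployment.
        return self.weight + self.scale * self.B @ self.A

    def num_trainable(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
layer = LoRALinear(rng.normal(size=(16, 32)), rank=4, alpha=8)
print(layer.num_trainable(), "trainable vs", layer.weight.size, "frozen parameters")
```

For a 16×32 layer with rank 4, only 4·32 + 16·4 = 192 parameters are trainable instead of 512; the savings grow with layer size because the update scales with r·(in + out) rather than in·out.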

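Evaluation and Benchmarking

  • Evaluation metrics for VLMs: retrieval accuracy, BLEU, CIDEr, recall
  • Diagnosing visual-text alignment
  • Visualizing embedding spaces and misclassifications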
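Recall@K, one of the retrieval metrics listed above, can be computed directly from an image-text similarity matrix. The sketch assumes that query i's correct match is candidate i (one caption per image) and counts a hit when that candidate appears among the K highest-scoring results; `recall_at_k` is a hypothetical name.

```python
import numpy as np


def recall_at_k(similarity, k):
    """Fraction of queries whose ground-truth candidate (index i) ranks in the top k.

    similarity[i, j] is the score of query i against candidate j.
    """
    topk = np.argsort(-similarity, axis=1)[:, :k]  # top-k candidate indices, best first
    hits = (topk == np.arange(len(similarity))[:, None]).any(axis=1)
    return float(hits.mean())


sim = np.array([
    [0.9, 0.1, 0.0],  # query 0: correct match ranked first
    [0.2, 0.3, 0.8],  # query 1: correct match ranked second
    [0.1, 0.7, 0.6],  # query 2: correct match ranked second
])
print(recall_at_k(sim, 1), recall_at_k(sim, 2))
```

Passing the transposed matrix (`sim.T`) evaluates the other retrieval direction, which is usually reported alongside the first.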

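Deployment and Real-World Applications

  • Exporting models for inference (TorchScript, ONNX)
  • Integrating VLMs into pipelines or APIs
  • Resource considerations and model scaling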

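Case Studies and Application Scenarios

  • Media analysis and content moderation
  • Search and retrieval in e-commerce and digital libraries
  • Multimodal interaction in robotics and automated systems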

Summary and Next Steps
