AI Summit — Session & Speakers
Track D - Session 11

New Possibilities Opened by Multimodal RAG

AI Summit Seoul 2025 · 30 min

Session Overview

This session will explore the core principles of Multimodal Retrieval-Augmented Generation (RAG) — an approach that improves the accuracy and reliability of AI responses by retrieving and interpreting multiple data formats together. The talk will trace the evolution of multimodal AI systems, from the early approach of converting images into text to the latest architectures that search and integrate different data types directly. It will also highlight how these capabilities connect with agent systems and future developments.

Key points:

    • Learn methods to enhance response reliability by incorporating visual information (charts, diagrams, images), audio, and video data—not just text.
    • Understand the latest architecture that enables multimodal input and interaction, moving beyond text-only systems.
    • Explore real-world use cases of applying Multimodal RAG for complex document analysis and question-answering systems in various industries.
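The retrieve-then-generate loop behind a multimodal RAG pipeline can be sketched in a few lines. This is a toy illustration, not the speaker's implementation: the `embed()` stub (a bag-of-words counter) stands in for a real multimodal encoder such as a CLIP-style model, and generation is reduced to prompt assembly.

```python
import math

def embed(text):
    # Stub "encoder": bag-of-words vector keyed by lowercase token.
    # A real system would use a multimodal embedding model here.
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Each item carries its modality. Charts and images are represented here
# by text captions only — the early "convert to text" approach the session
# contrasts with direct cross-modal embedding.
corpus = [
    {"modality": "text",  "content": "Quarterly revenue grew 12 percent"},
    {"modality": "chart", "content": "bar chart of revenue by quarter"},
    {"modality": "image", "content": "photo of the new headquarters"},
]

def retrieve(query, k=2):
    # Rank all items, regardless of modality, in the shared space.
    q = embed(query)
    ranked = sorted(corpus,
                    key=lambda d: cosine(q, embed(d["content"])),
                    reverse=True)
    return ranked[:k]

def build_prompt(query):
    # Stitch the retrieved items into the context handed to a generator.
    hits = retrieve(query)
    context = "\n".join(f"[{h['modality']}] {h['content']}" for h in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How did revenue change by quarter?"))
```

In a production pipeline, the caption-only chart and image entries would be replaced by embeddings computed directly from the visual content — the "direct search and integration across data types" the session contrasts with earlier text-conversion stages.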

Speaker

Sky Kim
Senior Software Engineer
Unity Technologies

Sky Kim is a software engineer applying AI/ML technologies across various industries. He began his career at Samsung Medison after graduating from Seoul National University, where he focused on developing AI solutions for medical image diagnosis. He has since built extensive expertise in vision AI. Currently, as a Senior Software Engineer at Unity Technologies, he provides AI/ML technology consulting across industries including gaming, automotive, architecture, and media in the APAC region.

Session details may be updated as the event approaches. Final schedule to be announced on the official site.