AI Summit — Session & Speaker
Track C - Session 11

From Black-Box to Benchmarked: Building Trustworthy Gen AI Applications

AI Summit Seoul 2025 · 20 min

Session Overview

This talk explains why evaluation is essential for ensuring the reliability and quality of generative AI applications. Moving beyond black-box approaches, it makes the case for a systematic evaluation framework that measures and improves the reasoning, stability, and consistency of models.

The session highlights Weights & Biases Weave as a foundation for reproducible validation through complete traceability of data, models, and code. Such evaluation-centric workflows directly address the long-tail problem and the generalization limits faced by LLMs and agentic AI systems—ultimately enabling transparent and trustworthy GenAI.
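To make the idea of an evaluation-centric workflow concrete, here is a minimal, library-agnostic sketch of the kind of loop the session describes: scoring a model for both accuracy and run-to-run consistency over a labeled dataset. The toy model, dataset, and metric names are illustrative assumptions, not part of the session content or the Weave API.

```python
# Illustrative sketch only: a tiny evaluation harness measuring accuracy
# and run-to-run consistency. Names and data are hypothetical.
from collections import Counter

def toy_model(question: str) -> str:
    """Stand-in for an LLM call; deterministic here for reproducibility."""
    answers = {"2+2": "4", "capital of France": "Paris"}
    return answers.get(question, "unknown")

def evaluate(model, dataset, runs=3):
    """Run each example several times; report accuracy and consistency."""
    correct = 0
    consistent = 0
    for example in dataset:
        outputs = [model(example["input"]) for _ in range(runs)]
        majority, count = Counter(outputs).most_common(1)[0]
        correct += majority == example["expected"]
        consistent += count == runs  # all runs agreed with each other
    n = len(dataset)
    return {"accuracy": correct / n, "consistency": consistent / n}

dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "largest prime", "expected": "none"},
]
print(evaluate(toy_model, dataset))
```

In a real workflow, the model call and the evaluation run would be traced and versioned by a tool such as Weave, so that the data, model, and code behind each score remain reproducible.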


Speaker

Oh Hyun-woo
Senior AI Solution Engineer
Weights & Biases

Oh Hyun-woo leads initiatives across APAC to help organizations build scalable, efficient AI development workflows, with a particular focus on LLMs and GenAI. He specializes in assessing enterprise AI environments and enabling teams to adopt W&B solutions tailored to their unique workflows. Before joining W&B, he worked at NAVER and VUNO, applying AI to large-scale search systems and medical image analysis.

Register
Session details may be updated as the event approaches. Final schedule to be announced on the official site.