Held in conjunction with KDD'24
The landscape of machine learning and artificial intelligence has been profoundly reshaped by the advent of Generative AI models and their applications, such as ChatGPT, GPT-4, and Sora. Generative AI encompasses Large Language Models (LLMs) such as GPT, Claude, Flan-T5, Falcon, and Llama, as well as generative diffusion models. These models have not only showcased unprecedented capabilities but also catalyzed transformative shifts across numerous fields. Concurrently, there is burgeoning interest in the comprehensive evaluation of Generative AI models, as evidenced by pioneering research benchmarks and frameworks for LLMs such as PromptBench, BotChat, OpenCompass, and MINT. Despite these advancements, accurately assessing the trustworthiness, safety, and ethical congruence of Generative AI models continues to pose significant challenges. This underscores an urgent need for robust evaluation frameworks that can ensure these technologies are reliable and can be seamlessly integrated into society in a beneficial manner. Our workshop is dedicated to fostering interdisciplinary collaboration and innovation in this vital area, focusing on the development of new datasets, metrics, methods, and models that can advance our understanding and application of Generative AI.
Contact: kdd2024-ws-genai-eval@amazon.com
Sunday 25 August 2024 – Thursday 29 August 2024, Barcelona, Spain
Introduction by organizers.
Xin (Luna) Dong, Principal Scientist, Meta
Youxiang Zhu, Nana Lin, Xiaohui Liang, John Batsis, Robert Roth and Brian MacWhinney
Daniel Shin, Gao Pei, Priyadarshini Kumari and Tarek Besold
Tobias Pettersson, Maria Riveiro and Tuwe Löfström
Nikhil Madaan, Krishna Kesari, Manisha Verma, Shaunak Mishra and Tor Steiner
Jincheng Li, Chunyu Xie, Xiaoyu Wu, Bin Wang and Dawei Leng
Aidong Zhang, Thomas M. Linville Endowed Professor of Computer Science, University of Virginia
Yarong Feng, Zongyi Liu, Yuan Ling, Shunyan Luo, Shujing Dong, Shuyi Wang and Bruce Ferry
Ziniu Hu
Concluding remarks by organizers.
This workshop aims to serve as a pivotal platform for discussing the forefront of advances in Generative AI trustworthiness and evaluation. Generative AI models, such as Large Language Models (LLMs) and Diffusion Models, have revolutionized various domains, underscoring the critical need for reliable Generative AI technologies. As these models increasingly influence decision-making processes, establishing robust evaluation metrics and methods becomes paramount. Our objective is to delve into diverse evaluation strategies that enhance the reliability of Generative AI models across applications. The workshop topics include, but are not limited to:
The workshop is designed to convene researchers from the realms of machine learning, data mining, and beyond, fostering interdisciplinary exploration of Generative AI trustworthiness and evaluation. By featuring a blend of invited talks, presentations of peer-reviewed papers, and panel discussions, this workshop aims to facilitate the exchange of insights and foster collaboration across research and industry sectors. Participants from diverse fields such as Data Mining, Machine Learning, Natural Language Processing (NLP), and Information Retrieval are encouraged to share knowledge, debate challenges, and explore synergies, thereby advancing the state of the art in Generative AI technologies.