Held in conjunction with KDD'24
The landscape of machine learning and artificial intelligence has been profoundly reshaped by the advent of Generative AI models and their applications, such as ChatGPT, GPT-4, and Sora. Generative AI includes Large Language Models (LLMs) such as GPT, Claude, Flan-T5, Falcon, and Llama, as well as generative diffusion models. These models have not only showcased unprecedented capabilities but also catalyzed transformative shifts across numerous fields. Concurrently, there is burgeoning interest in the comprehensive evaluation of Generative AI models, as evidenced by pioneering research benchmarks and frameworks for LLMs such as PromptBench, BotChat, OpenCompass, and MINT. Despite these advancements, accurately assessing the trustworthiness, safety, and ethical alignment of Generative AI models remains a significant challenge. This underscores an urgent need for robust evaluation frameworks that can ensure these technologies are reliable and can be seamlessly integrated into society in a beneficial manner. Our workshop is dedicated to fostering interdisciplinary collaboration and innovation in this vital area, focusing on the development of new datasets, metrics, methods, and models that can advance our understanding and application of Generative AI.
Contact: kdd2024-ws-genai-eval@amazon.com
Sunday 25 August 2024 (2:00-6:00 PM), Barcelona, Spain
Introduction by organizers.
Krishnaram Kenthapadi, Chief Scientist, Clinical AI, Oracle Health
Shu-Ting Pi, Pradeep Bagavan, Yejia Li, Disha and Qun Liu.
Presenter: Shu-Ting Pi
Xinchi Qiu, William F. Shen, Yihong Chen, Nicola Cancedda, Pontus Stenetorp and Nicholas Lane.
Presenter: William F. Shen
Joymallya Chakraborty, Wei Xia, Anirban Majumder, Dan Ma and Naveed Janvekar.
Presenter: Joymallya Chakraborty
Zeyu Yang, Xiaochen Zheng, Zhao Meng and Roger Wattenhofer.
Presenter: Zeyu Yang
TBD
Sorin Draghici, Program Director in the Division of Information and Intelligent Systems (IIS) at NSF
Harshvardhan Solanki, Jyoti Singh, Yihui Chong and Ankur Teredesai.
Presenter: Ankur Teredesai
Antonis Maronikolakis, Ana Peleteiro Ramallo, Weiwei Cheng and Thomas Kober.
Presenter: Antonis Maronikolakis
Tianyu Ding, Adi Banerjee, Yunhong Li, Laurent Mombaerts, Tarik Borogovac and Juan Pablo De la Cruz Weinstein.
Presenter: Adi Banerjee
Closing by organizers.
7 Oral presentations & 17 Posters
Joshua Ward, Chi-Hua Wang and Guang Cheng.
Yu Xia, Chi-Hua Wang, Joshua Mabry and Guang Cheng.
Shu-Ting Pi, Pradeep Bagavan, Yejia Li, Disha and Qun Liu.
Joymallya Chakraborty, Wei Xia, Anirban Majumder, Dan Ma and Naveed Janvekar.
Zeyu Yang, Xiaochen Zheng, Zhao Meng and Roger Wattenhofer.
Daniel Lopez-Martinez.
Sarik Ghazarian, Yidong Zou, Swair Shah, Nanyun Peng, Anurag Beniwal, Christopher Potts and Narayanan Sadagopan.
Yong Xie, Karan Aggarwal, Aitzaz Ahmad and Stephen Lau.
Tianyu Ding, Adi Banerjee, Yunhong Li, Laurent Mombaerts, Tarik Borogovac and Juan Pablo De la Cruz Weinstein.
Javier Conde, Miguel González, Gonzalo Martínez, Fernando Moral, Elena Merino-Gómez and Pedro Reviriego.
Lorenzo Pacchiardi, Lucy Cheke and José Hernández-Orallo.
Danil Shaikhelislamov, Mikhail Drobyshevskiy and Andrey Belevantsev.
Antonis Maronikolakis, Ana Peleteiro Ramallo, Weiwei Cheng and Thomas Kober.
Xinchi Qiu, William F. Shen, Yihong Chen, Nicola Cancedda, Pontus Stenetorp and Nicholas Lane.
Shiyao Cui, Zhenyu Zhang, Yilong Chen, Wenyuan Zhang, Tianyun Liu, Siqi Wang and Tingwen Liu.
Simon Valentin, Jinmiao Fu, Gianluca Detommaso, Shaoyuan Xu, Giovanni Zappella and Bryan Wang.
Mina Ghashami, Mikhail Kuznetsov, Vianne Gao, Ganyu Teng, Phil Wallis, Joseph Xie, Ali Torkamani, Baris Coskun and Wei Ding.
Harshvardhan Solanki, Jyoti Singh, Yihui Chong and Ankur Teredesai.
This workshop aims to serve as a pivotal platform for discussing advances at the forefront of Generative AI trustworthiness and evaluation. Generative AI models, such as Large Language Models (LLMs) and diffusion models, have revolutionized various domains, underscoring the critical need for reliable Generative AI technologies. As these models increasingly influence decision-making processes, establishing robust evaluation metrics and methods becomes paramount. Our objective is to delve into diverse evaluation strategies that enhance the reliability of Generative AI models across applications. Workshop topics include, but are not limited to, new datasets, metrics, methods, and models for evaluating Generative AI.
The workshop is designed to convene researchers from machine learning, data mining, and beyond, fostering interdisciplinary exploration of Generative AI trustworthiness and evaluation. By featuring a blend of invited talks, presentations of peer-reviewed papers, and panel discussions, the workshop aims to facilitate exchanges of insights and foster collaborations across research and industry sectors. Participants from diverse fields such as Data Mining, Machine Learning, Natural Language Processing (NLP), and Information Retrieval are encouraged to share knowledge, debate challenges, and explore synergies, thereby advancing the state of the art in Generative AI technologies.