GenAI Evaluation KDD2024:

KDD workshop on Evaluation and Trustworthiness of Generative AI Models

Held in conjunction with KDD'24


Welcome to GenAI Evaluation KDD 2024!

The landscape of machine learning and artificial intelligence has been profoundly reshaped by the advent of Generative AI models and their applications, such as ChatGPT, GPT-4, and Sora. Generative AI includes Large Language Models (LLMs) such as GPT, Claude, Flan-T5, Falcon, and Llama, as well as generative diffusion models. These models have not only showcased unprecedented capabilities but also catalyzed transformative shifts across numerous fields. Concurrently, there is a burgeoning interest in the comprehensive evaluation of Generative AI models, as evidenced by pioneering research benchmarks and frameworks for LLMs such as PromptBench, BotChat, OpenCompass, and MINT. Despite these advancements, accurately assessing the trustworthiness, safety, and ethical alignment of Generative AI models continues to pose significant challenges. This underscores an urgent need for robust evaluation frameworks that can ensure these technologies are reliable and can be seamlessly integrated into society in a beneficial manner. Our workshop is dedicated to fostering interdisciplinary collaboration and innovation in this vital area, focusing on the development of new datasets, metrics, methods, and models that advance our understanding and application of Generative AI.

Contact: kdd2024-ws-genai-eval@amazon.com

Call for Contributions

  • Link to the submission website: [link]
  • This workshop aims to serve as a pivotal platform for discussing the forefront of advancements in Generative AI trustworthiness and evaluation. Generative AI models, such as Large Language Models (LLMs) and Diffusion Models, have revolutionized various domains, underscoring the critical need for reliable Generative AI technologies. As these models increasingly influence decision-making processes, establishing robust evaluation metrics and methods becomes paramount. Our objective is to explore diverse evaluation strategies that enhance the reliability of Generative AI models across applications. Workshop topics include, but are not limited to:

    • Holistic Evaluation: Covering datasets, metrics, and methodologies
    • Trustworthiness in Generative AI Models:
      • Truthfulness: counteracting misinformation, hallucination, inconsistency, sycophancy, and adversarial factuality in model responses.
      • Ensuring Safety and Security: addressing privacy concerns and preventing harmful or toxic content.
      • Addressing Bias and Fairness.
      • Ethical Considerations: social norm alignment, compliance with values, regulations and laws.
      • Privacy: privacy awareness and privacy leakage.
      • Enhancing misuse resistance, explainability, and robustness.
    • User-Centric Assessment.
    • Multi-perspective Evaluation: Emphasizing logical reasoning, knowledge depth, problem-solving, and user alignment.
    • Cross-Modal Evaluation: Integrating text, image, audio, etc.

    The workshop is designed to convene researchers from machine learning, data mining, and beyond, fostering interdisciplinary exploration of Generative AI trustworthiness and evaluation. Featuring a blend of invited talks, presentations of peer-reviewed papers, and panel discussions, the workshop aims to facilitate the exchange of insights and foster collaborations across research and industry. Participants from diverse fields such as Data Mining, Machine Learning, Natural Language Processing (NLP), and Information Retrieval are encouraged to share knowledge, debate challenges, and explore synergies, thereby advancing the state of the art in Generative AI technologies.

    Submission Guidelines

    • Paper submissions are limited to 9 pages, excluding references, must be in PDF format, and must use the ACM Conference Proceedings template (two-column format).
    • Additional supplemental material focused on reproducibility can be provided. Proofs, pseudo-code, and code may also be included in the supplement, which has no explicit page limit. The supplement format could be either single column or double column. The paper should be self-contained, since reviewers are not required to read the supplement.
    • The Word template guideline can be found here: [link]
    • The Latex/overleaf template guideline can be found here: [link]
    • Submissions will be judged on quality and relevance through single-blind review.
    • A paper should be submitted in PDF format through EasyChair at the following link: [link]