Estimating LLM consistency: A user baseline vs surrogate metrics. [PDF, BibTeX, talk video, data and code]
Xiaoyuan Wu, Weiran Lin, Omer Akgul, and Lujo Bauer.
In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, November 2025. Association for Computational Linguistics. Senior Area Chair Highlight. DOI: 10.18653/v1/2025.emnlp-main.1554