EPRI's New Electric Sector Benchmarking
EPRI, the Electric Power Research Institute, has recently published its first domain-specific benchmarking results targeting the electric power sector. Released on December 9, 2025, this groundbreaking study offers essential insights into how large language models (LLMs) perform in real-world utility contexts, which has become increasingly important as utilities integrate AI into their operations.
Importance of Accurate AI Models in Utilities
As the demand for precision in power system planning increases, understanding the operational complexities that utilities face becomes crucial. Existing benchmarks often rely on general academic knowledge, focusing primarily on disciplines like mathematics, science, and coding—areas that do not thoroughly reflect the unique challenges within the electric sector.
In contrast, EPRI's benchmarking examines fundamental operational aspects, such as generation, transmission, and distribution of electricity. By utilizing a dataset of over 2,100 questions constructed by 94 power sector experts, the study offers a robust approach to evaluate the depth of understanding LLMs have regarding the technical and regulatory intricacies in the field.
Key Findings from the Benchmarking
The benchmarking revealed some pertinent takeaways:
- Reliability Gaps with Open-Ended Questions: When models answered questions in an open-ended format rather than multiple choice, average accuracy dropped sharply, by around 27 percentage points. Expert-level questions yielded disappointing results, with top models scoring between 46% and 71%.
- Multiple-Choice Questions Provide a Baseline: The models scored impressively on multiple-choice items, achieving between 83% and 86%. This performance aligns well with their results on comparable math and science benchmarks, yet it emphasizes that these scores benefit from the structured nature of multiple-choice assessments.
- Open-Weight Models Show Potential: These LLMs, whose trained parameters are publicly available, are tracking closely behind proprietary models. Their growth potential and suitability for self-hosting present utilities with considerable flexibility and innovative opportunities.
- Web Searches Slightly Improve Scores: Allowing models to enhance their responses through web searches led to marginal improvements in accuracy, by about 2-4%, though this also came with the possibility of retrieving irrelevant or false data.
Methodology of the Benchmarking
EPRI's systematic approach consisted of three distinct phases designed to comprehensively assess LLM capabilities:
1. Knowledge Assessment with MCQs: The first phase utilized multiple-choice questions to measure the models' knowledge base.
2. Web-Integrated Testing: In the second phase, models were allowed to leverage web searches, examining how external data sources can assist in their responses.
3. Open-Ended Response Evaluation: This last phase focused on analyzing the reliability of models when presented with open-ended questions, gauging their understanding when free-form responses were required.
To ensure the robustness of results, each phase underwent three iterations per model, with confidence intervals included to account for variability among responses.
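The report does not specify how its confidence intervals were computed, but the idea of summarizing repeated runs can be illustrated with a minimal sketch: averaging a model's accuracy over its iterations and attaching a normal-approximation interval. The function name and the sample scores below are hypothetical, chosen only to mirror the 83-86% multiple-choice range mentioned above.

```python
import math

def accuracy_summary(run_accuracies, z=1.96):
    """Mean accuracy across repeated runs, with a normal-approximation 95% CI."""
    n = len(run_accuracies)
    mean = sum(run_accuracies) / n
    # Sample variance across runs (n - 1 in the denominator)
    var = sum((a - mean) ** 2 for a in run_accuracies) / (n - 1)
    sem = math.sqrt(var / n)  # standard error of the mean
    return mean, (mean - z * sem, mean + z * sem)

# Hypothetical scores from three iterations of one model on the MCQ phase
mean, (lo, hi) = accuracy_summary([0.84, 0.86, 0.83])
print(f"mean={mean:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

With only three iterations per model, such intervals are wide, which is precisely why reporting them matters: a small gap between two models may fall entirely within run-to-run noise.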
Future Directions for Benchmarking
EPRI's initiative is part of its broader Open Power AI Consortium, established to promote AI applications that are specifically tailored for the electric sector. Future phases of benchmarking are set to include evaluations of domain-augmented tools, moving beyond generic assessments to real applications within the utility landscape.
In conclusion, EPRI's efforts pave the way for a more nuanced understanding of how AI and language models can be effectively applied within the electric power sector. By emphasizing the need for accuracy and real-world relevancy, EPRI establishes critical benchmarks that will undoubtedly shape AI integration strategies moving forward.
For further information and access to the full report, please visit WattWorks.
Contact
For media inquiries, please reach out to Rachel Gantz at 202-293-7517 or via email at [email protected]