Our goal is to survey the major AI Torah tools and provide community-relevant evaluations of their capabilities, reliability, and safety. Below is the summary scorecard of a proposed evaluation framework that would be applied to the most prominent AI tools for Torah study, engagement, and Jewish guidance. The scores are aggregates produced by a team of evaluators using a combination of subjective grading and objective tests. Click any box to read more about what a particular evaluation means, or click here to read the detailed methodology (dummy link).
This framework and presentation were inspired by the AI Safety Index, but adapted to Torah-based AI chatbots and tools and to the sensibilities of the communities they serve.
Prototype Notice: The scores shown below are illustrative placeholders (and as far as we know, the Agudah is not developing a chatbot). We are seeking collaborators to help develop the evaluation methodology and conduct actual assessments. Please do get in touch if you'd like to help.
Jewish AI Safety Index: Summary Scorecard
Independent Evaluation of AI Torah/Judaism Assistants
Evaluation Domains Explained
Methodology Overview
Evaluation Panel: Scores derived from blind assessments by 20 working rabbis in Orthodox schools and synagogues.
Testing Framework: A test bank of 100+ queries was used for each indicator, with 4-8 indicators combined into larger categories for the summary scorecard.
Scoring: Grades reflect aggregated assessments from human evaluators on a 1-5 scale.
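To make the roll-up concrete, here is a minimal sketch of how per-indicator ratings might be combined into a category grade for the summary scorecard. The indicator names, equal-weight averaging, and letter-grade cutoffs are illustrative assumptions for this prototype, not the finalized methodology.

```python
from statistics import mean

# Illustrative sketch only: the real index may weight indicators,
# evaluators, or test-bank queries differently.

def indicator_score(evaluator_ratings: list[int]) -> float:
    """Average the 1-5 ratings evaluators gave one indicator
    (each rating itself summarizes a 100+ query test bank)."""
    return mean(evaluator_ratings)

def category_score(indicator_scores: list[float]) -> float:
    """Combine the 4-8 indicator scores that make up one scorecard category."""
    return mean(indicator_scores)

def letter_grade(score: float) -> str:
    """Map a 1-5 category score to a letter grade (assumed cutoffs)."""
    for cutoff, grade in [(4.5, "A"), (3.5, "B"), (2.5, "C"), (1.5, "D")]:
        if score >= cutoff:
            return grade
    return "F"

# Example: one category built from three hypothetical indicators,
# each rated by four evaluators.
halachic_accuracy = [
    indicator_score([4, 5, 4, 3]),  # e.g. "cites sources correctly"
    indicator_score([3, 4, 4, 4]),  # e.g. "defers to a posek when appropriate"
    indicator_score([5, 4, 5, 5]),  # e.g. "flags disputed rulings"
]
print(letter_grade(category_score(halachic_accuracy)))  # -> "B"
```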
Help Build This Index
We need rabbis, educators, AI researchers, and community members to help develop evaluation criteria and conduct assessments. Your expertise matters.
Get Involved