Evaluation

Measuring how well an AI system performs, typically by running it on tests, labeled examples, or benchmarks and scoring its outputs.
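As a minimal illustration, the sketch below scores a model on a small set of labeled examples using exact-match accuracy. It assumes the model can be treated as a function from an input string to an output string. The names `accuracy` and `toy_model` are hypothetical, and exact-match accuracy is only one of many possible metrics.

```python
from typing import Callable, Sequence, Tuple


def accuracy(model: Callable[[str], str], examples: Sequence[Tuple[str, str]]) -> float:
    """Return the fraction of examples where the model output exactly matches the expected answer."""
    if not examples:
        raise ValueError("need at least one example")
    # Count the examples the model gets exactly right.
    correct = sum(1 for prompt, expected in examples if model(prompt) == expected)
    return correct / len(examples)


# Hypothetical toy "model" used only for illustration: it uppercases its input.
def toy_model(text: str) -> str:
    return text.upper()


# Hypothetical labeled examples: (input, expected output).
examples = [("cat", "CAT"), ("dog", "DOG"), ("Bird", "bird")]
print(accuracy(toy_model, examples))  # 2 of 3 correct
```

Real benchmarks apply the same idea at a larger scale, often with task-specific metrics in place of exact match.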