Genkit Eval Framework for UI generation
This is for evaluating A2UI (v0.8) against various LLMs.
Setup
To use the models, you need to set the following environment variables with your API keys:
GEMINI_API_KEYOPENAI_API_KEYANTHROPIC_API_KEY
You can set these in a .env file in the root of the project, or in your shell's configuration file (e.g., .bashrc, .zshrc).
You also need to install dependencies before running:
pnpm install
Running all evals (warning: can use lots of model quota)
To run the flow, use the following command:
pnpm run evalAll
Running a Single Test
You can run the script for a single model and data point by using the --model and --prompt command-line flags. This is useful for quick tests and debugging.
Syntax
pnpm run eval -- --model='<model_name>' --prompt=<prompt_name>
Example
To run the test with the gpt-5-mini (reasoning: minimal) model and the generateDogUIs prompt, use the following command:
pnpm run eval -- --model='gpt-5-mini (reasoning: minimal)' --prompt=generateDogUIs
Controlling Output
By default, the script only prints the summary table and any errors that occur during generation. To see the full JSON output for each successful generation, use the --verbose flag.
To keep the input and output for each run in separate files, specify the --keep=<output_dir> flag, which will create a directory hierarchy with the input and output for each LLM call in separate files.
Example
pnpm run evalAll -- --verbose
pnpm run evalAll -- --keep=output