Retrieval-augmented generation (RAG) pipelines have become the de facto approach for building AI assistants with external knowledge. Given a user query, a RAG pipeline retrieves (R) information from external sources and then invokes a Large Language Model (LLM), augmented (A) with this information, to generate (G) a response. However, developing effective RAG pipelines is challenging: retrieval and generation components, often chained in varying orders, are intertwined, making it hard to identify which component(s) cause errors in the output. Developers often need to answer "what-if" questions (e.g., what if chunk sizes were larger, or what if retrieval used embeddings instead of keywords?), but such experimentation can require hours of re-processing. We present RAGGY, a developer tool that enables rapid "what-if" analysis by combining a Python library of composable RAG primitives with an interactive debugging interface. We contribute the design and implementation of RAGGY, insights into expert debugging patterns from a qualitative study with 12 engineers, and design implications for future RAG tools.