RAG-based QA Assistant | Tingyun Cheng

Background

Text-based Retrieval-Augmented Generation System

As part of the “Web Search and Mining” module at NCCU, our team architected a comprehensive text-based Retrieval-Augmented Generation system. We implemented agentic retrieval workflows alongside router-based query planning strategies to optimise retrieval accuracy. Within this collaborative effort, I applied my full-stack expertise to build custom web-based visualisation and prompt-testing tools, which significantly accelerated our team's evaluation and iteration cycles.

Tech Stack

Python, LLM Orchestration (Ollama), Vector Search (FAISS), Docker, SQLite

Project Repository

https://github.com/tingyun1015/WSM-Final_Project_RAG

NEW! I am currently redeveloping the project with a frontend interface: https://github.com/tingyun1015/RAG_project

Task Requirement

For this project, we formed a four-person team tasked with building a RAG system capable of answering queries based on provided documents within a strict one-month timeframe. We were required to use Ollama as our model provider, specifically constrained to the granite4:3b model in a local environment.
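
To illustrate the constraint, here is a minimal sketch of querying a local Ollama server with the granite4:3b model through Ollama's REST endpoint `/api/generate`. The prompt wording and the helper names `build_payload` and `ask` are assumptions made for this example, not the code our team shipped.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "granite4:3b"


def build_payload(question: str, context: str) -> dict:
    """Build a non-streaming generation request (prompt wording is hypothetical)."""
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return {"model": MODEL, "prompt": prompt, "stream": False}


def ask(question: str, context: str) -> str:
    """Send the request to the local Ollama server and return the generated text."""
    data = json.dumps(build_payload(question, context)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `ask(...)` requires a running Ollama instance with the model already pulled; `build_payload(...)` can be inspected without one.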

Agile Project Management & Leadership

Introducing Backlogs and Kanban Boards

I introduced straightforward agile management frameworks to the team, utilising a product backlog for task specification and a Kanban board to track our progress. We integrated these tools into our online meetings to ensure strategic alignment and foster highly effective collaboration, which ultimately drove the successful delivery of the project.

Structuring Project Progression

As the project had to be built from the ground up, I structured our development cycle into three distinct phases. We initially dedicated several weeks to research and trials, which culminated in formulating the core architecture. Following this, I implemented a breakdown strategy to address different categories of questions sequentially, ensuring we could systematically deliver the final product.

Designing a Router-based Architecture

Following our initial research phase, we identified recurring patterns within the documents and queries. Consequently, I devised a routing strategy to handle each of these patterns. The system processes queries individually, relying on the Router to dispatch each query to its appropriate processing chain:

Summary-Router Chain
- Handles queries that match summary-related syntax.
- This chain mainly addresses “Summary Questions”.
Name-Router Chain
- Handles queries that match exactly one or two documents.
- With a single matched document, this chain mainly addresses “Factual Questions” and “Multi-hop Reasoning Questions”. With multiple matched documents, it deals with “Multi-doc Info Integration Questions”, “Multi-doc Comparison Questions”, and “Multi-doc Time Sequence Questions”.
LLM-Router Chain
- Serves as a fallback for all other queries, which are handled directly by the LLM.
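
The dispatch logic above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the summary pattern, the substring-based document matching, the order in which the checks run, and the names `SUMMARY_PATTERN`, `match_documents`, and `route` are all hypothetical and do not reproduce the rules we actually used.

```python
import re

# Hypothetical pattern for summary-style queries, for illustration only.
SUMMARY_PATTERN = re.compile(r"\b(summar(y|ise|ize)|overview)\b", re.IGNORECASE)


def match_documents(query: str, doc_names: list[str]) -> list[str]:
    """Return the document names mentioned in the query (case-insensitive substring match)."""
    lowered = query.lower()
    return [name for name in doc_names if name.lower() in lowered]


def route(query: str, doc_names: list[str]) -> str:
    """Pick a chain for one query: 'summary', 'name', or the 'llm' fallback."""
    # Assumed order: summary syntax is checked before document-name matching.
    if SUMMARY_PATTERN.search(query):
        return "summary"
    matched = match_documents(query, doc_names)
    if 1 <= len(matched) <= 2:
        return "name"
    # Everything else falls through to the LLM-Router Chain.
    return "llm"
```

For example, with the made-up document names `["Acme Corp", "Globex"]`, a query mentioning only "Acme Corp" goes to the Name-Router Chain, while a query mentioning neither falls back to the LLM-Router Chain.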

Building Internal Development Tools

Score Dashboard

I developed a Score Dashboard to visualise experimental results in a tabular format, enabling straightforward comparison of results from each execution, including scores and their corresponding generated answers. This representation made it easier to evaluate and compare the performance of various strategies and configurations.
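
As a rough sketch of the data behind such a table, the example below stores per-question results in SQLite and aggregates them per run. The schema, the scoring scale, and the function names `create_store`, `record`, and `compare_runs` are assumptions for illustration; the actual dashboard may have stored and scored results differently.

```python
import sqlite3


def create_store() -> sqlite3.Connection:
    """Create an in-memory results table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (run TEXT, question TEXT, answer TEXT, score REAL)"
    )
    return conn


def record(conn: sqlite3.Connection, run: str, question: str, answer: str, score: float) -> None:
    """Store one generated answer and its score for a given run."""
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?, ?)", (run, question, answer, score)
    )


def compare_runs(conn: sqlite3.Connection) -> list[tuple]:
    """Return (run, mean score, number of questions) rows, best run first."""
    return conn.execute(
        "SELECT run, AVG(score), COUNT(*) FROM results "
        "GROUP BY run ORDER BY AVG(score) DESC"
    ).fetchall()
```

Each row returned by `compare_runs` corresponds to one line of the comparison table; the per-question answers remain available in `results` for side-by-side inspection.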

Prompt Playground

I set up a web-based playground for quick experimentation with the LLM, allowing us to test and optimise prompt design and configuration settings.

Results

Competing against seven groups, we ultimately secured second prize for this intensive one-month project. This experience provided me with invaluable insight into building RAG systems from the ground up. Further details can be found in our:

Slides | Final Report