Agentic AI and LLM application: accelerating the development of unstructured clinical trial documents (CSPs)
A proof-of-concept (PoC) chatbot using agentic AI and AWS to accelerate clinical trial protocol document workflows
Tools & Techniques
Python (LangChain/LangGraph, pymupdf), AWS Bedrock Agent, AWS SageMaker, AWS Sandbox, FastAPI, GitHub
Summary
Clinical study protocols (CSPs) are essential for maintaining both the integrity and scientific validity of clinical trials. Examples can be found here. However, CSP development is often a complex and time-intensive process for pharmaceutical companies: it requires deep disease-specific expertise for accurate trial design, alignment with best practices established in previous trials, and consistency with organizational standards and regulatory requirements.
Currently, using general chat applications like ChatGPT to extract and summarize clinical study protocols poses significant challenges: they lack domain-specific training, often miss critical regulatory and contextual nuances, and struggle with diverse document formats and data types. As a result, the extracted information can be incomplete, non-reproducible, or misleading.
To streamline the handling of unstructured CSP documents and improve the quality of information extraction and summarization, I explored AI-assisted ETL processes leveraging advanced techniques such as Retrieval-Augmented Generation (RAG), foundation models, and AI agents. To ensure accessibility and ease of use, I deployed a conversational AI chatbot capable of performing these tasks interactively. This approach is designed to support statisticians in accelerating the development of high-quality clinical trial protocols.
CSP document digitalization
Tasks include: loading unstructured PDFs into a structured vector database; extracting the primary objective, endpoints, assessments/flowchart (tables), and basic trial information; mapping clinical assessments to ontologies; etc.
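To make the loading and extraction steps concrete, here is a minimal standard-library sketch, not the project's actual code. It assumes the page text has already been pulled out of the PDF (for example with pymupdf's `page.get_text()`), and the names `Chunk`, `chunk_pages`, and `extract_after_heading`, as well as the sample protocol text, are hypothetical. The chunks it produces are the kind of records that would then be embedded and written to a vector database.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int  # 1-based page number the chunk came from
    text: str  # chunk content, whitespace-normalised


def chunk_pages(pages, size=200, overlap=50):
    """Split each page's text into overlapping character windows."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        text = " ".join(text.split())  # collapse line breaks left by PDF extraction
        for start in range(0, len(text), step):
            chunks.append(Chunk(page=page_no, text=text[start:start + size]))
            if start + size >= len(text):
                break  # the last window already reached the end of the page
    return chunks


def extract_after_heading(text, heading):
    """Return the paragraph that follows `heading`, or None if it is absent."""
    pattern = re.compile(
        re.escape(heading) + r"\s*:?\s*(.+?)(?:\n\s*\n|\Z)",
        re.IGNORECASE | re.DOTALL,
    )
    match = pattern.search(text)
    return " ".join(match.group(1).split()) if match else None


# Invented sample text standing in for two extracted protocol pages.
sample_pages = [
    "Primary Objective\nTo evaluate the efficacy of Drug X.\n\n"
    "Endpoints\nChange from baseline at week 12.",
    "Schedule of assessments",
]

chunks = chunk_pages(sample_pages, size=40, overlap=10)
primary_objective = extract_after_heading(sample_pages[0], "Primary Objective")
print(primary_objective)
```

Heading-based extraction like this only works when protocols follow a predictable template; for free-form documents, the extraction step would more plausibly be delegated to the LLM with the retrieved chunks as context.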
AI Agent Development
I used a RAG framework (via LangChain and the LangGraph Graph API), with FastAPI as the interactive API server. The RAG pipeline was integrated with an LLM (Claude 3.5 Sonnet) via AWS Bedrock Agent, enabling it to generate responses to user queries and provide relevant references when needed.
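The retrieve-then-generate loop described above can be illustrated with a small standard-library sketch. It is not the deployed implementation: bag-of-words cosine similarity stands in for embedding-based retrieval, the `llm` argument is a placeholder callable rather than a Bedrock client, and all function names and sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return up to k chunks ranked by similarity, dropping zero-score chunks."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c["text"]))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(query, hits):
    """Assemble a prompt that numbers each retrieved chunk so it can be cited."""
    context = "\n".join(
        f"[{i}] (page {c['page']}) {c['text']}" for i, c in enumerate(hits, 1)
    )
    return (
        "Answer using only the numbered context and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def answer(query, chunks, llm):
    """Retrieve context, call the model, and return the reply with page references."""
    hits = retrieve(query, chunks)
    reply = llm(build_prompt(query, hits))
    return {"answer": reply, "references": [c["page"] for c in hits]}


# Invented chunks standing in for records stored in the vector database.
demo_chunks = [
    {"page": 3, "text": "The primary objective is to evaluate efficacy of the study drug."},
    {"page": 7, "text": "Vital signs are assessed at every visit."},
    {"page": 12, "text": "Statistical analysis uses a mixed model."},
]

# Placeholder model: a real deployment would call the hosted LLM here.
result = answer("What is the primary objective?", demo_chunks, llm=lambda prompt: "stub reply")
print(result)
```

Returning the page numbers alongside the answer is what lets the chatbot show references for each response; an API server would expose a function like `answer` behind an HTTP endpoint.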
Flow Chart
- Overview of agentic AI & RAG structure (simplified for illustration only)
Primary Notice: The original chatbot was deployed using company-hosted cloud services.