Agentic AI and LLM application: LLM application accelerating unstructured clinical trial document (CSPs) development


A proof-of-concept (PoC) chatbot that uses agentic AI and AWS to accelerate clinical trial protocol document workflows

Tools & Techniques

Python (LangChain/LangGraph, PyMuPDF), AWS Bedrock Agent, AWS SageMaker, AWS Sandbox, FastAPI, GitHub

Summary

Clinical study protocols (CSPs) are essential for maintaining both the integrity and scientific validity of clinical trials. Examples can be found here. However, CSP development is often a complex and time-intensive process for pharmaceutical companies: it requires deep disease-specific expertise for accurate trial design, alignment with best practices established in previous trials, and consistency with organizational standards and regulatory requirements.

Currently, using general chat applications like ChatGPT to extract and summarize clinical study protocols poses significant challenges: they lack domain-specific training, often miss critical regulatory and contextual nuances, and struggle with diverse document formats and data types. As a result, the extracted information can be incomplete, non-reproducible, or misleading.

To streamline the handling of unstructured CSP documents and improve the quality of information extraction and summarization, I explored AI-assisted ETL processes leveraging advanced techniques such as Retrieval-Augmented Generation (RAG), foundation models, and AI agents. To ensure accessibility and ease of use, I deployed a conversational AI chatbot capable of performing these tasks interactively. This approach is designed to support statisticians in accelerating the development of high-quality clinical trial protocols.

CSP document digitalization

Tasks include loading unstructured PDFs into a structured vector database; extracting the primary objective, endpoints, assessments/flowchart (tables), and basic trial information; and mapping clinical assessments to ontologies, among others.
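The loading step can be illustrated with a minimal sketch, which is not the code used in the project. It assumes the page text has already been extracted from the PDF (for example with PyMuPDF), and it swaps in a bag-of-words vector and an in-memory list for the real embedding model and vector database. The sample pages, `chunk_text`, `embed`, and `ToyVectorStore` are all hypothetical names and data invented for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (one simple chunking strategy)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
        start += max_words - overlap
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database (assumption, not the real store)."""

    def __init__(self) -> None:
        self.records: list[tuple[Counter, str, dict]] = []

    def add(self, chunk: str, metadata: dict) -> None:
        self.records.append((embed(chunk), chunk, metadata))

    def search(self, query: str, k: int = 2) -> list[tuple[str, dict]]:
        q = embed(query)
        ranked = sorted(self.records, key=lambda r: cosine(q, r[0]), reverse=True)
        return [(chunk, meta) for _, chunk, meta in ranked[:k]]


# Made-up page text standing in for what a PDF parser would return per page.
SAMPLE_PAGES = {
    1: "Primary objective: to evaluate the efficacy of Drug X compared with placebo in adults with moderate asthma.",
    2: "Primary endpoint: change from baseline in FEV1 at week 12. Secondary endpoints include exacerbation rate.",
    3: "Schedule of assessments: spirometry at screening, week 4, and week 12; vital signs at every visit.",
}


def build_store(pages: dict[int, str]) -> ToyVectorStore:
    """Chunk every page and index the chunks with their page number as metadata."""
    store = ToyVectorStore()
    for page_number, text in pages.items():
        for chunk in chunk_text(text):
            store.add(chunk, {"page": page_number})
    return store


if __name__ == "__main__":
    store = build_store(SAMPLE_PAGES)
    for chunk, meta in store.search("What is the primary objective?", k=1):
        print(meta["page"], chunk)
```

Keeping the page number as metadata on every chunk is what later lets an answer point back to its source location in the protocol.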

csp image

AI Agent Development

I used a RAG framework (via LangChain and the LangGraph Graph API), with FastAPI as the interactive API server. The RAG pipeline was integrated with an LLM (Claude 3.5 Sonnet) via AWS Bedrock Agent, enabling the chatbot to generate responses to user queries and provide relevant references when needed.
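The retrieve-then-generate flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: the two functions play the role that graph nodes would play in LangGraph, `stub_llm` is a placeholder for the Claude 3.5 Sonnet call made through Bedrock, and the documents in `DOCS` are made-up snippets standing in for the vector database. None of these names come from the original project.

```python
import re
from typing import Callable, TypedDict


class RAGState(TypedDict, total=False):
    """State passed from node to node, similar in spirit to a graph state."""
    question: str
    context: list[dict]
    answer: str
    references: list[str]


# Hypothetical protocol snippets; a real system would query a vector database.
DOCS = [
    {"source": "protocol_A.pdf, p. 12",
     "text": "The primary endpoint is change from baseline in FEV1 at week 12."},
    {"source": "protocol_A.pdf, p. 30",
     "text": "Vital signs are assessed at every study visit."},
]


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(state: RAGState) -> RAGState:
    """Retrieval node: rank documents by word overlap with the question."""
    q = tokenize(state["question"])
    ranked = sorted(DOCS, key=lambda d: len(q & tokenize(d["text"])), reverse=True)
    return {**state, "context": ranked[:1]}


def make_generate(llm: Callable[[str], str]) -> Callable[[RAGState], RAGState]:
    """Build a generation node around any callable that maps a prompt to text."""
    def generate(state: RAGState) -> RAGState:
        context = "\n".join(
            f"[{i + 1}] {d['text']}" for i, d in enumerate(state["context"])
        )
        prompt = (
            "Answer using only the context.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {state['question']}"
        )
        references = [d["source"] for d in state["context"]]
        return {**state, "answer": llm(prompt), "references": references}
    return generate


def stub_llm(prompt: str) -> str:
    """Placeholder model (assumption): returns the first context passage verbatim."""
    for line in prompt.splitlines():
        if line.startswith("[1] "):
            return line[4:]
    return "No context available."


def run_pipeline(question: str, llm: Callable[[str], str] = stub_llm) -> RAGState:
    """Run the nodes in order: retrieve, then generate."""
    state: RAGState = {"question": question}
    for node in (retrieve, make_generate(llm)):
        state = node(state)
    return state


if __name__ == "__main__":
    result = run_pipeline("What is the primary endpoint?")
    print(result["answer"])
    print("References:", result["references"])
```

Passing the model in as a callable keeps the graph logic independent of the model provider, so the stub can be swapped for a real client without touching the nodes. In a deployment like the one described, an API route could wrap `run_pipeline` and return both the answer and its references.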

Flow Chart

Overview of the agentic AI and RAG structure (simplified for illustration only)

rag image

Primary Notice: The original chatbot was deployed using company-hosted cloud services.


Shihui (Shirley) Zhu
