Integrating Large Language Models for Traffic Crash Classification in Data-Constrained Regions
Publication Type
Conference Paper
Authors

Traffic crashes remain a critical public safety issue in developing regions, where poor data quality, inconsistent reporting, and inadequate classification frameworks hinder effective analysis and policy action. This study presents a novel methodology for enhancing traffic crash classification using advanced large language models in a data-constrained environment. We employ OpenAI’s GPT-4o to reclassify 48,815 crash records from Palestine (2019–2022), converting unstructured narratives into a structured taxonomy aligned with international standards. Using few-shot learning, GPT-4o achieved over 92% classification accuracy, revealing dominant causes such as failure to yield (42.64%) and driver negligence (34.76%), while identifying 11.55% of cases as pedestrian related. To explore predictive potential, we integrate this refined dataset into a ConvLSTM (Convolutional Long Short-Term Memory) model, capturing spatio-temporal dependencies to forecast crash frequencies across regions and time intervals. This integration of large language model-driven reclassification with deep spatio-temporal learning not only improves the reliability of crash datasets but also uncovers actionable insights for targeted road safety interventions. Comparative analysis with existing literature highlights the novelty of applying LLMs in tandem with spatio-temporal models for traffic analysis in low resource settings. The findings carry significant implications for data-driven policymaking, infrastructure design, and urban traffic safety strategies.

Conference
Conference Title
Engineering for Palestine Conference (ENG4PAL)
Conference Country
Palestine
Conference Date
Sept. 29, 2025 - Sept. 30, 2025
Conference Sponsor
PPU
Additional Info
Conference Website