Introduction & Motivation
Voice-over: ../assets/audio/intro.mp3
Crosswalks give pedestrians a designated, safer place to cross the road, but detecting them automatically is harder than it looks. A reliable detector can improve safety for everyone, especially people who are blind or have low vision, and it also supports autonomous vehicles and smart city systems. A missed or false detection can put people in unsafe situations, which is why researchers have devoted considerable attention to the problem.
Earlier systems mainly looked for simple visual cues such as parallel lines and repeated stripe patterns, but these methods often fail when crosswalk paint is worn, vehicles block the view, or lighting is poor. More recent work uses deep learning models such as object detectors and segmentation networks, and datasets like KITTI and Cityscapes have helped train and test these models. The newest line of research explores vision-language models (VLMs), which combine visual features with language reasoning to move beyond just “finding paint” toward understanding whether it is actually safe to cross [1] [3] [4].
This tutorial reviews these approaches to crosswalk detection, explains their strengths and weaknesses, and points out the open challenges that researchers are still working to solve.
