AutoAlt

2024

min read

A browser extension that describes shoe images for screen reader users — built at the LG AI Youth Camp.

PythonYOLOv8FastAPIGPT-4

The problem

Visually impaired people shopping online often hit a wall with product images. Screen readers need alt text to describe what’s on screen — and most shopping sites don’t have it. You can’t tell two pairs of shoes apart if the page just says “image” twelve times.

Our team spotted this during the LG AI Youth Camp in 2024 — a program run by LG Discovery Lab and Seoul National University. We spent about three months building a solution.

What we built

AutoAlt takes a product image, runs it through a custom object detection model, then hands the result off to GPT-4 to turn into a natural-language description. What comes out the other end is text a screen reader can actually read aloud.

We narrowed the scope a lot over the project. Started with “all clothing on shopping sites,” ended at shoes — the only category we could train reliably in the time we had.

Stack

YOLOv8 trained on my MacBook Pro (MPS acceleration — first time I heard the fans sound like a jet)
FastAPI backend — receives image, returns model JSON
GPT-4 Turbo — turns {type: "sneaker", laces: true} into a full sentence
Single HTML frontend, triggered via right-click context menu on images

I was the only developer on the team. Everyone else handled planning, design, and presentation.

Results

We won three awards at the final ceremony — LG Talent, Growth, and Exploration prizes — and I was individually selected for the US Silicon Valley trip.

At Stanford during the US camp, we also prototyped an AI legal advice chatbot for minor traffic violations — different project, same design-thinking energy.

The full story — application panic, SNU dorm all-nighter, YOLO training on a laptop — is in my LG AI Youth Camp blog post. The US camp writeup covers Silicon Valley.

AutoAlt

The problem#

What we built#

Stack#

Results#

More#

The problem

What we built

Stack

Results

More