Diabetic Retinopathy Detection using Vision Transformers

Project Overview

This project implements a Vision Transformer (ViT) model for automated diabetic retinopathy detection from retinal fundus images. The system reaches 94% accuracy when classifying disease severity levels, supporting early detection and prevention of diabetes-related vision loss.

Key Achievements

High Accuracy: 94% classification accuracy on diabetic retinopathy severity levels
Modern Architecture: Leverages Vision Transformers for superior feature learning from retinal images
Production Deployment: Fully containerized and deployed on AWS Lambda for scalable predictions
Fast Inference: Optimized for real-time screening in clinical settings

Technical Stack

Deep Learning Framework

Vision Transformer (ViT) architecture
Custom preprocessing pipeline for retinal images
Transfer learning from pre-trained medical imaging models

API & Deployment

RESTful API for image upload and prediction
Asynchronous request handling
Input validation and error handling
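
As an illustration of the input validation step, the sketch below checks an uploaded file before it reaches the model. It is not the project's actual code: the function name validate_upload, the InvalidImageError exception, the 10 MB size limit, and the choice to accept only JPEG and PNG are all assumptions made for this example.

```python
# Hypothetical upload check; names and limits are illustrative assumptions.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed 10 MB cap

# Leading "magic" bytes of the two formats this sketch accepts.
_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
}


class InvalidImageError(ValueError):
    """Raised when an upload fails validation."""


def validate_upload(data: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> str:
    """Return the detected image format, or raise InvalidImageError."""
    if not data:
        raise InvalidImageError("empty upload")
    if len(data) > max_bytes:
        raise InvalidImageError(f"upload exceeds {max_bytes} bytes")
    for signature, fmt in _SIGNATURES.items():
        if data.startswith(signature):
            return fmt
    raise InvalidImageError("unsupported format: expected JPEG or PNG")
```

Checking the leading bytes rather than the file extension or the client-supplied content type rejects mislabeled files early, before any decoding work is done. An API layer would typically translate InvalidImageError into a 4xx response.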

Containerization

Multi-stage Docker builds for optimized image size
Reproducible development and production environments
Easy deployment across different platforms

Cloud Infrastructure

Serverless deployment for cost-effective scaling
API Gateway integration for HTTPS endpoints
S3 integration for model artifact storage
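
To make the serverless path concrete, here is a minimal sketch of a Lambda handler behind an API Gateway proxy integration. The event fields body and isBase64Encoded and the statusCode/headers/body response shape follow the proxy integration format; everything else is an assumption for illustration. In particular, predict_severity is a hypothetical stand-in for the real model call, and loading the model artifact from S3 is omitted.

```python
import base64
import json


def predict_severity(image_bytes: bytes) -> dict:
    """Hypothetical placeholder; the real system would run the ViT model here."""
    return {"severity": "unknown", "confidence": 0.0}


def _response(status: int, payload: dict) -> dict:
    """Build a response in the shape API Gateway proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event: dict, context: object) -> dict:
    """Decode a base64-encoded image from the request and return a prediction."""
    body = event.get("body")
    if not body or not event.get("isBase64Encoded"):
        return _response(400, {"error": "expected a base64-encoded image body"})
    try:
        image_bytes = base64.b64decode(body, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return _response(400, {"error": "body is not valid base64"})
    return _response(200, predict_severity(image_bytes))
```

In a setup like this, the model would usually be loaded once at module import time rather than inside handler, so that warm invocations reuse it.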

Model Architecture

The Vision Transformer processes retinal images by:

Dividing images into fixed-size patches
Linearly embedding each patch
Adding positional embeddings
Processing through transformer encoder layers
Applying a classification head to predict severity
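
The first three steps above can be sketched in a few lines. This is a toy illustration, not the project's model: it uses plain Python lists, a single-channel 8x8 image, a patch size of 4, an embedding width of 6, and random values in place of learned weights, all of which are assumptions chosen to keep the example small. The encoder layers and classification head are left out.

```python
import random


def split_into_patches(image, patch):
    """Split an H x W grid (list of rows) into flattened patches, row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return patches


def embed_patches(patches, weights, positions):
    """Project each patch with one shared weight matrix, then add its positional embedding."""
    tokens = []
    for patch_vec, pos in zip(patches, positions):
        # weights has one row per output dimension, each of patch-vector length.
        projected = [sum(x * w for x, w in zip(patch_vec, row)) for row in weights]
        tokens.append([p + e for p, e in zip(projected, pos)])
    return tokens


rng = random.Random(0)
patch, dim = 4, 6  # toy sizes, assumed for illustration
image = [[rng.random() for _ in range(8)] for _ in range(8)]
patches = split_into_patches(image, patch)
weights = [[rng.gauss(0, 0.02) for _ in range(patch * patch)] for _ in range(dim)]
positions = [[rng.gauss(0, 0.02) for _ in range(dim)] for _ in patches]
tokens = embed_patches(patches, weights, positions)
print(len(tokens), len(tokens[0]))  # prints "4 6": four patch tokens of width 6
```

The same arithmetic scales to realistic inputs: for instance, a 224x224 image with 16x16 patches yields 14 x 14 = 196 patch tokens, which the transformer encoder then processes as a sequence.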

Clinical Impact

This system assists ophthalmologists by:

Providing rapid screening results
Prioritizing high-risk patients
Reducing diagnostic workload
Enabling early intervention for vision preservation

Arnav Aditya