Amin Banitalebi's Personal Webpage

• License Plate Recognition (2019)

Project: Perform license plate recognition on the edge (a very low-power in-camera chipset), in the cloud, or on a combination of the two.

We built tiny neural networks for this task. These are end-to-end, mixed-precision quantized, YOLO-like models that are extremely fast yet very accurate. They run on platforms ranging from GPU servers all the way down to the Nvidia Jetson TX2, Google Coral, Raspberry Pi 3+, and in-camera 3516EV300 chipsets.
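
As a rough illustration of what mixed precision means here (this is not the project's code), the sketch below applies uniform symmetric quantization to two layers at different bit widths. The layer names, weight values, and bit widths are hypothetical and chosen only for the example.

```python
def quantize(weights, bits):
    """Uniform symmetric quantization of floats to signed integers of `bits` bits."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # float value of one integer step
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical layers, each assigned its own bit width (the "mixed" part).
layers = {
    "conv1": [0.52, -0.13, 0.08, -0.91],
    "conv2": [0.02, 0.40, -0.33, 0.27],
}
bit_widths = {"conv1": 8, "conv2": 4}


def max_error(name):
    """Largest absolute round-trip error for one layer at its assigned bit width."""
    q, scale = quantize(layers[name], bit_widths[name])
    restored = dequantize(q, scale)
    return max(abs(a - b) for a, b in zip(layers[name], restored))


for name in layers:
    print(name, bit_widths[name], "bits, max error:", max_error(name))
```

Layers given fewer bits take less memory and bandwidth but show a larger round-trip error, which is the trade-off a mixed-precision assignment balances.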

Demo:

• Hand Gesture Recognition (2018)

Project: By 2018, many successful models had already been proposed for gesture recognition. This project investigated the practicality of those models for real-time applications. To this end, we used a custom TRN (Temporal Relation Network) structure to train a model on the Jester dataset for hand gesture recognition. The model could classify one of several designed actions in under 50 ms. The model outputs were later translated into directions for a humanoid robot for a live demonstration:

Demo:

• Facial Attributes Recognition (2018)

Project: Facial attribute recognition for deployment on low-power chips. We designed and trained a multi-task neural network with over 10 branches to recognize attributes such as age, gender, smile, makeup, facial hair, eyeglasses, etc. The model runs on a Jetson TX2 in under 100 ms per frame.

Demo:

• Face Detection and Recognition (2018)

Project: Perform face detection and recognition for various applications (with differing speed, accuracy, and dataset size requirements; clustering, classification, or ID recognition). This involved exploring and modifying various deep learning frameworks and models in order to train or transfer-learn for detection and recognition tasks.

Demo:

For this demo, I used the PubFig dataset (Public Figures Face Dataset) as face ID candidates and the LFW dataset (Labeled Faces in the Wild) for training, and tried a couple of YouTube videos for fun (watch in full screen at a high resolution setting to read the labels!):

• Text Detection and Recognition (2018)

Project: Detect and recognize (read) text in video and images. This involved exploring state-of-the-art OCR (Optical Character Recognition) models. Various attention-based CNN/RNN models were used for detection, recognition, or end-to-end tasks. Traditional OCR approaches fail when the text sits on a non-flat, warped background or has undergone geometric transformations.

Demo:

Using a separate detection and character-level recognition framework, I trained deep learning models and applied them to a couple of YouTube videos below. In each video, bounding boxes are first detected around the text areas, then the text crops are shown below the actual video frame (sorted by recognition confidence). For each text crop (bounding box), the title above it shows what the recognition model has read. Although the detection seems to have high precision and high recall, I believe recognition can still be improved.

• Command Recommendation in an IDE environment (2017)

Project: Modern Integrated Development Environments (IDEs) often recommend the next command to the user to ease the development process. A current approach is co-occurrence analysis, which outperforms traditional supervised learning approaches. In this project we explored the use of deep neural networks for this task. We trained a custom LSTM network and demonstrated that it can perform on par with the co-occurrence method.

Code: https://gitlab.com/abanitalebi/command-recommendation
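
To make the co-occurrence baseline concrete, here is a minimal sketch, assuming the simplest possible variant: counting which command immediately follows which. The session data and function names are hypothetical, and this is not the code from the repository above.

```python
from collections import Counter, defaultdict


def build_cooccurrence(sessions):
    """Count how often each command is immediately followed by each other command."""
    follows = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            follows[current][nxt] += 1
    return follows


def recommend(follows, command, k=3):
    """Return up to k commands that most often followed `command`."""
    counts = follows.get(command, Counter())
    return [cmd for cmd, _ in counts.most_common(k)]


# Hypothetical usage logs, one list of commands per IDE session.
sessions = [
    ["open", "edit", "save", "build", "run"],
    ["open", "edit", "save", "commit"],
    ["edit", "save", "build", "test"],
]

follows = build_cooccurrence(sessions)
print(recommend(follows, "save"))  # ['build', 'commit']
```

An LSTM replaces the fixed one-step count table with a learned model of the whole preceding command sequence, which is the comparison the project describes.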

Amin Banitalebi Dehkordi, PhD
(Amin Banitalebi)

Links

PUBLICATIONS: click here

DATASETS: click here

CODE: click here (or visit my GitLab and GitHub pages)

ACADEMIC PROJECTS: click here

Visit my Google Scholar page here.

Projects

• Detecting and Analyzing Objects in a Conveyor Belt Surveillance Video:

Project: To understand and analyze the objects moving on a conveyor belt, in a dim environment

Solution: Using connected component segment analysis and contour detection to find objects
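
A minimal sketch of the connected-component idea, assuming a grayscale frame stored as a list of rows and a fixed brightness threshold (both are assumptions for illustration). It uses plain Python rather than an image-processing library and returns one bounding box per detected object.

```python
from collections import deque


def find_objects(frame, threshold):
    """Label 4-connected regions brighter than `threshold`.

    Returns one (top, left, bottom, right) bounding box per region.
    """
    h, w = len(frame), len(frame[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or frame[r][c] <= threshold:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and frame[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Tiny hypothetical frame: dim background with two brighter objects.
frame = [
    [10, 12, 11, 10, 10, 10],
    [10, 90, 95, 10, 10, 10],
    [10, 92, 10, 10, 80, 10],
    [10, 10, 10, 10, 85, 10],
]
boxes = find_objects(frame, threshold=50)
print(boxes)  # [(1, 1, 2, 2), (2, 4, 3, 4)]
```

In a dim scene the fixed threshold would likely need to be replaced by an adaptive one, which this sketch does not attempt.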

• Ultra-Accurate Image Registration:

Project: Register an image acquired by a camera on board a satellite to a ground-truth reference image, accurate to half a pixel

Solution: A complex mathematical optimization approach based on extended Kalman filtering. The optimization is very flexible, accounts for uncertainty in the measurements and independent variables, and converges in only a few iterations, typically within 10 to 30 seconds.

Before Registration:

Matching:

After Registration:

• Ultra-Accurate Registration of Multiple Images - Bundle Adjustment:

Project: Register multiple images obtained from different satellite cameras together, and to a ground truth image on the ground

Solution: Optimization using extended Kalman filters. The overlapping area between multiple images has a half-a-pixel registration accuracy. This

accuracy is better than the accuracy obtained by registering every image against the ground truth reference image. The figure below shows slight

misregistration remaining in an overlapping AOI.

• Video Frame Stabilization:

Project: Register a stack of video frames from a satellite camera to each other, and to a ground truth reference image

Solution: First, register each frame using optimization with extended Kalman filters. This produces a video with some remaining misregistration. Then, register every frame against the median of the center frames. The resulting video has sub-pixel frame-to-frame registration.

Original video, before registration:

After the initial optimization (AOI):

Final video (AOI):
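
The second stage relies on a reference built from the median of the center frames. A minimal sketch of that reference computation follows, assuming frames are stored as lists of rows and that three center frames are used (the actual number is not stated above). The registration against this reference is not shown.

```python
from statistics import median


def median_reference(frames, n_center=3):
    """Per-pixel median over the `n_center` middle frames of the stack."""
    start = max(0, (len(frames) - n_center) // 2)
    center = frames[start:start + n_center]
    h, w = len(center[0]), len(center[0][0])
    return [[median(f[r][c] for f in center) for c in range(w)]
            for r in range(h)]


# Hypothetical stack of five 1x2 frames; the middle frame has an outlier pixel.
frames = [
    [[0, 0]],
    [[1, 10]],
    [[2, 50]],
    [[3, 12]],
    [[9, 9]],
]
reference = median_reference(frames)
print(reference)  # [[2, 12]]
```

The median suppresses transient outliers (such as the value 50 in the example), so the reference is less sensitive to any single badly registered frame than a mean would be.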

• Style Transfer with Variational Generative Adversarial Network (GAN):

Project: Demonstrate latent space arithmetic with Variational GANs

Demo:

Using the CelebFaces dataset to demonstrate Male-to-Female transfer:

No-Smile to Smile transfer:

I also tried the Smile transfer on a childhood picture of mine (with no alignment). It looks like it needs alignment, but it shows the concept:
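
Latent space arithmetic of this kind is commonly done by averaging latent codes with and without an attribute and adding the difference to a new code. The sketch below shows only that arithmetic on hypothetical 3-dimensional codes. The encoder and decoder networks are omitted, and the exact procedure used in the project is an assumption.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def attribute_direction(with_attr, without_attr):
    """Latent direction pointing from 'without attribute' to 'with attribute'."""
    m_with, m_without = mean_vector(with_attr), mean_vector(without_attr)
    return [a - b for a, b in zip(m_with, m_without)]


def apply_attribute(z, direction, strength=1.0):
    """Move latent code z along the attribute direction."""
    return [zi + strength * di for zi, di in zip(z, direction)]


# Hypothetical latent codes of smiling and non-smiling faces.
smiling = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
neutral = [[0.0, 0.0, 2.0], [2.0, 0.0, 2.0]]

smile = attribute_direction(smiling, neutral)
z_new = apply_attribute([0.5, 0.5, 0.5], smile)
print(smile, z_new)  # [1.0, 0.0, 0.0] [1.5, 0.5, 0.5]
```

Decoding `z_new` with the generator would then produce the transferred image. A misaligned input face yields a latent code further from the training distribution, which is one plausible reason the unaligned picture transfers less cleanly.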

• Ortho Rectification:

Project: Project images acquired by a sensor onto the ground plane. This involves efficient resampling algorithms.

Demo:

• 2D to 3D Video Conversion:

Project: To automatically convert 2D video to 3D

Solution 1: Train a learning model that generates depth maps from 2D video, then use the depth maps to generate additional views.

Solution 2: Estimate a depth map by solving an optimization problem, then perform motion segmentation to assign a depth value to each object. This was followed by depth map post-filtering for quality enhancement and flicker reduction.

• Other Projects Worth a Mention:

- Image/Video enhancement and restoration (parallel implementation using Amazon AWS EMR):

- Lightness contrast stretching

- Adaptive histogram equalization

- Saturation, color temperature adjustment, gamma correction, sharpening

- Tone-mapping and color space conversions

- Frame-rate up/down conversion:

- Filling bad or no-data pixels

- Python implementation with C++ SWIG bindings for efficient interpolation/extrapolation

- Patch-matching to restore corrupted sections of video frames

- Compiling a huge MATLAB code base using MATLAB Compiler
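
Two of the enhancement operations listed above, contrast stretching and gamma correction, can be sketched in a few lines. This is a minimal single-machine illustration on a flat list of 8-bit lightness values. The function names and sample values are hypothetical, and the parallel AWS EMR implementation is not represented.

```python
def contrast_stretch(pixels, lo=0, hi=255):
    """Linearly rescale values so the darkest maps to `lo` and the brightest to `hi`."""
    p_min, p_max = min(pixels), max(pixels)
    if p_max == p_min:          # flat input: nothing to stretch
        return list(pixels)
    return [round(lo + (p - p_min) * (hi - lo) / (p_max - p_min))
            for p in pixels]


def gamma_correct(pixels, gamma, max_value=255):
    """Apply out = max * (in / max) ** (1 / gamma); gamma > 1 brightens midtones."""
    return [round(max_value * (p / max_value) ** (1.0 / gamma))
            for p in pixels]


stretched = contrast_stretch([60, 90, 180])
print(stretched)  # [0, 64, 255]
print(gamma_correct([0, 64, 255], gamma=2.2))
```

Both operations are independent per pixel, which is what makes them straightforward to distribute across workers.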

• Distributed Inference / Edge-Cloud Collaboration:

- Joint bit-width assignment and neural network splitting for distributing the inference between an edge and cloud or more

- Integrated as a pipeline to HiLens (a Huawei edge device and platform)

- Implemented for a license plate recognition application (3516 ARM chip)

(Download: KDD Paper Code)
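
To give a feel for the joint search over split point and bit width, here is a toy brute-force sketch. Everything in it is an assumption made for illustration: the latency model, the per-layer numbers, the accuracy-drop table, and the function name. It is not the method from the paper or its code.

```python
def best_split(edge_ms, cloud_ms, act_sizes, bit_options, acc_drop,
               max_drop, bandwidth_bits_per_ms):
    """Exhaustively pick (split, bits) with the lowest end-to-end latency.

    Layers [0, split) run on the edge and layers [split, n) in the cloud.
    The activation leaving layer split-1 is sent at `bits` bits per element.
    """
    best = None
    n = len(edge_ms)
    for split in range(1, n):            # keep at least one layer on each side
        for bits in bit_options:
            if acc_drop[bits] > max_drop:
                continue                 # this bit width costs too much accuracy
            transfer = act_sizes[split - 1] * bits / bandwidth_bits_per_ms
            total = sum(edge_ms[:split]) + transfer + sum(cloud_ms[split:])
            if best is None or total < best[0]:
                best = (total, split, bits)
    return best


# Hypothetical 4-layer network.
edge_ms = [4, 6, 10, 12]             # per-layer latency on the edge device
cloud_ms = [1, 1, 2, 2]              # per-layer latency in the cloud
act_sizes = [8000, 4000, 1000, 10]   # elements output by each layer
acc_drop = {2: 0.05, 4: 0.01, 8: 0.0}

best = best_split(edge_ms, cloud_ms, act_sizes, [2, 4, 8], acc_drop,
                  max_drop=0.02, bandwidth_bits_per_ms=1000)
print(best)  # (26.0, 3, 4)
```

In this toy setup the search cuts after the third layer and sends 4-bit activations: 2-bit is rejected by the accuracy budget, and earlier splits transmit too much data.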

• Model Composition:

- Combine multiple models in one, without labeled data of all tasks (tasks could be partially overlapping or non-overlapping)

(Download: BMVC Paper Code)

• Robust Object Detection / Domain Adaptation:

- Make object detection models robust to domain shifts such as image corruptions, weather/lighting changes, etc.

- Support domain changes such as natural image to cartoon, painting, clipart

- Train on one dataset and test on another

(Download: ICCV Paper Code)

• Joint Inference:

- Use a large and a small model together, distributing inference based on the input's characteristics

- Goal: Preserve accuracy while speeding up inference

- Unsupervised & specialized extensions

- Applied to Vision (Image Classification, Object Detection, ...) and NLP (Classification, Translation on BERT, T5, ...)

(Download: BMVC Paper Code)
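
One simple way to route inputs between a small and a large model is a confidence threshold: the small model answers when it is confident, and the large model handles the rest. The sketch below assumes that routing rule and uses toy stand-in models. The actual routing criterion in the paper may differ.

```python
def joint_predict(x, small_model, large_model, threshold=0.9):
    """Use the small model when it is confident; otherwise fall back to the large one."""
    label, confidence = small_model(x)
    if confidence >= threshold:
        return label, "small"
    return large_model(x)[0], "large"


# Toy stand-ins for real networks: each returns (label, confidence).
def small_model(x):
    label = "positive" if x > 0 else "negative"
    confidence = min(1.0, 0.5 + abs(x) / 2)   # confident only far from 0
    return label, confidence


def large_model(x):
    return ("positive" if x >= 0.1 else "negative"), 0.99


print(joint_predict(1.0, small_model, large_model))   # ('positive', 'small')
print(joint_predict(0.05, small_model, large_model))  # ('negative', 'large')
```

If most inputs are easy, most calls stop at the small model, so average latency drops while hard inputs still get the large model's accuracy.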