Amin Banitalebi Dehkordi, PhD
(Amin Banitalebi)
Links
Visual attention modeling for High Dynamic Range (HDR) video using eye-tracking data:
HDR video has become widely available through consumer-market displays and cameras. The maturity of these systems relies heavily on improving the
users' quality of experience. We studied the visual saliency aspect of the human visual system using eye-tracking data collected from a series of
subjective tests on HDR video content.

DML-iTrack-HDR: our freely available eye-tracking database of ground truth eye fixation maps from several HDR sequences.
                                                                           
        * database:  access the DML-iTrack-HDR database here
       
        * details and reference papers:                                                                                                                                   

1) A. Banitalebi-Dehkordi, M. Azimi, M. T. Pourazad, and P. Nasiopoulos, "Visual saliency aided high dynamic range (HDR) video quality metrics,"
IEEE International Conference on Communications (ICC), 2016.  pdf - published version

2) A. Banitalebi-Dehkordi, M. Azimi, M. T. Pourazad, and P. Nasiopoulos, "Saliency-Aided HDR quality metrics," ISO/IEC JTC1/SC29/WG11, Doc.
m37317, Oct. 2015, Geneva, Switzerland.  pdf - published version


LBVS-HDR: a Learning-Based Visual Saliency detection model for High Dynamic Range video (a rough sketch of the fusion idea appears after the reference below).

        * code: available here
    
        * details and reference paper:  

3) A. Banitalebi-Dehkordi, Y. Dong, M. T. Pourazad, and P. Nasiopoulos, "A Learning Based Visual Saliency Fusion Model for High Dynamic Range
Video (LBVS-HDR)," 23rd European Signal Processing Conference (EUSIPCO) 2015.  pdf - published version

Saliency prediction on stereoscopic videos using eye-tracking:
An eye-tracking device was used to collect a large-scale dataset of human eye fixations for a set of 3D/2D sequences. Machine learning methods
are used to predict fixations on stereoscopic videos; a minimal example of a typical benchmarking score is sketched after the dataset details below.

DML-iTrack-3D: our freely available eye-tracking dataset for stereoscopic 3D video. In addition to the 3D videos and their ground truth fixation maps,
we also provide 2D versions of the same scenes along with their eye-tracking data.
                                                                           
        * database & benchmarking state-of-the-art:  access from here or here
       
        * details and reference paper: 

1) Amin Banitalebi-Dehkordi, Eleni Nasiopoulos, Mahsa T. Pourazad, and Panos Nasiopoulos, "Benchmark three-dimensional eye-tracking dataset for
visual saliency prediction on stereoscopic three-dimensional video," J. Electron. Imaging. 25 (1), 013008 (January 14, 2016); doi:
10.1117/1.JEI.25.1.013008.    pdf - published version         
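
Benchmarking saliency models against ground-truth fixations is commonly done with scores such as the Normalized Scanpath Saliency (NSS);
the snippet below is a generic example of such a score, assumed for illustration rather than taken from the paper's exact protocol.

    # Normalized Scanpath Saliency: mean of the z-scored saliency values at
    # the pixels human observers actually fixated; higher is better, ~0 is chance.
    import numpy as np

    def nss(saliency_map, fixation_map):
        s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
        return s[fixation_map > 0].mean()

    pred = np.random.rand(1080, 1920)              # a model's saliency map
    fix = np.zeros((1080, 1920))                   # binary ground-truth fixation map
    fix[500:510, 900:910] = 1
    print(nss(pred, fix))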


LBVS-3D: our visual attention model for 3D video.

        * benchmark: here
        * code: here

        * details and reference paper: A. Banitalebi-Dehkordi, M.T. Pourazad, and P. Nasiopoulos, "A Learning-Based Visual Saliency prediction model
for stereoscopic 3D video (LBVS-3D)," Multimedia Tools and Applications, 2016, DOI 10.1007/s11042-016-4155-y.
pdf - published version

Quality Assessment of Stereoscopic Videos:
The goal of this project is to design a full-reference quality metric for stereoscopic 3D video. The 3D Quality of Experience (QoE) is
modeled as a combination of the cyclopean-view quality and the depth-map quality. To validate the proposed method, we use a large database of
stereoscopic videos containing the various types of distortion that occur along the multiview video transmission pipeline (simulcast compression,
3D video compression, view synthesis, additive Gaussian noise, blurring, and depth-map compression). A large-scale subjective test (with 88 subjects)
was performed to evaluate the performance of the proposed metric. Comparisons with the state of the art demonstrated the superiority of our
method.
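
The combination itself can be pictured with a simple weighted pooling. The published metric's actual pooling function and weights differ, so the
sketch below is only a schematic, with a hypothetical trade-off weight alpha.

    # Schematic combination of the two quality components of the 3D QoE model.
    # alpha is a hypothetical weight that would be fit against subjective MOS;
    # the published metric's exact pooling differs from this linear form.
    def qoe_3d(q_cyclopean: float, q_depth: float, alpha: float = 0.7) -> float:
        return alpha * q_cyclopean + (1 - alpha) * q_depth

    print(qoe_3d(q_cyclopean=0.85, q_depth=0.70))  # overall 3D quality score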
 
Visit here to download part of our video data, and refer to the following publications for more information:

1) A. Banitalebi-Dehkordi, M. T. Pourazad, and P. Nasiopoulos, "An Efficient Human Visual System Based Quality Metric for 3D Video," Springer  
Journal of Multimedia Tools and Applications, pp. 1-29, Feb. 2015, DOI: 10.1007/s11042-015-2466-z.  pdf - published version

2) A. Banitalebi-Dehkordi, M. T. Pourazad, and P. Nasiopoulos, "A human-visual-system based 3D video quality metric," ISO/IEC JTC1/SC29/WG11,  
Doc. M27745, Geneva, Switzerland, January 2013.  pdf - published version

3) A. Banitalebi-Dehkordi, M. T. Pourazad, and P. Nasiopoulos, "A human-visual-system based 3D video quality metric," International Conference on  
3D Imaging (IC3D), Dec. 2012, Belgium.  pdf - published version

4) A. Banitalebi-Dehkordi, M. T. Pourazad, and P. Nasiopoulos, "3D video quality metric for 3D video compression," 11th IEEE IVMSP Workshop: 3D  
Image/Video Technologies and Applications, Seoul, June, 2013.  pdf - published version

5) A. Banitalebi-Dehkordi, M. T. Pourazad, and P. Nasiopoulos, "A study on the relationship between depth map quality and the overall 3D video
quality of experience," 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2013.  pdf - published version

6) A. Banitalebi-Dehkordi, M. T. Pourazad, and P. Nasiopoulos, "3D video quality metric for mobile applications," 2013 IEEE International Conference  
on Acoustics, Speech and Signal Processing (ICASSP), 2013.  pdf - published version

7) A. Banitalebi-Dehkordi, M. T. Pourazad, and P. Nasiopoulos, "An efficient human visual system based quality metric for 3D video,"
US Provisional Patent Application 61/731,864, filed Nov. 30, 2012.


Effect of Eye Dominance in the Perception of Stereoscopic 3D Video:
Around 90 percent of humans have a dominant eye (70% are right-eye dominant and 20% left-eye dominant). We study the effect of eye dominance
when subjects watch asymmetric stereoscopic sequences in several scenarios: one view compressed more heavily than the other (asymmetric
compression), one view down-sampled and then up-sampled again (mixed resolution), one view synthesized, or one view blurred. We report the
results of our subjective tests and provide some recommendations.

Please refer to the following paper for more information:

1) A. Banitalebi-Dehkordi, M. T. Pourazad, and P. Nasiopoulos, "Effect of eye dominance on the perception of stereoscopic 3D video," 2014 IEEE  
International Conference on Image Processing, ICIP, Oct. 2014, Paris.  pdf - published version


Compression of High Dynamic Range (HDR) Video:
HDR video technologies are finding their way into the consumer market. While MPEG is standardizing the compression of HDR video, we
performed a subjective and objective study of how the current H.264/AVC and HEVC compression standards perform on HDR video.
Our results showed that HEVC is a suitable base for developing a compression standard for HDR video.

Visit here or here to find our HDR video database. Please find more details regarding our study in the following paper:

1) A. Banitalebi-Dehkordi, M. Azimi, M. T. Pourazad, and P. Nasiopoulos, "Compression of high dynamic range video using the HEVC and
H.264/AVC standards," 2014 10th International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness (QShine),
Rhodes Island, Greece, Aug. 2014 (invited paper).  pdf - published version


Optimal frame rate for capturing, transmission, and display of stereoscopic 3D video:
While the effect of frame rate on the quality and bit rate of 2D video is well explored, both academia and industry are interested in the best
frame rate for capturing, transmitting, and displaying stereoscopic 3D video. We performed subjective evaluations to find the Mean Opinion
Score (MOS) of human subjects for various frame rates and bit rates. We concluded that increasing the frame rate of stereoscopic 3D video beyond
60 fps does not provide much improvement in perceived quality. Moreover, the higher frame rates (48 and 60 fps) are distinctly preferred over
lower frame rates. Considering the currently available capturing, transmission, and display technologies, we provide guidelines and
recommendations for the efficient use of 3D video.

Our sequences are made publicly available here or here. More information can be found in the following papers:

1) A. Banitalebi-Dehkordi, M. Azimi, M. T. Pourazad, and P. Nasiopoulos, "Effect of high frame rates on 3D video quality of experience," 2014 IEEE  
International Conference on Consumer Electronics, Las Vegas, US, Jan. 2014.  pdf - published version

2) A. Banitalebi-Dehkordi, M. Azimi, M. T. Pourazad, and P. Nasiopoulos, "The Effect of Frame Rate on 3D Video Quality and Bitrate," Springer  
Journal of 3D Research, vol. 6:1, pp. 5-34, March 2015, DOI 10.1007/s13319-014-0034-3.  pdf - published version


Quality evaluation of HDR video using objective quality metrics:
MPEG is standardizing the compression of HDR video and is actively performing subjective and objective evaluations of the HDR video encoder.
In this study, we perform objective and subjective evaluations to find the best quality metric for HDR video content.
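
A metric's performance in such a study is typically summarized by how well its scores correlate with the subjective Mean Opinion Scores, e.g.,
via Pearson (PLCC) and Spearman (SROCC) correlations. The snippet below is a generic version of this computation with made-up numbers, not our
actual data or the exact MPEG test procedure.

    # Correlate objective metric outputs with subjective MOS values.
    from scipy.stats import pearsonr, spearmanr

    metric_scores = [32.1, 28.4, 35.0, 30.2, 25.7]   # per-sequence metric outputs (made up)
    mos = [3.8, 3.1, 4.3, 3.5, 2.6]                  # mean opinion scores (made up)

    plcc, _ = pearsonr(metric_scores, mos)           # linearity (often after a
    srocc, _ = spearmanr(metric_scores, mos)         # nonlinear fit, omitted here)
    print(f"PLCC={plcc:.3f}, SROCC={srocc:.3f}")     # closer to 1 is better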

Visit here or here for our publicly available HDR video database, and refer to the following papers for more details:

1) A. Banitalebi-Dehkordi, M. Azimi, Y. Dong, M. T. Pourazad, and P. Nasiopoulos, "Quality assessment of High Dynamic Range (HDR) video content
using existing full-reference metrics," ISO/IEC JTC1/SC29/WG11, Doc. M34661, Strasbourg, France, Oct. 2014.  pdf - published version

2) M. Azimi, A. Banitalebi-Dehkordi, Y. Dong, M. T. Pourazad, and P. Nasiopoulos, “Evaluating the performance of existing full-reference quality  
metrics on High Dynamic Range (HDR) Video content,” ICMSP 2014: XII International Conference on Multimedia Signal Processing, Nov. 2014,  
Venice, Italy.  pdf - published version
Speaker Localization: Localizing speech sources using a compressive sensing approach:

Please find more details in the following paper:

1) M. Banitalebi-Dehkordi, A. Banitalebi-Dehkordi, “Compressive Sampling Based Sound Sources Localization via Sparse FFT
Based Features,” International Arab Journal of Information Technology, IAJIT, April 2013.

2) M. Banitalebi-Dehkordi, A. Banitalebi-Dehkordi, “Music genre classification using spectral analysis and sparse representation
of the signals,” Journal of Signal Processing Systems, Springer, June 2013, DOI: 10.1007/s11265-013-0797-4.  pdf - published version


Digital Watermarking:                                                                                                                                            

1) A. Banitalebi, S. Nader-Esfahani and A. Nasiri Avanaki, “Robust LSB watermarking optimized for local structural similarity,”  
19th Iranian Conference on Electrical Engineering, ICEE, Tehran, Iran, May, 2011.  pdf - published version

2) A. Banitalebi-Dehkordi, M. Banitalebi-Dehkordi, and J. Abouei, "An improvement technique based on structural similarity
thresholding for digital watermarking," Hindawi Journal of Advances in Computer Engineering, Vol. 2014, Article ID 580697.  pdf - published version



Perceptual video coding:                                                                                                            

1) A. Banitalebi, and H. R. Tohidypour, “Exploring the distributed video coding in a quality assessment context,” American  
Journal of Signal Processing, vol. 1, pp. 17-23, 2011.  pdf - published version

2) A. Banitalebi, S. Nader-Esfahani and A. Nasiri Avanaki, “A perceptual based motion compensation technique for video
coding,” 6th Iranian Conference on Machine Vision and Image Processing, MVIP 2010, Isfahan, Iran, Oct. 2010.  pdf - published version



Face and visual speech recognition:                                                                                                                      

1) M. Banitalebi-Dehkordi, A. Banitalebi-Dehkordi, J. Abouei, and K. N. Plataniotis, "Face recognition using a new compressive
sampling-based feature extraction method," Springer Journal of Multimedia Tools and Applications, 2017,
doi: 10.1007/s11042-017-5007-0.  camera-ready pdf - published version

2) A. Banitalebi, M. Moosaei, G. A. Hossein-Zadeh, “An investigation on the usage of image quality assessment in visual speech  
recognition,” 3rd International Congress on Image and Signal Processing, CISP 2010, China, 2010. published version



Cognitive radios and telecommunications:                                                                               

1) S. Imani, A. Banitalebi-Dehkordi, M. Cheraghi, “A random matrix approach to wide band spectrum sensing: unknown noise  
variance case,” 21st Iranian Conference on Electrical Engineering, ICEE 2013, Mashhad, Iran.  pdf - published version

2) S. Imani, A. Banitalebi, M. Kamarei, “Adaptive sub-optimal energy detection based wideband spectrum sensing for cognitive  
radios,” 1st International Conference on Electrical, Control and Computer Engineering, INECCE, Pahang, Malaysia, June, 2011.   
published version

3) S. Imani, A. Banitalebi, M. Kamarei, “Using weighted multilevel wavelet decomposition for wideband spectrum sensing in  
cognitive radios,” 19th Iranian Conference on Electrical Engineering, ICEE, Tehran, Iran, May, 2011.  published version



Biomedical signal processing:                                                                                                    

1) A. Banitalebi, S. K. Setarehdan, G. A. Hossein-Zadeh, “A technique based on chaos for brain computer interfacing,” 14th  
International CSI Conference (CSICC2009), Tehran, Iran, Oct. 2009.  pdf - published version

2) A. Banitalebi, G. A. Hossein-Zadeh, “On the Variability of Chaos Indices in Sleep EEG Signals,” 15th Iranian International  
Conference on Biomedical Engineering, Mashhad, Iran, Feb. 2009.  pdf

PUBLICATIONS: click here
                
DATASETS: click here

CODE: click here (or visit my GitLab / GitHub pages)

INDUSTRY PROJECTS: click here

Visit my Google Scholar page here.


Projects


    3D-HDR Video:
High Dynamic Range (HDR) displays and cameras are finding their way into the consumer market, just a few years after 3D displays and
cameras became widely available. Soon both academia and industry will explore the commercialization of 3D-HDR content, displays, and cameras.
3D-HDR video will offer the large, close-to-natural gamut of HDR video while at the same time providing viewers with an impression of depth.
To assist the research community's efforts, we provide a freely available 3D-HDR video database called DML-3Dhdr.

DML-3Dhdr: our database of Stereoscopic HDR (SHDR) videos captured from natural scenes, calibrated, and post-processed. The sequences
in this database are uncompressed and come in HDR format for the left and right views.

        * database: can be accessed from here

        * details and reference paper:

1) A. Banitalebi-Dehkordi, "Introducing a public Stereoscopic 3D High Dynamic Range (SHDR) video database," 3D Res (2017) 8: 3.  
doi:10.1007/s13319-017-0115-1.  pdf - published version - free on readcube

    Knowledge distillation for low-power object detection:
Existing solutions for object detection distillation rely on the availability of both a teacher model and ground-truth labels. We propose a new
perspective that relaxes this constraint. In our framework, a student is first trained with pseudo labels generated by the teacher, and is then
fine-tuned using labeled data, if any is available. Extensive experiments demonstrate improvements over existing object detection distillation
algorithms. In addition, decoupling the teacher and ground-truth distillation in this framework provides interesting properties such as: 1) using
unlabeled data to further improve the student's performance, 2) combining multiple teacher models of different architectures, even with different
object categories, and 3) reducing the need for labeled data (with only 20% of the COCO labels, this method achieves the same performance as
the model trained on the entire label set). Furthermore, a by-product of this approach is its potential use for domain adaptation. We verify these
properties through extensive experiments.
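
The two-stage recipe can be sketched as below. The detection loss, score threshold, and loader interfaces are placeholders for illustration;
they are not the paper's exact training setup.

    # Stage 1: train the student on teacher pseudo labels over unlabeled images.
    # Stage 2: fine-tune the student on ground-truth labels, if any are available.
    import torch

    def distill(student, teacher, unlabeled_loader, labeled_loader,
                det_loss, optimizer, score_thresh=0.5, epochs=(10, 5)):
        teacher.eval()
        for _ in range(epochs[0]):
            for images in unlabeled_loader:
                with torch.no_grad():
                    preds = teacher(images)          # dicts of boxes/labels/scores
                pseudo = [{k: v[p["scores"] > score_thresh] for k, v in p.items()}
                          for p in preds]            # keep confident detections only
                loss = det_loss(student(images), pseudo)
                optimizer.zero_grad(); loss.backward(); optimizer.step()
        for _ in range(epochs[1]):
            for images, targets in labeled_loader:   # skip this stage if unlabeled-only
                loss = det_loss(student(images), targets)
                optimizer.zero_grad(); loss.backward(); optimizer.step()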

1) A. Banitalebi-Dehkordi, “Knowledge distillation for low-power object detection: A simple technique and its extensions for training compact  
models using unlabeled data,” IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021. (Download: ICCV  IEEE)



    Repaint (Regenerate) the training-set images as a means of augmentation:
Convolutional Neural Networks (CNNs) for visual tasks are believed to learn both low-level textures and high-level object attributes throughout
the network depth. This paper further investigates the 'texture bias' in CNNs. To this end, we regenerate multiple instances of each training
example from the original image, through a process we call 'repainting'. The repainted examples preserve the shape and structure of the regions
and objects within the scenes, but diversify their texture and color. Our method can regenerate the same image under different daylight, season,
or weather conditions, can apply colorization or de-colorization effects, and can even bring back some texture information in blacked-out areas.
Repainting in place allows us to use these repainted examples to improve the generalization of CNNs. Through an extensive set of experiments, we
demonstrate the usefulness of the repainted examples in training for image classification (ImageNet) and object detection (COCO), over several
state-of-the-art network architectures at different capacities, and across different data-availability regimes.
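
In training, using the repainted instances amounts to a drop-in augmentation: with some probability, substitute a pre-generated repainted
variant for the original image while keeping the labels, since shape and structure are preserved. The wrapper below is a hypothetical
illustration; the repainting model itself is out of scope here.

    # Drop-in augmentation with pre-generated repainted variants.
    import random
    from torch.utils.data import Dataset

    class RepaintDataset(Dataset):
        def __init__(self, base, variants, p=0.5):
            self.base = base            # original (image, target) pairs
            self.variants = variants    # per-index list of repainted images
            self.p = p                  # chance of swapping in a variant

        def __len__(self):
            return len(self.base)

        def __getitem__(self, idx):
            image, target = self.base[idx]
            if self.variants[idx] and random.random() < self.p:
                image = random.choice(self.variants[idx])
            return image, target        # labels reused: shape/structure preserved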

1) A. Banitalebi-Dehkordi, Y. Zhang, “Repaint: Improving the Generalization of Down-Stream Visual Tasks by Generating Multiple Instances of  
Training Examples,” British Machine Vision Conference, BMVC 2021. (Download: ArXiv  GitHub  BMVC)


    ML4CO: Machine Learning for Combinatorial Optimization:
The 2021 NeurIPS Machine Learning for Combinatorial Optimization (ML4CO) competition was designed with the goal of improving state-of-the-art
combinatorial optimization solvers by replacing key heuristic components with machine learning models. The competition's main scientific question
was the following: is machine learning a viable option for improving traditional combinatorial optimization solvers on specific problem
distributions, when historical data is available? This was motivated by the fact that in many practical scenarios the data changes only slightly
between repetitions of a combinatorial optimization problem, an area where machine learning models are particularly powerful. This paper
summarizes the solution and lessons learned by the Huawei EI-OROAS team in the dual task of the competition. Our team's submission achieved
second place in the final ranking, within a very close distance of the first spot. In addition, our solution ranked first consistently over
several weekly leaderboard updates before the final evaluation. We provide insights gained from a large number of experiments, and argue that a
simple Graph Convolutional Neural Network (GCNN) can achieve state-of-the-art results if trained and tuned properly.
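
In the dual task, the solver state is a bipartite constraint-variable graph, and a GCNN scores variables for branching. The layer below is a
minimal message-passing sketch in the spirit of such baselines; all dimensions, feature sizes, and the scoring head are assumptions for
illustration, not our submitted model.

    # One round of constraint-to-variable message passing on a MILP's bipartite graph.
    import torch
    import torch.nn as nn

    class BipartiteGCNLayer(nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.msg = nn.Linear(2 * dim + 1, dim)   # +1 for the edge coefficient
            self.upd = nn.Linear(2 * dim, dim)

        def forward(self, cons, vars_, edge_index, edge_coef):
            ci, vj = edge_index                      # (E,) constraint / variable ids
            m = torch.relu(self.msg(torch.cat([cons[ci], vars_[vj], edge_coef], -1)))
            agg = torch.zeros_like(vars_).index_add_(0, vj, m)  # sum messages per variable
            return torch.relu(self.upd(torch.cat([vars_, agg], -1)))

    layer, head = BipartiteGCNLayer(), nn.Linear(64, 1)        # head scores variables
    cons, vars_ = torch.randn(30, 64), torch.randn(100, 64)    # toy embeddings
    edge_index = torch.stack([torch.randint(0, 30, (400,)), torch.randint(0, 100, (400,))])
    scores = head(layer(cons, vars_, edge_index, torch.randn(400, 1))).squeeze(-1)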

1) A. Banitalebi-Dehkordi and Y. Zhang, “ML4CO: Is GCNN All You Need? Graph Convolutional Neural Networks Produce Strong Baselines For  
Combinatorial Optimization Problems, If Tuned and Trained Properly, on Appropriate Data,” Proceedings of the ML4CO (Machine Learning for  
Combinatorial Optimization) Competition, NeurIPS 2021. (Download: ArXiv  ML4CO)