All posts

(15)

[Paper Review 2] StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation https://arxiv.org/abs/1711.09020 Conference on Computer Vision and Pattern Recognition (CVPR), 2018 1. Introduction What is image-to-image translation? It means changing the appearance of a given image into a different one. Given training data from two different domains, StarGAN is a model that learns to translate images from one domain to the other. For multi-domain image translation tasks that use a labeled dataset such as CelebA, existing models turn out to be inefficient. Learning all the mappings among k domains..

[Paper Review 1] Perceptual Adversarial Networks for Image-to-Image Transformation https://arxiv.org/abs/1706.09138 Conference on Computer Vision and Pattern Recognition (CVPR), 2017 1. Introduction Image-to-Image Transformation -> The goal is to turn an input image into the desired output image. Prior work has trained CNNs in a supervised manner to perform image-to-image transformation: the input image is encoded into a hidden representation and then decoded into the output image. Prior work has also used GANs for Image-to-Image Transformation..

[Paper Review 4] GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis Conference on Neural Information Processing Systems (NeurIPS), 2020 https://arxiv.org/abs/2007.02442 1. Introduction Convolutional GANs are highly effective for view synthesis with high-resolution images. Despite this success, SOTA models still fail to properly disentangle generative factors such as 3D shape and viewpoint. This stands in contrast to humans, who are good at imagining an object from a new viewpoint and grasping the 3D structure of the world. Recent work has therefore approached 3D-aware image synthesis..

[Paper Review 3] 3D human pose estimation in video with temporal convolutions and semi-supervised training Conference on Computer Vision and Pattern Recognition (CVPR) 2019 https://arxiv.org/abs/1811.11742 1. Introduction This paper addresses 3D human pose estimation in video. The estimation problem is split into two steps: 2D keypoint detection and 3D pose estimation. This approach is inherently ambiguous, however, since multiple 3D poses can map to the same 2D keypoints. Earlier work resolved this ambiguity with recurrent neural networks (RNNs). With a CNN model, by contrast, parallel processing of multiple frames..

[Paper Review 2] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (This article is written for study purposes. Please be aware that there may be incorrect information.) European Conference on Computer Vision (ECCV), 2020 (Best Paper Honorable Mention) https://arxiv.org/pdf/2003.08934.pdf This paper focuses on the problem of view synthesis. View synthesis takes multiple images of a scene, along with their corresponding camera poses, as input. 1. Introduction NeRF represents..

[Paper Review 1] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (This article is written for study purposes. Please be aware that there may be incorrect information.) Conference on Computer Vision and Pattern Recognition (CVPR) 2017 https://arxiv.org/abs/1612.00593 1. Introduction Typical convolutional architectures require highly regular input data formats (e.g. image grids, 3D voxels). However, point clouds and meshes are not in a regular format. Therefore, mo..
