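Introduction

About

This tutorial will teach you the basics of using the Vulkan graphics and compute API. Vulkan is a new API by the Khronos group (known for OpenGL) that provides a much better abstraction of modern graphics cards. This new interface allows you to better describe what your application intends to do, which can lead to better performance and less surprising driver behavior compared to existing APIs like OpenGL and Direct3D. The ideas behind Vulkan are similar to those of Direct3D 12 and Metal, but Vulkan has the advantage of being fully cross-platform and allows you to develop for Windows, Linux and Android at the same time.

However, the price you pay for these benefits is that you have to work with a significantly more verbose API. Every detail related to the graphics API needs to be set up from scratch by your application, including initial frame buffer creation and memory management for objects like buffers and texture images. The graphics driver will do a lot less hand holding, which means that you will have to do more work in your application to ensure correct behavior.

The takeaway message here is that Vulkan is not for everyone. It is targeted at programmers who are enthusiastic about high performance computer graphics and are willing to put some work in. If you are more interested in game development than in computer graphics, you may wish to stick to OpenGL or Direct3D, which will not be deprecated in favor of Vulkan anytime soon. Another alternative is to use an engine like Unreal Engine or Unity, which can use Vulkan internally while exposing a much higher level API to you.

With that out of the way, let's cover some prerequisites for following this tutorial:

A graphics card and driver compatible with Vulkan (NVIDIA, AMD, Intel, Apple Silicon (or the Apple M1))
Experience with C++ (familiarity with RAII, initializer lists)
A compiler with decent support of C++17 features (Visual Studio 2017+, GCC 7+, or Clang 5+)
Some existing experience with 3D computer graphics

This tutorial will not assume knowledge of OpenGL or Direct3D concepts, but it does require you to know the basics of 3D computer graphics. It will not explain the math behind perspective projection, for example. See this online book for a great introduction to computer graphics concepts. Some other great computer graphics resources are:

Ray tracing in one weekend
Physically Based Rendering book
Vulkan being used in a real engine in the open-source Quake and DOOM 3

You can use C instead of C++ if you want, but you will have to use a different linear algebra library and you will be on your own in terms of code structuring. We will use C++ features like classes and RAII to organize logic and resource lifetimes. There are also two alternative versions of this tutorial available for Rust developers: Vulkano based, Vulkanalia based.

To make it easier to follow along for developers using other programming languages, and to get some experience with the base API, we'll be using the original C API to work with Vulkan. If you are using C++, however, you may prefer the newer Vulkan-Hpp bindings, which abstract some of the dirty work and help prevent certain classes of errors.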

E-book

If you prefer to read this tutorial as an e-book, then you can download an EPUB or PDF version here:

EPUB
PDF

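Tutorial structure

We'll start with an overview of how Vulkan works and the work we'll have to do to get the first triangle on the screen. The purpose of all the smaller steps will make more sense after you've understood their basic role in the whole picture. Next, we'll set up the development environment with the Vulkan SDK, the GLM library for linear algebra operations and GLFW for window creation. The tutorial will cover how to set these up on Windows with Visual Studio, and on Ubuntu Linux with GCC.

After that we'll implement all of the basic components of a Vulkan program that are necessary to render your first triangle. Each chapter will follow roughly the following structure:

Introduce a new concept and its purpose
Use all of the relevant API calls to integrate it into your program
Abstract parts of it into helper functions

Although each chapter is written as a follow-up to the previous one, it is also possible to read the chapters as standalone articles introducing a certain Vulkan feature. That means that the site is also useful as a reference. All of the Vulkan functions and types are linked to the specification, so you can click them to learn more. Vulkan is a very new API, so there may be some shortcomings in the specification itself. You are encouraged to submit feedback to this Khronos repository.

As mentioned before, the Vulkan API is rather verbose, with many parameters that give you maximum control over the graphics hardware. This causes basic operations like creating a texture to take a lot of steps that have to be repeated every time. Therefore we'll be creating our own collection of helper functions throughout the tutorial.

Every chapter will also conclude with a link to the full code listing up to that point. You can refer to it if you have any doubts about the structure of the code, or if you're dealing with a bug and want to compare. All of the code files have been tested on graphics cards from multiple vendors to verify correctness. Each chapter also has a comment section at the end where you can ask any questions that are relevant to the specific subject matter. Please specify your platform, driver version, source code, expected behavior and actual behavior to help us help you.

This tutorial is intended to be a community effort. Vulkan is still a very new API and best practices have not really been established yet. If you have any type of feedback on the tutorial and the site itself, then please don't hesitate to submit an issue or pull request to the GitHub repository. You can watch the repository to be notified of updates to the tutorial.

After you've gone through the ritual of drawing your very first Vulkan powered triangle onscreen, we'll start expanding the program to include linear transformations, textures and 3D models.

If you've played with graphics APIs before, then you'll know that there can be a lot of steps until the first geometry shows up on the screen. There are many of these initial steps in Vulkan, but you'll see that each of the individual steps is easy to understand and does not feel redundant. It's also important to keep in mind that once you have that boring looking triangle, drawing fully textured 3D models does not take that much extra work, and each step beyond that point is much more rewarding.

If you encounter any problems while following the tutorial, then first check the FAQ to see if your problem and its solution are already listed there. If you are still stuck after that, then feel free to ask for help in the comment section of the closest related chapter.

Ready to dive into the future of high performance graphics APIs? Let's go!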

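License

The contents are licensed under CC BY-SA 4.0, unless stated otherwise. By contributing, you agree to license your contributions to the public under that same license.

The code listings in the code directory of the source repository are licensed under CC0 1.0 Universal. By contributing to that directory, you agree to license your contributions to the public under that same public domain-like license.

This program is distributed in the hope that it will be useful, but without any warranty, including any warranty of merchantability or fitness for a particular purpose.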

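Overview

This chapter will start off with an introduction to Vulkan and the problems it addresses. After that we're going to look at the ingredients that are required for the first triangle. This will give you a big picture to place each of the subsequent chapters in. We will conclude by covering the structure of the Vulkan API and the general usage patterns.

Origin of Vulkan

Just like the previous graphics APIs, Vulkan is designed as a cross-platform abstraction over GPUs. The problem with most of these APIs is that the era in which they were designed featured graphics hardware that was mostly limited to configurable fixed functionality. Programmers had to provide the vertex data in a standard format and were at the mercy of the GPU manufacturers with regard to lighting and shading options.

As graphics card architectures matured, they started offering more and more programmable functionality. All this new functionality had to be integrated with the existing APIs somehow. This resulted in less than ideal abstractions and a lot of guesswork on the graphics driver side to map the programmer's intent to the modern graphics architectures. That's why there are so many driver updates for improving the performance of games, sometimes by significant margins. Because of the complexity of these drivers, application developers also need to deal with inconsistencies between vendors, like the syntax that is accepted for shaders. Aside from these new features, the past decade also saw an influx of mobile devices with powerful graphics hardware. These mobile GPUs have different architectures based on their energy and space requirements. One such example is tiled rendering, which would benefit from improved performance by offering the programmer more control over this functionality. Another limitation originating from the age of these APIs is limited multi-threading support, which can result in a bottleneck on the CPU side.

Vulkan solves these problems by being designed from scratch for modern graphics architectures. It reduces driver overhead by allowing programmers to clearly specify their intent using a more verbose API, and allows multiple threads to create and submit commands in parallel. It reduces inconsistencies in shader compilation by switching to a standardized bytecode format with a single compiler. Lastly, it acknowledges the general purpose processing capabilities of modern graphics cards by unifying the graphics and compute functionality into a single API.

What it takes to draw a triangle

We'll now look at an overview of all the steps it takes to render a triangle in a well-behaved Vulkan program. All of the concepts introduced here will be elaborated on in the next chapters. This is just to give you a big picture to relate all of the individual components to.

Step 1 - Instance and physical device selection

A Vulkan application starts by setting up the Vulkan API through a VkInstance. An instance is created by describing your application and any API extensions you will be using. After creating the instance, you can query for Vulkan supported hardware and select one or more VkPhysicalDevices to use for operations. You can query for properties like VRAM size and device capabilities to select desired devices, for example to prefer using dedicated graphics cards.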
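
The selection logic described above can be sketched without any Vulkan calls. The following is only an illustration under stated assumptions: DeviceInfo, scoreDevice and pickDevice are hypothetical names, and a real program would fill in this information from the properties Vulkan reports for each VkPhysicalDevice.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-device information an application
// would query from Vulkan (device type, memory size, ...).
struct DeviceInfo {
    std::string name;
    bool discrete;            // true for a dedicated graphics card
    std::uint64_t vramBytes;  // size of the device-local memory
};

// Rate a device: dedicated cards get a large bonus, VRAM (in MiB) breaks ties.
std::uint64_t scoreDevice(const DeviceInfo& device) {
    std::uint64_t score = device.vramBytes / (1024 * 1024);
    if (device.discrete) {
        score += 1000000;
    }
    return score;
}

// Return the index of the highest-scoring device, or -1 if the list is empty.
int pickDevice(const std::vector<DeviceInfo>& devices) {
    int best = -1;
    std::uint64_t bestScore = 0;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        const std::uint64_t score = scoreDevice(devices[i]);
        if (best < 0 || score > bestScore) {
            best = static_cast<int>(i);
            bestScore = score;
        }
    }
    return best;
}
```

With an integrated GPU that has 8 GiB of shared memory and a dedicated card with 4 GiB, pickDevice would return the index of the dedicated card, because the bonus outweighs the difference in memory size. The weights are arbitrary and only meant to show the idea.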

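Step 2 - Logical device and queue families

After selecting the right hardware device to use, you need to create a VkDevice (logical device), where you describe more specifically which VkPhysicalDeviceFeatures you will be using, like multi viewport rendering and 64 bit floats. You also need to specify which queue families you would like to use. Most operations performed with Vulkan, like draw commands and memory operations, are asynchronously executed by submitting them to a VkQueue. Queues are allocated from queue families, where each queue family supports a specific set of operations in its queues. For example, there could be separate queue families for graphics, compute and memory transfer operations. The availability of queue families could also be used as a distinguishing factor in physical device selection. It is possible for a device with Vulkan support to not offer any graphics functionality; however, all graphics cards with Vulkan support today will generally support all queue operations that we're interested in.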
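
As a rough sketch of what "finding a queue family that supports the operations we need" means, the snippet below searches a list of families for one whose capability bits include everything requested. QueueFamily, QueueFlag and findQueueFamily are hypothetical stand-ins so that the example compiles without the Vulkan headers; the bit values mirror VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and VK_QUEUE_TRANSFER_BIT.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in capability bits (same values as the corresponding VK_QUEUE_*_BIT flags).
enum QueueFlag : std::uint32_t {
    Graphics = 0x1,
    Compute  = 0x2,
    Transfer = 0x4
};

// Hypothetical stand-in for the properties reported per queue family.
struct QueueFamily {
    std::uint32_t flags;       // bitwise OR of QueueFlag values
    std::uint32_t queueCount;  // number of queues in this family
};

// Return the index of the first family that supports all required
// operations, or -1 if no such family exists.
int findQueueFamily(const std::vector<QueueFamily>& families, std::uint32_t required) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const bool hasQueues = families[i].queueCount > 0;
        const bool supportsAll = (families[i].flags & required) == required;
        if (hasQueues && supportsAll) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A device for which findQueueFamily returns -1 for graphics would be a candidate to reject during physical device selection, which is the "distinguishing factor" mentioned above.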

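Step 3 - Window surface and swap chain

Unless you're only interested in offscreen rendering, you will need to create a window to present rendered images to. Windows can be created with the native platform APIs or with libraries like GLFW and SDL. We will be using GLFW in this tutorial, but more about that in the next chapter.

We need two more components to actually render to a window: a window surface (VkSurfaceKHR) and a swap chain (VkSwapchainKHR). Note the KHR postfix, which means that these objects are part of a Vulkan extension. The Vulkan API itself is completely platform agnostic, which is why we need to use the standardized WSI (Window System Interface) extension to interact with the window manager. The surface is a cross-platform abstraction over windows to render to and is generally instantiated by providing a reference to the native window handle, for example HWND on Windows. Luckily, the GLFW library has a built-in function to deal with the platform specific details of this.

The swap chain is a collection of render targets. Its basic purpose is to ensure that the image that we're currently rendering to is different from the one that is currently on the screen. This is important to make sure that only complete images are shown. Every time we want to draw a frame we have to ask the swap chain to provide us with an image to render to. When we've finished drawing a frame, the image is returned to the swap chain for it to be presented to the screen at some point. The number of render targets and the conditions for presenting finished images to the screen depend on the present mode. Common present modes are double buffering (vsync) and triple buffering. We'll look into these in the swap chain creation chapter.

Some platforms allow you to render directly to a display without interacting with any window manager through the VK_KHR_display and VK_KHR_display_swapchain extensions. These allow you to create a surface that represents the entire screen and could be used to implement your own window manager, for example.

Step 4 - Image views and framebuffers

To draw to an image acquired from the swap chain, we have to wrap it into a VkImageView and a VkFramebuffer. An image view references a specific part of an image to be used, and a framebuffer references image views that are to be used for color, depth and stencil targets. Because there could be many different images in the swap chain, we'll preemptively create an image view and framebuffer for each of them and select the right one at draw time.

Step 5 - Render passes

Render passes in Vulkan describe the type of images that are used during rendering operations, how they will be used, and how their contents should be treated. In our initial triangle rendering application, we'll tell Vulkan that we will use a single image as color target and that we want it to be cleared to a solid color right before the drawing operation. Whereas a render pass only describes the type of images, a VkFramebuffer actually binds specific images to these slots.

Step 6 - Graphics pipeline

The graphics pipeline in Vulkan is set up by creating a VkPipeline object. It describes the configurable state of the graphics card, like the viewport size and depth buffer operation, and the programmable state using VkShaderModule objects. The VkShaderModule objects are created from shader bytecode. The driver also needs to know which render targets will be used in the pipeline, which we specify by referencing the render pass.

One of the most distinctive features of Vulkan compared to existing APIs is that almost all configuration of the graphics pipeline needs to be set in advance. That means that if you want to switch to a different shader or slightly change your vertex layout, then you need to entirely recreate the graphics pipeline. That means that you will have to create many VkPipeline objects in advance for all the different combinations you need for your rendering operations. Only some basic configuration, like viewport size and clear color, can be changed dynamically. All of the state also needs to be described explicitly; there is no default color blend state, for example.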
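
One way to picture "one pipeline object per state combination" is a lookup table keyed by the state that is baked into a pipeline. The sketch below is purely illustrative: PipelineKey, PipelineHandle and PipelineTable are hypothetical names, the key contains only three example fields, and the place where a real program would create the VkPipeline is marked with a comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical description of the state that is fixed at pipeline creation.
struct PipelineKey {
    std::uint32_t shaderId;
    std::uint32_t vertexLayoutId;
    bool blendEnabled;

    // Ordering so the key can be used in std::map.
    bool operator<(const PipelineKey& other) const {
        return std::tie(shaderId, vertexLayoutId, blendEnabled) <
               std::tie(other.shaderId, other.vertexLayoutId, other.blendEnabled);
    }
};

using PipelineHandle = std::uint64_t;  // stand-in for a VkPipeline handle

class PipelineTable {
public:
    // Return the pipeline for this combination, creating it on first use.
    PipelineHandle get(const PipelineKey& key) {
        auto it = pipelines_.find(key);
        if (it != pipelines_.end()) {
            return it->second;
        }
        // A real application would build the full pipeline object here.
        const PipelineHandle handle = nextHandle_++;
        pipelines_.emplace(key, handle);
        return handle;
    }

    std::size_t size() const { return pipelines_.size(); }

private:
    std::map<PipelineKey, PipelineHandle> pipelines_;
    PipelineHandle nextHandle_ = 1;
};
```

Changing any field of the key, such as switching to another shader, yields a different pipeline, while asking for the same combination again returns the existing one. In practice the table would usually be filled ahead of time rather than lazily, in line with the "create in advance" advice above.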

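The good news is that because you're doing the equivalent of ahead-of-time compilation instead of just-in-time compilation, there are more optimization opportunities for the driver and runtime performance is more predictable, because large state changes like switching to a different graphics pipeline are made very explicit.

Step 7 - Command pools and command buffers

As mentioned earlier, many of the operations in Vulkan that we want to execute, like drawing operations, need to be submitted to a queue. These operations first need to be recorded into a VkCommandBuffer before they can be submitted. These command buffers are allocated from a VkCommandPool that is associated with a specific queue family. To draw a simple triangle, we need to record a command buffer with the following operations:

Begin the render pass
Bind the graphics pipeline
Draw 3 vertices
End the render pass

Because the image in the framebuffer depends on which specific image the swap chain will give us, we need to record a command buffer for each possible image and select the right one at draw time. The alternative would be to record the command buffer again every frame, which is not as efficient.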
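
The "one pre-recorded command buffer per swap chain image" idea can be modelled with plain data. This is a toy model and not Vulkan code: CommandBuffer here is just a list of strings, and recordTriangle and recordAll are hypothetical helper names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in: a command buffer is modelled as a list of recorded commands.
using CommandBuffer = std::vector<std::string>;

// Record the four operations needed for the triangle, targeting one framebuffer.
CommandBuffer recordTriangle(std::size_t framebufferIndex) {
    CommandBuffer commands;
    commands.push_back("begin render pass (framebuffer " + std::to_string(framebufferIndex) + ")");
    commands.push_back("bind graphics pipeline");
    commands.push_back("draw 3 vertices");
    commands.push_back("end render pass");
    return commands;
}

// Record one command buffer per swap chain image, up front.
std::vector<CommandBuffer> recordAll(std::size_t imageCount) {
    std::vector<CommandBuffer> buffers;
    buffers.reserve(imageCount);
    for (std::size_t i = 0; i < imageCount; ++i) {
        buffers.push_back(recordTriangle(i));
    }
    return buffers;
}
```

At draw time the application would index into the result with whatever image index the swap chain hands out, so nothing has to be re-recorded per frame. Only the framebuffer reference differs between the buffers.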

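Step 8 - Main loop

Now that the drawing commands have been wrapped into a command buffer, the main loop is quite straightforward. We first acquire an image from the swap chain with vkAcquireNextImageKHR. We can then select the appropriate command buffer for that image and execute it with vkQueueSubmit. Finally, we return the image to the swap chain for presentation to the screen with vkQueuePresentKHR.

Operations that are submitted to queues are executed asynchronously. Therefore we have to use synchronization objects like semaphores to ensure a correct order of execution. Execution of the draw command buffer must be set up to wait on image acquisition to finish, otherwise it may occur that we start rendering to an image that is still being read for presentation on the screen. The vkQueuePresentKHR call in turn needs to wait for rendering to be finished, for which we'll use a second semaphore that is signaled after rendering completes.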

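Summary

This whirlwind tour should give you a basic understanding of the work ahead for drawing the first triangle. A real-world program contains more steps, like allocating vertex buffers, creating uniform buffers and uploading texture images, which will be covered in subsequent chapters, but we'll start simple because Vulkan has enough of a steep learning curve as it is. Note that we'll cheat a bit by initially embedding the vertex coordinates in the vertex shader instead of using a vertex buffer. That's because managing vertex buffers requires some familiarity with command buffers first.

So in short, to draw the first triangle we need to:

Create a VkInstance
Select a supported graphics card (VkPhysicalDevice)
Create a VkDevice and VkQueue for drawing and presentation
Create a window, window surface and swap chain
Wrap the swap chain images into VkImageView
Create a render pass that specifies the render targets and usage
Create framebuffers for the render pass
Set up the graphics pipeline
Allocate and record a command buffer with the draw commands for every possible swap chain image
Draw frames by acquiring images, submitting the right draw command buffer and returning the images to the swap chain

It's a lot of steps, but the purpose of each individual step will be made very simple and clear in the upcoming chapters. If you're confused about how a single step relates to the whole program, you should refer back to this chapter.

API concepts

This chapter will conclude with a short overview of how the Vulkan API is structured at a lower level.

Coding conventions

All of the Vulkan functions, enumerations and structs are defined in the vulkan.h header, which is included in the Vulkan SDK developed by LunarG. We'll look into installing this SDK in the next chapter.

Functions have a lower case vk prefix, types like enumerations and structs have a Vk prefix, and enumeration values have a VK_ prefix. The API heavily uses structs to provide parameters to functions. For example, object creation generally follows this pattern: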

VkXXXCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_XXX_CREATE_INFO;
createInfo.pNext = nullptr;
createInfo.foo = ...;
createInfo.bar = ...;

VkXXX object;
if (vkCreateXXX(&createInfo, nullptr, &object) != VK_SUCCESS) {
    std::cerr << "failed to create object" << std::endl;
    return false;
}

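Many structures in Vulkan require you to explicitly specify the type of structure in the sType member. The pNext member can point to an extension structure and will always be nullptr in this tutorial. Functions that create or destroy an object will have a VkAllocationCallbacks parameter that allows you to use a custom allocator for driver memory, which will also be left nullptr in this tutorial.

Almost all functions return a VkResult that is either VK_SUCCESS or an error code. The specification describes which error codes each function can return and what they mean.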
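
Because nearly every call returns a result code, many codebases funnel those checks through a small helper. The following is a minimal sketch of that idea, not part of the Vulkan API: Result is a stand-in enum so the example compiles without vulkan.h (the numeric values match the corresponding VkResult codes), and describe and check are hypothetical helper names.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for a few VkResult values; the numbers match those in vulkan.h.
enum class Result : int {
    Success = 0,
    ErrorOutOfHostMemory = -1,
    ErrorInitializationFailed = -3,
    ErrorLayerNotPresent = -6,
    ErrorExtensionNotPresent = -7,
    ErrorIncompatibleDriver = -9
};

// Map a result code to the name used in the specification.
const char* describe(Result result) {
    switch (result) {
        case Result::Success:                   return "VK_SUCCESS";
        case Result::ErrorOutOfHostMemory:      return "VK_ERROR_OUT_OF_HOST_MEMORY";
        case Result::ErrorInitializationFailed: return "VK_ERROR_INITIALIZATION_FAILED";
        case Result::ErrorLayerNotPresent:      return "VK_ERROR_LAYER_NOT_PRESENT";
        case Result::ErrorExtensionNotPresent:  return "VK_ERROR_EXTENSION_NOT_PRESENT";
        case Result::ErrorIncompatibleDriver:   return "VK_ERROR_INCOMPATIBLE_DRIVER";
    }
    return "unknown result code";
}

// Throw a descriptive exception unless the call succeeded.
void check(Result result, const std::string& what) {
    if (result != Result::Success) {
        throw std::runtime_error(what + " failed: " + describe(result));
    }
}
```

A call site would then look like check(someCall(), "vkCreateInstance"), which keeps the error text close to the specification's own names. Whether to throw, log or return is a design choice; this sketch throws because later chapters report fatal errors through exceptions.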

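Validation layers

As mentioned earlier, Vulkan is designed for high performance and low driver overhead. Therefore it will include very limited error checking and debugging capabilities by default. The driver will often crash instead of returning an error code if you do something wrong, or worse, it will appear to work on your graphics card and completely fail on others.

Vulkan allows you to enable extensive checks through a feature known as validation layers. Validation layers are pieces of code that can be inserted between the API and the graphics driver to do things like running extra checks on function parameters and tracking memory management problems. The nice thing is that you can enable them during development and then completely disable them when releasing your application for zero overhead. Anyone can write their own validation layers, but the Vulkan SDK by LunarG provides a standard set of validation layers that we'll be using in this tutorial. You also need to register a callback function to receive debug messages from the layers.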
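
The "code inserted between the API and the driver" idea can be illustrated with ordinary function wrapping. This is only a conceptual sketch and not how the real layer mechanism is implemented: CreateBufferFn stands for a hypothetical driver entry point, and withValidation is a hypothetical helper that wraps it with a parameter check and a message callback.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical "driver" entry point: creates a buffer and returns a handle.
using CreateBufferFn = std::function<int(std::size_t size)>;

// Callback type through which the layer reports problems.
using MessageCallback = std::function<void(const std::string& message)>;

// Wrap a driver function in a checking layer. The layer validates the
// parameters, reports problems through the callback, and then forwards
// the call unchanged to the next function in the chain.
CreateBufferFn withValidation(CreateBufferFn next, MessageCallback report) {
    return [next = std::move(next), report = std::move(report)](std::size_t size) {
        if (size == 0) {
            report("createBuffer: size must be greater than 0");
        }
        return next(size);
    };
}
```

In a release build the application would simply call the unwrapped function, which is why disabling the checks costs nothing at runtime. Note that the layer only reports; it does not change what the driver is asked to do.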

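Because Vulkan is so explicit about every operation and the validation layers are so extensive, it can actually be a lot easier to find out why your screen is black compared to OpenGL and Direct3D!

There's only one more step before we start writing code, and that's setting up the development environment.

Development environment

In this chapter we'll set up your environment for developing Vulkan applications and install some useful libraries. All of the tools we'll use, with the exception of the compiler, are compatible with Windows, Linux and MacOS, but the steps for installing them differ a bit, which is why they're described separately here.

Windows

If you're developing for Windows, then I will assume that you are using Visual Studio to compile your code. For complete C++17 support, you need to use either Visual Studio 2017 or 2019. The steps outlined below were written for VS 2017.

Vulkan SDK

The most important component you'll need for developing Vulkan applications is the SDK. It includes the headers, standard validation layers, debugging tools and a loader for the Vulkan functions. The loader looks up the functions in the driver at runtime, similarly to GLEW for OpenGL, if you're familiar with that.

The SDK can be downloaded from the LunarG website using the buttons at the bottom of the page. You don't have to create an account, but it will give you access to some additional documentation that may be useful to you.

Proceed through the installation and pay attention to the install location of the SDK. The first thing we'll do is verify that your graphics card and driver properly support Vulkan. Go to the directory where you installed the SDK, open the Bin directory and run the vkcube.exe demo. You should see the following:

If you receive an error message, then ensure that your drivers are up-to-date, that they include the Vulkan runtime and that your graphics card is supported. See the introduction chapter for links to drivers from the major vendors.

There are two other programs in this directory that will be useful for development. The glslangValidator.exe and glslc.exe programs are used to compile shaders from human-readable GLSL to bytecode. We'll cover this in depth in the shader modules chapter. The Bin directory also contains the binaries of the Vulkan loader and the validation layers, while the Lib directory contains the libraries.

Lastly, there's the Include directory that contains the Vulkan headers. Feel free to explore the other files, but we won't need them for this tutorial.

GLFW

As mentioned before, Vulkan by itself is a platform agnostic API and does not include tools for creating a window to display the rendered results. To benefit from the cross-platform advantages of Vulkan and to avoid the horrors of Win32, we'll use the GLFW library to create a window, which supports Windows, Linux and MacOS. There are other libraries available for this purpose, like SDL, but the advantage of GLFW is that it also abstracts away some of the other platform-specific things in Vulkan besides just window creation.

You can find the latest release of GLFW on the official website. In this tutorial we'll be using the 64-bit binaries, but you can of course also choose to build in 32 bit mode. In that case make sure to link with the Vulkan SDK binaries in the Lib32 directory instead of Lib. After downloading it, extract the archive to a convenient location. I've chosen to create a Libraries directory in the Visual Studio directory under Documents.

GLM

Unlike DirectX 12, Vulkan does not include a library for linear algebra operations, so we'll have to download one. GLM is a nice library that is designed for use with graphics APIs and is also commonly used with OpenGL.

GLM is a header-only library, so just download the latest version and store it in a convenient location. You should have a directory structure similar to the following now:

Setting up Visual Studio

Now that you've installed all of the dependencies, we can set up a basic Visual Studio project for Vulkan and write a little bit of code to make sure that everything works.

Start Visual Studio and create a new Windows Desktop Wizard project by entering a name and pressing OK.

Make sure that Console Application (.exe) is selected as the application type so that we have a place to print debug messages to, and check Empty Project to prevent Visual Studio from adding boilerplate code.

Press OK to create the project and add a C++ source file. You should already know how to do that, but the steps are included here for completeness.

Now add the following code to the file. Don't worry about trying to understand it right now; we're just making sure that you can compile and run Vulkan applications. We'll start from scratch in the next chapter.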

#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>

#define GLM_FORCE_RADIANS
#define GLM_FORCE_DEPTH_ZERO_TO_ONE
#include <glm/vec4.hpp>
#include <glm/mat4x4.hpp>

#include <iostream>

int main() {
    glfwInit();

    glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
    GLFWwindow* window = glfwCreateWindow(800, 600, "Vulkan window", nullptr, nullptr);

    uint32_t extensionCount = 0;
    vkEnumerateInstanceExtensionProperties(nullptr, &extensionCount, nullptr);

    std::cout << extensionCount << " extensions supported\n";

    glm::mat4 matrix;
    glm::vec4 vec;
    auto test = matrix * vec;

    while(!glfwWindowShouldClose(window)) {
        glfwPollEvents();
    }

    glfwDestroyWindow(window);

    glfwTerminate();

    return 0;
}

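Let's now configure the project to get rid of the errors. Open the project properties dialog and ensure that All Configurations is selected, because most of the settings apply to both Debug and Release mode.

Go to C++ -> General -> Additional Include Directories and press <Edit...> in the dropdown box.

Add the header directories for Vulkan, GLFW and GLM:

Next, open the editor for library directories under Linker -> General:

And add the locations of the object files for Vulkan and GLFW:

Go to Linker -> Input and press <Edit...> in the Additional Dependencies dropdown box.

Enter the names of the Vulkan and GLFW object files:

And finally change the compiler to support C++17 features:

You can now close the project properties dialog. If you did everything right, then you should no longer see any more errors being highlighted in the code.

Finally, ensure that you are actually compiling in 64 bit mode:

Press F5 to compile and run the project and you should see a command prompt and a window pop up like this:

The number of extensions should be non-zero. Congratulations, you're all set for playing with Vulkan!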

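Linux

These instructions are aimed at Ubuntu, Fedora and Arch Linux users, but you may be able to follow along by changing the package manager-specific commands to the ones that are appropriate for you. You should have a compiler that supports C++17 (GCC 7+ or Clang 5+). You'll also need make.

Vulkan Packages

The most important components you'll need for developing Vulkan applications on Linux are the Vulkan loader, validation layers, and a couple of command-line utilities to test whether your machine is Vulkan-capable:

sudo apt install vulkan-tools or sudo dnf install vulkan-tools: Command-line utilities, most importantly vulkaninfo and vkcube. Run these to confirm that your machine supports Vulkan.
sudo apt install libvulkan-dev or sudo dnf install vulkan-loader-devel: Installs the Vulkan loader. The loader looks up the functions in the driver at runtime, similarly to GLEW for OpenGL, if you're familiar with that.
sudo apt install vulkan-validationlayers spirv-tools or sudo dnf install mesa-vulkan-devel vulkan-validation-layers-devel: Installs the standard validation layers and required SPIR-V tools. These are crucial when debugging Vulkan applications, and we'll discuss them in the upcoming chapter.

On Arch Linux, you can run sudo pacman -S vulkan-devel to install all the required tools above.

If installation was successful, you should be all set with the Vulkan portion. Remember to run vkcube and ensure you see the following pop up in a window:

X Window System and XFree86-VidModeExtension

It is possible that these libraries are not on the system. If not, you can install them using the following commands:

sudo apt install libxxf86vm-dev or sudo dnf install libXxf86vm-devel: Provides an interface to the XFree86-VidModeExtension.
sudo apt install libxi-dev or sudo dnf install libXi-devel: Provides an X Window System client interface to the XINPUT extension.

GLFW

As mentioned before, Vulkan by itself is a platform agnostic API and does not include tools for creating a window to display the rendered results. To benefit from the cross-platform advantages of Vulkan and to avoid the horrors of X11, we'll use the GLFW library to create a window, which supports Windows, Linux and MacOS. There are other libraries available for this purpose, like SDL, but the advantage of GLFW is that it also abstracts away some of the other platform-specific things in Vulkan besides just window creation.

We'll be installing GLFW with the following command:

sudo apt install libglfw3-dev

or

sudo dnf install glfw-devel

or

sudo pacman -S glfw

GLM

It is a header-only library that can be installed from the libglm-dev or glm-devel package:

sudo apt install libglm-dev

or

sudo dnf install glm-devel

or

sudo pacman -S glm

Shader Compiler

We have just about everything we need, except we'll want a program to compile shaders from human-readable GLSL to bytecode.

Two popular shader compilers are Khronos Group's glslangValidator and Google's glslc. The latter has a familiar GCC- and Clang-like usage, so we'll go with that: on Ubuntu, download Google's unofficial binaries and copy glslc to your /usr/local/bin. Note that you may need sudo depending on your permissions. On Fedora use sudo dnf install glslc, while on Arch Linux run sudo pacman -S shaderc. To test, run glslc and it should rightfully complain that we didn't pass any shaders to compile:

glslc: error: no input files

We'll cover glslc in depth in the shader modules chapter.

Setting up a makefile project

Now that you have installed all of the dependencies, we can set up a basic makefile project for Vulkan and write a little bit of code to make sure that everything works.

Create a new directory at a convenient location with a name like VulkanTest. Create a source file called main.cpp and insert the following code. Don't worry about trying to understand it right now; we're just making sure that you can compile and run Vulkan applications. We'll start from scratch in the next chapter.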

#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>

#define GLM_FORCE_RADIANS
#define GLM_FORCE_DEPTH_ZERO_TO_ONE
#include <glm/vec4.hpp>
#include <glm/mat4x4.hpp>

#include <iostream>

int main() {
    glfwInit();

    glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
    GLFWwindow* window = glfwCreateWindow(800, 600, "Vulkan window", nullptr, nullptr);

    uint32_t extensionCount = 0;
    vkEnumerateInstanceExtensionProperties(nullptr, &extensionCount, nullptr);

    std::cout << extensionCount << " extensions supported\n";

    glm::mat4 matrix;
    glm::vec4 vec;
    auto test = matrix * vec;

    while(!glfwWindowShouldClose(window)) {
        glfwPollEvents();
    }

    glfwDestroyWindow(window);

    glfwTerminate();

    return 0;
}

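Next, we'll write a makefile to compile and run this basic Vulkan code. Create a new empty file called Makefile. I will assume that you already have some basic experience with makefiles, like how variables and rules work. If not, you can get up to speed very quickly with this tutorial.

We'll first define a couple of variables to simplify the remainder of the file. Define a CFLAGS variable that will specify the basic compiler flags: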

CFLAGS = -std=c++17 -O2

We're going to use modern C++ (-std=c++17), and we'll set the optimization level to O2. We can remove -O2 to compile programs faster, but we should remember to place it back for release builds.

Similarly, define the linker flags in a LDFLAGS variable:

LDFLAGS = -lglfw -lvulkan -ldl -lpthread -lX11 -lXxf86vm -lXrandr -lXi

The flag -lglfw is for GLFW, -lvulkan links with the Vulkan function loader, and the remaining flags are low-level system libraries that GLFW itself depends on for threading and window management.

It is possible that the Xxf86vm and Xi libraries are not yet installed on your system. You can find them in the following packages:

sudo apt install libxxf86vm-dev libxi-dev

or

sudo dnf install libXi-devel libXxf86vm-devel

or

sudo pacman -S libxi libxxf86vm

Specifying the rule to compile VulkanTest is straightforward now. Make sure to use tabs for indentation instead of spaces.

VulkanTest: main.cpp
	g++ $(CFLAGS) -o VulkanTest main.cpp $(LDFLAGS)

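Verify that this rule works by saving the makefile and running make in the directory with main.cpp and Makefile. This should result in a VulkanTest executable.

We'll now define two more rules, test and clean, where the former will run the executable and the latter will remove a built executable: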

.PHONY: test clean

test: VulkanTest
	./VulkanTest

clean:
	rm -f VulkanTest

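Running make test should show the program running successfully and displaying the number of Vulkan extensions. The application should exit with the success return code (0) when you close the empty window. You should now have a complete makefile that resembles the following: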

CFLAGS = -std=c++17 -O2
LDFLAGS = -lglfw -lvulkan -ldl -lpthread -lX11 -lXxf86vm -lXrandr -lXi

VulkanTest: main.cpp
	g++ $(CFLAGS) -o VulkanTest main.cpp $(LDFLAGS)

.PHONY: test clean

test: VulkanTest
	./VulkanTest

clean:
	rm -f VulkanTest

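You can now use this directory as a template for your Vulkan projects. Make a copy, rename it to something like HelloTriangle and remove all of the code in main.cpp.

You are now all set for the real adventure.

MacOS

These instructions will assume you are using Xcode and the Homebrew package manager. Also, keep in mind that you will need at least MacOS version 10.11, and your device needs to support the Metal API.

Vulkan SDK

The SDK version for MacOS internally uses MoltenVK. There is no native support for Vulkan on MacOS, so what MoltenVK does is act as a layer that translates Vulkan API calls to Apple's Metal graphics framework. With this you can take advantage of the debugging and performance benefits of Apple's Metal framework.

After downloading it, simply extract the contents to a folder of your choice (keep in mind that you will need to reference it when creating your projects in Xcode). Inside the extracted folder, in the Applications folder, you should have some executable files that will run a few demos using the SDK. Run the vkcube executable and you will see the following:

GLFW

As mentioned before, Vulkan by itself is a platform agnostic API and does not include tools for creating a window to display the rendered results. We'll use the GLFW library to create a window, which supports Windows, Linux and MacOS. There are other libraries available for this purpose, like SDL, but the advantage of GLFW is that it also abstracts away some of the other platform-specific things in Vulkan besides just window creation.

To install GLFW on MacOS we will use the Homebrew package manager to get the glfw package:

brew install glfw

GLM

Vulkan does not include a library for linear algebra operations, so we'll have to download one. GLM is a nice library that is designed for use with graphics APIs and is also commonly used with OpenGL.

It is a header-only library that can be installed from the glm package:

brew install glm

Setting up Xcode

Now that all the dependencies are installed, we can set up a basic Xcode project for Vulkan. Most of the instructions here are essentially a lot of "plumbing" so we can get all the dependencies linked to the project. Also, keep in mind that whenever the following instructions mention the folder vulkansdk, they refer to the folder where you extracted the Vulkan SDK.

Start Xcode and create a new Xcode project. In the window that opens, select Application > Command Line Tool.

Select Next, write a name for the project and for Language select C++.

Press Next and the project should have been created. Now, let's change the code in the generated main.cpp file to the following code: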

#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>

#define GLM_FORCE_RADIANS
#define GLM_FORCE_DEPTH_ZERO_TO_ONE
#include <glm/vec4.hpp>
#include <glm/mat4x4.hpp>

#include <iostream>

int main() {
    glfwInit();

    glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
    GLFWwindow* window = glfwCreateWindow(800, 600, "Vulkan window", nullptr, nullptr);

    uint32_t extensionCount = 0;
    vkEnumerateInstanceExtensionProperties(nullptr, &extensionCount, nullptr);

    std::cout << extensionCount << " extensions supported\n";

    glm::mat4 matrix;
    glm::vec4 vec;
    auto test = matrix * vec;

    while(!glfwWindowShouldClose(window)) {
        glfwPollEvents();
    }

    glfwDestroyWindow(window);

    glfwTerminate();

    return 0;
}

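Keep in mind that you are not required to understand everything this code is doing yet; we are just setting up some API calls to make sure everything is working.

Xcode should already be showing some errors, such as libraries it cannot find. We will now start configuring the project to get rid of those errors. On the Project Navigator panel select your project. Open the Build Settings tab and then:

Find the Header Search Paths field and add a link to /usr/local/include (this is where Homebrew installs headers, so the glm and glfw3 header files should be there) and a link to vulkansdk/macOS/include for the Vulkan headers.
Find the Library Search Paths field and add a link to /usr/local/lib (again, this is where Homebrew installs libraries, so the glm and glfw3 lib files should be there) and a link to vulkansdk/macOS/lib.

It should look like this (obviously, paths will be different depending on where you placed your files):

Now, in the Build Phases tab, under Link Binary With Libraries, we will add both the glfw3 and the vulkan frameworks. To make things easier we will be adding the dynamic libraries to the project (you can check the documentation of these libraries if you want to use the static frameworks).

For glfw, open the folder /usr/local/lib and there you will find a file name like libglfw.3.x.dylib ("x" is the library's version number; it might be different depending on when you downloaded the package from Homebrew). Simply drag that file to the Linked Frameworks and Libraries tab in Xcode.
For vulkan, go to vulkansdk/macOS/lib. Do the same for both the libvulkan.1.dylib and libvulkan.1.x.xx.dylib files (where "x" will be the version number of the SDK you downloaded).

After adding those libraries, in the same tab under Copy Files, change Destination to "Frameworks", clear the subpath and deselect "Copy only when installing". Click on the "+" sign and add all three of those frameworks here as well.

Your Xcode configuration should look like this:

The last thing you need to set up are a couple of environment variables. On the Xcode toolbar go to Product > Scheme > Edit Scheme..., and in the Arguments tab add the two following environment variables:

VK_ICD_FILENAMES = vulkansdk/macOS/share/vulkan/icd.d/MoltenVK_icd.json
VK_LAYER_PATH = vulkansdk/macOS/share/vulkan/explicit_layer.d

It should look like this:

Finally, you should be all set! Now if you run the project (remembering to set the build configuration to Debug or Release depending on the configuration you chose) you should see the following:

The number of extensions should be non-zero. The other logs are from the libraries; you might get different messages from those depending on your configuration.

You are now all set for the real thing.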

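Drawing a triangle

Setup

Base code

General structure

In the previous chapter you created a Vulkan project with all of the proper configuration and tested it with the sample code. In this chapter we're starting from scratch with the following code: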

#include <vulkan/vulkan.h>

#include <iostream>
#include <stdexcept>
#include <cstdlib>

class HelloTriangleApplication {
public:
    void run() {
        initVulkan();
        mainLoop();
        cleanup();
    }

private:
    void initVulkan() {

    }

    void mainLoop() {

    }

    void cleanup() {

    }
};

int main() {
    HelloTriangleApplication app;

    try {
        app.run();
    } catch (const std::exception& e) {
        std::cerr << e.what() << std::endl;
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

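We first include the Vulkan header from the LunarG SDK, which provides the functions, structures and enumerations. The stdexcept and iostream headers are included for reporting and propagating errors. The cstdlib header provides the EXIT_SUCCESS and EXIT_FAILURE macros.

The program itself is wrapped into a class where we'll store the Vulkan objects as private class members and add functions to initiate each of them, which will be called from the initVulkan function. Once everything has been prepared, we enter the main loop to start rendering frames. We'll fill in the mainLoop function to include a loop that iterates until the window is closed in a moment. Once the window is closed and mainLoop returns, we'll make sure to deallocate the resources we've used in the cleanup function.

If any kind of fatal error occurs during execution, then we'll throw a std::runtime_error exception with a descriptive message, which will propagate back to the main function and be printed to the command prompt. To handle a variety of standard exception types as well, we catch the more general std::exception. One example of an error that we will deal with soon is finding out that a certain required extension is not supported.

Roughly every chapter that follows after this one will add one new function that will be called from initVulkan and one or more new Vulkan objects to the private class members that need to be freed at the end in cleanup.

Resource management

Just like each chunk of memory allocated with malloc requires a call to free, every Vulkan object that we create needs to be explicitly destroyed when we no longer need it. In C++ it is possible to perform automatic resource management using RAII or the smart pointers provided in the <memory> header. However, I've chosen to be explicit about allocation and deallocation of Vulkan objects in this tutorial. After all, Vulkan's niche is to be explicit about every operation to avoid mistakes, so it's good to be explicit about the lifetime of objects to learn how the API works.

After following this tutorial, you could implement automatic resource management by writing C++ classes that acquire Vulkan objects in their constructor and release them in their destructor, or by providing a custom deleter to either std::unique_ptr or std::shared_ptr, depending on your ownership requirements. RAII is the recommended model for larger Vulkan programs, but for learning purposes it's always good to know what's going on behind the scenes.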
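
To make the custom deleter option concrete, here is a minimal sketch that uses a fake object in place of a real Vulkan handle. Everything named Fake..., g_liveObjects, UniqueObject and makeObject is hypothetical and exists only so the example compiles without the Vulkan headers. A deleter for a real Vulkan object would typically also need to store the parent handle (such as the instance or device) that the destroy function requires.

```cpp
#include <memory>

// Hypothetical stand-in for an API object with a create/destroy pair.
struct FakeObject {
    int id;
};

int g_liveObjects = 0;  // number of objects created but not yet destroyed

FakeObject* fakeCreate(int id) {
    ++g_liveObjects;
    return new FakeObject{id};
}

void fakeDestroy(FakeObject* object) {
    --g_liveObjects;
    delete object;
}

// Custom deleter: std::unique_ptr calls this instead of `delete`.
struct FakeDeleter {
    void operator()(FakeObject* object) const { fakeDestroy(object); }
};

using UniqueObject = std::unique_ptr<FakeObject, FakeDeleter>;

// Create an object whose destruction is tied to the lifetime of the returned pointer.
UniqueObject makeObject(int id) {
    return UniqueObject(fakeCreate(id));
}
```

When a UniqueObject goes out of scope, fakeDestroy runs automatically, so the explicit call in a cleanup function is no longer needed for that object. The order of destruction then follows the usual C++ rules for members and locals, which is something to keep in mind because Vulkan objects often have to be destroyed before their parents.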

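Vulkan objects are either created directly with functions like vkCreateXXX, or allocated through another object with functions like vkAllocateXXX. After making sure that an object is no longer used anywhere, you need to destroy it with the counterparts vkDestroyXXX and vkFreeXXX. The parameters for these functions generally vary for different types of objects, but there is one parameter that they all share: pAllocator. This is an optional parameter that allows you to specify callbacks for a custom memory allocator. We will ignore this parameter in the tutorial and always pass nullptr as argument.

Integrating GLFW

Vulkan works perfectly fine without creating a window if you want to use it for off-screen rendering, but it's a lot more exciting to actually show something! First replace the #include <vulkan/vulkan.h> line with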

#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>

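That way GLFW will include its own definitions and automatically load the Vulkan header with it. Add an initWindow function and add a call to it from the run function before the other calls. We'll use that function to initialize GLFW and create a window.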

void run() {
    initWindow();
    initVulkan();
    mainLoop();
    cleanup();
}

private:
    void initWindow() {

    }

The very first call in initWindow should be glfwInit(), which initializes the GLFW library. Because GLFW was originally designed to create an OpenGL context, we need to tell it to not create an OpenGL context with a subsequent call:

glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);

Because handling resized windows takes special care that we'll look into later, disable it for now with another window hint call:

glfwWindowHint(GLFW_RESIZABLE, GLFW_FALSE);

All that's left now is creating the actual window. Add a GLFWwindow* window; private class member to store a reference to it and initialize the window with:

window = glfwCreateWindow(800, 600, "Vulkan", nullptr, nullptr);

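The first three parameters specify the width, height and title of the window. The fourth parameter allows you to optionally specify a monitor to open the window on, and the last parameter is only relevant to OpenGL.

It's a good idea to use constants instead of hardcoded width and height numbers because we'll be referring to these values a couple of times in the future. I've added the following lines above the HelloTriangleApplication class definition: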

const uint32_t WIDTH = 800;
const uint32_t HEIGHT = 600;

and replaced the window creation call with

window = glfwCreateWindow(WIDTH, HEIGHT, "Vulkan", nullptr, nullptr);

You should now have an initWindow function that looks like this:

void initWindow() {
    glfwInit();

    glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
    glfwWindowHint(GLFW_RESIZABLE, GLFW_FALSE);

    window = glfwCreateWindow(WIDTH, HEIGHT, "Vulkan", nullptr, nullptr);
}

To keep the application running until either an error occurs or the window is closed, we need to add an event loop to the mainLoop function as follows:

void mainLoop() {
    while (!glfwWindowShouldClose(window)) {
        glfwPollEvents();
    }
}

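This code should be fairly self-explanatory. It loops and checks for events like pressing the X button until the window has been closed by the user. This is also the loop where we'll later call a function to render a single frame.

Once the window is closed, we need to clean up resources by destroying it and terminating GLFW itself. This will be our first cleanup code: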

void cleanup() {
    glfwDestroyWindow(window);

    glfwTerminate();
}

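When you run the program now you should see a window titled Vulkan show up until the application is terminated by closing the window. Now that we have the skeleton for the Vulkan application, let's create the first Vulkan object!

C++ code

Instance

Creating an instance

The very first thing you need to do is initialize the Vulkan library by creating an instance. The instance is the connection between your application and the Vulkan library, and creating it involves specifying some details about your application to the driver.

Start by adding a createInstance function and invoking it in the initVulkan function.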

void initVulkan() {
    createInstance();
}

Additionally add a data member to hold the handle to the instance:

private:
VkInstance instance;

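Now, to create an instance we'll first have to fill in a struct with some information about our application. This data is technically optional, but it may provide some useful information to the driver in order to optimize our specific application (for example, because it uses a well-known graphics engine with certain special behavior). This struct is called VkApplicationInfo: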

void createInstance() {
    VkApplicationInfo appInfo{};
    appInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    appInfo.pApplicationName = "Hello Triangle";
    appInfo.applicationVersion = VK_MAKE_VERSION(1, 0, 0);
    appInfo.pEngineName = "No Engine";
    appInfo.engineVersion = VK_MAKE_VERSION(1, 0, 0);
    appInfo.apiVersion = VK_API_VERSION_1_0;
}

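As mentioned before, many structs in Vulkan require you to explicitly specify the type in the sType member. This is also one of the many structs with a pNext member that can point to extension information in the future. We're using value initialization here to leave it as nullptr.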
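
The version fields in VkApplicationInfo are plain 32-bit integers. As a sketch of what VK_MAKE_VERSION does, the helpers below pack major, minor and patch numbers into one value using the same bit layout as the macro (10 bits for major, 10 for minor, 12 for patch) and unpack them again. The function names are hypothetical; in real code you would use the macros from the Vulkan headers.

```cpp
#include <cstdint>

// Pack a version the same way VK_MAKE_VERSION does:
// major in bits 22..31, minor in bits 12..21, patch in bits 0..11.
constexpr std::uint32_t makeVersion(std::uint32_t major, std::uint32_t minor, std::uint32_t patch) {
    return (major << 22) | (minor << 12) | patch;
}

// Extract the individual components again.
constexpr std::uint32_t versionMajor(std::uint32_t version) { return version >> 22; }
constexpr std::uint32_t versionMinor(std::uint32_t version) { return (version >> 12) & 0x3FFu; }
constexpr std::uint32_t versionPatch(std::uint32_t version) { return version & 0xFFFu; }

// Version 1.0.0 has only bit 22 set.
static_assert(makeVersion(1, 0, 0) == (1u << 22), "1.0.0 should set only bit 22");
```

Because the major number sits in the highest bits, packed versions can be compared with ordinary integer comparison, for example to check whether a reported version is at least a required one.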

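A lot of information in Vulkan is passed through structs instead of function parameters, and we'll have to fill in one more struct to provide sufficient information for creating an instance. This next struct is not optional and tells the Vulkan driver which global extensions and validation layers we want to use. Global here means that they apply to the entire program and not a specific device, which will become clear in the next few chapters.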

VkInstanceCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
createInfo.pApplicationInfo = &appInfo;

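The first two parameters are straightforward. The next two members specify the desired global extensions. As mentioned in the overview chapter, Vulkan is a platform agnostic API, which means that you need an extension to interface with the window system. GLFW has a handy built-in function that returns the extension(s) it needs to do that, which we can pass to the struct: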

uint32_t glfwExtensionCount = 0;
const char** glfwExtensions;

glfwExtensions = glfwGetRequiredInstanceExtensions(&glfwExtensionCount);

createInfo.enabledExtensionCount = glfwExtensionCount;
createInfo.ppEnabledExtensionNames = glfwExtensions;

The last two members of the struct determine the global validation layers to enable. We'll talk about these more in-depth in the next chapter, so just leave these empty for now.

createInfo.enabledLayerCount = 0;

We've now specified everything Vulkan needs to create an instance and we can finally issue the vkCreateInstance call:

VkResult result = vkCreateInstance(&createInfo, nullptr, &instance);

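As you'll see, the general pattern that object creation function parameters in Vulkan follow is:

Pointer to struct with creation info
Pointer to custom allocator callbacks, always nullptr in this tutorial
Pointer to the variable that stores the handle to the new object

If everything went well, then the handle to the instance is stored in the VkInstance class member. Nearly all Vulkan functions return a value of type VkResult that is either VK_SUCCESS or an error code. To check if the instance was created successfully, we don't need to store the result and can just use a check for the success value instead: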

if (vkCreateInstance(&createInfo, nullptr, &instance) != VK_SUCCESS) {
    throw std::runtime_error("failed to create instance!");
}

Now run the program to make sure that the instance is created successfully.

Encountered VK_ERROR_INCOMPATIBLE_DRIVER

If you are using MacOS with the latest MoltenVK SDK, you may get VK_ERROR_INCOMPATIBLE_DRIVER returned from vkCreateInstance. According to the Getting Started Notes, beginning with the 1.3.216 Vulkan SDK, the VK_KHR_portability_subset extension is mandatory.

To get past this error, first add the VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR bit to the flags of the VkInstanceCreateInfo struct, then add VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME to the list of enabled instance extensions.

Typically the code could look like this:

...

std::vector<const char*> requiredExtensions;

for(uint32_t i = 0; i < glfwExtensionCount; i++) {
    requiredExtensions.emplace_back(glfwExtensions[i]);
}

requiredExtensions.emplace_back(VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME);

createInfo.flags |= VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR;

createInfo.enabledExtensionCount = (uint32_t) requiredExtensions.size();
createInfo.ppEnabledExtensionNames = requiredExtensions.data();

if (vkCreateInstance(&createInfo, nullptr, &instance) != VK_SUCCESS) {
    throw std::runtime_error("failed to create instance!");
}

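Checking for extension support

If you look at the vkCreateInstance documentation then you'll see that one of the possible error codes is VK_ERROR_EXTENSION_NOT_PRESENT. We could simply specify the extensions we require and terminate if that error code comes back. That makes sense for essential extensions like the window system interface, but what if we want to check for optional functionality?

To retrieve a list of supported extensions before creating an instance, there's the vkEnumerateInstanceExtensionProperties function. It takes a pointer to a variable that stores the number of extensions and an array of VkExtensionProperties to store the details of the extensions. It also takes an optional first parameter that allows us to filter extensions by a specific validation layer, which we'll ignore for now.

To allocate an array to hold the extension details we first need to know how many there are. You can request just the number of extensions by leaving the last parameter empty: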

uint32_t extensionCount = 0;
vkEnumerateInstanceExtensionProperties(nullptr, &extensionCount, nullptr);

Now allocate an array to hold the extension details (include <vector>):

std::vector<VkExtensionProperties> extensions(extensionCount);

Finally we can query the extension details:

vkEnumerateInstanceExtensionProperties(nullptr, &extensionCount, extensions.data());

Each VkExtensionProperties struct contains the name and version of an extension. We can list them with a simple for loop (\t is a tab for indentation):

std::cout << "available extensions:\n";

for (const auto& extension : extensions) {
    std::cout << '\t' << extension.extensionName << '\n';
}

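You can add this code to the createInstance function if you'd like to provide some details about the Vulkan support. As a challenge, try to create a function that checks whether all of the extensions returned by glfwGetRequiredInstanceExtensions are included in the supported extensions list.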
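One possible shape for that check is sketched below. It is only an illustration: ExtensionProperties is a stand-in for VkExtensionProperties so the snippet compiles without the Vulkan headers, and allExtensionsSupported is a hypothetical helper name, not something the tutorial defines.

```cpp
#include <cstring>
#include <vector>

// Stand-in for VkExtensionProperties so the sketch compiles without the
// Vulkan headers; the real struct has the same two members.
struct ExtensionProperties {
    char extensionName[256];
    unsigned int specVersion;
};

// Returns true when every name in `required` appears in `available`.
bool allExtensionsSupported(const std::vector<const char*>& required,
                            const std::vector<ExtensionProperties>& available) {
    for (const char* name : required) {
        bool found = false;
        for (const auto& ext : available) {
            if (std::strcmp(name, ext.extensionName) == 0) {
                found = true;
                break;
            }
        }
        if (!found) {
            return false;
        }
    }
    return true;
}
```

In the real program, the required list would come from glfwGetRequiredInstanceExtensions and the available list from vkEnumerateInstanceExtensionProperties.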

Cleaning up

The VkInstance should only be destroyed right before the program exits. It can be destroyed in cleanup with the vkDestroyInstance function:

void cleanup() {
    vkDestroyInstance(instance, nullptr);

    glfwDestroyWindow(window);

    glfwTerminate();
}

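The parameters of the vkDestroyInstance function are straightforward. As mentioned in the previous chapter, the allocation and deallocation functions in Vulkan have an optional allocator callback that we ignore by passing nullptr. All of the other Vulkan resources that we'll create in the following chapters should be cleaned up before the instance is destroyed.

Before continuing with the more complex steps after instance creation, it's time to evaluate our debugging options by checking out validation layers.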

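C++ code

Validation layers

What are validation layers?

The Vulkan API is designed around the idea of minimal driver overhead, and one manifestation of that goal is that there is very limited error checking in the API by default. Even mistakes as simple as setting an enumeration to an incorrect value or passing a null pointer to a required parameter are generally not handled explicitly and simply result in crashes or undefined behavior. Because Vulkan requires you to be very explicit about everything you're doing, it's easy to make small mistakes such as using a new GPU feature and forgetting to request it at logical device creation time.

However, that doesn't mean that these checks can't be added to the API. Vulkan introduces an elegant system for this known as validation layers. Validation layers are optional components that hook into Vulkan function calls to apply additional operations. Common operations in validation layers are:

Checking the values of parameters against the specification to detect misuse
Tracking creation and destruction of objects to find resource leaks
Checking thread safety by tracking the threads that calls originate from
Logging every call and its parameters to the standard output
Tracing Vulkan calls for profiling and replaying

Here's an example of what the implementation of a function in a diagnostics validation layer could look like: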

VkResult vkCreateInstance(
    const VkInstanceCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkInstance* instance) {

    if (pCreateInfo == nullptr || instance == nullptr) {
        log("Null pointer passed to required parameter!");
        return VK_ERROR_INITIALIZATION_FAILED;
    }

    return real_vkCreateInstance(pCreateInfo, pAllocator, instance);
}

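These validation layers can be freely stacked to include all the debugging functionality that you're interested in. You can simply enable validation layers for debug builds and completely disable them for release builds, which gives you the best of both worlds!

Vulkan does not come with any validation layers built in, but the LunarG Vulkan SDK provides a nice set of layers that check for common errors. They're also completely open source, so you can check which kinds of mistakes they look for and contribute. Using the validation layers is the best way to avoid your application breaking on different drivers by accidentally relying on undefined behavior.

Validation layers can only be used if they have been installed on the system. For example, the LunarG validation layers are only available on PCs with the Vulkan SDK installed.

There were formerly two different types of validation layers in Vulkan: instance-specific and device-specific. The idea was that instance layers would only check calls related to global Vulkan objects like instances, and device-specific layers would only check calls related to a specific GPU. Device-specific layers have now been deprecated, which means that instance validation layers apply to all Vulkan calls. The specification still recommends that you enable validation layers at device level as well for compatibility, which is required by some implementations. We'll simply specify the same layers as the instance at logical device level, as we'll see later in the logical device and queues chapter.

Using validation layers

In this section we'll see how to enable the standard diagnostics layers provided by the Vulkan SDK. Just like extensions, validation layers need to be enabled by specifying their name. All of the useful standard validation is bundled into a layer included in the SDK that is known as VK_LAYER_KHRONOS_validation.

Let's first add two configuration variables to the program to specify the layers to enable and whether to enable them or not. I've chosen to base that value on whether the program is being compiled in debug mode or not. The NDEBUG macro is part of the C++ standard and means "not debug".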

const uint32_t WIDTH = 800;
const uint32_t HEIGHT = 600;

const std::vector<const char*> validationLayers = {
    "VK_LAYER_KHRONOS_validation"
};

#ifdef NDEBUG
    const bool enableValidationLayers = false;
#else
    const bool enableValidationLayers = true;
#endif

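We'll add a new function checkValidationLayerSupport that checks if all of the requested layers are available. First list all of the available layers using the vkEnumerateInstanceLayerProperties function. Its usage is identical to that of vkEnumerateInstanceExtensionProperties, which was discussed in the instance creation chapter.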

bool checkValidationLayerSupport() {
    uint32_t layerCount;
    vkEnumerateInstanceLayerProperties(&layerCount, nullptr);

    std::vector<VkLayerProperties> availableLayers(layerCount);
    vkEnumerateInstanceLayerProperties(&layerCount, availableLayers.data());

    return false;
}

Next, check whether all of the layers in validationLayers exist in the availableLayers list. You may need to include <cstring> for strcmp.

for (const char* layerName : validationLayers) {
    bool layerFound = false;

    for (const auto& layerProperties : availableLayers) {
        if (strcmp(layerName, layerProperties.layerName) == 0) {
            layerFound = true;
            break;
        }
    }

    if (!layerFound) {
        return false;
    }
}

return true;

We can now use this function in createInstance:

void createInstance() {
    if (enableValidationLayers && !checkValidationLayerSupport()) {
        throw std::runtime_error("validation layers requested, but not available!");
    }

    ...
}

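Now run the program in debug mode and make sure that the error does not occur. If it does, then have a look at the FAQ.

Finally, modify the VkInstanceCreateInfo struct instantiation to include the validation layer names if they are enabled: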

if (enableValidationLayers) {
    createInfo.enabledLayerCount = static_cast<uint32_t>(validationLayers.size());
    createInfo.ppEnabledLayerNames = validationLayers.data();
} else {
    createInfo.enabledLayerCount = 0;
}

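If the check was successful then vkCreateInstance should never return a VK_ERROR_LAYER_NOT_PRESENT error, but you should run the program to make sure.

Message callback

The validation layers print debug messages to the standard output by default, but we can also handle them ourselves by providing an explicit callback in our program. This also lets you decide which kinds of messages you would like to see, because not all of them are necessarily (fatal) errors. If you don't want to do that right now then you may skip to the last section in this chapter.

To set up a callback in the program that handles messages and their associated details, we have to set up a debug messenger with a callback using the VK_EXT_debug_utils extension.

We'll first create a getRequiredExtensions function that returns the required list of extensions based on whether validation layers are enabled or not: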

std::vector<const char*> getRequiredExtensions() {
    uint32_t glfwExtensionCount = 0;
    const char** glfwExtensions;
    glfwExtensions = glfwGetRequiredInstanceExtensions(&glfwExtensionCount);

    std::vector<const char*> extensions(glfwExtensions, glfwExtensions + glfwExtensionCount);

    if (enableValidationLayers) {
        extensions.push_back(VK_EXT_DEBUG_UTILS_EXTENSION_NAME);
    }

    return extensions;
}

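The extensions specified by GLFW are always required, but the debug messenger extension is added conditionally. Note that the VK_EXT_DEBUG_UTILS_EXTENSION_NAME macro used here is equal to the literal string "VK_EXT_debug_utils". Using the macro lets you avoid typos.

We can now use this function in createInstance: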

auto extensions = getRequiredExtensions();
createInfo.enabledExtensionCount = static_cast<uint32_t>(extensions.size());
createInfo.ppEnabledExtensionNames = extensions.data();

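Run the program to make sure you don't receive a VK_ERROR_EXTENSION_NOT_PRESENT error. We don't really need to check for the existence of this extension, because it should be implied by the availability of the validation layers.

Now let's see what a debug callback function looks like. Add a new static member function called debugCallback with the PFN_vkDebugUtilsMessengerCallbackEXT prototype. VKAPI_ATTR and VKAPI_CALL ensure that the function has the right signature for Vulkan to call it.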

static VKAPI_ATTR VkBool32 VKAPI_CALL debugCallback(
    VkDebugUtilsMessageSeverityFlagBitsEXT messageSeverity,
    VkDebugUtilsMessageTypeFlagsEXT messageType,
    const VkDebugUtilsMessengerCallbackDataEXT* pCallbackData,
    void* pUserData) {

    std::cerr << "validation layer: " << pCallbackData->pMessage << std::endl;

    return VK_FALSE;
}

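The first parameter specifies the severity of the message, which is one of the following flags:

VK_DEBUG_UTILS_MESSAGE_SEVERITY_VERBOSE_BIT_EXT: Diagnostic message
VK_DEBUG_UTILS_MESSAGE_SEVERITY_INFO_BIT_EXT: Informational message like the creation of a resource
VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT: Message about behavior that is not necessarily an error, but very likely a bug in your application
VK_DEBUG_UTILS_MESSAGE_SEVERITY_ERROR_BIT_EXT: Message about behavior that is invalid and may cause crashes

The values of this enumeration are set up in such a way that you can use a comparison operation to check whether a message is equal to or worse than some level of severity, for example: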

if (messageSeverity >= VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT) {
    // Message is important enough to show
}
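The comparison works because each severity bit has a larger numeric value than the less severe ones. The sketch below illustrates this with a stand-in enum that mirrors the bit values from the Vulkan headers; MessageSeverity and shouldReport are hypothetical names used only so the snippet compiles on its own.

```cpp
#include <cstdint>

// Stand-ins mirroring the bit values of VkDebugUtilsMessageSeverityFlagBitsEXT,
// so the sketch compiles without the Vulkan headers.
enum MessageSeverity : std::uint32_t {
    SEVERITY_VERBOSE = 0x00000001,
    SEVERITY_INFO    = 0x00000010,
    SEVERITY_WARNING = 0x00000100,
    SEVERITY_ERROR   = 0x00001000,
};

// Hypothetical helper: true when a message is at least as severe as `threshold`.
bool shouldReport(MessageSeverity severity, MessageSeverity threshold) {
    return severity >= threshold;
}
```

Inside debugCallback, the same test could be used to print only warnings and errors while ignoring verbose and informational messages.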

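The messageType parameter can have the following values:

VK_DEBUG_UTILS_MESSAGE_TYPE_GENERAL_BIT_EXT: Some event has happened that is unrelated to the specification or performance
VK_DEBUG_UTILS_MESSAGE_TYPE_VALIDATION_BIT_EXT: Something has happened that violates the specification or indicates a possible mistake
VK_DEBUG_UTILS_MESSAGE_TYPE_PERFORMANCE_BIT_EXT: Potential non-optimal use of Vulkan

The pCallbackData parameter refers to a VkDebugUtilsMessengerCallbackDataEXT struct containing the details of the message itself, with the most important members being:

pMessage: The debug message as a null-terminated string
pObjects: Array of Vulkan object handles related to the message
objectCount: Number of objects in the array

Finally, the pUserData parameter contains a pointer that was specified during the setup of the callback and allows you to pass your own data to it.

The callback returns a boolean that indicates whether the Vulkan call that triggered the validation layer message should be aborted. If the callback returns true, then the call is aborted with the VK_ERROR_VALIDATION_FAILED_EXT error. This is normally only used to test the validation layers themselves, so you should always return VK_FALSE.

All that remains now is telling Vulkan about the callback function. Perhaps somewhat surprisingly, even the debug callback in Vulkan is managed with a handle that needs to be explicitly created and destroyed. Such a callback is part of a debug messenger and you can have as many of them as you want. Add a class member for this handle right under instance: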

VkDebugUtilsMessengerEXT debugMessenger;

Now add a function setupDebugMessenger to be called from initVulkan right after createInstance:

void initVulkan() {
    createInstance();
    setupDebugMessenger();
}

void setupDebugMessenger() {
    if (!enableValidationLayers) return;

}

We'll need to fill in a structure with details about the messenger and its callback:

VkDebugUtilsMessengerCreateInfoEXT createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_MESSENGER_CREATE_INFO_EXT;
createInfo.messageSeverity = VK_DEBUG_UTILS_MESSAGE_SEVERITY_VERBOSE_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_SEVERITY_ERROR_BIT_EXT;
createInfo.messageType = VK_DEBUG_UTILS_MESSAGE_TYPE_GENERAL_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_TYPE_VALIDATION_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_TYPE_PERFORMANCE_BIT_EXT;
createInfo.pfnUserCallback = debugCallback;
createInfo.pUserData = nullptr; // Optional

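The messageSeverity field allows you to specify all the types of severities you would like your callback to be called for. I've specified all types except for VK_DEBUG_UTILS_MESSAGE_SEVERITY_INFO_BIT_EXT here, to receive notifications about possible problems while leaving out verbose general debug info.

Similarly, the messageType field lets you filter which types of messages your callback is notified about. I've simply enabled all types here. You can always disable some if they're not useful to you.

Finally, the pfnUserCallback field specifies the pointer to the callback function. You can optionally pass a pointer to the pUserData field, which will be passed along to the callback function via the pUserData parameter. You could use this to pass a pointer to the HelloTriangleApplication class, for example.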
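As a rough illustration of how pUserData round-trips, the sketch below uses a simplified stand-in for the callback so that it compiles without the Vulkan headers. App and collectMessage are hypothetical names; the real debugCallback has the additional severity, type and callback-data parameters shown earlier.

```cpp
#include <string>
#include <vector>

// Hypothetical application type that wants to collect messages.
struct App {
    std::vector<std::string> messages;
};

// Simplified stand-in for debugCallback: only the message text and the
// user-data pointer are kept; the real callback has more parameters.
bool collectMessage(const char* message, void* pUserData) {
    // Recover the object that was registered when the messenger was created.
    auto* app = static_cast<App*>(pUserData);
    if (app != nullptr) {
        app->messages.emplace_back(message);
    }
    return false; // never abort the triggering call
}
```

With the real API, the equivalent setup would assign the application pointer to createInfo.pUserData and perform the same static_cast inside debugCallback.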

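Note that there are many more ways to configure validation layer messages and debug callbacks, but this is a good setup to get started with for this tutorial. See the extension specification for more information about the possibilities.

This struct should be passed to the vkCreateDebugUtilsMessengerEXT function to create the VkDebugUtilsMessengerEXT object. Unfortunately, because this function is an extension function, it is not automatically loaded. We have to look up its address ourselves using vkGetInstanceProcAddr. We're going to create our own proxy function that handles this in the background. I've added it right above the HelloTriangleApplication class definition.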

VkResult CreateDebugUtilsMessengerEXT(VkInstance instance, const VkDebugUtilsMessengerCreateInfoEXT* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkDebugUtilsMessengerEXT* pDebugMessenger) {
    auto func = (PFN_vkCreateDebugUtilsMessengerEXT) vkGetInstanceProcAddr(instance, "vkCreateDebugUtilsMessengerEXT");
    if (func != nullptr) {
        return func(instance, pCreateInfo, pAllocator, pDebugMessenger);
    } else {
        return VK_ERROR_EXTENSION_NOT_PRESENT;
    }
}

The vkGetInstanceProcAddr function returns nullptr if the function couldn't be loaded. We can now call this function to create the extension object if it's available:

if (CreateDebugUtilsMessengerEXT(instance, &createInfo, nullptr, &debugMessenger) != VK_SUCCESS) {
    throw std::runtime_error("failed to set up debug messenger!");
}

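The second-to-last parameter is again the optional allocator callback that we set to nullptr; other than that, the parameters are fairly straightforward. Since the debug messenger is specific to our Vulkan instance and its layers, the instance needs to be explicitly specified as the first argument. You will also see this pattern with other child objects later on.

The VkDebugUtilsMessengerEXT object also needs to be cleaned up with a call to vkDestroyDebugUtilsMessengerEXT. Like vkCreateDebugUtilsMessengerEXT, this function needs to be explicitly loaded.

Create another proxy function right below CreateDebugUtilsMessengerEXT: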

void DestroyDebugUtilsMessengerEXT(VkInstance instance, VkDebugUtilsMessengerEXT debugMessenger, const VkAllocationCallbacks* pAllocator) {
    auto func = (PFN_vkDestroyDebugUtilsMessengerEXT) vkGetInstanceProcAddr(instance, "vkDestroyDebugUtilsMessengerEXT");
    if (func != nullptr) {
        func(instance, debugMessenger, pAllocator);
    }
}

Make sure that this function is either a static class function or a function outside the class. We can then call it in the cleanup function:

void cleanup() {
    if (enableValidationLayers) {
        DestroyDebugUtilsMessengerEXT(instance, debugMessenger, nullptr);
    }

    vkDestroyInstance(instance, nullptr);

    glfwDestroyWindow(window);

    glfwTerminate();
}

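Debugging instance creation and destruction

Although we've now added debugging with validation layers to the program, we're not covering everything quite yet. The vkCreateDebugUtilsMessengerEXT call requires a valid instance to have been created, and vkDestroyDebugUtilsMessengerEXT must be called before the instance is destroyed. This currently leaves us unable to debug any issues in the vkCreateInstance and vkDestroyInstance calls.

However, if you closely read the extension documentation, you'll see that there is a way to create a separate debug utils messenger specifically for those two function calls. It requires you to simply pass a pointer to a VkDebugUtilsMessengerCreateInfoEXT struct in the pNext extension field of VkInstanceCreateInfo. First extract the population of the messenger create info into a separate function: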

void populateDebugMessengerCreateInfo(VkDebugUtilsMessengerCreateInfoEXT& createInfo) {
    createInfo = {};
    createInfo.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_MESSENGER_CREATE_INFO_EXT;
    createInfo.messageSeverity = VK_DEBUG_UTILS_MESSAGE_SEVERITY_VERBOSE_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_SEVERITY_ERROR_BIT_EXT;
    createInfo.messageType = VK_DEBUG_UTILS_MESSAGE_TYPE_GENERAL_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_TYPE_VALIDATION_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_TYPE_PERFORMANCE_BIT_EXT;
    createInfo.pfnUserCallback = debugCallback;
}

...

void setupDebugMessenger() {
    if (!enableValidationLayers) return;

    VkDebugUtilsMessengerCreateInfoEXT createInfo;
    populateDebugMessengerCreateInfo(createInfo);

    if (CreateDebugUtilsMessengerEXT(instance, &createInfo, nullptr, &debugMessenger) != VK_SUCCESS) {
        throw std::runtime_error("failed to set up debug messenger!");
    }
}

We can now reuse this function in createInstance:

void createInstance() {
    ...

    VkInstanceCreateInfo createInfo{};
    createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    createInfo.pApplicationInfo = &appInfo;

    ...

    VkDebugUtilsMessengerCreateInfoEXT debugCreateInfo{};
    if (enableValidationLayers) {
        createInfo.enabledLayerCount = static_cast<uint32_t>(validationLayers.size());
        createInfo.ppEnabledLayerNames = validationLayers.data();

        populateDebugMessengerCreateInfo(debugCreateInfo);
        createInfo.pNext = (VkDebugUtilsMessengerCreateInfoEXT*) &debugCreateInfo;
    } else {
        createInfo.enabledLayerCount = 0;

        createInfo.pNext = nullptr;
    }

    if (vkCreateInstance(&createInfo, nullptr, &instance) != VK_SUCCESS) {
        throw std::runtime_error("failed to create instance!");
    }
}

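The debugCreateInfo variable is placed outside the if statement to ensure that it is not destroyed before the vkCreateInstance call. By creating an additional debug messenger this way, it will automatically be used during vkCreateInstance and vkDestroyInstance and cleaned up after that.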
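To give an intuition for how an extension struct attached through pNext can be discovered, here is a small sketch of walking such a chain. BaseInStructure mirrors the layout of VkBaseInStructure so the snippet compiles without the Vulkan headers, and findInChain is a hypothetical helper; how the loader and layers actually process the chain is their own business and is not shown here.

```cpp
#include <cstdint>

// Stand-in for VkBaseInStructure: every extensible Vulkan struct starts with
// these two members, which is what makes the chain walkable.
struct BaseInStructure {
    std::uint32_t sType;
    const BaseInStructure* pNext;
};

// Walks a pNext chain and returns the first struct whose sType matches,
// or nullptr if there is none.
const BaseInStructure* findInChain(const void* pNext, std::uint32_t wanted) {
    auto* current = static_cast<const BaseInStructure*>(pNext);
    while (current != nullptr) {
        if (current->sType == wanted) {
            return current;
        }
        current = current->pNext;
    }
    return nullptr;
}
```

This is also why sType must be filled in for every struct: it is the only way a consumer of the chain can tell which struct a given link actually is.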

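Testing

Now let's intentionally make a mistake to see the validation layers in action. Temporarily remove the call to DestroyDebugUtilsMessengerEXT in the cleanup function and run your program. Once it exits, you should see a validation layer message reporting that the debug messenger was never destroyed.

If you don't see any messages then check your installation.

If you want to see which call triggered a message, you can add a breakpoint to the message callback and look at the stack trace.

Configuration

There are a lot more settings for the behavior of validation layers than just the flags specified in the VkDebugUtilsMessengerCreateInfoEXT struct. Browse to the Vulkan SDK and go to the Config directory. There you will find a vk_layer_settings.txt file that explains how to configure the layers.

To configure the layer settings for your own application, copy the file to the Debug and Release directories of your project and follow the instructions to set the desired behavior. However, for the remainder of this tutorial I'll assume that you're using the default settings.

Throughout this tutorial I'll be making a couple of intentional mistakes to show you how helpful the validation layers are with catching them and to teach you how important it is to know exactly what you're doing with Vulkan. Now it's time to look at Vulkan devices in the system.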

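C++ code

Physical devices and queue families

Selecting a physical device

After initializing the Vulkan library through a VkInstance, we need to look for and select a graphics card in the system that supports the features we need. In fact we can select any number of graphics cards and use them simultaneously, but in this tutorial we'll stick to the first graphics card that suits our needs.

We'll add a function pickPhysicalDevice and add a call to it in the initVulkan function.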

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    pickPhysicalDevice();
}

void pickPhysicalDevice() {

}

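The graphics card that we end up selecting will be stored in a VkPhysicalDevice handle that is added as a new class member. This object is implicitly destroyed when the VkInstance is destroyed, so we won't need to do anything new in the cleanup function.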

VkPhysicalDevice physicalDevice = VK_NULL_HANDLE;

Listing the graphics cards is very similar to listing extensions and starts with querying just the number.

uint32_t deviceCount = 0;
vkEnumeratePhysicalDevices(instance, &deviceCount, nullptr);

If there are 0 devices with Vulkan support then there is no point in going further.

if (deviceCount == 0) {
    throw std::runtime_error("failed to find GPUs with Vulkan support!");
}

Otherwise we can now allocate an array to hold all of the VkPhysicalDevice handles.

std::vector<VkPhysicalDevice> devices(deviceCount);
vkEnumeratePhysicalDevices(instance, &deviceCount, devices.data());
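This count-then-fill idiom recurs throughout Vulkan, so it can be convenient to wrap it once. The helper below is only a sketch under assumptions: enumerate is a hypothetical name, and it expects a callable of the shape (uint32_t* count, T* data) that you would write yourself around the real Vulkan function.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper wrapping the count-then-fill idiom. `query` is any
// callable with the shape (uint32_t* count, T* data): with data == nullptr it
// only writes the count, otherwise it fills up to *count elements.
template <typename T, typename Query>
std::vector<T> enumerate(Query query) {
    std::uint32_t count = 0;
    query(&count, static_cast<T*>(nullptr));  // first call: ask for the count
    std::vector<T> items(count);
    if (count > 0) {
        query(&count, items.data());          // second call: fetch the data
        items.resize(count);                  // count may have shrunk
    }
    return items;
}
```

For physical devices, the callable could be a lambda that captures instance and forwards its two arguments to vkEnumeratePhysicalDevices.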

Now we need to evaluate each of them and check whether they are suitable for the operations we want to perform, because not all graphics cards are created equal. For that we'll introduce a new function:

bool isDeviceSuitable(VkPhysicalDevice device) {
    return true;
}

And we'll check whether any of the physical devices meet the requirements that we'll add to that function.

for (const auto& device : devices) {
    if (isDeviceSuitable(device)) {
        physicalDevice = device;
        break;
    }
}

if (physicalDevice == VK_NULL_HANDLE) {
    throw std::runtime_error("failed to find a suitable GPU!");
}

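The next section will introduce the first requirements that we'll check for in the isDeviceSuitable function. As we start using more Vulkan features in the later chapters, we will extend this function with more checks.

Base device suitability checks

To evaluate the suitability of a device we can start by querying for some details. Basic device properties like the name, type and supported Vulkan version can be queried using vkGetPhysicalDeviceProperties.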

VkPhysicalDeviceProperties deviceProperties;
vkGetPhysicalDeviceProperties(device, &deviceProperties);

The support for optional features like texture compression, 64-bit floats and multi-viewport rendering (useful for VR) can be queried using vkGetPhysicalDeviceFeatures:

VkPhysicalDeviceFeatures deviceFeatures;
vkGetPhysicalDeviceFeatures(device, &deviceFeatures);

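There are more details that can be queried from devices, concerning device memory and queue families, which we'll discuss in the next section.

As an example, let's say we consider our application only usable on dedicated graphics cards that support geometry shaders. Then the isDeviceSuitable function would look like this: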

bool isDeviceSuitable(VkPhysicalDevice device) {
    VkPhysicalDeviceProperties deviceProperties;
    VkPhysicalDeviceFeatures deviceFeatures;
    vkGetPhysicalDeviceProperties(device, &deviceProperties);
    vkGetPhysicalDeviceFeatures(device, &deviceFeatures);

    return deviceProperties.deviceType == VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU &&
           deviceFeatures.geometryShader;
}

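Instead of just checking whether a device is suitable and going with the first one, you could also give each device a score and pick the highest one. That way you could favor a dedicated graphics card by giving it a higher score, but fall back to an integrated GPU if that's the only one available. You could implement something like that as follows: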

#include <map>

...

void pickPhysicalDevice() {
    ...

    // Use an ordered map to automatically sort candidates by increasing score
    std::multimap<int, VkPhysicalDevice> candidates;

    for (const auto& device : devices) {
        int score = rateDeviceSuitability(device);
        candidates.insert(std::make_pair(score, device));
    }

    // Check if the best candidate is suitable at all
    if (candidates.rbegin()->first > 0) {
        physicalDevice = candidates.rbegin()->second;
    } else {
        throw std::runtime_error("failed to find a suitable GPU!");
    }
}

int rateDeviceSuitability(VkPhysicalDevice device) {
    ...

    int score = 0;

    // Discrete GPUs have a significant performance advantage
    if (deviceProperties.deviceType == VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU) {
        score += 1000;
    }

    // Maximum possible size of textures affects graphics quality
    score += deviceProperties.limits.maxImageDimension2D;

    // Application can't function without geometry shaders
    if (!deviceFeatures.geometryShader) {
        return 0;
    }

    return score;
}

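You don't need to implement all of that for this tutorial, but it's meant to give you an idea of how you could design your device selection process. Of course you can also just display the names of the choices and allow the user to select.

Because we're just starting out, Vulkan support is the only thing we need, and therefore we'll settle for just any GPU: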

bool isDeviceSuitable(VkPhysicalDevice device) {
    return true;
}

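In the next section we'll discuss the first real required feature to check for.

Queue families

It has been briefly touched upon before that almost every operation in Vulkan, anything from drawing to uploading textures, requires commands to be submitted to a queue. There are different types of queues that originate from different queue families, and each family of queues allows only a subset of commands. For example, there could be a queue family that only allows processing of compute commands or one that only allows memory transfer related commands.

We need to check which queue families are supported by the device and which one of these supports the commands that we want to use. For that purpose we'll add a new function findQueueFamilies that looks for all the queue families we need.

Right now we are only going to look for a queue that supports graphics commands, so the function could look like this: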

uint32_t findQueueFamilies(VkPhysicalDevice device) {
    // Logic to find graphics queue family
}

However, in one of the next chapters we're already going to look for yet another queue, so it's better to prepare for that and bundle the indices into a struct:

struct QueueFamilyIndices {
    uint32_t graphicsFamily;
};

QueueFamilyIndices findQueueFamilies(VkPhysicalDevice device) {
    QueueFamilyIndices indices;
    // Logic to find queue family indices to populate struct with
    return indices;
}

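But what if a queue family is not available? We could throw an exception in findQueueFamilies, but this function is not really the right place to make decisions about device suitability. For example, we may prefer devices with a dedicated transfer queue family, but not require it. Therefore we need some way of indicating whether a particular queue family was found.

It's not really possible to use a magic value to indicate the nonexistence of a queue family, since any value of uint32_t could in theory be a valid queue family index, including 0. Luckily C++17 introduced a data structure to distinguish between the case of a value existing or not: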

#include <optional>

...

std::optional<uint32_t> graphicsFamily;

std::cout << std::boolalpha << graphicsFamily.has_value() << std::endl; // false

graphicsFamily = 0;

std::cout << std::boolalpha << graphicsFamily.has_value() << std::endl; // true

std::optional is a wrapper that contains no value until you assign something to it. At any point you can query whether it contains a value by calling its has_value() member function. That means that we can change the logic to:

#include <optional>

...

struct QueueFamilyIndices {
    std::optional<uint32_t> graphicsFamily;
};

QueueFamilyIndices findQueueFamilies(VkPhysicalDevice device) {
    QueueFamilyIndices indices;
    // Assign index to queue families that could be found
    return indices;
}

We can now begin to actually implement findQueueFamilies:

QueueFamilyIndices findQueueFamilies(VkPhysicalDevice device) {
    QueueFamilyIndices indices;

    ...

    return indices;
}

The process of retrieving the list of queue families is exactly what you'd expect and uses vkGetPhysicalDeviceQueueFamilyProperties:

uint32_t queueFamilyCount = 0;
vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, nullptr);

std::vector<VkQueueFamilyProperties> queueFamilies(queueFamilyCount);
vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, queueFamilies.data());

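The VkQueueFamilyProperties struct contains some details about the queue family, including the type of operations that are supported and the number of queues that can be created based on that family. We need to find at least one queue family that supports VK_QUEUE_GRAPHICS_BIT.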

int i = 0;
for (const auto& queueFamily : queueFamilies) {
    if (queueFamily.queueFlags & VK_QUEUE_GRAPHICS_BIT) {
        indices.graphicsFamily = i;
    }

    i++;
}

Now that we have this fancy queue family lookup function, we can use it as a check in the isDeviceSuitable function to ensure that the device can process the commands we want to use:

bool isDeviceSuitable(VkPhysicalDevice device) {
    QueueFamilyIndices indices = findQueueFamilies(device);

    return indices.graphicsFamily.has_value();
}

To make this a little more convenient, we'll also add a generic check to the struct itself:

struct QueueFamilyIndices {
    std::optional<uint32_t> graphicsFamily;

    bool isComplete() {
        return graphicsFamily.has_value();
    }
};

...

bool isDeviceSuitable(VkPhysicalDevice device) {
    QueueFamilyIndices indices = findQueueFamilies(device);

    return indices.isComplete();
}

We can now also use this for an early exit from findQueueFamilies:

for (const auto& queueFamily : queueFamilies) {
    ...

    if (indices.isComplete()) {
        break;
    }

    i++;
}

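Great, that's all we need for now to find the right physical device! The next step is to create a logical device to interface with it.

C++ code

Logical device and queues

Introduction

After selecting a physical device to use, we need to set up a logical device to interface with it. The logical device creation process is similar to the instance creation process and describes the features we want to use. We also need to specify which queues to create now that we've queried which queue families are available. You can even create multiple logical devices from the same physical device if you have varying requirements.

Start by adding a new class member to store the logical device handle in.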

VkDevice device;

Next, add a createLogicalDevice function that is called from initVulkan.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    pickPhysicalDevice();
    createLogicalDevice();
}

void createLogicalDevice() {

}

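Specifying the queues to be created

The creation of a logical device involves specifying a bunch of details in structs again, of which the first one is VkDeviceQueueCreateInfo. This structure describes the number of queues we want for a single queue family. Right now we're only interested in a queue with graphics capabilities.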

QueueFamilyIndices indices = findQueueFamilies(physicalDevice);

VkDeviceQueueCreateInfo queueCreateInfo{};
queueCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queueCreateInfo.queueFamilyIndex = indices.graphicsFamily.value();
queueCreateInfo.queueCount = 1;

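The currently available drivers only allow you to create a small number of queues for each queue family, and you don't really need more than one. That's because you can create all of the command buffers on multiple threads and then submit them all at once on the main thread with a single low-overhead call.

Vulkan lets you assign priorities to queues to influence the scheduling of command buffer execution, using floating point numbers between 0.0 and 1.0. This is required even if there is only a single queue: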

float queuePriority = 1.0f;
queueCreateInfo.pQueuePriorities = &queuePriority;

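Specifying used device features

The next information to specify is the set of device features that we'll be using. These are the features that we queried support for with vkGetPhysicalDeviceFeatures in the previous chapter, like geometry shaders. Right now we don't need anything special, so we can simply define it and leave everything set to VK_FALSE. We'll come back to this structure once we're about to start doing more interesting things with Vulkan.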

VkPhysicalDeviceFeatures deviceFeatures{};

Creating the logical device

With the previous two structures in place, we can start filling in the main VkDeviceCreateInfo structure.

VkDeviceCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;

First add pointers to the queue creation info and device features structs:

createInfo.pQueueCreateInfos = &queueCreateInfo;
createInfo.queueCreateInfoCount = 1;

createInfo.pEnabledFeatures = &deviceFeatures;

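The remainder of the information bears a resemblance to the VkInstanceCreateInfo struct and requires you to specify extensions and validation layers. The difference is that these are device specific this time.

An example of a device specific extension is VK_KHR_swapchain, which allows you to present rendered images from that device to windows. It is possible that there are Vulkan devices in the system that lack this ability, for example because they only support compute operations. We will come back to this extension in the swap chain chapter.

Previous implementations of Vulkan made a distinction between instance and device specific validation layers, but this is no longer the case. That means that the enabledLayerCount and ppEnabledLayerNames fields of VkDeviceCreateInfo are ignored by up-to-date implementations. However, it is still a good idea to set them anyway to be compatible with older implementations: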

createInfo.enabledExtensionCount = 0;

if (enableValidationLayers) {
    createInfo.enabledLayerCount = static_cast<uint32_t>(validationLayers.size());
    createInfo.ppEnabledLayerNames = validationLayers.data();
} else {
    createInfo.enabledLayerCount = 0;
}

We won't need any device specific extensions for now.

That's it, we're now ready to instantiate the logical device with a call to the appropriately named vkCreateDevice function.

if (vkCreateDevice(physicalDevice, &createInfo, nullptr, &device) != VK_SUCCESS) {
    throw std::runtime_error("failed to create logical device!");
}

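The parameters are the physical device to interface with, the queue and usage info we just specified, the optional allocation callbacks pointer and a pointer to a variable to store the logical device handle in. Similarly to the instance creation function, this call can return errors based on enabling non-existent extensions or specifying the desired usage of unsupported features.

The device should be destroyed in cleanup with the vkDestroyDevice function: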

void cleanup() {
    vkDestroyDevice(device, nullptr);
    ...
}

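Logical devices don't interact directly with instances, which is why the instance is not included as a parameter.

Retrieving queue handles

The queues are automatically created along with the logical device, but we don't have a handle to interface with them yet. First add a class member to store a handle to the graphics queue: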

VkQueue graphicsQueue;

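Device queues are implicitly cleaned up when the device is destroyed, so we don't need to do anything in cleanup.

We can use the vkGetDeviceQueue function to retrieve queue handles for each queue family. The parameters are the logical device, queue family, queue index and a pointer to the variable to store the queue handle in. Because we're only creating a single queue from this family, we'll simply use index 0.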

vkGetDeviceQueue(device, indices.graphicsFamily.value(), 0, &graphicsQueue);

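With the logical device and queue handles we can now actually start using the graphics card to do things! In the next few chapters we'll set up the resources to present results to the window system.

C++ code

Presentation

Window surface

Since Vulkan is a platform-agnostic API, it cannot interface directly with the window system on its own. To establish the connection between Vulkan and the window system to present results to the screen, we need to use the WSI (Window System Integration) extensions. In this chapter we'll discuss the first one, which is VK_KHR_surface. It exposes a VkSurfaceKHR object that represents an abstract type of surface to present rendered images to. The surface in our program will be backed by the window that we've already opened with GLFW.

The VK_KHR_surface extension is an instance level extension and we've actually already enabled it, because it's included in the list returned by glfwGetRequiredInstanceExtensions. The list also includes some other WSI extensions that we'll use in the next couple of chapters.

The window surface needs to be created right after the instance creation, because it can actually influence the physical device selection. The reason we postponed this is that window surfaces are part of the larger topic of render targets and presentation, for which the explanation would have cluttered the basic setup. It should also be noted that window surfaces are an entirely optional component in Vulkan: if you just need off-screen rendering, you don't need one. Vulkan allows you to do that without hacks like creating an invisible window (as was necessary for OpenGL).

Window surface creation

Start by adding a surface class member right below the debug callback.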

VkSurfaceKHR surface;

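Although the VkSurfaceKHR object and its usage are platform agnostic, its creation isn't, because it depends on window system details. For example, it needs the HWND and HMODULE handles on Windows. Therefore there is a platform-specific addition to the extension, which on Windows is called VK_KHR_win32_surface and is also automatically included in the list from glfwGetRequiredInstanceExtensions.

I will demonstrate how this platform-specific extension can be used to create a surface on Windows, but we won't actually use it in this tutorial. It doesn't make any sense to use a library like GLFW and then proceed to use platform-specific code anyway. GLFW actually has glfwCreateWindowSurface, which handles the platform differences for us. Still, it's good to see what it does behind the scenes before we start relying on it.

To access native platform functions, you need to update the includes at the top: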

#define VK_USE_PLATFORM_WIN32_KHR
#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>
#define GLFW_EXPOSE_NATIVE_WIN32
#include <GLFW/glfw3native.h>

Because a window surface is a Vulkan object, it comes with a VkWin32SurfaceCreateInfoKHR struct that needs to be filled in. It has two important parameters: hwnd and hinstance. These are the handles to the window and the process.

VkWin32SurfaceCreateInfoKHR createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_WIN32_SURFACE_CREATE_INFO_KHR;
createInfo.hwnd = glfwGetWin32Window(window);
createInfo.hinstance = GetModuleHandle(nullptr);

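The glfwGetWin32Window function is used to get the raw HWND from the GLFW window object. The GetModuleHandle call returns the HINSTANCE handle of the current process.

After that the surface can be created with vkCreateWin32SurfaceKHR, which includes a parameter for the instance, surface creation details, custom allocators and the variable for the surface handle to be stored in. Technically this is a WSI extension function, but it is so commonly used that the standard Vulkan loader includes it, so unlike other extensions you don't need to explicitly load it.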

if (vkCreateWin32SurfaceKHR(instance, &createInfo, nullptr, &surface) != VK_SUCCESS) {
    throw std::runtime_error("failed to create window surface!");
}

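The process is similar for other platforms like Linux, where vkCreateXcbSurfaceKHR takes an XCB connection and window as creation details with X11.

The glfwCreateWindowSurface function performs exactly this operation with a different implementation for each platform. We'll now integrate it into our program. Add a function createSurface to be called from initVulkan right after instance creation and setupDebugMessenger.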

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
}

void createSurface() {

}

The GLFW call takes simple parameters instead of a struct, which makes the implementation of the function very straightforward:

void createSurface() {
    if (glfwCreateWindowSurface(instance, window, nullptr, &surface) != VK_SUCCESS) {
        throw std::runtime_error("failed to create window surface!");
    }
}

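The parameters are the VkInstance, GLFW window pointer, custom allocators and a pointer to the VkSurfaceKHR variable. It simply passes through the VkResult from the relevant platform call. GLFW doesn't offer a special function for destroying a surface, but that can easily be done through the original API: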

void cleanup() {
    ...
    vkDestroySurfaceKHR(instance, surface, nullptr);
    vkDestroyInstance(instance, nullptr);
    ...
}

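Make sure that the surface is destroyed before the instance.

Querying for presentation support

Although the Vulkan implementation may support window system integration, that does not mean that every device in the system supports it. Therefore we need to extend isDeviceSuitable to ensure that a device can present images to the surface we created. Since presentation is a queue-specific feature, the problem is actually about finding a queue family that supports presenting to the surface we created.

It's actually possible that the queue families supporting drawing commands and the ones supporting presentation do not overlap. Therefore we have to take into account that there could be a distinct presentation queue by modifying the QueueFamilyIndices structure: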

struct QueueFamilyIndices {
    std::optional<uint32_t> graphicsFamily;
    std::optional<uint32_t> presentFamily;

    bool isComplete() {
        return graphicsFamily.has_value() && presentFamily.has_value();
    }
};

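Next, we'll modify the findQueueFamilies function to look for a queue family that has the capability of presenting to our window surface. The function to check for that is vkGetPhysicalDeviceSurfaceSupportKHR, which takes the physical device, queue family index and surface as parameters. Add a call to it in the same loop as the VK_QUEUE_GRAPHICS_BIT check: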

VkBool32 presentSupport = false;
vkGetPhysicalDeviceSurfaceSupportKHR(device, i, surface, &presentSupport);

Then check the value of the boolean and store the presentation family queue index:

if (presentSupport) {
    indices.presentFamily = i;
}

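It's very likely that these end up being the same queue family after all, but throughout the program we will treat them as if they were separate queues for a uniform approach. Nevertheless, you could add logic to explicitly prefer a physical device that supports drawing and presentation in the same queue for improved performance.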
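The preference mentioned above can be sketched without touching the GPU. The snippet below is a minimal illustration, not the tutorial's code: FamilyCaps and pickQueueFamilies are hypothetical names, and the two booleans stand in for the results of the VK_QUEUE_GRAPHICS_BIT check and vkGetPhysicalDeviceSurfaceSupportKHR for each family. It picks a family that supports both when one exists and otherwise falls back to the first match of each kind.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the per-family query results.
struct FamilyCaps {
    bool graphics;  // family advertises VK_QUEUE_GRAPHICS_BIT
    bool present;   // vkGetPhysicalDeviceSurfaceSupportKHR reported support
};

struct QueueFamilyIndices {
    std::optional<uint32_t> graphicsFamily;
    std::optional<uint32_t> presentFamily;

    bool isComplete() const {
        return graphicsFamily.has_value() && presentFamily.has_value();
    }
};

// Prefers a single family that can both draw and present; otherwise keeps
// the first graphics-capable and the first present-capable family it saw.
QueueFamilyIndices pickQueueFamilies(const std::vector<FamilyCaps>& families) {
    QueueFamilyIndices indices;
    for (uint32_t i = 0; i < families.size(); i++) {
        if (families[i].graphics && families[i].present) {
            indices.graphicsFamily = i;
            indices.presentFamily = i;
            return indices;
        }
        if (families[i].graphics && !indices.graphicsFamily.has_value()) {
            indices.graphicsFamily = i;
        }
        if (families[i].present && !indices.presentFamily.has_value()) {
            indices.presentFamily = i;
        }
    }
    return indices;
}
```

A device whose result has graphicsFamily equal to presentFamily could then be ranked above one where the two differ.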

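Creating the presentation queue

The one thing that remains is modifying the logical device creation procedure to create the presentation queue and retrieve the VkQueue handle. Add a member variable for the handle: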

VkQueue presentQueue;

Next, we need multiple VkDeviceQueueCreateInfo structs to create a queue from both families. An elegant way to do that is to create a set of all unique queue families that are necessary for the required queues:

#include <set>

...

QueueFamilyIndices indices = findQueueFamilies(physicalDevice);

std::vector<VkDeviceQueueCreateInfo> queueCreateInfos;
std::set<uint32_t> uniqueQueueFamilies = {indices.graphicsFamily.value(), indices.presentFamily.value()};

float queuePriority = 1.0f;
for (uint32_t queueFamily : uniqueQueueFamilies) {
    VkDeviceQueueCreateInfo queueCreateInfo{};
    queueCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
    queueCreateInfo.queueFamilyIndex = queueFamily;
    queueCreateInfo.queueCount = 1;
    queueCreateInfo.pQueuePriorities = &queuePriority;
    queueCreateInfos.push_back(queueCreateInfo);
}
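To see why a set is used here, the following small sketch isolates the deduplication step. uniqueFamilies is a hypothetical helper, not part of the tutorial; it only shows how many create-info structs the loop would produce for a given pair of indices.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: returns the distinct family indices in ascending
// order. Its size equals the number of VkDeviceQueueCreateInfo structs
// that the loop over the set would build.
std::vector<uint32_t> uniqueFamilies(uint32_t graphicsFamily, uint32_t presentFamily) {
    std::set<uint32_t> unique = {graphicsFamily, presentFamily};
    return std::vector<uint32_t>(unique.begin(), unique.end());
}
```

When both indices are equal the set collapses to one element, so the same family is never requested twice.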

And modify VkDeviceCreateInfo to point to the vector:

createInfo.queueCreateInfoCount = static_cast<uint32_t>(queueCreateInfos.size());
createInfo.pQueueCreateInfos = queueCreateInfos.data();

If the queue families are the same, then we only need to pass its index once. Finally, add a call to retrieve the queue handle:

vkGetDeviceQueue(device, indices.presentFamily.value(), 0, &presentQueue);

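In case the queue families are the same, the two handles will most likely have the same value now. In the next chapter we're going to look at swap chains and how they give us the ability to present images to the surface.

C++ code

Swap chain

Vulkan does not have the concept of a "default framebuffer", hence it requires an infrastructure that will own the buffers we render to before we visualize them on the screen. This infrastructure is known as the swap chain and must be created explicitly in Vulkan. The swap chain is essentially a queue of images that are waiting to be presented to the screen. Our application acquires such an image to draw to it, and then returns it to the queue. How exactly the queue works and the conditions for presenting an image from the queue depend on how the swap chain is set up, but the general purpose of the swap chain is to synchronize the presentation of images with the refresh rate of the screen.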

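Checking for swap chain support

Not all graphics cards are capable of presenting images directly to a screen, for example because they are designed for servers and don't have any display outputs. In addition, since image presentation is heavily tied to the window system and the surfaces associated with windows, it is not part of the Vulkan core. You have to enable the VK_KHR_swapchain device extension after querying for its support.

For that purpose we'll first extend the isDeviceSuitable function to check whether this extension is supported. We've previously seen how to list the extensions supported by a VkPhysicalDevice, so doing that should be fairly straightforward. The Vulkan header file provides a handy macro, VK_KHR_SWAPCHAIN_EXTENSION_NAME, that is defined as VK_KHR_swapchain. The advantage of using this macro is that the compiler will catch misspellings.

First declare a list of required device extensions, similar to the list of validation layers to enable.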

const std::vector<const char*> deviceExtensions = {
    VK_KHR_SWAPCHAIN_EXTENSION_NAME
};

Next, create a new function checkDeviceExtensionSupport that is called from isDeviceSuitable as an additional check:

bool isDeviceSuitable(VkPhysicalDevice device) {
    QueueFamilyIndices indices = findQueueFamilies(device);

    bool extensionsSupported = checkDeviceExtensionSupport(device);

    return indices.isComplete() && extensionsSupported;
}

bool checkDeviceExtensionSupport(VkPhysicalDevice device) {
    return true;
}

Modify the body of the function to enumerate the extensions and check whether all of the required extensions are among them.

bool checkDeviceExtensionSupport(VkPhysicalDevice device) {
    uint32_t extensionCount;
    vkEnumerateDeviceExtensionProperties(device, nullptr, &extensionCount, nullptr);

    std::vector<VkExtensionProperties> availableExtensions(extensionCount);
    vkEnumerateDeviceExtensionProperties(device, nullptr, &extensionCount, availableExtensions.data());

    std::set<std::string> requiredExtensions(deviceExtensions.begin(), deviceExtensions.end());

    for (const auto& extension : availableExtensions) {
        requiredExtensions.erase(extension.extensionName);
    }

    return requiredExtensions.empty();
}

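I've chosen to use a set of strings here to represent the unconfirmed required extensions. That way we can easily tick them off while enumerating the sequence of available extensions. Of course you can also use a nested loop like in checkValidationLayerSupport. The performance difference is irrelevant. Now run the code and verify that your graphics card is indeed capable of creating a swap chain. It should be noted that the availability of a presentation queue, as we checked in the previous chapter, implies that the swap chain extension must be supported. However, it's still good to be explicit about things, and the extension does have to be explicitly enabled.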
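The "tick them off" idea can be tried in isolation with plain strings. The sketch below is an illustration under assumptions: missingExtensions is a hypothetical helper, and the available names are passed in as strings instead of being read from VkExtensionProperties. Returning the leftover set rather than a bool also makes it easy to report which extension is missing.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the required names that were NOT found among the available ones.
// An empty result means every required extension is supported.
std::set<std::string> missingExtensions(const std::vector<const char*>& required,
                                        const std::vector<std::string>& available) {
    std::set<std::string> missing(required.begin(), required.end());
    for (const auto& name : available) {
        missing.erase(name);  // erasing a name that is not in the set is a no-op
    }
    return missing;
}
```

checkDeviceExtensionSupport corresponds to testing whether the returned set is empty.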

Enabling device extensions

Using a swap chain requires enabling the VK_KHR_swapchain extension first. Enabling the extension only requires a small change to the logical device creation structure:

createInfo.enabledExtensionCount = static_cast<uint32_t>(deviceExtensions.size());
createInfo.ppEnabledExtensionNames = deviceExtensions.data();

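Make sure to replace the existing line createInfo.enabledExtensionCount = 0; when you do so.

Querying details of swap chain support

Just checking whether a swap chain is available is not sufficient, because it may not actually be compatible with our window surface. Creating a swap chain also involves a lot more settings than instance and device creation, so we need to query for some more details before we're able to proceed.

There are basically three kinds of properties we need to check:

Basic surface capabilities (min/max number of images in the swap chain, min/max width and height of images)
Surface formats (pixel format, color space)
Available presentation modes

Similar to findQueueFamilies, we'll use a struct to pass these details around once they've been queried. The three aforementioned types of properties come in the form of the following structs and lists of structs: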

struct SwapChainSupportDetails {
    VkSurfaceCapabilitiesKHR capabilities;
    std::vector<VkSurfaceFormatKHR> formats;
    std::vector<VkPresentModeKHR> presentModes;
};

We'll now create a new function querySwapChainSupport that will populate this struct.

SwapChainSupportDetails querySwapChainSupport(VkPhysicalDevice device) {
    SwapChainSupportDetails details;

    return details;
}

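This section covers how to query the structs that include this information. The meaning of these structs and exactly which data they contain is discussed in the next section.

Let's start with the basic surface capabilities. These properties are simple to query and are returned into a single VkSurfaceCapabilitiesKHR struct.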

vkGetPhysicalDeviceSurfaceCapabilitiesKHR(device, surface, &details.capabilities);

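This function takes the specified VkPhysicalDevice and VkSurfaceKHR window surface into account when determining the supported capabilities. All of the support querying functions have these two as their first parameters because they are the core components of the swap chain.

The next step is querying the supported surface formats. Because this is a list of structs, it follows the familiar ritual of two function calls: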

uint32_t formatCount;
vkGetPhysicalDeviceSurfaceFormatsKHR(device, surface, &formatCount, nullptr);

if (formatCount != 0) {
    details.formats.resize(formatCount);
    vkGetPhysicalDeviceSurfaceFormatsKHR(device, surface, &formatCount, details.formats.data());
}

Make sure that the vector is resized to hold all the available formats. Finally, querying the supported presentation modes works exactly the same way with vkGetPhysicalDeviceSurfacePresentModesKHR:

uint32_t presentModeCount;
vkGetPhysicalDeviceSurfacePresentModesKHR(device, surface, &presentModeCount, nullptr);

if (presentModeCount != 0) {
    details.presentModes.resize(presentModeCount);
    vkGetPhysicalDeviceSurfacePresentModesKHR(device, surface, &presentModeCount, details.presentModes.data());
}
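The two-call ritual used for both formats and presentation modes can be captured once in a small helper. This is a sketch under assumptions: enumerateAll and fakeQuery are hypothetical names, and fakeQuery merely imitates a driver so the pattern can run without Vulkan. The callable is expected to write the element count when the data pointer is null and to fill the buffer otherwise.

```cpp
#include <cstdint>
#include <vector>

// Generic form of the "count, then fill" idiom. `query` is any callable with
// the shape (uint32_t* count, T* data).
template <typename T, typename Query>
std::vector<T> enumerateAll(Query query) {
    uint32_t count = 0;
    query(&count, static_cast<T*>(nullptr));  // first call: ask only for the count
    std::vector<T> items(count);
    if (count != 0) {
        query(&count, items.data());          // second call: fill the buffer
        items.resize(count);                  // keep only what was actually written
    }
    return items;
}

// Hypothetical stand-in for a driver that reports three values.
inline void fakeQuery(uint32_t* count, int* data) {
    const int values[] = {10, 20, 30};
    if (data == nullptr) {
        *count = 3;
        return;
    }
    uint32_t written = 0;
    for (; written < *count && written < 3; written++) {
        data[written] = values[written];
    }
    *count = written;
}
```

With the real API, the callable would be a lambda that forwards to a function such as vkGetPhysicalDeviceSurfaceFormatsKHR with the device and surface captured.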

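All of the details are in the struct now, so let's extend isDeviceSuitable once more to use this function to verify that swap chain support is adequate. Swap chain support is sufficient for this tutorial if there is at least one supported image format and one supported presentation mode given the window surface we have.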

bool swapChainAdequate = false;
if (extensionsSupported) {
    SwapChainSupportDetails swapChainSupport = querySwapChainSupport(device);
    swapChainAdequate = !swapChainSupport.formats.empty() && !swapChainSupport.presentModes.empty();
}

It is important that we only try to query for swap chain support after verifying that the extension is available. The last line of the function changes to:

return indices.isComplete() && extensionsSupported && swapChainAdequate;

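Choosing the right settings for the swap chain

If the swapChainAdequate condition is met, support is definitely sufficient, but there may still be many different modes of varying optimality. We'll now write a couple of functions to find the right settings for the best possible swap chain. There are three types of settings to determine:

Surface format (color depth)
Presentation mode (conditions for "swapping" images to the screen)
Swap extent (resolution of images in the swap chain)

For each of these settings we'll have an ideal value in mind that we'll go with if it's available, and otherwise we'll create some logic to find the next best thing.

Surface format

The function for this setting starts out like this. We'll later pass the formats member of the SwapChainSupportDetails struct as argument.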

VkSurfaceFormatKHR chooseSwapSurfaceFormat(const std::vector<VkSurfaceFormatKHR>& availableFormats) {

}

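Each VkSurfaceFormatKHR entry contains a format and a colorSpace member. The format member specifies the color channels and types. For example, VK_FORMAT_B8G8R8A8_SRGB means that we store the B, G, R, and alpha channels in that order, each as an 8-bit unsigned integer, for a total of 32 bits per pixel. The colorSpace member indicates whether the sRGB color space is supported, using the VK_COLOR_SPACE_SRGB_NONLINEAR_KHR flag. Note that this flag used to be called VK_COLORSPACE_SRGB_NONLINEAR_KHR in old versions of the specification.

For the color space we'll use sRGB if it is available, because it results in more accurate perceived colors. It is also pretty much the standard color space for images, like the textures we'll use later on. Because of that we should also use an sRGB color format, of which one of the most common ones is VK_FORMAT_B8G8R8A8_SRGB.

Let's go through the list and see if the preferred combination is available: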

for (const auto& availableFormat : availableFormats) {
    if (availableFormat.format == VK_FORMAT_B8G8R8A8_SRGB && availableFormat.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR) {
        return availableFormat;
    }
}

If that also fails, we could start ranking the available formats based on how "good" they are, but in most cases it's okay to just settle for the first format that is specified.

VkSurfaceFormatKHR chooseSwapSurfaceFormat(const std::vector<VkSurfaceFormatKHR>& availableFormats) {
    for (const auto& availableFormat : availableFormats) {
        if (availableFormat.format == VK_FORMAT_B8G8R8A8_SRGB && availableFormat.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR) {
            return availableFormat;
        }
    }

    return availableFormats[0];
}

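Presentation mode

The presentation mode is arguably the most important setting for the swap chain, because it represents the actual conditions for showing images to the screen. There are four possible modes available in Vulkan:

VK_PRESENT_MODE_IMMEDIATE_KHR: Images submitted by your application are transferred to the screen right away, which may result in tearing.
VK_PRESENT_MODE_FIFO_KHR: The swap chain is a queue where the display takes an image from the front of the queue when the display is refreshed, and the program inserts rendered images at the back of the queue. If the queue is full then the program has to wait. This is most similar to vertical sync as found in modern games. The moment that the display is refreshed is known as "vertical blank".
VK_PRESENT_MODE_FIFO_RELAXED_KHR: This mode only differs from the previous one if the application is late and the queue was empty at the last vertical blank. Instead of waiting for the next vertical blank, the image is transferred right away when it finally arrives. This may result in visible tearing.
VK_PRESENT_MODE_MAILBOX_KHR: This is another variation of the second mode. Instead of blocking the application when the queue is full, the images that are already queued are simply replaced with the newer ones. This mode can be used to render frames as fast as possible while still avoiding tearing, resulting in fewer latency issues than standard vertical sync. This is commonly known as "triple buffering", although the existence of three buffers alone does not necessarily mean that the framerate is unlocked.

Only the VK_PRESENT_MODE_FIFO_KHR mode is guaranteed to be available, so we'll again have to write a function that looks for the best mode that is available: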

VkPresentModeKHR chooseSwapPresentMode(const std::vector<VkPresentModeKHR>& availablePresentModes) {
    return VK_PRESENT_MODE_FIFO_KHR;
}

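I personally think that VK_PRESENT_MODE_MAILBOX_KHR is a very nice trade-off if energy usage is not a concern. It allows us to avoid tearing while still maintaining fairly low latency by rendering new images that are as up to date as possible right until the vertical blank. On mobile devices, where energy usage is more important, you will probably want to use VK_PRESENT_MODE_FIFO_KHR instead. Now, let's look through the list to see if VK_PRESENT_MODE_MAILBOX_KHR is available: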

VkPresentModeKHR chooseSwapPresentMode(const std::vector<VkPresentModeKHR>& availablePresentModes) {
    for (const auto& availablePresentMode : availablePresentModes) {
        if (availablePresentMode == VK_PRESENT_MODE_MAILBOX_KHR) {
            return availablePresentMode;
        }
    }

    return VK_PRESENT_MODE_FIFO_KHR;
}
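If more than one fallback is wanted, the same search generalizes to an ordered preference list. The following is a hedged sketch rather than the tutorial's function: PresentMode is a hypothetical enum standing in for the VkPresentModeKHR values, and choosePresentMode is an illustrative name. The final fallback stays FIFO because that is the only mode guaranteed to exist.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins for the VkPresentModeKHR values discussed above.
enum class PresentMode { Immediate, Mailbox, Fifo, FifoRelaxed };

// Walks an ordered preference list and returns the first mode that the
// surface offers. Falls back to Fifo when none of the preferred modes exist.
PresentMode choosePresentMode(const std::vector<PresentMode>& available,
                              const std::vector<PresentMode>& preferred) {
    for (PresentMode wanted : preferred) {
        if (std::find(available.begin(), available.end(), wanted) != available.end()) {
            return wanted;
        }
    }
    return PresentMode::Fifo;
}
```

A desktop build might pass {Mailbox, Immediate} as the preference list, while a mobile build could pass an empty list to always end up with FIFO.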

Swap extent

That leaves only one major property, for which we'll add one last function:

VkExtent2D chooseSwapExtent(const VkSurfaceCapabilitiesKHR& capabilities) {

}

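The swap extent is the resolution of the swap chain images, and it's almost always exactly equal to the resolution of the window that we're drawing to, in pixels (more on that in a moment). The range of possible resolutions is defined in the VkSurfaceCapabilitiesKHR structure. Vulkan tells us to match the resolution of the window by setting the width and height in the currentExtent member. However, some window managers do allow us to differ here, and this is indicated by setting the width and height in currentExtent to a special value: the maximum value of uint32_t. In that case we'll pick the resolution that best matches the window within the minImageExtent and maxImageExtent bounds. But we must specify the resolution in the correct unit.

GLFW uses two units when measuring sizes: pixels and screen coordinates. For example, the resolution {WIDTH, HEIGHT} that we specified earlier when creating the window is measured in screen coordinates. But Vulkan works with pixels, so the swap chain extent must be specified in pixels as well. Unfortunately, if you are using a high-DPI display (like Apple's Retina display), screen coordinates don't correspond to pixels. Instead, due to the higher pixel density, the resolution of the window in pixels will be larger than the resolution in screen coordinates. So if Vulkan doesn't fix the swap extent for us, we can't just use the original {WIDTH, HEIGHT}. Instead, we must use glfwGetFramebufferSize to query the resolution of the window in pixels before matching it against the minimum and maximum image extent.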

#include <cstdint> // Necessary for uint32_t
#include <limits> // Necessary for std::numeric_limits
#include <algorithm> // Necessary for std::clamp

...

VkExtent2D chooseSwapExtent(const VkSurfaceCapabilitiesKHR& capabilities) {
    if (capabilities.currentExtent.width != std::numeric_limits<uint32_t>::max()) {
        return capabilities.currentExtent;
    } else {
        int width, height;
        glfwGetFramebufferSize(window, &width, &height);

        VkExtent2D actualExtent = {
            static_cast<uint32_t>(width),
            static_cast<uint32_t>(height)
        };

        actualExtent.width = std::clamp(actualExtent.width, capabilities.minImageExtent.width, capabilities.maxImageExtent.width);
        actualExtent.height = std::clamp(actualExtent.height, capabilities.minImageExtent.height, capabilities.maxImageExtent.height);

        return actualExtent;
    }
}

The clamp function is used here to bound the values of width and height between the allowed minimum and maximum extents that are supported by the implementation.
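The two branches of chooseSwapExtent are easy to check with concrete numbers. The sketch below assumes plain structs (Extent and Caps are hypothetical mirrors of VkExtent2D and the relevant VkSurfaceCapabilitiesKHR members) and takes the framebuffer size as arguments instead of calling glfwGetFramebufferSize, so it can run without a window.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical plain-struct mirrors of the Vulkan types.
struct Extent { uint32_t width; uint32_t height; };
struct Caps { Extent currentExtent; Extent minImageExtent; Extent maxImageExtent; };

Extent chooseExtent(const Caps& caps, int framebufferWidth, int framebufferHeight) {
    // A non-sentinel currentExtent means the window system dictates the size.
    if (caps.currentExtent.width != std::numeric_limits<uint32_t>::max()) {
        return caps.currentExtent;
    }
    assert(framebufferWidth >= 0 && framebufferHeight >= 0);
    Extent actual = {
        static_cast<uint32_t>(framebufferWidth),
        static_cast<uint32_t>(framebufferHeight)
    };
    // Bound each dimension to the range the implementation supports.
    actual.width = std::clamp(actual.width, caps.minImageExtent.width, caps.maxImageExtent.width);
    actual.height = std::clamp(actual.height, caps.minImageExtent.height, caps.maxImageExtent.height);
    return actual;
}
```

For example, with bounds of 640x480 to 1920x1080 and a framebuffer of 3840x200, the result is 1920x480: the width is clamped down and the height is clamped up.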

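Creating the swap chain

Now that we have all of these helper functions assisting us with the choices we have to make at runtime, we finally have all the information that is needed to create a working swap chain.

Create a createSwapChain function that starts out with the results of these calls and make sure to call it from initVulkan after logical device creation.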

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
}

void createSwapChain() {
    SwapChainSupportDetails swapChainSupport = querySwapChainSupport(physicalDevice);

    VkSurfaceFormatKHR surfaceFormat = chooseSwapSurfaceFormat(swapChainSupport.formats);
    VkPresentModeKHR presentMode = chooseSwapPresentMode(swapChainSupport.presentModes);
    VkExtent2D extent = chooseSwapExtent(swapChainSupport.capabilities);
}

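Aside from these properties we also have to decide how many images we would like to have in the swap chain. The implementation specifies the minimum number that it requires to function: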

uint32_t imageCount = swapChainSupport.capabilities.minImageCount;

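However, simply sticking to this minimum means that we may sometimes have to wait on the driver to complete internal operations before we can acquire another image to render to. Therefore it is recommended to request at least one more image than the minimum: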

uint32_t imageCount = swapChainSupport.capabilities.minImageCount + 1;

We should also make sure not to exceed the maximum number of images while doing this, where 0 is a special value that means that there is no maximum:

if (swapChainSupport.capabilities.maxImageCount > 0 && imageCount > swapChainSupport.capabilities.maxImageCount) {
    imageCount = swapChainSupport.capabilities.maxImageCount;
}
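The image count rule, including the special meaning of 0, fits in one small function. chooseImageCount is a hypothetical name used for illustration; it takes the two capability values as plain integers.

```cpp
#include <cstdint>

// Requests one image more than the minimum, but never more than the maximum.
// A maxImageCount of 0 means the implementation imposes no upper limit.
uint32_t chooseImageCount(uint32_t minImageCount, uint32_t maxImageCount) {
    uint32_t imageCount = minImageCount + 1;
    if (maxImageCount > 0 && imageCount > maxImageCount) {
        imageCount = maxImageCount;
    }
    return imageCount;
}
```

With a minimum of 2, this yields 3 when the maximum is 0 or 8, and 2 when the maximum is also 2.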

As is tradition with Vulkan objects, creating the swap chain object requires filling in a large structure. It starts out very familiarly:

VkSwapchainCreateInfoKHR createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
createInfo.surface = surface;

After specifying which surface the swap chain should be tied to, the details of the swap chain images are specified:

createInfo.minImageCount = imageCount;
createInfo.imageFormat = surfaceFormat.format;
createInfo.imageColorSpace = surfaceFormat.colorSpace;
createInfo.imageExtent = extent;
createInfo.imageArrayLayers = 1;
createInfo.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;

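The imageArrayLayers member specifies the number of layers each image consists of. This is always 1 unless you are developing a stereoscopic 3D application. The imageUsage bit field specifies what kind of operations we'll use the images in the swap chain for. In this tutorial we're going to render directly to them, which means that they're used as color attachments. It is also possible to render to a separate image first in order to perform operations like post-processing. In that case you may use a value like VK_IMAGE_USAGE_TRANSFER_DST_BIT instead and use a memory operation to transfer the rendered image to a swap chain image.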

QueueFamilyIndices indices = findQueueFamilies(physicalDevice);
uint32_t queueFamilyIndices[] = {indices.graphicsFamily.value(), indices.presentFamily.value()};

if (indices.graphicsFamily != indices.presentFamily) {
    createInfo.imageSharingMode = VK_SHARING_MODE_CONCURRENT;
    createInfo.queueFamilyIndexCount = 2;
    createInfo.pQueueFamilyIndices = queueFamilyIndices;
} else {
    createInfo.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
    createInfo.queueFamilyIndexCount = 0; // Optional
    createInfo.pQueueFamilyIndices = nullptr; // Optional
}

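Next, we need to specify how to handle swap chain images that will be used across multiple queue families. That will be the case in our application if the graphics queue family is different from the presentation queue family. We'll be drawing on the images in the swap chain from the graphics queue and then submitting them on the presentation queue. There are two ways to handle images that are accessed from multiple queues:

VK_SHARING_MODE_EXCLUSIVE: An image is owned by one queue family at a time and ownership must be explicitly transferred before using it in another queue family. This option offers the best performance.
VK_SHARING_MODE_CONCURRENT: Images can be used across multiple queue families without explicit ownership transfers.

If the queue families differ, then we'll be using the concurrent mode in this tutorial to avoid having to deal with ownership transfers here, because these involve some concepts that are better explained at a later time. Concurrent mode requires you to specify in advance between which queue families ownership will be shared, using the queueFamilyIndexCount and pQueueFamilyIndices parameters. If the graphics queue family and presentation queue family are the same, which will be the case on most hardware, then we should stick to exclusive mode, because concurrent mode requires you to specify at least two distinct queue families.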

createInfo.preTransform = swapChainSupport.capabilities.currentTransform;

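We can specify that a certain transform should be applied to images in the swap chain if it is supported (supportedTransforms in capabilities), like a 90 degree clockwise rotation or horizontal flip. To specify that you do not want any transformation, simply specify the current transformation.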

createInfo.compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;

The compositeAlpha field specifies if the alpha channel should be used for blending with other windows in the window system. You'll almost always want to simply ignore the alpha channel, hence VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR.

createInfo.presentMode = presentMode;
createInfo.clipped = VK_TRUE;

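The presentMode member speaks for itself. If the clipped member is set to VK_TRUE, that means that we don't care about the color of pixels that are obscured, for example because another window is in front of them. Unless you really need to be able to read these pixels back and get predictable results, you'll get the best performance by enabling clipping.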

createInfo.oldSwapchain = VK_NULL_HANDLE;

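That leaves one last field, oldSwapchain. With Vulkan it's possible that your swap chain becomes invalid or unoptimized while your application is running, for example because the window was resized. In that case the swap chain actually needs to be recreated from scratch and a reference to the old one must be specified in this field. This is a complex topic that we'll learn more about in a future chapter. For now we'll assume that we'll only ever create one swap chain.

Now add a class member to store the VkSwapchainKHR object: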

VkSwapchainKHR swapChain;

Creating the swap chain is now as simple as calling vkCreateSwapchainKHR:

if (vkCreateSwapchainKHR(device, &createInfo, nullptr, &swapChain) != VK_SUCCESS) {
    throw std::runtime_error("failed to create swap chain!");
}

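The parameters are the logical device, swap chain creation info, optional custom allocators, and a pointer to the variable to store the handle in. No surprises there. It should be cleaned up using vkDestroySwapchainKHR before the device: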

void cleanup() {
    vkDestroySwapchainKHR(device, swapChain, nullptr);
    ...
}

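Now run the application to ensure that the swap chain is created successfully! If at this point you get an access violation error in vkCreateSwapchainKHR or see a message like Failed to find 'vkGetInstanceProcAddress' in layer SteamOverlayVulkanLayer.dll, then see the FAQ entry about the Steam overlay layer.

Try removing the createInfo.imageExtent = extent; line with validation layers enabled. You'll see that one of the validation layers immediately catches the mistake and prints a helpful message.

Retrieving the swap chain images

The swap chain has been created now, so all that remains is retrieving the handles of the VkImages in it. We'll reference these during rendering operations in later chapters. Add a class member to store the handles: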

std::vector<VkImage> swapChainImages;

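The images were created by the implementation for the swap chain and they will be automatically cleaned up once the swap chain has been destroyed, so we don't need to add any cleanup code.

Add the code to retrieve the handles at the end of the createSwapChain function, right after the vkCreateSwapchainKHR call. Retrieving them is very similar to the other times where we retrieved an array of objects from Vulkan. Remember that we only specified a minimum number of images in the swap chain, so the implementation is allowed to create a swap chain with more. That's why we'll first query the final number of images with vkGetSwapchainImagesKHR, then resize the container, and finally call it again to retrieve the handles.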

vkGetSwapchainImagesKHR(device, swapChain, &imageCount, nullptr);
swapChainImages.resize(imageCount);
vkGetSwapchainImagesKHR(device, swapChain, &imageCount, swapChainImages.data());

One last thing: store the format and extent we've chosen for the swap chain images in member variables. We'll need them in future chapters.

VkSwapchainKHR swapChain;
std::vector<VkImage> swapChainImages;
VkFormat swapChainImageFormat;
VkExtent2D swapChainExtent;

...

swapChainImageFormat = surfaceFormat.format;
swapChainExtent = extent;

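We now have a set of images that can be drawn onto and presented to the window. The next chapter begins to cover how to set up the images as render targets, and after that we start looking into the actual graphics pipeline and drawing commands!

C++ code

Image views

To use any VkImage, including those in the swap chain, in the render pipeline we have to create a VkImageView object. An image view is quite literally a view into an image. It describes how to access the image and which part of the image to access, for example whether it should be treated as a 2D depth texture without any mipmap levels.

In this chapter we'll write a createImageViews function that creates a basic image view for every image in the swap chain so that we can use them as color targets later on.

First add a class member to store the image views in: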

std::vector<VkImageView> swapChainImageViews;

Create the createImageViews function and call it right after swap chain creation.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
}

void createImageViews() {

}

The first thing we need to do is resize the list to fit all of the image views we'll be creating:

void createImageViews() {
    swapChainImageViews.resize(swapChainImages.size());

}

Next, set up the loop that iterates over all of the swap chain images.

for (size_t i = 0; i < swapChainImages.size(); i++) {

}

The parameters for image view creation are specified in a VkImageViewCreateInfo structure. The first few parameters are straightforward.

VkImageViewCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
createInfo.image = swapChainImages[i];

The viewType and format fields specify how the image data should be interpreted. The viewType parameter allows you to treat images as 1D textures, 2D textures, 3D textures, and cube maps.

createInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
createInfo.format = swapChainImageFormat;

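The components field allows you to swizzle the color channels around. For example, you can map all of the channels to the red channel for a monochrome texture. You can also map constant values of 0 and 1 to a channel. In our case we'll stick to the default mapping.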

createInfo.components.r = VK_COMPONENT_SWIZZLE_IDENTITY;
createInfo.components.g = VK_COMPONENT_SWIZZLE_IDENTITY;
createInfo.components.b = VK_COMPONENT_SWIZZLE_IDENTITY;
createInfo.components.a = VK_COMPONENT_SWIZZLE_IDENTITY;

The subresourceRange field describes what the image's purpose is and which part of the image should be accessed. Our images will be used as color targets without any mipmap levels or multiple layers.

createInfo.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
createInfo.subresourceRange.baseMipLevel = 0;
createInfo.subresourceRange.levelCount = 1;
createInfo.subresourceRange.baseArrayLayer = 0;
createInfo.subresourceRange.layerCount = 1;

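If you were working on a stereoscopic 3D application, you would create a swap chain with multiple layers. You could then create multiple image views for each image, representing the views for the left and right eyes by accessing different layers.

An image view can now be created by calling vkCreateImageView: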

if (vkCreateImageView(device, &createInfo, nullptr, &swapChainImageViews[i]) != VK_SUCCESS) {
    throw std::runtime_error("failed to create image views!");
}

Unlike images, the image views were explicitly created by us, so we need to add a similar loop to destroy them again at the end of the program:

void cleanup() {
    for (auto imageView : swapChainImageViews) {
        vkDestroyImageView(device, imageView, nullptr);
    }

    ...
}

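An image view is sufficient to start using an image as a texture, but it's not quite ready to be used as a render target just yet. That requires one more step of indirection, known as a framebuffer. But first we'll have to set up the graphics pipeline.

C++ code

Graphics pipeline basics

Introduction

Over the course of the next few chapters we'll be setting up a graphics pipeline that is configured to draw our first triangle. The graphics pipeline is the sequence of operations that take the vertices and textures of your meshes all the way to the pixels in the render targets. A simplified overview of its stages follows.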

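The input assembler collects the raw vertex data from the buffers you specify and may also use an index buffer to repeat certain elements without having to duplicate the vertex data itself.

The vertex shader is run for every vertex and generally applies transformations to turn vertex positions from model space to screen space. It also passes per-vertex data down the pipeline.

The tessellation shaders allow you to subdivide geometry based on certain rules to increase the mesh quality. This is often used to make surfaces like brick walls and staircases look less flat when they are nearby.

The geometry shader is run on every primitive (triangle, line, point) and can discard it or output more primitives than came in. This is similar to the tessellation shader, but much more flexible. However, it is not used much in today's applications because the performance is not that good on most graphics cards except for Intel's integrated GPUs.

The rasterization stage discretizes the primitives into fragments. These are the pixel elements that they fill on the framebuffer. Any fragments that fall outside the screen are discarded, and the attributes output by the vertex shader are interpolated across the fragments. Usually the fragments that are behind other primitive fragments are also discarded here because of depth testing.

The fragment shader is invoked for every fragment that survives and determines which framebuffer(s) the fragments are written to and with which color and depth values. It can do this using the interpolated data from the vertex shader, which can include things like texture coordinates and normals for lighting.

The color blending stage applies operations to mix different fragments that map to the same pixel in the framebuffer. Fragments can simply overwrite each other, add up, or be mixed based upon transparency.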

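The input assembler, rasterization, and color blending stages (colored green in the tutorial's diagram) are known as fixed-function stages. These stages allow you to tweak their operations using parameters, but the way they work is predefined.

The shader stages (colored orange in the diagram), on the other hand, are programmable, which means that you can upload your own code to the graphics card to apply exactly the operations you want. This allows you to use fragment shaders, for example, to implement anything from texturing and lighting to ray tracers. These programs run on many GPU cores simultaneously to process many objects, like vertices and fragments, in parallel.

If you've used older APIs like OpenGL and Direct3D before, then you'll be used to being able to change any pipeline settings at will with calls like glBlendFunc and OMSetBlendState. The graphics pipeline in Vulkan is almost completely immutable, so you must recreate the pipeline from scratch if you want to change shaders, bind different framebuffers, or change the blend function. The disadvantage is that you'll have to create a number of pipelines that represent all of the different combinations of states you want to use in your rendering operations. However, because all of the operations you'll be doing in the pipeline are known in advance, the driver can optimize for it much better.

Some of the programmable stages are optional based on what you intend to do. For example, the tessellation and geometry stages can be disabled if you are just drawing simple geometry. If you are only interested in depth values, then you can disable the fragment shader stage, which is useful for shadow map generation.

In the next chapter we'll first create the two programmable stages required to put a triangle onto the screen: the vertex shader and fragment shader. The fixed-function configuration like blending mode, viewport, and rasterization will be set up in the chapter after that. The final part of setting up the graphics pipeline in Vulkan involves the specification of input and output framebuffers.

Create a createGraphicsPipeline function that is called right after createImageViews in initVulkan. We'll work on this function throughout the following chapters.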

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createGraphicsPipeline();
}

...

void createGraphicsPipeline() {

}

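C++ code

Shader modules

Unlike earlier APIs, shader code in Vulkan has to be specified in a bytecode format as opposed to human-readable syntax like GLSL and HLSL. This bytecode format is called SPIR-V and is designed to be used with both Vulkan and OpenCL (both Khronos APIs). It is a format that can be used to write graphics and compute shaders, but we will focus on shaders used in Vulkan's graphics pipelines in this tutorial.

The advantage of using a bytecode format is that the compilers written by GPU vendors to turn shader code into native code are significantly less complex. The past has shown that with human-readable syntax like GLSL, some GPU vendors were rather flexible with their interpretation of the standard. If you happened to write non-trivial shaders with a GPU from one of these vendors, then you'd risk other vendors' drivers rejecting your code due to syntax errors, or worse, your shader running differently because of compiler bugs. With a straightforward bytecode format like SPIR-V that will hopefully be avoided.

However, that does not mean that we need to write this bytecode by hand. Khronos has released their own vendor-independent compiler that compiles GLSL to SPIR-V. This compiler is designed to verify that your shader code is fully standards compliant and produces one SPIR-V binary that you can ship with your program. You can also include this compiler as a library to produce SPIR-V at runtime, but we won't be doing that in this tutorial. Although we can use this compiler directly via glslangValidator.exe, we will be using glslc.exe by Google instead. The advantage of glslc is that it uses the same parameter format as well-known compilers like GCC and Clang and includes some extra functionality like includes. Both of them are already included in the Vulkan SDK, so you don't need to download anything extra.

GLSL is a shading language with a C-style syntax. Programs written in it have a main function that is invoked for every object. Instead of using parameters for input and a return value as output, GLSL uses global variables to handle input and output. The language includes many features to aid in graphics programming, like built-in vector and matrix primitives. Functions for operations like cross products, matrix-vector products, and reflections around a vector are included. The vector type is called vec with a number indicating the amount of elements. For example, a 3D position would be stored in a vec3. It is possible to access single components through members like .x, but it's also possible to create a new vector from multiple components at the same time. For example, the expression vec3(1.0, 2.0, 3.0).xy would result in a vec2. The constructors of vectors can also take combinations of vector objects and scalar values. For example, a vec3 can be constructed with vec3(vec2(1.0, 2.0), 3.0).

As the previous chapter mentioned, we need to write a vertex shader and a fragment shader to get a triangle on the screen. The next two sections will cover the GLSL code of each of those, and after that I'll show you how to produce two SPIR-V binaries and load them into the program.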

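Vertex shader

The vertex shader processes each incoming vertex. It takes its attributes, like model space position, color, normal, and texture coordinates, as input. The output is the final position in clip coordinates and the attributes that need to be passed on to the fragment shader, like color and texture coordinates. These values will then be interpolated over the fragments by the rasterizer to produce a smooth gradient.

A clip coordinate is a four-dimensional vector from the vertex shader that is subsequently turned into a normalized device coordinate by dividing the whole vector by its last component. These normalized device coordinates are homogeneous coordinates that map the framebuffer to a [-1, 1] by [-1, 1] coordinate system.

You should already be familiar with these if you have dabbled in computer graphics before. If you have used OpenGL before, then you'll notice that the sign of the Y coordinates is now flipped. The Z coordinate now uses the same range as it does in Direct3D, from 0 to 1.

For our first triangle we won't be applying any transformations. We'll just specify the positions of the three vertices directly as normalized device coordinates.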

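We can directly output normalized device coordinates by outputting them as clip coordinates from the vertex shader with the last component set to 1. That way the division that transforms clip coordinates to normalized device coordinates will not change anything.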
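The division itself is simple enough to write out on the CPU. The following sketch is purely illustrative (clipToNdc is a hypothetical helper and the GPU performs this step for you); it shows that a w component of 1 leaves the coordinates unchanged, while any other w rescales them.

```cpp
#include <array>
#include <cassert>

// Perspective division: clip coordinates (x, y, z, w) are turned into
// normalized device coordinates by dividing x, y and z by w.
std::array<float, 3> clipToNdc(const std::array<float, 4>& clip) {
    assert(clip[3] != 0.0f);  // w == 0 has no normalized device coordinate
    return { clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3] };
}
```

For instance, the clip coordinate (0.5, -0.5, 0.0, 1.0) maps to (0.5, -0.5, 0.0), whereas (1.0, 1.0, 1.0, 2.0) maps to (0.5, 0.5, 0.5).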

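Normally these coordinates would be stored in a vertex buffer, but creating a vertex buffer in Vulkan and filling it with data is not trivial. Therefore I've decided to postpone that until after we've had the satisfaction of seeing a triangle pop up on the screen. We're going to do something a little unorthodox in the meanwhile: include the coordinates directly inside the vertex shader. The code looks like this: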

#version 450

vec2 positions[3] = vec2[](
    vec2(0.0, -0.5),
    vec2(0.5, 0.5),
    vec2(-0.5, 0.5)
);

void main() {
    gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);
}

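The main function is invoked for every vertex. The built-in gl_VertexIndex variable contains the index of the current vertex. This is usually an index into the vertex buffer, but in our case it will be an index into a hardcoded array of vertex data. The position of each vertex is accessed from the constant array in the shader and combined with dummy z and w components to produce a position in clip coordinates. The built-in variable gl_Position functions as the output.

Fragment shader

The triangle that is formed by the positions from the vertex shader fills an area on the screen with fragments. The fragment shader is invoked on these fragments to produce a color and depth for the framebuffer (or framebuffers). A simple fragment shader that outputs the color red for the entire triangle looks like this: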

#version 450

layout(location = 0) out vec4 outColor;

void main() {
    outColor = vec4(1.0, 0.0, 0.0, 1.0);
}

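The main function is called for every fragment just like the vertex shader main function is called for every vertex. Colors in GLSL are 4-component vectors with the R, G, B, and alpha channels within the [0, 1] range. Unlike gl_Position in the vertex shader, there is no built-in variable to output a color for the current fragment. You have to specify your own output variable for each framebuffer, where the layout(location = 0) modifier specifies the index of the framebuffer. The color red is written to this outColor variable, which is linked to the first (and only) framebuffer at index 0.

Per-vertex colors

Making the entire triangle red is not very interesting. Wouldn't a triangle with a different color at each vertex, blended smoothly in between, look a lot nicer?

We have to make a couple of changes to both shaders to accomplish this. First off, we need to specify a distinct color for each of the three vertices. The vertex shader should now include an array with colors just like it does for positions: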

vec3 colors[3] = vec3[](
    vec3(1.0, 0.0, 0.0),
    vec3(0.0, 1.0, 0.0),
    vec3(0.0, 0.0, 1.0)
);

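Now we just need to pass these per-vertex colors to the fragment shader so it can output their interpolated values to the framebuffer. Add an output for color to the vertex shader and write to it in the main function: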

layout(location = 0) out vec3 fragColor;

void main() {
    gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);
    fragColor = colors[gl_VertexIndex];
}

Next, we need to add a matching input in the fragment shader:

layout(location = 0) in vec3 fragColor;

void main() {
    outColor = vec4(fragColor, 1.0);
}

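The input variable does not necessarily have to use the same name. The two are linked together using the indexes specified by the location directives. The main function has been modified to output the color along with an alpha value. The values for fragColor are automatically interpolated for the fragments between the three vertices, resulting in a smooth gradient.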
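To get a feeling for what that interpolation produces, the blend can be reproduced on the CPU with barycentric weights. This is a simplified sketch, not what the driver literally executes: Vec3 and interpolate are hypothetical names, and perspective correction is ignored, which makes no difference for this flat triangle where every vertex has w equal to 1.

```cpp
#include <array>

using Vec3 = std::array<float, 3>;

// Blends three vertex colors with barycentric weights. The weights describe
// how close the fragment is to each vertex and are expected to sum to 1.
Vec3 interpolate(const Vec3& c0, const Vec3& c1, const Vec3& c2,
                 float w0, float w1, float w2) {
    Vec3 out{};
    for (int i = 0; i < 3; i++) {
        out[i] = w0 * c0[i] + w1 * c1[i] + w2 * c2[i];
    }
    return out;
}
```

A fragment exactly on the first vertex (weights 1, 0, 0) is pure red, while a fragment halfway along the edge between the red and green vertices (weights 0.5, 0.5, 0) comes out as (0.5, 0.5, 0.0).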

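Compiling the shaders

Create a directory called shaders in the root directory of your project and store the vertex shader in a file called shader.vert and the fragment shader in a file called shader.frag in that directory. GLSL shaders don't have an official extension, but these two are commonly used to distinguish them.

The contents of shader.vert should be: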

#version 450

layout(location = 0) out vec3 fragColor;

vec2 positions[3] = vec2[](
    vec2(0.0, -0.5),
    vec2(0.5, 0.5),
    vec2(-0.5, 0.5)
);

vec3 colors[3] = vec3[](
    vec3(1.0, 0.0, 0.0),
    vec3(0.0, 1.0, 0.0),
    vec3(0.0, 0.0, 1.0)
);

void main() {
    gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);
    fragColor = colors[gl_VertexIndex];
}

And the contents of shader.frag should be:

#version 450

layout(location = 0) in vec3 fragColor;

layout(location = 0) out vec4 outColor;

void main() {
    outColor = vec4(fragColor, 1.0);
}

We're now going to compile these into SPIR-V bytecode using the glslc program.

Windows

Create a compile.bat file with the following contents:

C:/VulkanSDK/x.x.x.x/Bin/glslc.exe shader.vert -o vert.spv
C:/VulkanSDK/x.x.x.x/Bin/glslc.exe shader.frag -o frag.spv
pause

Replace the path to glslc.exe with the path to where you installed the Vulkan SDK. Double click the file to run it.

Linux

Create a compile.sh file with the following contents:

/home/user/VulkanSDK/x.x.x.x/x86_64/bin/glslc shader.vert -o vert.spv
/home/user/VulkanSDK/x.x.x.x/x86_64/bin/glslc shader.frag -o frag.spv

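Replace the path to glslc with the path to where you installed the Vulkan SDK. Make the script executable with chmod +x compile.sh and run it.

End of platform-specific instructions

These two commands tell the compiler to read the GLSL source file and output a SPIR-V bytecode file using the -o (output) flag.

If your shader contains a syntax error, then the compiler will tell you the line number and the problem, as you would expect. Try leaving out a semicolon, for example, and run the compile script again. Also try running the compiler without any arguments to see what kinds of flags it supports. It can, for example, also output the bytecode in a human-readable format so you can see exactly what your shader is doing and any optimizations that have been applied at this stage.

Compiling shaders on the command line is one of the most straightforward options and it's the one that we'll use in this tutorial, but it's also possible to compile shaders directly from your own code. The Vulkan SDK includes libshaderc, which is a library to compile GLSL code to SPIR-V from within your program.

Loading a shader

Now that we have a way of producing SPIR-V shaders, it's time to load them into our program to plug them into the graphics pipeline at some point. We'll first write a simple helper function to load the binary data from the files.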

#include <fstream>

...

static std::vector<char> readFile(const std::string& filename) {
    std::ifstream file(filename, std::ios::ate | std::ios::binary);

    if (!file.is_open()) {
        throw std::runtime_error("failed to open file!");
    }
}

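The readFile function will read all of the bytes from the specified file and return them in a byte array managed by std::vector. We start by opening the file with two flags:

ate: Start reading at the end of the file
binary: Read the file as a binary file (avoid text transformations)

The advantage of starting to read at the end of the file is that we can use the read position to determine the size of the file and allocate a buffer: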

size_t fileSize = (size_t) file.tellg();
std::vector<char> buffer(fileSize);

After that, we can seek back to the beginning of the file and read all of the bytes at once:

file.seekg(0);
file.read(buffer.data(), fileSize);

And finally close the file and return the bytes:

file.close();

return buffer;

We'll now call this function from createGraphicsPipeline to load the bytecode of the two shaders:

void createGraphicsPipeline() {
    auto vertShaderCode = readFile("shaders/vert.spv");
    auto fragShaderCode = readFile("shaders/frag.spv");
}

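Make sure that the shaders are loaded correctly by printing the size of the buffers and checking if they match the actual file size in bytes. Note that the code doesn't need to be null-terminated since it's binary code and we will later be explicit about its size.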
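For reference, here are the readFile fragments from this section assembled into one function, together with a small size check. The assembled function follows the steps described above (the explicit close is dropped because the ifstream closes itself when it goes out of scope); looksLikeSpirvSize is a hypothetical addition that relies on the fact that SPIR-V is a stream of 32-bit words, so a valid file has a non-zero size that is a multiple of 4 bytes.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

static std::vector<char> readFile(const std::string& filename) {
    // ate: start at the end so tellg() reports the file size.
    // binary: no text transformations.
    std::ifstream file(filename, std::ios::ate | std::ios::binary);
    if (!file.is_open()) {
        throw std::runtime_error("failed to open file!");
    }

    const std::size_t fileSize = static_cast<std::size_t>(file.tellg());
    std::vector<char> buffer(fileSize);

    file.seekg(0);  // back to the beginning, then read everything at once
    file.read(buffer.data(), static_cast<std::streamsize>(fileSize));
    return buffer;
}

// Hypothetical sanity check: SPIR-V consists of 32-bit words.
static bool looksLikeSpirvSize(const std::vector<char>& code) {
    return !code.empty() && code.size() % 4 == 0;
}
```

Printing vertShaderCode.size() and comparing it with the size reported by the file system is then a one-line check, and looksLikeSpirvSize can catch a truncated or empty file early.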

Creating shader modules

Before we can pass the code to the pipeline, we have to wrap it in a VkShaderModule object. Let's create a helper function createShaderModule to do that.

VkShaderModule createShaderModule(const std::vector<char>& code) {

}

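The function will take a buffer with the bytecode as parameter and create a VkShaderModule from it.

Creating a shader module is simple: we only need to specify a pointer to the buffer with the bytecode and the length of it. This information is specified in a VkShaderModuleCreateInfo structure. The one catch is that the size of the bytecode is specified in bytes, but the bytecode pointer is a uint32_t pointer rather than a char pointer. Therefore we need to cast the pointer with reinterpret_cast as shown below. When you perform a cast like this, you also need to ensure that the data satisfies the alignment requirements of uint32_t. Lucky for us, the data is stored in an std::vector where the default allocator already ensures that the data satisfies the worst case alignment requirements.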

VkShaderModuleCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
createInfo.codeSize = code.size();
createInfo.pCode = reinterpret_cast<const uint32_t*>(code.data());
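
If you would rather not depend on the allocator's alignment guarantee (for example when the bytes come from a custom allocator or from the middle of a larger buffer), one alternative is to copy the bytes into a vector of 32-bit words first. This is a sketch of that idea, not what the tutorial does; toSpirvWords is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical alternative to the reinterpret_cast: copy the raw bytes into
// properly typed (and therefore properly aligned) 32-bit words.
std::vector<std::uint32_t> toSpirvWords(const std::vector<char>& code) {
    if (code.size() % sizeof(std::uint32_t) != 0) {
        throw std::runtime_error("bytecode size is not a multiple of 4!");
    }

    std::vector<std::uint32_t> words(code.size() / sizeof(std::uint32_t));
    if (!words.empty()) {
        std::memcpy(words.data(), code.data(), code.size());
    }
    return words;
}
```

With this variant, codeSize would be words.size() * sizeof(uint32_t) and pCode would be words.data(), at the cost of one extra copy.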

The VkShaderModule can then be created with a call to vkCreateShaderModule:

VkShaderModule shaderModule;
if (vkCreateShaderModule(device, &createInfo, nullptr, &shaderModule) != VK_SUCCESS) {
    throw std::runtime_error("failed to create shader module!");
}

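The parameters are the same as those of the previous object creation functions: the logical device, a pointer to the create info structure, an optional pointer to custom allocators, and the handle output variable. The buffer with the code can be freed immediately after the shader module has been created. Don't forget to return the created shader module: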

return shaderModule;

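Shader modules are just a thin wrapper around the shader bytecode that we previously loaded from a file and the functions defined in it. The compilation and linking of the SPIR-V bytecode to machine code for execution by the GPU doesn't happen until the graphics pipeline is created. That means we are allowed to destroy the shader modules as soon as pipeline creation is finished, which is why we make them local variables in the createGraphicsPipeline function instead of class members: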

void createGraphicsPipeline() {
    auto vertShaderCode = readFile("shaders/vert.spv");
    auto fragShaderCode = readFile("shaders/frag.spv");

    VkShaderModule vertShaderModule = createShaderModule(vertShaderCode);
    VkShaderModule fragShaderModule = createShaderModule(fragShaderCode);

The cleanup should then happen at the end of the function, with two calls to vkDestroyShaderModule. All of the remaining code in this chapter will be inserted before these lines.

    ...
    vkDestroyShaderModule(device, fragShaderModule, nullptr);
    vkDestroyShaderModule(device, vertShaderModule, nullptr);
}
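
One thing to keep in mind with cleanup calls at the end of a function: if anything in between throws (the tutorial's own error handling throws std::runtime_error), those calls are skipped. A small scope guard is one way to handle that. The sketch below is an assumption layered on top of the tutorial, not part of it or of Vulkan; ScopeExit is a hypothetical name, and it relies on C++17 class template argument deduction.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: runs the stored callable when the guard goes out of
// scope, whether the scope is left normally or through an exception.
template <typename F>
class ScopeExit {
public:
    explicit ScopeExit(F fn) : fn_(std::move(fn)) {}
    ~ScopeExit() { fn_(); }

    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

private:
    F fn_;
};

// Possible use inside createGraphicsPipeline (shown as a comment because it
// needs the Vulkan objects from the surrounding class):
//
//   ScopeExit destroyVert([&] { vkDestroyShaderModule(device, vertShaderModule, nullptr); });
//   ScopeExit destroyFrag([&] { vkDestroyShaderModule(device, fragShaderModule, nullptr); });
```

Guards are destroyed in reverse order of declaration, which matches the order of the explicit calls above.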

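Shader stage creation

To actually use the shaders, we need to assign them to a specific pipeline stage through VkPipelineShaderStageCreateInfo structures as part of the pipeline creation process.

We'll start by filling in the structure for the vertex shader, again in the createGraphicsPipeline function.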

VkPipelineShaderStageCreateInfo vertShaderStageInfo{};
vertShaderStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
vertShaderStageInfo.stage = VK_SHADER_STAGE_VERTEX_BIT;

Besides the obligatory sType member, the first step is telling Vulkan in which pipeline stage the shader is going to be used. There is an enum value for each of the programmable stages described in the previous chapter.

vertShaderStageInfo.module = vertShaderModule;
vertShaderStageInfo.pName = "main";

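The next two members specify the shader module containing the code and the function to invoke, known as the entrypoint. That means it is possible to combine multiple fragment shaders into a single shader module and use different entry points to differentiate between their behaviors. In this case we'll stick to the standard main, however.

There is one more (optional) member, pSpecializationInfo, which we won't be using here but is worth discussing. It allows you to specify values for shader constants. You can use a single shader module and configure its behavior at pipeline creation by specifying different values for the constants used in it. This is more efficient than configuring the shader with variables at render time, because the compiler can do optimizations such as eliminating if statements that depend on these values. If you don't have any constants like that, you can set the member to nullptr, which our struct initialization does automatically.

Modifying the structure to suit the fragment shader is easy: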

VkPipelineShaderStageCreateInfo fragShaderStageInfo{};
fragShaderStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
fragShaderStageInfo.stage = VK_SHADER_STAGE_FRAGMENT_BIT;
fragShaderStageInfo.module = fragShaderModule;
fragShaderStageInfo.pName = "main";

Finish by defining an array that contains these two structs, which we'll later use to reference them in the actual pipeline creation step.

VkPipelineShaderStageCreateInfo shaderStages[] = {vertShaderStageInfo, fragShaderStageInfo};

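That's all there is to describing the programmable stages of the pipeline. In the next chapter we'll look at the fixed-function stages.

C++ code / Vertex shader / Fragment shader

Fixed functions

Older graphics APIs provided default state for most of the stages of the graphics pipeline. In Vulkan you have to be explicit about most pipeline state, as it is baked into an immutable pipeline state object. In this chapter we'll fill in all of the structures needed to configure these fixed-function operations.

Dynamic state

While most of the pipeline state needs to be baked into the pipeline state object, a limited amount of state can actually be changed at draw time without recreating the pipeline. Examples are the size of the viewport, line width and blend constants. If you want to use dynamic state and keep these properties out of the baked state, you have to fill in a VkPipelineDynamicStateCreateInfo structure like this: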

std::vector<VkDynamicState> dynamicStates = {
    VK_DYNAMIC_STATE_VIEWPORT,
    VK_DYNAMIC_STATE_SCISSOR
};

VkPipelineDynamicStateCreateInfo dynamicState{};
dynamicState.sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO;
dynamicState.dynamicStateCount = static_cast<uint32_t>(dynamicStates.size());
dynamicState.pDynamicStates = dynamicStates.data();

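This causes the configuration of these values to be ignored, and you will be able (and required) to specify the data at drawing time. This results in a more flexible setup and is very common for things like viewport and scissor state, which would result in a more complex setup when baked into the pipeline state.

Vertex input

The VkPipelineVertexInputStateCreateInfo structure describes the format of the vertex data that will be passed to the vertex shader. It describes this in roughly two ways:

Bindings: spacing between data and whether the data is per-vertex or per-instance (see instancing)
Attribute descriptions: type of the attributes passed to the vertex shader, which binding to load them from and at which offset

Because we're hard coding the vertex data directly in the vertex shader, we'll fill in this structure to specify that there is no vertex data to load for now. We'll get back to it in the vertex buffer chapter.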

VkPipelineVertexInputStateCreateInfo vertexInputInfo{};
vertexInputInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO;
vertexInputInfo.vertexBindingDescriptionCount = 0;
vertexInputInfo.pVertexBindingDescriptions = nullptr; // Optional
vertexInputInfo.vertexAttributeDescriptionCount = 0;
vertexInputInfo.pVertexAttributeDescriptions = nullptr; // Optional

The pVertexBindingDescriptions and pVertexAttributeDescriptions members point to arrays of structs that describe the aforementioned details for loading vertex data. Add this structure to the createGraphicsPipeline function right after the shaderStages array.

Input assembly

The VkPipelineInputAssemblyStateCreateInfo struct describes two things: what kind of geometry will be drawn from the vertices and whether primitive restart should be enabled. The former is specified in the topology member and can have values like:

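VK_PRIMITIVE_TOPOLOGY_POINT_LIST: points from vertices
VK_PRIMITIVE_TOPOLOGY_LINE_LIST: line from every 2 vertices without reuse
VK_PRIMITIVE_TOPOLOGY_LINE_STRIP: the end vertex of every line is used as start vertex for the next line
VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST: triangle from every 3 vertices without reuse
VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP: the second and third vertex of every triangle are used as first two vertices of the next triangle

Normally, the vertices are loaded from the vertex buffer by index in sequential order, but with an element buffer you can specify the indices to use yourself. This allows you to perform optimizations like reusing vertices. If you set the primitiveRestartEnable member to VK_TRUE, it becomes possible to break up lines and triangles in the _STRIP topology modes by using a special index of 0xFFFF or 0xFFFFFFFF.

We intend to draw triangles throughout this tutorial, so we'll stick to the following data for the structure: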
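
To make the strip and restart rules concrete, here is a CPU-side sketch that expands a 16-bit index list the way a triangle strip with primitive restart is interpreted. It is an illustration only: expandTriangleStrip and Triangle are hypothetical names, and the vertex swap on odd triangles reflects my reading of how the specification keeps the winding consistent.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Triangle = std::array<std::uint16_t, 3>;

// Illustration only: expands a triangle-strip index list into individual
// triangles, starting a new strip at each restart index (0xFFFF).
std::vector<Triangle> expandTriangleStrip(const std::vector<std::uint16_t>& indices) {
    constexpr std::uint16_t kRestart = 0xFFFF;
    std::vector<Triangle> triangles;
    std::vector<std::uint16_t> strip;  // vertices of the strip being built

    for (std::uint16_t index : indices) {
        if (index == kRestart) {
            strip.clear();  // the next index starts a fresh strip
            continue;
        }
        strip.push_back(index);
        if (strip.size() >= 3) {
            const std::size_t i = strip.size() - 3;  // triangle number in this strip
            if (i % 2 == 0) {
                triangles.push_back(Triangle{strip[i], strip[i + 1], strip[i + 2]});
            } else {
                // Odd triangles swap two vertices to keep the winding consistent.
                triangles.push_back(Triangle{strip[i], strip[i + 2], strip[i + 1]});
            }
        }
    }
    return triangles;
}
```

For the index list 0, 1, 2, 3, 0xFFFF, 4, 5, 6 this yields three triangles: two that share an edge and one separate triangle after the restart.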

VkPipelineInputAssemblyStateCreateInfo inputAssembly{};
inputAssembly.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO;
inputAssembly.topology = VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST;
inputAssembly.primitiveRestartEnable = VK_FALSE;

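Viewports and scissors

A viewport basically describes the region of the framebuffer that the output will be rendered to. This will almost always be (0, 0) to (width, height), and in this tutorial that will also be the case.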

VkViewport viewport{};
viewport.x = 0.0f;
viewport.y = 0.0f;
viewport.width = (float) swapChainExtent.width;
viewport.height = (float) swapChainExtent.height;
viewport.minDepth = 0.0f;
viewport.maxDepth = 1.0f;

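Remember that the size of the swap chain and its images may differ from the WIDTH and HEIGHT of the window. The swap chain images will be used as framebuffers later on, so we should stick to their size.

The minDepth and maxDepth values specify the range of depth values to use for the framebuffer. These values must be within the [0.0f, 1.0f] range, but minDepth may be higher than maxDepth. If you aren't doing anything special, stick to the standard values of 0.0f and 1.0f.

While viewports define the transformation from the image to the framebuffer, scissor rectangles define in which regions pixels will actually be stored. Any pixels outside the scissor rectangle are discarded by the rasterizer. They function like a filter rather than a transformation. The difference is illustrated in the figure below. Note that the left scissor rectangle is just one of many possibilities that would result in that image, as long as it is larger than the viewport.

So if we want to draw to the entire framebuffer, we specify a scissor rectangle that covers it entirely: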

VkRect2D scissor{};
scissor.offset = {0, 0};
scissor.extent = swapChainExtent;
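
The difference between the two can also be sketched on the CPU. The code below is a simplified model and not Vulkan API code: Viewport, Rect, Vec3, viewportTransform and passesScissor are hypothetical names that mirror the fields used above. The viewport maps normalized device coordinates to framebuffer coordinates, while the scissor only accepts or rejects a pixel.

```cpp
#include <cassert>

struct Viewport { float x, y, width, height, minDepth, maxDepth; };
struct Rect { int x, y, width, height; };
struct Vec3 { float x, y, z; };

// Transformation: maps normalized device coordinates (x and y in [-1, 1],
// z in [0, 1]) to framebuffer coordinates.
Vec3 viewportTransform(const Viewport& vp, const Vec3& ndc) {
    return Vec3{
        vp.x + (ndc.x + 1.0f) * 0.5f * vp.width,
        vp.y + (ndc.y + 1.0f) * 0.5f * vp.height,
        vp.minDepth + ndc.z * (vp.maxDepth - vp.minDepth)
    };
}

// Filter: a pixel is kept only if it lies inside the scissor rectangle.
bool passesScissor(const Rect& scissor, int px, int py) {
    return px >= scissor.x && px < scissor.x + scissor.width &&
           py >= scissor.y && py < scissor.y + scissor.height;
}
```

Shrinking the viewport squeezes the whole image into a smaller region; shrinking the scissor rectangle leaves the image where it is and cuts part of it away.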

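Viewports and scissor rectangles can either be specified as a static part of the pipeline or as dynamic state set in the command buffer. While the former is more in line with the other states, it is often convenient to make viewport and scissor state dynamic, as it gives you a lot more flexibility. This is very common, and all implementations can handle this dynamic state without a performance penalty.

When opting for dynamic viewports and scissor rectangles, you need to enable the respective dynamic states for the pipeline: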

std::vector<VkDynamicState> dynamicStates = {
    VK_DYNAMIC_STATE_VIEWPORT,
    VK_DYNAMIC_STATE_SCISSOR
};

VkPipelineDynamicStateCreateInfo dynamicState{};
dynamicState.sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO;
dynamicState.dynamicStateCount = static_cast<uint32_t>(dynamicStates.size());
dynamicState.pDynamicStates = dynamicStates.data();

And then you only need to specify their count at pipeline creation time:

VkPipelineViewportStateCreateInfo viewportState{};
viewportState.sType = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO;
viewportState.viewportCount = 1;
viewportState.scissorCount = 1;

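The actual viewports and scissor rectangles will then be set up later, at drawing time.

With dynamic state it is even possible to specify different viewports and/or scissor rectangles within a single command buffer.

Without dynamic state, the viewport and scissor rectangle need to be set in the pipeline using the VkPipelineViewportStateCreateInfo struct. This makes the viewport and scissor rectangle for this pipeline immutable. Any change to these values requires a new pipeline to be created with the new values.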

VkPipelineViewportStateCreateInfo viewportState{};
viewportState.sType = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO;
viewportState.viewportCount = 1;
viewportState.pViewports = &viewport;
viewportState.scissorCount = 1;
viewportState.pScissors = &scissor;

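Independent of how you set them, it is possible to use multiple viewports and scissor rectangles on some graphics cards, so the structure members reference an array of them. Using more than one requires you to enable a GPU feature (see logical device creation).

Rasterizer

The rasterizer takes the geometry that is shaped by the vertices from the vertex shader and turns it into fragments to be colored by the fragment shader. It also performs depth testing, face culling and the scissor test, and it can be configured to output fragments that fill entire polygons or just the edges (wireframe rendering). All this is configured using the VkPipelineRasterizationStateCreateInfo structure.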

VkPipelineRasterizationStateCreateInfo rasterizer{};
rasterizer.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO;
rasterizer.depthClampEnable = VK_FALSE;

If depthClampEnable is set to VK_TRUE, fragments that are beyond the near and far planes are clamped to them instead of being discarded. This is useful in some special cases like shadow maps. Using it requires enabling a GPU feature.

rasterizer.rasterizerDiscardEnable = VK_FALSE;

If rasterizerDiscardEnable is set to VK_TRUE, geometry never passes through the rasterizer stage. This basically disables any output to the framebuffer.

rasterizer.polygonMode = VK_POLYGON_MODE_FILL;

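The polygonMode determines how fragments are generated for geometry. The following modes are available:

VK_POLYGON_MODE_FILL: fill the area of the polygon with fragments
VK_POLYGON_MODE_LINE: polygon edges are drawn as lines
VK_POLYGON_MODE_POINT: polygon vertices are drawn as points

Using any mode other than fill requires enabling a GPU feature.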

rasterizer.lineWidth = 1.0f;

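The lineWidth member is straightforward: it describes the thickness of lines in terms of number of fragments. The maximum line width that is supported depends on the hardware, and any line thicker than 1.0f requires you to enable the wideLines GPU feature.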

rasterizer.cullMode = VK_CULL_MODE_BACK_BIT;
rasterizer.frontFace = VK_FRONT_FACE_CLOCKWISE;

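The cullMode variable determines the type of face culling to use. You can disable culling, cull the front faces, cull the back faces or both. The frontFace variable specifies the vertex order for faces to be considered front-facing and can be clockwise or counterclockwise.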
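
Whether a triangle counts as clockwise can be worked out from the sign of its area in framebuffer coordinates, where y points down. The sketch below is a CPU-side illustration with hypothetical names (Point, FrontFace, signedArea, isFrontFacing). The sign convention follows my understanding of the Vulkan specification and should be treated as an assumption.

```cpp
#include <cassert>

struct Point { double x, y; };
enum class FrontFace { CounterClockwise, Clockwise };

// Signed area of a triangle in framebuffer coordinates (y pointing down).
// Assumed convention: positive means counterclockwise, negative clockwise.
double signedArea(const Point& a, const Point& b, const Point& c) {
    return -0.5 * ((a.x * b.y - b.x * a.y) +
                   (b.x * c.y - c.x * b.y) +
                   (c.x * a.y - a.x * c.y));
}

// Illustration of how frontFace selects which winding is front-facing.
bool isFrontFacing(const Point& a, const Point& b, const Point& c, FrontFace frontFace) {
    const double area = signedArea(a, b, c);
    return frontFace == FrontFace::CounterClockwise ? area > 0.0 : area < 0.0;
}
```

As an example, a triangle with vertices (0.0, -0.5), (0.5, 0.5), (-0.5, 0.5) in that order (top, bottom right, bottom left on screen) has a negative area under this convention, so it is front-facing with VK_FRONT_FACE_CLOCKWISE and would be culled if the vertex order were reversed.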

rasterizer.depthBiasEnable = VK_FALSE;
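rasterizer.depthBiasConstantFactor = 0.0f; // Optional
rasterizer.depthBiasClamp = 0.0f; // Optional
rasterizer.depthBiasSlopeFactor = 0.0f; // Optional

The rasterizer can alter the depth values by adding a constant value or biasing them based on a fragment's slope. This is sometimes used for shadow mapping, but we won't be using it. Just set depthBiasEnable to VK_FALSE.

Multisampling

The VkPipelineMultisampleStateCreateInfo struct configures multisampling, which is one of the ways to perform anti-aliasing. It works by combining the fragment shader results of multiple polygons that rasterize to the same pixel. This mainly occurs along edges, which is also where the most noticeable aliasing artifacts occur. Because it doesn't need to run the fragment shader multiple times if only one polygon maps to a pixel, it is significantly less expensive than simply rendering to a higher resolution and then downscaling. Enabling it requires enabling a GPU feature.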

VkPipelineMultisampleStateCreateInfo multisampling{};
multisampling.sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO;
multisampling.sampleShadingEnable = VK_FALSE;
multisampling.rasterizationSamples = VK_SAMPLE_COUNT_1_BIT;
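multisampling.minSampleShading = 1.0f; // Optional
multisampling.pSampleMask = nullptr; // Optional
multisampling.alphaToCoverageEnable = VK_FALSE; // Optional
multisampling.alphaToOneEnable = VK_FALSE; // Optional

We'll revisit multisampling in a later chapter; for now let's keep it disabled.

Depth and stencil testing

If you are using a depth and/or stencil buffer, you also need to configure the depth and stencil tests using VkPipelineDepthStencilStateCreateInfo. We don't have one right now, so we can simply pass a nullptr instead of a pointer to such a struct. We'll get back to it in the depth buffering chapter.

Color blending

After a fragment shader has returned a color, it needs to be combined with the color that is already in the framebuffer. This transformation is known as color blending, and there are two ways to do it:

Mix the old and new value to produce a final color
Combine the old and new value using a bitwise operation

There are two types of structs to configure color blending. The first struct, VkPipelineColorBlendAttachmentState, contains the configuration per attached framebuffer, and the second struct, VkPipelineColorBlendStateCreateInfo, contains the global color blending settings. In our case we only have one framebuffer: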

VkPipelineColorBlendAttachmentState colorBlendAttachment{};
colorBlendAttachment.colorWriteMask = VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT | VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT;
colorBlendAttachment.blendEnable = VK_FALSE;
colorBlendAttachment.srcColorBlendFactor = VK_BLEND_FACTOR_ONE; // Optional
colorBlendAttachment.dstColorBlendFactor = VK_BLEND_FACTOR_ZERO; // Optional
colorBlendAttachment.colorBlendOp = VK_BLEND_OP_ADD; // Optional
colorBlendAttachment.srcAlphaBlendFactor = VK_BLEND_FACTOR_ONE; // Optional
colorBlendAttachment.dstAlphaBlendFactor = VK_BLEND_FACTOR_ZERO; // Optional
colorBlendAttachment.alphaBlendOp = VK_BLEND_OP_ADD; // Optional

This per-framebuffer struct allows you to configure the first way of color blending. The operations that will be performed are best demonstrated using the following pseudocode:

if (blendEnable) {
    finalColor.rgb = (srcColorBlendFactor * newColor.rgb) <colorBlendOp> (dstColorBlendFactor * oldColor.rgb);
    finalColor.a = (srcAlphaBlendFactor * newColor.a) <alphaBlendOp> (dstAlphaBlendFactor * oldColor.a);
} else {
    finalColor = newColor;
}

finalColor = finalColor & colorWriteMask;

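If blendEnable is set to VK_FALSE, the new color from the fragment shader is passed through unmodified. Otherwise, the two mixing operations are performed to compute a new color. The resulting color is AND'd with the colorWriteMask to determine which channels are actually passed through.

The most common way to use color blending is to implement alpha blending, where we want the new color to be blended with the old color based on its opacity. The finalColor should then be computed as follows: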

finalColor.rgb = newAlpha * newColor + (1 - newAlpha) * oldColor;
finalColor.a = newAlpha;

This can be accomplished with the following parameters:

colorBlendAttachment.blendEnable = VK_TRUE;
colorBlendAttachment.srcColorBlendFactor = VK_BLEND_FACTOR_SRC_ALPHA;
colorBlendAttachment.dstColorBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA;
colorBlendAttachment.colorBlendOp = VK_BLEND_OP_ADD;
colorBlendAttachment.srcAlphaBlendFactor = VK_BLEND_FACTOR_ONE;
colorBlendAttachment.dstAlphaBlendFactor = VK_BLEND_FACTOR_ZERO;
colorBlendAttachment.alphaBlendOp = VK_BLEND_OP_ADD;
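
To see what these settings compute, the blend can be reproduced on the CPU. This is an illustration under the factors listed above (SRC_ALPHA and ONE_MINUS_SRC_ALPHA for color, ONE and ZERO for alpha, ADD for both operations); Color and alphaBlend are hypothetical names, not Vulkan types.

```cpp
#include <cassert>

struct Color { float r, g, b, a; };

// CPU-side illustration of the alpha blending configuration above.
Color alphaBlend(const Color& newColor, const Color& oldColor) {
    const float srcFactor = newColor.a;         // VK_BLEND_FACTOR_SRC_ALPHA
    const float dstFactor = 1.0f - newColor.a;  // VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA
    return Color{
        srcFactor * newColor.r + dstFactor * oldColor.r,
        srcFactor * newColor.g + dstFactor * oldColor.g,
        srcFactor * newColor.b + dstFactor * oldColor.b,
        1.0f * newColor.a + 0.0f * oldColor.a   // ONE and ZERO: keep the new alpha
    };
}
```

Blending a half-transparent red (1, 0, 0, 0.5) over opaque blue (0, 0, 1, 1) gives (0.5, 0, 0.5, 0.5): an even mix of both colors, with the alpha taken from the new fragment.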

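You can find all of the possible operations in the VkBlendFactor and VkBlendOp enumerations in the specification.

The second structure references the array of structures for all of the framebuffers and allows you to set blend constants that you can use as blend factors in the aforementioned calculations.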

VkPipelineColorBlendStateCreateInfo colorBlending{};
colorBlending.sType = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO;
colorBlending.logicOpEnable = VK_FALSE;
colorBlending.logicOp = VK_LOGIC_OP_COPY; // Optional
colorBlending.attachmentCount = 1;
colorBlending.pAttachments = &colorBlendAttachment;
colorBlending.blendConstants[0] = 0.0f; // Optional
colorBlending.blendConstants[1] = 0.0f; // Optional
colorBlending.blendConstants[2] = 0.0f; // Optional
colorBlending.blendConstants[3] = 0.0f; // Optional

If you want to use the second method of blending (bitwise combination), you should set logicOpEnable to VK_TRUE. The bitwise operation can then be specified in the logicOp field. Note that this automatically disables the first method, as if you had set blendEnable to VK_FALSE for every attached framebuffer! The colorWriteMask is also used in this mode to determine which channels in the framebuffer will actually be affected. It is also possible to disable both modes, as we've done here, in which case the fragment colors are written to the framebuffer unmodified.
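
The bitwise method can be pictured as operating on the stored integer value of each channel. The sketch below models four of the available operations on a single 8-bit channel. LogicOp and applyLogicOp are hypothetical names chosen to mirror VK_LOGIC_OP_COPY, VK_LOGIC_OP_AND, VK_LOGIC_OP_OR and VK_LOGIC_OP_XOR; this is an illustration, not the API.

```cpp
#include <cassert>
#include <cstdint>

enum class LogicOp { Copy, And, Or, Xor };

// Illustration only: combines the new (source) and old (destination) value
// of one 8-bit color channel with a bitwise operation.
std::uint8_t applyLogicOp(LogicOp op, std::uint8_t src, std::uint8_t dst) {
    switch (op) {
        case LogicOp::Copy: return src;
        case LogicOp::And:  return static_cast<std::uint8_t>(src & dst);
        case LogicOp::Or:   return static_cast<std::uint8_t>(src | dst);
        case LogicOp::Xor:  return static_cast<std::uint8_t>(src ^ dst);
    }
    return src;  // unreachable for valid enum values
}
```

With Copy the result is simply the new value, which is why VK_LOGIC_OP_COPY is a natural placeholder when logic operations are disabled.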

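Pipeline layout

You can use uniform values in shaders, which are globals similar to dynamic state variables that can be changed at drawing time to alter the behavior of your shaders without having to recreate them. They are commonly used to pass the transformation matrix to the vertex shader, or to create texture samplers in the fragment shader.

These uniform values need to be specified during pipeline creation by creating a VkPipelineLayout object. Even though we won't be using them until a future chapter, we are still required to create an empty pipeline layout.

Create a class member to hold this object, because we'll refer to it from other functions at a later point in time: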

VkPipelineLayout pipelineLayout;

And then create the object in the createGraphicsPipeline function:

VkPipelineLayoutCreateInfo pipelineLayoutInfo{};
pipelineLayoutInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
pipelineLayoutInfo.setLayoutCount = 0; // Optional
pipelineLayoutInfo.pSetLayouts = nullptr; // Optional
pipelineLayoutInfo.pushConstantRangeCount = 0; // Optional
pipelineLayoutInfo.pPushConstantRanges = nullptr; // Optional

if (vkCreatePipelineLayout(device, &pipelineLayoutInfo, nullptr, &pipelineLayout) != VK_SUCCESS) {
    throw std::runtime_error("failed to create pipeline layout!");
}

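The structure also specifies push constants, which are another way of passing dynamic values to shaders that we may get into in a later chapter. The pipeline layout will be referenced throughout the program's lifetime, so it should be destroyed at the end: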

void cleanup() {
    vkDestroyPipelineLayout(device, pipelineLayout, nullptr);
    ...
}

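Conclusion

That's it for all of the fixed-function state! It's a lot of work to set all of this up from scratch, but the advantage is that we're now nearly fully aware of everything that is going on in the graphics pipeline. This reduces the chance of running into unexpected behavior because the default state of certain components is not what you expect.

There is, however, one more object to create before we can finally create the graphics pipeline: a render pass.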


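Render passes

Setup

Before we can finish creating the pipeline, we need to tell Vulkan about the framebuffer attachments (the buffers that rendering results are written to) that will be used while rendering. We need to specify how many color and depth buffers there will be, how many samples to use for each of them and how their contents should be handled throughout the rendering operations. All of this information is wrapped in a render pass object, for which we'll create a new createRenderPass function. Call this function from initVulkan before createGraphicsPipeline.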

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createRenderPass();
    createGraphicsPipeline();
}

...

void createRenderPass() {

}

Attachment description

In our case we'll have just a single color buffer attachment, represented by one of the images from the swap chain.

void createRenderPass() {
    VkAttachmentDescription colorAttachment{};
    colorAttachment.format = swapChainImageFormat;
    colorAttachment.samples = VK_SAMPLE_COUNT_1_BIT;
}

The format of the color attachment should match the format of the swap chain images, and since we're not doing anything with multisampling yet, we'll stick to 1 sample.

colorAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
colorAttachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;

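The loadOp and storeOp determine what to do with the data in the attachment before rendering and after rendering. We have the following choices for loadOp:

VK_ATTACHMENT_LOAD_OP_LOAD: preserve the existing contents of the attachment
VK_ATTACHMENT_LOAD_OP_CLEAR: clear the values to a constant at the start
VK_ATTACHMENT_LOAD_OP_DONT_CARE: existing contents are undefined; we don't care about them

In our case we're going to use the clear operation to clear the framebuffer to black before drawing a new frame. There are only two possibilities for storeOp:

VK_ATTACHMENT_STORE_OP_STORE: rendered contents will be stored in memory and can be read later
VK_ATTACHMENT_STORE_OP_DONT_CARE: contents of the framebuffer will be undefined after the rendering operation

We're interested in seeing the rendered triangle on the screen, so we're going with the store operation here.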

colorAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
colorAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;

The loadOp and storeOp apply to color and depth data, and stencilLoadOp / stencilStoreOp apply to stencil data. Our application won't do anything with the stencil buffer, so the results of loading and storing are irrelevant.

colorAttachment.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
colorAttachment.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;

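Textures and framebuffers in Vulkan are represented by VkImage objects with a certain pixel format, but the layout of the pixels in memory can change based on what you're trying to do with an image.

Some of the most common layouts are:

VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL: images used as color attachment
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR: images to be presented in the swap chain
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL: images to be used as destination for a memory copy operation

We'll discuss this topic in more depth in the texturing chapter. What's important to know right now is that images need to be transitioned to specific layouts that are suitable for the operation they're going to be involved in next.

The initialLayout specifies which layout the image will have before the render pass begins. The finalLayout specifies the layout to automatically transition to when the render pass finishes. Using VK_IMAGE_LAYOUT_UNDEFINED for initialLayout means that we don't care what previous layout the image was in. The caveat of this special value is that the contents of the image are not guaranteed to be preserved, but that doesn't matter since we're going to clear it anyway. We want the image to be ready for presentation using the swap chain after rendering, which is why we use VK_IMAGE_LAYOUT_PRESENT_SRC_KHR as finalLayout.

Subpasses and attachment references

A single render pass can consist of multiple subpasses. Subpasses are subsequent rendering operations that depend on the contents of framebuffers in previous passes, for example a sequence of post-processing effects that are applied one after another. If you group these rendering operations into one render pass, Vulkan is able to reorder the operations and conserve memory bandwidth for possibly better performance. For our very first triangle, however, we'll stick to a single subpass.

Every subpass references one or more of the attachments that we've described using the structure in the previous sections. These references are themselves VkAttachmentReference structs that look like this: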

VkAttachmentReference colorAttachmentRef{};
colorAttachmentRef.attachment = 0;
colorAttachmentRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;

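The attachment parameter specifies which attachment to reference by its index in the attachment descriptions array. Our array consists of a single VkAttachmentDescription, so its index is 0. The layout specifies which layout we would like the attachment to have during a subpass that uses this reference. Vulkan will automatically transition the attachment to this layout when the subpass is started. We intend to use the attachment as a color buffer, and the VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL layout will give us the best performance, as its name implies.

The subpass is described using a VkSubpassDescription structure: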

VkSubpassDescription subpass{};
subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;

Vulkan may also support compute subpasses in the future, so we have to be explicit about this being a graphics subpass. Next, we specify the reference to the color attachment:

subpass.colorAttachmentCount = 1;
subpass.pColorAttachments = &colorAttachmentRef;

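The index of the attachment in this array is directly referenced from the fragment shader with the layout(location = 0) out vec4 outColor directive!

The following other types of attachments can be referenced by a subpass:

pInputAttachments: attachments that are read from a shader
pResolveAttachments: attachments used for multisampling color attachments
pDepthStencilAttachment: attachment for depth and stencil data
pPreserveAttachments: attachments that are not used by this subpass, but for which the data must be preserved

Render pass

Now that the attachment and a basic subpass referencing it have been described, we can create the render pass itself. Create a new class member variable to hold the VkRenderPass object right above the pipelineLayout variable: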

VkRenderPass renderPass;
VkPipelineLayout pipelineLayout;

The render pass object can then be created by filling in the VkRenderPassCreateInfo structure with an array of attachments and subpasses. The VkAttachmentReference objects reference attachments using the indices of this array.

VkRenderPassCreateInfo renderPassInfo{};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
renderPassInfo.attachmentCount = 1;
renderPassInfo.pAttachments = &colorAttachment;
renderPassInfo.subpassCount = 1;
renderPassInfo.pSubpasses = &subpass;

if (vkCreateRenderPass(device, &renderPassInfo, nullptr, &renderPass) != VK_SUCCESS) {
    throw std::runtime_error("failed to create render pass!");
}

Just like the pipeline layout, the render pass will be referenced throughout the program, so it should only be cleaned up at the end:

void cleanup() {
    vkDestroyPipelineLayout(device, pipelineLayout, nullptr);
    vkDestroyRenderPass(device, renderPass, nullptr);
    ...
}

That was a lot of work, but in the next chapter it all comes together to finally create the graphics pipeline object!


Conclusion

We can now combine all of the structures and objects from the previous chapters to create the graphics pipeline! Here are the types of objects we have now, as a quick recap:

Shader stages: the shader modules that define the functionality of the programmable stages of the graphics pipeline
Fixed-function state: all of the structures that define the fixed-function stages of the pipeline, like input assembly, rasterizer, viewport and color blending
Pipeline layout: the uniform and push values referenced by the shader that can be updated at draw time
Render pass: the attachments referenced by the pipeline stages and their usage

All of these combined fully define the functionality of the graphics pipeline, so we can now begin filling in the VkGraphicsPipelineCreateInfo structure at the end of the createGraphicsPipeline function. This has to happen before the calls to vkDestroyShaderModule, because the shader modules are still needed during pipeline creation.

VkGraphicsPipelineCreateInfo pipelineInfo{};
pipelineInfo.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
pipelineInfo.stageCount = 2;
pipelineInfo.pStages = shaderStages;

We start by referencing the array of VkPipelineShaderStageCreateInfo structs.

pipelineInfo.pVertexInputState = &vertexInputInfo;
pipelineInfo.pInputAssemblyState = &inputAssembly;
pipelineInfo.pViewportState = &viewportState;
pipelineInfo.pRasterizationState = &rasterizer;
pipelineInfo.pMultisampleState = &multisampling;
pipelineInfo.pDepthStencilState = nullptr; // Optional
pipelineInfo.pColorBlendState = &colorBlending;
pipelineInfo.pDynamicState = &dynamicState;

Then we reference all of the structures describing the fixed-function stage.

pipelineInfo.layout = pipelineLayout;

After that comes the pipeline layout, which is a Vulkan handle rather than a struct pointer.

pipelineInfo.renderPass = renderPass;
pipelineInfo.subpass = 0;

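And finally we have the reference to the render pass and the index of the subpass where this graphics pipeline will be used. It is also possible to use other render passes with this pipeline instead of this specific instance, but they have to be compatible with renderPass. The requirements for compatibility are described in the Vulkan specification, but we won't be using that feature in this tutorial.

pipelineInfo.basePipelineHandle = VK_NULL_HANDLE; // Optional
pipelineInfo.basePipelineIndex = -1; // Optional

There are actually two more parameters: basePipelineHandle and basePipelineIndex. Vulkan allows you to create a new graphics pipeline by deriving from an existing pipeline. The idea of pipeline derivatives is that it is less expensive to set up pipelines when they have much functionality in common with an existing pipeline, and that switching between pipelines from the same parent can also be done quicker. You can either specify the handle of an existing pipeline with basePipelineHandle or reference another pipeline that is about to be created by index with basePipelineIndex. Right now there is only a single pipeline, so we'll simply specify a null handle and an invalid index. These values are only used if the VK_PIPELINE_CREATE_DERIVATIVE_BIT flag is also specified in the flags field of VkGraphicsPipelineCreateInfo.

Now prepare for the final step by creating a class member to hold the VkPipeline object: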

VkPipeline graphicsPipeline;

And finally create the graphics pipeline:

if (vkCreateGraphicsPipelines(device, VK_NULL_HANDLE, 1, &pipelineInfo, nullptr, &graphicsPipeline) != VK_SUCCESS) {
    throw std::runtime_error("failed to create graphics pipeline!");
}

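The vkCreateGraphicsPipelines function actually has more parameters than the usual object creation functions in Vulkan. It is designed to take multiple VkGraphicsPipelineCreateInfo objects and create multiple VkPipeline objects in a single call.

The second parameter, for which we've passed VK_NULL_HANDLE, references an optional VkPipelineCache object. A pipeline cache can be used to store and reuse data relevant to pipeline creation across multiple calls to vkCreateGraphicsPipelines, and even across program executions. This makes it possible to significantly speed up pipeline creation at a later time. We'll get into this in the pipeline cache chapter.

The graphics pipeline is required for all common drawing operations, so it should also only be destroyed at the end of the program: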

void cleanup() {
    vkDestroyPipeline(device, graphicsPipeline, nullptr);
    vkDestroyPipelineLayout(device, pipelineLayout, nullptr);
    ...
}

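Now run your program to confirm that all this hard work has resulted in a successful pipeline creation! We are already getting quite close to seeing something pop up on the screen. In the next couple of chapters we'll set up the actual framebuffers from the swap chain images and prepare the drawing commands.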


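Drawing

Framebuffers

We've talked a lot about framebuffers in the past few chapters, and we've set up the render pass to expect a single framebuffer with the same format as the swap chain images, but we haven't actually created any yet.

The attachments specified during render pass creation are bound by wrapping them into a VkFramebuffer object. A framebuffer object references all of the VkImageView objects that represent the attachments. In our case that will be only a single one: the color attachment. However, the image that we have to use for the attachment depends on which image the swap chain returns when we retrieve one for presentation. That means that we have to create a framebuffer for all of the images in the swap chain and use the one that corresponds to the retrieved image at drawing time.

To that end, create another std::vector class member to hold the framebuffers: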

std::vector<VkFramebuffer> swapChainFramebuffers;

We'll create the objects for this array in a new function createFramebuffers that is called from initVulkan right after creating the graphics pipeline:

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createRenderPass();
    createGraphicsPipeline();
    createFramebuffers();
}

...

void createFramebuffers() {

}

Start by resizing the container to hold all of the framebuffers:

void createFramebuffers() {
    swapChainFramebuffers.resize(swapChainImageViews.size());
}

We'll then iterate through the image views and create framebuffers from them:

for (size_t i = 0; i < swapChainImageViews.size(); i++) {
    VkImageView attachments[] = {
        swapChainImageViews[i]
    };

    VkFramebufferCreateInfo framebufferInfo{};
    framebufferInfo.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
    framebufferInfo.renderPass = renderPass;
    framebufferInfo.attachmentCount = 1;
    framebufferInfo.pAttachments = attachments;
    framebufferInfo.width = swapChainExtent.width;
    framebufferInfo.height = swapChainExtent.height;
    framebufferInfo.layers = 1;

    if (vkCreateFramebuffer(device, &framebufferInfo, nullptr, &swapChainFramebuffers[i]) != VK_SUCCESS) {
        throw std::runtime_error("failed to create framebuffer!");
    }
}

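As you can see, creation of framebuffers is quite straightforward. We first need to specify which renderPass the framebuffer needs to be compatible with. You can only use a framebuffer with the render passes that it is compatible with, which roughly means that they use the same number and type of attachments.

The attachmentCount and pAttachments parameters specify the VkImageView objects that should be bound to the respective attachment descriptions in the render pass pAttachments array.

The width and height parameters are self-explanatory, and layers refers to the number of layers in image arrays. Our swap chain images are single images, so the number of layers is 1.

We should delete the framebuffers before the image views and render pass that they are based on, but only after we've finished rendering: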

void cleanup() {
    for (auto framebuffer : swapChainFramebuffers) {
        vkDestroyFramebuffer(device, framebuffer, nullptr);
    }

    ...
}

We've now reached the milestone where we have all of the objects that are required for rendering. In the next chapter we're going to write the first actual drawing commands.


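Command buffers

Commands in Vulkan, like drawing operations and memory transfers, are not executed directly using function calls. You have to record all of the operations you want to perform in command buffer objects. The advantage of this is that when we are ready to tell Vulkan what we want to do, all of the commands are submitted together, and Vulkan can process them more efficiently since all of them are available at once. In addition, this allows command recording to happen in multiple threads if so desired.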
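
The record-then-submit idea can be illustrated with a toy model in plain C++. This is deliberately not the Vulkan API: ToyCommandBuffer, record and submit are hypothetical names, and real command buffers store driver-specific data rather than callables.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model, not the Vulkan API: commands are stored first and only run
// when the whole buffer is submitted.
class ToyCommandBuffer {
public:
    // Recording only stores the command; nothing is executed yet.
    void record(std::function<void()> command) {
        commands_.push_back(std::move(command));
    }

    // Submission runs every recorded command in order.
    void submit() const {
        for (const auto& command : commands_) {
            command();
        }
    }

    std::size_t size() const { return commands_.size(); }

private:
    std::vector<std::function<void()>> commands_;
};
```

Because the full list of commands is known before anything runs, the consumer of the buffer can look at all of them at once, which is the property the paragraph above describes.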

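Command pools

We have to create a command pool before we can create command buffers. Command pools manage the memory that is used to store the buffers, and command buffers are allocated from them. Add a new class member to store a VkCommandPool: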

VkCommandPool commandPool;

Then create a new function createCommandPool and call it from initVulkan after the framebuffers were created.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createRenderPass();
    createGraphicsPipeline();
    createFramebuffers();
    createCommandPool();
}

...

void createCommandPool() {

}

Command pool creation only takes two parameters:

QueueFamilyIndices queueFamilyIndices = findQueueFamilies(physicalDevice);

VkCommandPoolCreateInfo poolInfo{};
poolInfo.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
poolInfo.flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT;
poolInfo.queueFamilyIndex = queueFamilyIndices.graphicsFamily.value();

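There are two possible flags for command pools:

VK_COMMAND_POOL_CREATE_TRANSIENT_BIT: hint that command buffers are rerecorded with new commands very often (may change memory allocation behavior)
VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT: allow command buffers to be rerecorded individually; without this flag they all have to be reset together

We will be recording a command buffer every frame, so we want to be able to reset and rerecord over it. Thus, we need to set the VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT flag for our command pool.

Command buffers are executed by submitting them on one of the device queues, like the graphics and presentation queues. Each command pool can only allocate command buffers that are submitted on a single type of queue. We're going to record commands for drawing, which is why we've chosen the graphics queue family.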

if (vkCreateCommandPool(device, &poolInfo, nullptr, &commandPool) != VK_SUCCESS) {
    throw std::runtime_error("failed to create command pool!");
}

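Finish creating the command pool using the vkCreateCommandPool function. It doesn't have any special parameters. Commands will be used throughout the program to draw things on the screen, so the pool should only be destroyed at the end: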

void cleanup() {
    vkDestroyCommandPool(device, commandPool, nullptr);

    ...
}

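Command buffer allocation

We can now start allocating command buffers.

Create a VkCommandBuffer object as a class member. Command buffers are automatically freed when their command pool is destroyed, so we don't need explicit cleanup.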

VkCommandBuffer commandBuffer;

We'll now start working on a createCommandBuffer function to allocate a single command buffer from the command pool.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createRenderPass();
    createGraphicsPipeline();
    createFramebuffers();
    createCommandPool();
    createCommandBuffer();
}

...

void createCommandBuffer() {

}

Command buffers are allocated with the vkAllocateCommandBuffers function, which takes a VkCommandBufferAllocateInfo struct as parameter that specifies the command pool and the number of buffers to allocate:

VkCommandBufferAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
allocInfo.commandPool = commandPool;
allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
allocInfo.commandBufferCount = 1;

if (vkAllocateCommandBuffers(device, &allocInfo, &commandBuffer) != VK_SUCCESS) {
    throw std::runtime_error("failed to allocate command buffers!");
}

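The level parameter specifies whether the allocated command buffers are primary or secondary command buffers.

VK_COMMAND_BUFFER_LEVEL_PRIMARY: can be submitted to a queue for execution, but cannot be called from other command buffers
VK_COMMAND_BUFFER_LEVEL_SECONDARY: cannot be submitted directly, but can be called from primary command buffers

We won't make use of the secondary command buffer functionality here, but you can imagine that it is helpful for reusing common operations from primary command buffers.

Since we are only allocating one command buffer, the commandBufferCount parameter is just one.

Command buffer recording

We'll now begin working on the recordCommandBuffer function, which writes the commands we want to execute into a command buffer. The VkCommandBuffer to use is passed in as a parameter, as well as the index of the current swap chain image we want to write to.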

void recordCommandBuffer(VkCommandBuffer commandBuffer, uint32_t imageIndex) {

}

We always begin recording a command buffer by calling vkBeginCommandBuffer with a small VkCommandBufferBeginInfo structure as argument that specifies some details about the usage of this specific command buffer.

VkCommandBufferBeginInfo beginInfo{};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = 0; // Optional
beginInfo.pInheritanceInfo = nullptr; // Optional

if (vkBeginCommandBuffer(commandBuffer, &beginInfo) != VK_SUCCESS) {
    throw std::runtime_error("failed to begin recording command buffer!");
}

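The flags parameter specifies how we're going to use the command buffer. The following values are available:

VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT: the command buffer will be rerecorded right after executing it once
VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT: this is a secondary command buffer that will be entirely within a single render pass
VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT: the command buffer can be resubmitted while it is also already pending execution

None of these flags are applicable for us right now.

The pInheritanceInfo parameter is only relevant for secondary command buffers. It specifies which state to inherit from the calling primary command buffers.

If the command buffer was already recorded once, then a call to vkBeginCommandBuffer will implicitly reset it. It's not possible to append commands to a buffer at a later time.

Starting a render pass

Drawing starts by beginning the render pass with vkCmdBeginRenderPass. The render pass is configured using some parameters in a VkRenderPassBeginInfo struct.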

VkRenderPassBeginInfo renderPassInfo{};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
renderPassInfo.renderPass = renderPass;
renderPassInfo.framebuffer = swapChainFramebuffers[imageIndex];

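The first parameters are the render pass itself and the attachments to bind. We created a framebuffer for each swap chain image, where it is specified as a color attachment. Thus we need to bind the framebuffer for the swap chain image we want to draw to. Using the imageIndex parameter that was passed in, we can pick the right framebuffer for the current swap chain image.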

renderPassInfo.renderArea.offset = {0, 0};
renderPassInfo.renderArea.extent = swapChainExtent;

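The next two parameters define the size of the render area. The render area defines where shader loads and stores will take place. Pixels outside this region will have undefined values. It should match the size of the attachments for best performance.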

VkClearValue clearColor = {{{0.0f, 0.0f, 0.0f, 1.0f}}};
renderPassInfo.clearValueCount = 1;
renderPassInfo.pClearValues = &clearColor;

The last two parameters define the clear values to use for VK_ATTACHMENT_LOAD_OP_CLEAR, which we used as the load operation for the color attachment. I've defined the clear color to simply be black with 100% opacity.

vkCmdBeginRenderPass(commandBuffer, &renderPassInfo, VK_SUBPASS_CONTENTS_INLINE);

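The render pass can now begin. All of the functions that record commands can be recognized by their vkCmd prefix. They all return void, so there is no error handling until we've finished recording.

The first parameter for every command is always the command buffer to record the command to. The second parameter specifies the details of the render pass we've just provided. The final parameter controls how the drawing commands within the render pass will be provided. It can have one of two values:

VK_SUBPASS_CONTENTS_INLINE: The render pass commands will be embedded in the primary command buffer itself and no secondary command buffers will be executed.
VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS: The render pass commands will be executed from secondary command buffers.

We will not be using secondary command buffers, so we'll go with the first option.

Basic drawing commands

We can now bind the graphics pipeline: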

vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, graphicsPipeline);

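The second parameter specifies whether the pipeline object is a graphics or a compute pipeline. We've now told Vulkan which operations to execute in the graphics pipeline and which attachment to use in the fragment shader.

As noted in the fixed functions chapter, we specified the viewport and scissor state for this pipeline to be dynamic. So we need to set them in the command buffer before issuing our draw command: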

VkViewport viewport{};
viewport.x = 0.0f;
viewport.y = 0.0f;
viewport.width = static_cast<float>(swapChainExtent.width);
viewport.height = static_cast<float>(swapChainExtent.height);
viewport.minDepth = 0.0f;
viewport.maxDepth = 1.0f;
vkCmdSetViewport(commandBuffer, 0, 1, &viewport);

VkRect2D scissor{};
scissor.offset = {0, 0};
scissor.extent = swapChainExtent;
vkCmdSetScissor(commandBuffer, 0, 1, &scissor);

Now we are ready to issue the draw command for the triangle:

vkCmdDraw(commandBuffer, 3, 1, 0, 0);

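The actual vkCmdDraw function is a bit anticlimactic, but it's so simple because of all the information we specified in advance. It has the following parameters, aside from the command buffer:

vertexCount: Even though we don't have a vertex buffer, we technically still have 3 vertices to draw.
instanceCount: Used for instanced rendering, use 1 if you're not doing that.
firstVertex: Used as an offset into the vertex buffer, defines the lowest value of gl_VertexIndex.
firstInstance: Used as an offset for instanced rendering, defines the lowest value of gl_InstanceIndex.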
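
To make the four parameters concrete, the sketch below enumerates the (gl_VertexIndex, gl_InstanceIndex) pairs that a non-indexed draw would produce. It is a plain C++ model rather than Vulkan code, and the names Invocation and enumerateDraw are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one vertex shader invocation.
struct Invocation {
    uint32_t vertexIndex;    // value seen as gl_VertexIndex
    uint32_t instanceIndex;  // value seen as gl_InstanceIndex
};

// Lists the index pairs produced by a non-indexed draw that takes the same
// four parameters as vkCmdDraw (the command buffer is left out).
std::vector<Invocation> enumerateDraw(uint32_t vertexCount, uint32_t instanceCount,
                                      uint32_t firstVertex, uint32_t firstInstance) {
    std::vector<Invocation> result;
    for (uint32_t i = 0; i < instanceCount; ++i) {
        for (uint32_t v = 0; v < vertexCount; ++v) {
            result.push_back({firstVertex + v, firstInstance + i});
        }
    }
    return result;
}
```

Calling enumerateDraw(3, 1, 0, 0) mirrors the draw call above: three invocations with vertex indices 0 to 2, all in instance 0.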

Finishing up

The render pass can now be ended:

vkCmdEndRenderPass(commandBuffer);

And we've finished recording the command buffer:

if (vkEndCommandBuffer(commandBuffer) != VK_SUCCESS) {
    throw std::runtime_error("failed to record command buffer!");
}

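In the next chapter we'll write the code for the main loop, which will acquire an image from the swap chain, record and execute a command buffer, then return the finished image to the swap chain.

C++ code / Vertex shader / Fragment shader

Rendering and presentation

This is the chapter where everything is going to come together. We're going to write the drawFrame function that will be called from the main loop to put the triangle on the screen. Let's start by creating the function and calling it from mainLoop: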

void mainLoop() {
    while (!glfwWindowShouldClose(window)) {
        glfwPollEvents();
        drawFrame();
    }
}

...

void drawFrame() {

}

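Outline of a frame

At a high level, rendering a frame in Vulkan consists of a common set of steps:

Wait for the previous frame to finish
Acquire an image from the swap chain
Record a command buffer which draws the scene onto that image
Submit the recorded command buffer
Present the swap chain image

While we will expand the drawing function in later chapters, for now this is the core of our render loop.

Synchronization

A core design philosophy in Vulkan is that synchronization of execution on the GPU is explicit. The order of operations is up to us to define using various synchronization primitives. This means that many Vulkan API calls are asynchronous: the functions return before the operations have finished.

In this chapter, there are a number of events that we need to order explicitly because they happen on the GPU:

Acquire an image from the swap chain
Execute commands that draw onto the acquired image
Present that image to the screen, returning it to the swap chain

Each of these events is set in motion using a single function call, but they are all executed asynchronously. The function calls return before the operations are actually finished, and the order of execution is also undefined. That is unfortunate, because each of the operations depends on the previous one finishing. Thus we need to explore which primitives we can use to achieve the desired ordering.

Semaphores

A semaphore is used to add order between queue operations. Queue operations refer to the work we submit to a queue, either in a command buffer or from within a function, as we will see later. Examples of queues are the graphics queue and the presentation queue. Semaphores are used both to order work inside the same queue and between different queues.

There happen to be two kinds of semaphores in Vulkan, binary and timeline. Because only binary semaphores are used in this tutorial, we will not discuss timeline semaphores. From here on, the term semaphore refers exclusively to binary semaphores.

A semaphore is either unsignaled or signaled. It begins life as unsignaled. The way we use a semaphore to order queue operations is by providing the same semaphore as a 'signal' semaphore in one queue operation and as a 'wait' semaphore in another. For example, say we have a semaphore S and queue operations A and B that we want to execute in order. What we tell Vulkan is that operation A will 'signal' semaphore S when it finishes executing, and operation B will 'wait' on semaphore S before it begins executing. When operation A finishes, semaphore S is signaled, while operation B won't start until S is signaled. After operation B begins executing, semaphore S is automatically reset to unsignaled, allowing it to be used again.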
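
The A, B and S example can be traced with a tiny single-threaded model. This is only a sketch of the state changes, not how a driver implements semaphores; BinarySemaphoreModel and runOrdered are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a binary semaphore: it is either signaled or unsignaled.
struct BinarySemaphoreModel {
    bool signaled = false;  // semaphores start out unsignaled
};

// Simulates operation A (signals S when it finishes) followed by
// operation B (waits on S). Returns the order in which they ran.
std::vector<std::string> runOrdered(BinarySemaphoreModel& s) {
    std::vector<std::string> log;

    // B cannot start yet because S is still unsignaled.
    assert(!s.signaled);

    // A runs to completion and signals S.
    log.push_back("A");
    s.signaled = true;

    // B's wait is now satisfied. Starting B resets S automatically.
    if (s.signaled) {
        s.signaled = false;
        log.push_back("B");
    }
    return log;
}
```

After runOrdered returns, the log holds A followed by B and the semaphore is unsignaled again, ready for reuse.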

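Fences

A fence has a similar purpose, in that it is used to synchronize execution, but it is for ordering execution on the CPU, otherwise known as the host. Simply put, if the host needs to know when the GPU has finished something, we use a fence.

Similar to semaphores, fences are either in a signaled or unsignaled state. Whenever we submit work to execute, we can attach a fence to that work. When the work is finished, the fence is signaled. Then we can make the host wait for the fence to be signaled, guaranteeing that the work has finished before the host continues.

A concrete example is taking a screenshot. Say we have already done the necessary work on the GPU. Now we need to transfer the image from the GPU over to the host and then save the memory to a file. We have a command buffer A which executes the transfer, and a fence F. We submit command buffer A with fence F, then immediately tell the host to wait for F to signal. This causes the host to block until command buffer A finishes execution. Thus we are safe to let the host save the file to disk, as the memory transfer has completed.

Creating the synchronization objects

We'll need one semaphore to signal that an image has been acquired from the swap chain, another one to signal that rendering has finished and presentation can happen, and a fence to make sure only one frame is rendering at a time.

Create three class members to store these semaphore objects and the fence object: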

VkSemaphore imageAvailableSemaphore;
VkSemaphore renderFinishedSemaphore;
VkFence inFlightFence;

Creating semaphores requires filling in the VkSemaphoreCreateInfo, but in the current version of the API it doesn't actually have any required fields besides sType:

void createSyncObjects() {
    VkSemaphoreCreateInfo semaphoreInfo{};
    semaphoreInfo.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO;
}

Creating a fence requires filling in the VkFenceCreateInfo:

VkFenceCreateInfo fenceInfo{};
fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;

Creating the semaphores and fence follows the familiar pattern with vkCreateSemaphore and vkCreateFence:

if (vkCreateSemaphore(device, &semaphoreInfo, nullptr, &imageAvailableSemaphore) != VK_SUCCESS ||
    vkCreateSemaphore(device, &semaphoreInfo, nullptr, &renderFinishedSemaphore) != VK_SUCCESS ||
    vkCreateFence(device, &fenceInfo, nullptr, &inFlightFence) != VK_SUCCESS) {
    throw std::runtime_error("failed to create semaphores!");
}

The semaphores and fence should be cleaned up at the end of the program, when all commands have finished and no more synchronization is necessary:

void cleanup() {
    vkDestroySemaphore(device, imageAvailableSemaphore, nullptr);
    vkDestroySemaphore(device, renderFinishedSemaphore, nullptr);
    vkDestroyFence(device, inFlightFence, nullptr);
}

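Waiting for the previous frame

At the start of the frame, we want to wait until the previous frame has finished, so that the command buffer and semaphores are available to use. To do that, we call vkWaitForFences: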

void drawFrame() {
    vkWaitForFences(device, 1, &inFlightFence, VK_TRUE, UINT64_MAX);
}

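The vkWaitForFences function takes an array of fences and waits on the host for either any or all of the fences to be signaled before returning. The VK_TRUE we pass here indicates that we want to wait for all fences, but in the case of a single one it doesn't matter. This function also has a timeout parameter that we set to the maximum value of a 64 bit unsigned integer, UINT64_MAX, which effectively disables the timeout.

After waiting, we need to manually reset the fence to the unsignaled state with the vkResetFences call: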

    vkResetFences(device, 1, &inFlightFence);

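Before we can proceed, there is a slight hiccup in our design. On the first frame we call drawFrame(), which immediately waits on inFlightFence to be signaled. inFlightFence is only signaled after a frame has finished rendering, yet since this is the first frame, there are no previous frames that could signal the fence! Thus vkWaitForFences() blocks indefinitely, waiting on something which will never happen.

Of the many solutions to this dilemma, there is a clever workaround built into the API: create the fence in the signaled state, so that the first call to vkWaitForFences() returns immediately.

To do this, we add the VK_FENCE_CREATE_SIGNALED_BIT flag to the VkFenceCreateInfo: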

void createSyncObjects() {
    ...

    VkFenceCreateInfo fenceInfo{};
    fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
    fenceInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT;

    ...
}
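
The effect of the flag can be illustrated with a toy host-side model. This is a sketch only: FenceModel, createFence and firstWaitBlocksForever are hypothetical names, and the boolean parameter merely stands in for VK_FENCE_CREATE_SIGNALED_BIT.

```cpp
#include <cassert>

// Toy model of a fence as seen from the host.
struct FenceModel {
    bool signaled;
};

// Mirrors the effect of creating a fence with or without the signaled bit.
FenceModel createFence(bool startSignaled) {
    return FenceModel{startSignaled};
}

// True if a wait without timeout would never return: the fence is
// unsignaled and no submitted work exists that could signal it.
bool firstWaitBlocksForever(const FenceModel& fence, bool workSubmitted) {
    return !fence.signaled && !workSubmitted;
}
```

On the first frame no work has been submitted yet, so only a fence that starts out signaled lets the first wait return.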

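Acquiring an image from the swap chain

The next thing we need to do in the drawFrame function is acquire an image from the swap chain. Recall that the swap chain is an extension feature, so we must use a function with the vk*KHR naming convention: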

void drawFrame() {
    ...

    uint32_t imageIndex;
    vkAcquireNextImageKHR(device, swapChain, UINT64_MAX, imageAvailableSemaphore, VK_NULL_HANDLE, &imageIndex);
}

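The first two parameters of vkAcquireNextImageKHR are the logical device and the swap chain from which we wish to acquire an image. The third parameter specifies a timeout in nanoseconds for an image to become available. Using the maximum value of a 64 bit unsigned integer means we effectively disable the timeout.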
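
A quick calculation shows why the maximum value counts as "effectively disabled". The helper below is a hypothetical illustration, not part of the tutorial's program; it converts a nanosecond timeout to years of 365.25 days.

```cpp
#include <cassert>
#include <cstdint>

// Converts a timeout in nanoseconds to years (assuming 365.25-day years).
double nanosecondsToYears(uint64_t ns) {
    const double secondsPerYear = 365.25 * 24.0 * 60.0 * 60.0;
    return static_cast<double>(ns) / 1e9 / secondsPerYear;
}
```

Passing UINT64_MAX yields a value between 584 and 585 years, far longer than any program will run.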

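The next two parameters specify synchronization objects that are to be signaled when the presentation engine is finished using the image. That's the point in time where we can start drawing to it. It is possible to specify a semaphore, a fence or both. We're going to use our imageAvailableSemaphore for that purpose here.

The last parameter specifies a variable to output the index of the swap chain image that has become available. The index refers to the VkImage in our swapChainImages array. We're going to use that index to pick the VkFramebuffer.

Recording the command buffer

With the imageIndex specifying the swap chain image to use in hand, we can now record the command buffer. First, we call vkResetCommandBuffer on the command buffer to make sure it is able to be recorded.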

vkResetCommandBuffer(commandBuffer, 0);

The second parameter of vkResetCommandBuffer is a VkCommandBufferResetFlagBits flag. Since we don't want to do anything special, we leave it as 0.

Now call the function recordCommandBuffer to record the commands we want.

recordCommandBuffer(commandBuffer, imageIndex);

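With a fully recorded command buffer, we can now submit it.

Submitting the command buffer

Queue submission and synchronization is configured through parameters in the VkSubmitInfo structure.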

VkSubmitInfo submitInfo{};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;

VkSemaphore waitSemaphores[] = {imageAvailableSemaphore};
VkPipelineStageFlags waitStages[] = {VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT};
submitInfo.waitSemaphoreCount = 1;
submitInfo.pWaitSemaphores = waitSemaphores;
submitInfo.pWaitDstStageMask = waitStages;

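The first three parameters specify which semaphores to wait on before execution begins and in which stage(s) of the pipeline to wait. We want to wait with writing colors to the image until it's available, so we're specifying the stage of the graphics pipeline that writes to the color attachment. That means that theoretically the implementation can already start executing our vertex shader and such while the image is not yet available. Each entry in the waitStages array corresponds to the semaphore with the same index in pWaitSemaphores.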

submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffer;

The next two parameters specify which command buffers to actually submit for execution. We simply submit the single command buffer we have.

VkSemaphore signalSemaphores[] = {renderFinishedSemaphore};
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = signalSemaphores;

The signalSemaphoreCount and pSignalSemaphores parameters specify which semaphores to signal once the command buffer(s) have finished execution. In our case we're using the renderFinishedSemaphore for that purpose.

if (vkQueueSubmit(graphicsQueue, 1, &submitInfo, inFlightFence) != VK_SUCCESS) {
    throw std::runtime_error("failed to submit draw command buffer!");
}

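We can now submit the command buffer to the graphics queue using vkQueueSubmit. The function takes an array of VkSubmitInfo structures as argument for efficiency when the workload is much larger. The last parameter references an optional fence that will be signaled when the command buffers finish execution. This allows us to know when it is safe for the command buffer to be reused, which is why we pass inFlightFence. Now on the next frame, the CPU will wait for this command buffer to finish executing before it records new commands into it.

Subpass dependencies

Remember that the subpasses in a render pass automatically take care of image layout transitions. These transitions are controlled by subpass dependencies, which specify memory and execution dependencies between subpasses. We have only a single subpass right now, but the operations right before and right after this subpass also count as implicit "subpasses".

There are two built-in dependencies that take care of the transition at the start of the render pass and at the end of the render pass, but the former does not occur at the right time. It assumes that the transition occurs at the start of the pipeline, but we haven't acquired the image yet at that point! There are two ways to deal with this problem. We could change the waitStages for the imageAvailableSemaphore to VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT to ensure that the render pass doesn't begin until the image is available, or we can make the render pass wait for the VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT stage. I've decided to go with the second option here, because it's a good excuse to have a look at subpass dependencies and how they work.

Subpass dependencies are specified in VkSubpassDependency structs. Go to the createRenderPass function and add one: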

VkSubpassDependency dependency{};
dependency.srcSubpass = VK_SUBPASS_EXTERNAL;
dependency.dstSubpass = 0;

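The first two fields specify the indices of the dependency and the dependent subpass. The special value VK_SUBPASS_EXTERNAL refers to the implicit subpass before or after the render pass depending on whether it is specified in srcSubpass or dstSubpass. The index 0 refers to our subpass, which is the first and only one. The dstSubpass must always be higher than srcSubpass to prevent cycles in the dependency graph (unless one of the subpasses is VK_SUBPASS_EXTERNAL).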

dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.srcAccessMask = 0;

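The next two fields specify the operations to wait on and the stages in which these operations occur. We need to wait for the swap chain to finish reading from the image before we can access it. This can be accomplished by waiting on the color attachment output stage itself.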

dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;

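The operations that should wait on this are in the color attachment stage and involve the writing of the color attachment. These settings will prevent the transition from happening until it's actually necessary (and allowed): when we want to start writing colors to it.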

renderPassInfo.dependencyCount = 1;
renderPassInfo.pDependencies = &dependency;

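The VkRenderPassCreateInfo struct has two fields to specify an array of dependencies.

Presentation

The last step of drawing a frame is submitting the result back to the swap chain to have it eventually show up on the screen. Presentation is configured through a VkPresentInfoKHR structure at the end of the drawFrame function.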

VkPresentInfoKHR presentInfo{};
presentInfo.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;

presentInfo.waitSemaphoreCount = 1;
presentInfo.pWaitSemaphores = signalSemaphores;

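The first two parameters specify which semaphores to wait on before presentation can happen, just like VkSubmitInfo. Since we want to wait on the command buffer to finish execution, and thus on our triangle being drawn, we take the semaphores which will be signaled and wait on them, which is why we use signalSemaphores.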

VkSwapchainKHR swapChains[] = {swapChain};
presentInfo.swapchainCount = 1;
presentInfo.pSwapchains = swapChains;
presentInfo.pImageIndices = &imageIndex;

The next two parameters specify the swap chains to present images to and the index of the image for each swap chain. This will almost always be a single one.

presentInfo.pResults = nullptr; // Optional

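There is one last optional parameter called pResults. It allows you to specify an array of VkResult values to check for every individual swap chain if presentation was successful. It's not necessary if you're only using a single swap chain, because you can simply use the return value of the present function.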

vkQueuePresentKHR(presentQueue, &presentInfo);

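The vkQueuePresentKHR function submits the request to present an image to the swap chain. We'll add error handling for both vkAcquireNextImageKHR and vkQueuePresentKHR in the next chapter, because their failure does not necessarily mean that the program should terminate, unlike the functions we've seen so far.

If you did everything correctly up to this point, then you should now see a colored triangle when you run your program.

This colored triangle may look a bit different from the one you're used to seeing in graphics tutorials. That's because this tutorial lets the shader interpolate in linear color space and converts to sRGB color space afterwards. See this blog post for a discussion of the difference.

With validation layers enabled, however, the program now crashes as soon as you close it. The messages printed to the terminal from debugCallback tell us why.

Remember that all of the operations in drawFrame are asynchronous. That means that when we exit the loop in mainLoop, drawing and presentation operations may still be going on. Cleaning up resources while that is happening is a bad idea.

To fix that problem, we should wait for the logical device to finish operations before exiting mainLoop and destroying the window: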

void mainLoop() {
    while (!glfwWindowShouldClose(window)) {
        glfwPollEvents();
        drawFrame();
    }

    vkDeviceWaitIdle(device);
}

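You can also wait for operations in a specific command queue to be finished with vkQueueWaitIdle. These functions can be used as a very rudimentary way to perform synchronization. You'll see that the program now exits without problems when closing the window.

Conclusion

A little over 900 lines of code later, we've finally gotten to the stage of seeing something pop up on the screen! Bootstrapping a Vulkan program is definitely a lot of work, but the take-away message is that Vulkan gives you an immense amount of control through its explicitness. I recommend you take some time now to reread the code and build a mental model of the purpose of all of the Vulkan objects in the program and how they relate to each other. We'll be building on top of that knowledge to extend the functionality of the program from this point on.

The next chapter will expand the render loop to handle multiple frames in flight.

C++ code / Vertex shader / Fragment shader

Frames in flight

Frames in flight

Right now our render loop has one glaring flaw. We are required to wait on the previous frame to finish before we can start rendering the next, which results in unnecessary idling of the host.

The way to fix this is to allow multiple frames to be in flight at once, that is to say, allow the rendering of one frame to not interfere with the recording of the next. How do we do this? Any resource that is accessed and modified during rendering must be duplicated. Thus, we need multiple command buffers, semaphores, and fences. In later chapters we will also add multiple instances of other resources, so we will see this concept reappear.

Start by adding a constant at the top of the program that defines how many frames should be processed concurrently: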

const int MAX_FRAMES_IN_FLIGHT = 2;

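We choose the number 2 because we don't want the CPU to get too far ahead of the GPU. With 2 frames in flight, the CPU and the GPU can be working on their own tasks at the same time. If the CPU finishes early, it will wait until the GPU finishes rendering before submitting more work. With 3 or more frames in flight, the CPU could get ahead of the GPU, adding frames of latency. Generally, extra latency isn't desired. But giving the application control over the number of frames in flight is another example of Vulkan being explicit.

Each frame should have its own command buffer, set of semaphores, and fence. Rename and then change them to be std::vectors of the objects: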

std::vector<VkCommandBuffer> commandBuffers;

...

std::vector<VkSemaphore> imageAvailableSemaphores;
std::vector<VkSemaphore> renderFinishedSemaphores;
std::vector<VkFence> inFlightFences;

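Then we need to create multiple command buffers. Rename createCommandBuffer to createCommandBuffers. Next we need to resize the command buffers vector to the size of MAX_FRAMES_IN_FLIGHT, alter the VkCommandBufferAllocateInfo to contain that many command buffers, and then change the destination to our vector of command buffers: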

void createCommandBuffers() {
    commandBuffers.resize(MAX_FRAMES_IN_FLIGHT);
    ...
    allocInfo.commandBufferCount = (uint32_t) commandBuffers.size();

    if (vkAllocateCommandBuffers(device, &allocInfo, commandBuffers.data()) != VK_SUCCESS) {
        throw std::runtime_error("failed to allocate command buffers!");
    }
}

The createSyncObjects function should be changed to create all of the objects:

void createSyncObjects() {
    imageAvailableSemaphores.resize(MAX_FRAMES_IN_FLIGHT);
    renderFinishedSemaphores.resize(MAX_FRAMES_IN_FLIGHT);
    inFlightFences.resize(MAX_FRAMES_IN_FLIGHT);

    VkSemaphoreCreateInfo semaphoreInfo{};
    semaphoreInfo.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO;

    VkFenceCreateInfo fenceInfo{};
    fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
    fenceInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT;

    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        if (vkCreateSemaphore(device, &semaphoreInfo, nullptr, &imageAvailableSemaphores[i]) != VK_SUCCESS ||
            vkCreateSemaphore(device, &semaphoreInfo, nullptr, &renderFinishedSemaphores[i]) != VK_SUCCESS ||
            vkCreateFence(device, &fenceInfo, nullptr, &inFlightFences[i]) != VK_SUCCESS) {

            throw std::runtime_error("failed to create synchronization objects for a frame!");
        }
    }
}

Similarly, they should also all be cleaned up:

void cleanup() {
    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        vkDestroySemaphore(device, renderFinishedSemaphores[i], nullptr);
        vkDestroySemaphore(device, imageAvailableSemaphores[i], nullptr);
        vkDestroyFence(device, inFlightFences[i], nullptr);
    }

    ...
}

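Remember, because command buffers are freed for us when we free the command pool, there is nothing extra to do for command buffer cleanup.

To use the right objects every frame, we need to keep track of the current frame. We will use a frame index for that purpose: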

uint32_t currentFrame = 0;

The drawFrame function can now be modified to use the right objects:

void drawFrame() {
    vkWaitForFences(device, 1, &inFlightFences[currentFrame], VK_TRUE, UINT64_MAX);
    vkResetFences(device, 1, &inFlightFences[currentFrame]);

    vkAcquireNextImageKHR(device, swapChain, UINT64_MAX, imageAvailableSemaphores[currentFrame], VK_NULL_HANDLE, &imageIndex);

    ...

    vkResetCommandBuffer(commandBuffers[currentFrame], 0);
    recordCommandBuffer(commandBuffers[currentFrame], imageIndex);

    ...

    submitInfo.pCommandBuffers = &commandBuffers[currentFrame];

    ...

    VkSemaphore waitSemaphores[] = {imageAvailableSemaphores[currentFrame]};

    ...

    VkSemaphore signalSemaphores[] = {renderFinishedSemaphores[currentFrame]};

    ...

    if (vkQueueSubmit(graphicsQueue, 1, &submitInfo, inFlightFences[currentFrame]) != VK_SUCCESS) {
        throw std::runtime_error("failed to submit draw command buffer!");
    }
}

Of course, we shouldn't forget to advance to the next frame every time:

void drawFrame() {
    ...

    currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;
}

By using the modulo (%) operator, we ensure that the frame index loops around after every MAX_FRAMES_IN_FLIGHT enqueued frames.
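
The wrap-around is easy to check in isolation. The sketch below replays the index update for a number of frames; frameIndexSequence is a hypothetical helper written only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

const uint32_t MAX_FRAMES_IN_FLIGHT = 2;

// Returns the frame index that each of the first `frameCount` frames uses.
std::vector<uint32_t> frameIndexSequence(uint32_t frameCount) {
    std::vector<uint32_t> sequence;
    uint32_t currentFrame = 0;
    for (uint32_t i = 0; i < frameCount; ++i) {
        sequence.push_back(currentFrame);
        // Same update as at the end of drawFrame.
        currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;
    }
    return sequence;
}
```

With two frames in flight, five consecutive frames use the indices 0, 1, 0, 1, 0, so each set of per-frame objects is reused every second frame.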

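We've now implemented all the needed synchronization to ensure that there are no more than MAX_FRAMES_IN_FLIGHT frames of work enqueued and that these frames are not stepping over each other. Note that it is fine for other parts of the code, like the final cleanup, to rely on more rough synchronization like vkDeviceWaitIdle. You should decide on which approach to use based on performance requirements.

To learn more about synchronization, have a look at this extensive overview by Khronos.

In the next chapter we'll deal with one more small thing that is required for a well-behaved Vulkan program.

C++ code / Vertex shader / Fragment shader

Swap chain recreation

Introduction

The application we have now successfully draws a triangle, but there are some circumstances that it isn't handling properly yet. It is possible for the window surface to change such that the swap chain is no longer compatible with it. One of the reasons that could cause this to happen is the size of the window changing. We have to catch these events and recreate the swap chain.

Recreating the swap chain

Create a new recreateSwapChain function that calls the creation functions for the objects that depend on the swap chain or the window size.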

void recreateSwapChain() {
    vkDeviceWaitIdle(device);

    createSwapChain();
    createImageViews();
    createFramebuffers();
}

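We first call vkDeviceWaitIdle, because just like in the last chapter, we shouldn't touch resources that may still be in use. Obviously, we'll have to recreate the swap chain itself. The image views need to be recreated because they are based directly on the swap chain images. Finally, the framebuffers directly depend on the swap chain images, and thus must be recreated as well.

To make sure that the old versions of these objects are cleaned up before recreating them, we should move some of the cleanup code to a separate function that we can call from the recreateSwapChain function. Let's call it cleanupSwapChain: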

void cleanupSwapChain() {

}

void recreateSwapChain() {
    vkDeviceWaitIdle(device);

    cleanupSwapChain();

    createSwapChain();
    createImageViews();
    createFramebuffers();
}

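Note that we don't recreate the render pass here for simplicity. In theory it can be possible for the swap chain image format to change during an application's lifetime, e.g. when moving a window from a standard range to a high dynamic range (HDR) monitor. This may require the application to recreate the render pass to make sure the change between dynamic ranges is properly reflected.

We'll move the cleanup code of all objects that are recreated as part of a swap chain refresh from cleanup to cleanupSwapChain: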

void cleanupSwapChain() {
    for (auto framebuffer : swapChainFramebuffers) {
        vkDestroyFramebuffer(device, framebuffer, nullptr);
    }

    for (auto imageView : swapChainImageViews) {
        vkDestroyImageView(device, imageView, nullptr);
    }

    vkDestroySwapchainKHR(device, swapChain, nullptr);
}

void cleanup() {
    cleanupSwapChain();

    vkDestroyPipeline(device, graphicsPipeline, nullptr);
    vkDestroyPipelineLayout(device, pipelineLayout, nullptr);

    vkDestroyRenderPass(device, renderPass, nullptr);

    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        vkDestroySemaphore(device, renderFinishedSemaphores[i], nullptr);
        vkDestroySemaphore(device, imageAvailableSemaphores[i], nullptr);
        vkDestroyFence(device, inFlightFences[i], nullptr);
    }

    vkDestroyCommandPool(device, commandPool, nullptr);

    vkDestroyDevice(device, nullptr);

    if (enableValidationLayers) {
        DestroyDebugUtilsMessengerEXT(instance, debugMessenger, nullptr);
    }

    vkDestroySurfaceKHR(instance, surface, nullptr);
    vkDestroyInstance(instance, nullptr);

    glfwDestroyWindow(window);

    glfwTerminate();
}

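Note that in chooseSwapExtent we already query the new window resolution to make sure that the swap chain images have the (new) right size, so there's no need to modify chooseSwapExtent (remember that we already had to use glfwGetFramebufferSize to get the resolution of the surface in pixels when creating the swap chain).

That's all it takes to recreate the swap chain! However, the disadvantage of this approach is that we need to stop all rendering before creating the new swap chain. It is possible to create a new swap chain while drawing commands on an image from the old swap chain are still in flight. You need to pass the previous swap chain to the oldSwapchain field in the VkSwapchainCreateInfoKHR struct and destroy the old swap chain as soon as you've finished using it.

Suboptimal or out-of-date swap chain

Now we just need to figure out when swap chain recreation is necessary and call our new recreateSwapChain function. Luckily, Vulkan will usually just tell us that the swap chain is no longer adequate during presentation. The vkAcquireNextImageKHR and vkQueuePresentKHR functions can return the following special values to indicate this.

VK_ERROR_OUT_OF_DATE_KHR: The swap chain has become incompatible with the surface and can no longer be used for rendering. Usually happens after a window resize.
VK_SUBOPTIMAL_KHR: The swap chain can still be used to successfully present to the surface, but the surface properties are no longer matched exactly.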

VkResult result = vkAcquireNextImageKHR(device, swapChain, UINT64_MAX, imageAvailableSemaphores[currentFrame], VK_NULL_HANDLE, &imageIndex);

if (result == VK_ERROR_OUT_OF_DATE_KHR) {
    recreateSwapChain();
    return;
} else if (result != VK_SUCCESS && result != VK_SUBOPTIMAL_KHR) {
    throw std::runtime_error("failed to acquire swap chain image!");
}

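If the swap chain turns out to be out of date when attempting to acquire an image, then it is no longer possible to present to it. Therefore we should immediately recreate the swap chain and try again in the next drawFrame call.

You could also decide to do that if the swap chain is suboptimal, but I've chosen to proceed anyway in that case because we've already acquired an image. Both VK_SUCCESS and VK_SUBOPTIMAL_KHR are considered "success" return codes.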

result = vkQueuePresentKHR(presentQueue, &presentInfo);

if (result == VK_ERROR_OUT_OF_DATE_KHR || result == VK_SUBOPTIMAL_KHR) {
    recreateSwapChain();
} else if (result != VK_SUCCESS) {
    throw std::runtime_error("failed to present swap chain image!");
}

currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;

The vkQueuePresentKHR function returns the same values with the same meaning. In this case we will also recreate the swap chain if it is suboptimal, because we want the best possible result.
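
The two policies differ only in how they treat a suboptimal swap chain, which the sketch below spells out as two pure functions. The enums are hypothetical stand-ins for the VkResult values and for what drawFrame does next; they are not Vulkan types.

```cpp
#include <cassert>

// Hypothetical stand-ins for the VkResult values discussed above.
enum class SwapchainResult { Success, Suboptimal, OutOfDate, OtherError };

// What drawFrame does in response.
enum class Action { Continue, Recreate, Fail };

// After vkAcquireNextImageKHR: a suboptimal swap chain is still used,
// because an image has already been acquired.
Action afterAcquire(SwapchainResult r) {
    switch (r) {
        case SwapchainResult::Success:
        case SwapchainResult::Suboptimal: return Action::Continue;
        case SwapchainResult::OutOfDate:  return Action::Recreate;
        default:                          return Action::Fail;
    }
}

// After vkQueuePresentKHR: a suboptimal swap chain is recreated as well.
Action afterPresent(SwapchainResult r) {
    switch (r) {
        case SwapchainResult::Success:    return Action::Continue;
        case SwapchainResult::Suboptimal:
        case SwapchainResult::OutOfDate:  return Action::Recreate;
        default:                          return Action::Fail;
    }
}
```

Keeping the decision separate from the Vulkan calls is a design choice for this illustration; the tutorial's code inlines the same checks in drawFrame.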

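Fixing a deadlock

If we try to run the code now, it is possible to encounter a deadlock. Debugging the code, we find that the application reaches vkWaitForFences but never continues past it. This is because when vkAcquireNextImageKHR returns VK_ERROR_OUT_OF_DATE_KHR, we recreate the swap chain and then return from drawFrame. But before that happens, the current frame's fence was waited upon and reset. Since we return immediately, no work is submitted for execution and the fence will never be signaled, causing vkWaitForFences to halt forever.

There is a simple fix, thankfully. Delay resetting the fence until after we know for sure we will be submitting work with it. Thus, if we return early, the fence is still signaled and vkWaitForFences won't deadlock the next time we use the same fence object.

The beginning of drawFrame should now look like this: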

vkWaitForFences(device, 1, &inFlightFences[currentFrame], VK_TRUE, UINT64_MAX);

uint32_t imageIndex;
VkResult result = vkAcquireNextImageKHR(device, swapChain, UINT64_MAX, imageAvailableSemaphores[currentFrame], VK_NULL_HANDLE, &imageIndex);

if (result == VK_ERROR_OUT_OF_DATE_KHR) {
    recreateSwapChain();
    return;
} else if (result != VK_SUCCESS && result != VK_SUBOPTIMAL_KHR) {
    throw std::runtime_error("failed to acquire swap chain image!");
}

// Only reset the fence if we are submitting work
vkResetFences(device, 1, &inFlightFences[currentFrame]);
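
The difference between the two orderings can be replayed with a toy model of one drawFrame call. This is a sketch under simplifying assumptions: FrameFence and drawFrameModel are hypothetical names, and submitted work is modelled as finishing immediately.

```cpp
#include <cassert>

// Toy fence, created in the signaled state as in createSyncObjects.
struct FrameFence {
    bool signaled = true;
};

// One simplified drawFrame call. `outOfDate` stands for vkAcquireNextImageKHR
// returning VK_ERROR_OUT_OF_DATE_KHR; `resetBeforeAcquire` selects the old,
// deadlock-prone ordering.
void drawFrameModel(FrameFence& fence, bool outOfDate, bool resetBeforeAcquire) {
    // vkWaitForFences would only return once the fence is signaled.
    assert(fence.signaled);

    if (resetBeforeAcquire) fence.signaled = false;

    if (outOfDate) return;  // early return: nothing is submitted

    if (!resetBeforeAcquire) fence.signaled = false;

    // Work is submitted; model the GPU finishing it and signaling the fence.
    fence.signaled = true;
}
```

With the old ordering, an out-of-date result leaves the fence unsignaled, so the next wait on it would never return. With the fixed ordering the fence stays signaled after the early return.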

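Handling resizes explicitly

Although many drivers and platforms trigger VK_ERROR_OUT_OF_DATE_KHR automatically after a window resize, it is not guaranteed to happen. That's why we'll add some extra code to also handle resizes explicitly. First add a new member variable that flags that a resize has happened: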

std::vector<VkFence> inFlightFences;

bool framebufferResized = false;

The drawFrame function should then be modified to also check for this flag:

if (result == VK_ERROR_OUT_OF_DATE_KHR || result == VK_SUBOPTIMAL_KHR || framebufferResized) {
    framebufferResized = false;
    recreateSwapChain();
} else if (result != VK_SUCCESS) {
    ...
}

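It is important to do this after vkQueuePresentKHR to ensure that the semaphores are in a consistent state, otherwise a signaled semaphore may never be properly waited upon. Now to actually detect resizes we can use the glfwSetFramebufferSizeCallback function in the GLFW framework to set up a callback: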

void initWindow() {
    glfwInit();

    glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);

    window = glfwCreateWindow(WIDTH, HEIGHT, "Vulkan", nullptr, nullptr);
    glfwSetFramebufferSizeCallback(window, framebufferResizeCallback);
}

static void framebufferResizeCallback(GLFWwindow* window, int width, int height) {

}

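The reason that we're creating a static function as a callback is because GLFW does not know how to properly call a member function with the right this pointer to our HelloTriangleApplication instance.

However, we do get a reference to the GLFWwindow in the callback and there is another GLFW function that allows you to store an arbitrary pointer inside of it: glfwSetWindowUserPointer: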

window = glfwCreateWindow(WIDTH, HEIGHT, "Vulkan", nullptr, nullptr);
glfwSetWindowUserPointer(window, this);
glfwSetFramebufferSizeCallback(window, framebufferResizeCallback);

This value can now be retrieved from within the callback with glfwGetWindowUserPointer to properly set the flag:

static void framebufferResizeCallback(GLFWwindow* window, int width, int height) {
    auto app = reinterpret_cast<HelloTriangleApplication*>(glfwGetWindowUserPointer(window));
    app->framebufferResized = true;
}
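
The same user-pointer pattern can be tried without GLFW. The sketch below uses a hypothetical FakeWindow type that stores one opaque pointer and one callback, mimicking the role the GLFW window plays here; App, attach and simulateResize are likewise made up for the illustration.

```cpp
#include <cassert>

// Hypothetical C-style window that stores one opaque user pointer and
// one resize callback, similar in spirit to a GLFWwindow.
struct FakeWindow {
    void* userPointer = nullptr;
    void (*resizeCallback)(FakeWindow*, int, int) = nullptr;
};

class App {
public:
    bool framebufferResized = false;

    void attach(FakeWindow& window) {
        window.userPointer = this;               // store `this` in the window
        window.resizeCallback = &App::onResize;  // a static function fits a plain function pointer
    }

private:
    // Static trampoline: it has no implicit `this`, so it recovers the
    // instance from the pointer stored in the window.
    static void onResize(FakeWindow* window, int /*width*/, int /*height*/) {
        auto* app = static_cast<App*>(window->userPointer);
        app->framebufferResized = true;
    }
};

// Stands in for the windowing library dispatching a resize event.
void simulateResize(FakeWindow& window, int width, int height) {
    if (window.resizeCallback) window.resizeCallback(&window, width, height);
}
```

After attaching an App to a FakeWindow, calling simulateResize sets the flag on that specific instance, which is the behaviour the tutorial relies on. The sketch uses static_cast because the stored pointer is a void*.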

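Now try to run the program and resize the window to see if the framebuffer is indeed resized properly with the window.

Handling minimization

There is another case where a swap chain may become out of date and that is a special kind of window resizing: window minimization. This case is special because it will result in a framebuffer size of 0. In this tutorial we will handle that by pausing until the window is in the foreground again by extending the recreateSwapChain function: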

void recreateSwapChain() {
    int width = 0, height = 0;
    glfwGetFramebufferSize(window, &width, &height);
    while (width == 0 || height == 0) {
        glfwGetFramebufferSize(window, &width, &height);
        glfwWaitEvents();
    }

    vkDeviceWaitIdle(device);

    ...
}

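The initial call to glfwGetFramebufferSize handles the case where the size is already correct and glfwWaitEvents would have nothing to wait on.

Congratulations, you've now finished your very first well-behaved Vulkan program! In the next chapter we're going to get rid of the hardcoded vertices in the vertex shader and actually use a vertex buffer.

C++ code / Vertex shader / Fragment shader

Vertex buffers

Vertex input description

Introduction

In the next few chapters, we're going to replace the hardcoded vertex data in the vertex shader with a vertex buffer in memory. We'll start with the easiest approach of creating a CPU visible buffer and using memcpy to copy the vertex data into it directly, and after that we'll see how to use a staging buffer to copy the vertex data to high performance memory.

Vertex shader

First change the vertex shader to no longer include the vertex data in the shader code itself. The vertex shader takes input from a vertex buffer using the in keyword.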

#version 450

layout(location = 0) in vec2 inPosition;
layout(location = 1) in vec3 inColor;

layout(location = 0) out vec3 fragColor;

void main() {
    gl_Position = vec4(inPosition, 0.0, 1.0);
    fragColor = inColor;
}

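The inPosition and inColor variables are vertex attributes. They're properties that are specified per-vertex in the vertex buffer, just like we manually specified a position and color per vertex using the two arrays. Make sure to recompile the vertex shader!

Just like fragColor, the layout(location = x) annotations assign indices to the inputs that we can later use to reference them. It is important to know that some types, like dvec3, use multiple slots. That means that the index after it must be at least 2 higher: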

layout(location = 0) in dvec3 inPosition;
layout(location = 2) in vec3 inColor;

You can find more info about the layout qualifier in the OpenGL wiki.
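
The slot rule can be written down as a small helper. The sketch below assumes, based on my reading of the Vulkan specification's location assignment rules, that only 64-bit vectors with three or four components take two locations and that every other scalar or vector takes one; the names InputDecl, locationsConsumed and assignLocations are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of a scalar or vector vertex input.
struct InputDecl {
    uint32_t bitsPerComponent;  // 32 for float/vec*, 64 for double/dvec*
    uint32_t componentCount;    // 1 to 4
};

// Number of consecutive locations an input consumes (assumed rule:
// only 64-bit vectors with three or four components need two).
uint32_t locationsConsumed(uint32_t bitsPerComponent, uint32_t componentCount) {
    return (bitsPerComponent == 64 && componentCount >= 3) ? 2u : 1u;
}

// Assigns tightly packed locations to inputs in declaration order.
std::vector<uint32_t> assignLocations(const std::vector<InputDecl>& inputs) {
    std::vector<uint32_t> locations;
    uint32_t next = 0;
    for (const InputDecl& input : inputs) {
        locations.push_back(next);
        next += locationsConsumed(input.bitsPerComponent, input.componentCount);
    }
    return locations;
}
```

For a dvec3 followed by a vec3 this yields locations 0 and 2, matching the shader snippet above. For a vec2 followed by a vec3 it yields 0 and 1.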

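Vertex data

We're moving the vertex data from the shader code to an array in the code of our program. Start by including the GLM library, which provides us with linear algebra related types like vectors and matrices. We're going to use these types to specify the position and color vectors.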

#include <glm/glm.hpp>

Create a new structure called Vertex with the two attributes that we're going to use in the vertex shader inside it:

struct Vertex {
    glm::vec2 pos;
    glm::vec3 color;
};

GLM conveniently provides us with C++ types that exactly match the vector types used in the shader language.

const std::vector<Vertex> vertices = {
    {{0.0f, -0.5f}, {1.0f, 0.0f, 0.0f}},
    {{0.5f, 0.5f}, {0.0f, 1.0f, 0.0f}},
    {{-0.5f, 0.5f}, {0.0f, 0.0f, 1.0f}}
};

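Now use the Vertex structure to specify an array of vertex data. We're using exactly the same position and color values as before, but now they're combined into one array of vertices. This is known as interleaving vertex attributes.

Binding and attribute descriptions

The next step is to tell Vulkan how to pass this data format to the vertex shader once it's been uploaded into GPU memory. There are two types of structures needed to convey this information.

The first structure is VkVertexInputBindingDescription, and we'll add a member function to the Vertex struct to populate it with the right data.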

struct Vertex {
    glm::vec2 pos;
    glm::vec3 color;

    static VkVertexInputBindingDescription getBindingDescription() {
        VkVertexInputBindingDescription bindingDescription{};

        return bindingDescription;
    }
};

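A vertex binding describes at which rate to load data from memory throughout the vertices. It specifies the number of bytes between data entries and whether to move to the next data entry after each vertex or after each instance.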

VkVertexInputBindingDescription bindingDescription{};
bindingDescription.binding = 0;
bindingDescription.stride = sizeof(Vertex);
bindingDescription.inputRate = VK_VERTEX_INPUT_RATE_VERTEX;

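All of our per-vertex data is packed together in one array, so we're only going to have one binding. The binding parameter specifies the index of the binding in the array of bindings. The stride parameter specifies the number of bytes from one entry to the next, and the inputRate parameter can have one of the following values:

VK_VERTEX_INPUT_RATE_VERTEX: Move to the next data entry after each vertex
VK_VERTEX_INPUT_RATE_INSTANCE: Move to the next data entry after each instance

We're not going to use instanced rendering, so we'll stick to per-vertex data.

Attribute descriptions

The second structure that describes how to handle vertex input is VkVertexInputAttributeDescription. We're going to add another helper function to Vertex to fill in these structs.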

#include <array>

...

static std::array<VkVertexInputAttributeDescription, 2> getAttributeDescriptions() {
    std::array<VkVertexInputAttributeDescription, 2> attributeDescriptions{};

    return attributeDescriptions;
}

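As the function prototype indicates, there are going to be two of these structures. An attribute description struct describes how to extract a vertex attribute from a chunk of vertex data originating from a binding description. We have two attributes, position and color, so we need two attribute description structs.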

attributeDescriptions[0].binding = 0;
attributeDescriptions[0].location = 0;
attributeDescriptions[0].format = VK_FORMAT_R32G32_SFLOAT;
attributeDescriptions[0].offset = offsetof(Vertex, pos);

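The binding parameter tells Vulkan from which binding the per-vertex data comes. The location parameter references the location directive of the input in the vertex shader. The input in the vertex shader with location 0 is the position, which has two 32-bit float components.

The format parameter describes the type of data for the attribute. A bit confusingly, the formats are specified using the same enumeration as color formats. The following shader types and formats are commonly used together:

float: VK_FORMAT_R32_SFLOAT
vec2: VK_FORMAT_R32G32_SFLOAT
vec3: VK_FORMAT_R32G32B32_SFLOAT
vec4: VK_FORMAT_R32G32B32A32_SFLOAT

As you can see, you should use the format where the amount of color channels matches the number of components in the shader data type. It is allowed to use more channels than the number of components in the shader, but they will be silently discarded. If the number of channels is lower than the number of components, then the BGA components will use default values of (0, 0, 1). The color type (SFLOAT, UINT, SINT) and bit width should also match the type of the shader input. See the following examples:

ivec2: VK_FORMAT_R32G32_SINT, a 2-component vector of 32-bit signed integers
uvec4: VK_FORMAT_R32G32B32A32_UINT, a 4-component vector of 32-bit unsigned integers
double: VK_FORMAT_R64_SFLOAT, a double-precision (64-bit) float

The format parameter implicitly defines the byte size of attribute data and the offset parameter specifies the number of bytes since the start of the per-vertex data to read from. The binding is loading one Vertex at a time and the position attribute (pos) is at an offset of 0 bytes from the beginning of this struct. This is automatically calculated using the offsetof macro.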

attributeDescriptions[1].binding = 0;
attributeDescriptions[1].location = 1;
attributeDescriptions[1].format = VK_FORMAT_R32G32B32_SFLOAT;
attributeDescriptions[1].offset = offsetof(Vertex, color);

The color attribute is described in much the same way.
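
The stride and the two offsets can be checked at compile time. The sketch below replaces glm::vec2 and glm::vec3 with plain structs of floats so that it builds without GLM; that substitution, the assumption of 4-byte floats, and the names Vec2, Vec3, kStride, kPosOffset and kColorOffset are mine, not the tutorial's.

```cpp
#include <cassert>
#include <cstddef>

// Plain stand-ins for glm::vec2 and glm::vec3, assumed to be tightly
// packed 32-bit floats.
struct Vec2 { float x, y; };
struct Vec3 { float r, g, b; };

struct Vertex {
    Vec2 pos;
    Vec3 color;
};

// Values that would go into the binding and attribute descriptions.
constexpr std::size_t kStride      = sizeof(Vertex);           // bindingDescription.stride
constexpr std::size_t kPosOffset   = offsetof(Vertex, pos);    // attributeDescriptions[0].offset
constexpr std::size_t kColorOffset = offsetof(Vertex, color);  // attributeDescriptions[1].offset

static_assert(kPosOffset == 0, "pos is the first member");
static_assert(kColorOffset == sizeof(Vec2), "color follows pos without padding");
static_assert(kStride == sizeof(Vec2) + sizeof(Vec3), "no trailing padding");
```

With 4-byte floats this gives a stride of 20 bytes, a position offset of 0 and a color offset of 8, so the three vertices of the triangle occupy 60 bytes in total.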

Pipeline vertex input

We now need to set up the graphics pipeline to accept vertex data in this format by referencing the structures in createGraphicsPipeline. Find the vertexInputInfo struct and modify it to reference the two descriptions:

auto bindingDescription = Vertex::getBindingDescription();
auto attributeDescriptions = Vertex::getAttributeDescriptions();

vertexInputInfo.vertexBindingDescriptionCount = 1;
vertexInputInfo.vertexAttributeDescriptionCount = static_cast<uint32_t>(attributeDescriptions.size());
vertexInputInfo.pVertexBindingDescriptions = &bindingDescription;
vertexInputInfo.pVertexAttributeDescriptions = attributeDescriptions.data();

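The pipeline is now ready to accept vertex data in the format of the vertices container and pass it on to our vertex shader. If you run the program now with validation layers enabled, you'll see that it complains that there is no vertex buffer bound to the binding. The next step is to create a vertex buffer and move the vertex data to it so the GPU is able to access it.

C++ code / Vertex shader / Fragment shader

Vertex buffer creation

Introduction

Buffers in Vulkan are regions of memory used for storing arbitrary data that can be read by the graphics card. They can be used to store vertex data, which we'll do in this chapter, but they can also be used for many other purposes that we'll explore in future chapters. Unlike the Vulkan objects we've been dealing with so far, buffers do not automatically allocate memory for themselves. The work from the previous chapters has shown that the Vulkan API puts the programmer in control of almost everything and memory management is one of those things.

Buffer creation

Create a new function createVertexBuffer and call it from initVulkan right before createCommandBuffers.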

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createRenderPass();
    createGraphicsPipeline();
    createFramebuffers();
    createCommandPool();
    createVertexBuffer();
    createCommandBuffers();
    createSyncObjects();
}

...

void createVertexBuffer() {

}

Creating a buffer requires us to fill a VkBufferCreateInfo structure.

VkBufferCreateInfo bufferInfo{};
bufferInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
bufferInfo.size = sizeof(vertices[0]) * vertices.size();

The first field of the struct is size, which specifies the size of the buffer in bytes. Calculating the byte size of the vertex data is straightforward with sizeof.

bufferInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT;

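The second field is usage, which indicates the purposes for which the data in the buffer will be used. Multiple purposes can be combined with a bitwise OR. Our use case is a vertex buffer; we'll look at other types of usage in future chapters.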

bufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

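Just like the images in the swap chain, buffers can be owned by a specific queue family or be shared between several at the same time. The buffer will only be used from the graphics queue, so we can stick to exclusive access.

The flags parameter is used to configure sparse buffer memory, which is not relevant right now. We'll leave it at the default value of 0.

We can now create the buffer with vkCreateBuffer. Define a class member to hold the buffer handle and call it vertexBuffer.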

VkBuffer vertexBuffer;

...

void createVertexBuffer() {
    VkBufferCreateInfo bufferInfo{};
    bufferInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
    bufferInfo.size = sizeof(vertices[0]) * vertices.size();
    bufferInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT;
    bufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

    if (vkCreateBuffer(device, &bufferInfo, nullptr, &vertexBuffer) != VK_SUCCESS) {
        throw std::runtime_error("failed to create vertex buffer!");
    }
}

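The buffer should be available for use in rendering commands until the end of the program, and it does not depend on the swap chain, so we'll clean it up in the original cleanup function: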

void cleanup() {
    cleanupSwapChain();

    vkDestroyBuffer(device, vertexBuffer, nullptr);

    ...
}

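Memory requirements

The buffer has been created, but it doesn't have any memory assigned to it yet. The first step of allocating memory for the buffer is to query its memory requirements using the aptly named vkGetBufferMemoryRequirements function.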

VkMemoryRequirements memRequirements;
vkGetBufferMemoryRequirements(device, vertexBuffer, &memRequirements);

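The VkMemoryRequirements struct has three fields:

size: The size of the required amount of memory in bytes, which may differ from bufferInfo.size.
alignment: The alignment in bytes that the buffer's starting offset within the allocated memory region must satisfy; it depends on bufferInfo.usage and bufferInfo.flags.
memoryTypeBits: Bit field of the memory types that are suitable for the buffer.

Graphics cards can offer different types of memory to allocate from. Each type of memory varies in terms of allowed operations and performance characteristics. We need to combine the requirements of the buffer and our own application requirements to find the right type of memory to use. Let's create a new function findMemoryType for this purpose.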

uint32_t findMemoryType(uint32_t typeFilter, VkMemoryPropertyFlags properties) {

}

First we need to query information about the available types of memory using vkGetPhysicalDeviceMemoryProperties.

VkPhysicalDeviceMemoryProperties memProperties;
vkGetPhysicalDeviceMemoryProperties(physicalDevice, &memProperties);

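The VkPhysicalDeviceMemoryProperties structure has two arrays, memoryTypes and memoryHeaps. Memory heaps are distinct memory resources, such as dedicated VRAM and the swap space in RAM that is used when VRAM runs out. The different types of memory exist within these heaps. Right now we only concern ourselves with the type of memory and not the heap it comes from, but you can imagine that the heap can affect performance.

Let's first find a memory type that is suitable for the buffer itself: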

for (uint32_t i = 0; i < memProperties.memoryTypeCount; i++) {
    if (typeFilter & (1 << i)) {
        return i;
    }
}

throw std::runtime_error("failed to find suitable memory type!");

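The typeFilter parameter specifies the bit field of memory types that are suitable. That means we can find the index of a suitable memory type by simply iterating over them and checking whether the corresponding bit is set to 1.

However, we're not just interested in a memory type that is suitable for the vertex buffer. We also need to be able to write our vertex data to that memory. The memoryTypes array consists of VkMemoryType structs that specify the heap and properties of each type of memory. The properties define special features of the memory, like being able to map it so we can write to it from the CPU. This property is indicated with VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, but we also need the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT property. We'll see why when we map the memory.

We can now modify the loop to also check for support of these properties: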

for (uint32_t i = 0; i < memProperties.memoryTypeCount; i++) {
    if ((typeFilter & (1 << i)) && (memProperties.memoryTypes[i].propertyFlags & properties) == properties) {
        return i;
    }
}

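We may have more than one desirable property, so we check that the result of the bitwise AND is not just non-zero, but equal to the desired properties bit field. If there is a memory type suitable for the buffer that also has all of the properties we need, we return its index; otherwise we throw an exception.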
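The selection logic can be exercised on the CPU without a physical device. The sketch below is an illustration under stated assumptions: the three constants are local stand-ins that mirror the bit values of the corresponding VkMemoryPropertyFlagBits, and pickMemoryType is a hypothetical helper that takes the per-type property flags as a plain vector instead of reading VkPhysicalDeviceMemoryProperties.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Local stand-ins mirroring the VkMemoryPropertyFlagBits bit values.
constexpr std::uint32_t DEVICE_LOCAL  = 0x1;
constexpr std::uint32_t HOST_VISIBLE  = 0x2;
constexpr std::uint32_t HOST_COHERENT = 0x4;

// typeFlags[i] holds the property flags of memory type i.
// typeFilter has bit i set when memory type i is allowed for the resource.
std::uint32_t pickMemoryType(const std::vector<std::uint32_t>& typeFlags,
                             std::uint32_t typeFilter,
                             std::uint32_t required) {
    assert(typeFlags.size() <= 32);  // the filter is a 32-bit field
    for (std::uint32_t i = 0; i < typeFlags.size(); i++) {
        const bool allowed = (typeFilter & (1u << i)) != 0;
        // All required bits must be present, not just one of them.
        const bool hasAll = (typeFlags[i] & required) == required;
        if (allowed && hasAll) {
            return i;
        }
    }
    throw std::runtime_error("failed to find suitable memory type!");
}
```

Comparing against required, rather than testing for a non-zero result, is what rejects a type that is host visible but not host coherent when both properties are requested.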

Memory allocation

We now have a way to determine the right memory type, so we can actually allocate the memory by filling in the VkMemoryAllocateInfo structure.

VkMemoryAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocInfo.allocationSize = memRequirements.size;
allocInfo.memoryTypeIndex = findMemoryType(memRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);

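Memory allocation is now as simple as specifying the size and type, both of which are derived from the memory requirements of the vertex buffer and the desired properties. Create a class member to store the handle to the memory and allocate it with vkAllocateMemory.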

VkBuffer vertexBuffer;
VkDeviceMemory vertexBufferMemory;

...

if (vkAllocateMemory(device, &allocInfo, nullptr, &vertexBufferMemory) != VK_SUCCESS) {
    throw std::runtime_error("failed to allocate vertex buffer memory!");
}

If the memory allocation was successful, we can now associate this memory with the buffer using vkBindBufferMemory:

vkBindBufferMemory(device, vertexBuffer, vertexBufferMemory, 0);

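The first three parameters are self-explanatory and the fourth parameter is the offset within the region of memory. Since this memory is allocated specifically for the vertex buffer, the offset is simply 0. If the offset is non-zero, it must be divisible by memRequirements.alignment.

Of course, just like dynamic memory allocation in C++, the memory should be freed at some point. Memory that is bound to a buffer object may be freed once the buffer is no longer used, so let's free it after the buffer has been destroyed: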

void cleanup() {
    cleanupSwapChain();

    vkDestroyBuffer(device, vertexBuffer, nullptr);
    vkFreeMemory(device, vertexBufferMemory, nullptr);
    ...
}

Filling the vertex buffer

It is now time to copy the vertex data to the buffer. This is done by mapping the buffer memory into CPU accessible memory with vkMapMemory.

void* data;
vkMapMemory(device, vertexBufferMemory, 0, bufferInfo.size, 0, &data);

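This function allows us to access a region of the specified memory resource defined by an offset and size. The offset and size here are 0 and bufferInfo.size, respectively. It is also possible to specify the special value VK_WHOLE_SIZE to map all of the memory. The second to last parameter can be used to specify flags, but there aren't any available yet in the current API, so it must be set to 0. The last parameter specifies the output for the pointer to the mapped memory.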

void* data;
vkMapMemory(device, vertexBufferMemory, 0, bufferInfo.size, 0, &data);
memcpy(data, vertices.data(), (size_t) bufferInfo.size);
vkUnmapMemory(device, vertexBufferMemory);

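You can now simply memcpy the vertex data to the mapped memory and unmap it again using vkUnmapMemory. Unfortunately the driver may not immediately copy the data into the buffer memory, for example because of caching. It is also possible that writes to the buffer are not visible in the mapped memory yet. There are two ways to deal with that problem:

Use a memory heap that is host coherent, indicated with VK_MEMORY_PROPERTY_HOST_COHERENT_BIT.
Call vkFlushMappedMemoryRanges after writing to the mapped memory, and call vkInvalidateMappedMemoryRanges before reading from the mapped memory.

We went for the first approach, which ensures that the mapped memory always matches the contents of the allocated memory. Keep in mind that this may lead to slightly worse performance than explicit flushing, but we'll see why that doesn't matter in the next chapter.

Flushing memory ranges or using a coherent memory heap means that the driver will be aware of our writes to the buffer, but it doesn't mean that they are actually visible on the GPU yet. The transfer of data to the GPU happens in the background, and the specification guarantees only that it is complete as of the next call to vkQueueSubmit.

Binding the vertex buffer

All that remains now is binding the vertex buffer during rendering operations. We're going to extend the recordCommandBuffer function to do so.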

vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, graphicsPipeline);

VkBuffer vertexBuffers[] = {vertexBuffer};
VkDeviceSize offsets[] = {0};
vkCmdBindVertexBuffers(commandBuffer, 0, 1, vertexBuffers, offsets);

vkCmdDraw(commandBuffer, static_cast<uint32_t>(vertices.size()), 1, 0, 0);

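The vkCmdBindVertexBuffers function is used to bind vertex buffers to bindings, like the one we set up in the previous chapter. The first two parameters, besides the command buffer, specify the offset and number of bindings we're going to specify vertex buffers for. The last two parameters specify the array of vertex buffers to bind and the byte offsets to start reading vertex data from. You should also change the call to vkCmdDraw to pass the number of vertices in the buffer instead of the hardcoded number 3.

Now run the program and you should see the familiar triangle again.

Try changing the color of the top vertex to white by modifying the vertices array: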

const std::vector<Vertex> vertices = {
    {{0.0f, -0.5f}, {1.0f, 1.0f, 1.0f}},
    {{0.5f, 0.5f}, {0.0f, 1.0f, 0.0f}},
    {{-0.5f, 0.5f}, {0.0f, 0.0f, 1.0f}}
};

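Run the program again and the top corner of the triangle should now be white.

In the next chapter we'll look at a different way to copy vertex data to a vertex buffer that results in better performance, but takes some more work.

C++ code / Vertex shader / Fragment shader

Staging buffer

Introduction

The vertex buffer we have right now works correctly, but the memory type that allows us to access it from the CPU may not be the most optimal memory type for the graphics card itself to read from. The most optimal memory has the VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT flag and is usually not accessible by the CPU on dedicated graphics cards. In this chapter we're going to create two vertex buffers: a staging buffer in CPU accessible memory to upload the data from the vertex array to, and the final vertex buffer in device local memory. We'll then use a buffer copy command to move the data from the staging buffer to the actual vertex buffer.

Transfer queue

The buffer copy command requires a queue family that supports transfer operations, which is indicated using VK_QUEUE_TRANSFER_BIT. The good news is that any queue family with VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT capabilities already implicitly supports VK_QUEUE_TRANSFER_BIT operations. The implementation is not required to explicitly list it in queueFlags in those cases.

If you like a challenge, you can still try to use a different queue family specifically for transfer operations. It will require you to make the following modifications to your program:

Modify QueueFamilyIndices and findQueueFamilies to explicitly look for a queue family with the VK_QUEUE_TRANSFER_BIT bit, but not the VK_QUEUE_GRAPHICS_BIT.
Modify createLogicalDevice to request a handle to the transfer queue.
Create a second command pool for command buffers that are submitted on the transfer queue family.
Change the sharingMode of resources to VK_SHARING_MODE_CONCURRENT and specify both the graphics and transfer queue families.
Submit any transfer commands like vkCmdCopyBuffer to the transfer queue instead of the graphics queue.

It's a bit of work, but it'll teach you a lot about how resources are shared between queue families.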
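The first item in that list can be sketched without a device. In the example below the three constants are local stand-ins mirroring the VkQueueFlagBits bit values, and supportsTransfer and findDedicatedTransferFamily are hypothetical helpers that take each family's queueFlags as a plain integer; a real implementation would read them from VkQueueFamilyProperties.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local stand-ins mirroring the VkQueueFlagBits bit values.
constexpr std::uint32_t QUEUE_GRAPHICS = 0x1;
constexpr std::uint32_t QUEUE_COMPUTE  = 0x2;
constexpr std::uint32_t QUEUE_TRANSFER = 0x4;

// Graphics and compute families support transfers implicitly,
// even when the transfer bit is not listed.
inline bool supportsTransfer(std::uint32_t queueFlags) {
    return (queueFlags & (QUEUE_GRAPHICS | QUEUE_COMPUTE | QUEUE_TRANSFER)) != 0;
}

// Returns the index of the first family that lists the transfer bit
// but not the graphics bit, or -1 when no such family exists.
inline int findDedicatedTransferFamily(const std::vector<std::uint32_t>& familyFlags) {
    for (std::size_t i = 0; i < familyFlags.size(); i++) {
        const bool transfer = (familyFlags[i] & QUEUE_TRANSFER) != 0;
        const bool graphics = (familyFlags[i] & QUEUE_GRAPHICS) != 0;
        if (transfer && !graphics) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

When the helper returns -1, falling back to the graphics queue family is always possible, since that family can execute transfer commands anyway.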

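Abstracting buffer creation

Because we're going to create multiple buffers in this chapter, it's a good idea to move buffer creation to a helper function. Create a new function createBuffer and move the code in createVertexBuffer (except mapping) to it.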

void createBuffer(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, VkBuffer& buffer, VkDeviceMemory& bufferMemory) {
    VkBufferCreateInfo bufferInfo{};
    bufferInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
    bufferInfo.size = size;
    bufferInfo.usage = usage;
    bufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

    if (vkCreateBuffer(device, &bufferInfo, nullptr, &buffer) != VK_SUCCESS) {
        throw std::runtime_error("failed to create buffer!");
    }

    VkMemoryRequirements memRequirements;
    vkGetBufferMemoryRequirements(device, buffer, &memRequirements);

    VkMemoryAllocateInfo allocInfo{};
    allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize = memRequirements.size;
    allocInfo.memoryTypeIndex = findMemoryType(memRequirements.memoryTypeBits, properties);

    if (vkAllocateMemory(device, &allocInfo, nullptr, &bufferMemory) != VK_SUCCESS) {
        throw std::runtime_error("failed to allocate buffer memory!");
    }

    vkBindBufferMemory(device, buffer, bufferMemory, 0);
}

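Make sure to add parameters for the buffer size, memory properties and usage so that we can use this function to create many different types of buffers. The last two parameters are output variables to write the handles to.

You can now remove the buffer creation and memory allocation code from createVertexBuffer and just call createBuffer instead: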

void createVertexBuffer() {
    VkDeviceSize bufferSize = sizeof(vertices[0]) * vertices.size();
    createBuffer(bufferSize, VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, vertexBuffer, vertexBufferMemory);

    void* data;
    vkMapMemory(device, vertexBufferMemory, 0, bufferSize, 0, &data);
        memcpy(data, vertices.data(), (size_t) bufferSize);
    vkUnmapMemory(device, vertexBufferMemory);
}

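Run your program to make sure that the vertex buffer still works properly.

Using a staging buffer

We're now going to change createVertexBuffer to use a host visible buffer only as a temporary buffer and a device local one as the actual vertex buffer.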

void createVertexBuffer() {
    VkDeviceSize bufferSize = sizeof(vertices[0]) * vertices.size();

    VkBuffer stagingBuffer;
    VkDeviceMemory stagingBufferMemory;
    createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);

    void* data;
    vkMapMemory(device, stagingBufferMemory, 0, bufferSize, 0, &data);
        memcpy(data, vertices.data(), (size_t) bufferSize);
    vkUnmapMemory(device, stagingBufferMemory);

    createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, vertexBuffer, vertexBufferMemory);
}

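We now use a new stagingBuffer with stagingBufferMemory for mapping and copying the vertex data. In this chapter we use two new buffer usage flags:

VK_BUFFER_USAGE_TRANSFER_SRC_BIT: Buffer can be used as source in a memory transfer operation.
VK_BUFFER_USAGE_TRANSFER_DST_BIT: Buffer can be used as destination in a memory transfer operation.

The vertexBuffer is now allocated from a memory type that is device local, which generally means that we're not able to use vkMapMemory. However, we can copy data from the stagingBuffer to the vertexBuffer. We have to indicate that we intend to do that by specifying the transfer source flag for the stagingBuffer and the transfer destination flag for the vertexBuffer, along with the vertex buffer usage flag.

We're now going to write a function, copyBuffer, to copy the contents from one buffer to another.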

void copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size) {

}

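Memory transfer operations are executed using command buffers, just like drawing commands. Therefore we must first allocate a temporary command buffer. You may wish to create a separate command pool for these kinds of short-lived buffers, because the implementation may be able to apply memory allocation optimizations. In that case you should use the VK_COMMAND_POOL_CREATE_TRANSIENT_BIT flag during command pool creation.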

void copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size) {
    VkCommandBufferAllocateInfo allocInfo{};
    allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
    allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
    allocInfo.commandPool = commandPool;
    allocInfo.commandBufferCount = 1;

    VkCommandBuffer commandBuffer;
    vkAllocateCommandBuffers(device, &allocInfo, &commandBuffer);
}

And immediately start recording the command buffer:

VkCommandBufferBeginInfo beginInfo{};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;

vkBeginCommandBuffer(commandBuffer, &beginInfo);

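We're only going to use the command buffer once, and we wait with returning from the function until the copy operation has finished executing. It's good practice to tell the driver about our intent using VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT.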

VkBufferCopy copyRegion{};
copyRegion.srcOffset = 0; // Optional
copyRegion.dstOffset = 0; // Optional
copyRegion.size = size;
vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);

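The contents of buffers are transferred using the vkCmdCopyBuffer command. It takes the source and destination buffers as arguments, along with an array of regions to copy. The regions are defined in VkBufferCopy structs and consist of a source buffer offset, a destination buffer offset and a size. Unlike with the vkMapMemory command, it is not possible to specify VK_WHOLE_SIZE here.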
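To make the region semantics concrete, here is a small CPU-side model of a region copy. It is only an analogy, not how the GPU performs the transfer: CopyRegion and copyRegions are hypothetical names, and the buffers are modelled as byte vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// CPU-side analogue of one copy region: offsets and an explicit size in bytes.
struct CopyRegion {
    std::size_t srcOffset;
    std::size_t dstOffset;
    std::size_t size;
};

// Copies every region from src to dst. Each region must state its size
// explicitly and fit inside both buffers; there is no "whole buffer" shortcut.
void copyRegions(const std::vector<std::uint8_t>& src,
                 std::vector<std::uint8_t>& dst,
                 const std::vector<CopyRegion>& regions) {
    for (const CopyRegion& r : regions) {
        assert(r.size > 0);
        assert(r.srcOffset + r.size <= src.size());
        assert(r.dstOffset + r.size <= dst.size());
        std::memcpy(dst.data() + r.dstOffset, src.data() + r.srcOffset, r.size);
    }
}
```

The tutorial's copyBuffer corresponds to a single region with both offsets at 0 and size equal to the full buffer size.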

vkEndCommandBuffer(commandBuffer);

This command buffer only contains the copy command, so we can stop recording right after that. Now execute the command buffer to complete the transfer:

VkSubmitInfo submitInfo{};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffer;

vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
vkQueueWaitIdle(graphicsQueue);

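Unlike the draw commands, there are no events we need to wait on this time. We just want to execute the transfer on the buffers immediately. There are again two possible ways to wait for this transfer to complete: we could use a fence and wait with vkWaitForFences, or simply wait for the transfer queue to become idle with vkQueueWaitIdle. A fence would allow you to schedule multiple transfers simultaneously and wait for all of them to complete, instead of executing one at a time. That may give the driver more opportunities to optimize.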

vkFreeCommandBuffers(device, commandPool, 1, &commandBuffer);

Don't forget to clean up the command buffer used for the transfer operation.

We can now call copyBuffer from the createVertexBuffer function to move the vertex data to the device local buffer:

createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, vertexBuffer, vertexBufferMemory);

copyBuffer(stagingBuffer, vertexBuffer, bufferSize);

After copying the data from the staging buffer to the device buffer, we should clean up the staging buffer:

    ...

    copyBuffer(stagingBuffer, vertexBuffer, bufferSize);

    vkDestroyBuffer(device, stagingBuffer, nullptr);
    vkFreeMemory(device, stagingBufferMemory, nullptr);
}

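Run your program to verify that you're seeing the familiar triangle again. The improvement may not be visible right now, but its vertex data is now being loaded from high performance memory. This will matter when we start rendering more complex geometry.

Conclusion

It should be noted that in a real world application, you're not supposed to actually call vkAllocateMemory for every individual buffer. The maximum number of simultaneous memory allocations is limited by the maxMemoryAllocationCount physical device limit, which may be as low as 4096 even on high end hardware like an NVIDIA GTX 1080. The right way to allocate memory for a large number of objects at the same time is to create a custom allocator that splits up a single allocation among many different objects by using the offset parameters that we've seen in many functions.

You can either implement such an allocator yourself, or use the VulkanMemoryAllocator library provided by the GPUOpen initiative. However, for this tutorial it's okay to use a separate allocation for every resource, because we won't come close to hitting any of these limits for now.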
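As a rough illustration of the idea, and not of how VulkanMemoryAllocator works internally, the sketch below hands out aligned offsets inside one large block. LinearSubAllocator and kOutOfSpace are hypothetical names; a real allocator would also track memory types, support freeing, and own the VkDeviceMemory handle.

```cpp
#include <cassert>
#include <cstdint>

// Sentinel returned when a request does not fit in the remaining space.
constexpr std::uint64_t kOutOfSpace = UINT64_MAX;

// Hands out offsets inside one big allocation, front to back.
// Individual frees are not supported in this sketch.
class LinearSubAllocator {
public:
    explicit LinearSubAllocator(std::uint64_t capacity) : capacity_(capacity) {}

    // Returns an offset that is a multiple of alignment, or kOutOfSpace.
    // size and alignment would come from VkMemoryRequirements.
    std::uint64_t allocate(std::uint64_t size, std::uint64_t alignment) {
        assert(alignment != 0);
        // Round the cursor up to the next multiple of alignment.
        const std::uint64_t offset = (cursor_ + alignment - 1) / alignment * alignment;
        if (offset > capacity_ || size > capacity_ - offset) {
            return kOutOfSpace;
        }
        cursor_ = offset + size;
        return offset;
    }

private:
    std::uint64_t capacity_;
    std::uint64_t cursor_ = 0;
};
```

Each returned offset is what would be passed as the last argument of vkBindBufferMemory, with every buffer sharing the same memory handle.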

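C++ code / Vertex shader / Fragment shader

Index buffer

Introduction

The 3D meshes you'll be rendering in a real world application will often share vertices between multiple triangles. This already happens with something as simple as drawing a rectangle.

Drawing a rectangle takes two triangles, which means that we need a vertex buffer with 6 vertices. The problem is that the data of two vertices needs to be duplicated, resulting in 50% redundancy. It only gets worse with more complex meshes, where vertices are reused in an average of 3 triangles. The solution to this problem is to use an index buffer.

An index buffer is essentially an array of pointers into the vertex buffer. It allows you to reorder the vertex data and to reuse existing data for multiple vertices. The illustration above demonstrates what the index buffer would look like for the rectangle if we have a vertex buffer containing each of the four unique vertices. The first three indices define the upper-right triangle and the last three indices define the vertices of the bottom-left triangle.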
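One way to see the saving is to convert a plain triangle list into unique vertices plus indices. This is a minimal sketch under stated assumptions: Position, IndexedMesh and buildIndexedMesh are hypothetical names, and vertices carry only a 2D position so that exact comparison is enough to detect duplicates.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Position = std::pair<float, float>;  // simplified vertex: position only

struct IndexedMesh {
    std::vector<Position> vertices;      // unique vertices
    std::vector<std::uint16_t> indices;  // three indices per triangle
};

// Deduplicates a triangle list (three vertices per triangle) into
// a vertex array and an index array that refers into it.
IndexedMesh buildIndexedMesh(const std::vector<Position>& triangleList) {
    assert(triangleList.size() % 3 == 0);
    IndexedMesh mesh;
    std::map<Position, std::uint16_t> seen;
    for (const Position& p : triangleList) {
        auto it = seen.find(p);
        if (it == seen.end()) {
            assert(mesh.vertices.size() < 65535);  // must fit a 16-bit index
            const auto index = static_cast<std::uint16_t>(mesh.vertices.size());
            it = seen.emplace(p, index).first;
            mesh.vertices.push_back(p);
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

For the rectangle, six input vertices collapse to four unique ones, and the six 16-bit indices cost far less than the two duplicated vertices would.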

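Index buffer creation

In this chapter we're going to modify the vertex data and add index data to draw a rectangle like the one in the illustration. Modify the vertex data to represent the four corners: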

const std::vector<Vertex> vertices = {
    {{-0.5f, -0.5f}, {1.0f, 0.0f, 0.0f}},
    {{0.5f, -0.5f}, {0.0f, 1.0f, 0.0f}},
    {{0.5f, 0.5f}, {0.0f, 0.0f, 1.0f}},
    {{-0.5f, 0.5f}, {1.0f, 1.0f, 1.0f}}
};

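The top-left corner is red, top-right is green, bottom-right is blue and the bottom-left is white. We'll add a new array indices to represent the contents of the index buffer. It should match the indices in the illustration to draw the upper-right triangle and bottom-left triangle.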

const std::vector<uint16_t> indices = {
    0, 1, 2, 2, 3, 0
};

It is possible to use either uint16_t or uint32_t for the index buffer, depending on the number of entries in vertices. We can stick to uint16_t for now because we're using fewer than 65535 unique vertices.
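The choice between the two widths can be written down as a small helper. This is a sketch only: IndexType, chooseIndexType and indexSizeBytes are hypothetical names, and the threshold follows the conservative rule stated above (fewer than 65535 unique vertices).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the two index widths discussed above.
enum class IndexType { UInt16, UInt32 };

// 16-bit indices are enough while there are fewer than 65535 unique vertices.
inline IndexType chooseIndexType(std::size_t vertexCount) {
    return vertexCount < 65535 ? IndexType::UInt16 : IndexType::UInt32;
}

// Size of one index in bytes, as needed when computing the index buffer size.
inline std::size_t indexSizeBytes(IndexType type) {
    return type == IndexType::UInt16 ? sizeof(std::uint16_t) : sizeof(std::uint32_t);
}
```

Whichever width is chosen has to be used consistently: in the element type of the indices array, in the buffer size, and in the index type passed when the buffer is bound.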

Just like the vertex data, the indices need to be uploaded into a VkBuffer for the GPU to be able to access them. Define two new class members to hold the resources for the index buffer:

VkBuffer vertexBuffer;
VkDeviceMemory vertexBufferMemory;
VkBuffer indexBuffer;
VkDeviceMemory indexBufferMemory;

The createIndexBuffer function that we'll add now is almost identical to createVertexBuffer:

void initVulkan() {
    ...
    createVertexBuffer();
    createIndexBuffer();
    ...
}

void createIndexBuffer() {
    VkDeviceSize bufferSize = sizeof(indices[0]) * indices.size();

    VkBuffer stagingBuffer;
    VkDeviceMemory stagingBufferMemory;
    createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);

    void* data;
    vkMapMemory(device, stagingBufferMemory, 0, bufferSize, 0, &data);
    memcpy(data, indices.data(), (size_t) bufferSize);
    vkUnmapMemory(device, stagingBufferMemory);

    createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_INDEX_BUFFER_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, indexBuffer, indexBufferMemory);

    copyBuffer(stagingBuffer, indexBuffer, bufferSize);

    vkDestroyBuffer(device, stagingBuffer, nullptr);
    vkFreeMemory(device, stagingBufferMemory, nullptr);
}

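There are only two notable differences. The bufferSize is now equal to the number of indices times the size of the index type, either uint16_t or uint32_t. The usage of the indexBuffer should be VK_BUFFER_USAGE_INDEX_BUFFER_BIT instead of VK_BUFFER_USAGE_VERTEX_BUFFER_BIT. Other than that, the process is exactly the same: we create a staging buffer to copy the contents of indices to and then copy it to the final device local index buffer.

The index buffer should be cleaned up at the end of the program, just like the vertex buffer: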

void cleanup() {
    cleanupSwapChain();

    vkDestroyBuffer(device, indexBuffer, nullptr);
    vkFreeMemory(device, indexBufferMemory, nullptr);

    vkDestroyBuffer(device, vertexBuffer, nullptr);
    vkFreeMemory(device, vertexBufferMemory, nullptr);

    ...
}

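Using an index buffer

Using an index buffer for drawing involves two changes to recordCommandBuffer. We first need to bind the index buffer, just like we did for the vertex buffer. The difference is that you can only have a single index buffer. It's unfortunately not possible to use different indices for each vertex attribute, so we still have to completely duplicate vertex data even if just one attribute varies.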

vkCmdBindVertexBuffers(commandBuffer, 0, 1, vertexBuffers, offsets);

vkCmdBindIndexBuffer(commandBuffer, indexBuffer, 0, VK_INDEX_TYPE_UINT16);

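An index buffer is bound with vkCmdBindIndexBuffer, which takes the index buffer, a byte offset into it, and the type of index data as parameters. As mentioned before, the possible types are VK_INDEX_TYPE_UINT16 and VK_INDEX_TYPE_UINT32.

Just binding an index buffer doesn't change anything yet; we also need to change the drawing command to tell Vulkan to use the index buffer. Remove the vkCmdDraw line and replace it with vkCmdDrawIndexed: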

vkCmdDrawIndexed(commandBuffer, static_cast<uint32_t>(indices.size()), 1, 0, 0, 0);

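A call to this function is very similar to vkCmdDraw. The first two parameters specify the number of indices and the number of instances. We're not using instancing, so just specify 1 instance. The number of indices represents the number of vertices that will be passed to the vertex shader. The next parameter specifies an offset into the index buffer; using a value of 1 would cause the graphics card to start reading at the second index. The second to last parameter specifies an offset to add to the indices in the index buffer. The final parameter specifies an offset for instancing, which we're not using.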
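The effect of the two offsets can be modelled on the CPU. The helper below is a hypothetical illustration, not a Vulkan call: fetchedVertices lists which vertex-buffer entries an indexed draw would read for a given index count, first index and vertex offset.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models which vertices an indexed draw reads:
// vertex = indices[firstIndex + i] + vertexOffset, for i in [0, indexCount).
std::vector<std::uint32_t> fetchedVertices(const std::vector<std::uint16_t>& indices,
                                           std::uint32_t indexCount,
                                           std::uint32_t firstIndex,
                                           std::int32_t vertexOffset) {
    assert(firstIndex + indexCount <= indices.size());
    std::vector<std::uint32_t> result;
    result.reserve(indexCount);
    for (std::uint32_t i = 0; i < indexCount; i++) {
        const std::int32_t vertex = static_cast<std::int32_t>(indices[firstIndex + i]) + vertexOffset;
        assert(vertex >= 0);
        result.push_back(static_cast<std::uint32_t>(vertex));
    }
    return result;
}
```

With the rectangle's indices, a first index of 3 and a count of 3 selects only the second triangle, while a vertex offset shifts every index by the same amount, which is handy when several meshes share one vertex buffer.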

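Now run your program and you should see the rectangle.

You now know how to save memory by reusing vertices with index buffers. This will become especially important in a future chapter where we're going to load complex 3D models.

The previous chapter already mentioned that you should allocate multiple resources like buffers from a single memory allocation, but in fact you should go a step further. Driver developers recommend that you also store multiple buffers, like the vertex and index buffer, in a single VkBuffer and use offsets in commands like vkCmdBindVertexBuffers. The advantage is that your data is more cache friendly in that case, because it's closer together. It is even possible to reuse the same chunk of memory for multiple resources if they are not used during the same render operations, provided that their data is refreshed, of course. This is known as aliasing, and some Vulkan functions have explicit flags to specify that you want to do this.

C++ code / Vertex shader / Fragment shader

Uniform buffers

Descriptor set layout and buffer

Introduction

We're now able to pass arbitrary attributes to the vertex shader for each vertex, but what about global variables? We're going to move on to 3D graphics from this chapter on, and that requires a model-view-projection matrix. We could include it as vertex data, but that's a waste of memory and it would require us to update the vertex buffer whenever the transformation changes. The transformation could easily change every single frame.

The right way to tackle this in Vulkan is to use resource descriptors. A descriptor is a way for shaders to freely access resources like buffers and images. We're going to set up a buffer that contains the transformation matrices and have the vertex shader access it through a descriptor. Usage of descriptors consists of three parts:

Specify a descriptor set layout during pipeline creation
Allocate a descriptor set from a descriptor pool
Bind the descriptor set during rendering

The descriptor set layout specifies the types of resources that are going to be accessed by the pipeline, just like a render pass specifies the types of attachments that will be accessed. A descriptor set specifies the actual buffer or image resources that will be bound to the descriptors, just like a framebuffer specifies the actual image views to bind to render pass attachments. The descriptor set is then bound for the drawing commands, just like the vertex buffers and framebuffer.

There are many types of descriptors, but in this chapter we'll work with uniform buffer objects (UBO). We'll look at other types of descriptors in future chapters, but the basic process is the same. Let's say we have the data we want the vertex shader to have in a C struct like this: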

struct UniformBufferObject {
    glm::mat4 model;
    glm::mat4 view;
    glm::mat4 proj;
};

Then we can copy the data to a VkBuffer and access it through a uniform buffer object descriptor from the vertex shader like this:

layout(binding = 0) uniform UniformBufferObject {
    mat4 model;
    mat4 view;
    mat4 proj;
} ubo;

void main() {
    gl_Position = ubo.proj * ubo.view * ubo.model * vec4(inPosition, 0.0, 1.0);
    fragColor = inColor;
}

We're going to update the model, view and projection matrices every frame to make the rectangle from the previous chapter spin around in 3D.

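Vertex shader

Modify the vertex shader to include the uniform buffer object as specified above. I will assume that you are familiar with MVP transformations. If you're not, see the resource mentioned in the first chapter.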

#version 450

layout(binding = 0) uniform UniformBufferObject {
    mat4 model;
    mat4 view;
    mat4 proj;
} ubo;

layout(location = 0) in vec2 inPosition;
layout(location = 1) in vec3 inColor;

layout(location = 0) out vec3 fragColor;

void main() {
    gl_Position = ubo.proj * ubo.view * ubo.model * vec4(inPosition, 0.0, 1.0);
    fragColor = inColor;
}

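Note that the order of the uniform, in and out declarations doesn't matter. The binding directive is similar to the location directive for attributes. We're going to reference this binding in the descriptor set layout. The line with gl_Position is changed to use the transformations to compute the final position in clip coordinates. Unlike the 2D triangles, the last component of the clip coordinates may not be 1, which will result in a division when converted to the final normalized device coordinates on the screen. This is used in perspective projection as the perspective division, and it is essential for making closer objects look larger than objects that are further away.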
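The division itself is simple enough to show directly. The sketch below uses hypothetical Vec3 and Vec4 structs and a toNdc helper; on the GPU this step is performed by fixed-function hardware after the vertex shader, not by application code.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };
struct Vec3 { float x, y, z; };

// Perspective division: clip coordinates to normalized device coordinates.
inline Vec3 toNdc(const Vec4& clip) {
    assert(clip.w != 0.0f);
    return {clip.x / clip.w, clip.y / clip.w, clip.z / clip.w};
}
```

A point with a w of 2 ends up at half the distance from the screen center compared to the same x and y with a w of 1, which is how a perspective projection makes distant geometry look smaller.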

Descriptor set layout

The next step is to define the UBO on the C++ side and to tell Vulkan about this descriptor in the vertex shader.

struct UniformBufferObject {
    glm::mat4 model;
    glm::mat4 view;
    glm::mat4 proj;
};

We can exactly match the definition in the shader using data types in GLM. The data in the matrices is binary compatible with the way the shader expects it, so we can later just memcpy a UniformBufferObject to a VkBuffer.
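The layout assumption behind that memcpy can be checked at compile time. The sketch below is self-contained and therefore uses a hypothetical Mat4 stand-in of 16 floats in place of glm::mat4; packUniforms is likewise an illustrative helper, not part of the tutorial code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Stand-in for glm::mat4: 16 tightly packed floats (an assumption of this sketch).
struct Mat4 { float m[4][4]; };

struct UniformBufferObject {
    Mat4 model;
    Mat4 view;
    Mat4 proj;
};

// The shader-side block holds three mat4 members of 64 bytes each.
static_assert(sizeof(Mat4) == 64, "matrix must be 16 packed floats");
static_assert(sizeof(UniformBufferObject) == 192, "no padding expected between matrices");
static_assert(std::is_trivially_copyable<UniformBufferObject>::value,
              "memcpy requires a trivially copyable type");

// Copies the struct byte for byte, as would be done into mapped buffer memory.
inline std::vector<unsigned char> packUniforms(const UniformBufferObject& ubo) {
    std::vector<unsigned char> bytes(sizeof(ubo));
    std::memcpy(bytes.data(), &ubo, sizeof(ubo));
    return bytes;
}
```

This works out neatly because the struct contains only 4x4 matrices; members such as vec2 or vec3 have alignment rules in the shader that a plain C++ struct does not follow automatically.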

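We need to provide details about every descriptor binding used in the shaders for pipeline creation, just like we had to do for every vertex attribute and its location index. We'll set up a new function, createDescriptorSetLayout, to define all of this information. It should be called right before pipeline creation, because the pipeline needs it.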

void initVulkan() {
    ...
    createDescriptorSetLayout();
    createGraphicsPipeline();
    ...
}

...

void createDescriptorSetLayout() {

}

Every binding needs to be described through a VkDescriptorSetLayoutBinding struct.

void createDescriptorSetLayout() {
    VkDescriptorSetLayoutBinding uboLayoutBinding{};
    uboLayoutBinding.binding = 0;
    uboLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
    uboLayoutBinding.descriptorCount = 1;
}

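The first two fields specify the binding used in the shader and the type of descriptor, which is a uniform buffer object. It is possible for the shader variable to represent an array of uniform buffer objects, and descriptorCount specifies the number of values in the array. This could be used to specify a transformation for each of the bones in a skeleton for skeletal animation, for example. Our MVP transformation is in a single uniform buffer object, so we're using a descriptorCount of 1.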

uboLayoutBinding.stageFlags = VK_SHADER_STAGE_VERTEX_BIT;

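We also need to specify in which shader stages the descriptor is going to be referenced. The stageFlags field can be a combination of VkShaderStageFlagBits values or the value VK_SHADER_STAGE_ALL_GRAPHICS. In our case, we're only referencing the descriptor from the vertex shader.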

uboLayoutBinding.pImmutableSamplers = nullptr; // Optional

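The pImmutableSamplers field is only relevant for image sampling related descriptors, which we'll look at later. You can leave it at its default value.

All of the descriptor bindings are combined into a single VkDescriptorSetLayout object. Define a new class member above pipelineLayout: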

VkDescriptorSetLayout descriptorSetLayout;
VkPipelineLayout pipelineLayout;

We can then create it using vkCreateDescriptorSetLayout. This function accepts a simple VkDescriptorSetLayoutCreateInfo with the array of bindings:

VkDescriptorSetLayoutCreateInfo layoutInfo{};
layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.bindingCount = 1;
layoutInfo.pBindings = &uboLayoutBinding;

if (vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &descriptorSetLayout) != VK_SUCCESS) {
    throw std::runtime_error("failed to create descriptor set layout!");
}

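We need to specify the descriptor set layout during pipeline creation to tell Vulkan which descriptors the shaders will be using. Descriptor set layouts are specified in the pipeline layout object. Modify the VkPipelineLayoutCreateInfo to reference the layout object: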

VkPipelineLayoutCreateInfo pipelineLayoutInfo{};
pipelineLayoutInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
pipelineLayoutInfo.setLayoutCount = 1;
pipelineLayoutInfo.pSetLayouts = &descriptorSetLayout;

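You may be wondering why it's possible to specify multiple descriptor set layouts here, since a single one already includes all of the bindings. We'll get back to that in the next chapter, where we'll look into descriptor pools and descriptor sets.

The descriptor set layout should stick around for as long as we may create new graphics pipelines, that is, until the program ends: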

void cleanup() {
    cleanupSwapChain();

    vkDestroyDescriptorSetLayout(device, descriptorSetLayout, nullptr);

    ...
}

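Uniform buffer

In the next chapter we'll specify the buffer that contains the UBO data for the shader, but we need to create this buffer first. We're going to copy new data to the uniform buffer every frame, so it doesn't really make sense to have a staging buffer. It would just add extra overhead in this case and would likely degrade performance instead of improving it.

We should have multiple buffers, because multiple frames may be in flight at the same time and we don't want to update the buffer in preparation of the next frame while a previous one is still reading from it. Thus, we need to have as many uniform buffers as we have frames in flight, and write to a uniform buffer that is not currently being read by the GPU.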
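The rotation through per-frame buffers can be modelled with a small ring of slots. This is a sketch under stated assumptions: UniformSlots and writeAndAdvance are hypothetical names, a single float stands in for the buffer contents, and MAX_FRAMES_IN_FLIGHT is declared locally with a value of 2.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t MAX_FRAMES_IN_FLIGHT = 2;  // local copy for this sketch

// One slot per frame in flight; a float stands in for a whole uniform buffer.
struct UniformSlots {
    std::array<float, MAX_FRAMES_IN_FLIGHT> value{};
    std::uint32_t currentFrame = 0;

    // Writes into the slot owned by the frame being recorded, then advances.
    // Returns the index of the slot that was written.
    std::uint32_t writeAndAdvance(float v) {
        assert(currentFrame < MAX_FRAMES_IN_FLIGHT);
        const std::uint32_t written = currentFrame;
        value[written] = v;
        currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;
        return written;
    }
};
```

The ring alone does not make the write safe. It relies on the application having already waited on the fence for that frame slot, so the GPU is done with the previous contents before they are overwritten.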

To that end, add new class members for uniformBuffers, uniformBuffersMemory and uniformBuffersMapped:

VkBuffer indexBuffer;
VkDeviceMemory indexBufferMemory;

std::vector<VkBuffer> uniformBuffers;
std::vector<VkDeviceMemory> uniformBuffersMemory;
std::vector<void*> uniformBuffersMapped;

Similarly, create a new function createUniformBuffers that is called after createIndexBuffer and allocates the buffers:

void initVulkan() {
    ...
    createVertexBuffer();
    createIndexBuffer();
    createUniformBuffers();
    ...
}

...

void createUniformBuffers() {
    VkDeviceSize bufferSize = sizeof(UniformBufferObject);

    uniformBuffers.resize(MAX_FRAMES_IN_FLIGHT);
    uniformBuffersMemory.resize(MAX_FRAMES_IN_FLIGHT);
    uniformBuffersMapped.resize(MAX_FRAMES_IN_FLIGHT);

    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        createBuffer(bufferSize, VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, uniformBuffers[i], uniformBuffersMemory[i]);

        vkMapMemory(device, uniformBuffersMemory[i], 0, bufferSize, 0, &uniformBuffersMapped[i]);
    }
}

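We map the buffer right after creation using vkMapMemory to get a pointer to which we can write the data later on. The buffer stays mapped to this pointer for the application's whole lifetime. This technique is called "persistent mapping" and works on all Vulkan implementations. Not having to map the buffer every time we need to update it increases performance, as mapping is not free.

The uniform data will be used for all draw calls, so the buffer containing it should only be destroyed when we stop rendering.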

void cleanup() {
    ...

    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        vkDestroyBuffer(device, uniformBuffers[i], nullptr);
        vkFreeMemory(device, uniformBuffersMemory[i], nullptr);
    }

    vkDestroyDescriptorSetLayout(device, descriptorSetLayout, nullptr);

    ...

}

Updating uniform data

Create a new function updateUniformBuffer and add a call to it from the drawFrame function before submitting the next frame:

void drawFrame() {
    ...

    updateUniformBuffer(currentFrame);

    ...

    VkSubmitInfo submitInfo{};
    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;

    ...
}

...

void updateUniformBuffer(uint32_t currentImage) {

}

This function will generate a new transformation every frame to make the geometry spin around. We need to include two new headers to implement this functionality:

#define GLM_FORCE_RADIANS
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

#include <chrono>

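The GLM_FORCE_RADIANS definition makes sure that functions like glm::rotate take their angles in radians, which avoids any possible confusion. The glm/gtc/matrix_transform.hpp header provides functions that generate matrix transformations, such as glm::rotate, glm::lookAt and glm::perspective. The chrono standard library header provides precise timekeeping, which we use to make the geometry rotate at the same speed regardless of frame rate. We'll now calculate the rotation and projection transformations in the function: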

void updateUniformBuffer(uint32_t currentImage) {
    static auto startTime = std::chrono::high_resolution_clock::now();

    auto currentTime = std::chrono::high_resolution_clock::now();
    float time = std::chrono::duration<float, std::chrono::seconds::period>(currentTime - startTime).count();

    UniformBufferObject ubo{};
    ubo.model = glm::rotate(glm::mat4(1.0f), time * glm::radians(90.0f), glm::vec3(0.0f, 0.0f, 1.0f));
    ubo.view = glm::lookAt(glm::vec3(2.0f, 2.0f, 2.0f), glm::vec3(0.0f, 0.0f, 0.0f), glm::vec3(0.0f, 0.0f, 1.0f));
    ubo.proj = glm::perspective(glm::radians(45.0f), swapChainExtent.width / (float) swapChainExtent.height, 0.1f, 10.0f);
    ubo.proj[1][1] *= -1;

    memcpy(uniformBuffersMapped[currentImage], &ubo, sizeof(ubo));
}

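startTime is a static local, so it is stored the first time the function runs, before the first frame is rendered, and serves as the origin of the timer that drives the rotation. glm::rotate takes an existing transformation, a rotation angle and a rotation axis; here it rotates the identity matrix glm::mat4(1.0f) around the Z axis by time * glm::radians(90.0f), which is 90 degrees per second. glm::lookAt generates the view transformation from the eye position, the point being looked at and an "up" vector. Finally, glm::perspective generates a projection transformation with the given vertical field of view, aspect ratio and near and far depth planes.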
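The time-to-angle step can be isolated from the rest of the function. The helper below is a hypothetical sketch: rotationAngle takes the elapsed time as a std::chrono::duration<float> in seconds and, unlike the tutorial code, additionally wraps the result to one full turn, which is an optional addition of this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Converts elapsed time to a rotation angle in radians.
// Wrapping with fmod keeps the angle small during long runs.
inline float rotationAngle(std::chrono::duration<float> elapsed,
                           float degreesPerSecond = 90.0f) {
    assert(elapsed.count() >= 0.0f);
    const float kTwoPi = 6.28318530718f;
    const float radians = elapsed.count() * degreesPerSecond * (kTwoPi / 360.0f);
    return std::fmod(radians, kTwoPi);
}
```

Because the angle depends only on elapsed time, the rectangle turns at the same speed whether the application renders 30 or 300 frames per second.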

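GLM was originally designed for OpenGL, where the Y coordinate of the clip coordinates is inverted compared to Vulkan, where it points downwards. The easiest way to compensate for that is to flip the sign of the Y axis scaling factor in the projection matrix by multiplying ubo.proj[1][1] by -1. If you don't do this, the image will be rendered upside down.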
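The effect of negating that single matrix element can be verified with a few lines of arithmetic. The sketch below uses hypothetical Vec4 and Mat4 stand-ins with GLM's column-major indexing (m[column][row]) and an identity matrix in place of a real projection.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// Column-major storage like GLM: m[column][row].
struct Mat4 { std::array<Vec4, 4> m; };

inline Mat4 identity() {
    Mat4 result{};  // all elements start at zero
    for (int i = 0; i < 4; i++) {
        result.m[i][i] = 1.0f;
    }
    return result;
}

// Matrix times column vector.
inline Vec4 transform(const Mat4& mat, const Vec4& v) {
    Vec4 out{};
    for (int col = 0; col < 4; col++) {
        for (int row = 0; row < 4; row++) {
            out[row] += mat.m[col][row] * v[col];
        }
    }
    return out;
}

// Negates the Y scaling factor, as done with ubo.proj[1][1] *= -1.
inline void flipY(Mat4& proj) {
    proj.m[1][1] *= -1.0f;
}
```

Only the y component of the result changes sign; x, z and w are untouched, so depth and perspective division behave exactly as before.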

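With all of the transformations defined, we copy the data of the uniform buffer object into the current uniform buffer every frame, before the draw call accesses it.

Conclusion

In this chapter we learned how to create uniform buffers and update their contents every frame. In the next chapter we'll allocate and set up the descriptor sets that are needed to connect these buffers to the vertex shader.

C++ code / Vertex shader / Fragment shader

Descriptor pool and sets

Introduction

The descriptor set layout from the previous chapter describes the type of descriptors that can be bound. In this chapter we're going to create a descriptor set for each VkBuffer resource to bind it to the uniform buffer descriptor.

Descriptor pool

Descriptor sets can't be created directly; they must be allocated from a pool, like command buffers. The equivalent for descriptor sets is unsurprisingly called a descriptor pool. We'll write a new function, createDescriptorPool, to set it up.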

void initVulkan() {
    ...
    createUniformBuffers();
    createDescriptorPool();
    ...
}

...

void createDescriptorPool() {

}

We first need to describe which descriptor types our descriptor sets are going to contain and how many of them, using VkDescriptorPoolSize structures.

VkDescriptorPoolSize poolSize{};
poolSize.type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
poolSize.descriptorCount = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);

We will allocate one of these descriptors for every frame. This pool size structure is referenced by the main VkDescriptorPoolCreateInfo:

VkDescriptorPoolCreateInfo poolInfo{};
poolInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
poolInfo.poolSizeCount = 1;
poolInfo.pPoolSizes = &poolSize;

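Aside from the maximum number of individual descriptors that are available, we also need to specify the maximum number of descriptor sets that may be allocated: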

poolInfo.maxSets = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);

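The structure has an optional flag, similar to command pools, that determines whether individual descriptor sets can be freed: VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT. We're not going to touch the descriptor sets after creating them, so we don't need this flag. You can leave flags at its default value of 0.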

VkDescriptorPool descriptorPool;

...

if (vkCreateDescriptorPool(device, &poolInfo, nullptr, &descriptorPool) != VK_SUCCESS) {
    throw std::runtime_error("failed to create descriptor pool!");
}

Add a new class member to store the handle of the descriptor pool and call vkCreateDescriptorPool to create it.

Descriptor set

We can now allocate the descriptor sets themselves. Add a createDescriptorSets function for that purpose:

void initVulkan() {
    ...
    createDescriptorPool();
    createDescriptorSets();
    ...
}

...

void createDescriptorSets() {

}

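A descriptor set allocation is described with a VkDescriptorSetAllocateInfo struct. You need to specify the descriptor pool to allocate from, the number of descriptor sets to allocate, and the descriptor set layout to base them on: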

std::vector<VkDescriptorSetLayout> layouts(MAX_FRAMES_IN_FLIGHT, descriptorSetLayout);
VkDescriptorSetAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
allocInfo.descriptorPool = descriptorPool;
allocInfo.descriptorSetCount = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);
allocInfo.pSetLayouts = layouts.data();

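In our case we will create one descriptor set for each frame in flight, all with the same layout. Unfortunately we do need all the copies of the layout, because the next function expects an array matching the number of sets.

Add a class member to hold the descriptor set handles and allocate them with vkAllocateDescriptorSets: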

VkDescriptorPool descriptorPool;
std::vector<VkDescriptorSet> descriptorSets;

...

descriptorSets.resize(MAX_FRAMES_IN_FLIGHT);
if (vkAllocateDescriptorSets(device, &allocInfo, descriptorSets.data()) != VK_SUCCESS) {
    throw std::runtime_error("failed to allocate descriptor sets!");
}

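You don't need to explicitly clean up descriptor sets, because they will be automatically freed when the descriptor pool is destroyed. The call to vkAllocateDescriptorSets will allocate descriptor sets, each with one uniform buffer descriptor.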

void cleanup() {
    ...
    vkDestroyDescriptorPool(device, descriptorPool, nullptr);

    vkDestroyDescriptorSetLayout(device, descriptorSetLayout, nullptr);
    ...
}

The descriptor sets have been allocated now, but the descriptors within still need to be configured. We'll add a loop to populate every descriptor:

for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {

}

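Descriptors that refer to buffers, like our uniform buffer descriptor, are configured with a VkDescriptorBufferInfo struct. This structure specifies the buffer and the region within it that contains the data for the descriptor.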

for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
    VkDescriptorBufferInfo bufferInfo{};
    bufferInfo.buffer = uniformBuffers[i];
    bufferInfo.offset = 0;
    bufferInfo.range = sizeof(UniformBufferObject);
}

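If you're overwriting the whole buffer, like we are in this case, it is also possible to use the VK_WHOLE_SIZE value for the range. The configuration of descriptors is updated using the vkUpdateDescriptorSets function, which takes an array of VkWriteDescriptorSet structs as parameter.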

VkWriteDescriptorSet descriptorWrite{};
descriptorWrite.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrite.dstSet = descriptorSets[i];
descriptorWrite.dstBinding = 0;
descriptorWrite.dstArrayElement = 0;

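The first two fields specify the descriptor set to update and the binding. We gave our uniform buffer binding index 0. Remember that descriptors can be arrays, so we also need to specify the first index in the array that we want to update. We're not using an array, so the index is simply 0.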

descriptorWrite.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
descriptorWrite.descriptorCount = 1;

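We need to specify the type of descriptor again. It's possible to update multiple descriptors of an array at once, starting at index dstArrayElement. The descriptorCount field specifies how many array elements you want to update.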

descriptorWrite.pBufferInfo = &bufferInfo;
descriptorWrite.pImageInfo = nullptr; // Optional
descriptorWrite.pTexelBufferView = nullptr; // Optional

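The last fields reference an array of descriptorCount structs that actually configure the descriptors. Which of the three you need depends on the type of descriptor. The pBufferInfo field is used for descriptors that refer to buffer data, pImageInfo is used for descriptors that refer to image data, and pTexelBufferView is used for descriptors that refer to buffer views. Our descriptor is based on a buffer, so we use pBufferInfo.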

vkUpdateDescriptorSets(device, 1, &descriptorWrite, 0, nullptr);

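The updates are applied with vkUpdateDescriptorSets. It accepts two kinds of arrays as parameters: an array of VkWriteDescriptorSet and an array of VkCopyDescriptorSet. The latter can be used to copy descriptors to each other, as its name implies.

Using descriptor sets

We now need to update the recordCommandBuffer function to actually bind the right descriptor set for each frame to the descriptors in the shader with vkCmdBindDescriptorSets. This needs to be done before the vkCmdDrawIndexed call: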

vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout, 0, 1, &descriptorSets[currentFrame], 0, nullptr);
vkCmdDrawIndexed(commandBuffer, static_cast<uint32_t>(indices.size()), 1, 0, 0, 0);

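Unlike vertex and index buffers, descriptor sets are not unique to graphics pipelines. Therefore we need to specify whether we want to bind descriptor sets to the graphics or the compute pipeline. The next parameter is the pipeline layout that the descriptors are based on. The next three parameters specify the index of the first descriptor set, the number of sets to bind, and the array of sets to bind. The last two parameters specify an array of offsets that are used for dynamic descriptors. We'll look at these in a future chapter.

If you run your program now, you'll notice that unfortunately nothing is visible. The problem is that, because of the Y-flip we did in the projection matrix, the vertices are now being drawn in counter-clockwise order instead of clockwise order. This causes backface culling to kick in and prevents any geometry from being drawn. Go to the createGraphicsPipeline function and modify the frontFace in VkPipelineRasterizationStateCreateInfo to correct this: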

rasterizer.cullMode = VK_CULL_MODE_BACK_BIT;
rasterizer.frontFace = VK_FRONT_FACE_COUNTER_CLOCKWISE;

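Run your program again and you should now see the following:

The rectangle has changed into a square because the projection matrix now corrects for the aspect ratio. The updateUniformBuffer function takes care of screen resizing, so we don't need to recreate the descriptor sets in recreateSwapChain.

Alignment requirements

One thing we've glossed over so far is how exactly the data in the C++ structure should match the uniform definition in the shader. It seems obvious enough to simply use the same types in both: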

struct UniformBufferObject {
    glm::mat4 model;
    glm::mat4 view;
    glm::mat4 proj;
};

layout(binding = 0) uniform UniformBufferObject {
    mat4 model;
    mat4 view;
    mat4 proj;
} ubo;

However, that's not all there is to it. For example, try modifying the struct and shader to look like this:

struct UniformBufferObject {
    glm::vec2 foo;
    glm::mat4 model;
    glm::mat4 view;
    glm::mat4 proj;
};

layout(binding = 0) uniform UniformBufferObject {
    vec2 foo;
    mat4 model;
    mat4 view;
    mat4 proj;
} ubo;

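Recompile your shader and your program, run it, and you'll find that the colorful square you worked so far has disappeared! That's because we haven't taken the alignment requirements into account.

Vulkan expects the data in your structure to be aligned in memory in a specific way, for example:

Scalars have to be aligned by N (= 4 bytes given 32-bit floats).
A vec2 must be aligned by 2N (= 8 bytes).
A vec3 or vec4 must be aligned by 4N (= 16 bytes).
A nested structure must be aligned by the base alignment of its members rounded up to a multiple of 16.
A mat4 matrix must have the same alignment as a vec4.

You can find the full list of alignment requirements in the specification.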
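The rules above can be turned into a small offset calculator. This is only a sketch: Member, alignUp and computeOffsets are hypothetical helper names that are not part of Vulkan or GLM, and the caller is assumed to supply each member's size and base alignment by hand.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: size and base alignment of one uniform block member.
struct Member {
    std::size_t size;       // bytes occupied by the member
    std::size_t alignment;  // required base alignment in bytes (non-zero)
};

// Round offset up to the next multiple of alignment.
std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Place the members one after another, aligning each one before it is placed,
// and return the resulting byte offsets.
std::vector<std::size_t> computeOffsets(const std::vector<Member>& members) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (const Member& m : members) {
        cursor = alignUp(cursor, m.alignment);
        offsets.push_back(cursor);
        cursor += m.size;
    }
    return offsets;
}
```

Calling computeOffsets({{8, 8}, {64, 16}, {64, 16}, {64, 16}}) describes a vec2 followed by three mat4 members and yields the offsets the shader expects for the modified block.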

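Our original shader with just three mat4 fields already met the alignment requirements. Each mat4 is 4 x 4 x 4 = 64 bytes in size, so model has an offset of 0, view has an offset of 64, and proj has an offset of 128. All of these are multiples of 16, which is why it worked fine.

The new structure starts with a vec2 that is only 8 bytes in size and therefore throws off all of the offsets. On the C++ side model now has an offset of 8, view an offset of 72, and proj an offset of 136, none of which are multiples of 16. To fix this problem we can use the alignas specifier introduced in C++11: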

struct UniformBufferObject {
    glm::vec2 foo;
    alignas(16) glm::mat4 model;
    glm::mat4 view;
    glm::mat4 proj;
};

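If you now compile and run your program again, you should see that the shader correctly receives its matrix values once again.

Luckily there is a way to not have to think about these alignment requirements most of the time. We can define GLM_FORCE_DEFAULT_ALIGNED_GENTYPES right before including GLM. This forces GLM to use versions of vec2 and mat4 that already have the alignment requirements specified: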

#define GLM_FORCE_RADIANS
#define GLM_FORCE_DEFAULT_ALIGNED_GENTYPES
#include <glm/glm.hpp>

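If you add this definition, you can remove the alignas specifier and your program should still work.

Unfortunately this method can break down once you start using nested structures. Consider the following definition in the C++ code: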

struct Foo {
    glm::vec2 v;
};

struct UniformBufferObject {
    Foo f1;
    Foo f2;
};

And the following shader definition:

struct Foo {
    vec2 v;
};

layout(binding = 0) uniform UniformBufferObject {
    Foo f1;
    Foo f2;
} ubo;

In this case f2 ends up at offset 8 in the C++ struct, whereas it should be at offset 16 because it is a nested structure. Here you must specify the alignment yourself:

struct UniformBufferObject {
    Foo f1;
    alignas(16) Foo f2;
};

These gotchas are a good reason to always be explicit about alignment. That way you won't be caught off guard by the strange symptoms of alignment errors.

struct UniformBufferObject {
    alignas(16) glm::mat4 model;
    alignas(16) glm::mat4 view;
    alignas(16) glm::mat4 proj;
};

Don't forget to recompile your shader after removing the foo field.
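One way to catch alignment mistakes early is to check the offsets at compile time. The sketch below uses the stand-in types Vec2 and Mat4 in place of glm::vec2 and glm::mat4 (assumed here to be tightly packed floats with 4-byte alignment), and the struct names NestedUbo and MatrixUbo are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>

// Stand-ins for glm::vec2 and glm::mat4 (assumption: plain packed floats).
struct Vec2 { float x, y; };
struct Mat4 { float m[16]; };

struct Foo {
    Vec2 v;
};

// Nested structures must start on a 16-byte boundary in the uniform block.
struct NestedUbo {
    Foo f1;
    alignas(16) Foo f2;
};

// A vec2 followed by matrices: every matrix is aligned explicitly.
struct MatrixUbo {
    Vec2 foo;
    alignas(16) Mat4 model;
    alignas(16) Mat4 view;
    alignas(16) Mat4 proj;
};

// The build fails if the C++ layout drifts away from the expected offsets.
static_assert(offsetof(NestedUbo, f2) == 16, "f2 must start at offset 16");
static_assert(offsetof(MatrixUbo, model) == 16, "model must start at offset 16");
static_assert(offsetof(MatrixUbo, view) == 80, "view must start at offset 80");
static_assert(offsetof(MatrixUbo, proj) == 144, "proj must start at offset 144");
```

Removing one of the alignas specifiers from NestedUbo makes the corresponding static_assert fail, which is easier to diagnose than a missing square on screen.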

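Multiple descriptor sets

As some of the structures and function calls hinted at, it is actually possible to bind multiple descriptor sets simultaneously. You need to specify a descriptor set layout for each descriptor set when creating the pipeline layout. Shaders can then reference specific descriptor sets like this: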

layout(set = 0, binding = 0) uniform UniformBufferObject { ... }

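You can use this feature to put descriptors that vary per object and descriptors that are shared into separate descriptor sets. In that case you avoid rebinding most of the descriptors across draw calls, which is potentially more efficient.

C++ code / Vertex shader / Fragment shader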

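Texture mapping

Images

Introduction

The geometry has been colored using per-vertex colors so far, which is a rather limited approach. In this part of the tutorial we're going to implement texture mapping to make the geometry look more interesting. This will also allow us to load and draw basic 3D models in a future chapter.

Adding a texture to our application involves the following steps:

Create an image object backed by device memory
Fill it with pixels from an image file
Create an image sampler
Add a combined image sampler descriptor to sample colors from the texture

We've already worked with image objects before, but those were automatically created by the swap chain extension. This time we have to create one ourselves. Creating an image and filling it with data is similar to vertex buffer creation. We'll create a staging resource, fill it with pixel data, and then copy this data to the final image object that we'll use for rendering. Although it is possible to create a staging image for this purpose, Vulkan also allows you to copy pixels from a VkBuffer to an image, and the API for this is actually faster on some hardware. We'll first create this buffer and fill it with pixel values, and then we'll create an image to copy the pixels to. As with buffers, creating an image involves querying the memory requirements, allocating device memory and binding it.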

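However, there is something extra that we have to take care of when working with images. Images can have different layouts that affect how the pixels are organized in memory. Due to the way graphics hardware works, simply storing the pixels row by row may not lead to the best performance, for example. When performing any operation on an image, you must make sure that it has the layout that is optimal for that operation. We've actually already seen some of these layouts when we specified the render pass:

VK_IMAGE_LAYOUT_PRESENT_SRC_KHR: optimal for presentation
VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL: optimal as an attachment for writing colors from the fragment shader
VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL: optimal as the source in a transfer operation, like vkCmdCopyImageToBuffer
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL: optimal as the destination in a transfer operation, like vkCmdCopyBufferToImage
VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL: optimal for sampling from a shader

One of the most common ways to transition the layout of an image is a pipeline barrier. Pipeline barriers are primarily used for synchronizing access to resources, but they can also be used to transition layouts. In addition, they can transfer queue family ownership when VK_SHARING_MODE_EXCLUSIVE is used.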

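Image library

There are many libraries available for loading images, and you can even write your own code to load simple formats like BMP and PPM. In this tutorial we'll be using the stb_image library from the stb collection. Its advantage is that all of the code is in a single file, so it doesn't require any tricky build configuration. Download stb_image.h and store it in a convenient location, like the directory where you saved GLFW and GLM. Add the location to your include path.

Visual Studio

Add the directory with stb_image.h in it to the Additional Include Directories paths.

Makefile

Add the directory with stb_image.h to the include directories for GCC: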

VULKAN_SDK_PATH = /home/user/VulkanSDK/x.x.x.x/x86_64
STB_INCLUDE_PATH = /home/user/libraries/stb

...

CFLAGS = -std=c++17 -I$(VULKAN_SDK_PATH)/include -I$(STB_INCLUDE_PATH)

Loading an image

Include the image library like this:

#define STB_IMAGE_IMPLEMENTATION
#include <stb_image.h>

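The header only defines the prototypes of the functions by default. One code file needs to include the header with the STB_IMAGE_IMPLEMENTATION definition to include the function bodies, otherwise we'll get linking errors.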

void initVulkan() {
    ...
    createCommandPool();
    createTextureImage();
    createVertexBuffer();
    ...
}

...

void createTextureImage() {

}

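Create a new function createTextureImage where we'll load an image and upload it into a Vulkan image object. We're going to use command buffers, so it should be called after createCommandPool.

Create a new directory textures next to the shaders directory to store texture images in. We're going to load an image called texture.jpg from that directory. We used a CC0 licensed image resized to 512 x 512 pixels, but feel free to pick any image you want. The library supports most common image file formats, like JPEG, PNG, BMP and GIF.

Loading an image with this library is really easy: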

void createTextureImage() {
    int texWidth, texHeight, texChannels;
    stbi_uc* pixels = stbi_load("textures/texture.jpg", &texWidth, &texHeight, &texChannels, STBI_rgb_alpha);
    VkDeviceSize imageSize = texWidth * texHeight * 4;

    if (!pixels) {
        throw std::runtime_error("failed to load texture image!");
    }
}

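The stbi_load function takes the file path and the number of channels to load as arguments. The STBI_rgb_alpha value forces the image to be loaded with an alpha channel, even if it doesn't have one, which is nice for consistency with other textures in the future. The middle three parameters are outputs for the width, height and actual number of channels in the image. The pointer that is returned is the first element in an array of pixel values. The pixels are laid out row by row with 4 bytes per pixel in the case of STBI_rgb_alpha, for a total of texWidth * texHeight * 4 values.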
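The row-by-row layout described above can be illustrated with two small helpers. This is a sketch rather than part of the tutorial code: imageSizeBytes and pixelOffset are hypothetical names, and the size is widened to 64 bits before multiplying, which is an extra precaution for very large images that the tutorial listing does not take.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes per pixel when the image is loaded with STBI_rgb_alpha (R, G, B, A).
constexpr std::size_t kBytesPerPixel = 4;

// Total size of the pixel array in bytes, computed in 64 bits.
std::uint64_t imageSizeBytes(int width, int height) {
    return static_cast<std::uint64_t>(width) *
           static_cast<std::uint64_t>(height) * kBytesPerPixel;
}

// Byte offset of the first channel of pixel (x, y) in a tightly packed,
// row-major pixel array that is `width` pixels wide.
std::size_t pixelOffset(int x, int y, int width) {
    return (static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
            static_cast<std::size_t>(x)) * kBytesPerPixel;
}
```

With the pointer returned by stbi_load, pixels[pixelOffset(x, y, texWidth) + 3] would then be the alpha value of the pixel at (x, y).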

Staging buffer

We're now going to create a buffer in host visible memory so that we can use vkMapMemory and copy the pixels to it. Add variables for this temporary buffer to the createTextureImage function:

VkBuffer stagingBuffer;
VkDeviceMemory stagingBufferMemory;

The buffer should be in host visible memory so that we can map it, and it should be usable as a transfer source so that we can copy it to an image later on:

createBuffer(imageSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);

We can then directly copy the pixel values that we got from the image loading library to the buffer:

void* data;
vkMapMemory(device, stagingBufferMemory, 0, imageSize, 0, &data);
    memcpy(data, pixels, static_cast<size_t>(imageSize));
vkUnmapMemory(device, stagingBufferMemory);

Don't forget to clean up the original pixel array now:

stbi_image_free(pixels);

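Texture image

Although we could set up the shader to access the pixel values in the buffer, it's better to use image objects in Vulkan for this purpose. Image objects make it easier and faster to retrieve colors by allowing us to use 2D coordinates, for one. Pixels within an image object are known as texels, and we'll use that name from this point on. Add the following new class members: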

VkImage textureImage;
VkDeviceMemory textureImageMemory;

The parameters for an image are specified in a VkImageCreateInfo struct:

VkImageCreateInfo imageInfo{};
imageInfo.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
imageInfo.imageType = VK_IMAGE_TYPE_2D;
imageInfo.extent.width = static_cast<uint32_t>(texWidth);
imageInfo.extent.height = static_cast<uint32_t>(texHeight);
imageInfo.extent.depth = 1;
imageInfo.mipLevels = 1;
imageInfo.arrayLayers = 1;

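The image type, specified in the imageType field, tells Vulkan with what kind of coordinate system the texels in the image are going to be addressed. It is possible to create 1D, 2D and 3D images. One-dimensional images can be used to store an array of data or a gradient, two-dimensional images are mainly used for textures, and three-dimensional images can be used to store voxel volumes, for example. The extent field specifies the dimensions of the image, that is, how many texels there are on each axis. That's why depth must be 1 instead of 0. Our texture will not be an array and we won't be using mipmapping for now.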

imageInfo.format = VK_FORMAT_R8G8B8A8_SRGB;

Vulkan supports many possible image formats, but we should use the same format for the texels as for the pixels in the buffer, otherwise the copy operation will fail.

imageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;

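The tiling field can have one of two values:

VK_IMAGE_TILING_LINEAR: Texels are laid out in row-major order like our pixels array
VK_IMAGE_TILING_OPTIMAL: Texels are laid out in an implementation defined order for optimal access

Unlike the layout of an image, the tiling mode cannot be changed at a later time. If you want to be able to directly access texels in the memory of the image, then you must use VK_IMAGE_TILING_LINEAR. We will be using a staging buffer instead of a staging image, so this won't be necessary. We will be using VK_IMAGE_TILING_OPTIMAL for efficient access from the shader.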

imageInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;

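There are only two possible values for the initialLayout of an image:

VK_IMAGE_LAYOUT_UNDEFINED: Not usable by the GPU and the very first transition will discard the texels.
VK_IMAGE_LAYOUT_PREINITIALIZED: Not usable by the GPU, but the first transition will preserve the texels.

There are few situations where it is necessary for the texels to be preserved during the first transition. One example, however, would be if you wanted to use an image as a staging image in combination with the VK_IMAGE_TILING_LINEAR tiling. In that case, you'd want to upload the texel data to it and then transition the image to be a transfer source without losing the data. In our case, however, we're first going to transition the image to be a transfer destination and then copy texel data to it from a buffer object, so we don't need this property and can safely use VK_IMAGE_LAYOUT_UNDEFINED.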

imageInfo.usage = VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT;

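The usage field has the same semantics as the one during buffer creation. The image is going to be used as the destination for the buffer copy, so it should be set up as a transfer destination. We also want to be able to access the image from the shader to color our mesh, so the usage should include VK_IMAGE_USAGE_SAMPLED_BIT.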

imageInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

The image will only be used by one queue family: the one that supports graphics (and therefore also) transfer operations.

imageInfo.samples = VK_SAMPLE_COUNT_1_BIT;
imageInfo.flags = 0; // Optional

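The samples flag is related to multisampling. This is only relevant for images that will be used as attachments, so stick to one sample. There are some optional flags for images that are related to sparse images. Sparse images are images where only certain regions are actually backed by memory. If you were using a 3D texture for a voxel terrain, for example, then you could use this to avoid allocating memory to store large volumes of "air" values. We won't be using it in this tutorial, so leave it at its default value of 0.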

if (vkCreateImage(device, &imageInfo, nullptr, &textureImage) != VK_SUCCESS) {
    throw std::runtime_error("failed to create image!");
}

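The image is created using vkCreateImage, which doesn't have any particularly noteworthy parameters. It is possible that the VK_FORMAT_R8G8B8A8_SRGB format is not supported by the graphics hardware. You should have a list of acceptable alternatives and go with the best one that is supported. However, support for this particular format is so widespread that we'll skip this step. Using different formats would also require annoying conversions. We will get back to this in the depth buffer chapter, where we'll implement such a system.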

VkMemoryRequirements memRequirements;
vkGetImageMemoryRequirements(device, textureImage, &memRequirements);

VkMemoryAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocInfo.allocationSize = memRequirements.size;
allocInfo.memoryTypeIndex = findMemoryType(memRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);

if (vkAllocateMemory(device, &allocInfo, nullptr, &textureImageMemory) != VK_SUCCESS) {
    throw std::runtime_error("failed to allocate image memory!");
}

vkBindImageMemory(device, textureImage, textureImageMemory, 0);

Allocating memory for an image works in exactly the same way as allocating memory for a buffer. Use vkGetImageMemoryRequirements instead of vkGetBufferMemoryRequirements, and use vkBindImageMemory instead of vkBindBufferMemory.
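The memoryTypeBits value returned in the memory requirements is a bit mask of the memory types the image may use, and the requested properties narrow that choice further. The sketch below models only this selection logic. It does not call Vulkan: pickMemoryType and PropertyFlags are hypothetical stand-ins, and the vector of flags is assumed to play the role of the memory types reported by the physical device.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the property flags of one memory type.
using PropertyFlags = std::uint32_t;

// Return the index of the first memory type that is allowed by typeFilter
// (bit i set means type i is acceptable) and has all required property bits.
// Assumes at most 32 memory types.
std::uint32_t pickMemoryType(std::uint32_t typeFilter,
                             PropertyFlags required,
                             const std::vector<PropertyFlags>& memoryTypes) {
    for (std::uint32_t i = 0; i < memoryTypes.size(); i++) {
        const bool allowed = (typeFilter & (1u << i)) != 0;
        const bool hasProperties = (memoryTypes[i] & required) == required;
        if (allowed && hasProperties) {
            return i;
        }
    }
    throw std::runtime_error("failed to find suitable memory type!");
}
```

The same selection applies to buffers and images, which is why the tutorial's findMemoryType helper can be reused unchanged here.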

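This function is already getting quite large and there'll be a need to create more images in later chapters, so we should abstract image creation into a createImage function, like we did for buffers. Create the function and move the image object creation and memory allocation to it: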

void createImage(uint32_t width, uint32_t height, VkFormat format, VkImageTiling tiling, VkImageUsageFlags usage, VkMemoryPropertyFlags properties, VkImage& image, VkDeviceMemory& imageMemory) {
    VkImageCreateInfo imageInfo{};
    imageInfo.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
    imageInfo.imageType = VK_IMAGE_TYPE_2D;
    imageInfo.extent.width = width;
    imageInfo.extent.height = height;
    imageInfo.extent.depth = 1;
    imageInfo.mipLevels = 1;
    imageInfo.arrayLayers = 1;
    imageInfo.format = format;
    imageInfo.tiling = tiling;
    imageInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    imageInfo.usage = usage;
    imageInfo.samples = VK_SAMPLE_COUNT_1_BIT;
    imageInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

    if (vkCreateImage(device, &imageInfo, nullptr, &image) != VK_SUCCESS) {
        throw std::runtime_error("failed to create image!");
    }

    VkMemoryRequirements memRequirements;
    vkGetImageMemoryRequirements(device, image, &memRequirements);

    VkMemoryAllocateInfo allocInfo{};
    allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize = memRequirements.size;
    allocInfo.memoryTypeIndex = findMemoryType(memRequirements.memoryTypeBits, properties);

    if (vkAllocateMemory(device, &allocInfo, nullptr, &imageMemory) != VK_SUCCESS) {
        throw std::runtime_error("failed to allocate image memory!");
    }

    vkBindImageMemory(device, image, imageMemory, 0);
}

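We've made the width, height, format, tiling mode, usage, and memory properties parameters, because these will all vary between the images we'll be creating throughout this tutorial.

The createTextureImage function can now be simplified to: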

void createTextureImage() {
    int texWidth, texHeight, texChannels;
    stbi_uc* pixels = stbi_load("textures/texture.jpg", &texWidth, &texHeight, &texChannels, STBI_rgb_alpha);
    VkDeviceSize imageSize = texWidth * texHeight * 4;

    if (!pixels) {
        throw std::runtime_error("failed to load texture image!");
    }

    VkBuffer stagingBuffer;
    VkDeviceMemory stagingBufferMemory;
    createBuffer(imageSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);

    void* data;
    vkMapMemory(device, stagingBufferMemory, 0, imageSize, 0, &data);
        memcpy(data, pixels, static_cast<size_t>(imageSize));
    vkUnmapMemory(device, stagingBufferMemory);

    stbi_image_free(pixels);

    createImage(texWidth, texHeight, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);
}

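Layout transitions

The function we're going to write now involves recording and executing a command buffer again, so now's a good time to move that logic into a helper function or two: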

VkCommandBuffer beginSingleTimeCommands() {
    VkCommandBufferAllocateInfo allocInfo{};
    allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
    allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
    allocInfo.commandPool = commandPool;
    allocInfo.commandBufferCount = 1;

    VkCommandBuffer commandBuffer;
    vkAllocateCommandBuffers(device, &allocInfo, &commandBuffer);

    VkCommandBufferBeginInfo beginInfo{};
    beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
    beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;

    vkBeginCommandBuffer(commandBuffer, &beginInfo);

    return commandBuffer;
}

void endSingleTimeCommands(VkCommandBuffer commandBuffer) {
    vkEndCommandBuffer(commandBuffer);

    VkSubmitInfo submitInfo{};
    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers = &commandBuffer;

    vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
    vkQueueWaitIdle(graphicsQueue);

    vkFreeCommandBuffers(device, commandPool, 1, &commandBuffer);
}

The code for these functions is based on the existing code in copyBuffer. You can now simplify that function to:

void copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size) {
    VkCommandBuffer commandBuffer = beginSingleTimeCommands();

    VkBufferCopy copyRegion{};
    copyRegion.size = size;
    vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);

    endSingleTimeCommands(commandBuffer);
}

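If we were still using buffers, then we could now write a function to record and execute vkCmdCopyBufferToImage to finish the job, but this command requires the image to be in the right layout first. Create a new function to handle layout transitions: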

void transitionImageLayout(VkImage image, VkFormat format, VkImageLayout oldLayout, VkImageLayout newLayout) {
    VkCommandBuffer commandBuffer = beginSingleTimeCommands();

    endSingleTimeCommands(commandBuffer);
}

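One of the most common ways to perform layout transitions is using an image memory barrier. A pipeline barrier like that is generally used to synchronize access to resources, like ensuring that a write to a buffer completes before reading from it, but it can also be used to transition image layouts and transfer queue family ownership when VK_SHARING_MODE_EXCLUSIVE is used. There is an equivalent buffer memory barrier to do this for buffers.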

VkImageMemoryBarrier barrier{};
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.oldLayout = oldLayout;
barrier.newLayout = newLayout;

The first two fields specify the layout transition. It is possible to use VK_IMAGE_LAYOUT_UNDEFINED as oldLayout if you don't care about the existing contents of the image.

barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;

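If you are using the barrier to transfer queue family ownership, then these two fields should be the indices of the queue families. They must be set to VK_QUEUE_FAMILY_IGNORED if you don't want to do this (not the default value!).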

barrier.image = image;
barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
barrier.subresourceRange.baseMipLevel = 0;
barrier.subresourceRange.levelCount = 1;
barrier.subresourceRange.baseArrayLayer = 0;
barrier.subresourceRange.layerCount = 1;

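The image and subresourceRange specify the image that is affected and the specific part of the image. Our image is not an array and does not have mipmapping levels, so only one level and layer are specified.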

barrier.srcAccessMask = 0; // TODO
barrier.dstAccessMask = 0; // TODO

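Barriers are primarily used for synchronization purposes, so you must specify which types of operations that involve the resource must happen before the barrier, and which operations that involve the resource must wait on the barrier. We need to do that despite already using vkQueueWaitIdle to manually synchronize. The right values depend on the old and new layout, so we'll get back to this once we've figured out which transitions we're going to use.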

vkCmdPipelineBarrier(
    commandBuffer,
    0 /* TODO */, 0 /* TODO */,
    0,
    0, nullptr,
    0, nullptr,
    1, &barrier
);

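All types of pipeline barriers are submitted using the same function. The first parameter after the command buffer specifies in which pipeline stage the operations occur that should happen before the barrier. The second parameter specifies the pipeline stage in which operations will wait on the barrier. The pipeline stages that you are allowed to specify before and after the barrier depend on how you use the resource before and after the barrier. The allowed values are listed in the specification. For example, if you're going to read from a uniform after the barrier, you would specify a usage of VK_ACCESS_UNIFORM_READ_BIT and the earliest shader that will read from the uniform as pipeline stage, for example VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT. The validation layers will warn you when you specify a pipeline stage that does not match the type of usage.

The third parameter is either 0 or VK_DEPENDENCY_BY_REGION_BIT. The latter turns the barrier into a per-region condition. That means that the implementation is allowed to already begin reading from the parts of a resource that were written so far, for example.

The last three pairs of parameters reference arrays of pipeline barriers of the three available types: memory barriers, buffer memory barriers, and image memory barriers like the one we're using here. Note that we're not using the VkFormat parameter yet, but we'll be using that one for special transitions in the depth buffer chapter.

Copying buffer to image

Before we get back to createTextureImage, we're going to write one more helper function: copyBufferToImage: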

void copyBufferToImage(VkBuffer buffer, VkImage image, uint32_t width, uint32_t height) {
    VkCommandBuffer commandBuffer = beginSingleTimeCommands();

    endSingleTimeCommands(commandBuffer);
}

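Just like with buffer copies, you need to specify which part of the buffer is going to be copied to which part of the image. This happens through VkBufferImageCopy structs: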

VkBufferImageCopy region{};
region.bufferOffset = 0;
region.bufferRowLength = 0;
region.bufferImageHeight = 0;

region.imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
region.imageSubresource.mipLevel = 0;
region.imageSubresource.baseArrayLayer = 0;
region.imageSubresource.layerCount = 1;

region.imageOffset = {0, 0, 0};
region.imageExtent = {
    width,
    height,
    1
};

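Most of these fields are self-explanatory. The bufferOffset specifies the byte offset in the buffer at which the pixel values start. The bufferRowLength and bufferImageHeight fields specify how the pixels are laid out in memory. For example, you could have some padding bytes between rows of the image. Specifying 0 for both indicates that the pixels are simply tightly packed, as they are in our case. The imageSubresource, imageOffset and imageExtent fields indicate to which part of the image we want to copy the pixels.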
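To make the role of bufferRowLength and bufferImageHeight more concrete, the sketch below computes where a given texel is read from in the buffer for an uncompressed format. It follows our understanding of the addressing rule in the specification, where both fields are measured in texels and 0 means "use the image extent". CopyLayout and texelByteOffset are hypothetical names and nothing here calls Vulkan.

```cpp
#include <cstdint>

// Hypothetical mirror of the fields that influence buffer addressing.
struct CopyLayout {
    std::uint64_t bufferOffset;       // byte offset where pixel data starts
    std::uint32_t bufferRowLength;    // texels per row in the buffer, 0 = tightly packed
    std::uint32_t bufferImageHeight;  // rows per slice in the buffer, 0 = tightly packed
    std::uint32_t imageWidth;         // imageExtent.width
    std::uint32_t imageHeight;        // imageExtent.height
    std::uint32_t texelSize;          // bytes per texel, 4 for R8G8B8A8
};

// Byte offset in the buffer of the texel at (x, y, z) within the copied region.
std::uint64_t texelByteOffset(const CopyLayout& c,
                              std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    const std::uint64_t rowLength =
        c.bufferRowLength != 0 ? c.bufferRowLength : c.imageWidth;
    const std::uint64_t imageHeight =
        c.bufferImageHeight != 0 ? c.bufferImageHeight : c.imageHeight;
    return c.bufferOffset + ((z * imageHeight + y) * rowLength + x) * c.texelSize;
}
```

For our tightly packed 512 x 512 texture, the texel at (1, 2, 0) sits at byte 4100. If each row were padded to 520 texels, the same texel would sit at byte 4164.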

Buffer to image copy operations are enqueued using the vkCmdCopyBufferToImage function:

vkCmdCopyBufferToImage(
    commandBuffer,
    buffer,
    image,
    VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    1,
    &region
);

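The fourth parameter indicates which layout the image is currently using. We're assuming here that the image has already been transitioned to the layout that is optimal for copying pixels to, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL. Right now we're only copying one chunk of pixels to the whole image, but it's possible to specify an array of VkBufferImageCopy to perform many different copies from this buffer to the image in one operation.

Preparing the texture image

We now have all of the tools we need to finish setting up the texture image, so we're going back to the createTextureImage function. The last thing we did there was creating the texture image. The next step is to copy the staging buffer to the texture image. This involves two steps:

Transition the texture image to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
Execute the buffer to image copy operation

This is easy to do with the functions we just created: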

transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));

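The image was created with the VK_IMAGE_LAYOUT_UNDEFINED layout, so that one should be specified as the old layout when transitioning textureImage. Remember that we can do this because we don't care about its contents before performing the copy operation.

To be able to start sampling from the texture image in the shader, we need one last transition to prepare it for shader access: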

transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);

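Transition barrier masks

If you run your application with validation layers enabled now, you'll see that it complains about the access masks and pipeline stages in transitionImageLayout being invalid. We still need to set those based on the layouts in the transition.

There are two transitions we need to handle:

Undefined → transfer destination: transfer writes that don't need to wait on anything
Transfer destination → shader reading: shader reads should wait on transfer writes, specifically the shader reads in the fragment shader, because that's where we're going to use the texture

These rules are specified using the following access masks and pipeline stages: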

VkPipelineStageFlags sourceStage;
VkPipelineStageFlags destinationStage;

if (oldLayout == VK_IMAGE_LAYOUT_UNDEFINED && newLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) {
    barrier.srcAccessMask = 0;
    barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;

    sourceStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
    destinationStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
} else if (oldLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL && newLayout == VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL) {
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    sourceStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
    destinationStage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
} else {
    throw std::invalid_argument("unsupported layout transition!");
}

vkCmdPipelineBarrier(
    commandBuffer,
    sourceStage, destinationStage,
    0,
    0, nullptr,
    0, nullptr,
    1, &barrier
);

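As you can see in the code above, transfer writes must occur in the pipeline transfer stage. Since the writes don't have to wait on anything, you may specify an empty access mask and the earliest possible pipeline stage, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, for the pre-barrier operations. Note that VK_PIPELINE_STAGE_TRANSFER_BIT is not a real stage within the graphics and compute pipelines. It is more of a pseudo-stage where transfers happen. See the documentation for more information and other examples of pseudo-stages.

The image will be written in the same pipeline stage and subsequently read by the fragment shader, which is why we specify shader reading access in the fragment shader pipeline stage.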
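One possible way to keep this decision testable is to separate it from the Vulkan calls. The sketch below is an illustration only: the Layout enum, BarrierMasks struct and masksFor function are hypothetical stand-ins, and the function returns the names of the Vulkan constants from the listing above as strings instead of the constants themselves, so that it compiles without the Vulkan headers.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the three image layouts used in this chapter.
enum class Layout { Undefined, TransferDst, ShaderReadOnly };

// Names of the access masks and pipeline stages chosen for a transition.
struct BarrierMasks {
    std::string srcAccess;
    std::string dstAccess;
    std::string srcStage;
    std::string dstStage;
};

// Mirror of the if/else chain in transitionImageLayout as a pure function.
BarrierMasks masksFor(Layout oldLayout, Layout newLayout) {
    if (oldLayout == Layout::Undefined && newLayout == Layout::TransferDst) {
        // Transfer writes that do not need to wait on anything.
        return {"0", "VK_ACCESS_TRANSFER_WRITE_BIT",
                "VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT", "VK_PIPELINE_STAGE_TRANSFER_BIT"};
    }
    if (oldLayout == Layout::TransferDst && newLayout == Layout::ShaderReadOnly) {
        // Fragment shader reads must wait on the transfer writes.
        return {"VK_ACCESS_TRANSFER_WRITE_BIT", "VK_ACCESS_SHADER_READ_BIT",
                "VK_PIPELINE_STAGE_TRANSFER_BIT", "VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT"};
    }
    throw std::invalid_argument("unsupported layout transition!");
}
```

In real code the function would return the actual VkAccessFlags and VkPipelineStageFlags values. The point of the sketch is only that new transitions become one more branch that can be checked in isolation.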

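If we need to do more transitions in the future, then we'll extend the function. The application should now run successfully, although there are of course no visual changes yet.

One thing to note is that command buffer submission results in implicit VK_ACCESS_HOST_WRITE_BIT synchronization at the beginning. Since the transitionImageLayout function executes a command buffer with only a single command, you could use this implicit synchronization and set srcAccessMask to 0 if you ever needed a VK_ACCESS_HOST_WRITE_BIT dependency in a layout transition. It's up to you whether you want to be explicit about it or rely on these OpenGL-like "hidden" operations.

There is actually a special type of image layout that supports all operations, VK_IMAGE_LAYOUT_GENERAL. The problem with it, of course, is that it doesn't necessarily offer the best performance for any operation. It is required for some special cases, like using an image as both input and output, or for reading an image after it has left the preinitialized layout.

All of the helper functions that submit commands so far have been set up to execute synchronously by waiting for the queue to become idle. For practical applications it is recommended to combine these operations in a single command buffer and execute them asynchronously for higher throughput, especially the transitions and copy in the createTextureImage function. Try to experiment with this by creating a setupCommandBuffer that the helper functions record commands into, and add a flushSetupCommands to execute the commands that have been recorded so far. It's best to do this after the texture mapping works, so that you can check whether the texture resources are still set up correctly.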
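The batching idea can be modeled without any Vulkan calls. In the sketch below, SetupCommands is a hypothetical stand-in for the suggested setupCommandBuffer: record plays the role of the helper functions recording into one shared command buffer, and flush plays the role of flushSetupCommands, which would end the command buffer and submit it once. The closures only stand in for vkCmd* calls.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of a shared setup command buffer.
class SetupCommands {
public:
    // Queue a command instead of submitting it immediately.
    void record(std::function<void()> command) {
        pending_.push_back(std::move(command));
    }

    // Execute everything recorded so far, in order, as one batch.
    // Returns how many commands were executed.
    std::size_t flush() {
        std::size_t executed = 0;
        for (auto& command : pending_) {
            command();
            ++executed;
        }
        pending_.clear();
        return executed;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::vector<std::function<void()>> pending_;
};
```

With this shape, the two layout transitions and the copy in createTextureImage would be recorded back to back and then flushed together, instead of waiting for the queue to become idle three times.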

Cleanup

Finish the createTextureImage function by cleaning up the staging buffer and its memory at the end:

    transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);

    vkDestroyBuffer(device, stagingBuffer, nullptr);
    vkFreeMemory(device, stagingBufferMemory, nullptr);
}

The main texture image is used until the end of the program:

void cleanup() {
    cleanupSwapChain();

    vkDestroyImage(device, textureImage, nullptr);
    vkFreeMemory(device, textureImageMemory, nullptr);

    ...
}

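The image now contains the texture, but we still need a way to access it from the graphics pipeline. We'll work on that in the next chapter.

C++ code / Vertex shader / Fragment shader

Image view and sampler

Introduction

In this chapter we're going to create two more resources that are needed for the graphics pipeline to sample an image. The first resource is one that we've already seen before while working with the swap chain images, but the second one is new. It relates to how the shader will read texels from the image.

Texture image view

We've seen before, with the swap chain images and the framebuffer, that images are accessed through image views rather than directly. We also need to create such an image view for the texture image.

Add a class member to hold a VkImageView for the texture image and create a new function createTextureImageView where we'll create it: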

VkImageView textureImageView;

...

void initVulkan() {
    ...
    createTextureImage();
    createTextureImageView();
    createVertexBuffer();
    ...
}

...

void createTextureImageView() {

}

The code for this function can be based directly on createImageViews. The only two changes you have to make are the format and the image:

VkImageViewCreateInfo viewInfo{};
viewInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
viewInfo.image = textureImage;
viewInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
viewInfo.format = VK_FORMAT_R8G8B8A8_SRGB;
viewInfo.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
viewInfo.subresourceRange.baseMipLevel = 0;
viewInfo.subresourceRange.levelCount = 1;
viewInfo.subresourceRange.baseArrayLayer = 0;
viewInfo.subresourceRange.layerCount = 1;

We've left out the explicit viewInfo.components initialization, because VK_COMPONENT_SWIZZLE_IDENTITY is defined as 0 anyway. Finish creating the image view by calling vkCreateImageView:

if (vkCreateImageView(device, &viewInfo, nullptr, &textureImageView) != VK_SUCCESS) {
    throw std::runtime_error("failed to create texture image view!");
}

Because so much of the logic is duplicated from createImageViews, you may wish to abstract it into a new createImageView function:

VkImageView createImageView(VkImage image, VkFormat format) {
    VkImageViewCreateInfo viewInfo{};
    viewInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
    viewInfo.image = image;
    viewInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
    viewInfo.format = format;
    viewInfo.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
    viewInfo.subresourceRange.baseMipLevel = 0;
    viewInfo.subresourceRange.levelCount = 1;
    viewInfo.subresourceRange.baseArrayLayer = 0;
    viewInfo.subresourceRange.layerCount = 1;

    VkImageView imageView;
    if (vkCreateImageView(device, &viewInfo, nullptr, &imageView) != VK_SUCCESS) {
        throw std::runtime_error("failed to create image view!");
    }

    return imageView;
}

The createTextureImageView function can now be simplified to:

void createTextureImageView() {
    textureImageView = createImageView(textureImage, VK_FORMAT_R8G8B8A8_SRGB);
}

And createImageViews can be simplified to:

void createImageViews() {
    swapChainImageViews.resize(swapChainImages.size());

    for (uint32_t i = 0; i < swapChainImages.size(); i++) {
        swapChainImageViews[i] = createImageView(swapChainImages[i], swapChainImageFormat);
    }
}

Make sure to destroy the image view at the end of the program, right before destroying the image itself:

void cleanup() {
    cleanupSwapChain();

    vkDestroyImageView(device, textureImageView, nullptr);

    vkDestroyImage(device, textureImage, nullptr);
    vkFreeMemory(device, textureImageMemory, nullptr);

    ...
}

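Samplers

It is possible for shaders to read texels directly from images, but that is not very common when they are used as textures. Textures are usually accessed through samplers, which apply filtering and transformations to compute the final color that is retrieved.

These filters are helpful to deal with problems like oversampling. Consider a texture that is mapped to geometry with more fragments than texels. If you simply took the closest texel for the texture coordinate in each fragment, then you would get a result like the first image:

If you combined the 4 closest texels through linear interpolation, then you would get a smoother result like the one on the right. Of course your application may have art style requirements that fit the left style more (think Minecraft), but the right is preferred in conventional graphics applications. A sampler object automatically applies this filtering for you when reading a color from the texture.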
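The linear interpolation of the 4 closest texels can be written out for a single color channel. This is a simplified sketch of the idea, not of what any particular GPU does internally: mix and bilinear are hypothetical helper names, and fx and fy are assumed to be the fractional position of the sample point between the four texel centers, each in the range [0, 1].

```cpp
// Linear interpolation between a and b, with t in [0, 1].
float mix(float a, float b, float t) {
    return a + (b - a) * t;
}

// Blend four neighboring texel values:
//   c00 = top-left, c10 = top-right, c01 = bottom-left, c11 = bottom-right.
float bilinear(float c00, float c10, float c01, float c11, float fx, float fy) {
    const float top = mix(c00, c10, fx);     // blend along the top row
    const float bottom = mix(c01, c11, fx);  // blend along the bottom row
    return mix(top, bottom, fy);             // blend the two rows vertically
}
```

Nearest-neighbor filtering would instead return one of the four inputs unchanged, which is what produces the blocky look of the first image.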

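Undersampling is the opposite problem, where you have more texels than fragments. This leads to artifacts when sampling high frequency patterns, like a checkerboard texture, at a sharp angle:

As shown in the left image, the texture turns into a blurry mess in the distance. The solution to this is anisotropic filtering, which can also be applied automatically by a sampler.

Aside from these filters, a sampler can also take care of transformations. It determines what happens when you try to read texels outside the image through its addressing mode. The image below displays some of the possibilities:

We will now create a function createTextureSampler to set up such a sampler object. We'll be using that sampler to read colors from the texture in the shader later on.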

void initVulkan() {
    ...
    createTextureImage();
    createTextureImageView();
    createTextureSampler();
    ...
}

...

void createTextureSampler() {

}

Samplers are configured through a VkSamplerCreateInfo structure, which specifies all filters and transformations that it should apply.

VkSamplerCreateInfo samplerInfo{};
samplerInfo.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO;
samplerInfo.magFilter = VK_FILTER_LINEAR;
samplerInfo.minFilter = VK_FILTER_LINEAR;

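The magFilter and minFilter fields specify how to interpolate texels that are magnified or minified. Magnification concerns the oversampling problem described above, and minification concerns undersampling. The choices are VK_FILTER_NEAREST and VK_FILTER_LINEAR, corresponding to the modes demonstrated in the images above.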

samplerInfo.addressModeU = VK_SAMPLER_ADDRESS_MODE_REPEAT;
samplerInfo.addressModeV = VK_SAMPLER_ADDRESS_MODE_REPEAT;
samplerInfo.addressModeW = VK_SAMPLER_ADDRESS_MODE_REPEAT;

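The addressing mode can be specified per axis using the addressMode fields. The available values are listed below. Most of these are demonstrated in the image above. Note that the axes are called U, V and W instead of X, Y and Z. This is a convention for texture space coordinates.

VK_SAMPLER_ADDRESS_MODE_REPEAT: Repeat the texture when going beyond the image dimensions.
VK_SAMPLER_ADDRESS_MODE_MIRRORED_REPEAT: Like repeat, but inverts the coordinates to mirror the image when going beyond the dimensions.
VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE: Take the color of the edge closest to the coordinate beyond the image dimensions.
VK_SAMPLER_ADDRESS_MODE_MIRROR_CLAMP_TO_EDGE: Like clamp to edge, but instead uses the edge opposite to the closest edge.
VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER: Return a solid color when sampling beyond the dimensions of the image.

It doesn't really matter which addressing mode we use here, because we're not going to sample outside of the image in this tutorial. However, the repeat mode is probably the most common mode, because it can be used to tile textures like floors and walls.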
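Three of the addressing modes listed above can be expressed as simple integer wrapping functions. This is a sketch on whole texel indices for one axis, ignoring filtering and fractional coordinates. The function names are hypothetical, i is the requested texel index, and size is the number of texels along the axis (assumed to be positive).

```cpp
#include <algorithm>

// Repeat: indices wrap around, so size maps back to 0 and -1 maps to size - 1.
int wrapRepeat(int i, int size) {
    const int m = i % size;
    return m < 0 ? m + size : m;
}

// Mirrored repeat: every other repetition runs backwards.
int wrapMirroredRepeat(int i, int size) {
    const int period = 2 * size;
    int m = i % period;
    if (m < 0) {
        m += period;
    }
    return m < size ? m : period - 1 - m;
}

// Clamp to edge: indices outside the image use the nearest edge texel.
int wrapClampToEdge(int i, int size) {
    return std::min(std::max(i, 0), size - 1);
}
```

For a 4 texel wide image, index 5 reads texel 1 with repeat, texel 2 with mirrored repeat, and texel 3 with clamp to edge.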

samplerInfo.anisotropyEnable = VK_TRUE;
samplerInfo.maxAnisotropy = ???;

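These two fields specify whether anisotropic filtering should be used. There is no reason not to use it unless performance is a concern. The maxAnisotropy field limits the number of texel samples that can be used to calculate the final color. A lower value results in better performance, but lower quality results. To figure out which value we can use, we need to retrieve the properties of the physical device: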

VkPhysicalDeviceProperties properties{};
vkGetPhysicalDeviceProperties(physicalDevice, &properties);

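If you look at the documentation for the VkPhysicalDeviceProperties structure, you'll see that it contains a VkPhysicalDeviceLimits member named limits. This struct in turn has a member called maxSamplerAnisotropy, which is the maximum value we can specify for maxAnisotropy. If we want to go for maximum quality, we can simply use that value directly: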

samplerInfo.maxAnisotropy = properties.limits.maxSamplerAnisotropy;

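You can either query the properties at the beginning of your program and pass them around to the functions that need them, or query them in the createTextureSampler function itself.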

samplerInfo.borderColor = VK_BORDER_COLOR_INT_OPAQUE_BLACK;

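The borderColor field specifies which color is returned when sampling beyond the image with the clamp to border addressing mode. It is possible to return black, white or transparent in either float or int formats. You cannot specify an arbitrary color.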

samplerInfo.unnormalizedCoordinates = VK_FALSE;

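The unnormalizedCoordinates field specifies which coordinate system you want to use to address texels in an image. If this field is VK_TRUE, then you can simply use coordinates within the [0, texWidth) and [0, texHeight) range. If it is VK_FALSE, then the texels are addressed using the [0, 1) range on all axes. Real-world applications almost always use normalized coordinates, because then it's possible to use textures of varying resolutions with the exact same coordinates.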

samplerInfo.compareEnable = VK_FALSE;
samplerInfo.compareOp = VK_COMPARE_OP_ALWAYS;

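If a comparison function is enabled, then texels will first be compared to a value, and the result of that comparison is used in filtering operations. This is mainly used for percentage-closer filtering on shadow maps. We'll look at this in a future chapter.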

samplerInfo.mipmapMode = VK_SAMPLER_MIPMAP_MODE_LINEAR;
samplerInfo.mipLodBias = 0.0f;
samplerInfo.minLod = 0.0f;
samplerInfo.maxLod = 0.0f;

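All of these fields apply to mipmapping. We will look at mipmapping in a later chapter, but basically it's another type of filter that can be applied.

The functioning of the sampler is now fully defined. Add a class member to hold the handle of the sampler object and create the sampler with vkCreateSampler: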

VkImageView textureImageView;
VkSampler textureSampler;

...

void createTextureSampler() {
    ...

    if (vkCreateSampler(device, &samplerInfo, nullptr, &textureSampler) != VK_SUCCESS) {
        throw std::runtime_error("failed to create texture sampler!");
    }
}

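Note that the sampler does not reference a VkImage anywhere. The sampler is a distinct object that provides an interface to extract colors from a texture. It can be applied to any image you want, whether it is 1D, 2D or 3D. This is different from many older APIs, which combined texture images and filtering into a single state.

Destroy the sampler at the end of the program, when we'll no longer be accessing the image: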

void cleanup() {
    cleanupSwapChain();

    vkDestroySampler(device, textureSampler, nullptr);
    vkDestroyImageView(device, textureImageView, nullptr);

    ...
}

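Anisotropy device feature

If you run your program right now, the validation layers will report an error. That's because anisotropic filtering is actually an optional device feature. We need to update the createLogicalDevice function to request it: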

VkPhysicalDeviceFeatures deviceFeatures{};
deviceFeatures.samplerAnisotropy = VK_TRUE;

Even though it is very unlikely that a modern graphics card will not support it, we should update isDeviceSuitable to check whether it is available:

bool isDeviceSuitable(VkPhysicalDevice device) {
    ...

    VkPhysicalDeviceFeatures supportedFeatures;
    vkGetPhysicalDeviceFeatures(device, &supportedFeatures);

    return indices.isComplete() && extensionsSupported && swapChainAdequate && supportedFeatures.samplerAnisotropy;
}

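vkGetPhysicalDeviceFeatures repurposes the VkPhysicalDeviceFeatures struct to indicate which features are supported, rather than requested, by setting the boolean values.

Instead of enforcing the availability of anisotropic filtering, it's also possible to simply not use it when it is unavailable by conditionally setting: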

samplerInfo.anisotropyEnable = VK_FALSE;
samplerInfo.maxAnisotropy = 1.0f;
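One way to express that condition is sketched below. The AnisotropyChoice struct and the chooseAnisotropy function are hypothetical helpers, not part of Vulkan. The inputs are assumed to come from supportedFeatures.samplerAnisotropy and properties.limits.maxSamplerAnisotropy, and the result would be copied into samplerInfo.anisotropyEnable and samplerInfo.maxAnisotropy.

```cpp
#include <algorithm>

// Hypothetical result type: the two values that end up in VkSamplerCreateInfo.
struct AnisotropyChoice {
    bool enable;
    float maxAnisotropy;
};

// Picks sampler anisotropy settings from what the device reports.
// featureSupported: assumed to mirror supportedFeatures.samplerAnisotropy
// deviceLimit:      assumed to mirror properties.limits.maxSamplerAnisotropy
// requested:        the quality level the application would like to use
AnisotropyChoice chooseAnisotropy(bool featureSupported, float deviceLimit, float requested) {
    if (!featureSupported) {
        // Fall back to plain filtering, matching the snippet above.
        return {false, 1.0f};
    }
    // Never exceed the device limit and never go below 1.0.
    return {true, std::max(1.0f, std::min(requested, deviceLimit))};
}
```

If the application takes this route, the samplerAnisotropy feature should also only be requested in createLogicalDevice when it is actually supported.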

In the next chapter we will expose the image and sampler objects to the shaders to draw the texture onto the square.

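Combined image sampler

Introduction

We looked at descriptors for the first time in the uniform buffers part of the tutorial. In this chapter we will look at a new type of descriptor: the combined image sampler. This descriptor makes it possible for shaders to access an image resource through a sampler object like the one we created in the previous chapter.

We'll start by modifying the descriptor set layout, descriptor pool and descriptor set to include such a combined image sampler descriptor. After that, we're going to add texture coordinates to Vertex and modify the fragment shader to read colors from the texture instead of just interpolating the vertex colors.

Updating the descriptors

Browse to the createDescriptorSetLayout function and add a VkDescriptorSetLayoutBinding for a combined image sampler descriptor. We'll simply put it in the binding after the uniform buffer: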

VkDescriptorSetLayoutBinding samplerLayoutBinding{};
samplerLayoutBinding.binding = 1;
samplerLayoutBinding.descriptorCount = 1;
samplerLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
samplerLayoutBinding.pImmutableSamplers = nullptr;
samplerLayoutBinding.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;

std::array<VkDescriptorSetLayoutBinding, 2> bindings = {uboLayoutBinding, samplerLayoutBinding};
VkDescriptorSetLayoutCreateInfo layoutInfo{};
layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.bindingCount = static_cast<uint32_t>(bindings.size());
layoutInfo.pBindings = bindings.data();

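Make sure to set the stageFlags to indicate that we intend to use the combined image sampler descriptor in the fragment shader. That's where the color of the fragment is going to be determined. It is also possible to use texture sampling in the vertex shader, for example to dynamically deform a grid of vertices by a heightmap.

We must also create a larger descriptor pool to make room for the allocation of the combined image sampler, by adding another VkDescriptorPoolSize of type VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER to the VkDescriptorPoolCreateInfo. Go to the createDescriptorPool function and modify it to include a VkDescriptorPoolSize for this descriptor: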

std::array<VkDescriptorPoolSize, 2> poolSizes{};
poolSizes[0].type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
poolSizes[0].descriptorCount = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);
poolSizes[1].type = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
poolSizes[1].descriptorCount = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);

VkDescriptorPoolCreateInfo poolInfo{};
poolInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
poolInfo.poolSizeCount = static_cast<uint32_t>(poolSizes.size());
poolInfo.pPoolSizes = poolSizes.data();
poolInfo.maxSets = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);

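Inadequate descriptor pools are a good example of a problem that the validation layers will not catch. As of Vulkan 1.1, vkAllocateDescriptorSets may fail with the error code VK_ERROR_OUT_OF_POOL_MEMORY if the pool is not sufficiently large, but the driver may also try to solve the problem internally. This means that sometimes (depending on hardware, pool size and allocation size) the driver will let us get away with an allocation that exceeds the limits of our descriptor pool. Other times, vkAllocateDescriptorSets will fail and return VK_ERROR_OUT_OF_POOL_MEMORY. This can be particularly frustrating if the allocation succeeds on some machines, but fails on others.

Since Vulkan shifts the responsibility for the allocation to the driver, it is no longer a strict requirement to only allocate as many descriptors of a certain type (VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, etc.) as specified by the corresponding descriptorCount members used when creating the descriptor pool. However, it remains best practice to do so, and in the future VK_LAYER_KHRONOS_validation will warn about this type of problem if you enable Best Practice Validation.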
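A simple way to keep the pool sizes honest is to derive them from the layout rather than typing them by hand. The following is only a sketch with hypothetical names (DescriptorKind, requiredPoolSizes). It assumes every descriptor set uses the same layout and that the per-set counts are known up front.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-ins for the VK_DESCRIPTOR_TYPE_* values used in this chapter.
enum class DescriptorKind { UniformBuffer, CombinedImageSampler };

// Computes how many descriptors of each kind a pool must provide so that
// setCount sets with the given per-set layout can all be allocated from it.
std::map<DescriptorKind, uint32_t> requiredPoolSizes(
        const std::map<DescriptorKind, uint32_t>& descriptorsPerSet,
        uint32_t setCount) {
    std::map<DescriptorKind, uint32_t> totals;
    for (const auto& entry : descriptorsPerSet) {
        // Each set needs its own copy of every descriptor in the layout.
        totals[entry.first] = entry.second * setCount;
    }
    return totals;
}
```

For the layout in this chapter (one uniform buffer and one combined image sampler per set) and two frames in flight, this yields a count of 2 for each type, which matches the values written into poolSizes above.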

The final step is to bind the actual image and sampler resources to the descriptors in the descriptor set. Go to the createDescriptorSets function:

for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
    VkDescriptorBufferInfo bufferInfo{};
    bufferInfo.buffer = uniformBuffers[i];
    bufferInfo.offset = 0;
    bufferInfo.range = sizeof(UniformBufferObject);

    VkDescriptorImageInfo imageInfo{};
    imageInfo.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    imageInfo.imageView = textureImageView;
    imageInfo.sampler = textureSampler;

    ...
}

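The resources for a combined image sampler descriptor must be specified in a VkDescriptorImageInfo struct, just like the buffer resource for a uniform buffer descriptor is specified in a VkDescriptorBufferInfo struct. This is where the objects from the previous chapter come together.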

std::array<VkWriteDescriptorSet, 2> descriptorWrites{};

descriptorWrites[0].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[0].dstSet = descriptorSets[i];
descriptorWrites[0].dstBinding = 0;
descriptorWrites[0].dstArrayElement = 0;
descriptorWrites[0].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
descriptorWrites[0].descriptorCount = 1;
descriptorWrites[0].pBufferInfo = &bufferInfo;

descriptorWrites[1].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[1].dstSet = descriptorSets[i];
descriptorWrites[1].dstBinding = 1;
descriptorWrites[1].dstArrayElement = 0;
descriptorWrites[1].descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
descriptorWrites[1].descriptorCount = 1;
descriptorWrites[1].pImageInfo = &imageInfo;

vkUpdateDescriptorSets(device, static_cast<uint32_t>(descriptorWrites.size()), descriptorWrites.data(), 0, nullptr);

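The descriptors must be updated with this image info, just like the buffer. This time we're using the pImageInfo array instead of pBufferInfo. The descriptors are now ready to be used by the shaders!

Texture coordinates

There is one important ingredient for texture mapping that is still missing, and that's the actual texture coordinates for each vertex. The coordinates determine how the image is actually mapped to the geometry.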

struct Vertex {
    glm::vec2 pos;
    glm::vec3 color;
    glm::vec2 texCoord;

    static VkVertexInputBindingDescription getBindingDescription() {
        VkVertexInputBindingDescription bindingDescription{};
        bindingDescription.binding = 0;
        bindingDescription.stride = sizeof(Vertex);
        bindingDescription.inputRate = VK_VERTEX_INPUT_RATE_VERTEX;

        return bindingDescription;
    }

    static std::array<VkVertexInputAttributeDescription, 3> getAttributeDescriptions() {
        std::array<VkVertexInputAttributeDescription, 3> attributeDescriptions{};

        attributeDescriptions[0].binding = 0;
        attributeDescriptions[0].location = 0;
        attributeDescriptions[0].format = VK_FORMAT_R32G32_SFLOAT;
        attributeDescriptions[0].offset = offsetof(Vertex, pos);

        attributeDescriptions[1].binding = 0;
        attributeDescriptions[1].location = 1;
        attributeDescriptions[1].format = VK_FORMAT_R32G32B32_SFLOAT;
        attributeDescriptions[1].offset = offsetof(Vertex, color);

        attributeDescriptions[2].binding = 0;
        attributeDescriptions[2].location = 2;
        attributeDescriptions[2].format = VK_FORMAT_R32G32_SFLOAT;
        attributeDescriptions[2].offset = offsetof(Vertex, texCoord);

        return attributeDescriptions;
    }
};

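Modify the Vertex struct to include a vec2 for texture coordinates. Make sure to also add a VkVertexInputAttributeDescription so that we can access the texture coordinates as input in the vertex shader. That is necessary to be able to pass them to the fragment shader for interpolation across the surface of the square.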

const std::vector<Vertex> vertices = {
    {{-0.5f, -0.5f}, {1.0f, 0.0f, 0.0f}, {1.0f, 0.0f}},
    {{0.5f, -0.5f}, {0.0f, 1.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, 0.5f}, {0.0f, 0.0f, 1.0f}, {0.0f, 1.0f}},
    {{-0.5f, 0.5f}, {1.0f, 1.0f, 1.0f}, {1.0f, 1.0f}}
};

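In this tutorial, we simply fill the square with the texture by using coordinates that start at 0, 0 and end at 1, 1. Feel free to experiment with different coordinates. Try using coordinates below 0 or above 1 to see the addressing modes in action!

Shaders

The final step is modifying the shaders to sample colors from the texture. We first need to modify the vertex shader to pass the texture coordinates through to the fragment shader: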

layout(location = 0) in vec2 inPosition;
layout(location = 1) in vec3 inColor;
layout(location = 2) in vec2 inTexCoord;

layout(location = 0) out vec3 fragColor;
layout(location = 1) out vec2 fragTexCoord;

void main() {
    gl_Position = ubo.proj * ubo.view * ubo.model * vec4(inPosition, 0.0, 1.0);
    fragColor = inColor;
    fragTexCoord = inTexCoord;
}

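Just like the per-vertex colors, the fragTexCoord values are smoothly interpolated across the area of the square by the rasterizer. We can visualize this by having the fragment shader output the texture coordinates as colors: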

#version 450

layout(location = 0) in vec3 fragColor;
layout(location = 1) in vec2 fragTexCoord;

layout(location = 0) out vec4 outColor;

void main() {
    outColor = vec4(fragTexCoord, 0.0, 1.0);
}

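You should see something like the image below. Don't forget to recompile the shaders!

A combined image sampler descriptor is represented in GLSL by a sampler uniform. Add a reference to it in the fragment shader: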

layout(binding = 1) uniform sampler2D texSampler;

There are equivalent sampler1D and sampler3D types for other types of images. Make sure to use the correct binding here.

void main() {
    outColor = texture(texSampler, fragTexCoord);
}

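Textures are sampled using the built-in texture function. It takes a sampler and a coordinate as arguments. The sampler automatically takes care of the filtering and transformations in the background. You should now see the texture on the square when you run the application:

Try experimenting with the addressing modes by scaling the texture coordinates to values higher than 1. For example, the following fragment shader produces the result in the image below when using VK_SAMPLER_ADDRESS_MODE_REPEAT: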

void main() {
    outColor = texture(texSampler, fragTexCoord * 2.0);
}

You can also manipulate the texture colors using the vertex colors:

void main() {
    outColor = vec4(fragColor * texture(texSampler, fragTexCoord).rgb, 1.0);
}

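The RGB and alpha channels are separated here so that the alpha channel is not scaled.

You now know how to access images in shaders! This is a very powerful technique when combined with images that are also written to in framebuffers. You can use these images as inputs to implement cool effects like post-processing and camera displays within the 3D world.

Depth buffering

Introduction

The geometry we've worked with so far is projected into 3D, but it's still completely flat. In this chapter we're going to add a Z coordinate to the position to prepare for 3D meshes. We'll use this third coordinate to place a square over the current square to see a problem that arises when geometry is not sorted by depth.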

3D geometry

Change the Vertex struct to use a 3D vector for the position, and update the format in the corresponding VkVertexInputAttributeDescription:

struct Vertex {
    glm::vec3 pos;
    glm::vec3 color;
    glm::vec2 texCoord;

    ...

    static std::array<VkVertexInputAttributeDescription, 3> getAttributeDescriptions() {
        std::array<VkVertexInputAttributeDescription, 3> attributeDescriptions{};

        attributeDescriptions[0].binding = 0;
        attributeDescriptions[0].location = 0;
        attributeDescriptions[0].format = VK_FORMAT_R32G32B32_SFLOAT;
        attributeDescriptions[0].offset = offsetof(Vertex, pos);

        ...
    }
};

Next, update the vertex shader to accept and transform 3D coordinates as input. Don't forget to recompile it afterwards!

layout(location = 0) in vec3 inPosition;

...

void main() {
    gl_Position = ubo.proj * ubo.view * ubo.model * vec4(inPosition, 1.0);
    fragColor = inColor;
    fragTexCoord = inTexCoord;
}

Lastly, update the vertices container to include Z coordinates:

const std::vector<Vertex> vertices = {
    {{-0.5f, -0.5f, 0.0f}, {1.0f, 0.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, -0.5f, 0.0f}, {0.0f, 1.0f, 0.0f}, {1.0f, 0.0f}},
    {{0.5f, 0.5f, 0.0f}, {0.0f, 0.0f, 1.0f}, {1.0f, 1.0f}},
    {{-0.5f, 0.5f, 0.0f}, {1.0f, 1.0f, 1.0f}, {0.0f, 1.0f}}
};

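If you run your application now, then you should see exactly the same result as before. It's time to add some extra geometry to make the scene more interesting, and to demonstrate the problem that we're going to tackle in this chapter. Duplicate the vertices to define positions for a second square right under the current one. Use Z coordinates of -0.5f and add the appropriate indices for the extra square: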

const std::vector<Vertex> vertices = {
    {{-0.5f, -0.5f, 0.0f}, {1.0f, 0.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, -0.5f, 0.0f}, {0.0f, 1.0f, 0.0f}, {1.0f, 0.0f}},
    {{0.5f, 0.5f, 0.0f}, {0.0f, 0.0f, 1.0f}, {1.0f, 1.0f}},
    {{-0.5f, 0.5f, 0.0f}, {1.0f, 1.0f, 1.0f}, {0.0f, 1.0f}},

    {{-0.5f, -0.5f, -0.5f}, {1.0f, 0.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, -0.5f, -0.5f}, {0.0f, 1.0f, 0.0f}, {1.0f, 0.0f}},
    {{0.5f, 0.5f, -0.5f}, {0.0f, 0.0f, 1.0f}, {1.0f, 1.0f}},
    {{-0.5f, 0.5f, -0.5f}, {1.0f, 1.0f, 1.0f}, {0.0f, 1.0f}}
};

const std::vector<uint16_t> indices = {
    0, 1, 2, 2, 3, 0,
    4, 5, 6, 6, 7, 4
};

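Run your program now and you'll see something resembling an Escher illustration:

The problem is that the fragments of the lower square are drawn over the fragments of the upper square, simply because it comes later in the index array. There are two ways to solve this:

Sort all of the draw calls by depth from back to front
Use depth testing with a depth buffer

The first approach is commonly used for drawing transparent objects, because order-independent transparency is a difficult challenge to solve. However, the problem of ordering fragments by depth is much more commonly solved using a depth buffer. A depth buffer is an additional attachment that stores the depth for every position, just like the color attachment stores the color of every position. Every time the rasterizer produces a fragment, the depth test checks whether the new fragment is closer than the previous one. If it isn't, then the new fragment is discarded. A fragment that passes the depth test writes its own depth to the depth buffer. It is possible to manipulate this value from the fragment shader, just like you can manipulate the color output.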
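The depth test described above can be imitated on the CPU with a few lines of code. The sketch below is purely illustrative: Fragment and DepthTestedTarget are hypothetical types, colors are reduced to a single integer, and the comparison is fixed to "less than". The GPU performs the equivalent work in fixed-function hardware.

```cpp
#include <cstddef>
#include <vector>

// A hypothetical fragment: which pixel it covers, how far away it is, and its color.
struct Fragment {
    std::size_t pixel;
    float depth;   // 0.0 = near plane, 1.0 = far plane
    int color;
};

// A tiny render target with a color attachment and a depth attachment.
class DepthTestedTarget {
public:
    explicit DepthTestedTarget(std::size_t pixelCount)
        : depth(pixelCount, 1.0f),   // cleared to the farthest possible depth
          color(pixelCount, 0) {}

    // Applies the depth test; returns true if the fragment was kept.
    bool submit(const Fragment& fragment) {
        if (!(fragment.depth < depth[fragment.pixel])) {
            return false;            // not closer than what is already stored: discard
        }
        depth[fragment.pixel] = fragment.depth;   // depth write
        color[fragment.pixel] = fragment.color;   // color write
        return true;
    }

    std::vector<float> depth;
    std::vector<int> color;
};
```

Whichever order the two overlapping squares are submitted in, the pixel ends up with the color of the closer fragment, which is exactly what sorting by index order failed to achieve.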

#define GLM_FORCE_RADIANS
#define GLM_FORCE_DEPTH_ZERO_TO_ONE
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

The perspective projection matrix generated by GLM uses the OpenGL depth range of -1.0 to 1.0 by default. We need to configure it to use the Vulkan range of 0.0 to 1.0 using the GLM_FORCE_DEPTH_ZERO_TO_ONE definition.

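Depth image and view

A depth attachment is based on an image, just like the color attachment. The difference is that the swap chain will not automatically create depth images for us. We only need a single depth image, because only one draw operation is running at once. The depth image will again require the trifecta of resources: image, memory and image view.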

VkImage depthImage;
VkDeviceMemory depthImageMemory;
VkImageView depthImageView;

Create a new function createDepthResources to set up these resources:

void initVulkan() {
    ...
    createCommandPool();
    createDepthResources();
    createTextureImage();
    ...
}

...

void createDepthResources() {

}

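Creating a depth image is fairly straightforward. It should have the same resolution as the color attachment, defined by the swap chain extent, an image usage appropriate for a depth attachment, optimal tiling and device local memory. The only question is: what is the right format for a depth image? The format must contain a depth component, indicated by _D??_ in the VK_FORMAT_ name.

Unlike the texture image, we don't necessarily need a specific format, because we won't be directly accessing the texels from the program. It just needs to have a reasonable accuracy. At least 24 bits is common in real-world applications. There are several formats that fit this requirement:

VK_FORMAT_D32_SFLOAT: 32-bit float for depth
VK_FORMAT_D32_SFLOAT_S8_UINT: 32-bit signed float for depth and an 8-bit stencil component
VK_FORMAT_D24_UNORM_S8_UINT: 24-bit unsigned normalized value for depth and an 8-bit stencil component

The stencil component is used for stencil tests, which is an additional test that can be combined with depth testing. We'll look at this in a future chapter.

We could simply go for the VK_FORMAT_D32_SFLOAT format, because support for it is extremely common (see the hardware database), but it's nice to add some extra flexibility to our application where possible. We're going to write a function findSupportedFormat that takes a list of candidate formats in order from most desirable to least desirable, and returns the first one that is supported: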

VkFormat findSupportedFormat(const std::vector<VkFormat>& candidates, VkImageTiling tiling, VkFormatFeatureFlags features) {

}

The support of a format depends on the tiling mode and usage, so we must also include these as parameters. The support of a format can be queried using the vkGetPhysicalDeviceFormatProperties function:

for (VkFormat format : candidates) {
    VkFormatProperties props;
    vkGetPhysicalDeviceFormatProperties(physicalDevice, format, &props);
}

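The VkFormatProperties struct contains three fields:

linearTilingFeatures: Use cases that are supported with linear tiling
optimalTilingFeatures: Use cases that are supported with optimal tiling
bufferFeatures: Use cases that are supported for buffers

Only the first two are relevant here, and the one we check depends on the tiling parameter of the function: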

if (tiling == VK_IMAGE_TILING_LINEAR && (props.linearTilingFeatures & features) == features) {
    return format;
} else if (tiling == VK_IMAGE_TILING_OPTIMAL && (props.optimalTilingFeatures & features) == features) {
    return format;
}

If none of the candidate formats support the desired usage, then we can either return a special value or simply throw an exception:

VkFormat findSupportedFormat(const std::vector<VkFormat>& candidates, VkImageTiling tiling, VkFormatFeatureFlags features) {
    for (VkFormat format : candidates) {
        VkFormatProperties props;
        vkGetPhysicalDeviceFormatProperties(physicalDevice, format, &props);

        if (tiling == VK_IMAGE_TILING_LINEAR && (props.linearTilingFeatures & features) == features) {
            return format;
        } else if (tiling == VK_IMAGE_TILING_OPTIMAL && (props.optimalTilingFeatures & features) == features) {
            return format;
        }
    }

    throw std::runtime_error("failed to find supported format!");
}
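The important detail in this function is the comparison (supported & required) == required, which demands that every requested feature bit is present rather than just one of them. The sketch below isolates that logic with plain integers so it can be tried without a Vulkan device. The bit values, the CandidateFormat struct and the function names are all hypothetical.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical feature bits standing in for VK_FORMAT_FEATURE_* flags.
using FeatureFlags = uint32_t;
constexpr FeatureFlags kSampledImage = 1u << 0;
constexpr FeatureFlags kDepthStencilAttachment = 1u << 1;

// Hypothetical candidate: an id plus the features it supports with optimal tiling.
struct CandidateFormat {
    int id;
    FeatureFlags optimalTilingFeatures;
};

// True only if every required bit is also set in supported.
bool supportsAll(FeatureFlags supported, FeatureFlags required) {
    return (supported & required) == required;
}

// Returns the id of the first candidate that supports all required features.
int pickFirstSupported(const std::vector<CandidateFormat>& candidates, FeatureFlags required) {
    for (const auto& candidate : candidates) {
        if (supportsAll(candidate.optimalTilingFeatures, required)) {
            return candidate.id;
        }
    }
    throw std::runtime_error("failed to find supported format!");
}
```

A check such as (supported & required) != 0 would wrongly accept a format that offers only some of the requested features, which is why the comparison against the full mask matters once more than one bit is requested.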

We'll use this function now to create a findDepthFormat helper function that selects a format with a depth component that supports usage as a depth attachment:

VkFormat findDepthFormat() {
    return findSupportedFormat(
        {VK_FORMAT_D32_SFLOAT, VK_FORMAT_D32_SFLOAT_S8_UINT, VK_FORMAT_D24_UNORM_S8_UINT},
        VK_IMAGE_TILING_OPTIMAL,
        VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT
    );
}

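Make sure to use the VK_FORMAT_FEATURE_ flag instead of VK_IMAGE_USAGE_ in this case. All of these candidate formats contain a depth component, but the latter two also contain a stencil component. We won't be using that yet, but we do need to take it into account when performing layout transitions on images with these formats. Add a simple helper function that tells us whether the chosen depth format contains a stencil component: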

bool hasStencilComponent(VkFormat format) {
    return format == VK_FORMAT_D32_SFLOAT_S8_UINT || format == VK_FORMAT_D24_UNORM_S8_UINT;
}

Call the function to find a depth format from createDepthResources:

VkFormat depthFormat = findDepthFormat();

We now have all the required information to invoke our createImage and createImageView helper functions:

createImage(swapChainExtent.width, swapChainExtent.height, depthFormat, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, depthImage, depthImageMemory);
depthImageView = createImageView(depthImage, depthFormat);

However, the createImageView function currently assumes that the subresource is always VK_IMAGE_ASPECT_COLOR_BIT, so we will need to turn that field into a parameter:

VkImageView createImageView(VkImage image, VkFormat format, VkImageAspectFlags aspectFlags) {
    ...
    viewInfo.subresourceRange.aspectMask = aspectFlags;
    ...
}

Update all calls to this function to use the right aspect:

swapChainImageViews[i] = createImageView(swapChainImages[i], swapChainImageFormat, VK_IMAGE_ASPECT_COLOR_BIT);
...
depthImageView = createImageView(depthImage, depthFormat, VK_IMAGE_ASPECT_DEPTH_BIT);
...
textureImageView = createImageView(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_ASPECT_COLOR_BIT);

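That's it for creating the depth image. We don't need to map it or copy another image to it, because we're going to clear it at the start of the render pass, like the color attachment.

Explicitly transitioning the depth image

We don't need to explicitly transition the layout of the image to a depth attachment, because we'll take care of this in the render pass. However, for completeness this section still describes the process. You may skip it if you like.

Make a call to transitionImageLayout at the end of the createDepthResources function: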

transitionImageLayout(depthImage, depthFormat, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL);

The undefined layout can be used as the initial layout, because there are no existing depth image contents that matter. We need to update some of the logic in transitionImageLayout to use the right subresource aspect:

if (newLayout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL) {
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT;

    if (hasStencilComponent(format)) {
        barrier.subresourceRange.aspectMask |= VK_IMAGE_ASPECT_STENCIL_BIT;
    }
} else {
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
}

Although we're not using the stencil component, we do need to include it in the layout transitions of the depth image.

Finally, add the correct access masks and pipeline stages:

if (oldLayout == VK_IMAGE_LAYOUT_UNDEFINED && newLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) {
    barrier.srcAccessMask = 0;
    barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;

    sourceStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
    destinationStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
} else if (oldLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL && newLayout == VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL) {
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    sourceStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
    destinationStage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
} else if (oldLayout == VK_IMAGE_LAYOUT_UNDEFINED && newLayout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL) {
    barrier.srcAccessMask = 0;
    barrier.dstAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;

    sourceStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
    destinationStage = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
} else {
    throw std::invalid_argument("unsupported layout transition!");
}

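The depth buffer will be read from to perform depth tests to see if a fragment is visible, and will be written to when a new fragment is drawn. The reading happens in the VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT stage and the writing in the VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT stage. You should pick the earliest pipeline stage that matches the specified operations, so that the image is ready for usage as a depth attachment when it needs to be.

Render pass

We're now going to modify createRenderPass to include a depth attachment. First specify the VkAttachmentDescription: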

VkAttachmentDescription depthAttachment{};
depthAttachment.format = findDepthFormat();
depthAttachment.samples = VK_SAMPLE_COUNT_1_BIT;
depthAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
depthAttachment.storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
depthAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
depthAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
depthAttachment.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
depthAttachment.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

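The format should be the same as the depth image itself. This time we don't care about storing the depth data (storeOp), because it will not be used after drawing has finished. This may allow the hardware to perform additional optimizations. Just like the color buffer, we don't care about the previous depth contents, so we can use VK_IMAGE_LAYOUT_UNDEFINED as initialLayout.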

VkAttachmentReference depthAttachmentRef{};
depthAttachmentRef.attachment = 1;
depthAttachmentRef.layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

Add a reference to the attachment for the first (and only) subpass:

VkSubpassDescription subpass{};
subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpass.colorAttachmentCount = 1;
subpass.pColorAttachments = &colorAttachmentRef;
subpass.pDepthStencilAttachment = &depthAttachmentRef;

Unlike color attachments, a subpass can only use a single depth (+stencil) attachment. It wouldn't really make any sense to do depth tests on multiple buffers.

std::array<VkAttachmentDescription, 2> attachments = {colorAttachment, depthAttachment};
VkRenderPassCreateInfo renderPassInfo{};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
renderPassInfo.attachmentCount = static_cast<uint32_t>(attachments.size());
renderPassInfo.pAttachments = attachments.data();
renderPassInfo.subpassCount = 1;
renderPassInfo.pSubpasses = &subpass;
renderPassInfo.dependencyCount = 1;
renderPassInfo.pDependencies = &dependency;

Next, update the VkSubpassDependency struct to refer to both attachments:

dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT;
dependency.srcAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;

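Finally, we need to extend our subpass dependencies to make sure that there is no conflict between the transitioning of the depth image and it being cleared as part of its load operation. The depth image is first accessed in the early fragment test pipeline stage, and because we have a load operation that clears, we should specify the access mask for writes.

Framebuffer

The next step is to modify the framebuffer creation to bind the depth image to the depth attachment. Go to createFramebuffers and specify the depth image view as the second attachment: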

std::array<VkImageView, 2> attachments = {
    swapChainImageViews[i],
    depthImageView
};

VkFramebufferCreateInfo framebufferInfo{};
framebufferInfo.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
framebufferInfo.renderPass = renderPass;
framebufferInfo.attachmentCount = static_cast<uint32_t>(attachments.size());
framebufferInfo.pAttachments = attachments.data();
framebufferInfo.width = swapChainExtent.width;
framebufferInfo.height = swapChainExtent.height;
framebufferInfo.layers = 1;

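The color attachment differs for every swap chain image, but the same depth image can be used by all of them, because only a single subpass is running at the same time due to our semaphores.

You'll also need to move the call to createFramebuffers to make sure that it is called after the depth image view has actually been created: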

void initVulkan() {
    ...
    createDepthResources();
    createFramebuffers();
    ...
}

Clear values

Because we now have multiple attachments with VK_ATTACHMENT_LOAD_OP_CLEAR, we also need to specify multiple clear values. Go to recordCommandBuffer and create an array of VkClearValue structs:

std::array<VkClearValue, 2> clearValues{};
clearValues[0].color = {{0.0f, 0.0f, 0.0f, 1.0f}};
clearValues[1].depthStencil = {1.0f, 0};

renderPassInfo.clearValueCount = static_cast<uint32_t>(clearValues.size());
renderPassInfo.pClearValues = clearValues.data();

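The range of depths in the depth buffer is 0.0 to 1.0 in Vulkan, where 1.0 lies at the far view plane and 0.0 at the near view plane. The initial value at each point in the depth buffer should be the furthest possible depth, which is 1.0.

Note that the order of clearValues should be identical to the order of your attachments.

Depth and stencil state

The depth attachment is ready to be used now, but depth testing still needs to be enabled in the graphics pipeline. It is configured through the VkPipelineDepthStencilStateCreateInfo struct: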

VkPipelineDepthStencilStateCreateInfo depthStencil{};
depthStencil.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO;
depthStencil.depthTestEnable = VK_TRUE;
depthStencil.depthWriteEnable = VK_TRUE;

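The depthTestEnable field specifies whether the depth of new fragments should be compared to the depth buffer to see if they should be discarded. The depthWriteEnable field specifies whether the new depth of fragments that pass the depth test should actually be written to the depth buffer.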

depthStencil.depthCompareOp = VK_COMPARE_OP_LESS;

The depthCompareOp field specifies the comparison that is performed to keep or discard fragments. We're sticking to the convention of lower depth = closer, so the depth of new fragments should be less.

depthStencil.depthBoundsTestEnable = VK_FALSE;
depthStencil.minDepthBounds = 0.0f; // Optional
depthStencil.maxDepthBounds = 1.0f; // Optional

The depthBoundsTestEnable, minDepthBounds and maxDepthBounds fields are used for the optional depth bound test. Basically, this allows you to only keep fragments that fall within the specified depth range. We won't be using this functionality.

depthStencil.stencilTestEnable = VK_FALSE;
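depthStencil.front = {}; // Optional
depthStencil.back = {}; // Optional

The last three fields configure stencil buffer operations, which we also won't be using in this tutorial. If you want to use these operations, then you will have to make sure that the format of the depth/stencil image contains a stencil component.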

pipelineInfo.pDepthStencilState = &depthStencil;

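Update the VkGraphicsPipelineCreateInfo struct to reference the depth stencil state we just filled in. A depth stencil state must always be specified if the render pass contains a depth stencil attachment.

If you run your program now, then you should see that the fragments of the geometry are now correctly ordered:

Handling window resize

The resolution of the depth buffer should change when the window is resized to match the new color attachment resolution. Extend the recreateSwapChain function to recreate the depth resources in that case: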

void recreateSwapChain() {
    int width = 0, height = 0;
    while (width == 0 || height == 0) {
        glfwGetFramebufferSize(window, &width, &height);
        glfwWaitEvents();
    }

    vkDeviceWaitIdle(device);

    cleanupSwapChain();

    createSwapChain();
    createImageViews();
    createDepthResources();
    createFramebuffers();
}

The cleanup operations should happen in the swap chain cleanup function:

void cleanupSwapChain() {
    vkDestroyImageView(device, depthImageView, nullptr);
    vkDestroyImage(device, depthImage, nullptr);
    vkFreeMemory(device, depthImageMemory, nullptr);

    ...
}

Congratulations, your application is now finally ready to render arbitrary 3D geometry and have it look right. We're going to try this out in the next chapter by drawing a textured model!

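Loading models

Introduction

Your program is now ready to render textured 3D meshes, but the current geometry in the vertices and indices arrays is not very interesting yet. In this chapter we're going to extend the program to load the vertices and indices from an actual model file to make the graphics card actually do some work.

Many graphics API tutorials have the reader write their own OBJ loader in a chapter like this. The problem with this is that any remotely interesting 3D application will soon require features that are not supported by this file format, like skeletal animation. We will load mesh data from an OBJ model in this chapter, but we'll focus more on integrating the mesh data with the program itself than on the details of loading it from a file.

Library

We will use the tinyobjloader library to load vertices and faces from an OBJ file. It's fast and easy to integrate, because it's a single-file library like stb_image. Go to the tinyobjloader repository and download the tiny_obj_loader.h file to a folder in your library directory.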

Visual Studio

Add the directory with tiny_obj_loader.h in it to the Additional Include Directories paths.

Makefile

Add the directory with tiny_obj_loader.h to the include directories for GCC:

VULKAN_SDK_PATH = /home/user/VulkanSDK/x.x.x.x/x86_64
STB_INCLUDE_PATH = /home/user/libraries/stb
TINYOBJ_INCLUDE_PATH = /home/user/libraries/tinyobjloader

...

CFLAGS = -std=c++17 -I$(VULKAN_SDK_PATH)/include -I$(STB_INCLUDE_PATH) -I$(TINYOBJ_INCLUDE_PATH)

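Sample mesh

In this chapter we won't be enabling lighting yet, so it helps to use a sample model that has lighting baked into the texture. An easy way to find such models is to look for 3D scans on Sketchfab. Many of the models on that site are available in OBJ format with a permissive license.

For this tutorial I've decided to go with the Viking room model by nigelgoh (CC BY 4.0). I tweaked the size and orientation of the model to use it as a drop-in replacement for the current geometry:

Feel free to use your own model, but make sure that it only consists of one material and that it has dimensions of about 1.5 x 1.5 x 1.5 units. If it is larger than that, then you'll have to change the view matrix. Put the model file in a new models directory next to shaders and textures, and put the texture image in the textures directory.

Put two new configuration variables in your program to define the model and texture paths: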

const uint32_t WIDTH = 800;
const uint32_t HEIGHT = 600;

const std::string MODEL_PATH = "models/viking_room.obj";
const std::string TEXTURE_PATH = "textures/viking_room.png";

And update createTextureImage to use this path variable:

stbi_uc* pixels = stbi_load(TEXTURE_PATH.c_str(), &texWidth, &texHeight, &texChannels, STBI_rgb_alpha);

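Loading vertices and indices

We're going to load the vertices and indices from the model file now, so you should remove the global vertices and indices arrays. Replace them with non-const containers as class members: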

std::vector<Vertex> vertices;
std::vector<uint32_t> indices;
VkBuffer vertexBuffer;
VkDeviceMemory vertexBufferMemory;

You should change the type of the indices from uint16_t to uint32_t, because there are going to be many more vertices than 65535. Remember to also change the vkCmdBindIndexBuffer parameter:

vkCmdBindIndexBuffer(commandBuffer, indexBuffer, 0, VK_INDEX_TYPE_UINT32);

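The tinyobjloader library is included in the same way as the STB libraries. Include the tiny_obj_loader.h file and make sure to define TINYOBJLOADER_IMPLEMENTATION in exactly one source file to include the function bodies and avoid linker errors: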

#define TINYOBJLOADER_IMPLEMENTATION
#include <tiny_obj_loader.h>

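We're now going to write a loadModel function that uses this library to populate the vertices and indices containers with the vertex data from the mesh. It should be called somewhere before the vertex and index buffers are created: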

void initVulkan() {
    ...
    loadModel();
    createVertexBuffer();
    createIndexBuffer();
    ...
}

...

void loadModel() {

}

A model is loaded into the library's data structures by calling the tinyobj::LoadObj function:

void loadModel() {
    tinyobj::attrib_t attrib;
    std::vector<tinyobj::shape_t> shapes;
    std::vector<tinyobj::material_t> materials;
    std::string warn, err;

    if (!tinyobj::LoadObj(&attrib, &shapes, &materials, &warn, &err, MODEL_PATH.c_str())) {
        throw std::runtime_error(warn + err);
    }
}

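An OBJ file consists of positions, normals, texture coordinates and faces. Faces consist of an arbitrary number of vertices, where each vertex refers to a position, normal and/or texture coordinate by index. This makes it possible to reuse not just entire vertices, but also individual attributes.

The attrib container holds all of the positions, normals and texture coordinates in its attrib.vertices, attrib.normals and attrib.texcoords vectors. The shapes container contains all of the separate objects and their faces. Each face consists of an array of vertices, and each vertex contains the indices of the position, normal and texture coordinate attributes. OBJ models can also define a material and texture per face, but we will be ignoring those.

The err string contains errors and the warn string contains warnings that occurred while loading the file, like a missing material definition. Loading only really failed if the LoadObj function returns false. As mentioned above, faces in OBJ files can actually contain an arbitrary number of vertices, whereas our application can only render triangles. Luckily LoadObj has an optional parameter to automatically triangulate such faces, which is enabled by default.

We're going to combine all of the faces in the file into a single model, so just iterate over all of the shapes: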

for (const auto& shape : shapes) {

}

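The triangulation feature has already made sure that there are three vertices per face, so we can now directly iterate over the vertices and dump them straight into our vertices vector: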

for (const auto& shape : shapes) {
    for (const auto& index : shape.mesh.indices) {
        Vertex vertex{};

        vertices.push_back(vertex);
        indices.push_back(indices.size());
    }
}

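For simplicity, we will assume that every vertex is unique for now, hence the simple auto-increment indices. The index variable is of type tinyobj::index_t, which contains the vertex_index, normal_index and texcoord_index members. We need to use these indices to look up the actual vertex attributes in the attrib arrays: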

vertex.pos = {
    attrib.vertices[3 * index.vertex_index + 0],
    attrib.vertices[3 * index.vertex_index + 1],
    attrib.vertices[3 * index.vertex_index + 2]
};

vertex.texCoord = {
    attrib.texcoords[2 * index.texcoord_index + 0],
    attrib.texcoords[2 * index.texcoord_index + 1]
};

vertex.color = {1.0f, 1.0f, 1.0f};

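Unfortunately the attrib.vertices array is an array of float values instead of something like glm::vec3, so you need to multiply the index by 3. Similarly, there are two texture coordinate components per entry, so that index is multiplied by 2. The offsets of 0, 1 and 2 are used to access the X, Y and Z components, or the U and V components in the case of texture coordinates.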
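The index arithmetic can be checked in isolation with a small flat array. The helpers below (positionAt and texCoordAt) are hypothetical names for this sketch. They take a plain std::vector<float> laid out the way the text describes attrib.vertices and attrib.texcoords, and they assume the index is non-negative and in range.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Reads the X, Y and Z components of entry vertexIndex from a flat float array
// that stores three floats per position.
std::array<float, 3> positionAt(const std::vector<float>& flatPositions, int vertexIndex) {
    const std::size_t base = 3 * static_cast<std::size_t>(vertexIndex);
    return {flatPositions[base + 0], flatPositions[base + 1], flatPositions[base + 2]};
}

// Reads the U and V components of entry texcoordIndex from a flat float array
// that stores two floats per texture coordinate.
std::array<float, 2> texCoordAt(const std::vector<float>& flatTexCoords, int texcoordIndex) {
    const std::size_t base = 2 * static_cast<std::size_t>(texcoordIndex);
    return {flatTexCoords[base + 0], flatTexCoords[base + 1]};
}
```

For a flat position array holding 0, 1, 2, 3, 4, 5, the entry at index 1 starts at offset 3 and is therefore the position (3, 4, 5).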

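Run your program now with optimization enabled (e.g. Release mode in Visual Studio, or the -O3 compiler flag for GCC). This is necessary, because otherwise loading the model will be very slow. You should see something like the following:

Great, the geometry looks correct, but what's going on with the texture? The OBJ format assumes a coordinate system where a vertical coordinate of 0 means the bottom of the image. However, we've uploaded our image into Vulkan in a top-to-bottom orientation where 0 means the top of the image. Solve this by flipping the vertical component of the texture coordinates: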

vertex.texCoord = {
    attrib.texcoords[2 * index.texcoord_index + 0],
    1.0f - attrib.texcoords[2 * index.texcoord_index + 1]
};

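When you run your program again, you should now see the correct result:

All that hard work is finally beginning to pay off with a demo like this!

As the model rotates you may notice that the rear (the backside of the walls) looks a bit funny. This is normal and is simply because the model is not really designed to be viewed from that side.

Vertex deduplication

Unfortunately we're not really taking advantage of the index buffer yet. The vertices vector contains a lot of duplicated vertex data, because many vertices are included in multiple triangles. We should keep only the unique vertices and use the index buffer to reuse them whenever they come up. A straightforward way to implement this is to use a map or unordered_map to keep track of the unique vertices and their respective indices: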

#include <unordered_map>

...

std::unordered_map<Vertex, uint32_t> uniqueVertices{};

for (const auto& shape : shapes) {
    for (const auto& index : shape.mesh.indices) {
        Vertex vertex{};

        ...

        if (uniqueVertices.count(vertex) == 0) {
            uniqueVertices[vertex] = static_cast<uint32_t>(vertices.size());
            vertices.push_back(vertex);
        }

        indices.push_back(uniqueVertices[vertex]);
    }
}

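Every time we read a vertex from the OBJ file, we check whether we've already seen a vertex with the exact same position and texture coordinates before. If not, we add it to vertices and store its index in the uniqueVertices container. After that we add the index of the new vertex to indices. If we've seen the exact same vertex before, then we look up its index in uniqueVertices and store that index in indices.

The program will fail to compile right now, because using a user-defined type like our Vertex struct as a key in a hash table requires us to implement two functions: equality test and hash calculation. The former is easy to implement by overriding the == operator in the Vertex struct: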

bool operator==(const Vertex& other) const {
    return pos == other.pos && color == other.color && texCoord == other.texCoord;
}

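A hash function for Vertex is implemented by specifying a template specialization for std::hash<T>. Hash functions are a complex topic, but cppreference.com recommends the following approach, which combines the fields of a struct to create a hash function of decent quality: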

namespace std {
    template<> struct hash<Vertex> {
        size_t operator()(Vertex const& vertex) const {
            return ((hash<glm::vec3>()(vertex.pos) ^
                   (hash<glm::vec3>()(vertex.color) << 1)) >> 1) ^
                   (hash<glm::vec2>()(vertex.texCoord) << 1);
        }
    };
}

This code should be placed outside the Vertex struct. The hash functions for the GLM types need to be included using the following header:

#define GLM_ENABLE_EXPERIMENTAL
#include <glm/gtx/hash.hpp>

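The hash functions are defined in the gtx folder, which means that they are technically still an experimental extension to GLM. Therefore you need to define GLM_ENABLE_EXPERIMENTAL to use them. This means that the API could change with a new version of GLM in the future, but in practice the API is very stable.

You should now be able to successfully compile and run your program. If you check the size of vertices, you'll see that it has shrunk from 1,500,000 to 265,645! That means that each vertex is reused in about 6 triangles on average. This definitely saves us a lot of GPU memory.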
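To experiment with the deduplication logic without GLM or a model file, the following minimal sketch uses a simplified vertex made of plain floats. SimpleVertex, SimpleVertexHash and deduplicate are hypothetical names introduced only for this illustration, and the hash uses a common hash-combine idiom rather than the shift-and-xor expression shown above.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Simplified stand-in for the tutorial's Vertex (plain floats instead of GLM types).
struct SimpleVertex {
    float pos[3];
    float uv[2];

    bool operator==(const SimpleVertex& o) const {
        return pos[0] == o.pos[0] && pos[1] == o.pos[1] && pos[2] == o.pos[2] &&
               uv[0] == o.uv[0] && uv[1] == o.uv[1];
    }
};

// Combines the hashes of all five floats into one value.
struct SimpleVertexHash {
    std::size_t operator()(const SimpleVertex& v) const {
        const float fields[5] = {v.pos[0], v.pos[1], v.pos[2], v.uv[0], v.uv[1]};
        std::size_t seed = 0;
        for (float f : fields) {
            seed ^= std::hash<float>{}(f) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

struct DedupResult {
    std::vector<SimpleVertex> vertices;  // unique vertices only
    std::vector<std::uint32_t> indices;  // one index per input vertex
};

// Keeps the first occurrence of every vertex and emits an index for each input entry.
DedupResult deduplicate(const std::vector<SimpleVertex>& input) {
    DedupResult out;
    std::unordered_map<SimpleVertex, std::uint32_t, SimpleVertexHash> seen;
    for (const SimpleVertex& v : input) {
        auto it = seen.find(v);
        if (it == seen.end()) {
            it = seen.emplace(v, static_cast<std::uint32_t>(out.vertices.size())).first;
            out.vertices.push_back(v);
        }
        out.indices.push_back(it->second);
    }
    return out;
}
```

Feeding in the six corner entries of a quad made of two triangles yields four unique vertices and six indices, which is the same kind of saving the tutorial observes on the full model.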

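C++ code / Vertex shader / Fragment shader

Generating Mipmaps

Introduction

Our program can now load and render 3D models. In this chapter, we will add one more feature, mipmap generation. Mipmaps are widely used in games and rendering software, and Vulkan gives us complete control over how they are created.

Mipmaps are precalculated, downscaled versions of an image. Each new image is half the width and height of the previous one. Mipmaps are used as a form of Level of Detail or LOD. Objects that are far away from the camera will sample their textures from the smaller mip images. Using smaller images increases the rendering speed and avoids artifacts such as Moiré patterns. An example of what mipmaps look like:

Image creation

In Vulkan, each of the mip images is stored in a different mip level of a VkImage. Mip level 0 is the original image, and the mip levels after level 0 are commonly referred to as the mip chain.

The number of mip levels is specified when the VkImage is created. Up until now, we have always set this value to one. We need to calculate the number of mip levels from the dimensions of the image. First, add a class member to store this number: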

...
uint32_t mipLevels;
VkImage textureImage;
...

The value for mipLevels can be found once we've loaded the texture in createTextureImage:

int texWidth, texHeight, texChannels;
stbi_uc* pixels = stbi_load(TEXTURE_PATH.c_str(), &texWidth, &texHeight, &texChannels, STBI_rgb_alpha);
...
mipLevels = static_cast<uint32_t>(std::floor(std::log2(std::max(texWidth, texHeight)))) + 1;

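This calculates the number of levels in the mip chain. The max function selects the largest dimension. The log2 function calculates how many times that dimension can be divided by 2. The floor function handles cases where the largest dimension is not a power of 2. 1 is added so that the original image has a mip level.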
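The same count can be obtained with integer halving, which makes it easy to check by hand. This is a minimal sketch; mipLevelCount is a hypothetical helper and not part of the tutorial code, which uses the floating-point expression above.

```cpp
#include <algorithm>
#include <cstdint>

// Number of mip levels for a width x height image, i.e. floor(log2(max(w, h))) + 1,
// computed by repeatedly halving the largest dimension.
std::uint32_t mipLevelCount(std::uint32_t width, std::uint32_t height) {
    std::uint32_t largest = std::max(width, height);
    std::uint32_t levels = 1;  // level 0 is the original image
    while (largest > 1) {
        largest /= 2;          // integer division plays the role of floor
        ++levels;
    }
    return levels;
}
```

For a 1024 x 1024 texture this gives 11 levels (1024, 512, ..., 1), and for an 800 x 600 texture it gives 10, since 800 can be halved nine times before reaching 1.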

To use this value, we need to change the createImage, createImageView, and transitionImageLayout functions to allow us to specify the number of mip levels. Add a mipLevels parameter to the functions:

void createImage(uint32_t width, uint32_t height, uint32_t mipLevels, VkFormat format, VkImageTiling tiling, VkImageUsageFlags usage, VkMemoryPropertyFlags properties, VkImage& image, VkDeviceMemory& imageMemory) {
    ...
    imageInfo.mipLevels = mipLevels;
    ...
}

VkImageView createImageView(VkImage image, VkFormat format, VkImageAspectFlags aspectFlags, uint32_t mipLevels) {
    ...
    viewInfo.subresourceRange.levelCount = mipLevels;
    ...

void transitionImageLayout(VkImage image, VkFormat format, VkImageLayout oldLayout, VkImageLayout newLayout, uint32_t mipLevels) {
    ...
    barrier.subresourceRange.levelCount = mipLevels;
    ...

Update all calls to these functions to use the right values:

createImage(swapChainExtent.width, swapChainExtent.height, 1, depthFormat, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, depthImage, depthImageMemory);
...
createImage(texWidth, texHeight, mipLevels, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);

swapChainImageViews[i] = createImageView(swapChainImages[i], swapChainImageFormat, VK_IMAGE_ASPECT_COLOR_BIT, 1);
...
depthImageView = createImageView(depthImage, depthFormat, VK_IMAGE_ASPECT_DEPTH_BIT, 1);
...
textureImageView = createImageView(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_ASPECT_COLOR_BIT, mipLevels);

transitionImageLayout(depthImage, depthFormat, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL, 1);
...
transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, mipLevels);

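Generating Mipmaps

Our texture image now has multiple mip levels, but the staging buffer can only be used to fill mip level 0. The other levels are still undefined. To fill these levels we need to generate the data from the single level that we have. We will use the vkCmdBlitImage command. This command performs copying, scaling, and filtering operations. We will call it multiple times to blit data to each level of our texture image.

vkCmdBlitImage is considered a transfer operation, so we must inform Vulkan that we intend to use the texture image as both the source and destination of a transfer. Add VK_IMAGE_USAGE_TRANSFER_SRC_BIT to the texture image's usage flags in createTextureImage: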

...
createImage(texWidth, texHeight, mipLevels, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_SRC_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);
...

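Like other image operations, vkCmdBlitImage depends on the layout of the image it operates on. We could transition the entire image to VK_IMAGE_LAYOUT_GENERAL, but this will most likely be slow. For optimal performance, the source image should be in VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL and the destination image should be in VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL. Vulkan allows us to transition each mip level of an image independently. Each blit only deals with two mip levels at a time, so we can transition each level into the optimal layout between blit commands.

transitionImageLayout only performs layout transitions on the entire image, so we'll need to write a few more pipeline barrier commands. Remove the existing transition to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL in createTextureImage: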

...
transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, mipLevels);
copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));
//transitioned to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL while generating mipmaps
...

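This will leave each level of the texture image in VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL. Each level will be transitioned to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL after the blit command reading from it is finished.

We're now going to write the function that generates the mipmaps: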

void generateMipmaps(VkImage image, int32_t texWidth, int32_t texHeight, uint32_t mipLevels) {
    VkCommandBuffer commandBuffer = beginSingleTimeCommands();

    VkImageMemoryBarrier barrier{};
    barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.image = image;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
    barrier.subresourceRange.baseArrayLayer = 0;
    barrier.subresourceRange.layerCount = 1;
    barrier.subresourceRange.levelCount = 1;

    endSingleTimeCommands(commandBuffer);
}

We're going to make several transitions, so we'll reuse this VkImageMemoryBarrier. The fields set above will remain the same for all barriers. subresourceRange.baseMipLevel, oldLayout, newLayout, srcAccessMask, and dstAccessMask will be changed for each transition.

int32_t mipWidth = texWidth;
int32_t mipHeight = texHeight;

for (uint32_t i = 1; i < mipLevels; i++) {

}

This loop will record each of the vkCmdBlitImage commands. Note that the loop variable starts at 1, not 0.

barrier.subresourceRange.baseMipLevel = i - 1;
barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;

vkCmdPipelineBarrier(commandBuffer,
    VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0,
    0, nullptr,
    0, nullptr,
    1, &barrier);

First, we transition level i - 1 to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL. This transition will wait for level i - 1 to be filled, either by the previous blit command or by vkCmdCopyBufferToImage. The current blit command will wait on this transition.

VkImageBlit blit{};
blit.srcOffsets[0] = { 0, 0, 0 };
blit.srcOffsets[1] = { mipWidth, mipHeight, 1 };
blit.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
blit.srcSubresource.mipLevel = i - 1;
blit.srcSubresource.baseArrayLayer = 0;
blit.srcSubresource.layerCount = 1;
blit.dstOffsets[0] = { 0, 0, 0 };
blit.dstOffsets[1] = { mipWidth > 1 ? mipWidth / 2 : 1, mipHeight > 1 ? mipHeight / 2 : 1, 1 };
blit.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
blit.dstSubresource.mipLevel = i;
blit.dstSubresource.baseArrayLayer = 0;
blit.dstSubresource.layerCount = 1;

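Next, we specify the regions that will be used in the blit operation. The source mip level is i - 1 and the destination mip level is i. The two elements of the srcOffsets array determine the 3D region that data will be blitted from. dstOffsets determines the region that data will be blitted to. The X and Y dimensions of dstOffsets[1] are divided by two since each mip level is half the size of the previous level. The Z dimension of srcOffsets[1] and dstOffsets[1] must be 1, since a 2D image has a depth of 1.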

vkCmdBlitImage(commandBuffer,
    image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
    image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    1, &blit,
    VK_FILTER_LINEAR);

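Now, we record the blit command. Note that textureImage is used for both the srcImage and dstImage parameters. This is because we're blitting between different levels of the same image. The source mip level was just transitioned to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL and the destination level is still in VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL from createTextureImage.

Beware if you are using a dedicated transfer queue (as suggested in Vertex buffers): vkCmdBlitImage must be submitted to a queue with graphics capability.

The last parameter allows us to specify a VkFilter to use in the blit. We have the same filtering options here that we had when making the VkSampler. We use VK_FILTER_LINEAR to enable interpolation.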

barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
barrier.srcAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

vkCmdPipelineBarrier(commandBuffer,
    VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0,
    0, nullptr,
    0, nullptr,
    1, &barrier);

This barrier transitions mip level i - 1 to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL. This transition waits on the current blit command to finish. All sampling operations will wait on this transition to finish.

    ...
    if (mipWidth > 1) mipWidth /= 2;
    if (mipHeight > 1) mipHeight /= 2;

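At the end of the loop, we divide the current mip dimensions by two. We check each dimension before the division to ensure that it never becomes 0. This handles cases where the image is not square, since one of the mip dimensions would reach 1 before the other. When this happens, that dimension should remain 1 for all remaining levels.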
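The effect of the guarded division on a non-square image can be seen by listing the size of every level. The sketch below repeats the same two checks outside of any Vulkan code; mipChainExtents is a hypothetical helper used only for illustration.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Lists the (width, height) of every mip level, starting with level 0.
std::vector<std::pair<std::int32_t, std::int32_t>> mipChainExtents(
        std::int32_t texWidth, std::int32_t texHeight, std::uint32_t mipLevels) {
    std::vector<std::pair<std::int32_t, std::int32_t>> extents;
    std::int32_t mipWidth = texWidth;
    std::int32_t mipHeight = texHeight;
    for (std::uint32_t level = 0; level < mipLevels; ++level) {
        extents.emplace_back(mipWidth, mipHeight);
        // Same guards as in generateMipmaps: a dimension never drops below 1.
        if (mipWidth > 1) mipWidth /= 2;
        if (mipHeight > 1) mipHeight /= 2;
    }
    return extents;
}
```

For an 8 x 2 image with 4 levels the chain is 8 x 2, 4 x 1, 2 x 1, 1 x 1: the height reaches 1 after the first halving and stays there while the width keeps shrinking.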

    barrier.subresourceRange.baseMipLevel = mipLevels - 1;
    barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
    barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    vkCmdPipelineBarrier(commandBuffer,
        VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0,
        0, nullptr,
        0, nullptr,
        1, &barrier);

    endSingleTimeCommands(commandBuffer);
}

Before we end the command buffer, we insert one more pipeline barrier. This barrier transitions the last mip level from VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL. This wasn't handled by the loop, since the last mip level is never blitted from.
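To convince yourself that every level ends up in the shader-read layout, the sequence of transitions can be replayed on the CPU. This is a minimal sketch that only tracks layout state; the Layout enum and simulateMipLayouts are hypothetical stand-ins and no Vulkan calls are involved.

```cpp
#include <cstdint>
#include <vector>

// Reduced set of layouts that generateMipmaps moves each level through.
enum class Layout { TransferDst, TransferSrc, ShaderReadOnly };

// Replays the layout transitions recorded by generateMipmaps for one image
// and returns the final layout of every mip level.
std::vector<Layout> simulateMipLayouts(std::uint32_t mipLevels) {
    // After the buffer-to-image copy, every level is in TRANSFER_DST_OPTIMAL.
    std::vector<Layout> layouts(mipLevels, Layout::TransferDst);

    for (std::uint32_t i = 1; i < mipLevels; ++i) {
        layouts[i - 1] = Layout::TransferSrc;     // first barrier: level i - 1 becomes the blit source
        // The blit reads level i - 1 and writes level i, which is still TransferDst.
        layouts[i - 1] = Layout::ShaderReadOnly;  // second barrier: level i - 1 is finished
    }

    // Final barrier after the loop: the last level was only ever a blit destination.
    if (mipLevels > 0) {
        layouts[mipLevels - 1] = Layout::ShaderReadOnly;
    }
    return layouts;
}
```

Leaving out the final transition in this model leaves the last entry in TransferDst, which mirrors what would go wrong in the real function if the barrier after the loop were forgotten.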

Finally, add the call to generateMipmaps in createTextureImage:

transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, mipLevels);
copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));
//transitioned to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL while generating mipmaps
...
generateMipmaps(textureImage, texWidth, texHeight, mipLevels);

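Our texture image's mipmaps are now completely filled.

Linear filtering support

It is very convenient to use a built-in function like vkCmdBlitImage to generate all the mip levels, but unfortunately it is not guaranteed to be supported on all platforms. It requires the texture image format we use to support linear filtering, which can be checked with the vkGetPhysicalDeviceFormatProperties function. We will add a check for this to the generateMipmaps function.

First add an additional parameter that specifies the image format: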

void createTextureImage() {
    ...

    generateMipmaps(textureImage, VK_FORMAT_R8G8B8A8_SRGB, texWidth, texHeight, mipLevels);
}

void generateMipmaps(VkImage image, VkFormat imageFormat, int32_t texWidth, int32_t texHeight, uint32_t mipLevels) {

    ...
}

In the generateMipmaps function, use vkGetPhysicalDeviceFormatProperties to request the properties of the texture image format:

void generateMipmaps(VkImage image, VkFormat imageFormat, int32_t texWidth, int32_t texHeight, uint32_t mipLevels) {

    // Check if the image format supports linear blitting
    VkFormatProperties formatProperties;
    vkGetPhysicalDeviceFormatProperties(physicalDevice, imageFormat, &formatProperties);

    ...

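The VkFormatProperties struct has three fields named linearTilingFeatures, optimalTilingFeatures and bufferFeatures that each describe how the format can be used depending on the way it is used. We create the texture image with the optimal tiling format, so we need to check optimalTilingFeatures. Support for the linear filtering feature can be checked with VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT: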

if (!(formatProperties.optimalTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT)) {
    throw std::runtime_error("texture image format does not support linear blitting!");
}

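There are two alternatives in this case. You could implement a function that searches common texture image formats for one that does support linear blitting, or you could implement the mipmap generation in software with a library like stb_image_resize. Each mip level can then be loaded into the image in the same way that you loaded the original image.

It should be noted that it is uncommon in practice to generate the mipmap levels at runtime anyway. Usually they are pregenerated and stored in the texture file alongside the base level to improve loading speed. Implementing resizing in software and loading multiple levels from a file is left as an exercise to the reader.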
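As a starting point for the software route, the sketch below halves an RGBA8 image with a simple 2x2 box filter. It is a hand-written stand-in, not the stb_image_resize API; RgbaImage and downsampleHalf are hypothetical names, the input is assumed to be non-empty, and the bytes are averaged directly, which ignores sRGB gamma.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RgbaImage {
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    std::vector<std::uint8_t> pixels;  // tightly packed RGBA8, row-major
};

// Produces the next mip level by averaging each 2x2 block of the source.
RgbaImage downsampleHalf(const RgbaImage& src) {
    RgbaImage dst;
    dst.width = src.width > 1 ? src.width / 2 : 1;
    dst.height = src.height > 1 ? src.height / 2 : 1;
    dst.pixels.resize(static_cast<std::size_t>(dst.width) * dst.height * 4);

    for (std::uint32_t y = 0; y < dst.height; ++y) {
        for (std::uint32_t x = 0; x < dst.width; ++x) {
            // Source block coordinates, clamped so 1-pixel-wide or 1-pixel-tall images work.
            const std::uint32_t xs[2] = {std::min(2 * x, src.width - 1),
                                         std::min(2 * x + 1, src.width - 1)};
            const std::uint32_t ys[2] = {std::min(2 * y, src.height - 1),
                                         std::min(2 * y + 1, src.height - 1)};
            for (std::uint32_t c = 0; c < 4; ++c) {
                std::uint32_t sum = 0;
                for (std::uint32_t sy : ys) {
                    for (std::uint32_t sx : xs) {
                        sum += src.pixels[(static_cast<std::size_t>(sy) * src.width + sx) * 4 + c];
                    }
                }
                dst.pixels[(static_cast<std::size_t>(y) * dst.width + x) * 4 + c] =
                    static_cast<std::uint8_t>((sum + 2) / 4);  // rounded average of four texels
            }
        }
    }
    return dst;
}
```

Calling downsampleHalf repeatedly, starting from the loaded image, would give the pixel data for levels 1, 2, and so on, each of which could then be uploaded through a staging buffer like level 0.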

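Sampler

While the VkImage holds the mipmap data, VkSampler controls how that data is read while rendering. Vulkan allows us to specify minLod, maxLod, mipLodBias, and mipmapMode ("Lod" means "Level of Detail"). When a texture is sampled, the sampler selects a mip level according to the following pseudocode: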

lod = getLodLevelFromScreenSize(); //smaller when the object is close, may be negative
lod = clamp(lod + mipLodBias, minLod, maxLod);

level = clamp(floor(lod), 0, texture.mipLevels - 1);  //clamped to the number of mip levels in the texture

if (mipmapMode == VK_SAMPLER_MIPMAP_MODE_NEAREST) {
    color = sample(level);
} else {
    color = blend(sample(level), sample(level + 1));
}

If samplerInfo.mipmapMode is VK_SAMPLER_MIPMAP_MODE_NEAREST, lod selects the mip level to sample from. If the mipmap mode is VK_SAMPLER_MIPMAP_MODE_LINEAR, lod is used to select two mip levels to be sampled. Those levels are sampled and the results are linearly blended.

The sample operation is also affected by lod:

if (lod <= 0) {
    color = readTexture(uv, magFilter);
} else {
    color = readTexture(uv, minFilter);
}

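If the object is close to the camera, magFilter is used as the filter. If the object is further from the camera, minFilter is used. Normally, lod is non-negative, and is only 0 when close to the camera. mipLodBias lets us force Vulkan to use a lower lod and level than it would normally use.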
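The two pieces of pseudocode above can be combined into a small runnable model. This sketch follows the pseudocode rather than the exact rules of the specification; MipSelection and selectMip are hypothetical names, using the fractional part of the lod as the blend weight is an assumption, and mipLevels is assumed to be at least 1.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

enum class MipmapMode { Nearest, Linear };

struct MipSelection {
    std::uint32_t level0;  // first mip level that is sampled
    std::uint32_t level1;  // second level (equal to level0 in nearest mode)
    float blend;           // weight of level1 when blending the two samples
    bool magnify;          // true: magFilter is used, false: minFilter is used
};

// Models how a sampler picks mip levels from a screen-space lod value.
MipSelection selectMip(float screenLod, float mipLodBias, float minLod, float maxLod,
                       std::uint32_t mipLevels, MipmapMode mode) {
    const float lod = std::min(std::max(screenLod + mipLodBias, minLod), maxLod);
    const float maxLevel = static_cast<float>(mipLevels - 1);
    const float clamped = std::min(std::max(lod, 0.0f), maxLevel);

    MipSelection s{};
    s.level0 = static_cast<std::uint32_t>(std::floor(clamped));
    s.magnify = lod <= 0.0f;
    if (mode == MipmapMode::Nearest) {
        s.level1 = s.level0;
        s.blend = 0.0f;
    } else {
        s.level1 = std::min(s.level0 + 1, mipLevels - 1);
        s.blend = clamped - std::floor(clamped);
    }
    return s;
}
```

With 11 mip levels, a lod of 2.5 in linear mode selects levels 2 and 3 with equal weight, while a negative lod falls back to level 0 and the magnification filter.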

To see the results of this chapter, we need to choose values for our textureSampler. We've already set the minFilter and magFilter to use VK_FILTER_LINEAR. We just need to choose values for minLod, maxLod, mipLodBias, and mipmapMode.

void createTextureSampler() {
    ...
    samplerInfo.mipmapMode = VK_SAMPLER_MIPMAP_MODE_LINEAR;
    samplerInfo.minLod = 0.0f; // Optional
    samplerInfo.maxLod = VK_LOD_CLAMP_NONE;
    samplerInfo.mipLodBias = 0.0f; // Optional
    ...
}

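To allow the full range of mip levels to be used, we set minLod to 0.0f, and maxLod to VK_LOD_CLAMP_NONE. This constant is equal to 1000.0f, which means that all available mipmap levels in the texture will be sampled.

We have no reason to change the lod value, so we set mipLodBias to 0.0f.

Now run your program and you should see the following:

It's not a dramatic difference, since our scene is so simple. There are subtle differences if you look closely.

The most noticeable difference is the writing on the papers. With mipmaps, the writing has been smoothed. Without mipmaps, the writing has harsh edges and gaps from Moiré artifacts.

You can play around with the sampler settings to see how they affect mipmapping. For example, by changing minLod, you can force the sampler to not use the lowest mip levels: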

samplerInfo.minLod = static_cast<float>(mipLevels / 2);

These settings will produce this image:

This is how higher mip levels will be used when objects are further away from the camera.

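C++ code / Vertex shader / Fragment shader

Multisampling

Introduction

Our program can now load multiple levels of detail for textures, which fixes artifacts when rendering objects far away from the viewer. The image is now a lot smoother, but on closer inspection you will notice jagged saw-like patterns along the edges of drawn geometric shapes. This is especially visible in one of our early programs, when we rendered a quad:

This undesired effect is called "aliasing" and it's a result of the limited number of pixels that are available for rendering. Since there are no displays with unlimited resolution, it will always be visible to some extent. There are a number of ways to fix this, and in this chapter we'll focus on one of the more popular ones: multisample anti-aliasing (MSAA).

In ordinary rendering, the pixel color is determined based on a single sample point, which in most cases is the center of the target pixel on screen. If part of the drawn line passes through a certain pixel but doesn't cover the sample point, that pixel will be left blank, leading to the jagged "staircase" effect.

What MSAA does is use multiple sample points per pixel (hence the name) to determine the final color. As one might expect, more samples lead to better results, but they are also more computationally expensive.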
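The difference between one and several sample points can be illustrated with a toy coverage test. This is a simplified sketch, not how a GPU rasterizer is implemented: SamplePoint and coverage are hypothetical names, the geometry is reduced to everything left of a vertical edge, and the four-sample pattern used in the note below is only an illustrative rotated grid.

```cpp
#include <cstddef>
#include <vector>

// Offset of a sample point inside a unit-sized pixel, each component in [0, 1).
struct SamplePoint {
    float x;
    float y;
};

// Fraction of a pixel's sample points that lie left of a vertical edge at edgeX.
// pixelX is the left border of the pixel; the result is between 0 and 1.
float coverage(const std::vector<SamplePoint>& samples, float pixelX, float edgeX) {
    if (samples.empty()) {
        return 0.0f;
    }
    std::size_t covered = 0;
    for (const SamplePoint& s : samples) {
        if (pixelX + s.x < edgeX) {
            ++covered;
        }
    }
    return static_cast<float>(covered) / static_cast<float>(samples.size());
}
```

With a single sample at the pixel center (0.5, 0.5), an edge at x = 0.4 leaves the pixel completely uncovered. With four samples at (0.375, 0.125), (0.875, 0.375), (0.125, 0.625) and (0.625, 0.875), two of them fall left of the edge, so the pixel receives half of the geometry's color and the step between neighboring pixels is softened.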

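In our implementation, we will focus on using the maximum available sample count. Depending on your application this may not always be the best approach, and it might be better to use fewer samples for the sake of higher performance if the final result meets your quality demands.

Getting available sample count

Let's start off by determining how many samples our hardware can use. Most modern GPUs support at least 8 samples, but this number is not guaranteed to be the same everywhere. We'll keep track of it by adding a new class member: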

...
VkSampleCountFlagBits msaaSamples = VK_SAMPLE_COUNT_1_BIT;
...

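By default we'll be using only one sample per pixel, which is equivalent to no multisampling, in which case the final image will remain unchanged. The exact maximum number of samples can be extracted from the VkPhysicalDeviceProperties associated with our selected physical device. We're using a depth buffer, so we have to take into account the sample count for both color and depth. The highest sample count that is supported by both will be the maximum we can support. Add a function that will fetch this information for us: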

VkSampleCountFlagBits getMaxUsableSampleCount() {
    VkPhysicalDeviceProperties physicalDeviceProperties;
    vkGetPhysicalDeviceProperties(physicalDevice, &physicalDeviceProperties);

    VkSampleCountFlags counts = physicalDeviceProperties.limits.framebufferColorSampleCounts & physicalDeviceProperties.limits.framebufferDepthSampleCounts;
    if (counts & VK_SAMPLE_COUNT_64_BIT) { return VK_SAMPLE_COUNT_64_BIT; }
    if (counts & VK_SAMPLE_COUNT_32_BIT) { return VK_SAMPLE_COUNT_32_BIT; }
    if (counts & VK_SAMPLE_COUNT_16_BIT) { return VK_SAMPLE_COUNT_16_BIT; }
    if (counts & VK_SAMPLE_COUNT_8_BIT) { return VK_SAMPLE_COUNT_8_BIT; }
    if (counts & VK_SAMPLE_COUNT_4_BIT) { return VK_SAMPLE_COUNT_4_BIT; }
    if (counts & VK_SAMPLE_COUNT_2_BIT) { return VK_SAMPLE_COUNT_2_BIT; }

    return VK_SAMPLE_COUNT_1_BIT;
}
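The chain of if statements can also be written as a loop over the bits, which is easy to test without a device. This minimal sketch works on plain integers and relies on the fact that each VK_SAMPLE_COUNT_n_BIT flag has the numeric value n; maxCommonSampleCount is a hypothetical helper, not a Vulkan function.

```cpp
#include <cstdint>

// Returns the highest sample count supported by both bitmasks, as a single bit
// (1, 2, 4, ..., 64). Falls back to 1 when no higher count is shared.
std::uint32_t maxCommonSampleCount(std::uint32_t colorCounts, std::uint32_t depthCounts) {
    const std::uint32_t common = colorCounts & depthCounts;
    for (std::uint32_t bit = 64; bit > 1; bit >>= 1) {
        if (common & bit) {
            return bit;
        }
    }
    return 1;
}
```

If the color attachments support up to 8 samples (mask 0x0F) but depth only up to 4 (mask 0x07), the function returns 4, which matches what getMaxUsableSampleCount would pick.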

We will now use this function to set the msaaSamples variable during the physical device selection process. For this, we have to slightly modify the pickPhysicalDevice function:

void pickPhysicalDevice() {
    ...
    for (const auto& device : devices) {
        if (isDeviceSuitable(device)) {
            physicalDevice = device;
            msaaSamples = getMaxUsableSampleCount();
            break;
        }
    }
    ...
}

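Setting up a render target

In MSAA, each pixel is sampled in an offscreen buffer which is then rendered to the screen. This new buffer is slightly different from the regular images we've been rendering to: it has to be able to store more than one sample per pixel. Once a multisampled buffer is created, it has to be resolved to the default framebuffer (which stores only a single sample per pixel). This is why we have to create an additional render target and modify our current drawing process. We only need one render target since only one drawing operation is active at a time, just like with the depth buffer. Add the following class members: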

...
VkImage colorImage;
VkDeviceMemory colorImageMemory;
VkImageView colorImageView;
...

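This new image will have to store the desired number of samples per pixel, so we need to pass this number to VkImageCreateInfo during the image creation process. Modify the createImage function by adding a numSamples parameter: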

void createImage(uint32_t width, uint32_t height, uint32_t mipLevels, VkSampleCountFlagBits numSamples, VkFormat format, VkImageTiling tiling, VkImageUsageFlags usage, VkMemoryPropertyFlags properties, VkImage& image, VkDeviceMemory& imageMemory) {
    ...
    imageInfo.samples = numSamples;
    ...

For now, update all calls to this function using VK_SAMPLE_COUNT_1_BIT. We will replace this with proper values as we progress with the implementation:

createImage(swapChainExtent.width, swapChainExtent.height, 1, VK_SAMPLE_COUNT_1_BIT, depthFormat, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, depthImage, depthImageMemory);
...
createImage(texWidth, texHeight, mipLevels, VK_SAMPLE_COUNT_1_BIT, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_SRC_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);

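We will now create a multisampled color buffer. Add a createColorResources function and note that we're using msaaSamples here as a function parameter to createImage. We're also using only one mip level, since this is enforced by the Vulkan specification for images with more than one sample per pixel. Also, this color buffer doesn't need mipmaps since it's not going to be used as a texture: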

void createColorResources() {
    VkFormat colorFormat = swapChainImageFormat;

    createImage(swapChainExtent.width, swapChainExtent.height, 1, msaaSamples, colorFormat, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT | VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, colorImage, colorImageMemory);
    colorImageView = createImageView(colorImage, colorFormat, VK_IMAGE_ASPECT_COLOR_BIT, 1);
}

For consistency, call the function right before createDepthResources:

void initVulkan() {
    ...
    createColorResources();
    createDepthResources();
    ...
}

Now that we have a multisampled color buffer in place it's time to take care of depth. Modify createDepthResources and update the number of samples used by the depth buffer:

void createDepthResources() {
    ...
    createImage(swapChainExtent.width, swapChainExtent.height, 1, msaaSamples, depthFormat, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, depthImage, depthImageMemory);
    ...
}

We have now created a couple of new Vulkan resources, so let's not forget to release them when necessary:

void cleanupSwapChain() {
    vkDestroyImageView(device, colorImageView, nullptr);
    vkDestroyImage(device, colorImage, nullptr);
    vkFreeMemory(device, colorImageMemory, nullptr);
    ...
}

Also update recreateSwapChain so that the new color image can be recreated in the correct resolution when the window is resized:

void recreateSwapChain() {
    ...
    createImageViews();
    createColorResources();
    createDepthResources();
    ...
}

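We've made it past the initial MSAA setup. Now we need to start using this new resource in our graphics pipeline, framebuffer and render pass, and see the results!

Adding new attachments

Let's take care of the render pass first. Modify createRenderPass and update the color and depth attachment creation info structs: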

void createRenderPass() {
    ...
    colorAttachment.samples = msaaSamples;
    colorAttachment.finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    ...
    depthAttachment.samples = msaaSamples;
    ...

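You'll notice that we have changed the finalLayout from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL. That's because multisampled images cannot be presented directly. We first need to resolve them to a regular image. This requirement does not apply to the depth buffer, since it won't be presented at any point. Therefore we only have to add one new attachment for color, a so-called resolve attachment: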

    ...
    VkAttachmentDescription colorAttachmentResolve{};
    colorAttachmentResolve.format = swapChainImageFormat;
    colorAttachmentResolve.samples = VK_SAMPLE_COUNT_1_BIT;
    colorAttachmentResolve.loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
    colorAttachmentResolve.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
    colorAttachmentResolve.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
    colorAttachmentResolve.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
    colorAttachmentResolve.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    colorAttachmentResolve.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
    ...

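The render pass now has to be instructed to resolve the multisampled color image into a regular attachment. Create a new attachment reference that will point to the color buffer which will serve as the resolve target: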

    ...
    VkAttachmentReference colorAttachmentResolveRef{};
    colorAttachmentResolveRef.attachment = 2;
    colorAttachmentResolveRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    ...

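Set the pResolveAttachments subpass struct member to point to the newly created attachment reference. This is enough to let the render pass define a multisample resolve operation, which will let us render the image to screen: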

    ...
    subpass.pResolveAttachments = &colorAttachmentResolveRef;
    ...

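Since we're reusing the multisampled color image, it's necessary to update the srcAccessMask of the VkSubpassDependency. This update ensures that any write operations to the color attachment are completed before subsequent ones begin, thus preventing write-after-write hazards that could otherwise lead to unstable rendering results: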

    ...
    dependency.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
    ...

Now update the render pass info struct with the new color attachment:

    ...
    std::array<VkAttachmentDescription, 3> attachments = {colorAttachment, depthAttachment, colorAttachmentResolve};
    ...

With the render pass in place, modify createFramebuffers and add the new image view to the list:

void createFramebuffers() {
        ...
        std::array<VkImageView, 3> attachments = {
            colorImageView,
            depthImageView,
            swapChainImageViews[i]
        };
        ...
}

Finally, tell the newly created pipeline to use more than one sample by modifying createGraphicsPipeline:

void createGraphicsPipeline() {
    ...
    multisampling.rasterizationSamples = msaaSamples;
    ...
}

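Now run your program and you should see the following:

Just like with mipmapping, the difference may not be apparent straight away. On a closer look you'll notice that the edges are not as jagged anymore and the whole image seems a bit smoother compared to the original.

The difference is more noticeable when looking up close at one of the edges:

Quality improvements

There are certain limitations of our current MSAA implementation which may impact the quality of the output image in more detailed scenes. For example, we're currently not solving potential problems caused by shader aliasing, i.e. MSAA only smooths out the edges of geometry but not the interior filling. This may lead to a situation where you get a smooth polygon rendered on screen, but the applied texture still looks aliased if it contains high-contrast colors. One way to approach this problem is to enable sample shading, which will improve the image quality even further, though at an additional performance cost: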


void createLogicalDevice() {
    ...
    deviceFeatures.sampleRateShading = VK_TRUE; // enable sample shading feature for the device
    ...
}

void createGraphicsPipeline() {
    ...
    multisampling.sampleShadingEnable = VK_TRUE; // enable sample shading in the pipeline
    multisampling.minSampleShading = .2f; // min fraction for sample shading; closer to one is smoother
    ...
}
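To get a feeling for what a given minSampleShading value means, the sketch below converts the fraction into a number of shaded samples per pixel. It assumes the rule max(ceil(minSampleShading * rasterizationSamples), 1), which is how we read the Vulkan specification; minShadedSamples is a hypothetical helper and not part of the API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Minimum number of samples per pixel that are shaded separately, assuming the
// rule max(ceil(minSampleShading * rasterizationSamples), 1).
std::uint32_t minShadedSamples(float minSampleShading, std::uint32_t rasterizationSamples) {
    const float scaled = std::ceil(minSampleShading * static_cast<float>(rasterizationSamples));
    return std::max<std::uint32_t>(static_cast<std::uint32_t>(scaled), 1u);
}
```

Under this assumption, a value of .2f with 8 samples shades at least 2 samples per pixel, while 1.0f shades every sample, which is the most expensive and smoothest setting.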

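In this example we'll leave sample shading disabled, but in certain scenarios the quality improvement may be noticeable:

Conclusion

It has taken a lot of work to get to this point, but now you finally have a good base for a Vulkan program. The knowledge of the basic principles of Vulkan that you now possess should be sufficient to start exploring more of the features, like:

Push constants
Instanced rendering
Dynamic uniforms
Separate images and sampler descriptors
Pipeline cache
Multi-threaded command buffer generation
Multiple subpasses
Compute shaders

The current program can be extended in many ways, like adding Blinn-Phong lighting, post-processing effects and shadow mapping. You should be able to learn how these effects work from tutorials for other APIs, because despite Vulkan's explicitness, many concepts still work the same.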

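C++ code / Vertex shader / Fragment shader

Compute Shader

Introduction

In this bonus chapter we'll take a look at compute shaders. Up until now all previous chapters dealt with the traditional graphics part of the Vulkan pipeline. But unlike older APIs like OpenGL, compute shader support in Vulkan is mandatory. This means that you can use compute shaders on every Vulkan implementation available, no matter if it's a high-end desktop GPU or a low-powered embedded device.

This opens up the world of general purpose computing on graphics processing units (GPGPU). GPGPU means that you can do general computations on your GPU, something that has traditionally been the domain of CPUs. With GPUs having become more powerful and more flexible, many workloads that would require the general purpose capabilities of a CPU can now be done on the GPU in real time.

A few examples of where the compute capabilities of a GPU can be used are image manipulation, visibility testing, post processing, advanced lighting calculations, animations, physics (e.g. for a particle system) and much more. It's even possible to use compute for purely computational work that does not require any graphics output, e.g. number crunching or AI-related tasks. This is called "headless compute".

Advantages

Doing computationally expensive calculations on the GPU has several advantages. The most obvious one is offloading work from the CPU. Another one is not having to move data between the CPU's main memory and the GPU's memory. All of the data can stay on the GPU without having to wait for slow transfers from main memory.

Aside from these, GPUs are heavily parallelized, with thousands of small compute units. This often makes them a better fit for highly parallel workloads than a CPU with a few large compute units.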

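The Vulkan pipeline

It's important to know that compute is completely separated from the graphics part of the pipeline. This is visible in the following block diagram of the Vulkan pipeline from the official specification:

In this diagram we can see the traditional graphics part of the pipeline on the left, and several stages on the right that are not part of this graphics pipeline, including the compute shader (stage). With the compute shader stage being detached from the graphics pipeline, we're able to use it anywhere we see fit. This is very different from e.g. the fragment shader, which is always applied to the transformed output of the vertex shader.

The center of the diagram also shows that e.g. descriptor sets are used by compute as well, so everything we learned about descriptor layouts, descriptor sets and descriptors also applies here.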

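An example

An easy to understand example that we will implement in this chapter is a GPU based particle system. Such systems are used in many games and often consist of thousands of particles that need to be updated at interactive frame rates. Rendering such a system requires two main components: vertices, passed as vertex buffers, and a way to update them based on some equation.

The "classical" CPU based particle system stores particle data in the system's main memory and uses the CPU to update it. After the update, the vertices need to be transferred to the GPU's memory again so it can display the updated particles in the next frame. The most straightforward way would be recreating the vertex buffer with the new data for each frame. This is obviously very costly. Depending on your implementation, there are other options, like mapping GPU memory so it can be written by the CPU (called "resizable BAR" on desktop systems, or unified memory on integrated GPUs) or just using a host local buffer (which would be the slowest method due to PCI-E bandwidth). But no matter what buffer update method you choose, you always require a "round-trip" to the CPU to update the particles.

With a GPU based particle system, this round-trip is no longer required. Vertices are only uploaded to the GPU at the start and all updates are done in the GPU's memory using compute shaders. One of the main reasons why this is faster is the much higher bandwidth between the GPU and its local memory. In a CPU based scenario, you'd be limited by main memory and PCI-Express bandwidth, which is often just a fraction of the GPU's memory bandwidth.

When doing this on a GPU with a dedicated compute queue, you can update particles in parallel to the rendering part of the graphics pipeline. This is called "async compute", and is an advanced topic not covered in this tutorial.

Here is a screenshot from this chapter's code. The particles shown here are updated by a compute shader directly on the GPU, without any CPU interaction: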

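Data manipulation

In this tutorial we already learned about different buffer types, like vertex and index buffers for passing primitives and uniform buffers for passing data to a shader. We also used images to do texture mapping. But up until now, we always wrote data using the CPU and only did reads on the GPU.

An important concept introduced with compute shaders is the ability to arbitrarily read from and write to buffers. For this, Vulkan offers two dedicated storage types.

Shader storage buffer objects (SSBO)

A shader storage buffer (SSBO) allows shaders to read from and write to a buffer. Using these is similar to using uniform buffer objects. The biggest differences are that you can alias other buffer types to SSBOs and that they can be arbitrarily large.

Thinking back to the GPU based particle system, you might now wonder how to deal with vertices that are updated (written) by the compute shader and read (drawn) by the vertex shader.

But that's not a problem. In Vulkan you can specify multiple usages for buffers and images. So for the particle vertex buffer to be used as a vertex buffer (in the graphics pass) and as a storage buffer (in the compute pass), you simply create the buffer with those two usage flags: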

VkBufferCreateInfo bufferInfo{};
...
bufferInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT;
...

if (vkCreateBuffer(device, &bufferInfo, nullptr, &shaderStorageBuffers[i]) != VK_SUCCESS) {
    throw std::runtime_error("failed to create vertex buffer!");
}

The two flags VK_BUFFER_USAGE_VERTEX_BUFFER_BIT and VK_BUFFER_USAGE_STORAGE_BUFFER_BIT set in bufferInfo.usage tell the implementation that we want to use this buffer for two different scenarios. Note that we also added the VK_BUFFER_USAGE_TRANSFER_DST_BIT flag so we can transfer data from the host to the GPU. This is crucial because we want the shader storage buffer to stay in GPU memory only (VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT).

Here is the same code using the createBuffer helper function:

The GLSL shader declaration for accessing such a buffer looks like this:

struct Particle {
  vec2 position;
  vec2 velocity;
  vec4 color;
};

layout(std140, binding = 1) readonly buffer ParticleSSBOIn {
   Particle particlesIn[ ];
};

layout(std140, binding = 2) buffer ParticleSSBOOut {
   Particle particlesOut[ ];
};

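In this example we have a typed SSBO in which each particle has a position, a velocity and a color (see the Particle struct). The SSBO contains an unbound number of particles, as marked by the []. Not having to specify the number of elements in an SSBO is one of the advantages over e.g. uniform buffers. std140 is a memory layout qualifier that determines how the member elements of the shader storage buffer are aligned in memory. This gives us certain guarantees, required to map the buffers between the host and the GPU.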
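On the host side, the struct written into the buffer has to match this layout. The following sketch shows one way to mirror the GLSL Particle struct with plain floats and to check the offsets at compile time, assuming the usual std140 rules (vec2 aligned to 8 bytes, vec4 and structs aligned to 16 bytes). ParticleHost is a hypothetical name; the tutorial's own host struct may be declared differently.

```cpp
#include <cstddef>

// Host-side mirror of the GLSL Particle struct under the std140 layout.
struct ParticleHost {
    float position[2];           // vec2, expected at offset 0
    float velocity[2];           // vec2, expected at offset 8
    alignas(16) float color[4];  // vec4, expected at offset 16
};

// Compile-time checks: a mismatch here would mean the shader reads garbage.
static_assert(offsetof(ParticleHost, position) == 0, "position must start the struct");
static_assert(offsetof(ParticleHost, velocity) == 8, "velocity must follow position directly");
static_assert(offsetof(ParticleHost, color) == 16, "color must be 16-byte aligned");
static_assert(sizeof(ParticleHost) == 32, "array stride must be 32 bytes");
```

Because the two vec2 members happen to fill exactly 16 bytes, no padding is needed before color. Reordering the members (for example placing color between position and velocity) would introduce padding, so checks like these are a cheap safeguard.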

Writing to such a storage buffer object in the compute shader is straightforward and similar to how you'd write to the buffer on the C++ side:

particlesOut[index].position = particlesIn[index].position + particlesIn[index].velocity.xy * ubo.deltaTime;
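For comparison, here is the same update written as a CPU loop. It is a reference sketch only: Vec2, ParticleState and stepParticles are hypothetical names, and the loop index plays the role that the invocation index plays in the compute shader, where all particles are processed in parallel instead of one after another.

```cpp
#include <cstddef>
#include <vector>

struct Vec2 {
    float x;
    float y;
};

struct ParticleState {
    Vec2 position;
    Vec2 velocity;
};

// Reads every particle from particlesIn, advances it by velocity * deltaTime
// and writes the result to particlesOut (a separate buffer, as in the shader).
void stepParticles(const std::vector<ParticleState>& particlesIn,
                   std::vector<ParticleState>& particlesOut, float deltaTime) {
    particlesOut.resize(particlesIn.size());
    for (std::size_t index = 0; index < particlesIn.size(); ++index) {
        const ParticleState& in = particlesIn[index];
        particlesOut[index].position.x = in.position.x + in.velocity.x * deltaTime;
        particlesOut[index].position.y = in.position.y + in.velocity.y * deltaTime;
        particlesOut[index].velocity = in.velocity;
    }
}
```

Keeping the input and output in separate buffers means that no particle ever reads a value that another iteration has already overwritten, which is what makes the parallel version on the GPU safe.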

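Storage images

Note that we won't be doing image manipulation in this chapter. This paragraph is here to make readers aware that compute shaders can also be used for image manipulation.

A storage image allows you to read from and write to an image. Typical use cases are applying image effects to textures, doing post processing (which is very similar) or generating mipmaps.

This works similarly for images: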

VkImageCreateInfo imageInfo {};
...
imageInfo.usage = VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_STORAGE_BIT;
...

if (vkCreateImage(device, &imageInfo, nullptr, &textureImage) != VK_SUCCESS) {
    throw std::runtime_error("failed to create image!");
}

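The two flags VK_IMAGE_USAGE_SAMPLED_BIT and VK_IMAGE_USAGE_STORAGE_BIT tell the implementation that we want to use this image for two different scenarios: as an image sampled in the fragment shader and as a storage image in the compute shader.

The GLSL shader declaration for storage images looks similar to that of the sampled images used e.g. in the fragment shader: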

layout (binding = 0, rgba8) uniform readonly image2D inputImage;
layout (binding = 1, rgba8) uniform writeonly image2D outputImage;

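A few differences here are additional attributes like rgba8 for the format of the image, and the readonly and writeonly qualifiers, which tell the implementation that we will only read from the input image and only write to the output image. Last but not least, we need to use the image2D type to declare a storage image.

Reading from and writing to storage images in the compute shader is then done using imageLoad and imageStore: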

// imageStore on an image2D expects a vec4, so the full texel is kept here
vec4 pixel = imageLoad(inputImage, ivec2(gl_GlobalInvocationID.xy));
imageStore(outputImage, ivec2(gl_GlobalInvocationID.xy), pixel);

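Compute queue families

In the physical device and queue families chapter we already learned about queue families and how to select a graphics queue family. Compute uses the queue family properties flag bit VK_QUEUE_COMPUTE_BIT. So if we want to do compute work, we need to get a queue from a queue family that supports compute.

Note that Vulkan requires an implementation which supports graphics operations to have at least one queue family that supports both graphics and compute operations, but it's also possible that implementations offer a dedicated compute queue. This dedicated compute queue (which does not have the graphics bit) hints at an asynchronous compute queue. To keep this tutorial beginner friendly though, we'll use a queue that can do both graphics and compute operations. This also saves us from dealing with several advanced synchronization mechanisms.

For our compute sample we need to change the device creation code a bit: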

uint32_t queueFamilyCount = 0;
vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, nullptr);

std::vector<VkQueueFamilyProperties> queueFamilies(queueFamilyCount);
vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, queueFamilies.data());

int i = 0;
for (const auto& queueFamily : queueFamilies) {
    if ((queueFamily.queueFlags & VK_QUEUE_GRAPHICS_BIT) && (queueFamily.queueFlags & VK_QUEUE_COMPUTE_BIT)) {
        indices.graphicsAndComputeFamily = i;
    }

    i++;
}

The changed queue family index selection code will now try to find a queue family that supports both graphics and compute.
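The selection logic can be tested without a device by feeding it a list of flag masks. This is a minimal sketch on plain integers: kGraphicsBit and kComputeBit carry the numeric values of VK_QUEUE_GRAPHICS_BIT and VK_QUEUE_COMPUTE_BIT, findGraphicsAndComputeFamily is a hypothetical helper, and returning -1 for "not found" as well as stopping at the first match are choices made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kGraphicsBit = 0x1;  // numeric value of VK_QUEUE_GRAPHICS_BIT
constexpr std::uint32_t kComputeBit = 0x2;   // numeric value of VK_QUEUE_COMPUTE_BIT

// Returns the index of the first queue family whose flags contain both the
// graphics and the compute bit, or -1 if there is no such family.
int findGraphicsAndComputeFamily(const std::vector<std::uint32_t>& queueFlags) {
    for (std::size_t i = 0; i < queueFlags.size(); ++i) {
        const bool graphics = (queueFlags[i] & kGraphicsBit) != 0;
        const bool compute = (queueFlags[i] & kComputeBit) != 0;
        if (graphics && compute) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

For a device that reports a transfer-only family, a compute-only family and a family supporting everything, in that order, the function returns index 2 and skips the dedicated compute family.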

We can then get a compute queue from this queue family in createLogicalDevice:

vkGetDeviceQueue(device, indices.graphicsAndComputeFamily.value(), 0, &computeQueue);

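The compute shader stage

In the graphics samples we have used different pipeline stages to load shaders and access descriptors. Compute shaders are accessed in a similar way by using the VK_SHADER_STAGE_COMPUTE_BIT pipeline stage. So loading a compute shader is just the same as loading a vertex shader, but with a different shader stage. We'll talk about this in detail in the next paragraphs. Compute also introduces a new binding point type for descriptors and pipelines named VK_PIPELINE_BIND_POINT_COMPUTE that we'll have to use later on.

Loading compute shaders

Loading compute shaders in our application is the same as loading any other shader. The only real difference is that we'll need to use the VK_SHADER_STAGE_COMPUTE_BIT mentioned above.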

auto computeShaderCode = readFile("shaders/compute.spv");

VkShaderModule computeShaderModule = createShaderModule(computeShaderCode);

VkPipelineShaderStageCreateInfo computeShaderStageInfo{};
computeShaderStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
computeShaderStageInfo.stage = VK_SHADER_STAGE_COMPUTE_BIT;
computeShaderStageInfo.module = computeShaderModule;
computeShaderStageInfo.pName = "main";
...

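Preparing the shader storage buffers

Earlier on we learned that we can use shader storage buffers to pass arbitrary data to compute shaders. For this example we will upload an array of particles to the GPU, so we can manipulate it directly in the GPU's memory.

In the frames in flight chapter we talked about duplicating resources per frame in flight, so we can keep both the CPU and the GPU busy. First we declare a vector for the buffer objects and one for the device memory backing them: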

std::vector<VkBuffer> shaderStorageBuffers;
std::vector<VkDeviceMemory> shaderStorageBuffersMemory;

In createShaderStorageBuffers we then resize those vectors to match the maximum number of frames in flight:

shaderStorageBuffers.resize(MAX_FRAMES_IN_FLIGHT);
shaderStorageBuffersMemory.resize(MAX_FRAMES_IN_FLIGHT);

With this setup in place, we can initialize the particle information on the host side before moving it to the GPU:

    // Initialize particles
    std::default_random_engine rndEngine((unsigned)time(nullptr));
    std::uniform_real_distribution<float> rndDist(0.0f, 1.0f);

    // Initial particle positions on a circle
    std::vector<Particle> particles(PARTICLE_COUNT);
    for (auto& particle : particles) {
        float r = 0.25f * sqrt(rndDist(rndEngine));
        float theta = rndDist(rndEngine) * 2 * 3.14159265358979323846;
        float x = r * cos(theta) * HEIGHT / WIDTH;
        float y = r * sin(theta);
        particle.position = glm::vec2(x, y);
        particle.velocity = glm::normalize(glm::vec2(x,y)) * 0.00025f;
        particle.color = glm::vec4(rndDist(rndEngine), rndDist(rndEngine), rndDist(rndEngine), 1.0f);
    }

We then create a staging buffer in host memory to hold the initial particle properties:

    VkDeviceSize bufferSize = sizeof(Particle) * PARTICLE_COUNT;

    VkBuffer stagingBuffer;
    VkDeviceMemory stagingBufferMemory;
    createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);

    void* data;
    vkMapMemory(device, stagingBufferMemory, 0, bufferSize, 0, &data);
    memcpy(data, particles.data(), (size_t)bufferSize);
    vkUnmapMemory(device, stagingBufferMemory);

Using this staging buffer as the source, we then create the per-frame shader storage buffers and copy the particle properties from the staging buffer into each of them:

    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        createBuffer(bufferSize, VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, shaderStorageBuffers[i], shaderStorageBuffersMemory[i]);
        // Copy data from the staging buffer (host) to the shader storage buffer (GPU)
        copyBuffer(stagingBuffer, shaderStorageBuffers[i], bufferSize);
    }
}

Descriptors

Setting up descriptors for compute is almost identical to graphics. The only difference is that descriptors need to have VK_SHADER_STAGE_COMPUTE_BIT set to make them accessible from the compute stage:

std::array<VkDescriptorSetLayoutBinding, 3> layoutBindings{};
layoutBindings[0].binding = 0;
layoutBindings[0].descriptorCount = 1;
layoutBindings[0].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
layoutBindings[0].pImmutableSamplers = nullptr;
layoutBindings[0].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
...

Shader stages can be combined here, so if you want a descriptor to be accessible from both the vertex and the compute stage, you simply set the bits for both stages:

layoutBindings[0].stageFlags = VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_COMPUTE_BIT;

Here is the descriptor setup for our sample. The layout looks like this:

std::array<VkDescriptorSetLayoutBinding, 3> layoutBindings{};
layoutBindings[0].binding = 0;
layoutBindings[0].descriptorCount = 1;
layoutBindings[0].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
layoutBindings[0].pImmutableSamplers = nullptr;
layoutBindings[0].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;

layoutBindings[1].binding = 1;
layoutBindings[1].descriptorCount = 1;
layoutBindings[1].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
layoutBindings[1].pImmutableSamplers = nullptr;
layoutBindings[1].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;

layoutBindings[2].binding = 2;
layoutBindings[2].descriptorCount = 1;
layoutBindings[2].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
layoutBindings[2].pImmutableSamplers = nullptr;
layoutBindings[2].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;

VkDescriptorSetLayoutCreateInfo layoutInfo{};
layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.bindingCount = 3;
layoutInfo.pBindings = layoutBindings.data();

if (vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &computeDescriptorSetLayout) != VK_SUCCESS) {
    throw std::runtime_error("failed to create compute descriptor set layout!");
}

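Looking at this setup, you might wonder why we have two layout bindings for shader storage buffer objects even though we only render a single particle system. The reason is that the particle positions are updated frame by frame based on a delta time. Each frame therefore needs to know the particle positions of the last frame, so that it can update them with the new delta time and write them to its own SSBO.

For that, the compute shader needs access to the SSBOs of both the last and the current frame. This is done by passing both to the compute shader in our descriptor setup. See storageBufferInfoLastFrame and storageBufferInfoCurrentFrame: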

for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
    VkDescriptorBufferInfo uniformBufferInfo{};
    uniformBufferInfo.buffer = uniformBuffers[i];
    uniformBufferInfo.offset = 0;
    uniformBufferInfo.range = sizeof(UniformBufferObject);

    std::array<VkWriteDescriptorSet, 3> descriptorWrites{};
    ...

    VkDescriptorBufferInfo storageBufferInfoLastFrame{};
    storageBufferInfoLastFrame.buffer = shaderStorageBuffers[(i + MAX_FRAMES_IN_FLIGHT - 1) % MAX_FRAMES_IN_FLIGHT];
    storageBufferInfoLastFrame.offset = 0;
    storageBufferInfoLastFrame.range = sizeof(Particle) * PARTICLE_COUNT;

    descriptorWrites[1].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    descriptorWrites[1].dstSet = computeDescriptorSets[i];
    descriptorWrites[1].dstBinding = 1;
    descriptorWrites[1].dstArrayElement = 0;
    descriptorWrites[1].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
    descriptorWrites[1].descriptorCount = 1;
    descriptorWrites[1].pBufferInfo = &storageBufferInfoLastFrame;

    VkDescriptorBufferInfo storageBufferInfoCurrentFrame{};
    storageBufferInfoCurrentFrame.buffer = shaderStorageBuffers[i];
    storageBufferInfoCurrentFrame.offset = 0;
    storageBufferInfoCurrentFrame.range = sizeof(Particle) * PARTICLE_COUNT;

    descriptorWrites[2].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    descriptorWrites[2].dstSet = computeDescriptorSets[i];
    descriptorWrites[2].dstBinding = 2;
    descriptorWrites[2].dstArrayElement = 0;
    descriptorWrites[2].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
    descriptorWrites[2].descriptorCount = 1;
    descriptorWrites[2].pBufferInfo = &storageBufferInfoCurrentFrame;

    vkUpdateDescriptorSets(device, 3, descriptorWrites.data(), 0, nullptr);
}
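The index of the previous frame's buffer deserves a closer look. With an unsigned loop counter, an expression of the form (i - 1) % MAX_FRAMES_IN_FLIGHT wraps around at i = 0 and only yields the intended slot when the frame count is a power of two. The following minimal sketch shows a wrap-safe variant. The helper name previousFrameIndex is hypothetical and used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Index of the frame-in-flight slot that was written one frame earlier.
// Adding frameCount before subtracting 1 keeps the unsigned arithmetic
// from wrapping around when current is 0.
constexpr std::size_t previousFrameIndex(std::size_t current, std::size_t frameCount) {
    return (current + frameCount - 1) % frameCount;
}

// Compile-time checks for two and three frames in flight.
static_assert(previousFrameIndex(0, 2) == 1, "slot 0 follows slot 1 with two frames");
static_assert(previousFrameIndex(0, 3) == 2, "slot 0 follows slot 2 with three frames");
```

With two frames in flight both forms give the same result, so the difference only shows up once the frame count changes.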

Remember that we also have to request the descriptor types for the SSBOs from our descriptor pool:

std::array<VkDescriptorPoolSize, 2> poolSizes{};
...

poolSizes[1].type = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
poolSizes[1].descriptorCount = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT) * 2;

We need to double the number of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER descriptors requested from the pool, because each set references the SSBOs of both the last and the current frame.

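Compute pipelines

As compute is not part of the graphics pipeline, we can't use vkCreateGraphicsPipelines. Instead we need to create a dedicated compute pipeline with vkCreateComputePipelines for running our compute commands. Since a compute pipeline does not touch any of the rasterization state, it has a lot less state than a graphics pipeline: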

VkComputePipelineCreateInfo pipelineInfo{};
pipelineInfo.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO;
pipelineInfo.layout = computePipelineLayout;
pipelineInfo.stage = computeShaderStageInfo;

if (vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, &pipelineInfo, nullptr, &computePipeline) != VK_SUCCESS) {
    throw std::runtime_error("failed to create compute pipeline!");
}

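The setup is a lot simpler, as we only need one shader stage and a pipeline layout. The pipeline layout works the same way as for the graphics pipeline and has to be created before the pipeline that references it: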

VkPipelineLayoutCreateInfo pipelineLayoutInfo{};
pipelineLayoutInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
pipelineLayoutInfo.setLayoutCount = 1;
pipelineLayoutInfo.pSetLayouts = &computeDescriptorSetLayout;

if (vkCreatePipelineLayout(device, &pipelineLayoutInfo, nullptr, &computePipelineLayout) != VK_SUCCESS) {
    throw std::runtime_error("failed to create compute pipeline layout!");
}

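Compute space

Before we get into how a compute shader works and how we submit compute workloads to the GPU, we need to talk about two important compute concepts: work groups and invocations. They define an abstract execution model for how compute workloads are processed by the compute hardware of the GPU in three dimensions (x, y, and z).

Work groups define how compute workloads are formed and processed by the compute hardware of the GPU. You can think of them as work items the GPU has to work through. Work group dimensions are set by the application when recording the command buffer.

Each work group is in turn a collection of invocations that execute the same compute shader. Invocations can potentially run in parallel, and their dimensions are set in the compute shader. Invocations within a single work group have access to shared memory.

This image shows the relation between these two in three dimensions:

The number of dimensions used for work groups (defined by vkCmdDispatch) and invocations (defined by the local sizes in the compute shader) depends on how the input data is structured. If you work on a one-dimensional array, for example, you only need to specify the x dimension.

As an example: if we dispatch a work group count of [64, 1, 1] with a compute shader local size of [32, 32, 1], our compute shader will be invoked 64 x 32 x 32 = 65,536 times.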
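The arithmetic generalizes to all three dimensions: the total is the product of the three work group counts times the product of the three local sizes. A minimal sketch, where Extent3 and totalInvocations are hypothetical names chosen for this illustration:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Three-component extent (x, y, z); 64-bit so the product cannot overflow easily.
using Extent3 = std::array<std::uint64_t, 3>;

// Total number of compute shader invocations for one dispatch:
// (groups in x * y * z) * (invocations per group in x * y * z).
constexpr std::uint64_t totalInvocations(const Extent3& groupCount, const Extent3& localSize) {
    return groupCount[0] * groupCount[1] * groupCount[2]
         * localSize[0] * localSize[1] * localSize[2];
}
```

Calling totalInvocations({64, 1, 1}, {32, 32, 1}) reproduces the 65,536 invocations from the example above.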

Note that the maximum work group counts and local sizes differ from implementation to implementation, so you should always check the compute related limits maxComputeWorkGroupCount, maxComputeWorkGroupInvocations and maxComputeWorkGroupSize in VkPhysicalDeviceLimits.
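Such a check could look roughly like the sketch below. To stay compilable without the Vulkan headers it uses a hypothetical ComputeLimits struct that only mirrors the three fields named above. In a real application the values would come from the limits member of the properties returned by vkGetPhysicalDeviceProperties. The function name dispatchFitsLimits is likewise an assumption of this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the three compute related fields of VkPhysicalDeviceLimits.
struct ComputeLimits {
    std::array<std::uint32_t, 3> maxComputeWorkGroupCount;
    std::uint32_t maxComputeWorkGroupInvocations;
    std::array<std::uint32_t, 3> maxComputeWorkGroupSize;
};

// Returns true if the dispatch dimensions and the shader's local size
// stay within all three limits.
bool dispatchFitsLimits(const ComputeLimits& limits,
                        const std::array<std::uint32_t, 3>& groupCount,
                        const std::array<std::uint32_t, 3>& localSize) {
    std::uint64_t invocationsPerGroup = 1;
    for (std::size_t d = 0; d < 3; ++d) {
        if (groupCount[d] > limits.maxComputeWorkGroupCount[d]) return false;
        if (localSize[d] > limits.maxComputeWorkGroupSize[d]) return false;
        invocationsPerGroup *= localSize[d];
    }
    // The per-dimension sizes can each be valid while their product is too large.
    return invocationsPerGroup <= limits.maxComputeWorkGroupInvocations;
}
```

The last check matters in practice: a local size of [32, 32, 1] means 1,024 invocations per work group, which a device reporting a smaller maxComputeWorkGroupInvocations would reject even though each dimension on its own is within bounds.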

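Compute shaders

Now that we have learned about all the parts required to set up a compute shader pipeline, it's time to take a look at compute shaders themselves. Everything we learned about using GLSL shaders, e.g. for vertex and fragment shaders, also applies to compute shaders. The syntax is the same, and many concepts, like passing data between the application and the shader, are the same. But there are some important differences.

A very basic compute shader for updating a linear array of particles may look like this: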

#version 450

layout (binding = 0) uniform ParameterUBO {
    float deltaTime;
} ubo;

struct Particle {
    vec2 position;
    vec2 velocity;
    vec4 color;
};

layout(std140, binding = 1) readonly buffer ParticleSSBOIn {
   Particle particlesIn[ ];
};

layout(std140, binding = 2) buffer ParticleSSBOOut {
   Particle particlesOut[ ];
};

layout (local_size_x = 256, local_size_y = 1, local_size_z = 1) in;

void main() 
{
    uint index = gl_GlobalInvocationID.x;  

    Particle particleIn = particlesIn[index];

    particlesOut[index].position = particleIn.position + particleIn.velocity.xy * ubo.deltaTime;
    particlesOut[index].velocity = particleIn.velocity;
    ...
}

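The top part of the shader contains the declarations for the shader's inputs. First is a uniform buffer object at binding 0, something we already learned about in this tutorial. Below that we declare our Particle structure, which matches the declaration in the C++ code. Binding 1 then refers to the shader storage buffer object (SSBO) with the particle data from the last frame, and binding 2 points to the SSBO for the current frame, which is the one this shader will update.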
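Since the buffers are declared with std140, "matching" also means that the C++ struct has to place each member at the offset the shader expects. The sketch below checks this at compile time. It is an illustration only: plain float arrays stand in for glm::vec2 and glm::vec4, which is an assumption made to keep the example free of dependencies.

```cpp
#include <cassert>
#include <cstddef>

// Host-side mirror of the GLSL Particle struct.
// float arrays stand in for glm::vec2 / glm::vec4 (assumption of this sketch).
struct Particle {
    float position[2]; // vec2, std140 offset 0
    float velocity[2]; // vec2, std140 offset 8
    float color[4];    // vec4, std140 offset 16 (vec4 requires 16-byte alignment)
};

// std140 places the members at offsets 0, 8 and 16 and rounds the
// array stride up to a multiple of 16 bytes, giving 32 bytes per particle.
static_assert(offsetof(Particle, position) == 0, "position must start the struct");
static_assert(offsetof(Particle, velocity) == 8, "velocity must follow position");
static_assert(offsetof(Particle, color) == 16, "color must be 16-byte aligned");
static_assert(sizeof(Particle) == 32, "stride must match the std140 array stride");
```

The member order happens to make the natural C++ layout and the std140 layout coincide here. Reordering the members, for example putting color between the two vec2 fields, would introduce padding on the GLSL side that the C++ struct would have to replicate.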

An interesting detail is this compute-only declaration, which relates to the compute space:

layout (local_size_x = 256, local_size_y = 1, local_size_z = 1) in;

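This defines the number of invocations of this compute shader in the current work group. As noted earlier, this is the local part of the compute space, hence the local_ prefix. As we work on a linear 1D array of particles, we only need to specify a number for the x dimension in local_size_x.

The main function then reads from the last frame's SSBO and writes the updated particle position to the SSBO for the current frame. Like other shader types, compute shaders have their own set of built-in input variables. Built-ins are always prefixed with gl_. One such built-in is gl_GlobalInvocationID, a variable that uniquely identifies the current compute shader invocation across the current dispatch. We use it to index into our particle array.

Running compute commands

Dispatch

Now it's time to actually tell the GPU to do some compute work. This is done by calling vkCmdDispatch inside a command buffer. While not perfectly accurate, a dispatch is to compute what a draw call like vkCmdDraw is to graphics. It dispatches a given number of compute work groups in up to three dimensions.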

VkCommandBufferBeginInfo beginInfo{};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;

if (vkBeginCommandBuffer(commandBuffer, &beginInfo) != VK_SUCCESS) {
    throw std::runtime_error("failed to begin recording command buffer!");
}

...

vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, computePipeline);
vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, computePipelineLayout, 0, 1, &computeDescriptorSets[currentFrame], 0, nullptr);

vkCmdDispatch(commandBuffer, PARTICLE_COUNT / 256, 1, 1);

...

if (vkEndCommandBuffer(commandBuffer) != VK_SUCCESS) {
    throw std::runtime_error("failed to record command buffer!");
}

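vkCmdDispatch dispatches PARTICLE_COUNT / 256 local work groups in the x dimension. As our particle array is linear, we leave the other two dimensions at one, resulting in a one-dimensional dispatch. But why do we divide the number of particles in our array by 256? That's because, as defined in the previous section, every work group of our compute shader runs 256 invocations. So with 4096 particles we dispatch 16 work groups, each running 256 compute shader invocations. Getting these two numbers right usually takes some tinkering and profiling, depending on your workload and the hardware you're running on. If your particle count is dynamic and can't always be divided evenly by, for example, 256, you can round the work group count up and check gl_GlobalInvocationID at the start of your compute shader, returning early if the global invocation index is greater than or equal to the number of particles.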
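For particle counts that are not a multiple of the local size, the host side has to round the work group count up so that the last, partially filled work group is still dispatched. A minimal sketch of that calculation and of the matching bounds check follows. The names kLocalSizeX, workGroupCount and invocationIsActive are hypothetical and used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Must match local_size_x in the compute shader.
constexpr std::uint32_t kLocalSizeX = 256;

// Number of work groups needed so that every particle gets an invocation
// (integer division rounded up). Assumes particleCount + localSize - 1
// does not overflow 32 bits.
constexpr std::uint32_t workGroupCount(std::uint32_t particleCount,
                                       std::uint32_t localSize = kLocalSizeX) {
    return (particleCount + localSize - 1) / localSize;
}

// Host-side mirror of the guard a shader would place at the top of main():
// invocations whose global index lies past the end of the array do nothing.
constexpr bool invocationIsActive(std::uint32_t globalIndex, std::uint32_t particleCount) {
    return globalIndex < particleCount;
}
```

With 1,000 particles, a plain PARTICLE_COUNT / 256 truncates to 3 work groups and leaves the last 232 particles untouched, whereas workGroupCount(1000) yields 4. The 24 surplus invocations in the fourth work group are the ones the early return filters out.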

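Just as with the compute pipeline, a compute command buffer contains a lot less state than a graphics command buffer. There's no need to start a render pass or set a viewport.

Submitting work

As our sample does both compute and graphics operations, we'll do two submits per frame, one to the compute queue and one to the graphics queue (see the drawFrame function):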

...
if (vkQueueSubmit(computeQueue, 1, &submitInfo, VK_NULL_HANDLE) != VK_SUCCESS) {
    throw std::runtime_error("failed to submit compute command buffer!");
}
...
if (vkQueueSubmit(graphicsQueue, 1, &submitInfo, inFlightFences[currentFrame]) != VK_SUCCESS) {
    throw std::runtime_error("failed to submit draw command buffer!");
}

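The first submit, to the compute queue, updates the particle positions using the compute shader. The second submit then uses that updated data to draw the particle system.

Synchronizing graphics and compute

Synchronization is an important part of Vulkan, even more so when doing compute in conjunction with graphics. Wrong or missing synchronization may result in the vertex stage starting to draw (= read) particles while the compute shader hasn't finished updating (= writing) them (a read-after-write hazard), or in the compute shader starting to update particles that are still in use by the vertex part of the pipeline (a write-after-read hazard).

So we must make sure that those cases don't happen by synchronizing properly. There are different ways of doing so, depending on how you submit your compute workload. In our case, with two separate submits, we'll use semaphores and fences to synchronize the graphics and the compute load.

We first add a new set of synchronization primitives for the compute work in createSyncObjects. The compute fences, just like the graphics fences, are created in the signaled state. Otherwise the first frame would block forever waiting for a fence that is never signaled, as discussed earlier in this tutorial: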

std::vector<VkFence> computeInFlightFences;
std::vector<VkSemaphore> computeFinishedSemaphores;
...
computeInFlightFences.resize(MAX_FRAMES_IN_FLIGHT);
computeFinishedSemaphores.resize(MAX_FRAMES_IN_FLIGHT);

VkSemaphoreCreateInfo semaphoreInfo{};
semaphoreInfo.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO;

VkFenceCreateInfo fenceInfo{};
fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
fenceInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT;

for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
    ...
    if (vkCreateSemaphore(device, &semaphoreInfo, nullptr, &computeFinishedSemaphores[i]) != VK_SUCCESS ||
        vkCreateFence(device, &fenceInfo, nullptr, &computeInFlightFences[i]) != VK_SUCCESS) {
        throw std::runtime_error("failed to create compute synchronization objects for a frame!");
    }
}

We then use these to synchronize the compute submission with the graphics submission:

// Compute submission
vkWaitForFences(device, 1, &computeInFlightFences[currentFrame], VK_TRUE, UINT64_MAX);

updateUniformBuffer(currentFrame);

vkResetFences(device, 1, &computeInFlightFences[currentFrame]);

vkResetCommandBuffer(computeCommandBuffers[currentFrame], /*VkCommandBufferResetFlagBits*/ 0);
recordComputeCommandBuffer(computeCommandBuffers[currentFrame]);

submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &computeCommandBuffers[currentFrame];
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = &computeFinishedSemaphores[currentFrame];

if (vkQueueSubmit(computeQueue, 1, &submitInfo, computeInFlightFences[currentFrame]) != VK_SUCCESS) {
    throw std::runtime_error("failed to submit compute command buffer!");
}

// Graphics submission
vkWaitForFences(device, 1, &inFlightFences[currentFrame], VK_TRUE, UINT64_MAX);

...

vkResetFences(device, 1, &inFlightFences[currentFrame]);

vkResetCommandBuffer(commandBuffers[currentFrame], /*VkCommandBufferResetFlagBits*/ 0);
recordCommandBuffer(commandBuffers[currentFrame], imageIndex);

VkSemaphore waitSemaphores[] = { computeFinishedSemaphores[currentFrame], imageAvailableSemaphores[currentFrame] };
VkPipelineStageFlags waitStages[] = { VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT };
submitInfo = {};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;

submitInfo.waitSemaphoreCount = 2;
submitInfo.pWaitSemaphores = waitSemaphores;
submitInfo.pWaitDstStageMask = waitStages;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffers[currentFrame];
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = &renderFinishedSemaphores[currentFrame];

if (vkQueueSubmit(graphicsQueue, 1, &submitInfo, inFlightFences[currentFrame]) != VK_SUCCESS) {
    throw std::runtime_error("failed to submit draw command buffer!");
}

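Similar to the sample in the semaphores chapter, this setup runs the compute submission immediately, as it does not specify any wait semaphores. This is fine, because we use vkWaitForFences to wait for the current frame's compute command buffer to finish execution before submitting it again.

The graphics submission, on the other hand, needs to wait for the compute work to finish, so that it doesn't start fetching vertices while the compute shader is still updating them. We therefore wait on the computeFinishedSemaphores of the current frame and have the graphics submission wait at the VK_PIPELINE_STAGE_VERTEX_INPUT_BIT stage, where vertices are consumed.

It also needs to wait for presentation, so that the fragment shader doesn't output to the color attachments until the image is available. So we additionally wait on the imageAvailableSemaphores of the current frame at the VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT stage.

Drawing the particle system

Earlier on, we learned that buffers in Vulkan can have multiple uses, which is why we created the shader storage buffer that contains our particles with both the shader storage buffer bit and the vertex buffer bit. This means that we can use the shader storage buffer for drawing, just as we used "pure" vertex buffers in the previous chapters.

We first set up the vertex input state to match our particle structure: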

struct Particle {
    ...

    static std::array<VkVertexInputAttributeDescription, 2> getAttributeDescriptions() {
        std::array<VkVertexInputAttributeDescription, 2> attributeDescriptions{};

        attributeDescriptions[0].binding = 0;
        attributeDescriptions[0].location = 0;
        attributeDescriptions[0].format = VK_FORMAT_R32G32_SFLOAT;
        attributeDescriptions[0].offset = offsetof(Particle, position);

        attributeDescriptions[1].binding = 0;
        attributeDescriptions[1].location = 1;
        attributeDescriptions[1].format = VK_FORMAT_R32G32B32A32_SFLOAT;
        attributeDescriptions[1].offset = offsetof(Particle, color);

        return attributeDescriptions;
    }
};

Note that we don't add velocity to the vertex input attributes, as it is only used by the compute shader.

We then bind and draw it like we would with any vertex buffer:

vkCmdBindVertexBuffers(commandBuffer, 0, 1, &shaderStorageBuffers[currentFrame], offsets);

vkCmdDraw(commandBuffer, PARTICLE_COUNT, 1, 0, 0);

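Conclusion

In this chapter, we learned how to use compute shaders to offload work from the CPU to the GPU. Without compute shaders, many effects in modern games and applications would either not be possible or would run a lot slower. But compute has many more use cases than graphics alone, and this chapter only gives you a glimpse of what's possible. Now that you know how to use compute shaders, you may want to take a look at some advanced compute topics like:

Shared memory
Asynchronous compute
Atomic operations
Subgroups

You can find some advanced compute samples in the official Khronos Vulkan Samples repository.

C++ code / Vertex shader / Fragment shader / Compute shader

FAQ

This page lists solutions to common problems that you may encounter while developing Vulkan applications.

I get an access violation error in the core validation layer

Make sure that MSI Afterburner / RivaTuner Statistics Server is not running, because it has some compatibility problems with Vulkan.

I don't see any messages from the validation layers / Validation layers are not available

First make sure that the validation layers get a chance to print errors by keeping the terminal open after your program exits. You can do this from Visual Studio by running your program with Ctrl-F5 instead of F5, and on Linux by executing your program from a terminal window. If there are still no messages and you are sure that validation layers are turned on, then you should ensure that your Vulkan SDK is correctly installed by following the "Verify the installation" instructions. Also ensure that your SDK version is at least 1.1.106.0 to support the VK_LAYER_KHRONOS_validation layer.

vkCreateSwapchainKHR triggers an error in SteamOverlayVulkanLayer64.dll

This appears to be a compatibility problem in the Steam client beta. There are a few possible workarounds:

Opt out of the Steam beta program.
Set the DISABLE_VK_LAYER_VALVE_steam_overlay_1 environment variable to 1.
Delete the Steam overlay Vulkan layer entry in the registry under HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\ImplicitLayers.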

vkCreateInstance fails with VK_ERROR_INCOMPATIBLE_DRIVER

If you are using MacOS with the latest MoltenVK SDK, then vkCreateInstance may return the VK_ERROR_INCOMPATIBLE_DRIVER error. This is because Vulkan SDK version 1.3.216 or newer requires you to enable the VK_KHR_PORTABILITY_subset extension to use MoltenVK, because it is currently not fully conformant.

You have to add the VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR flag to your VkInstanceCreateInfo and add VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME to your instance extension list.

Code example:

...

std::vector<const char*> requiredExtensions;

for(uint32_t i = 0; i < glfwExtensionCount; i++) {
    requiredExtensions.emplace_back(glfwExtensions[i]);
}

requiredExtensions.emplace_back(VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME);

createInfo.flags |= VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR;

createInfo.enabledExtensionCount = (uint32_t) requiredExtensions.size();
createInfo.ppEnabledExtensionNames = requiredExtensions.data();

if (vkCreateInstance(&createInfo, nullptr, &instance) != VK_SUCCESS) {
    throw std::runtime_error("failed to create instance!");
}

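Privacy policy

General

This privacy policy applies to the information that is collected when you use vulkan-tutorial.com or any of its subdomains. It describes how the owner of this website, Alexander Overvoorde, collects, uses and shares information about you.

Analytics

This website collects analytics about visitors using a self-hosted instance of Matomo (https://matomo.org/), formerly known as Piwik. It records which pages you visit, what type of device and browser you use, how long you view a given page and where you come from. This information is anonymized by only recording the first two bytes of your IP address (e.g. 123.123.xxx.xxx). These anonymized logs are stored for an indefinite amount of time.

These analytics are used to track how content on the website is consumed, how many people visit the website in general, and which other websites link here. This makes it easier to engage with the community and to determine which areas of the website should be improved, for example whether extra time should be spent on facilitating mobile reading.

This data is not shared with third parties.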

Introduction

About

This tutorial will teach you the basics of using the Vulkan graphics and compute API. Vulkan is a new API by the Khronos group (known for OpenGL) that provides a much better abstraction of modern graphics cards. This new interface allows you to better describe what your application intends to do, which can lead to better performance and less surprising driver behavior compared to existing APIs like OpenGL and Direct3D. The ideas behind Vulkan are similar to those of Direct3D 12 and Metal, but Vulkan has the advantage of being fully cross-platform and allows you to develop for Windows, Linux and Android at the same time.

However, the price you pay for these benefits is that you have to work with a significantly more verbose API. Every detail related to the graphics API needs to be set up from scratch by your application, including initial frame buffer creation and memory management for objects like buffers and texture images. The graphics driver will do a lot less hand holding, which means that you will have to do more work in your application to ensure correct behavior.

The takeaway message here is that Vulkan is not for everyone. It is targeted at programmers who are enthusiastic about high performance computer graphics, and are willing to put some work in. If you are more interested in game development, rather than computer graphics, then you may wish to stick to OpenGL or Direct3D, which will not be deprecated in favor of Vulkan anytime soon. Another alternative is to use an engine like Unreal Engine or Unity, which will be able to use Vulkan while exposing a much higher level API to you.

With that out of the way, let's cover some prerequisites for following this tutorial:

A graphics card and driver compatible with Vulkan (NVIDIA, AMD, Intel, Apple Silicon (or the Apple M1))
Experience with C++ (familiarity with RAII, initializer lists)
A compiler with decent support of C++17 features (Visual Studio 2017+, GCC 7+, or Clang 5+)
Some existing experience with 3D computer graphics

This tutorial will not assume knowledge of OpenGL or Direct3D concepts, but it does require you to know the basics of 3D computer graphics. It will not explain the math behind perspective projection, for example. See this online book for a great introduction of computer graphics concepts. Some other great computer graphics resources are:

Ray tracing in one weekend
Physically Based Rendering book
Vulkan being used in a real engine in the open-source Quake and DOOM 3

You can use C instead of C++ if you want, but you will have to use a different linear algebra library and you will be on your own in terms of code structuring. We will use C++ features like classes and RAII to organize logic and resource lifetimes. There are also two alternative versions of this tutorial available for Rust developers: Vulkano based, Vulkanalia based.

To make it easier to follow along for developers using other programming languages, and to get some experience with the base API we'll be using the original C API to work with Vulkan. If you are using C++, however, you may prefer using the newer Vulkan-Hpp bindings that abstract some of the dirty work and help prevent certain classes of errors.

E-book

If you prefer to read this tutorial as an e-book, then you can download an EPUB or PDF version here:

EPUB
PDF

Tutorial structure

We'll start with an overview of how Vulkan works and the work we'll have to do to get the first triangle on the screen. The purpose of all the smaller steps will make more sense after you've understood their basic role in the whole picture. Next, we'll set up the development environment with the Vulkan SDK, the GLM library for linear algebra operations and GLFW for window creation. The tutorial will cover how to set these up on Windows with Visual Studio, and on Ubuntu Linux with GCC.

After that we'll implement all of the basic components of a Vulkan program that are necessary to render your first triangle. Each chapter will follow roughly the following structure:

Introduce a new concept and its purpose
Use all of the relevant API calls to integrate it into your program
Abstract parts of it into helper functions

Although each chapter is written as a follow-up on the previous one, it is also possible to read the chapters as standalone articles introducing a certain Vulkan feature. That means that the site is also useful as a reference. All of the Vulkan functions and types are linked to the specification, so you can click them to learn more. Vulkan is a very new API, so there may be some shortcomings in the specification itself. You are encouraged to submit feedback to this Khronos repository.

As mentioned before, Vulkan has a rather verbose API with many parameters to give you maximum control over the graphics hardware. This causes basic operations like creating a texture to take a lot of steps that have to be repeated every time. Therefore we'll be creating our own collection of helper functions throughout the tutorial.

Every chapter will also conclude with a link to the full code listing up to that point. You can refer to it if you have any doubts about the structure of the code, or if you're dealing with a bug and want to compare. All of the code files have been tested on graphics cards from multiple vendors to verify correctness. Each chapter also has a comment section at the end where you can ask any questions that are relevant to the specific subject matter. Please specify your platform, driver version, source code, expected behavior and actual behavior to help us help you.

This tutorial is intended to be a community effort. Vulkan is still a very new API and best practices have not really been established yet. If you have any type of feedback on the tutorial and site itself, then please don't hesitate to submit an issue or pull request to the GitHub repository. You can watch the repository to be notified of updates to the tutorial.

After you've gone through the ritual of drawing your very first Vulkan powered triangle onscreen, we'll start expanding the program to include linear transformations, textures and 3D models.

If you've played with graphics APIs before, then you'll know that there can be a lot of steps until the first geometry shows up on the screen. There are many of these initial steps in Vulkan, but you'll see that each of the individual steps is easy to understand and does not feel redundant. It's also important to keep in mind that once you have that boring looking triangle, drawing fully textured 3D models does not take that much extra work, and each step beyond that point is much more rewarding.

If you encounter any problems while following the tutorial, then first check the FAQ to see if your problem and its solution is already listed there. If you are still stuck after that, then feel free to ask for help in the comment section of the closest related chapter.

Ready to dive into the future of high performance graphics APIs? Let's go!

License

The contents are licensed under CC BY-SA 4.0, unless stated otherwise. By contributing, you agree to license your contributions to the public under that same license.

The code listings in the code directory in the source repository are licensed under CC0 1.0 Universal. By contributing to that directory, you agree to license your contributions to the public under that same public domain-like license.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Overview

This chapter will start off with an introduction of Vulkan and the problems it addresses. After that we're going to look at the ingredients that are required for the first triangle. This will give you a big picture to place each of the subsequent chapters in. We will conclude by covering the structure of the Vulkan API and the general usage patterns.

Origin of Vulkan

Just like the previous graphics APIs, Vulkan is designed as a cross-platform abstraction over GPUs. The problem with most of these APIs is that the era in which they were designed featured graphics hardware that was mostly limited to configurable fixed functionality. Programmers had to provide the vertex data in a standard format and were at the mercy of the GPU manufacturers with regards to lighting and shading options.

As graphics card architectures matured, they started offering more and more programmable functionality. All this new functionality had to be integrated with the existing APIs somehow. This resulted in less than ideal abstractions and a lot of guesswork on the graphics driver side to map the programmer's intent to the modern graphics architectures. That's why there are so many driver updates for improving the performance in games, sometimes by significant margins. Because of the complexity of these drivers, application developers also need to deal with inconsistencies between vendors, like the syntax that is accepted for shaders. Aside from these new features, the past decade also saw an influx of mobile devices with powerful graphics hardware. These mobile GPUs have different architectures based on their energy and space requirements. One such example is tiled rendering, which would benefit from improved performance by offering the programmer more control over this functionality. Another limitation originating from the age of these APIs is limited multi-threading support, which can result in a bottleneck on the CPU side.

Vulkan solves these problems by being designed from scratch for modern graphics architectures. It reduces driver overhead by allowing programmers to clearly specify their intent using a more verbose API, and allows multiple threads to create and submit commands in parallel. It reduces inconsistencies in shader compilation by switching to a standardized byte code format with a single compiler. Lastly, it acknowledges the general purpose processing capabilities of modern graphics cards by unifying the graphics and compute functionality into a single API.

What it takes to draw a triangle

We'll now look at an overview of all the steps it takes to render a triangle in a well-behaved Vulkan program. All of the concepts introduced here will be elaborated on in the next chapters. This is just to give you a big picture to relate all of the individual components to.

Step 1 - Instance and physical device selection

A Vulkan application starts by setting up the Vulkan API through a VkInstance. An instance is created by describing your application and any API extensions you will be using. After creating the instance, you can query for Vulkan supported hardware and select one or more VkPhysicalDevices to use for operations. You can query for properties like VRAM size and device capabilities to select desired devices, for example to prefer using dedicated graphics cards.

Step 2 - Logical device and queue families

After selecting the right hardware device to use, you need to create a VkDevice (logical device), where you describe more specifically which VkPhysicalDeviceFeatures you will be using, like multi viewport rendering and 64 bit floats. You also need to specify which queue families you would like to use. Most operations performed with Vulkan, like draw commands and memory operations, are asynchronously executed by submitting them to a VkQueue. Queues are allocated from queue families, where each queue family supports a specific set of operations in its queues. For example, there could be separate queue families for graphics, compute and memory transfer operations. The availability of queue families could also be used as a distinguishing factor in physical device selection. It is possible for a device with Vulkan support to not offer any graphics functionality, however all graphics cards with Vulkan support today will generally support all queue operations that we're interested in.

Step 3 - Window surface and swap chain

Unless you're only interested in offscreen rendering, you will need to create a window to present rendered images to. Windows can be created with the native platform APIs or libraries like GLFW and SDL. We will be using GLFW in this tutorial, but more about that in the next chapter.

We need two more components to actually render to a window: a window surface (VkSurfaceKHR) and a swap chain (VkSwapchainKHR). Note the KHR postfix, which means that these objects are part of a Vulkan extension. The Vulkan API itself is completely platform agnostic, which is why we need to use the standardized WSI (Window System Interface) extension to interact with the window manager. The surface is a cross-platform abstraction over windows to render to and is generally instantiated by providing a reference to the native window handle, for example HWND on Windows. Luckily, the GLFW library has a built-in function to deal with the platform specific details of this.

The swap chain is a collection of render targets. Its basic purpose is to ensure that the image that we're currently rendering to is different from the one that is currently on the screen. This is important to make sure that only complete images are shown. Every time we want to draw a frame we have to ask the swap chain to provide us with an image to render to. When we've finished drawing a frame, the image is returned to the swap chain for it to be presented to the screen at some point. The number of render targets and conditions for presenting finished images to the screen depends on the present mode. Common present modes are double buffering (vsync) and triple buffering. We'll look into these in the swap chain creation chapter.
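
To make the acquire and present cycle concrete, here is a toy model of a swap chain. It is a deliberately simplified sketch and not the Vulkan API: ToySwapChain and its members are hypothetical, presentation is assumed to happen immediately, and present modes are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model of a swap chain: a fixed set of images, each of which is
// free, being rendered to, or shown on screen. This is NOT the Vulkan
// API; it only mirrors the acquire -> render -> present cycle.
class ToySwapChain {
    enum class State { Free, Rendering, OnScreen };

public:
    explicit ToySwapChain(std::size_t imageCount)
        : states_(imageCount, State::Free) {
        assert(imageCount >= 2 && "need one image to show and one to draw");
    }

    // Hands out an image that is not on screen, or std::nullopt if
    // every image is currently busy.
    std::optional<std::size_t> acquire() {
        for (std::size_t i = 0; i < states_.size(); ++i) {
            if (states_[i] == State::Free) {
                states_[i] = State::Rendering;
                return i;
            }
        }
        return std::nullopt;
    }

    // Returns a finished image to the swap chain. It replaces the image
    // on screen, and the previously shown image becomes free again.
    void present(std::size_t index) {
        assert(index < states_.size() && states_[index] == State::Rendering);
        if (onScreen_) {
            states_[*onScreen_] = State::Free;
        }
        states_[index] = State::OnScreen;
        onScreen_ = index;
    }

    // Index of the image currently shown, if any.
    std::optional<std::size_t> onScreen() const { return onScreen_; }

private:
    std::vector<State> states_;
    std::optional<std::size_t> onScreen_;
};
```

With two images this behaves like double buffering: acquire never returns the image that is on screen, so only complete images are ever shown.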

Some platforms allow you to render directly to a display without interacting with any window manager through the VK_KHR_display and VK_KHR_display_swapchain extensions. These allow you to create a surface that represents the entire screen and could be used to implement your own window manager, for example.

Step 4 - Image views and framebuffers

To draw to an image acquired from the swap chain, we have to wrap it into a VkImageView and VkFramebuffer. An image view references a specific part of an image to be used, and a framebuffer references image views that are to be used for color, depth and stencil targets. Because there could be many different images in the swap chain, we'll preemptively create an image view and framebuffer for each of them and select the right one at draw time.

Step 5 - Render passes

Render passes in Vulkan describe the type of images that are used during rendering operations, how they will be used, and how their contents should be treated. In our initial triangle rendering application, we'll tell Vulkan that we will use a single image as color target and that we want it to be cleared to a solid color right before the drawing operation. Whereas a render pass only describes the type of images, a VkFramebuffer actually binds specific images to these slots.

Step 6 - Graphics pipeline

The graphics pipeline in Vulkan is set up by creating a VkPipeline object. It describes the configurable state of the graphics card, like the viewport size and depth buffer operation, and the programmable state using VkShaderModule objects. The VkShaderModule objects are created from shader byte code. The driver also needs to know which render targets will be used in the pipeline, which we specify by referencing the render pass.

One of the most distinctive features of Vulkan compared to existing APIs is that almost all configuration of the graphics pipeline needs to be set in advance. If you want to switch to a different shader or slightly change your vertex layout, you need to entirely recreate the graphics pipeline. In practice, this means that you will have to create many VkPipeline objects in advance for all the different combinations you need for your rendering operations. Only some basic configuration, like viewport size and clear color, can be changed dynamically. All of the state also needs to be described explicitly; there is no default color blend state, for example.

The good news is that because you're doing the equivalent of ahead-of-time compilation versus just-in-time compilation, there are more optimization opportunities for the driver and runtime performance is more predictable, because large state changes like switching to a different graphics pipeline are made very explicit.
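
The consequence of immutable pipelines can be illustrated with a small model: every combination of shader and vertex layout gets its own object up front, and drawing can only select one that already exists. This is a sketch under stated assumptions; ToyPipeline, PipelineKey, buildPipelines and selectPipeline are hypothetical names, and strings stand in for the real pipeline state.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an immutable pipeline: the state it was built with
// can never change afterwards. Not a Vulkan type.
struct ToyPipeline {
    std::string shader;
    std::string vertexLayout;
};

using PipelineKey = std::pair<std::string, std::string>;  // (shader, vertex layout)

// Builds one pipeline per (shader, layout) combination up front.
std::map<PipelineKey, ToyPipeline> buildPipelines(const std::vector<std::string>& shaders,
                                                  const std::vector<std::string>& layouts) {
    std::map<PipelineKey, ToyPipeline> pipelines;
    for (const std::string& shader : shaders) {
        for (const std::string& layout : layouts) {
            pipelines.emplace(PipelineKey{shader, layout}, ToyPipeline{shader, layout});
        }
    }
    return pipelines;
}

// At draw time we can only select an existing pipeline; a combination
// that was not created in advance is an error.
const ToyPipeline& selectPipeline(const std::map<PipelineKey, ToyPipeline>& pipelines,
                                  const std::string& shader, const std::string& layout) {
    auto it = pipelines.find(PipelineKey{shader, layout});
    if (it == pipelines.end()) {
        throw std::runtime_error("no pipeline for " + shader + " / " + layout);
    }
    return it->second;
}
```

Two shaders and three vertex layouts already produce six pipeline objects, which is why real applications tend to plan their state combinations carefully.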

Step 7 - Command pools and command buffers

As mentioned earlier, many of the operations in Vulkan that we want to execute, like drawing operations, need to be submitted to a queue. These operations first need to be recorded into a VkCommandBuffer before they can be submitted. These command buffers are allocated from a VkCommandPool that is associated with a specific queue family. To draw a simple triangle, we need to record a command buffer with the following operations:

Begin the render pass
Bind the graphics pipeline
Draw 3 vertices
End the render pass

Because the image in the framebuffer depends on which specific image the swap chain will give us, we need to record a command buffer for each possible image and select the right one at draw time. The alternative would be to record the command buffer again every frame, which is not as efficient.

Step 8 - Main loop

Now that the drawing commands have been wrapped into a command buffer, the main loop is quite straightforward. We first acquire an image from the swap chain with vkAcquireNextImageKHR. We can then select the appropriate command buffer for that image and execute it with vkQueueSubmit. Finally, we return the image to the swap chain for presentation to the screen with vkQueuePresentKHR.

Operations that are submitted to queues are executed asynchronously. Therefore we have to use synchronization objects like semaphores to ensure a correct order of execution. Execution of the draw command buffer must be set up to wait on image acquisition to finish, otherwise it may occur that we start rendering to an image that is still being read for presentation on the screen. The vkQueuePresentKHR call in turn needs to wait for rendering to be finished, for which we'll use a second semaphore that is signaled after rendering completes.

Summary

This whirlwind tour should give you a basic understanding of the work ahead for drawing the first triangle. A real-world program contains more steps, like allocating vertex buffers, creating uniform buffers and uploading texture images that will be covered in subsequent chapters, but we'll start simple because Vulkan has enough of a steep learning curve as it is. Note that we'll cheat a bit by initially embedding the vertex coordinates in the vertex shader instead of using a vertex buffer. That's because managing vertex buffers requires some familiarity with command buffers first.

So in short, to draw the first triangle we need to:

Create a VkInstance
Select a supported graphics card (VkPhysicalDevice)
Create a VkDevice and VkQueue for drawing and presentation
Create a window, window surface and swap chain
Wrap the swap chain images into VkImageView
Create a render pass that specifies the render targets and usage
Create framebuffers for the render pass
Set up the graphics pipeline
Allocate and record a command buffer with the draw commands for every possible swap chain image
Draw frames by acquiring images, submitting the right draw command buffer and returning the images back to the swap chain

It's a lot of steps, but the purpose of each individual step will be made very simple and clear in the upcoming chapters. If you're confused about the relation of a single step compared to the whole program, you should refer back to this chapter.

API concepts

This chapter will conclude with a short overview of how the Vulkan API is structured at a lower level.

Coding conventions

All of the Vulkan functions, enumerations and structs are defined in the vulkan.h header, which is included in the Vulkan SDK developed by LunarG. We'll look into installing this SDK in the next chapter.

Functions have a lower case vk prefix, types like enumerations and structs have a Vk prefix and enumeration values have a VK_ prefix. The API heavily uses structs to provide parameters to functions. For example, object creation generally follows this pattern:

VkXXXCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_XXX_CREATE_INFO;
createInfo.pNext = nullptr;
createInfo.foo = ...;
createInfo.bar = ...;

VkXXX object;
if (vkCreateXXX(&createInfo, nullptr, &object) != VK_SUCCESS) {
    std::cerr << "failed to create object" << std::endl;
    return false;
}

Many structures in Vulkan require you to explicitly specify the type of structure in the sType member. The pNext member can point to an extension structure and will always be nullptr in this tutorial. Functions that create or destroy an object will have a VkAllocationCallbacks parameter that allows you to use a custom allocator for driver memory, which will also be left nullptr in this tutorial.

Almost all functions return a VkResult that is either VK_SUCCESS or an error code. The specification describes which error codes each function can return and what they mean.
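
Because nearly every call returns a result code, it can be convenient to funnel the check through one helper. The sketch below uses a stand-in Result enum so that it compiles without the Vulkan headers; the numeric values match VK_SUCCESS, VK_ERROR_OUT_OF_HOST_MEMORY and VK_ERROR_INITIALIZATION_FAILED, while Result, checkResult and throwsOn are hypothetical names.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for a few VkResult values so the sketch compiles without the
// Vulkan headers. In Vulkan, VK_SUCCESS is 0 and error codes are negative.
enum class Result : int {
    Success = 0,
    ErrorOutOfHostMemory = -1,
    ErrorInitializationFailed = -3,
};

// Throws if `result` is not Success; `what` names the failed operation.
void checkResult(Result result, const std::string& what) {
    if (result != Result::Success) {
        throw std::runtime_error("failed to " + what + " (result code " +
                                 std::to_string(static_cast<int>(result)) + ")");
    }
}

// Small helper for demonstration: reports whether checkResult threw.
bool throwsOn(Result result) {
    try {
        checkResult(result, "create object");
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

With the real API, the same idea would wrap the return value of a creation function and compare it against VK_SUCCESS.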

Validation layers

As mentioned earlier, Vulkan is designed for high performance and low driver overhead. Therefore it will include very limited error checking and debugging capabilities by default. The driver will often crash instead of returning an error code if you do something wrong, or worse, it will appear to work on your graphics card and completely fail on others.

Vulkan allows you to enable extensive checks through a feature known as validation layers. Validation layers are pieces of code that can be inserted between the API and the graphics driver to do things like running extra checks on function parameters and tracking memory management problems. The nice thing is that you can enable them during development and then completely disable them when releasing your application for zero overhead. Anyone can write their own validation layers, but the Vulkan SDK by LunarG provides a standard set of validation layers that we'll be using in this tutorial. You also need to register a callback function to receive debug messages from the layers.

Because Vulkan is so explicit about every operation and the validation layers are so extensive, it can actually be a lot easier to find out why your screen is black compared to OpenGL and Direct3D!

There's only one more step before we'll start writing code and that's setting up the development environment.

Development environment

In this chapter we'll set up your environment for developing Vulkan applications and install some useful libraries. All of the tools we'll use, with the exception of the compiler, are compatible with Windows, Linux and MacOS, but the steps for installing them differ a bit, which is why they're described separately here.

Windows

If you're developing for Windows, then I will assume that you are using Visual Studio to compile your code. For complete C++17 support, you need to use either Visual Studio 2017 or 2019. The steps outlined below were written for VS 2017.

Vulkan SDK

The most important component you'll need for developing Vulkan applications is the SDK. It includes the headers, standard validation layers, debugging tools and a loader for the Vulkan functions. The loader looks up the functions in the driver at runtime, similarly to GLEW for OpenGL - if you're familiar with that.

The SDK can be downloaded from the LunarG website using the buttons at the bottom of the page. You don't have to create an account, but it will give you access to some additional documentation that may be useful to you.

Proceed through the installation and pay attention to the install location of the SDK. The first thing we'll do is verify that your graphics card and driver properly support Vulkan. Go to the directory where you installed the SDK, open the Bin directory and run the vkcube.exe demo. You should see the following:

If you receive an error message, then ensure that your drivers are up-to-date and include the Vulkan runtime, and that your graphics card is supported. See the introduction chapter for links to drivers from the major vendors.

There are two more programs in this directory that will be useful for development: glslangValidator.exe and glslc.exe. They will be used to compile shaders from the human-readable GLSL to bytecode. We'll cover this in depth in the shader modules chapter. The Bin directory also contains the binaries of the Vulkan loader and the validation layers, while the Lib directory contains the libraries.

Lastly, there's the Include directory that contains the Vulkan headers. Feel free to explore the other files, but we won't need them for this tutorial.

GLFW

As mentioned before, Vulkan by itself is a platform agnostic API and does not include tools for creating a window to display the rendered results. To benefit from the cross-platform advantages of Vulkan and to avoid the horrors of Win32, we'll use the GLFW library, which supports Windows, Linux and MacOS, to create a window. There are other libraries available for this purpose, like SDL, but the advantage of GLFW is that it also abstracts away some of the other platform-specific things in Vulkan besides just window creation.

You can find the latest release of GLFW on the official website. In this tutorial we'll be using the 64-bit binaries, but you can of course also choose to build in 32-bit mode. In that case make sure to link with the Vulkan SDK binaries in the Lib32 directory instead of Lib. After downloading it, extract the archive to a convenient location. I've chosen to create a Libraries directory in the Visual Studio directory under Documents.

GLM

Unlike DirectX 12, Vulkan does not include a library for linear algebra operations, so we'll have to download one. GLM is a nice library that is designed for use with graphics APIs and is also commonly used with OpenGL.

GLM is a header-only library, so just download the latest version and store it in a convenient location. You should have a directory structure similar to the following now:

Setting up Visual Studio

Now that you've installed all of the dependencies we can set up a basic Visual Studio project for Vulkan and write a little bit of code to make sure that everything works.

Start Visual Studio and create a new Windows Desktop Wizard project by entering a name and pressing OK.

Make sure that Console Application (.exe) is selected as application type so that we have a place to print debug messages to, and check Empty Project to prevent Visual Studio from adding boilerplate code.

Press OK to create the project and add a C++ source file. You should already know how to do that, but the steps are included here for completeness.

Now add the following code to the file. Don't worry about trying to understand it right now; we're just making sure that you can compile and run Vulkan applications. We'll start from scratch in the next chapter.

#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>

#define GLM_FORCE_RADIANS
#define GLM_FORCE_DEPTH_ZERO_TO_ONE
#include <glm/vec4.hpp>
#include <glm/mat4x4.hpp>

#include <iostream>

int main() {
    glfwInit();

    glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
    GLFWwindow* window = glfwCreateWindow(800, 600, "Vulkan window", nullptr, nullptr);

    uint32_t extensionCount = 0;
    vkEnumerateInstanceExtensionProperties(nullptr, &extensionCount, nullptr);

    std::cout << extensionCount << " extensions supported\n";

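    // Initialize explicitly: recent GLM versions leave default-constructed
    // matrices and vectors uninitialized.
    glm::mat4 matrix(1.0f);
    glm::vec4 vec(1.0f);
    auto test = matrix * vec;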

    while(!glfwWindowShouldClose(window)) {
        glfwPollEvents();
    }

    glfwDestroyWindow(window);

    glfwTerminate();

    return 0;
}

Let's now configure the project to get rid of the errors. Open the project properties dialog and ensure that All Configurations is selected, because most of the settings apply to both Debug and Release mode.

Go to C++ -> General -> Additional Include Directories and press <Edit...> in the dropdown box.

Add the header directories for Vulkan, GLFW and GLM:

Next, open the editor for library directories under Linker -> General:

And add the locations of the object files for Vulkan and GLFW:

Go to Linker -> Input and press <Edit...> in the Additional Dependencies dropdown box.

Enter the names of the Vulkan and GLFW object files:

And finally change the compiler to support C++17 features:

You can now close the project properties dialog. If you did everything right then you should no longer see any more errors being highlighted in the code.

Finally, ensure that you are actually compiling in 64 bit mode:

Press F5 to compile and run the project and you should see a command prompt and a window pop up like this:

The number of extensions should be non-zero. Congratulations, you're all set for playing with Vulkan!

Linux

These instructions will be aimed at Ubuntu, Fedora and Arch Linux users, but you may be able to follow along by changing the package manager-specific commands to the ones that are appropriate for you. You should have a compiler that supports C++17 (GCC 7+ or Clang 5+). You'll also need make.

Vulkan Packages

The most important components you'll need for developing Vulkan applications on Linux are the Vulkan loader, validation layers, and a couple of command-line utilities to test whether your machine is Vulkan-capable:

sudo apt install vulkan-tools or sudo dnf install vulkan-tools: Command-line utilities, most importantly vulkaninfo and vkcube. Run these to confirm your machine supports Vulkan.
sudo apt install libvulkan-dev or sudo dnf install vulkan-loader-devel: Installs the Vulkan loader. The loader looks up the functions in the driver at runtime, similarly to GLEW for OpenGL - if you're familiar with that.
sudo apt install vulkan-validationlayers spirv-tools or sudo dnf install mesa-vulkan-devel vulkan-validation-layers-devel: Installs the standard validation layers and required SPIR-V tools. These are crucial when debugging Vulkan applications, and we'll discuss them in the upcoming chapter.

On Arch Linux, you can run sudo pacman -S vulkan-devel to install all the required tools above.

If installation was successful, you should be all set with the Vulkan portion. Remember to run vkcube and ensure you see the following pop up in a window:

X Window System and XFree86-VidModeExtension

These libraries may not be installed on your system. If they are missing, you can install them using the following commands:

sudo apt install libxxf86vm-dev or sudo dnf install libXxf86vm-devel: Provides an interface to the XFree86-VidModeExtension.
sudo apt install libxi-dev or sudo dnf install libXi-devel: Provides an X Window System client interface to the XINPUT extension.

GLFW

As mentioned before, Vulkan by itself is a platform agnostic API and does not include tools for creating a window to display the rendered results. To benefit from the cross-platform advantages of Vulkan and to avoid the horrors of X11, we'll use the GLFW library, which supports Windows, Linux and MacOS, to create a window. There are other libraries available for this purpose, like SDL, but the advantage of GLFW is that it also abstracts away some of the other platform-specific things in Vulkan besides just window creation.

We'll install GLFW with one of the following commands, depending on your distribution:

sudo apt install libglfw3-dev

sudo dnf install glfw-devel

sudo pacman -S glfw

GLM

GLM is a header-only library for linear algebra operations that can be installed from the libglm-dev, glm-devel or glm package:

sudo apt install libglm-dev

sudo dnf install glm-devel

sudo pacman -S glm

Shader Compiler

We have just about all we need, except we'll want a program to compile shaders from the human-readable GLSL to bytecode.

Two popular shader compilers are Khronos Group's glslangValidator and Google's glslc. The latter has a familiar GCC- and Clang-like usage, so we'll go with that: on Ubuntu, download Google's unofficial binaries and copy glslc to your /usr/local/bin. Note you may need to sudo depending on your permissions. On Fedora use sudo dnf install glslc, while on Arch Linux run sudo pacman -S shaderc. To test, run glslc and it should rightfully complain we didn't pass any shaders to compile:

glslc: error: no input files

We'll cover glslc in depth in the shader modules chapter.

Setting up a makefile project

Now that you have installed all of the dependencies, we can set up a basic makefile project for Vulkan and write a little bit of code to make sure that everything works.

Create a new directory at a convenient location with a name like VulkanTest. Create a source file called main.cpp and insert the following code. Don't worry about trying to understand it right now; we're just making sure that you can compile and run Vulkan applications. We'll start from scratch in the next chapter.

#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>

#define GLM_FORCE_RADIANS
#define GLM_FORCE_DEPTH_ZERO_TO_ONE
#include <glm/vec4.hpp>
#include <glm/mat4x4.hpp>

#include <iostream>

int main() {
    glfwInit();

    glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
    GLFWwindow* window = glfwCreateWindow(800, 600, "Vulkan window", nullptr, nullptr);

    uint32_t extensionCount = 0;
    vkEnumerateInstanceExtensionProperties(nullptr, &extensionCount, nullptr);

    std::cout << extensionCount << " extensions supported\n";

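    // Initialize explicitly: recent GLM versions leave default-constructed
    // matrices and vectors uninitialized.
    glm::mat4 matrix(1.0f);
    glm::vec4 vec(1.0f);
    auto test = matrix * vec;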

    while(!glfwWindowShouldClose(window)) {
        glfwPollEvents();
    }

    glfwDestroyWindow(window);

    glfwTerminate();

    return 0;
}

Next, we'll write a makefile to compile and run this basic Vulkan code. Create a new empty file called Makefile. I will assume that you already have some basic experience with makefiles, like how variables and rules work. If not, you can get up to speed very quickly with this tutorial.

We'll first define a couple of variables to simplify the remainder of the file. Define a CFLAGS variable that will specify the basic compiler flags:

CFLAGS = -std=c++17 -O2

We're going to use modern C++ (-std=c++17), and we'll set optimization level to O2. We can remove -O2 to compile programs faster, but we should remember to place it back for release builds.

Similarly, define the linker flags in a LDFLAGS variable:

LDFLAGS = -lglfw -lvulkan -ldl -lpthread -lX11 -lXxf86vm -lXrandr -lXi

The flag -lglfw is for GLFW and -lvulkan links with the Vulkan function loader. The remaining flags are low-level system libraries that GLFW itself depends on for threading and window management.

It is possible that the Xxf86vm and Xi libraries are not yet installed on your system. You can find them in the following packages:

sudo apt install libxxf86vm-dev libxi-dev

sudo dnf install libXi-devel libXxf86vm-devel

sudo pacman -S libxi libxxf86vm

Specifying the rule to compile VulkanTest is straightforward now. Make sure to use tabs for indentation instead of spaces.

VulkanTest: main.cpp
	g++ $(CFLAGS) -o VulkanTest main.cpp $(LDFLAGS)

Verify that this rule works by saving the makefile and running make in the directory with main.cpp and Makefile. This should result in a VulkanTest executable.

We'll now define two more rules, test and clean, where the former will run the executable and the latter will remove a built executable:

.PHONY: test clean

test: VulkanTest
	./VulkanTest

clean:
	rm -f VulkanTest

Running make test should show the program running successfully, and displaying the number of Vulkan extensions. The application should exit with the success return code (0) when you close the empty window. You should now have a complete makefile that resembles the following:

CFLAGS = -std=c++17 -O2
LDFLAGS = -lglfw -lvulkan -ldl -lpthread -lX11 -lXxf86vm -lXrandr -lXi

VulkanTest: main.cpp
	g++ $(CFLAGS) -o VulkanTest main.cpp $(LDFLAGS)

.PHONY: test clean

test: VulkanTest
	./VulkanTest

clean:
	rm -f VulkanTest

You can now use this directory as a template for your Vulkan projects. Make a copy, rename it to something like HelloTriangle and remove all of the code in main.cpp.

You are now all set for the real adventure.

MacOS

These instructions will assume you are using Xcode and the Homebrew package manager. Also, keep in mind that you will need at least MacOS version 10.11, and your device needs to support the Metal API.

Vulkan SDK

The SDK version for MacOS internally uses MoltenVK. There is no native support for Vulkan on MacOS, so what MoltenVK does is actually act as a layer that translates Vulkan API calls to Apple's Metal graphics framework. With this you can take advantage of debugging and performance benefits of Apple's Metal framework.

After downloading it, simply extract the contents to a folder of your choice (keep in mind you will need to reference it when creating your projects on Xcode). Inside the extracted folder, in the Applications folder you should have some executable files that will run a few demos using the SDK. Run the vkcube executable and you will see the following:

GLFW

As mentioned before, Vulkan by itself is a platform agnostic API and does not include tools for creating a window to display the rendered results. We'll use the GLFW library, which supports Windows, Linux and MacOS, to create a window. There are other libraries available for this purpose, like SDL, but the advantage of GLFW is that it also abstracts away some of the other platform-specific things in Vulkan besides just window creation.

To install GLFW on MacOS we will use the Homebrew package manager to get the glfw package:

brew install glfw

GLM

Vulkan does not include a library for linear algebra operations, so we'll have to download one. GLM is a nice library that is designed for use with graphics APIs and is also commonly used with OpenGL.

It is a header-only library that can be installed from the glm package:

brew install glm

Setting up Xcode

Now that all the dependencies are installed we can set up a basic Xcode project for Vulkan. Most of the instructions here are essentially a lot of "plumbing" so we can get all the dependencies linked to the project. Also, keep in mind that during the following instructions whenever we mention the folder vulkansdk we are referring to the folder where you extracted the Vulkan SDK.

Start Xcode and create a new Xcode project. On the window that will open select Application > Command Line Tool.

Select Next, write a name for the project and for Language select C++.

Press Next and the project should have been created. Now, let's change the code in the generated main.cpp file to the following code:

#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>

#define GLM_FORCE_RADIANS
#define GLM_FORCE_DEPTH_ZERO_TO_ONE
#include <glm/vec4.hpp>
#include <glm/mat4x4.hpp>

#include <iostream>

int main() {
    glfwInit();

    glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
    GLFWwindow* window = glfwCreateWindow(800, 600, "Vulkan window", nullptr, nullptr);

    uint32_t extensionCount = 0;
    vkEnumerateInstanceExtensionProperties(nullptr, &extensionCount, nullptr);

    std::cout << extensionCount << " extensions supported\n";

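    // Initialize explicitly: recent GLM versions leave default-constructed
    // matrices and vectors uninitialized.
    glm::mat4 matrix(1.0f);
    glm::vec4 vec(1.0f);
    auto test = matrix * vec;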

    while(!glfwWindowShouldClose(window)) {
        glfwPollEvents();
    }

    glfwDestroyWindow(window);

    glfwTerminate();

    return 0;
}

Keep in mind that you are not required to understand everything this code is doing yet; we are just setting up some API calls to make sure everything is working.

Xcode should already be showing some errors such as libraries it cannot find. We will now start configuring the project to get rid of those errors. On the Project Navigator panel select your project. Open the Build Settings tab and then:

Find the Header Search Paths field and add a link to /usr/local/include (this is where Homebrew installs headers on Intel Macs, so the glm and glfw3 header files should be there; on Apple Silicon, Homebrew uses /opt/homebrew/include instead) and a link to vulkansdk/macOS/include for the Vulkan headers.
Find the Library Search Paths field and add a link to /usr/local/lib (again, this is where Homebrew installs libraries, so the glfw3 library files should be there; on Apple Silicon, use /opt/homebrew/lib instead) and a link to vulkansdk/macOS/lib.

It should look like so (obviously, paths will be different depending on where you placed your files):

Now, in the Build Phases tab, on Link Binary With Libraries we will add both the glfw3 and the vulkan frameworks. To make things easier we will be adding the dynamic libraries in the project (you can check the documentation of these libraries if you want to use the static frameworks).

For glfw open the folder /usr/local/lib and there you will find a file with a name like libglfw.3.x.dylib ("x" is the library's version number; it might be different depending on when you downloaded the package from Homebrew). Simply drag that file to the Linked Frameworks and Libraries tab on Xcode.
For vulkan, go to vulkansdk/macOS/lib. Do the same for both files libvulkan.1.dylib and libvulkan.1.x.xx.dylib (where "x" will be the version number of the SDK you downloaded).

After adding those libraries, in the same tab on Copy Files change Destination to "Frameworks", clear the subpath and deselect "Copy only when installing". Click on the "+" sign and add all three of those frameworks here as well.

Your Xcode configuration should look like:

The last thing you need to set up is a couple of environment variables. On the Xcode toolbar go to Product > Scheme > Edit Scheme..., and in the Arguments tab add the following two environment variables:

VK_ICD_FILENAMES = vulkansdk/macOS/share/vulkan/icd.d/MoltenVK_icd.json
VK_LAYER_PATH = vulkansdk/macOS/share/vulkan/explicit_layer.d

It should look like so:

Finally, you should be all set! Now if you run the project (remembering to set the build configuration to Debug or Release depending on the configuration you chose) you should see the following:

The number of extensions should be non-zero. The other logs are from the libraries; you might get different messages from those depending on your configuration.

You are now all set for the real thing.

Drawing a triangle

Setup

Base code

General structure

In the previous chapter you've created a Vulkan project with all of the proper configuration and tested it with the sample code. In this chapter we're starting from scratch with the following code:

#include <vulkan/vulkan.h>

#include <iostream>
#include <stdexcept>
#include <cstdlib>

class HelloTriangleApplication {
public:
    void run() {
        initVulkan();
        mainLoop();
        cleanup();
    }

private:
    void initVulkan() {

    }

    void mainLoop() {

    }

    void cleanup() {

    }
};

int main() {
    HelloTriangleApplication app;

    try {
        app.run();
    } catch (const std::exception& e) {
        std::cerr << e.what() << std::endl;
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

We first include the Vulkan header from the LunarG SDK, which provides the functions, structures and enumerations. The stdexcept and iostream headers are included for reporting and propagating errors. The cstdlib header provides the EXIT_SUCCESS and EXIT_FAILURE macros.

The program itself is wrapped into a class where we'll store the Vulkan objects as private class members and add functions to initiate each of them, which will be called from the initVulkan function. Once everything has been prepared, we enter the main loop to start rendering frames. In a moment, we'll fill in the mainLoop function with a loop that iterates until the window is closed. Once the window is closed and mainLoop returns, we'll make sure to deallocate the resources we've used in the cleanup function.

If any kind of fatal error occurs during execution then we'll throw a std::runtime_error exception with a descriptive message, which will propagate back to the main function and be printed to the command prompt. To handle a variety of standard exception types as well, we catch the more general std::exception. One example of an error that we will deal with soon is finding out that a certain required extension is not supported.
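
The error path described above can be tried in isolation. The sketch below assumes two hypothetical helpers: requireExtension, which throws when a name is missing from a list, and runGuarded, which mirrors the try/catch in main by catching std::exception and turning the outcome into an exit code. The extension names are only sample strings here.

```cpp
#include <algorithm>
#include <cstdlib>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Throws std::runtime_error if `name` is not in `available`.
void requireExtension(const std::vector<std::string>& available, const std::string& name) {
    if (std::find(available.begin(), available.end(), name) == available.end()) {
        throw std::runtime_error("required extension not supported: " + name);
    }
}

// Mirrors the try/catch in main(): runs `body`, stores the text of any
// standard exception in `message` and converts the outcome to an exit code.
int runGuarded(const std::function<void()>& body, std::string& message) {
    try {
        body();
    } catch (const std::exception& e) {
        message = e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

Catching std::exception by const reference also catches std::runtime_error, since the latter derives from the former; a caller would typically print the message to std::cerr before returning the exit code.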

Roughly every chapter that follows after this one will add one new function that will be called from initVulkan and one or more new Vulkan objects to the private class members that need to be freed at the end in cleanup.

Resource management

Just like each chunk of memory allocated with malloc requires a call to free, every Vulkan object that we create needs to be explicitly destroyed when we no longer need it. In C++ it is possible to perform automatic resource management using RAII or smart pointers provided in the <memory> header. However, I've chosen to be explicit about allocation and deallocation of Vulkan objects in this tutorial. After all, Vulkan's niche is to be explicit about every operation to avoid mistakes, so it's good to be explicit about the lifetime of objects to learn how the API works.

After following this tutorial, you could implement automatic resource management by writing C++ classes that acquire Vulkan objects in their constructor and release them in their destructor, or by providing a custom deleter to either std::unique_ptr or std::shared_ptr, depending on your ownership requirements. RAII is the recommended model for larger Vulkan programs, but for learning purposes it's always good to know what's going on behind the scenes.
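As a rough illustration of the custom deleter approach mentioned above, here is a minimal sketch that does not use Vulkan at all. FakeObject, fakeCreateObject and fakeDestroyObject are hypothetical stand-ins for a handle type and its create and destroy functions, and liveObjectCount only exists so the effect is observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for a Vulkan-style API (not part of Vulkan):
// an opaque handle plus explicit create and destroy functions.
struct FakeObject_T { int id; };
using FakeObject = FakeObject_T*;

// Counts objects that have been created but not yet destroyed.
inline int& liveObjectCount() {
    static int count = 0;
    return count;
}

inline FakeObject fakeCreateObject(int id) {
    ++liveObjectCount();
    return new FakeObject_T{id};
}

inline void fakeDestroyObject(FakeObject object) {
    --liveObjectCount();
    delete object;
}

// Custom deleter: std::unique_ptr calls this when the owner goes out of scope.
struct FakeObjectDeleter {
    void operator()(FakeObject object) const { fakeDestroyObject(object); }
};

using UniqueFakeObject = std::unique_ptr<FakeObject_T, FakeObjectDeleter>;

inline UniqueFakeObject makeUniqueFakeObject(int id) {
    return UniqueFakeObject(fakeCreateObject(id));
}
```

Most real Vulkan destroy functions also take a parent object and an allocator pointer, so a deleter for them would have to store those values as well. That extra bookkeeping is one reason this tutorial sticks to explicit cleanup.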

Vulkan objects are either created directly with functions like vkCreateXXX, or allocated through another object with functions like vkAllocateXXX. After making sure that an object is no longer used anywhere, you need to destroy it with the counterparts vkDestroyXXX and vkFreeXXX. The parameters for these functions generally vary for different types of objects, but there is one parameter that they all share: pAllocator. This is an optional parameter that allows you to specify callbacks for a custom memory allocator. We will ignore this parameter in the tutorial and always pass nullptr as argument.

Integrating GLFW

Vulkan works perfectly fine without creating a window if you want to use it for off-screen rendering, but it's a lot more exciting to actually show something! First replace the #include <vulkan/vulkan.h> line with

#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>

That way GLFW will include its own definitions and automatically load the Vulkan header with it. Add an initWindow function and add a call to it from the run function before the other calls. We'll use that function to initialize GLFW and create a window.

void run() {
    initWindow();
    initVulkan();
    mainLoop();
    cleanup();
}

private:
    void initWindow() {

    }

The very first call in initWindow should be glfwInit(), which initializes the GLFW library. Because GLFW was originally designed to create an OpenGL context, we need to tell it to not create an OpenGL context with a subsequent call:

glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);

Because handling resized windows takes special care that we'll look into later, disable it for now with another window hint call:

glfwWindowHint(GLFW_RESIZABLE, GLFW_FALSE);

All that's left now is creating the actual window. Add a GLFWwindow* window; private class member to store a reference to it and initialize the window with:

window = glfwCreateWindow(800, 600, "Vulkan", nullptr, nullptr);

The first three parameters specify the width, height and title of the window. The fourth parameter allows you to optionally specify a monitor to open the window on and the last parameter is only relevant to OpenGL.

It's a good idea to use constants instead of hardcoded width and height numbers because we'll be referring to these values a couple of times in the future. I've added the following lines above the HelloTriangleApplication class definition:

const uint32_t WIDTH = 800;
const uint32_t HEIGHT = 600;

and replaced the window creation call with

window = glfwCreateWindow(WIDTH, HEIGHT, "Vulkan", nullptr, nullptr);

You should now have an initWindow function that looks like this:

void initWindow() {
    glfwInit();

    glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
    glfwWindowHint(GLFW_RESIZABLE, GLFW_FALSE);

    window = glfwCreateWindow(WIDTH, HEIGHT, "Vulkan", nullptr, nullptr);
}

To keep the application running until either an error occurs or the window is closed, we need to add an event loop to the mainLoop function as follows:

void mainLoop() {
    while (!glfwWindowShouldClose(window)) {
        glfwPollEvents();
    }
}

This code should be fairly self-explanatory. It loops and checks for events like pressing the X button until the window has been closed by the user. This is also the loop where we'll later call a function to render a single frame.

Once the window is closed, we need to clean up resources by destroying it and terminating GLFW itself. This will be our first cleanup code:

void cleanup() {
    glfwDestroyWindow(window);

    glfwTerminate();
}

When you run the program now you should see a window titled Vulkan show up until the application is terminated by closing the window. Now that we have the skeleton for the Vulkan application, let's create the first Vulkan object!

C++ code

Instance

Creating an instance

The very first thing you need to do is initialize the Vulkan library by creating an instance. The instance is the connection between your application and the Vulkan library and creating it involves specifying some details about your application to the driver.

Start by adding a createInstance function and invoking it in the initVulkan function.

void initVulkan() {
    createInstance();
}

Additionally add a data member to hold the handle to the instance:

private:
    VkInstance instance;

Now, to create an instance we'll first have to fill in a struct with some information about our application. This data is technically optional, but it may provide some useful information to the driver in order to optimize our specific application (e.g. because it uses a well-known graphics engine with certain special behavior). This struct is called VkApplicationInfo:

void createInstance() {
    VkApplicationInfo appInfo{};
    appInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    appInfo.pApplicationName = "Hello Triangle";
    appInfo.applicationVersion = VK_MAKE_VERSION(1, 0, 0);
    appInfo.pEngineName = "No Engine";
    appInfo.engineVersion = VK_MAKE_VERSION(1, 0, 0);
    appInfo.apiVersion = VK_API_VERSION_1_0;
}

As mentioned before, many structs in Vulkan require you to explicitly specify the type in the sType member. This is also one of the many structs with a pNext member that can point to extension information in the future. We're using value initialization here to leave it as nullptr.
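The version fields in the struct above are plain 32-bit integers. As a sketch of what VK_MAKE_VERSION does with its three arguments, the helpers below reproduce the same bit layout without including the Vulkan header. The names packVersion, versionMajor, versionMinor and versionPatch are my own and are not part of the API.

```cpp
#include <cassert>
#include <cstdint>

// Same bit layout as VK_MAKE_VERSION: major in bits 22 and up,
// minor in bits 12 to 21, patch in bits 0 to 11.
constexpr uint32_t packVersion(uint32_t major, uint32_t minor, uint32_t patch) {
    return (major << 22) | (minor << 12) | patch;
}

// Inverse helpers that pull the components back out of a packed value.
constexpr uint32_t versionMajor(uint32_t version) { return version >> 22; }
constexpr uint32_t versionMinor(uint32_t version) { return (version >> 12) & 0x3FFu; }
constexpr uint32_t versionPatch(uint32_t version) { return version & 0xFFFu; }
```

With this layout, packVersion(1, 0, 0) is simply 1 shifted left by 22 bits, and comparing two packed versions with < or > orders them by major, then minor, then patch.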

A lot of information in Vulkan is passed through structs instead of function parameters and we'll have to fill in one more struct to provide sufficient information for creating an instance. This next struct is not optional and tells the Vulkan driver which global extensions and validation layers we want to use. Global here means that they apply to the entire program and not a specific device, which will become clear in the next few chapters.

VkInstanceCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
createInfo.pApplicationInfo = &appInfo;

The first two members are straightforward. The next two members specify the desired global extensions. As mentioned in the overview chapter, Vulkan is a platform agnostic API, which means that you need an extension to interface with the window system. GLFW has a handy built-in function that returns the extension(s) it needs to do that, which we can pass to the struct:

uint32_t glfwExtensionCount = 0;
const char** glfwExtensions;

glfwExtensions = glfwGetRequiredInstanceExtensions(&glfwExtensionCount);

createInfo.enabledExtensionCount = glfwExtensionCount;
createInfo.ppEnabledExtensionNames = glfwExtensions;

The last two members of the struct determine the global validation layers to enable. We'll talk about these more in-depth in the next chapter, so just leave these empty for now.

createInfo.enabledLayerCount = 0;

We've now specified everything Vulkan needs to create an instance and we can finally issue the vkCreateInstance call:

VkResult result = vkCreateInstance(&createInfo, nullptr, &instance);

As you'll see, the general pattern that object creation function parameters in Vulkan follow is:

Pointer to struct with creation info
Pointer to custom allocator callbacks, always nullptr in this tutorial
Pointer to the variable that stores the handle to the new object

If everything went well then the handle to the instance was stored in the VkInstance class member. Nearly all Vulkan functions return a value of type VkResult that is either VK_SUCCESS or an error code. To check if the instance was created successfully, we don't need to store the result and can just use a check for the success value instead:

if (vkCreateInstance(&createInfo, nullptr, &instance) != VK_SUCCESS) {
    throw std::runtime_error("failed to create instance!");
}
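Since this check-and-throw pattern will come back for almost every object we create, you may eventually want to factor it out. The following is only a sketch under assumptions: FakeResult is a stand-in for VkResult with just two of its codes, and checkResult is a hypothetical helper, not something Vulkan provides.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for VkResult (hypothetical subset; the real enum has many more codes).
enum FakeResult {
    FAKE_SUCCESS = 0,
    FAKE_ERROR_EXTENSION_NOT_PRESENT = -7
};

// Throws std::runtime_error with a descriptive message unless the result is success.
inline void checkResult(FakeResult result, const std::string& action) {
    if (result != FAKE_SUCCESS) {
        throw std::runtime_error("failed to " + action +
                                 " (code " + std::to_string(static_cast<int>(result)) + ")");
    }
}
```

Including the numeric code in the message makes it easier to look up the exact error in the documentation. The tutorial itself keeps the inline if statements so that each step stays visible.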

Now run the program to make sure that the instance is created successfully.

Encountered VK_ERROR_INCOMPATIBLE_DRIVER:

If you are using macOS with the latest MoltenVK SDK, vkCreateInstance may return VK_ERROR_INCOMPATIBLE_DRIVER. According to the Getting Started Notes, beginning with the 1.3.216 Vulkan SDK, the VK_KHR_PORTABILITY_subset extension is mandatory.

To get past this error, first add the VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR bit to the flags member of the VkInstanceCreateInfo struct, then add VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME to the list of enabled instance extensions.

Typically the code would look like this:

...

std::vector<const char*> requiredExtensions;

for(uint32_t i = 0; i < glfwExtensionCount; i++) {
    requiredExtensions.emplace_back(glfwExtensions[i]);
}

requiredExtensions.emplace_back(VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME);

createInfo.flags |= VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR;

createInfo.enabledExtensionCount = (uint32_t) requiredExtensions.size();
createInfo.ppEnabledExtensionNames = requiredExtensions.data();

if (vkCreateInstance(&createInfo, nullptr, &instance) != VK_SUCCESS) {
    throw std::runtime_error("failed to create instance!");
}

Checking for extension support

If you look at the vkCreateInstance documentation then you'll see that one of the possible error codes is VK_ERROR_EXTENSION_NOT_PRESENT. We could simply specify the extensions we require and terminate if that error code comes back. That makes sense for essential extensions like the window system interface, but what if we want to check for optional functionality?

To retrieve a list of supported extensions before creating an instance, there's the vkEnumerateInstanceExtensionProperties function. It takes a pointer to a variable that stores the number of extensions and an array of VkExtensionProperties to store details of the extensions. It also takes an optional first parameter that allows us to filter extensions by a specific validation layer, which we'll ignore for now.

To allocate an array to hold the extension details we first need to know how many there are. You can request just the number of extensions by leaving the latter parameter empty:

uint32_t extensionCount = 0;
vkEnumerateInstanceExtensionProperties(nullptr, &extensionCount, nullptr);

Now allocate an array to hold the extension details (include <vector>):

std::vector<VkExtensionProperties> extensions(extensionCount);

Finally we can query the extension details:

vkEnumerateInstanceExtensionProperties(nullptr, &extensionCount, extensions.data());
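This count-then-fill sequence shows up all over Vulkan, so it may help to see it in isolation. The sketch below imitates it with a hypothetical fakeEnumerate function and a FakeProperties struct that only models a name. Neither is a real Vulkan call or type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for VkExtensionProperties (hypothetical; only the name is modeled).
struct FakeProperties { char name[64]; };

// Stand-in for a Vulkan-style enumeration function: when `out` is null it only
// reports how many entries exist, otherwise it fills at most *count entries.
inline void fakeEnumerate(uint32_t* count, FakeProperties* out) {
    static const char* const kNames[] = {"FAKE_surface", "FAKE_debug_utils", "FAKE_portability"};
    const uint32_t available = 3;
    if (out == nullptr) {
        *count = available;
        return;
    }
    if (*count > available) *count = available;
    for (uint32_t i = 0; i < *count; ++i) {
        std::strncpy(out[i].name, kNames[i], sizeof(out[i].name) - 1);
        out[i].name[sizeof(out[i].name) - 1] = '\0';
    }
}

inline std::vector<FakeProperties> enumerateAll() {
    uint32_t count = 0;
    fakeEnumerate(&count, nullptr);        // first call: query only the count
    std::vector<FakeProperties> items(count);
    fakeEnumerate(&count, items.data());   // second call: fill the array
    items.resize(count);                   // keep only the entries that were written
    return items;
}
```

The real functions additionally return a VkResult, which this sketch leaves out to keep the pattern itself in focus.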

Each VkExtensionProperties struct contains the name and version of an extension. We can list them with a simple for loop (\t is a tab for indentation):

std::cout << "available extensions:\n";

for (const auto& extension : extensions) {
    std::cout << '\t' << extension.extensionName << '\n';
}

You can add this code to the createInstance function if you'd like to provide some details about the Vulkan support. As a challenge, try to create a function that checks if all of the extensions returned by glfwGetRequiredInstanceExtensions are included in the supported extensions list.
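If you'd like a starting point for that challenge, the comparison itself can be written without any Vulkan types. This is one possible sketch, not the only way to do it: findMissing is a hypothetical helper that works on plain C strings and returns every required name that is absent from the available list.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Returns the required names that do not appear in the available list.
// An empty result means every required name is supported.
inline std::vector<const char*> findMissing(const std::vector<const char*>& required,
                                            const std::vector<const char*>& available) {
    std::vector<const char*> missing;
    for (const char* name : required) {
        bool found = false;
        for (const char* candidate : available) {
            if (std::strcmp(name, candidate) == 0) {
                found = true;
                break;
            }
        }
        if (!found) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

To use it in createInstance, you would build the available list from the extensionName member of each VkExtensionProperties entry and the required list from the array returned by glfwGetRequiredInstanceExtensions. Returning the missing names rather than a bool makes for a more helpful error message.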

Cleaning up

The VkInstance should only be destroyed right before the program exits. It can be destroyed in cleanup with the vkDestroyInstance function:

void cleanup() {
    vkDestroyInstance(instance, nullptr);

    glfwDestroyWindow(window);

    glfwTerminate();
}

The parameters for the vkDestroyInstance function are straightforward. As mentioned in the previous chapter, the allocation and deallocation functions in Vulkan have an optional allocator callback that we'll ignore by passing nullptr to it. All of the other Vulkan resources that we'll create in the following chapters should be cleaned up before the instance is destroyed.

Before continuing with the more complex steps after instance creation, it's time to evaluate our debugging options by checking out validation layers.

C++ code

Validation layers

What are validation layers?

The Vulkan API is designed around the idea of minimal driver overhead and one of the manifestations of that goal is that there is very limited error checking in the API by default. Even mistakes as simple as setting enumerations to incorrect values or passing null pointers to required parameters are generally not explicitly handled and will simply result in crashes or undefined behavior. Because Vulkan requires you to be very explicit about everything you're doing, it's easy to make many small mistakes like using a new GPU feature and forgetting to request it at logical device creation time.

However, that doesn't mean that these checks can't be added to the API. Vulkan introduces an elegant system for this known as validation layers. Validation layers are optional components that hook into Vulkan function calls to apply additional operations. Common operations in validation layers are:

Checking the values of parameters against the specification to detect misuse
Tracking creation and destruction of objects to find resource leaks
Checking thread safety by tracking the threads that calls originate from
Logging every call and its parameters to the standard output
Tracing Vulkan calls for profiling and replaying

Here's an example of what the implementation of a function in a diagnostics validation layer could look like:

VkResult vkCreateInstance(
    const VkInstanceCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkInstance* instance) {

    if (pCreateInfo == nullptr || instance == nullptr) {
        log("Null pointer passed to required parameter!");
        return VK_ERROR_INITIALIZATION_FAILED;
    }

    return real_vkCreateInstance(pCreateInfo, pAllocator, instance);
}

These validation layers can be freely stacked to include all the debugging functionality that you're interested in. You can simply enable validation layers for debug builds and completely disable them for release builds, which gives you the best of both worlds!
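To get a feel for what stacking means, here is a toy model of the idea. It is only an analogy built on std::function: the real loader uses its own dispatch mechanism, and realCreate, nullCheckLayer and callLogLayer are hypothetical names invented for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using CreateFn = std::function<int(const char* name)>;

// Plays the role of the driver entry point (hypothetical): returns 0 on success.
inline int realCreate(const char*) { return 0; }

// A layer takes the next function in the chain and returns a wrapped version of it.
inline CreateFn nullCheckLayer(CreateFn next, std::vector<std::string>& log) {
    return [next, &log](const char* name) {
        if (name == nullptr) {
            log.push_back("null name rejected");
            return -1;                 // stop here, the driver is never reached
        }
        return next(name);
    };
}

inline CreateFn callLogLayer(CreateFn next, std::vector<std::string>& log) {
    return [next, &log](const char* name) {
        log.push_back(std::string("create(") + (name ? name : "null") + ")");
        return next(name);             // always forward to the next layer
    };
}
```

Composing them as callLogLayer(nullCheckLayer(realCreate, log), log) gives a single callable in which every call is logged first, then checked, and only then handed to the driver function. Leaving out a layer just means leaving out one wrapper.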

Vulkan does not come with any validation layers built-in, but the LunarG Vulkan SDK provides a nice set of layers that check for common errors. They're also completely open source, so you can check which kind of mistakes they check for and contribute. Using the validation layers is the best way to avoid your application breaking on different drivers by accidentally relying on undefined behavior.

Validation layers can only be used if they have been installed onto the system. For example, the LunarG validation layers are only available on PCs with the Vulkan SDK installed.

There were formerly two different types of validation layers in Vulkan: instance and device specific. The idea was that instance layers would only check calls related to global Vulkan objects like instances, and device specific layers would only check calls related to a specific GPU. Device specific layers have now been deprecated, which means that instance validation layers apply to all Vulkan calls. The specification document still recommends that you enable validation layers at device level as well for compatibility, which is required by some implementations. We'll simply specify the same layers as the instance at logical device level, which we'll see later on.

Using validation layers

In this section we'll see how to enable the standard diagnostics layers provided by the Vulkan SDK. Just like extensions, validation layers need to be enabled by specifying their name. All of the useful standard validation is bundled into a layer included in the SDK that is known as VK_LAYER_KHRONOS_validation.

Let's first add two configuration variables to the program to specify the layers to enable and whether to enable them or not. I've chosen to base that value on whether the program is being compiled in debug mode or not. The NDEBUG macro is part of the C++ standard and means "not debug".

const uint32_t WIDTH = 800;
const uint32_t HEIGHT = 600;

const std::vector<const char*> validationLayers = {
    "VK_LAYER_KHRONOS_validation"
};

#ifdef NDEBUG
    const bool enableValidationLayers = false;
#else
    const bool enableValidationLayers = true;
#endif

We'll add a new function checkValidationLayerSupport that checks if all of the requested layers are available. First list all of the available layers using the vkEnumerateInstanceLayerProperties function. Its usage is identical to that of vkEnumerateInstanceExtensionProperties which was discussed in the instance creation chapter.

bool checkValidationLayerSupport() {
    uint32_t layerCount = 0;
    vkEnumerateInstanceLayerProperties(&layerCount, nullptr);

    std::vector<VkLayerProperties> availableLayers(layerCount);
    vkEnumerateInstanceLayerProperties(&layerCount, availableLayers.data());

    return false;
}

Next, check if all of the layers in validationLayers exist in the availableLayers list. You may need to include <cstring> for strcmp.

for (const char* layerName : validationLayers) {
    bool layerFound = false;

    for (const auto& layerProperties : availableLayers) {
        if (strcmp(layerName, layerProperties.layerName) == 0) {
            layerFound = true;
            break;
        }
    }

    if (!layerFound) {
        return false;
    }
}

return true;

We can now use this function in createInstance:

void createInstance() {
    if (enableValidationLayers && !checkValidationLayerSupport()) {
        throw std::runtime_error("validation layers requested, but not available!");
    }

    ...
}

Now run the program in debug mode and ensure that the error does not occur. If it does, then have a look at the FAQ.

Finally, modify the VkInstanceCreateInfo struct instantiation to include the validation layer names if they are enabled:

if (enableValidationLayers) {
    createInfo.enabledLayerCount = static_cast<uint32_t>(validationLayers.size());
    createInfo.ppEnabledLayerNames = validationLayers.data();
} else {
    createInfo.enabledLayerCount = 0;
}

If the check was successful then vkCreateInstance should not ever return a VK_ERROR_LAYER_NOT_PRESENT error, but you should run the program to make sure.

Message callback

The validation layers will print debug messages to the standard output by default, but we can also handle them ourselves by providing an explicit callback in our program. This will also allow you to decide which kind of messages you would like to see, because not all are necessarily (fatal) errors. If you don't want to do that right now then you may skip to the last section in this chapter.

To set up a callback in the program to handle messages and the associated details, we have to set up a debug messenger with a callback using the VK_EXT_debug_utils extension.

We'll first create a getRequiredExtensions function that will return the required list of extensions based on whether validation layers are enabled or not:

std::vector<const char*> getRequiredExtensions() {
    uint32_t glfwExtensionCount = 0;
    const char** glfwExtensions;
    glfwExtensions = glfwGetRequiredInstanceExtensions(&glfwExtensionCount);

    std::vector<const char*> extensions(glfwExtensions, glfwExtensions + glfwExtensionCount);

    if (enableValidationLayers) {
        extensions.push_back(VK_EXT_DEBUG_UTILS_EXTENSION_NAME);
    }

    return extensions;
}

The extensions specified by GLFW are always required, but the debug messenger extension is conditionally added. Note that I've used the VK_EXT_DEBUG_UTILS_EXTENSION_NAME macro here which is equal to the literal string "VK_EXT_debug_utils". Using this macro lets you avoid typos.

We can now use this function in createInstance:

auto extensions = getRequiredExtensions();
createInfo.enabledExtensionCount = static_cast<uint32_t>(extensions.size());
createInfo.ppEnabledExtensionNames = extensions.data();

Run the program to make sure you don't receive a VK_ERROR_EXTENSION_NOT_PRESENT error. We don't really need to check for the existence of this extension, because it should be implied by the availability of the validation layers.

Now let's see what a debug callback function looks like. Add a new static member function called debugCallback with the PFN_vkDebugUtilsMessengerCallbackEXT prototype. The VKAPI_ATTR and VKAPI_CALL ensure that the function has the right signature for Vulkan to call it.

static VKAPI_ATTR VkBool32 VKAPI_CALL debugCallback(
    VkDebugUtilsMessageSeverityFlagBitsEXT messageSeverity,
    VkDebugUtilsMessageTypeFlagsEXT messageType,
    const VkDebugUtilsMessengerCallbackDataEXT* pCallbackData,
    void* pUserData) {

    std::cerr << "validation layer: " << pCallbackData->pMessage << std::endl;

    return VK_FALSE;
}

The first parameter specifies the severity of the message, which is one of the following flags:

VK_DEBUG_UTILS_MESSAGE_SEVERITY_VERBOSE_BIT_EXT: Diagnostic message
VK_DEBUG_UTILS_MESSAGE_SEVERITY_INFO_BIT_EXT: Informational message like the creation of a resource
VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT: Message about behavior that is not necessarily an error, but very likely a bug in your application
VK_DEBUG_UTILS_MESSAGE_SEVERITY_ERROR_BIT_EXT: Message about behavior that is invalid and may cause crashes

The values of this enumeration are set up in such a way that you can use a comparison operation to check if a message is equal or worse compared to some level of severity, for example:

if (messageSeverity >= VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT) {
    // Message is important enough to show
}
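The comparison works because each severity level is a single bit and more serious levels use higher bits. The sketch below uses local copies of the four bit values so that it compiles without the Vulkan header. The Severity enum and the two helper functions are my own names and are assumed to mirror VkDebugUtilsMessageSeverityFlagBitsEXT.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Local copies of the severity bits (assumed to match the Vulkan header values).
enum Severity : uint32_t {
    SEVERITY_VERBOSE = 0x00000001,
    SEVERITY_INFO    = 0x00000010,
    SEVERITY_WARNING = 0x00000100,
    SEVERITY_ERROR   = 0x00001000
};

// True when the message is at least as severe as the chosen minimum.
inline bool shouldShow(Severity severity, Severity minimum) {
    return severity >= minimum;
}

// Maps a severity to a short label, checking from most to least severe.
inline std::string severityLabel(Severity severity) {
    if (severity >= SEVERITY_ERROR)   return "error";
    if (severity >= SEVERITY_WARNING) return "warning";
    if (severity >= SEVERITY_INFO)    return "info";
    return "verbose";
}
```

Inside debugCallback you could use a check like shouldShow to skip printing anything below a warning, or prefix each message with its label.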

The messageType parameter can have the following values:

VK_DEBUG_UTILS_MESSAGE_TYPE_GENERAL_BIT_EXT: Some event has happened that is unrelated to the specification or performance
VK_DEBUG_UTILS_MESSAGE_TYPE_VALIDATION_BIT_EXT: Something has happened that violates the specification or indicates a possible mistake
VK_DEBUG_UTILS_MESSAGE_TYPE_PERFORMANCE_BIT_EXT: Potential non-optimal use of Vulkan

The pCallbackData parameter refers to a VkDebugUtilsMessengerCallbackDataEXT struct containing the details of the message itself, with the most important members being:

pMessage: The debug message as a null-terminated string
pObjects: Array of Vulkan object handles related to the message
objectCount: Number of objects in array

Finally, the pUserData parameter contains a pointer that was specified during the setup of the callback and allows you to pass your own data to it.

The callback returns a boolean that indicates if the Vulkan call that triggered the validation layer message should be aborted. If the callback returns true, then the call is aborted with the VK_ERROR_VALIDATION_FAILED_EXT error. This is normally only used to test the validation layers themselves, so you should always return VK_FALSE.

All that remains now is telling Vulkan about the callback function. Perhaps somewhat surprisingly, even the debug callback in Vulkan is managed with a handle that needs to be explicitly created and destroyed. Such a callback is part of a debug messenger and you can have as many of them as you want. Add a class member for this handle right under instance:

VkDebugUtilsMessengerEXT debugMessenger;

Now add a function setupDebugMessenger to be called from initVulkan right after createInstance:

void initVulkan() {
    createInstance();
    setupDebugMessenger();
}

void setupDebugMessenger() {
    if (!enableValidationLayers) return;

}

We'll need to fill in a structure with details about the messenger and its callback:

VkDebugUtilsMessengerCreateInfoEXT createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_MESSENGER_CREATE_INFO_EXT;
createInfo.messageSeverity = VK_DEBUG_UTILS_MESSAGE_SEVERITY_VERBOSE_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_SEVERITY_ERROR_BIT_EXT;
createInfo.messageType = VK_DEBUG_UTILS_MESSAGE_TYPE_GENERAL_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_TYPE_VALIDATION_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_TYPE_PERFORMANCE_BIT_EXT;
createInfo.pfnUserCallback = debugCallback;
createInfo.pUserData = nullptr; // Optional

The messageSeverity field allows you to specify all the types of severities you would like your callback to be called for. I've specified all types except for VK_DEBUG_UTILS_MESSAGE_SEVERITY_INFO_BIT_EXT here to receive notifications about possible problems while leaving out verbose general debug info.

Similarly the messageType field lets you filter which types of messages your callback is notified about. I've simply enabled all types here. You can always disable some if they're not useful to you.

Finally, the pfnUserCallback field specifies the pointer to the callback function. You can optionally pass a pointer to the pUserData field which will be passed along to the callback function via the pUserData parameter. You could use this to pass a pointer to the HelloTriangleApplication class, for example.
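Passing an object pointer through pUserData is a common way to get from a static callback back into an instance. The following sketch shows the mechanism with hypothetical types only: MessageCollector plays the role of the application class and FakeMessenger plays the role of the library that stores the two pointers and calls back later.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical application class that collects messages per instance.
class MessageCollector {
public:
    // Static function with a C-style signature, similar in spirit to debugCallback.
    static bool callback(const char* message, void* pUserData) {
        // Recover the object that was registered alongside the callback.
        auto* self = static_cast<MessageCollector*>(pUserData);
        self->messages.push_back(message);
        return false; // mirrors returning VK_FALSE: do not abort the call
    }

    std::vector<std::string> messages;
};

// Stand-in for the library side: it only knows a function pointer and a void pointer.
struct FakeMessenger {
    bool (*pfnUserCallback)(const char*, void*);
    void* pUserData;

    void emit(const char* message) const { pfnUserCallback(message, pUserData); }
};
```

The object behind pUserData has to outlive the messenger, since the library will keep using the raw pointer for as long as the callback is registered.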

Note that there are many more ways to configure validation layer messages and debug callbacks, but this is a good setup to get started with for this tutorial. See the extension specification for more info about the possibilities.

This struct should be passed to the vkCreateDebugUtilsMessengerEXT function to create the VkDebugUtilsMessengerEXT object. Unfortunately, because this function is an extension function, it is not automatically loaded. We have to look up its address ourselves using vkGetInstanceProcAddr. We're going to create our own proxy function that handles this in the background. I've added it right above the HelloTriangleApplication class definition.

VkResult CreateDebugUtilsMessengerEXT(VkInstance instance, const VkDebugUtilsMessengerCreateInfoEXT* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkDebugUtilsMessengerEXT* pDebugMessenger) {
    auto func = (PFN_vkCreateDebugUtilsMessengerEXT) vkGetInstanceProcAddr(instance, "vkCreateDebugUtilsMessengerEXT");
    if (func != nullptr) {
        return func(instance, pCreateInfo, pAllocator, pDebugMessenger);
    } else {
        return VK_ERROR_EXTENSION_NOT_PRESENT;
    }
}

The vkGetInstanceProcAddr function will return nullptr if the function couldn't be loaded. We can now call this function to create the extension object if it's available:

if (CreateDebugUtilsMessengerEXT(instance, &createInfo, nullptr, &debugMessenger) != VK_SUCCESS) {
    throw std::runtime_error("failed to set up debug messenger!");
}
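The lookup-and-cast step in the proxy function can look a bit magical at first. Here is the same idea reduced to standard C++, under the assumption that a by-name lookup returns a generic function pointer. fakeGetProcAddr, fakeCreateMessenger and callCreateMessenger are all hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstring>

// Generic function pointer type, playing the role of PFN_vkVoidFunction.
using VoidFunction = void (*)();

// The function we want to reach by name (hypothetical).
inline int fakeCreateMessenger(int value) { return value + 1; }

// Stand-in for vkGetInstanceProcAddr: returns nullptr for unknown names.
inline VoidFunction fakeGetProcAddr(const char* name) {
    if (std::strcmp(name, "fakeCreateMessenger") == 0) {
        return reinterpret_cast<VoidFunction>(&fakeCreateMessenger);
    }
    return nullptr;
}

// Proxy: look the function up, cast it back to its real type, call it if present.
inline bool callCreateMessenger(const char* name, int input, int* output) {
    using CreateFn = int (*)(int);
    auto func = reinterpret_cast<CreateFn>(fakeGetProcAddr(name));
    if (func == nullptr) {
        return false;   // comparable to returning VK_ERROR_EXTENSION_NOT_PRESENT
    }
    *output = func(input);
    return true;
}
```

The cast is only safe because the caller knows the true signature of the function behind the name, which is exactly what the PFN_ typedefs in the Vulkan header encode.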

The second to last parameter is again the optional allocator callback that we set to nullptr. Other than that, the parameters are fairly straightforward. Since the debug messenger is specific to our Vulkan instance and its layers, the instance needs to be explicitly specified as the first argument. You will also see this pattern with other child objects later on.

The VkDebugUtilsMessengerEXT object also needs to be cleaned up with a call to vkDestroyDebugUtilsMessengerEXT. Similarly to vkCreateDebugUtilsMessengerEXT the function needs to be explicitly loaded.

Create another proxy function right below CreateDebugUtilsMessengerEXT:

void DestroyDebugUtilsMessengerEXT(VkInstance instance, VkDebugUtilsMessengerEXT debugMessenger, const VkAllocationCallbacks* pAllocator) {
    auto func = (PFN_vkDestroyDebugUtilsMessengerEXT) vkGetInstanceProcAddr(instance, "vkDestroyDebugUtilsMessengerEXT");
    if (func != nullptr) {
        func(instance, debugMessenger, pAllocator);
    }
}

Make sure that this function is either a static class function or a function outside the class. We can then call it in the cleanup function:

void cleanup() {
    if (enableValidationLayers) {
        DestroyDebugUtilsMessengerEXT(instance, debugMessenger, nullptr);
    }

    vkDestroyInstance(instance, nullptr);

    glfwDestroyWindow(window);

    glfwTerminate();
}

Debugging instance creation and destruction

Although we've now added debugging with validation layers to the program we're not covering everything quite yet. The vkCreateDebugUtilsMessengerEXT call requires a valid instance to have been created and vkDestroyDebugUtilsMessengerEXT must be called before the instance is destroyed. This currently leaves us unable to debug any issues in the vkCreateInstance and vkDestroyInstance calls.

However, if you closely read the extension documentation, you'll see that there is a way to create a separate debug utils messenger specifically for those two function calls. It requires you to simply pass a pointer to a VkDebugUtilsMessengerCreateInfoEXT struct in the pNext extension field of VkInstanceCreateInfo. First extract population of the messenger create info into a separate function:

void populateDebugMessengerCreateInfo(VkDebugUtilsMessengerCreateInfoEXT& createInfo) {
    createInfo = {};
    createInfo.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_MESSENGER_CREATE_INFO_EXT;
    createInfo.messageSeverity = VK_DEBUG_UTILS_MESSAGE_SEVERITY_VERBOSE_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_SEVERITY_WARNING_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_SEVERITY_ERROR_BIT_EXT;
    createInfo.messageType = VK_DEBUG_UTILS_MESSAGE_TYPE_GENERAL_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_TYPE_VALIDATION_BIT_EXT | VK_DEBUG_UTILS_MESSAGE_TYPE_PERFORMANCE_BIT_EXT;
    createInfo.pfnUserCallback = debugCallback;
}

...

void setupDebugMessenger() {
    if (!enableValidationLayers) return;

    VkDebugUtilsMessengerCreateInfoEXT createInfo;
    populateDebugMessengerCreateInfo(createInfo);

    if (CreateDebugUtilsMessengerEXT(instance, &createInfo, nullptr, &debugMessenger) != VK_SUCCESS) {
        throw std::runtime_error("failed to set up debug messenger!");
    }
}

We can now re-use this in the createInstance function:

void createInstance() {
    ...

    VkInstanceCreateInfo createInfo{};
    createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    createInfo.pApplicationInfo = &appInfo;

    ...

    VkDebugUtilsMessengerCreateInfoEXT debugCreateInfo{};
    if (enableValidationLayers) {
        createInfo.enabledLayerCount = static_cast<uint32_t>(validationLayers.size());
        createInfo.ppEnabledLayerNames = validationLayers.data();

        populateDebugMessengerCreateInfo(debugCreateInfo);
        createInfo.pNext = (VkDebugUtilsMessengerCreateInfoEXT*) &debugCreateInfo;
    } else {
        createInfo.enabledLayerCount = 0;

        createInfo.pNext = nullptr;
    }

    if (vkCreateInstance(&createInfo, nullptr, &instance) != VK_SUCCESS) {
        throw std::runtime_error("failed to create instance!");
    }
}

The debugCreateInfo variable is placed outside the if statement to ensure that it is not destroyed before the vkCreateInstance call. By creating an additional debug messenger this way it will automatically be used during vkCreateInstance and vkDestroyInstance and cleaned up after that.
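If you're wondering how the receiving side makes sense of pNext, the sketch below models the idea as a linked list of tagged structs. It is a simplified model and not the real struct layout: the Fake types are hypothetical, and I've grouped the type tag and the pointer into a FakeHeader member, whereas the real Vulkan structs declare sType and pNext directly.

```cpp
#include <cassert>

// Hypothetical type tags, standing in for VkStructureType values.
enum FakeStructureType {
    FAKE_TYPE_INSTANCE_INFO = 1,
    FAKE_TYPE_DEBUG_INFO = 2,
    FAKE_TYPE_OTHER = 3
};

// Every chainable struct starts with this header.
struct FakeHeader {
    FakeStructureType sType;
    const void* pNext;
};

struct FakeDebugInfo {
    FakeHeader header;
    int severityMask;
};

struct FakeInstanceInfo {
    FakeHeader header;
};

// Walks the chain and returns the first struct with the requested tag, or nullptr.
inline const void* findInChain(const void* head, FakeStructureType wanted) {
    const void* current = head;
    while (current != nullptr) {
        // The header is the first member, so a pointer to the struct also points to it.
        const auto* header = static_cast<const FakeHeader*>(current);
        if (header->sType == wanted) {
            return current;
        }
        current = header->pNext;
    }
    return nullptr;
}
```

This also shows why the lifetime matters: the chain only stores raw pointers, so every struct in it must still be alive when the chain is walked.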

Testing

Now let's intentionally make a mistake to see the validation layers in action. Temporarily remove the call to DestroyDebugUtilsMessengerEXT in the cleanup function and run your program. Once it exits you should see a validation layer message along the lines of the debug messenger object not having been destroyed before the instance.

If you don't see any messages then check your installation.

If you want to see which call triggered a message, you can add a breakpoint to the message callback and look at the stack trace.

Configuration

There are a lot more settings for the behavior of validation layers than just the flags specified in the VkDebugUtilsMessengerCreateInfoEXT struct. Browse to the Vulkan SDK and go to the Config directory. There you will find a vk_layer_settings.txt file that explains how to configure the layers.

To configure the layer settings for your own application, copy the file to the Debug and Release directories of your project and follow the instructions to set the desired behavior. However, for the remainder of this tutorial I'll assume that you're using the default settings.

Throughout this tutorial I'll be making a couple of intentional mistakes to show you how helpful the validation layers are at catching them and to teach you how important it is to know exactly what you're doing with Vulkan. Now it's time to look at Vulkan devices in the system.

Physical devices and queue families

Selecting a physical device

After initializing the Vulkan library through a VkInstance we need to look for and select a graphics card in the system that supports the features we need. In fact we can select any number of graphics cards and use them simultaneously, but in this tutorial we'll stick to the first graphics card that suits our needs.

We'll add a function pickPhysicalDevice and add a call to it in the initVulkan function.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    pickPhysicalDevice();
}

void pickPhysicalDevice() {

}

The graphics card that we'll end up selecting will be stored in a VkPhysicalDevice handle that is added as a new class member. This object will be implicitly destroyed when the VkInstance is destroyed, so we won't need to do anything new in the cleanup function.

VkPhysicalDevice physicalDevice = VK_NULL_HANDLE;

Listing the graphics cards is very similar to listing extensions and starts with querying just the number.

uint32_t deviceCount = 0;
vkEnumeratePhysicalDevices(instance, &deviceCount, nullptr);

If there are 0 devices with Vulkan support then there is no point going further.

if (deviceCount == 0) {
    throw std::runtime_error("failed to find GPUs with Vulkan support!");
}

Otherwise we can now allocate an array to hold all of the VkPhysicalDevice handles.

std::vector<VkPhysicalDevice> devices(deviceCount);
vkEnumeratePhysicalDevices(instance, &deviceCount, devices.data());

Now we need to evaluate each of them and check if they are suitable for the operations we want to perform, because not all graphics cards are created equal. For that we'll introduce a new function:

bool isDeviceSuitable(VkPhysicalDevice device) {
    return true;
}

And we'll check if any of the physical devices meet the requirements that we'll add to that function.

for (const auto& device : devices) {
    if (isDeviceSuitable(device)) {
        physicalDevice = device;
        break;
    }
}

if (physicalDevice == VK_NULL_HANDLE) {
    throw std::runtime_error("failed to find a suitable GPU!");
}

The next section will introduce the first requirements that we'll check for in the isDeviceSuitable function. As we'll start using more Vulkan features in the later chapters we will also extend this function to include more checks.

Base device suitability checks

To evaluate the suitability of a device we can start by querying for some details. Basic device properties like the name, type and supported Vulkan version can be queried using vkGetPhysicalDeviceProperties.

VkPhysicalDeviceProperties deviceProperties;
vkGetPhysicalDeviceProperties(device, &deviceProperties);

Support for optional features like texture compression, 64-bit floats and multi-viewport rendering (useful for VR) can be queried using vkGetPhysicalDeviceFeatures:

VkPhysicalDeviceFeatures deviceFeatures;
vkGetPhysicalDeviceFeatures(device, &deviceFeatures);

There are more details that can be queried from devices that we'll discuss later concerning device memory and queue families (see the next section).

As an example, let's say we consider our application only usable for dedicated graphics cards that support geometry shaders. Then the isDeviceSuitable function would look like this:

bool isDeviceSuitable(VkPhysicalDevice device) {
    VkPhysicalDeviceProperties deviceProperties;
    VkPhysicalDeviceFeatures deviceFeatures;
    vkGetPhysicalDeviceProperties(device, &deviceProperties);
    vkGetPhysicalDeviceFeatures(device, &deviceFeatures);

    return deviceProperties.deviceType == VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU &&
           deviceFeatures.geometryShader;
}

Instead of just checking if a device is suitable or not and going with the first one, you could also give each device a score and pick the highest one. That way you could favor a dedicated graphics card by giving it a higher score, but fall back to an integrated GPU if that's the only available one. You could implement something like that as follows:

#include <map>

...

void pickPhysicalDevice() {
    ...

    // Use an ordered map to automatically sort candidates by increasing score
    std::multimap<int, VkPhysicalDevice> candidates;

    for (const auto& device : devices) {
        int score = rateDeviceSuitability(device);
        candidates.insert(std::make_pair(score, device));
    }

    // Check if the best candidate is suitable at all
    if (candidates.rbegin()->first > 0) {
        physicalDevice = candidates.rbegin()->second;
    } else {
        throw std::runtime_error("failed to find a suitable GPU!");
    }
}

int rateDeviceSuitability(VkPhysicalDevice device) {
    ...

    int score = 0;

    // Discrete GPUs have a significant performance advantage
    if (deviceProperties.deviceType == VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU) {
        score += 1000;
    }

    // Maximum possible size of textures affects graphics quality
    score += deviceProperties.limits.maxImageDimension2D;

    // Application can't function without geometry shaders
    if (!deviceFeatures.geometryShader) {
        return 0;
    }

    return score;
}

You don't need to implement all that for this tutorial, but it should give you an idea of how you could design your device selection process. Of course you can also just display the names of the choices and allow the user to select.
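
If you want to experiment with the scoring idea before wiring it into Vulkan, a mock version is enough. The following is a minimal sketch under stated assumptions: FakeDevice is a hypothetical stand-in for the values you would read from vkGetPhysicalDeviceProperties and vkGetPhysicalDeviceFeatures, and rateDevice, pickBest and scoringExample are names made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the queried properties and features; not a Vulkan type.
struct FakeDevice {
    std::string name;
    bool discrete;
    std::uint32_t maxImageDimension2D;
    bool geometryShader;
};

// Same rules as rateDeviceSuitability in the text, applied to the mock.
int rateDevice(const FakeDevice& device) {
    if (!device.geometryShader) {
        return 0;  // hard requirement, so the device is unusable
    }
    int score = 0;
    if (device.discrete) {
        score += 1000;
    }
    score += static_cast<int>(device.maxImageDimension2D);
    return score;
}

// Returns the name of the highest-scoring device, or nothing if none qualifies.
std::optional<std::string> pickBest(const std::vector<FakeDevice>& devices) {
    std::multimap<int, std::string> candidates;
    for (const auto& device : devices) {
        candidates.emplace(rateDevice(device), device.name);
    }
    if (candidates.empty() || candidates.rbegin()->first <= 0) {
        return std::nullopt;
    }
    return candidates.rbegin()->second;
}

// Small self-check with made-up devices.
void scoringExample() {
    std::vector<FakeDevice> devices = {
        {"integrated", false, 16384, true},   // score 16384
        {"discrete", true, 16384, true},      // score 17384
        {"no-geometry", true, 32768, false},  // score 0
    };
    assert(pickBest(devices).value() == "discrete");

    std::vector<FakeDevice> unusable = {{"no-geometry", true, 32768, false}};
    assert(!pickBest(unusable).has_value());
}
```

One thing the mock makes visible is that the weights matter: maxImageDimension2D is usually in the thousands, so a bonus of 1000 for a discrete GPU can be outweighed by an integrated GPU that reports a larger limit. Treat the numbers as something to tune for your application.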

Because we're just starting out, Vulkan support is the only thing we need and therefore we'll settle for just any GPU:

bool isDeviceSuitable(VkPhysicalDevice device) {
    return true;
}

In the next section we'll discuss the first real required feature to check for.

Queue families

As briefly mentioned before, almost every operation in Vulkan, anything from drawing to uploading textures, requires commands to be submitted to a queue. There are different types of queues that originate from different queue families and each family of queues allows only a subset of commands. For example, there could be a queue family that only allows processing of compute commands or one that only allows memory transfer related commands.

We need to check which queue families are supported by the device and which one of these supports the commands that we want to use. For that purpose we'll add a new function findQueueFamilies that looks for all the queue families we need.

Right now we are only going to look for a queue that supports graphics commands, so the function could look like this:

uint32_t findQueueFamilies(VkPhysicalDevice device) {
    // Logic to find graphics queue family
}

However, in one of the next chapters we're already going to look for yet another queue, so it's better to prepare for that and bundle the indices into a struct:

struct QueueFamilyIndices {
    uint32_t graphicsFamily;
};

QueueFamilyIndices findQueueFamilies(VkPhysicalDevice device) {
    QueueFamilyIndices indices;
    // Logic to find queue family indices to populate struct with
    return indices;
}

But what if a queue family is not available? We could throw an exception in findQueueFamilies, but this function is not really the right place to make decisions about device suitability. For example, we may prefer devices with a dedicated transfer queue family, but not require it. Therefore we need some way of indicating whether a particular queue family was found.

It's not really possible to use a magic value to indicate the nonexistence of a queue family, since any value of uint32_t could in theory be a valid queue family index including 0. Luckily C++17 introduced a data structure to distinguish between the case of a value existing or not:

#include <optional>

...

std::optional<uint32_t> graphicsFamily;

std::cout << std::boolalpha << graphicsFamily.has_value() << std::endl; // false

graphicsFamily = 0;

std::cout << std::boolalpha << graphicsFamily.has_value() << std::endl; // true

std::optional is a wrapper that contains no value until you assign something to it. At any point you can query if it contains a value or not by calling its has_value() member function. That means that we can change the logic to:

#include <optional>

...

struct QueueFamilyIndices {
    std::optional<uint32_t> graphicsFamily;
};

QueueFamilyIndices findQueueFamilies(VkPhysicalDevice device) {
    QueueFamilyIndices indices;
    // Assign index to queue families that could be found
    return indices;
}

We can now begin to actually implement findQueueFamilies:

QueueFamilyIndices findQueueFamilies(VkPhysicalDevice device) {
    QueueFamilyIndices indices;

    ...

    return indices;
}

The process of retrieving the list of queue families is exactly what you expect and uses vkGetPhysicalDeviceQueueFamilyProperties:

uint32_t queueFamilyCount = 0;
vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, nullptr);

std::vector<VkQueueFamilyProperties> queueFamilies(queueFamilyCount);
vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, queueFamilies.data());

The VkQueueFamilyProperties struct contains some details about the queue family, including the type of operations that are supported and the number of queues that can be created based on that family. We need to find at least one queue family that supports VK_QUEUE_GRAPHICS_BIT.

int i = 0;
for (const auto& queueFamily : queueFamilies) {
    if (queueFamily.queueFlags & VK_QUEUE_GRAPHICS_BIT) {
        indices.graphicsFamily = i;
    }

    i++;
}

Now that we have this fancy queue family lookup function, we can use it as a check in the isDeviceSuitable function to ensure that the device can process the commands we want to use:

bool isDeviceSuitable(VkPhysicalDevice device) {
    QueueFamilyIndices indices = findQueueFamilies(device);

    return indices.graphicsFamily.has_value();
}

To make this a little bit more convenient, we'll also add a generic check to the struct itself:

struct QueueFamilyIndices {
    std::optional<uint32_t> graphicsFamily;

    bool isComplete() {
        return graphicsFamily.has_value();
    }
};

...

bool isDeviceSuitable(VkPhysicalDevice device) {
    QueueFamilyIndices indices = findQueueFamilies(device);

    return indices.isComplete();
}

We can now also use this for an early exit from findQueueFamilies:

for (const auto& queueFamily : queueFamilies) {
    ...

    if (indices.isComplete()) {
        break;
    }

    i++;
}
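
Put together, the lookup with the early exit can be tried on plain data. This is a sketch rather than the tutorial's actual function: it takes a vector of flag words in place of the VkQueueFamilyProperties list, and kGraphicsBit, kComputeBit and kTransferBit are local constants that I'm assuming mirror the values of VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and VK_QUEUE_TRANSFER_BIT.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Local stand-ins for the Vulkan queue flag bits (assumed to match their values).
constexpr std::uint32_t kGraphicsBit = 0x1;
constexpr std::uint32_t kComputeBit = 0x2;
constexpr std::uint32_t kTransferBit = 0x4;

struct QueueFamilyIndices {
    std::optional<std::uint32_t> graphicsFamily;

    bool isComplete() const {
        return graphicsFamily.has_value();
    }
};

// queueFlags[i] plays the role of queueFamilies[i].queueFlags.
QueueFamilyIndices findQueueFamilies(const std::vector<std::uint32_t>& queueFlags) {
    QueueFamilyIndices indices;

    std::uint32_t i = 0;
    for (std::uint32_t flags : queueFlags) {
        if (flags & kGraphicsBit) {
            indices.graphicsFamily = i;
        }

        if (indices.isComplete()) {
            break;  // stop at the first family that satisfies everything
        }

        i++;
    }

    return indices;
}

// Small self-check with made-up family layouts.
void queueFamilyExample() {
    // Families 0 and 1 lack graphics support, family 2 is the first with it.
    QueueFamilyIndices found = findQueueFamilies(
        {kTransferBit, kComputeBit | kTransferBit, kGraphicsBit | kComputeBit, kGraphicsBit});
    assert(found.graphicsFamily.value() == 2);

    // Index 0 is a perfectly valid answer, which is why a magic value won't do.
    assert(findQueueFamilies({kGraphicsBit}).graphicsFamily.value() == 0);

    // No graphics-capable family at all.
    assert(!findQueueFamilies({kComputeBit, kTransferBit}).isComplete());
}
```

Note that the early exit changes which family is picked when several qualify: without the break the loop keeps overwriting the index and ends up with the last match, with it the first match wins.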

Great, that's all we need for now to find the right physical device! The next step is to create a logical device to interface with it.

Logical device and queues

Introduction

After selecting a physical device to use we need to set up a logical device to interface with it. The logical device creation process is similar to the instance creation process and describes the features we want to use. We also need to specify which queues to create now that we've queried which queue families are available. You can even create multiple logical devices from the same physical device if you have varying requirements.

Start by adding a new class member to store the logical device handle in.

VkDevice device;

Next, add a createLogicalDevice function that is called from initVulkan.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    pickPhysicalDevice();
    createLogicalDevice();
}

void createLogicalDevice() {

}

Specifying the queues to be created

The creation of a logical device involves specifying a bunch of details in structs again, of which the first one will be VkDeviceQueueCreateInfo. This structure describes the number of queues we want for a single queue family. Right now we're only interested in a queue with graphics capabilities.

QueueFamilyIndices indices = findQueueFamilies(physicalDevice);

VkDeviceQueueCreateInfo queueCreateInfo{};
queueCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queueCreateInfo.queueFamilyIndex = indices.graphicsFamily.value();
queueCreateInfo.queueCount = 1;

The currently available drivers will only allow you to create a small number of queues for each queue family and you don't really need more than one. That's because you can create all of the command buffers on multiple threads and then submit them all at once on the main thread with a single low-overhead call.

Vulkan lets you assign priorities to queues to influence the scheduling of command buffer execution using floating point numbers between 0.0 and 1.0. This is required even if there is only a single queue:

float queuePriority = 1.0f;
queueCreateInfo.pQueuePriorities = &queuePriority;

Specifying used device features

The next information to specify is the set of device features that we'll be using. These are the features that we queried support for with vkGetPhysicalDeviceFeatures in the previous chapter, like geometry shaders. Right now we don't need anything special, so we can simply define it and leave everything to VK_FALSE. We'll come back to this structure once we're about to start doing more interesting things with Vulkan.

VkPhysicalDeviceFeatures deviceFeatures{};

Creating the logical device

With the previous two structures in place, we can start filling in the main VkDeviceCreateInfo structure.

VkDeviceCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;

First add pointers to the queue creation info and device features structs:

createInfo.pQueueCreateInfos = &queueCreateInfo;
createInfo.queueCreateInfoCount = 1;

createInfo.pEnabledFeatures = &deviceFeatures;

The remainder of the information bears a resemblance to the VkInstanceCreateInfo struct and requires you to specify extensions and validation layers. The difference is that these are device specific this time.

An example of a device specific extension is VK_KHR_swapchain, which allows you to present rendered images from that device to windows. It is possible that there are Vulkan devices in the system that lack this ability, for example because they only support compute operations. We will come back to this extension in the swap chain chapter.

Previous implementations of Vulkan made a distinction between instance and device specific validation layers, but this is no longer the case. That means that the enabledLayerCount and ppEnabledLayerNames fields of VkDeviceCreateInfo are ignored by up-to-date implementations. However, it is still a good idea to set them anyway to be compatible with older implementations:

createInfo.enabledExtensionCount = 0;

if (enableValidationLayers) {
    createInfo.enabledLayerCount = static_cast<uint32_t>(validationLayers.size());
    createInfo.ppEnabledLayerNames = validationLayers.data();
} else {
    createInfo.enabledLayerCount = 0;
}

We won't need any device specific extensions for now.

That's it, we're now ready to instantiate the logical device with a call to the appropriately named vkCreateDevice function.

if (vkCreateDevice(physicalDevice, &createInfo, nullptr, &device) != VK_SUCCESS) {
    throw std::runtime_error("failed to create logical device!");
}

The parameters are the physical device to interface with, the queue and usage info we just specified, the optional allocation callbacks pointer and a pointer to a variable to store the logical device handle in. Similarly to the instance creation function, this call can return errors based on enabling non-existent extensions or specifying the desired usage of unsupported features.

The device should be destroyed in cleanup with the vkDestroyDevice function:

void cleanup() {
    vkDestroyDevice(device, nullptr);
    ...
}

Logical devices don't interact directly with instances, which is why the instance is not included as a parameter.

Retrieving queue handles

The queues are automatically created along with the logical device, but we don't have a handle to interface with them yet. First add a class member to store a handle to the graphics queue:

VkQueue graphicsQueue;

Device queues are implicitly cleaned up when the device is destroyed, so we don't need to do anything in cleanup.

We can use the vkGetDeviceQueue function to retrieve queue handles for each queue family. The parameters are the logical device, queue family, queue index and a pointer to the variable to store the queue handle in. Because we're only creating a single queue from this family, we'll simply use index 0.

vkGetDeviceQueue(device, indices.graphicsFamily.value(), 0, &graphicsQueue);

With the logical device and queue handles we can now actually start using the graphics card to do things! In the next few chapters we'll set up the resources to present results to the window system.

Presentation

Window surface

Since Vulkan is a platform agnostic API, it can not interface directly with the window system on its own. To establish the connection between Vulkan and the window system to present results to the screen, we need to use the WSI (Window System Integration) extensions. In this chapter we'll discuss the first one, which is VK_KHR_surface. It exposes a VkSurfaceKHR object that represents an abstract type of surface to present rendered images to. The surface in our program will be backed by the window that we've already opened with GLFW.

The VK_KHR_surface extension is an instance level extension and we've actually already enabled it, because it's included in the list returned by glfwGetRequiredInstanceExtensions. The list also includes some other WSI extensions that we'll use in the next couple of chapters.

The window surface needs to be created right after the instance creation, because it can actually influence the physical device selection. We postponed this because window surfaces are part of the larger topic of render targets and presentation, and explaining them earlier would have cluttered the basic setup. Note also that window surfaces are an entirely optional component in Vulkan: if you only need off-screen rendering, you can do without them. Vulkan allows you to do that without hacks like creating an invisible window (necessary for OpenGL).

Window surface creation

Start by adding a surface class member right below the debug messenger member.

VkSurfaceKHR surface;

Although the VkSurfaceKHR object and its usage is platform agnostic, its creation isn't because it depends on window system details. For example, it needs the HWND and HMODULE handles on Windows. Therefore there is a platform-specific addition to the extension, which on Windows is called VK_KHR_win32_surface and is also automatically included in the list from glfwGetRequiredInstanceExtensions.

I will demonstrate how this platform specific extension can be used to create a surface on Windows, but we won't actually use it in this tutorial. It doesn't make any sense to use a library like GLFW and then proceed to use platform-specific code anyway. GLFW actually has glfwCreateWindowSurface that handles the platform differences for us. Still, it's good to see what it does behind the scenes before we start relying on it.

To access native platform functions, you need to update the includes at the top:

#define VK_USE_PLATFORM_WIN32_KHR
#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>
#define GLFW_EXPOSE_NATIVE_WIN32
#include <GLFW/glfw3native.h>

Because a window surface is a Vulkan object, it comes with a VkWin32SurfaceCreateInfoKHR struct that needs to be filled in. It has two important parameters: hwnd and hinstance. These are the handles to the window and the process.

VkWin32SurfaceCreateInfoKHR createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_WIN32_SURFACE_CREATE_INFO_KHR;
createInfo.hwnd = glfwGetWin32Window(window);
createInfo.hinstance = GetModuleHandle(nullptr);

The glfwGetWin32Window function is used to get the raw HWND from the GLFW window object. The GetModuleHandle call returns the HINSTANCE handle of the current process.

After that the surface can be created with vkCreateWin32SurfaceKHR, which includes a parameter for the instance, surface creation details, custom allocators and the variable for the surface handle to be stored in. Technically this is a WSI extension function, but it is so commonly used that the standard Vulkan loader includes it, so unlike other extensions you don't need to explicitly load it.

if (vkCreateWin32SurfaceKHR(instance, &createInfo, nullptr, &surface) != VK_SUCCESS) {
    throw std::runtime_error("failed to create window surface!");
}

The process is similar for other platforms. On Linux with X11, for example, vkCreateXcbSurfaceKHR takes an XCB connection and window as creation details.

The glfwCreateWindowSurface function performs exactly this operation with a different implementation for each platform. We'll now integrate it into our program. Add a function createSurface to be called from initVulkan right after instance creation and setupDebugMessenger.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
}

void createSurface() {

}

The GLFW call takes simple parameters instead of a struct which makes the implementation of the function very straightforward:

void createSurface() {
    if (glfwCreateWindowSurface(instance, window, nullptr, &surface) != VK_SUCCESS) {
        throw std::runtime_error("failed to create window surface!");
    }
}

The parameters are the VkInstance, GLFW window pointer, custom allocators and pointer to VkSurfaceKHR variable. It simply passes through the VkResult from the relevant platform call. GLFW doesn't offer a special function for destroying a surface, but that can easily be done through the original API:

void cleanup() {
    ...
    vkDestroySurfaceKHR(instance, surface, nullptr);
    vkDestroyInstance(instance, nullptr);
    ...
}

Make sure that the surface is destroyed before the instance.

Querying for presentation support

Although the Vulkan implementation may support window system integration, that does not mean that every device in the system supports it. Therefore we need to extend isDeviceSuitable to ensure that a device can present images to the surface we created. Since the presentation is a queue-specific feature, the problem is actually about finding a queue family that supports presenting to the surface we created.

It's actually possible that the queue families supporting drawing commands and the ones supporting presentation do not overlap. Therefore we have to take into account that there could be a distinct presentation queue by modifying the QueueFamilyIndices structure:

struct QueueFamilyIndices {
    std::optional<uint32_t> graphicsFamily;
    std::optional<uint32_t> presentFamily;

    bool isComplete() {
        return graphicsFamily.has_value() && presentFamily.has_value();
    }
};

Next, we'll modify the findQueueFamilies function to look for a queue family that has the capability of presenting to our window surface. The function to check for that is vkGetPhysicalDeviceSurfaceSupportKHR, which takes the physical device, queue family index and surface as parameters. Add a call to it in the same loop as the VK_QUEUE_GRAPHICS_BIT:

VkBool32 presentSupport = false;
vkGetPhysicalDeviceSurfaceSupportKHR(device, i, surface, &presentSupport);

Then simply check the value of the boolean and store the presentation family queue index:

if (presentSupport) {
    indices.presentFamily = i;
}

Note that it's very likely that these end up being the same queue family after all, but throughout the program we will treat them as if they were separate queues for a uniform approach. Nevertheless, you could add logic to explicitly prefer a physical device that supports drawing and presentation in the same queue for improved performance.
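
One possible shape for that preference, sketched on mock data: look for a single family that can do both jobs and only fall back to separate families when none exists. FamilyCaps, usesSharedFamily and findQueueFamiliesPreferShared are hypothetical names for this illustration. In real code the two booleans would come from the VK_QUEUE_GRAPHICS_BIT test and from vkGetPhysicalDeviceSurfaceSupportKHR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-family summary of the two capabilities we care about.
struct FamilyCaps {
    bool graphics;
    bool present;
};

struct QueueFamilyIndices {
    std::optional<std::uint32_t> graphicsFamily;
    std::optional<std::uint32_t> presentFamily;

    bool isComplete() const {
        return graphicsFamily.has_value() && presentFamily.has_value();
    }

    // True when one family serves both roles.
    bool usesSharedFamily() const {
        return isComplete() && *graphicsFamily == *presentFamily;
    }
};

QueueFamilyIndices findQueueFamiliesPreferShared(const std::vector<FamilyCaps>& families) {
    QueueFamilyIndices indices;
    for (std::uint32_t i = 0; i < families.size(); ++i) {
        const FamilyCaps& caps = families[i];
        if (caps.graphics && caps.present) {
            // A family that can do both: use it for both roles and stop.
            indices.graphicsFamily = i;
            indices.presentFamily = i;
            return indices;
        }
        // Otherwise remember the first family seen for each role as a fallback.
        if (caps.graphics && !indices.graphicsFamily.has_value()) {
            indices.graphicsFamily = i;
        }
        if (caps.present && !indices.presentFamily.has_value()) {
            indices.presentFamily = i;
        }
    }
    return indices;
}

// Small self-check with made-up family layouts.
void preferSharedExample() {
    std::vector<FamilyCaps> mixed = {{true, false}, {false, true}, {true, true}};
    QueueFamilyIndices shared = findQueueFamiliesPreferShared(mixed);
    assert(shared.usesSharedFamily());
    assert(shared.graphicsFamily.value() == 2);

    std::vector<FamilyCaps> split = {{true, false}, {false, true}};
    QueueFamilyIndices separate = findQueueFamiliesPreferShared(split);
    assert(separate.isComplete());
    assert(!separate.usesSharedFamily());
}
```

A device-level preference could then be built on top of this, for example by giving devices where usesSharedFamily() is true a higher score in a rating function.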

Creating the presentation queue

The one thing that remains is modifying the logical device creation procedure to create the presentation queue and retrieve the VkQueue handle. Add a member variable for the handle:

VkQueue presentQueue;

Next, we need to have multiple VkDeviceQueueCreateInfo structs to create a queue from both families. An elegant way to do that is to create a set of all unique queue families that are necessary for the required queues:

#include <set>

...

QueueFamilyIndices indices = findQueueFamilies(physicalDevice);

std::vector<VkDeviceQueueCreateInfo> queueCreateInfos;
std::set<uint32_t> uniqueQueueFamilies = {indices.graphicsFamily.value(), indices.presentFamily.value()};

float queuePriority = 1.0f;
for (uint32_t queueFamily : uniqueQueueFamilies) {
    VkDeviceQueueCreateInfo queueCreateInfo{};
    queueCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
    queueCreateInfo.queueFamilyIndex = queueFamily;
    queueCreateInfo.queueCount = 1;
    queueCreateInfo.pQueuePriorities = &queuePriority;
    queueCreateInfos.push_back(queueCreateInfo);
}

And modify VkDeviceCreateInfo to point to the vector:

createInfo.queueCreateInfoCount = static_cast<uint32_t>(queueCreateInfos.size());
createInfo.pQueueCreateInfos = queueCreateInfos.data();
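
The effect of the set is easy to verify in isolation. In this sketch QueueCreateInfo is a hypothetical, cut-down stand-in for VkDeviceQueueCreateInfo and buildQueueCreateInfos is a helper name invented for the example. The logic is the same loop as above.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for VkDeviceQueueCreateInfo with only the fields used here.
struct QueueCreateInfo {
    std::uint32_t queueFamilyIndex;
    std::uint32_t queueCount;
    const float* pQueuePriorities;
};

// Builds one create info per distinct family index.
// The priority is only pointed to, so it must outlive the returned structs.
std::vector<QueueCreateInfo> buildQueueCreateInfos(std::uint32_t graphicsFamily,
                                                   std::uint32_t presentFamily,
                                                   const float* priority) {
    assert(priority != nullptr);

    std::set<std::uint32_t> uniqueQueueFamilies = {graphicsFamily, presentFamily};

    std::vector<QueueCreateInfo> queueCreateInfos;
    for (std::uint32_t queueFamily : uniqueQueueFamilies) {
        queueCreateInfos.push_back({queueFamily, 1u, priority});
    }
    return queueCreateInfos;
}
```

With identical indices the function yields a single entry, with different indices it yields two, ordered by family index because std::set iterates in ascending order.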

If the queue families are the same, then we only need to pass its index once. Finally, add a call to retrieve the queue handle:

vkGetDeviceQueue(device, indices.presentFamily.value(), 0, &presentQueue);

In case the queue families are the same, the two handles will most likely have the same value now. In the next chapter we're going to look at swap chains and how they give us the ability to present images to the surface.

Swap chain

Vulkan does not have the concept of a "default framebuffer", hence it requires an infrastructure that will own the buffers we will render to before we visualize them on the screen. This infrastructure is known as the swap chain and must be created explicitly in Vulkan. The swap chain is essentially a queue of images that are waiting to be presented to the screen. Our application will acquire such an image to draw to it, and then return it to the queue. How exactly the queue works and the conditions for presenting an image from the queue depend on how the swap chain is set up, but the general purpose of the swap chain is to synchronize the presentation of images with the refresh rate of the screen.

Checking for swap chain support

Not all graphics cards are capable of presenting images directly to a screen, for example because they are designed for servers and don't have any display outputs. In addition, since image presentation is heavily tied to the window system and the surfaces associated with windows, it is not actually part of the Vulkan core. You have to enable the VK_KHR_swapchain device extension after querying for its support.

For that purpose we'll first extend the isDeviceSuitable function to check if this extension is supported. We've previously seen how to list the extensions that are supported by a VkPhysicalDevice, so doing that should be fairly straightforward. Note that the Vulkan header file provides a nice macro VK_KHR_SWAPCHAIN_EXTENSION_NAME that is defined as the string "VK_KHR_swapchain". The advantage of using this macro is that the compiler will catch misspellings.

First declare a list of required device extensions, similar to the list of validation layers to enable.

const std::vector<const char*> deviceExtensions = {
    VK_KHR_SWAPCHAIN_EXTENSION_NAME
};

Next, create a new function checkDeviceExtensionSupport that is called from isDeviceSuitable as an additional check:

bool isDeviceSuitable(VkPhysicalDevice device) {
    QueueFamilyIndices indices = findQueueFamilies(device);

    bool extensionsSupported = checkDeviceExtensionSupport(device);

    return indices.isComplete() && extensionsSupported;
}

bool checkDeviceExtensionSupport(VkPhysicalDevice device) {
    return true;
}

Modify the body of the function to enumerate the extensions and check if all of the required extensions are amongst them.

bool checkDeviceExtensionSupport(VkPhysicalDevice device) {
    uint32_t extensionCount;
    vkEnumerateDeviceExtensionProperties(device, nullptr, &extensionCount, nullptr);

    std::vector<VkExtensionProperties> availableExtensions(extensionCount);
    vkEnumerateDeviceExtensionProperties(device, nullptr, &extensionCount, availableExtensions.data());

    std::set<std::string> requiredExtensions(deviceExtensions.begin(), deviceExtensions.end());

    for (const auto& extension : availableExtensions) {
        requiredExtensions.erase(extension.extensionName);
    }

    return requiredExtensions.empty();
}

I've chosen to use a set of strings here to represent the unconfirmed required extensions. That way we can easily tick them off while enumerating the sequence of available extensions. Of course you can also use a nested loop like in checkValidationLayerSupport. The performance difference is irrelevant.

Now run the code and verify that your graphics card is indeed capable of creating a swap chain. It should be noted that the availability of a presentation queue, as we checked in the previous chapter, implies that the swap chain extension must be supported. However, it's still good to be explicit about things, and the extension does have to be explicitly enabled.
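
The tick-off approach can also be tried without a GPU. The sketch below is a small variation, not the tutorial's function: missingExtensions is a hypothetical helper that takes the available names as plain strings (instead of VkExtensionProperties) and returns whatever is left in the set, which can be handy if you want to log which extension was missing.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the required names that do not appear in `available`.
// An empty result means every required extension is supported.
std::set<std::string> missingExtensions(const std::vector<const char*>& required,
                                        const std::vector<std::string>& available) {
    std::set<std::string> missing(required.begin(), required.end());

    for (const auto& name : available) {
        missing.erase(name);  // erasing a name that isn't in the set is a no-op
    }

    return missing;
}

// Small self-check with example extension lists.
void extensionCheckExample() {
    const std::vector<const char*> required = {"VK_KHR_swapchain"};

    const std::vector<std::string> withSwapchain = {"VK_KHR_maintenance1", "VK_KHR_swapchain"};
    assert(missingExtensions(required, withSwapchain).empty());

    const std::vector<std::string> withoutSwapchain = {"VK_KHR_maintenance1"};
    assert(missingExtensions(required, withoutSwapchain).count("VK_KHR_swapchain") == 1);
}
```

checkDeviceExtensionSupport would then simply return whether the result is empty.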

Enabling device extensions

Using a swapchain requires enabling the VK_KHR_swapchain extension first. Enabling the extension just requires a small change to the logical device creation structure:

createInfo.enabledExtensionCount = static_cast<uint32_t>(deviceExtensions.size());
createInfo.ppEnabledExtensionNames = deviceExtensions.data();

Make sure to replace the existing line createInfo.enabledExtensionCount = 0; when you do so.

Querying details of swap chain support

Just checking if a swap chain is available is not sufficient, because it may not actually be compatible with our window surface. Creating a swap chain also involves a lot more settings than instance and device creation, so we need to query for some more details before we're able to proceed.

There are basically three kinds of properties we need to check:

Basic surface capabilities (min/max number of images in swap chain, min/max width and height of images)
Surface formats (pixel format, color space)
Available presentation modes

Similar to findQueueFamilies, we'll use a struct to pass these details around once they've been queried. The three aforementioned types of properties come in the form of the following structs and lists of structs:

struct SwapChainSupportDetails {
    VkSurfaceCapabilitiesKHR capabilities;
    std::vector<VkSurfaceFormatKHR> formats;
    std::vector<VkPresentModeKHR> presentModes;
};

We'll now create a new function querySwapChainSupport that will populate this struct.

SwapChainSupportDetails querySwapChainSupport(VkPhysicalDevice device) {
    SwapChainSupportDetails details;

    return details;
}

This section covers how to query the structs that include this information. The meaning of these structs and exactly which data they contain is discussed in the next section.

Let's start with the basic surface capabilities. These properties are simple to query and are returned into a single VkSurfaceCapabilitiesKHR struct.

vkGetPhysicalDeviceSurfaceCapabilitiesKHR(device, surface, &details.capabilities);

This function takes the specified VkPhysicalDevice and VkSurfaceKHR window surface into account when determining the supported capabilities. All of the support querying functions have these two as first parameters because they are the core components of the swap chain.

The next step is to query the supported surface formats. Because this is a list of structs, it follows the familiar ritual of two function calls:

uint32_t formatCount;
vkGetPhysicalDeviceSurfaceFormatsKHR(device, surface, &formatCount, nullptr);

if (formatCount != 0) {
    details.formats.resize(formatCount);
    vkGetPhysicalDeviceSurfaceFormatsKHR(device, surface, &formatCount, details.formats.data());
}

Make sure that the vector is resized to hold all the available formats. And finally, querying the supported presentation modes works exactly the same way with vkGetPhysicalDeviceSurfacePresentModesKHR:

uint32_t presentModeCount;
vkGetPhysicalDeviceSurfacePresentModesKHR(device, surface, &presentModeCount, nullptr);

if (presentModeCount != 0) {
    details.presentModes.resize(presentModeCount);
    vkGetPhysicalDeviceSurfacePresentModesKHR(device, surface, &presentModeCount, details.presentModes.data());
}
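The count-then-fill ritual used for formats and presentation modes can be wrapped in a small helper if you find yourself repeating it. The following is only a minimal sketch and not part of the tutorial's code: enumerateTwoCall and fakeQuery are hypothetical names, and fakeQuery merely stands in for a real query such as vkGetPhysicalDeviceSurfaceFormatsKHR so that the example compiles without the Vulkan headers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: runs the "ask for the count, then ask for the data"
// sequence for any callable shaped like query(uint32_t* count, T* data).
template <typename T, typename Query>
std::vector<T> enumerateTwoCall(Query query) {
    uint32_t count = 0;
    query(&count, nullptr);          // first call: only the count is written

    std::vector<T> items(count);
    if (count != 0) {
        query(&count, items.data()); // second call: fills the array
        assert(count <= items.size());
        items.resize(count);         // keep only what was actually written
    }
    return items;
}

// Stand-in for a Vulkan query (assumption, for illustration only): it
// reports three integers the same way the real enumeration functions
// report structs.
inline void fakeQuery(uint32_t* count, int* data) {
    static const int values[] = {10, 20, 30};
    if (data == nullptr) {
        *count = 3;
        return;
    }
    const uint32_t n = *count < 3 ? *count : 3;
    for (uint32_t i = 0; i < n; i++) {
        data[i] = values[i];
    }
    *count = n;
}
```

Calling enumerateTwoCall<int>(fakeQuery) returns a vector holding 10, 20 and 30. With the real API you would presumably pass a lambda that captures device and surface and forwards the two pointers to the Vulkan function.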

All of the details are in the struct now, so let's extend isDeviceSuitable once more to utilize this function to verify that swap chain support is adequate. Swap chain support is sufficient for this tutorial if there is at least one supported image format and one supported presentation mode given the window surface we have.

bool swapChainAdequate = false;
if (extensionsSupported) {
    SwapChainSupportDetails swapChainSupport = querySwapChainSupport(device);
    swapChainAdequate = !swapChainSupport.formats.empty() && !swapChainSupport.presentModes.empty();
}

It is important that we only try to query for swap chain support after verifying that the extension is available. The last line of the function changes to:

return indices.isComplete() && extensionsSupported && swapChainAdequate;

Choosing the right settings for the swap chain

If the swapChainAdequate conditions were met then the support is definitely sufficient, but there may still be many different modes of varying optimality. We'll now write a couple of functions to find the right settings for the best possible swap chain. There are three types of settings to determine:

Surface format (color depth)
Presentation mode (conditions for "swapping" images to the screen)
Swap extent (resolution of images in swap chain)

For each of these settings we'll have an ideal value in mind that we'll go with if it's available and otherwise we'll create some logic to find the next best thing.

Surface format

The function for this setting starts out like this. We'll later pass the formats member of the SwapChainSupportDetails struct as argument.

VkSurfaceFormatKHR chooseSwapSurfaceFormat(const std::vector<VkSurfaceFormatKHR>& availableFormats) {

}

Each VkSurfaceFormatKHR entry contains a format and a colorSpace member. The format member specifies the color channels and types. For example, VK_FORMAT_B8G8R8A8_SRGB means that we store the B, G, R and alpha channels in that order, each as an 8-bit unsigned integer, for a total of 32 bits per pixel. The colorSpace member indicates if the SRGB color space is supported or not using the VK_COLOR_SPACE_SRGB_NONLINEAR_KHR flag. Note that this flag used to be called VK_COLORSPACE_SRGB_NONLINEAR_KHR in old versions of the specification.

For the color space we'll use SRGB if it is available, because it results in more accurate perceived colors. It is also pretty much the standard color space for images, like the textures we'll use later on. Because of that we should also use an SRGB color format, of which one of the most common ones is VK_FORMAT_B8G8R8A8_SRGB.

Let's go through the list and see if the preferred combination is available:

for (const auto& availableFormat : availableFormats) {
    if (availableFormat.format == VK_FORMAT_B8G8R8A8_SRGB && availableFormat.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR) {
        return availableFormat;
    }
}

If that fails, we could start ranking the available formats based on how "good" they are, but in most cases it's okay to just settle for the first format that is specified.

VkSurfaceFormatKHR chooseSwapSurfaceFormat(const std::vector<VkSurfaceFormatKHR>& availableFormats) {
    for (const auto& availableFormat : availableFormats) {
        if (availableFormat.format == VK_FORMAT_B8G8R8A8_SRGB && availableFormat.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR) {
            return availableFormat;
        }
    }

    return availableFormats[0];
}
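If you would rather rank the formats than settle for the first entry, one possible approach is to give each candidate a score. The sketch below uses stand-in types (Format, ColorSpace, SurfaceFormat) instead of the real Vulkan ones so that it compiles on its own. The names scoreFormat and pickBestFormat, and the weights themselves, are arbitrary assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for VkFormat, VkColorSpaceKHR and VkSurfaceFormatKHR
// (illustration only, not the real Vulkan types).
enum class Format { B8G8R8A8_SRGB, R8G8B8A8_SRGB, B8G8R8A8_UNORM, Other };
enum class ColorSpace { SrgbNonlinear, Other };

struct SurfaceFormat {
    Format format;
    ColorSpace colorSpace;
};

// Arbitrary example weights: the preferred combination scores highest.
inline int scoreFormat(const SurfaceFormat& f) {
    int score = 0;
    if (f.colorSpace == ColorSpace::SrgbNonlinear) score += 10;
    if (f.format == Format::B8G8R8A8_SRGB) score += 5;
    else if (f.format == Format::R8G8B8A8_SRGB) score += 4;
    else if (f.format == Format::B8G8R8A8_UNORM) score += 1;
    return score;
}

// Returns the highest-scoring format; ties keep the earliest entry.
inline SurfaceFormat pickBestFormat(const std::vector<SurfaceFormat>& available) {
    assert(!available.empty()); // ensured by the swapChainAdequate check
    std::size_t best = 0;
    for (std::size_t i = 1; i < available.size(); i++) {
        if (scoreFormat(available[i]) > scoreFormat(available[best])) {
            best = i;
        }
    }
    return available[best];
}
```

Because ties keep the earliest entry, a list in which nothing matches still yields the first format, which is the same fallback as in chooseSwapSurfaceFormat.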

Presentation mode

The presentation mode is arguably the most important setting for the swap chain, because it represents the actual conditions for showing images to the screen. There are four possible modes available in Vulkan:

VK_PRESENT_MODE_IMMEDIATE_KHR: Images submitted by your application are transferred to the screen right away, which may result in tearing.
VK_PRESENT_MODE_FIFO_KHR: The swap chain is a queue where the display takes an image from the front of the queue when the display is refreshed and the program inserts rendered images at the back of the queue. If the queue is full then the program has to wait. This is most similar to vertical sync as found in modern games. The moment that the display is refreshed is known as "vertical blank".
VK_PRESENT_MODE_FIFO_RELAXED_KHR: This mode only differs from the previous one if the application is late and the queue was empty at the last vertical blank. Instead of waiting for the next vertical blank, the image is transferred right away when it finally arrives. This may result in visible tearing.
VK_PRESENT_MODE_MAILBOX_KHR: This is another variation of the second mode. Instead of blocking the application when the queue is full, the images that are already queued are simply replaced with the newer ones. This mode can be used to render frames as fast as possible while still avoiding tearing, resulting in fewer latency issues than standard vertical sync. This is commonly known as "triple buffering", although the existence of three buffers alone does not necessarily mean that the framerate is unlocked.

Only the VK_PRESENT_MODE_FIFO_KHR mode is guaranteed to be available, so we'll again have to write a function that looks for the best mode that is available:

VkPresentModeKHR chooseSwapPresentMode(const std::vector<VkPresentModeKHR>& availablePresentModes) {
    return VK_PRESENT_MODE_FIFO_KHR;
}

I personally think that VK_PRESENT_MODE_MAILBOX_KHR is a very nice trade-off if energy usage is not a concern. It allows us to avoid tearing while still maintaining a fairly low latency by rendering new images that are as up-to-date as possible right until the vertical blank. On mobile devices, where energy usage is more important, you will probably want to use VK_PRESENT_MODE_FIFO_KHR instead. Now, let's look through the list to see if VK_PRESENT_MODE_MAILBOX_KHR is available:

VkPresentModeKHR chooseSwapPresentMode(const std::vector<VkPresentModeKHR>& availablePresentModes) {
    for (const auto& availablePresentMode : availablePresentModes) {
        if (availablePresentMode == VK_PRESENT_MODE_MAILBOX_KHR) {
            return availablePresentMode;
        }
    }

    return VK_PRESENT_MODE_FIFO_KHR;
}
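The same search can be generalized to an ordered list of preferences, which may be handy if you want a different policy on mobile devices. This is a sketch under assumptions rather than the tutorial's implementation: PresentMode is a stand-in for VkPresentModeKHR, and choosePresentMode with its preferred parameter is a hypothetical variation of chooseSwapPresentMode.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for VkPresentModeKHR (illustration only).
enum class PresentMode { Immediate, Mailbox, Fifo, FifoRelaxed };

// Walks the caller's preference list, most preferred first, and falls back
// to Fifo, the only mode that is guaranteed to be available.
inline PresentMode choosePresentMode(const std::vector<PresentMode>& available,
                                     const std::vector<PresentMode>& preferred) {
    assert(!available.empty()); // ensured by the swapChainAdequate check
    for (PresentMode wanted : preferred) {
        if (std::find(available.begin(), available.end(), wanted) != available.end()) {
            return wanted;
        }
    }
    return PresentMode::Fifo;
}
```

Passing a preference list that contains only Mailbox reproduces the behavior of the function above, while an empty list always selects Fifo.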

Swap extent

That leaves only one major property, for which we'll add one last function:

VkExtent2D chooseSwapExtent(const VkSurfaceCapabilitiesKHR& capabilities) {

}

The swap extent is the resolution of the swap chain images and it's almost always exactly equal to the resolution of the window that we're drawing to in pixels (more on that in a moment). The range of the possible resolutions is defined in the VkSurfaceCapabilitiesKHR structure. Vulkan tells us to match the resolution of the window by setting the width and height in the currentExtent member. However, some window managers do allow us to differ here and this is indicated by setting the width and height in currentExtent to a special value: the maximum value of uint32_t. In that case we'll pick the resolution that best matches the window within the minImageExtent and maxImageExtent bounds. But we must specify the resolution in the correct unit.

GLFW uses two units when measuring sizes: pixels and screen coordinates. For example, the resolution {WIDTH, HEIGHT} that we specified earlier when creating the window is measured in screen coordinates. But Vulkan works with pixels, so the swap chain extent must be specified in pixels as well. Unfortunately, if you are using a high DPI display (like Apple's Retina display), screen coordinates don't correspond to pixels. Instead, due to the higher pixel density, the resolution of the window in pixels will be larger than the resolution in screen coordinates. So if Vulkan doesn't fix the swap extent for us, we can't just use the original {WIDTH, HEIGHT}. Instead, we must use glfwGetFramebufferSize to query the resolution of the window in pixels before matching it against the minimum and maximum image extent.

#include <cstdint> // Necessary for uint32_t
#include <limits> // Necessary for std::numeric_limits
#include <algorithm> // Necessary for std::clamp

...

VkExtent2D chooseSwapExtent(const VkSurfaceCapabilitiesKHR& capabilities) {
    if (capabilities.currentExtent.width != std::numeric_limits<uint32_t>::max()) {
        return capabilities.currentExtent;
    } else {
        int width, height;
        glfwGetFramebufferSize(window, &width, &height);

        VkExtent2D actualExtent = {
            static_cast<uint32_t>(width),
            static_cast<uint32_t>(height)
        };

        actualExtent.width = std::clamp(actualExtent.width, capabilities.minImageExtent.width, capabilities.maxImageExtent.width);
        actualExtent.height = std::clamp(actualExtent.height, capabilities.minImageExtent.height, capabilities.maxImageExtent.height);

        return actualExtent;
    }
}

The clamp function is used here to bound the values of width and height between the allowed minimum and maximum extents that are supported by the implementation.
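To see both branches in isolation, here is a small self-contained sketch of the same decision. It is an illustration under assumptions: Extent2D is a stand-in for VkExtent2D, pickExtent is a hypothetical name, and the framebuffer size is passed in as parameters instead of being queried with glfwGetFramebufferSize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-in for VkExtent2D (illustration only).
struct Extent2D {
    uint32_t width;
    uint32_t height;
};

// Mirrors the logic of chooseSwapExtent without depending on Vulkan or GLFW.
inline Extent2D pickExtent(Extent2D current, Extent2D minExtent, Extent2D maxExtent,
                           int framebufferWidth, int framebufferHeight) {
    if (current.width != std::numeric_limits<uint32_t>::max()) {
        return current; // the surface dictates the extent
    }
    assert(framebufferWidth >= 0 && framebufferHeight >= 0);
    return Extent2D{
        std::clamp(static_cast<uint32_t>(framebufferWidth), minExtent.width, maxExtent.width),
        std::clamp(static_cast<uint32_t>(framebufferHeight), minExtent.height, maxExtent.height)
    };
}
```

With a current extent of 800 by 600 the function returns it unchanged. With the special maximum value and a 1600 by 900 framebuffer on a surface limited to 1024 by 1024, it returns 1024 by 900.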

Creating the swap chain

Now that we have all of these helper functions assisting us with the choices we have to make at runtime, we finally have all the information that is needed to create a working swap chain.

Create a createSwapChain function that starts out with the results of these calls and make sure to call it from initVulkan after logical device creation.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
}

void createSwapChain() {
    SwapChainSupportDetails swapChainSupport = querySwapChainSupport(physicalDevice);

    VkSurfaceFormatKHR surfaceFormat = chooseSwapSurfaceFormat(swapChainSupport.formats);
    VkPresentModeKHR presentMode = chooseSwapPresentMode(swapChainSupport.presentModes);
    VkExtent2D extent = chooseSwapExtent(swapChainSupport.capabilities);
}

Aside from these properties we also have to decide how many images we would like to have in the swap chain. The implementation specifies the minimum number that it requires to function:

uint32_t imageCount = swapChainSupport.capabilities.minImageCount;

However, simply sticking to this minimum means that we may sometimes have to wait on the driver to complete internal operations before we can acquire another image to render to. Therefore it is recommended to request at least one more image than the minimum:

uint32_t imageCount = swapChainSupport.capabilities.minImageCount + 1;

We should also make sure to not exceed the maximum number of images while doing this, where 0 is a special value that means that there is no maximum:

if (swapChainSupport.capabilities.maxImageCount > 0 && imageCount > swapChainSupport.capabilities.maxImageCount) {
    imageCount = swapChainSupport.capabilities.maxImageCount;
}
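The three snippets above combine into one small rule. As a sketch, with chooseImageCount being a hypothetical helper name that the tutorial itself does not use, it can be written and checked without any Vulkan types:

```cpp
#include <cassert>
#include <cstdint>

// Requests one image more than the minimum, capped at maxImageCount.
// A maxImageCount of 0 means that there is no maximum.
inline uint32_t chooseImageCount(uint32_t minImageCount, uint32_t maxImageCount) {
    assert(maxImageCount == 0 || minImageCount <= maxImageCount);
    uint32_t imageCount = minImageCount + 1;
    if (maxImageCount > 0 && imageCount > maxImageCount) {
        imageCount = maxImageCount;
    }
    return imageCount;
}
```

For a minimum of 2 this yields 3 when there is no maximum or the maximum is 3, and 2 when the maximum is also 2.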

As is tradition with Vulkan objects, creating the swap chain object requires filling in a large structure. It starts out very familiarly:

VkSwapchainCreateInfoKHR createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
createInfo.surface = surface;

After specifying which surface the swap chain should be tied to, the details of the swap chain images are specified:

createInfo.minImageCount = imageCount;
createInfo.imageFormat = surfaceFormat.format;
createInfo.imageColorSpace = surfaceFormat.colorSpace;
createInfo.imageExtent = extent;
createInfo.imageArrayLayers = 1;
createInfo.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;

The imageArrayLayers member specifies the number of layers each image consists of. This is always 1 unless you are developing a stereoscopic 3D application. The imageUsage bit field specifies what kind of operations we'll use the images in the swap chain for. In this tutorial we're going to render directly to them, which means that they're used as a color attachment. It is also possible that you'll render images to a separate image first to perform operations like post-processing. In that case you may use a value like VK_IMAGE_USAGE_TRANSFER_DST_BIT instead and use a memory operation to transfer the rendered image to a swap chain image.

QueueFamilyIndices indices = findQueueFamilies(physicalDevice);
uint32_t queueFamilyIndices[] = {indices.graphicsFamily.value(), indices.presentFamily.value()};

if (indices.graphicsFamily != indices.presentFamily) {
    createInfo.imageSharingMode = VK_SHARING_MODE_CONCURRENT;
    createInfo.queueFamilyIndexCount = 2;
    createInfo.pQueueFamilyIndices = queueFamilyIndices;
} else {
    createInfo.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
    createInfo.queueFamilyIndexCount = 0; // Optional
    createInfo.pQueueFamilyIndices = nullptr; // Optional
}

Next, we need to specify how to handle swap chain images that will be used across multiple queue families. That will be the case in our application if the graphics queue family is different from the presentation queue family. We'll be drawing on the images in the swap chain from the graphics queue and then submitting them on the presentation queue. There are two ways to handle images that are accessed from multiple queues:

VK_SHARING_MODE_EXCLUSIVE: An image is owned by one queue family at a time and ownership must be explicitly transferred before using it in another queue family. This option offers the best performance.
VK_SHARING_MODE_CONCURRENT: Images can be used across multiple queue families without explicit ownership transfers.

If the queue families differ, then we'll be using the concurrent mode in this tutorial to avoid having to deal with ownership transfers, because these involve some concepts that are better explained at a later time. Concurrent mode requires you to specify in advance which queue families will share the images using the queueFamilyIndexCount and pQueueFamilyIndices parameters. If the graphics queue family and presentation queue family are the same, which will be the case on most hardware, then we should stick to exclusive mode, because concurrent mode requires you to specify at least two distinct queue families.
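The decision can be isolated into a small function to make the two cases explicit. The following is a sketch with stand-in types: SharingMode mimics VkSharingMode, and SharingConfig and chooseSharing are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in for VkSharingMode (illustration only).
enum class SharingMode { Exclusive, Concurrent };

struct SharingConfig {
    SharingMode mode;
    std::vector<uint32_t> queueFamilyIndices; // left empty in exclusive mode
};

inline SharingConfig chooseSharing(std::optional<uint32_t> graphicsFamily,
                                   std::optional<uint32_t> presentFamily) {
    // Both families were verified with isComplete() during device selection.
    assert(graphicsFamily.has_value() && presentFamily.has_value());
    if (*graphicsFamily != *presentFamily) {
        // Two distinct families: share the images between them.
        return {SharingMode::Concurrent, {*graphicsFamily, *presentFamily}};
    }
    // A single family: exclusive mode needs no index list.
    return {SharingMode::Exclusive, {}};
}
```

Families 0 and 2 produce concurrent mode with both indices listed, whereas passing the same family twice produces exclusive mode with an empty list.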

createInfo.preTransform = swapChainSupport.capabilities.currentTransform;

We can specify that a certain transform should be applied to images in the swap chain if it is supported (supportedTransforms in capabilities), like a 90 degree clockwise rotation or horizontal flip. To specify that you do not want any transformation, simply specify the current transformation.

createInfo.compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;

The compositeAlpha field specifies if the alpha channel should be used for blending with other windows in the window system. You'll almost always want to simply ignore the alpha channel, hence VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR.

createInfo.presentMode = presentMode;
createInfo.clipped = VK_TRUE;

The presentMode member speaks for itself. If the clipped member is set to VK_TRUE then that means that we don't care about the color of pixels that are obscured, for example because another window is in front of them. Unless you really need to be able to read these pixels back and get predictable results, you'll get the best performance by enabling clipping.

createInfo.oldSwapchain = VK_NULL_HANDLE;

That leaves one last field, oldSwapchain. With Vulkan it's possible that your swap chain becomes invalid or unoptimized while your application is running, for example because the window was resized. In that case the swap chain actually needs to be recreated from scratch and a reference to the old one must be specified in this field. This is a complex topic that we'll learn more about in a future chapter. For now we'll assume that we'll only ever create one swap chain.

Now add a class member to store the VkSwapchainKHR object:

VkSwapchainKHR swapChain;

Creating the swap chain is now as simple as calling vkCreateSwapchainKHR:

if (vkCreateSwapchainKHR(device, &createInfo, nullptr, &swapChain) != VK_SUCCESS) {
    throw std::runtime_error("failed to create swap chain!");
}

The parameters are the logical device, swap chain creation info, optional custom allocators and a pointer to the variable to store the handle in. No surprises there. It should be cleaned up using vkDestroySwapchainKHR before the device:

void cleanup() {
    vkDestroySwapchainKHR(device, swapChain, nullptr);
    ...
}

Now run the application to ensure that the swap chain is created successfully! If at this point you get an access violation error in vkCreateSwapchainKHR or see a message like Failed to find 'vkGetInstanceProcAddress' in layer SteamOverlayVulkanLayer.dll, then see the FAQ entry about the Steam overlay layer.

Try removing the createInfo.imageExtent = extent; line with validation layers enabled. You'll see that one of the validation layers immediately catches the mistake and prints a helpful message.

Retrieving the swap chain images

The swap chain has been created now, so all that remains is retrieving the handles of the VkImages in it. We'll reference these during rendering operations in later chapters. Add a class member to store the handles:

std::vector<VkImage> swapChainImages;

The images were created by the implementation for the swap chain and they will be automatically cleaned up once the swap chain has been destroyed, therefore we don't need to add any cleanup code.

I'm adding the code to retrieve the handles to the end of the createSwapChain function, right after the vkCreateSwapchainKHR call. Retrieving them is very similar to the other times where we retrieved an array of objects from Vulkan. Remember that we only specified a minimum number of images in the swap chain, so the implementation is allowed to create a swap chain with more. That's why we'll first query the final number of images with vkGetSwapchainImagesKHR, then resize the container and finally call it again to retrieve the handles.

vkGetSwapchainImagesKHR(device, swapChain, &imageCount, nullptr);
swapChainImages.resize(imageCount);
vkGetSwapchainImagesKHR(device, swapChain, &imageCount, swapChainImages.data());

One last thing: store the format and extent we've chosen for the swap chain images in member variables. We'll need them in future chapters.

VkSwapchainKHR swapChain;
std::vector<VkImage> swapChainImages;
VkFormat swapChainImageFormat;
VkExtent2D swapChainExtent;

...

swapChainImageFormat = surfaceFormat.format;
swapChainExtent = extent;

We now have a set of images that can be drawn onto and can be presented to the window. The next chapter will begin to cover how we can set up the images as render targets and then we start looking into the actual graphics pipeline and drawing commands!

Image views

To use any VkImage, including those in the swap chain, in the render pipeline we have to create a VkImageView object. An image view is quite literally a view into an image. It describes how to access the image and which part of the image to access, for example if it should be treated as a 2D depth texture without any mipmapping levels.

In this chapter we'll write a createImageViews function that creates a basic image view for every image in the swap chain so that we can use them as color targets later on.

First add a class member to store the image views in:

std::vector<VkImageView> swapChainImageViews;

Create the createImageViews function and call it right after swap chain creation.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
}

void createImageViews() {

}

The first thing we need to do is resize the list to fit all of the image views we'll be creating:

void createImageViews() {
    swapChainImageViews.resize(swapChainImages.size());

}

Next, set up the loop that iterates over all of the swap chain images.

for (size_t i = 0; i < swapChainImages.size(); i++) {

}

The parameters for image view creation are specified in a VkImageViewCreateInfo structure. The first few parameters are straightforward.

VkImageViewCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
createInfo.image = swapChainImages[i];

The viewType and format fields specify how the image data should be interpreted. The viewType parameter allows you to treat images as 1D textures, 2D textures, 3D textures and cube maps.

createInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
createInfo.format = swapChainImageFormat;

The components field allows you to swizzle the color channels around. For example, you can map all of the channels to the red channel for a monochrome texture. You can also map constant values of 0 and 1 to a channel. In our case we'll stick to the default mapping.

createInfo.components.r = VK_COMPONENT_SWIZZLE_IDENTITY;
createInfo.components.g = VK_COMPONENT_SWIZZLE_IDENTITY;
createInfo.components.b = VK_COMPONENT_SWIZZLE_IDENTITY;
createInfo.components.a = VK_COMPONENT_SWIZZLE_IDENTITY;
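To get a feeling for what a component mapping does, here is a CPU-side sketch that applies a mapping to a single texel. It is purely illustrative: Swizzle is a stand-in for VkComponentSwizzle, the names Rgba, resolveChannel and applySwizzle are hypothetical, and on real hardware the mapping is applied when the image is accessed through the view.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for VkComponentSwizzle (illustration only).
enum class Swizzle { Identity, Zero, One, R, G, B, A };

using Rgba = std::array<float, 4>; // r, g, b, a

// Resolves one output channel. 'channel' is the index of the channel being
// produced (0 = r ... 3 = a) and 'swizzle' selects where its value comes from.
inline float resolveChannel(const Rgba& texel, std::size_t channel, Swizzle swizzle) {
    assert(channel < 4);
    switch (swizzle) {
        case Swizzle::Identity: return texel[channel];
        case Swizzle::Zero:     return 0.0f;
        case Swizzle::One:      return 1.0f;
        case Swizzle::R:        return texel[0];
        case Swizzle::G:        return texel[1];
        case Swizzle::B:        return texel[2];
        case Swizzle::A:        return texel[3];
    }
    return texel[channel]; // not reached for valid enum values
}

inline Rgba applySwizzle(const Rgba& texel, const std::array<Swizzle, 4>& mapping) {
    Rgba result{};
    for (std::size_t i = 0; i < 4; i++) {
        result[i] = resolveChannel(texel, i, mapping[i]);
    }
    return result;
}
```

Mapping r, g and b to Swizzle::R and a to Swizzle::One turns the texel (0.25, 0.5, 0.75, 1.0) into the monochrome (0.25, 0.25, 0.25, 1.0), while four identity entries leave it untouched.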

The subresourceRange field describes what the image's purpose is and which part of the image should be accessed. Our images will be used as color targets without any mipmapping levels or multiple layers.

createInfo.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
createInfo.subresourceRange.baseMipLevel = 0;
createInfo.subresourceRange.levelCount = 1;
createInfo.subresourceRange.baseArrayLayer = 0;
createInfo.subresourceRange.layerCount = 1;

If you were working on a stereoscopic 3D application, then you would create a swap chain with multiple layers. You could then create multiple image views for each image representing the views for the left and right eyes by accessing different layers.

Creating the image view is now a matter of calling vkCreateImageView:

if (vkCreateImageView(device, &createInfo, nullptr, &swapChainImageViews[i]) != VK_SUCCESS) {
    throw std::runtime_error("failed to create image views!");
}

Unlike images, the image views were explicitly created by us, so we need to add a similar loop to destroy them again at the end of the program:

void cleanup() {
    for (auto imageView : swapChainImageViews) {
        vkDestroyImageView(device, imageView, nullptr);
    }

    ...
}

An image view is sufficient to start using an image as a texture, but it's not quite ready to be used as a render target just yet. That requires one more step of indirection, known as a framebuffer. But first we'll have to set up the graphics pipeline.

Graphics pipeline basics

Introduction

Over the course of the next few chapters we'll be setting up a graphics pipeline that is configured to draw our first triangle. The graphics pipeline is the sequence of operations that take the vertices and textures of your meshes all the way to the pixels in the render targets. A simplified overview is displayed below:

The input assembler collects the raw vertex data from the buffers you specify and may also use an index buffer to repeat certain elements without having to duplicate the vertex data itself.

The vertex shader is run for every vertex and generally applies transformations to turn vertex positions from model space to screen space. It also passes per-vertex data down the pipeline.

The tessellation shaders allow you to subdivide geometry based on certain rules to increase the mesh quality. This is often used to make surfaces like brick walls and staircases look less flat when they are nearby.

The geometry shader is run on every primitive (triangle, line, point) and can discard it or output more primitives than came in. This is similar to the tessellation shader, but much more flexible. However, it is not used much in today's applications because the performance is not that good on most graphics cards except for Intel's integrated GPUs.

The rasterization stage discretizes the primitives into fragments. These are the pixel elements that they fill on the framebuffer. Any fragments that fall outside the screen are discarded and the attributes outputted by the vertex shader are interpolated across the fragments, as shown in the figure. Usually the fragments that are behind other primitive fragments are also discarded here because of depth testing.

The fragment shader is invoked for every fragment that survives and determines which framebuffer(s) the fragments are written to and with which color and depth values. It can do this using the interpolated data from the vertex shader, which can include things like texture coordinates and normals for lighting.

The color blending stage applies operations to mix different fragments that map to the same pixel in the framebuffer. Fragments can simply overwrite each other, add up or be mixed based upon transparency.
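As a rough illustration of mixing based upon transparency, the sketch below blends one fragment over the color already stored in the framebuffer on the CPU. This is one common choice of blend equation, assumed here for the example. It is not something this chapter configures, and Color and blendAlpha are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Color = std::array<float, 4>; // r, g, b, a in the [0, 1] range

// Mixes a new fragment (src) with the color already in the framebuffer (dst)
// using the fragment's alpha: result = src * a + dst * (1 - a).
inline Color blendAlpha(const Color& src, const Color& dst) {
    const float a = src[3];
    assert(a >= 0.0f && a <= 1.0f);
    Color out{};
    for (std::size_t i = 0; i < 3; i++) {
        out[i] = src[i] * a + dst[i] * (1.0f - a);
    }
    out[3] = a + dst[3] * (1.0f - a); // accumulated coverage
    return out;
}
```

A half-transparent red fragment over opaque blue gives (0.5, 0.0, 0.5, 1.0). A fully opaque fragment simply overwrites the existing color, which corresponds to the "overwrite" case mentioned above.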

Stages with a green color are known as fixed-function stages. These stages allow you to tweak their operations using parameters, but the way they work is predefined.

Stages with an orange color on the other hand are programmable, which means that you can upload your own code to the graphics card to apply exactly the operations you want. This allows you to use fragment shaders, for example, to implement anything from texturing and lighting to ray tracers. These programs run on many GPU cores simultaneously to process many objects, like vertices and fragments, in parallel.

If you've used older APIs like OpenGL and Direct3D before, then you'll be used to being able to change any pipeline settings at will with calls like glBlendFunc and OMSetBlendState. The graphics pipeline in Vulkan is almost completely immutable, so you must recreate the pipeline from scratch if you want to change shaders, bind different framebuffers or change the blend function. The disadvantage is that you'll have to create a number of pipelines that represent all of the different combinations of states you want to use in your rendering operations. However, because all of the operations you'll be doing in the pipeline are known in advance, the driver can optimize for it much better.

Some of the programmable stages are optional based on what you intend to do. For example, the tessellation and geometry stages can be disabled if you are just drawing simple geometry. If you are only interested in depth values then you can disable the fragment shader stage, which is useful for shadow map generation.

In the next chapter we'll first create the two programmable stages required to put a triangle onto the screen: the vertex shader and fragment shader. The fixed-function configuration like blending mode, viewport, rasterization will be set up in the chapter after that. The final part of setting up the graphics pipeline in Vulkan involves the specification of input and output framebuffers.

Create a createGraphicsPipeline function that is called right after createImageViews in initVulkan. We'll work on this function throughout the following chapters.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createGraphicsPipeline();
}

...

void createGraphicsPipeline() {

}

Shader modules

Unlike earlier APIs, shader code in Vulkan has to be specified in a bytecode format as opposed to human-readable syntax like GLSL and HLSL. This bytecode format is called SPIR-V and is designed to be used with both Vulkan and OpenCL (both Khronos APIs). It is a format that can be used to write graphics and compute shaders, but we will focus on shaders used in Vulkan's graphics pipelines in this tutorial.

The advantage of using a bytecode format is that the compilers written by GPU vendors to turn shader code into native code are significantly less complex. The past has shown that with human-readable syntax like GLSL, some GPU vendors were rather flexible with their interpretation of the standard. If you happen to write non-trivial shaders with a GPU from one of these vendors, then you'd risk other vendors' drivers rejecting your code due to syntax errors, or worse, your shader running differently because of compiler bugs. With a straightforward bytecode format like SPIR-V that will hopefully be avoided.

However, that does not mean that we need to write this bytecode by hand. Khronos has released their own vendor-independent compiler that compiles GLSL to SPIR-V. This compiler is designed to verify that your shader code is fully standards compliant and produces one SPIR-V binary that you can ship with your program. You can also include this compiler as a library to produce SPIR-V at runtime, but we won't be doing that in this tutorial. Although we can use this compiler directly via glslangValidator.exe, we will be using glslc.exe by Google instead. The advantage of glslc is that it uses the same parameter format as well-known compilers like GCC and Clang and includes some extra functionality like includes. Both of them are already included in the Vulkan SDK, so you don't need to download anything extra.

GLSL is a shading language with a C-style syntax. Programs written in it have a main function that is invoked for every object. Instead of using parameters for input and a return value as output, GLSL uses global variables to handle input and output. The language includes many features to aid in graphics programming, like built-in vector and matrix primitives. Functions for operations like cross products, matrix-vector products and reflections around a vector are included. The vector type is called vec with a number indicating the number of elements. For example, a 3D position would be stored in a vec3. It is possible to access single components through members like .x, but it's also possible to create a new vector from multiple components at the same time. For example, the expression vec3(1.0, 2.0, 3.0).xy would result in a vec2. The constructors of vectors can also take combinations of vector objects and scalar values. For example, a vec3 can be constructed with vec3(vec2(1.0, 2.0), 3.0).

As the previous chapter mentioned, we need to write a vertex shader and a fragment shader to get a triangle on the screen. The next two sections will cover the GLSL code of each of those and after that I'll show you how to produce two SPIR-V binaries and load them into the program.

Vertex shader

The vertex shader processes each incoming vertex. It takes its attributes, like model space position, color, normal and texture coordinates as input. The output is the final position in clip coordinates and the attributes that need to be passed on to the fragment shader, like color and texture coordinates. These values will then be interpolated over the fragments by the rasterizer to produce a smooth gradient.
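The interpolation step can be pictured as a weighted average of the three per-vertex values. The sketch below shows this on the CPU under simplifying assumptions: the weights are barycentric coordinates that sum to 1, perspective correction is ignored, and Vec3 and interpolate are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<float, 3>;

// Blends three per-vertex values using barycentric weights w0 + w1 + w2 = 1.
inline Vec3 interpolate(const Vec3& a, const Vec3& b, const Vec3& c,
                        float w0, float w1, float w2) {
    assert(std::fabs(w0 + w1 + w2 - 1.0f) < 1e-5f);
    Vec3 out{};
    for (std::size_t i = 0; i < 3; i++) {
        out[i] = a[i] * w0 + b[i] * w1 + c[i] * w2;
    }
    return out;
}
```

With red, green and blue at the three vertices, a fragment at the first vertex (weights 1, 0, 0) is pure red, while weights of 0.25, 0.25 and 0.5 give the color (0.25, 0.25, 0.5).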

A clip coordinate is a four dimensional vector from the vertex shader that is subsequently turned into a normalized device coordinate by dividing the whole vector by its last component. These normalized device coordinates are homogeneous coordinates that map the framebuffer to a [-1, 1] by [-1, 1] coordinate system that looks like the following:

You should already be familiar with these if you have dabbled in computer graphics before. If you have used OpenGL before, then you'll notice that the sign of the Y coordinates is now flipped. The Z coordinate now uses the same range as it does in Direct3D, from 0 to 1.

For our first triangle we won't be applying any transformations. We'll just specify the positions of the three vertices directly as normalized device coordinates to create the following shape:

We can directly output normalized device coordinates by outputting them as clip coordinates from the vertex shader with the last component set to 1. That way the division to transform clip coordinates to normalized device coordinates will not change anything.
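
To make the division concrete, here is a minimal C++ sketch of the perspective divide. The Vec4 and Vec3 aliases and the clipToNdc function are hypothetical stand-ins for illustration; they are not part of the tutorial code, and the sketch assumes the w component is non-zero.

```cpp
#include <array>

// Hypothetical stand-ins for GLSL's vec4 and vec3 (not part of the tutorial code).
using Vec4 = std::array<float, 4>;
using Vec3 = std::array<float, 3>;

// Perspective divide: clip coordinates -> normalized device coordinates.
// Assumes clip[3] (the w component) is non-zero.
Vec3 clipToNdc(const Vec4& clip) {
    const float w = clip[3];
    return {clip[0] / w, clip[1] / w, clip[2] / w};
}
```

With w set to 1, as we do for the triangle, clipToNdc returns the x, y and z components unchanged.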

Normally these coordinates would be stored in a vertex buffer, but creating a vertex buffer in Vulkan and filling it with data is not trivial. Therefore I've decided to postpone that until after we've had the satisfaction of seeing a triangle pop up on the screen. We're going to do something a little unorthodox in the meanwhile: include the coordinates directly inside the vertex shader. The code looks like this:

#version 450

vec2 positions[3] = vec2[](
    vec2(0.0, -0.5),
    vec2(0.5, 0.5),
    vec2(-0.5, 0.5)
);

void main() {
    gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);
}

The main function is invoked for every vertex. The built-in gl_VertexIndex variable contains the index of the current vertex. This is usually an index into the vertex buffer, but in our case it will be an index into a hardcoded array of vertex data. The position of each vertex is accessed from the constant array in the shader and combined with dummy z and w components to produce a position in clip coordinates. The built-in variable gl_Position functions as the output.

Fragment shader

The triangle that is formed by the positions from the vertex shader fills an area on the screen with fragments. The fragment shader is invoked on these fragments to produce a color and depth for the framebuffer (or framebuffers). A simple fragment shader that outputs the color red for the entire triangle looks like this:

#version 450

layout(location = 0) out vec4 outColor;

void main() {
    outColor = vec4(1.0, 0.0, 0.0, 1.0);
}

The main function is called for every fragment just like the vertex shader main function is called for every vertex. Colors in GLSL are 4-component vectors with the R, G, B and alpha channels within the [0, 1] range. Unlike gl_Position in the vertex shader, there is no built-in variable to output a color for the current fragment. You have to specify your own output variable for each framebuffer where the layout(location = 0) modifier specifies the index of the framebuffer. The color red is written to this outColor variable that is linked to the first (and only) framebuffer at index 0.

Per-vertex colors

Making the entire triangle red is not very interesting. Wouldn't something like the following look a lot nicer?

We have to make a couple of changes to both shaders to accomplish this. First off, we need to specify a distinct color for each of the three vertices. The vertex shader should now include an array with colors just like it does for positions:

vec3 colors[3] = vec3[](
    vec3(1.0, 0.0, 0.0),
    vec3(0.0, 1.0, 0.0),
    vec3(0.0, 0.0, 1.0)
);

Now we just need to pass these per-vertex colors to the fragment shader so it can output their interpolated values to the framebuffer. Add an output for color to the vertex shader and write to it in the main function:

layout(location = 0) out vec3 fragColor;

void main() {
    gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);
    fragColor = colors[gl_VertexIndex];
}

Next, we need to add a matching input in the fragment shader:

layout(location = 0) in vec3 fragColor;

void main() {
    outColor = vec4(fragColor, 1.0);
}

The input variable does not necessarily have to use the same name; the two will be linked together using the indexes specified by the location directives. The main function has been modified to output the color along with an alpha value. As shown in the image above, the values for fragColor will be automatically interpolated for the fragments between the three vertices, resulting in a smooth gradient.
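
If you are curious what that interpolation amounts to, the following C++ sketch blends three vertex colors with barycentric weights. It is only an illustration under simplifying assumptions: interpolateColor and the Vec3 alias are hypothetical names, and a real rasterizer also applies perspective correction, which makes no difference here because all of our vertices have w set to 1.

```cpp
#include <array>

// Hypothetical stand-in for GLSL's vec3 (not part of the tutorial code).
using Vec3 = std::array<float, 3>;

// Blend three per-vertex colors with barycentric weights.
// The weights are assumed to be non-negative and to sum to 1.
Vec3 interpolateColor(const Vec3& c0, const Vec3& c1, const Vec3& c2,
                      float w0, float w1, float w2) {
    Vec3 result{};
    for (int i = 0; i < 3; ++i) {
        result[i] = w0 * c0[i] + w1 * c1[i] + w2 * c2[i];
    }
    return result;
}
```

A fragment that sits exactly on a vertex gets a weight of 1 for that vertex and therefore its exact color, while a fragment halfway along an edge gets an even mix of the two endpoint colors.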

Compiling the shaders

Create a directory called shaders in the root directory of your project and store the vertex shader in a file called shader.vert and the fragment shader in a file called shader.frag in that directory. GLSL shaders don't have an official extension, but these two are commonly used to distinguish them.

The contents of shader.vert should be:

#version 450

layout(location = 0) out vec3 fragColor;

vec2 positions[3] = vec2[](
    vec2(0.0, -0.5),
    vec2(0.5, 0.5),
    vec2(-0.5, 0.5)
);

vec3 colors[3] = vec3[](
    vec3(1.0, 0.0, 0.0),
    vec3(0.0, 1.0, 0.0),
    vec3(0.0, 0.0, 1.0)
);

void main() {
    gl_Position = vec4(positions[gl_VertexIndex], 0.0, 1.0);
    fragColor = colors[gl_VertexIndex];
}

And the contents of shader.frag should be:

#version 450

layout(location = 0) in vec3 fragColor;

layout(location = 0) out vec4 outColor;

void main() {
    outColor = vec4(fragColor, 1.0);
}

We're now going to compile these into SPIR-V bytecode using the glslc program.

Windows

Create a compile.bat file with the following contents:

C:/VulkanSDK/x.x.x.x/Bin/glslc.exe shader.vert -o vert.spv
C:/VulkanSDK/x.x.x.x/Bin/glslc.exe shader.frag -o frag.spv
pause

Replace the path to glslc.exe with the path to where you installed the Vulkan SDK. Double click the file to run it.

Linux

Create a compile.sh file with the following contents:

/home/user/VulkanSDK/x.x.x.x/x86_64/bin/glslc shader.vert -o vert.spv
/home/user/VulkanSDK/x.x.x.x/x86_64/bin/glslc shader.frag -o frag.spv

Replace the path to glslc with the path to where you installed the Vulkan SDK. Make the script executable with chmod +x compile.sh and run it.

End of platform-specific instructions

These two commands tell the compiler to read the GLSL source file and output a SPIR-V bytecode file using the -o (output) flag.

If your shader contains a syntax error then the compiler will tell you the line number and problem, as you would expect. Try leaving out a semicolon for example and run the compile script again. Also try running the compiler without any arguments to see what kinds of flags it supports. It can, for example, also output the bytecode into a human-readable format so you can see exactly what your shader is doing and any optimizations that have been applied at this stage.

Compiling shaders on the command line is one of the most straightforward options and it's the one that we'll use in this tutorial, but it's also possible to compile shaders directly from your own code. The Vulkan SDK includes libshaderc, which is a library to compile GLSL code to SPIR-V from within your program.

Loading a shader

Now that we have a way of producing SPIR-V shaders, it's time to load them into our program to plug them into the graphics pipeline at some point. We'll first write a simple helper function to load the binary data from the files.

#include <fstream>

...

static std::vector<char> readFile(const std::string& filename) {
    std::ifstream file(filename, std::ios::ate | std::ios::binary);

    if (!file.is_open()) {
        throw std::runtime_error("failed to open file!");
    }
}

The readFile function will read all of the bytes from the specified file and return them in a byte array managed by std::vector. We start by opening the file with two flags:

ate: Start reading at the end of the file
binary: Read the file as binary file (avoid text transformations)

The advantage of starting to read at the end of the file is that we can use the read position to determine the size of the file and allocate a buffer:

size_t fileSize = (size_t) file.tellg();
std::vector<char> buffer(fileSize);

After that, we can seek back to the beginning of the file and read all of the bytes at once:

file.seekg(0);
file.read(buffer.data(), fileSize);

And finally close the file and return the bytes:

file.close();

return buffer;
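
For reference, the fragments above fit together as follows. This is just the same code assembled into one function with the includes it needs; the only change is an explicit cast of the size passed to read, which is a small tidy-up rather than a change in behavior.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Reads a whole file into memory and throws if the file cannot be opened.
static std::vector<char> readFile(const std::string& filename) {
    // ate: start at the end so tellg() reports the file size.
    // binary: avoid text transformations.
    std::ifstream file(filename, std::ios::ate | std::ios::binary);

    if (!file.is_open()) {
        throw std::runtime_error("failed to open file!");
    }

    size_t fileSize = (size_t) file.tellg();
    std::vector<char> buffer(fileSize);

    // Seek back to the beginning and read all of the bytes at once.
    file.seekg(0);
    file.read(buffer.data(), static_cast<std::streamsize>(fileSize));

    file.close();

    return buffer;
}
```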

We'll now call this function from createGraphicsPipeline to load the bytecode of the two shaders:

void createGraphicsPipeline() {
    auto vertShaderCode = readFile("shaders/vert.spv");
    auto fragShaderCode = readFile("shaders/frag.spv");
}

Make sure that the shaders are loaded correctly by printing the size of the buffers and checking if they match the actual file size in bytes. Note that the code doesn't need to be null terminated since it's binary code and we will later be explicit about its size.
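
Besides comparing sizes, you could add a cheap plausibility check on the loaded bytes. The sketch below is an optional extra and not part of the tutorial code: looksLikeSpirv and kSpirvMagic are hypothetical names, the magic number 0x07230203 comes from the SPIR-V specification, and the check assumes the file was written in the byte order of the machine you're running on.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Magic number that starts every SPIR-V module, per the SPIR-V specification.
constexpr uint32_t kSpirvMagic = 0x07230203;

// Hypothetical helper: returns true if the buffer plausibly holds SPIR-V.
// Assumes the module is stored in the host's byte order.
bool looksLikeSpirv(const std::vector<char>& code) {
    // SPIR-V is a stream of 32-bit words, so the size must be a non-zero multiple of 4.
    if (code.empty() || code.size() % sizeof(uint32_t) != 0) {
        return false;
    }
    uint32_t firstWord = 0;
    // memcpy sidesteps any alignment or aliasing concerns when reading the first word.
    std::memcpy(&firstWord, code.data(), sizeof(firstWord));
    return firstWord == kSpirvMagic;
}
```

A check like this catches the common mistake of pointing readFile at the GLSL source instead of the compiled .spv file.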

Creating shader modules

Before we can pass the code to the pipeline, we have to wrap it in a VkShaderModule object. Let's create a helper function createShaderModule to do that.

VkShaderModule createShaderModule(const std::vector<char>& code) {

}

The function will take a buffer with the bytecode as parameter and create a VkShaderModule from it.

Creating a shader module is simple: we only need to specify a pointer to the buffer with the bytecode and the length of it. This information is specified in a VkShaderModuleCreateInfo structure. The one catch is that the size of the bytecode is specified in bytes, but the bytecode pointer is a uint32_t pointer rather than a char pointer. Therefore we will need to cast the pointer with reinterpret_cast as shown below. When you perform a cast like this, you also need to ensure that the data satisfies the alignment requirements of uint32_t. Lucky for us, the data is stored in an std::vector where the default allocator already ensures that the data satisfies the worst case alignment requirements.

VkShaderModuleCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
createInfo.codeSize = code.size();
createInfo.pCode = reinterpret_cast<const uint32_t*>(code.data());
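
If you would rather not rely on the cast, for example because the bytes come from a container with a custom allocator, one alternative is to copy them into uint32_t storage first. This is a sketch of that idea rather than what the tutorial does; toSpirvWords is a hypothetical helper name.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical alternative to the reinterpret_cast: copy the bytes into
// storage that is guaranteed to be aligned for uint32_t.
std::vector<uint32_t> toSpirvWords(const std::vector<char>& code) {
    if (code.size() % sizeof(uint32_t) != 0) {
        throw std::runtime_error("bytecode size is not a multiple of 4!");
    }
    std::vector<uint32_t> words(code.size() / sizeof(uint32_t));
    if (!words.empty()) {
        std::memcpy(words.data(), code.data(), code.size());
    }
    return words;
}
```

With this approach pCode would point at words.data() and codeSize would be words.size() * sizeof(uint32_t), since the size is still expressed in bytes.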

The VkShaderModule can then be created with a call to vkCreateShaderModule:

VkShaderModule shaderModule;
if (vkCreateShaderModule(device, &createInfo, nullptr, &shaderModule) != VK_SUCCESS) {
    throw std::runtime_error("failed to create shader module!");
}

The parameters are the same as those in previous object creation functions: the logical device, pointer to create info structure, optional pointer to custom allocators and handle output variable. The buffer with the code can be freed immediately after creating the shader module. Don't forget to return the created shader module:

return shaderModule;

Shader modules are just a thin wrapper around the shader bytecode that we've previously loaded from a file and the functions defined in it. The compilation and linking of the SPIR-V bytecode to machine code for execution by the GPU doesn't happen until the graphics pipeline is created. That means that we're allowed to destroy the shader modules again as soon as pipeline creation is finished, which is why we'll make them local variables in the createGraphicsPipeline function instead of class members:

void createGraphicsPipeline() {
    auto vertShaderCode = readFile("shaders/vert.spv");
    auto fragShaderCode = readFile("shaders/frag.spv");

    VkShaderModule vertShaderModule = createShaderModule(vertShaderCode);
    VkShaderModule fragShaderModule = createShaderModule(fragShaderCode);

The cleanup should then happen at the end of the function by adding two calls to vkDestroyShaderModule. All of the remaining code in this chapter will be inserted before these lines.

    ...
    vkDestroyShaderModule(device, fragShaderModule, nullptr);
    vkDestroyShaderModule(device, vertShaderModule, nullptr);
}

Shader stage creation

To actually use the shaders we'll need to assign them to a specific pipeline stage through VkPipelineShaderStageCreateInfo structures as part of the actual pipeline creation process.

We'll start by filling in the structure for the vertex shader, again in the createGraphicsPipeline function.

VkPipelineShaderStageCreateInfo vertShaderStageInfo{};
vertShaderStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
vertShaderStageInfo.stage = VK_SHADER_STAGE_VERTEX_BIT;

The first step, besides the obligatory sType member, is telling Vulkan in which pipeline stage the shader is going to be used. There is an enum value for each of the programmable stages described in the previous chapter.

vertShaderStageInfo.module = vertShaderModule;
vertShaderStageInfo.pName = "main";

The next two members specify the shader module containing the code, and the function to invoke, known as the entrypoint. That means that it's possible to combine multiple fragment shaders into a single shader module and use different entry points to differentiate between their behaviors. In this case we'll stick to the standard main, however.

There is one more (optional) member, pSpecializationInfo, which we won't be using here, but is worth discussing. It allows you to specify values for shader constants. You can use a single shader module where its behavior can be configured at pipeline creation by specifying different values for the constants used in it. This is more efficient than configuring the shader using variables at render time, because the compiler can do optimizations like eliminating if statements that depend on these values. If you don't have any constants like that, then you can set the member to nullptr, which our struct initialization does automatically.

Modifying the structure to suit the fragment shader is easy:

VkPipelineShaderStageCreateInfo fragShaderStageInfo{};
fragShaderStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
fragShaderStageInfo.stage = VK_SHADER_STAGE_FRAGMENT_BIT;
fragShaderStageInfo.module = fragShaderModule;
fragShaderStageInfo.pName = "main";

Finish by defining an array that contains these two structs, which we'll later use to reference them in the actual pipeline creation step.

VkPipelineShaderStageCreateInfo shaderStages[] = {vertShaderStageInfo, fragShaderStageInfo};

That's all there is to describing the programmable stages of the pipeline. In the next chapter we'll look at the fixed-function stages.

Fixed functions

The older graphics APIs provided default state for most of the stages of the graphics pipeline. In Vulkan you have to be explicit about most pipeline states as it'll be baked into an immutable pipeline state object. In this chapter we'll fill in all of the structures to configure these fixed-function operations.

Dynamic state

While most of the pipeline state needs to be baked into the pipeline state object, a limited amount of the state can actually be changed at draw time without recreating the pipeline. Examples are the size of the viewport, line width and blend constants. If you want to use dynamic state and keep these properties out, then you'll have to fill in a VkPipelineDynamicStateCreateInfo structure like this:

std::vector<VkDynamicState> dynamicStates = {
    VK_DYNAMIC_STATE_VIEWPORT,
    VK_DYNAMIC_STATE_SCISSOR
};

VkPipelineDynamicStateCreateInfo dynamicState{};
dynamicState.sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO;
dynamicState.dynamicStateCount = static_cast<uint32_t>(dynamicStates.size());
dynamicState.pDynamicStates = dynamicStates.data();

This will cause the configuration of these values to be ignored and you will be able (and required) to specify the data at drawing time. This results in a more flexible setup and is very common for things like viewport and scissor state, which would result in a more complex setup when being baked into the pipeline state.

Vertex input

The VkPipelineVertexInputStateCreateInfo structure describes the format of the vertex data that will be passed to the vertex shader. It describes this in roughly two ways:

Bindings: spacing between data and whether the data is per-vertex or per-instance (see instancing)
Attribute descriptions: type of the attributes passed to the vertex shader, which binding to load them from and at which offset

Because we're hard coding the vertex data directly in the vertex shader, we'll fill in this structure to specify that there is no vertex data to load for now. We'll get back to it in the vertex buffer chapter.

VkPipelineVertexInputStateCreateInfo vertexInputInfo{};
vertexInputInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO;
vertexInputInfo.vertexBindingDescriptionCount = 0;
vertexInputInfo.pVertexBindingDescriptions = nullptr; // Optional
vertexInputInfo.vertexAttributeDescriptionCount = 0;
vertexInputInfo.pVertexAttributeDescriptions = nullptr; // Optional

The pVertexBindingDescriptions and pVertexAttributeDescriptions members point to an array of structs that describe the aforementioned details for loading vertex data. Add this structure to the createGraphicsPipeline function right after the shaderStages array.

Input assembly

The VkPipelineInputAssemblyStateCreateInfo struct describes two things: what kind of geometry will be drawn from the vertices and if primitive restart should be enabled. The former is specified in the topology member and can have values like:

VK_PRIMITIVE_TOPOLOGY_POINT_LIST: points from vertices
VK_PRIMITIVE_TOPOLOGY_LINE_LIST: line from every 2 vertices without reuse
VK_PRIMITIVE_TOPOLOGY_LINE_STRIP: the end vertex of every line is used as start vertex for the next line
VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST: triangle from every 3 vertices without reuse
VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP: the second and third vertex of every triangle are used as first two vertices of the next triangle

Normally, the vertices are loaded from the vertex buffer by index in sequential order, but with an element buffer you can specify the indices to use yourself. This allows you to perform optimizations like reusing vertices. If you set the primitiveRestartEnable member to VK_TRUE, then it's possible to break up lines and triangles in the _STRIP topology modes by using a special index of 0xFFFF or 0xFFFFFFFF.
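
The difference between the list and strip topologies, and the effect of primitive restart, can be sketched in plain C++. The function names and the Triangle alias below are hypothetical and only serve the illustration. The sketch also deliberately ignores one detail: real implementations adjust the vertex order of every other strip triangle so that the winding stays consistent.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration: three vertex indices forming one triangle.
using Triangle = std::array<uint32_t, 3>;

// TRIANGLE_LIST: every 3 consecutive indices form an independent triangle.
std::vector<Triangle> assembleTriangleList(const std::vector<uint32_t>& indices) {
    std::vector<Triangle> triangles;
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        triangles.push_back({indices[i], indices[i + 1], indices[i + 2]});
    }
    return triangles;
}

// TRIANGLE_STRIP: each new index forms a triangle with the previous two.
// The restart index ends the current strip and starts a fresh one.
// Simplification: the winding order of alternate triangles is not adjusted.
std::vector<Triangle> assembleTriangleStrip(const std::vector<uint32_t>& indices,
                                            uint32_t restartIndex) {
    std::vector<Triangle> triangles;
    std::vector<uint32_t> strip;  // indices seen since the last restart
    for (uint32_t index : indices) {
        if (index == restartIndex) {
            strip.clear();
            continue;
        }
        strip.push_back(index);
        if (strip.size() >= 3) {
            const std::size_t n = strip.size();
            triangles.push_back({strip[n - 3], strip[n - 2], strip[n - 1]});
        }
    }
    return triangles;
}
```

Four indices produce one triangle (plus a leftover index) as a list but two triangles as a strip, which is where the vertex reuse comes from.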

We intend to draw triangles throughout this tutorial, so we'll stick to the following data for the structure:

VkPipelineInputAssemblyStateCreateInfo inputAssembly{};
inputAssembly.sType = VK_STRUCTURE_TYPE_PIPELINE_INPUT_ASSEMBLY_STATE_CREATE_INFO;
inputAssembly.topology = VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST;
inputAssembly.primitiveRestartEnable = VK_FALSE;

Viewports and scissors

A viewport basically describes the region of the framebuffer that the output will be rendered to. This will almost always be (0, 0) to (width, height) and in this tutorial that will also be the case.

VkViewport viewport{};
viewport.x = 0.0f;
viewport.y = 0.0f;
viewport.width = (float) swapChainExtent.width;
viewport.height = (float) swapChainExtent.height;
viewport.minDepth = 0.0f;
viewport.maxDepth = 1.0f;

Remember that the size of the swap chain and its images may differ from the WIDTH and HEIGHT of the window. The swap chain images will be used as framebuffers later on, so we should stick to their size.

The minDepth and maxDepth values specify the range of depth values to use for the framebuffer. These values must be within the [0.0f, 1.0f] range, but minDepth may be higher than maxDepth. If you aren't doing anything special, then you should stick to the standard values of 0.0f and 1.0f.
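
As a rough illustration of what these six values do, the sketch below maps a normalized device coordinate to framebuffer coordinates and a depth value. The Viewport struct is a hypothetical mirror of the VkViewport fields so that the example compiles without the Vulkan headers, and ndcToFramebuffer is a made-up name; the formulas follow the viewport transformation as I understand it from the specification.

```cpp
#include <array>

// Hypothetical mirror of the VkViewport fields (not the real Vulkan type).
struct Viewport {
    float x, y, width, height, minDepth, maxDepth;
};

// Maps normalized device coordinates to framebuffer coordinates and depth.
// X and Y in [-1, 1] are stretched over the viewport rectangle,
// Z in [0, 1] is mapped onto [minDepth, maxDepth].
std::array<float, 3> ndcToFramebuffer(const Viewport& vp,
                                      float xNdc, float yNdc, float zNdc) {
    const float xf = vp.x + (xNdc + 1.0f) * 0.5f * vp.width;
    const float yf = vp.y + (yNdc + 1.0f) * 0.5f * vp.height;
    const float zf = vp.minDepth + zNdc * (vp.maxDepth - vp.minDepth);
    return {xf, yf, zf};
}
```

For an 800 by 600 viewport starting at (0, 0), the center of normalized device space lands on pixel coordinate (400, 300) and the corner (-1, -1) lands on (0, 0), the top left of the framebuffer.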

While viewports define the transformation from the image to the framebuffer, scissor rectangles define in which regions pixels will actually be stored. Any pixels outside the scissor rectangles will be discarded by the rasterizer. They function like a filter rather than a transformation. The difference is illustrated below. Note that the left scissor rectangle is just one of the many possibilities that would result in that image, as long as it's larger than the viewport.

So if we wanted to draw to the entire framebuffer, we would specify a scissor rectangle that covers it entirely:

VkRect2D scissor{};
scissor.offset = {0, 0};
scissor.extent = swapChainExtent;
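
The filtering behavior of a scissor rectangle boils down to a simple bounds check per pixel. The following sketch shows that check with hypothetical stand-ins for VkOffset2D, VkExtent2D and VkRect2D, so it does not depend on the Vulkan headers; passesScissor is an illustrative name.

```cpp
#include <cstdint>

// Hypothetical mirrors of VkOffset2D, VkExtent2D and VkRect2D.
struct Offset2D { int32_t x, y; };
struct Extent2D { uint32_t width, height; };
struct Rect2D { Offset2D offset; Extent2D extent; };

// Returns true if the pixel at (px, py) lies inside the scissor rectangle.
// The right and bottom edges are exclusive.
bool passesScissor(const Rect2D& scissor, int32_t px, int32_t py) {
    const int64_t left = scissor.offset.x;
    const int64_t top = scissor.offset.y;
    const int64_t right = left + static_cast<int64_t>(scissor.extent.width);
    const int64_t bottom = top + static_cast<int64_t>(scissor.extent.height);
    return px >= left && px < right && py >= top && py < bottom;
}
```

Unlike the viewport, nothing is rescaled here: a scissor rectangle covering only the left half of the framebuffer simply cuts off the right half of whatever was rendered.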

Viewport(s) and scissor rectangle(s) can either be specified as a static part of the pipeline or as a dynamic state set in the command buffer. While the former is more in line with the other states it's often convenient to make viewport and scissor state dynamic as it gives you a lot more flexibility. This is very common and all implementations can handle this dynamic state without a performance penalty.

When opting for dynamic viewport(s) and scissor rectangle(s) you need to enable the respective dynamic states for the pipeline:

std::vector<VkDynamicState> dynamicStates = {
    VK_DYNAMIC_STATE_VIEWPORT,
    VK_DYNAMIC_STATE_SCISSOR
};

VkPipelineDynamicStateCreateInfo dynamicState{};
dynamicState.sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO;
dynamicState.dynamicStateCount = static_cast<uint32_t>(dynamicStates.size());
dynamicState.pDynamicStates = dynamicStates.data();

And then you only need to specify their count at pipeline creation time:

VkPipelineViewportStateCreateInfo viewportState{};
viewportState.sType = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO;
viewportState.viewportCount = 1;
viewportState.scissorCount = 1;

The actual viewport(s) and scissor rectangle(s) will then later be set up at drawing time.

With dynamic state it's even possible to specify different viewports and/or scissor rectangles within a single command buffer.

Without dynamic state, the viewport and scissor rectangle need to be set in the pipeline using the VkPipelineViewportStateCreateInfo struct. This makes the viewport and scissor rectangle for this pipeline immutable. Any changes required to these values would require a new pipeline to be created with the new values.

VkPipelineViewportStateCreateInfo viewportState{};
viewportState.sType = VK_STRUCTURE_TYPE_PIPELINE_VIEWPORT_STATE_CREATE_INFO;
viewportState.viewportCount = 1;
viewportState.pViewports = &viewport;
viewportState.scissorCount = 1;
viewportState.pScissors = &scissor;

Independent of how you set them, it's possible to use multiple viewports and scissor rectangles on some graphics cards, so the structure members reference an array of them. Using multiple requires enabling a GPU feature (see logical device creation).

Rasterizer

The rasterizer takes the geometry that is shaped by the vertices from the vertex shader and turns it into fragments to be colored by the fragment shader. It also performs depth testing, face culling and the scissor test, and it can be configured to output fragments that fill entire polygons or just the edges (wireframe rendering). All this is configured using the VkPipelineRasterizationStateCreateInfo structure.

VkPipelineRasterizationStateCreateInfo rasterizer{};
rasterizer.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO;
rasterizer.depthClampEnable = VK_FALSE;

If depthClampEnable is set to VK_TRUE, then fragments that are beyond the near and far planes are clamped to them as opposed to discarding them. This is useful in some special cases like shadow maps. Using this requires enabling a GPU feature.

rasterizer.rasterizerDiscardEnable = VK_FALSE;

If rasterizerDiscardEnable is set to VK_TRUE, then geometry never passes through the rasterizer stage. This basically disables any output to the framebuffer.

rasterizer.polygonMode = VK_POLYGON_MODE_FILL;

The polygonMode determines how fragments are generated for geometry. The following modes are available:

VK_POLYGON_MODE_FILL: fill the area of the polygon with fragments
VK_POLYGON_MODE_LINE: polygon edges are drawn as lines
VK_POLYGON_MODE_POINT: polygon vertices are drawn as points

Using any mode other than fill requires enabling a GPU feature.

rasterizer.lineWidth = 1.0f;

The lineWidth member is straightforward: it describes the thickness of lines in terms of number of fragments. The maximum line width that is supported depends on the hardware and any line thicker than 1.0f requires you to enable the wideLines GPU feature.

rasterizer.cullMode = VK_CULL_MODE_BACK_BIT;
rasterizer.frontFace = VK_FRONT_FACE_CLOCKWISE;

The cullMode variable determines the type of face culling to use. You can disable culling, cull the front faces, cull the back faces or both. The frontFace variable specifies the vertex order for faces to be considered front-facing and can be clockwise or counterclockwise.
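
To see why clockwise is the right choice for our triangle, you can compute the signed area of its vertices. The sketch below uses hypothetical names (Vec2, signedArea, isClockwise) and follows the sign convention as I read it in the Vulkan specification, where Y points down and a negative area corresponds to clockwise order; treat that convention as an assumption to verify against the specification.

```cpp
#include <array>

// Hypothetical stand-in for a 2D position (not part of the tutorial code).
using Vec2 = std::array<float, 2>;

// Signed area of a triangle in coordinates where Y points down.
// Assumed convention: a negative result means the vertices are in clockwise order.
float signedArea(const Vec2& a, const Vec2& b, const Vec2& c) {
    const float sum = (a[0] * b[1] - b[0] * a[1])
                    + (b[0] * c[1] - c[0] * b[1])
                    + (c[0] * a[1] - a[0] * c[1]);
    return -0.5f * sum;
}

// True if the vertices a, b, c appear in clockwise order on screen.
bool isClockwise(const Vec2& a, const Vec2& b, const Vec2& c) {
    return signedArea(a, b, c) < 0.0f;
}
```

Feeding in the three positions from the vertex shader, (0.0, -0.5), (0.5, 0.5) and (-0.5, 0.5), gives an area of -0.5, so the triangle is clockwise and counts as front-facing with the settings above.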

rasterizer.depthBiasEnable = VK_FALSE;
rasterizer.depthBiasConstantFactor = 0.0f; // Optional
rasterizer.depthBiasClamp = 0.0f; // Optional
rasterizer.depthBiasSlopeFactor = 0.0f; // Optional

The rasterizer can alter the depth values by adding a constant value or biasing them based on a fragment's slope. This is sometimes used for shadow mapping, but we won't be using it. Just set depthBiasEnable to VK_FALSE.

Multisampling

The VkPipelineMultisampleStateCreateInfo struct configures multisampling, which is one of the ways to perform anti-aliasing. It works by combining the fragment shader results of multiple polygons that rasterize to the same pixel. This mainly occurs along edges, which is also where the most noticeable aliasing artifacts occur. Because it doesn't need to run the fragment shader multiple times if only one polygon maps to a pixel, it is significantly less expensive than simply rendering to a higher resolution and then downscaling. Enabling it requires enabling a GPU feature.

VkPipelineMultisampleStateCreateInfo multisampling{};
multisampling.sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO;
multisampling.sampleShadingEnable = VK_FALSE;
multisampling.rasterizationSamples = VK_SAMPLE_COUNT_1_BIT;
multisampling.minSampleShading = 1.0f; // Optional
multisampling.pSampleMask = nullptr; // Optional
multisampling.alphaToCoverageEnable = VK_FALSE; // Optional
multisampling.alphaToOneEnable = VK_FALSE; // Optional

We'll revisit multisampling in a later chapter. For now, let's keep it disabled.

Depth and stencil testing

If you are using a depth and/or stencil buffer, then you also need to configure the depth and stencil tests using VkPipelineDepthStencilStateCreateInfo. We don't have one right now, so we can simply pass a nullptr instead of a pointer to such a struct. We'll get back to it in the depth buffering chapter.

Color blending

After a fragment shader has returned a color, it needs to be combined with the color that is already in the framebuffer. This transformation is known as color blending and there are two ways to do it:

Mix the old and new value to produce a final color
Combine the old and new value using a bitwise operation

There are two types of structs to configure color blending. The first struct, VkPipelineColorBlendAttachmentState contains the configuration per attached framebuffer and the second struct, VkPipelineColorBlendStateCreateInfo contains the global color blending settings. In our case we only have one framebuffer:

VkPipelineColorBlendAttachmentState colorBlendAttachment{};
colorBlendAttachment.colorWriteMask = VK_COLOR_COMPONENT_R_BIT | VK_COLOR_COMPONENT_G_BIT | VK_COLOR_COMPONENT_B_BIT | VK_COLOR_COMPONENT_A_BIT;
colorBlendAttachment.blendEnable = VK_FALSE;
colorBlendAttachment.srcColorBlendFactor = VK_BLEND_FACTOR_ONE; // Optional
colorBlendAttachment.dstColorBlendFactor = VK_BLEND_FACTOR_ZERO; // Optional
colorBlendAttachment.colorBlendOp = VK_BLEND_OP_ADD; // Optional
colorBlendAttachment.srcAlphaBlendFactor = VK_BLEND_FACTOR_ONE; // Optional
colorBlendAttachment.dstAlphaBlendFactor = VK_BLEND_FACTOR_ZERO; // Optional
colorBlendAttachment.alphaBlendOp = VK_BLEND_OP_ADD; // Optional

This per-framebuffer struct allows you to configure the first way of color blending. The operations that will be performed are best demonstrated using the following pseudocode:

if (blendEnable) {
    finalColor.rgb = (srcColorBlendFactor * newColor.rgb) <colorBlendOp> (dstColorBlendFactor * oldColor.rgb);
    finalColor.a = (srcAlphaBlendFactor * newColor.a) <alphaBlendOp> (dstAlphaBlendFactor * oldColor.a);
} else {
    finalColor = newColor;
}

finalColor = finalColor & colorWriteMask;

If blendEnable is set to VK_FALSE, then the new color from the fragment shader is passed through unmodified. Otherwise, the two mixing operations are performed to compute a new color. The resulting color is AND'd with the colorWriteMask to determine which channels are actually passed through.
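
The AND in the pseudocode is shorthand. In practice a channel whose bit is not set in colorWriteMask is left untouched in the framebuffer rather than being zeroed. A minimal C++ sketch of that behavior follows; the Color alias, the kR to kA constants and applyWriteMask are hypothetical names, with the constants chosen to match the bit values of VK_COLOR_COMPONENT_R_BIT through VK_COLOR_COMPONENT_A_BIT.

```cpp
#include <array>
#include <cstdint>

// Hypothetical color type: r, g, b, a.
using Color = std::array<float, 4>;

// Local stand-ins with the same bit values as VK_COLOR_COMPONENT_{R,G,B,A}_BIT.
constexpr uint32_t kR = 0x1, kG = 0x2, kB = 0x4, kA = 0x8;

// Channels whose bit is set take the new value.
// All other channels keep what the framebuffer already holds.
Color applyWriteMask(const Color& framebuffer, const Color& finalColor, uint32_t mask) {
    Color out = framebuffer;
    for (int i = 0; i < 4; ++i) {
        if (mask & (1u << i)) {
            out[i] = finalColor[i];
        }
    }
    return out;
}
```

Passing kR | kG | kB | kA, as we do in colorBlendAttachment, writes all four channels.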

The most common way to use color blending is to implement alpha blending, where we want the new color to be blended with the old color based on its opacity. The finalColor should then be computed as follows:

finalColor.rgb = newAlpha * newColor.rgb + (1 - newAlpha) * oldColor.rgb;
finalColor.a = newAlpha;

This can be accomplished with the following parameters:

colorBlendAttachment.blendEnable = VK_TRUE;
colorBlendAttachment.srcColorBlendFactor = VK_BLEND_FACTOR_SRC_ALPHA;
colorBlendAttachment.dstColorBlendFactor = VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA;
colorBlendAttachment.colorBlendOp = VK_BLEND_OP_ADD;
colorBlendAttachment.srcAlphaBlendFactor = VK_BLEND_FACTOR_ONE;
colorBlendAttachment.dstAlphaBlendFactor = VK_BLEND_FACTOR_ZERO;
colorBlendAttachment.alphaBlendOp = VK_BLEND_OP_ADD;

You can find all of the possible operations in the VkBlendFactor and VkBlendOp enumerations in the specification.
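
To check your understanding of the alpha blending parameters, here is the same computation written out in C++. It is a sketch with hypothetical names (Color, alphaBlend) that hardcodes the factors chosen above instead of reading them from a Vulkan structure.

```cpp
#include <array>

// Hypothetical color type: r, g, b, a.
using Color = std::array<float, 4>;

// Alpha blending with the factors used above:
//   color: SRC_ALPHA * new + ONE_MINUS_SRC_ALPHA * old
//   alpha: ONE * new + ZERO * old
Color alphaBlend(const Color& newColor, const Color& oldColor) {
    const float srcAlpha = newColor[3];
    Color out{};
    for (int i = 0; i < 3; ++i) {
        out[i] = srcAlpha * newColor[i] + (1.0f - srcAlpha) * oldColor[i];
    }
    out[3] = 1.0f * newColor[3] + 0.0f * oldColor[3];
    return out;
}
```

Blending a half-transparent red (1, 0, 0, 0.5) over an opaque blue (0, 0, 1, 1) yields (0.5, 0, 0.5, 0.5): an even mix of the two colors, with the alpha taken from the new fragment.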

The second structure references the array of structures for all of the framebuffers and allows you to set blend constants that you can use as blend factors in the aforementioned calculations.

VkPipelineColorBlendStateCreateInfo colorBlending{};
colorBlending.sType = VK_STRUCTURE_TYPE_PIPELINE_COLOR_BLEND_STATE_CREATE_INFO;
colorBlending.logicOpEnable = VK_FALSE;
colorBlending.logicOp = VK_LOGIC_OP_COPY; // Optional
colorBlending.attachmentCount = 1;
colorBlending.pAttachments = &colorBlendAttachment;
colorBlending.blendConstants[0] = 0.0f; // Optional
colorBlending.blendConstants[1] = 0.0f; // Optional
colorBlending.blendConstants[2] = 0.0f; // Optional
colorBlending.blendConstants[3] = 0.0f; // Optional

If you want to use the second method of blending (bitwise combination), then you should set logicOpEnable to VK_TRUE. The bitwise operation can then be specified in the logicOp field. Note that this will automatically disable the first method, as if you had set blendEnable to VK_FALSE for every attached framebuffer! The colorWriteMask will also be used in this mode to determine which channels in the framebuffer will actually be affected. It is also possible to disable both modes, as we've done here, in which case the fragment colors will be written to the framebuffer unmodified.

Pipeline layout

You can use uniform values in shaders, which are globals similar to dynamic state variables that can be changed at drawing time to alter the behavior of your shaders without having to recreate them. They are commonly used to pass the transformation matrix to the vertex shader, or to create texture samplers in the fragment shader.

These uniform values need to be specified during pipeline creation by creating a VkPipelineLayout object. Even though we won't be using them until a future chapter, we are still required to create an empty pipeline layout.

Create a class member to hold this object, because we'll refer to it from other functions at a later point in time:

VkPipelineLayout pipelineLayout;

And then create the object in the createGraphicsPipeline function:

VkPipelineLayoutCreateInfo pipelineLayoutInfo{};
pipelineLayoutInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
pipelineLayoutInfo.setLayoutCount = 0; // Optional
pipelineLayoutInfo.pSetLayouts = nullptr; // Optional
pipelineLayoutInfo.pushConstantRangeCount = 0; // Optional
pipelineLayoutInfo.pPushConstantRanges = nullptr; // Optional

if (vkCreatePipelineLayout(device, &pipelineLayoutInfo, nullptr, &pipelineLayout) != VK_SUCCESS) {
    throw std::runtime_error("failed to create pipeline layout!");
}

The structure also specifies push constants, which are another way of passing dynamic values to shaders that we may get into in a future chapter. The pipeline layout will be referenced throughout the program's lifetime, so it should be destroyed at the end:

void cleanup() {
    vkDestroyPipelineLayout(device, pipelineLayout, nullptr);
    ...
}

Conclusion

That's it for all of the fixed-function state! It's a lot of work to set all of this up from scratch, but the advantage is that we're now nearly fully aware of everything that is going on in the graphics pipeline! This reduces the chance of running into unexpected behavior because the default state of certain components is not what you expect.

There is, however, one more object to create before we can finally create the graphics pipeline: a render pass.

C++ code / Vertex shader / Fragment shader

Render passes

Setup

Before we can finish creating the pipeline, we need to tell Vulkan about the framebuffer attachments that will be used while rendering. We need to specify how many color and depth buffers there will be, how many samples to use for each of them and how their contents should be handled throughout the rendering operations. All of this information is wrapped in a render pass object, for which we'll create a new createRenderPass function. Call this function from initVulkan before createGraphicsPipeline.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createRenderPass();
    createGraphicsPipeline();
}

...

void createRenderPass() {

}

Attachment description

In our case we'll have just a single color buffer attachment represented by one of the images from the swap chain.

void createRenderPass() {
    VkAttachmentDescription colorAttachment{};
    colorAttachment.format = swapChainImageFormat;
    colorAttachment.samples = VK_SAMPLE_COUNT_1_BIT;
}

The format of the color attachment should match the format of the swap chain images, and we're not doing anything with multisampling yet, so we'll stick to 1 sample.

colorAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
colorAttachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;

The loadOp and storeOp determine what to do with the data in the attachment before rendering and after rendering. We have the following choices for loadOp:

VK_ATTACHMENT_LOAD_OP_LOAD: Preserve the existing contents of the attachment
VK_ATTACHMENT_LOAD_OP_CLEAR: Clear the values to a constant at the start
VK_ATTACHMENT_LOAD_OP_DONT_CARE: Existing contents are undefined; we don't care about them

In our case we're going to use the clear operation to clear the framebuffer to black before drawing a new frame. There are only two possibilities for the storeOp:

VK_ATTACHMENT_STORE_OP_STORE: Rendered contents will be stored in memory and can be read later
VK_ATTACHMENT_STORE_OP_DONT_CARE: Contents of the framebuffer will be undefined after the rendering operation

We're interested in seeing the rendered triangle on the screen, so we're going with the store operation here.

colorAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
colorAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;

The loadOp and storeOp apply to color and depth data, and stencilLoadOp / stencilStoreOp apply to stencil data. Our application won't do anything with the stencil buffer, so the results of loading and storing are irrelevant.

colorAttachment.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
colorAttachment.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;

Textures and framebuffers in Vulkan are represented by VkImage objects with a certain pixel format; however, the layout of the pixels in memory can change based on what you're trying to do with an image.

Some of the most common layouts are:

VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL: Images used as color attachment
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR: Images to be presented in the swap chain
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL: Images to be used as destination for a memory copy operation

We'll discuss this topic in more depth in the texturing chapter, but what's important to know right now is that images need to be transitioned to specific layouts that are suitable for the operation that they're going to be involved in next.

The initialLayout specifies which layout the image will have before the render pass begins. The finalLayout specifies the layout to automatically transition to when the render pass finishes. Using VK_IMAGE_LAYOUT_UNDEFINED for initialLayout means that we don't care what previous layout the image was in. The caveat of this special value is that the contents of the image are not guaranteed to be preserved, but that doesn't matter since we're going to clear it anyway. We want the image to be ready for presentation using the swap chain after rendering, which is why we use VK_IMAGE_LAYOUT_PRESENT_SRC_KHR as finalLayout.
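
To see how initialLayout and loadOp interact, here is a minimal sketch in plain C++ that models an attachment as a single pixel value. It does not call Vulkan: ToyLoadOp, ToyLayout, ToyPixel and contentsAtStart are hypothetical names made up for this illustration, and an empty std::optional stands in for "undefined contents".

```cpp
#include <optional>

// Hypothetical stand-ins, not the real Vulkan enums.
enum class ToyLoadOp { Load, Clear, DontCare };
enum class ToyLayout { Undefined, ColorAttachmentOptimal, PresentSrc };

// A one-pixel "attachment"; an empty optional models undefined contents.
using ToyPixel = std::optional<float>;

// Contents visible in the attachment when the render pass begins.
ToyPixel contentsAtStart(ToyLayout initialLayout, ToyLoadOp loadOp,
                         ToyPixel previous, float clearValue) {
    // Coming from Undefined, the old contents are not guaranteed to survive.
    if (initialLayout == ToyLayout::Undefined) {
        previous = std::nullopt;
    }
    switch (loadOp) {
        case ToyLoadOp::Load:     return previous;              // keep what was there
        case ToyLoadOp::Clear:    return ToyPixel{clearValue};  // overwrite with a constant
        case ToyLoadOp::DontCare: return std::nullopt;          // nothing may be assumed
    }
    return std::nullopt;
}
```

In this model our combination (undefined initial layout plus the clear operation) always yields the clear value, while an undefined initial layout combined with the load operation yields undefined contents, which is the pairing to avoid.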

Subpasses and attachment references

A single render pass can consist of multiple subpasses. Subpasses are subsequent rendering operations that depend on the contents of framebuffers in previous passes, for example a sequence of post-processing effects that are applied one after another. If you group these rendering operations into one render pass, then Vulkan is able to reorder the operations and conserve memory bandwidth for possibly better performance. For our very first triangle, however, we'll stick to a single subpass.

Every subpass references one or more of the attachments that we've described using the structure in the previous sections. These references are themselves VkAttachmentReference structs that look like this:

VkAttachmentReference colorAttachmentRef{};
colorAttachmentRef.attachment = 0;
colorAttachmentRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;

The attachment parameter specifies which attachment to reference by its index in the attachment descriptions array. Our array consists of a single VkAttachmentDescription, so its index is 0. The layout specifies which layout we would like the attachment to have during a subpass that uses this reference. Vulkan will automatically transition the attachment to this layout when the subpass is started. We intend to use the attachment to function as a color buffer and the VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL layout will give us the best performance, as its name implies.

The subpass is described using a VkSubpassDescription structure:

VkSubpassDescription subpass{};
subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;

Vulkan may also support compute subpasses in the future, so we have to be explicit about this being a graphics subpass. Next, we specify the reference to the color attachment:

subpass.colorAttachmentCount = 1;
subpass.pColorAttachments = &colorAttachmentRef;

The index of the attachment in this array is directly referenced from the fragment shader with the layout(location = 0) out vec4 outColor directive!

The following other types of attachments can be referenced by a subpass:

pInputAttachments: Attachments that are read from a shader
pResolveAttachments: Attachments used for multisampling color attachments
pDepthStencilAttachment: Attachment for depth and stencil data
pPreserveAttachments: Attachments that are not used by this subpass, but for which the data must be preserved
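
The two levels of indexing described above (shader output location into the subpass's color attachment array, then attachment index into the render pass's description array) can be sketched without Vulkan. The Toy* structs and resolveOutput below are hypothetical simplifications written for this example, not the real Vulkan structures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the structures discussed above.
struct ToyAttachmentDescription { std::string name; };
struct ToyAttachmentReference   { std::size_t attachment; };  // index into the descriptions

struct ToySubpass {
    // Indexed by the fragment shader's output location.
    std::vector<ToyAttachmentReference> colorAttachments;
};

// Resolve `layout(location = N)` to the attachment description it writes to.
const ToyAttachmentDescription& resolveOutput(
        const std::vector<ToyAttachmentDescription>& descriptions,
        const ToySubpass& subpass,
        std::size_t location) {
    assert(location < subpass.colorAttachments.size());
    const ToyAttachmentReference& ref = subpass.colorAttachments[location];
    assert(ref.attachment < descriptions.size());
    return descriptions[ref.attachment];
}
```

In our render pass both arrays have a single element, so location 0 resolves to reference 0, which resolves to description 0: the swap chain color attachment.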

Render pass

Now that the attachment and a basic subpass referencing it have been described, we can create the render pass itself. Create a new class member variable to hold the VkRenderPass object right above the pipelineLayout variable:

VkRenderPass renderPass;
VkPipelineLayout pipelineLayout;

The render pass object can then be created by filling in the VkRenderPassCreateInfo structure with an array of attachments and subpasses. The VkAttachmentReference objects reference attachments using the indices of this array.

VkRenderPassCreateInfo renderPassInfo{};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
renderPassInfo.attachmentCount = 1;
renderPassInfo.pAttachments = &colorAttachment;
renderPassInfo.subpassCount = 1;
renderPassInfo.pSubpasses = &subpass;

if (vkCreateRenderPass(device, &renderPassInfo, nullptr, &renderPass) != VK_SUCCESS) {
    throw std::runtime_error("failed to create render pass!");
}

Just like the pipeline layout, the render pass will be referenced throughout the program, so it should only be cleaned up at the end:

void cleanup() {
    vkDestroyPipelineLayout(device, pipelineLayout, nullptr);
    vkDestroyRenderPass(device, renderPass, nullptr);
    ...
}

That was a lot of work, but in the next chapter it all comes together to finally create the graphics pipeline object!

Conclusion

We can now combine all of the structures and objects from the previous chapters to create the graphics pipeline! Here are the types of objects we have now, as a quick recap:

Shader stages: the shader modules that define the functionality of the programmable stages of the graphics pipeline
Fixed-function state: all of the structures that define the fixed-function stages of the pipeline, like input assembly, rasterizer, viewport and color blending
Pipeline layout: the uniform and push values referenced by the shader that can be updated at draw time
Render pass: the attachments referenced by the pipeline stages and their usage

All of these combined fully define the functionality of the graphics pipeline, so we can now begin filling in the VkGraphicsPipelineCreateInfo structure at the end of the createGraphicsPipeline function. Do this before the calls to vkDestroyShaderModule, because the shader modules are still needed during pipeline creation.

VkGraphicsPipelineCreateInfo pipelineInfo{};
pipelineInfo.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
pipelineInfo.stageCount = 2;
pipelineInfo.pStages = shaderStages;

We start by referencing the array of VkPipelineShaderStageCreateInfo structs.

pipelineInfo.pVertexInputState = &vertexInputInfo;
pipelineInfo.pInputAssemblyState = &inputAssembly;
pipelineInfo.pViewportState = &viewportState;
pipelineInfo.pRasterizationState = &rasterizer;
pipelineInfo.pMultisampleState = &multisampling;
pipelineInfo.pDepthStencilState = nullptr; // Optional
pipelineInfo.pColorBlendState = &colorBlending;
pipelineInfo.pDynamicState = &dynamicState;

Then we reference all of the structures describing the fixed-function stage.

pipelineInfo.layout = pipelineLayout;

After that comes the pipeline layout, which is a Vulkan handle rather than a struct pointer.

pipelineInfo.renderPass = renderPass;
pipelineInfo.subpass = 0;

And finally we have the reference to the render pass and the index of the subpass where this graphics pipeline will be used. It is also possible to use other render passes with this pipeline instead of this specific instance, but they have to be compatible with renderPass. The requirements for compatibility are described in the Vulkan specification, but we won't be using that feature in this tutorial.

pipelineInfo.basePipelineHandle = VK_NULL_HANDLE; // Optional
pipelineInfo.basePipelineIndex = -1; // Optional

There are actually two more parameters: basePipelineHandle and basePipelineIndex. Vulkan allows you to create a new graphics pipeline by deriving from an existing pipeline. The idea of pipeline derivatives is that it is less expensive to set up pipelines when they have much functionality in common with an existing pipeline and switching between pipelines from the same parent can also be done quicker. You can either specify the handle of an existing pipeline with basePipelineHandle or reference another pipeline that is about to be created by index with basePipelineIndex. Right now there is only a single pipeline, so we'll simply specify a null handle and an invalid index. These values are only used if the VK_PIPELINE_CREATE_DERIVATIVE_BIT flag is also specified in the flags field of VkGraphicsPipelineCreateInfo.

Now prepare for the final step by creating a class member to hold the VkPipeline object:

VkPipeline graphicsPipeline;

And finally create the graphics pipeline:

if (vkCreateGraphicsPipelines(device, VK_NULL_HANDLE, 1, &pipelineInfo, nullptr, &graphicsPipeline) != VK_SUCCESS) {
    throw std::runtime_error("failed to create graphics pipeline!");
}

The vkCreateGraphicsPipelines function actually has more parameters than the usual object creation functions in Vulkan. It is designed to take multiple VkGraphicsPipelineCreateInfo objects and create multiple VkPipeline objects in a single call.

The second parameter, for which we've passed the VK_NULL_HANDLE argument, references an optional VkPipelineCache object. A pipeline cache can be used to store and reuse data relevant to pipeline creation across multiple calls to vkCreateGraphicsPipelines and even across program executions if the cache is stored to a file. This makes it possible to significantly speed up pipeline creation at a later time. We'll get into this in the pipeline cache chapter.

The graphics pipeline is required for all common drawing operations, so it should also only be destroyed at the end of the program:

void cleanup() {
    vkDestroyPipeline(device, graphicsPipeline, nullptr);
    vkDestroyPipelineLayout(device, pipelineLayout, nullptr);
    ...
}

Now run your program to confirm that all this hard work has resulted in a successful pipeline creation! We are already getting quite close to seeing something pop up on the screen. In the next couple of chapters we'll set up the actual framebuffers from the swap chain images and prepare the drawing commands.

Drawing

Framebuffers

We've talked a lot about framebuffers in the past few chapters and we've set up the render pass to expect a single framebuffer with the same format as the swap chain images, but we haven't actually created any yet.

The attachments specified during render pass creation are bound by wrapping them into a VkFramebuffer object. A framebuffer object references all of the VkImageView objects that represent the attachments. In our case that will be only a single one: the color attachment. However, the image that we have to use for the attachment depends on which image the swap chain returns when we retrieve one for presentation. That means that we have to create a framebuffer for all of the images in the swap chain and use the one that corresponds to the retrieved image at drawing time.

To that end, create another std::vector class member to hold the framebuffers:

std::vector<VkFramebuffer> swapChainFramebuffers;

We'll create the objects for this array in a new function createFramebuffers that is called from initVulkan right after creating the graphics pipeline:

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createRenderPass();
    createGraphicsPipeline();
    createFramebuffers();
}

...

void createFramebuffers() {

}

Start by resizing the container to hold all of the framebuffers:

void createFramebuffers() {
    swapChainFramebuffers.resize(swapChainImageViews.size());
}

We'll then iterate through the image views and create framebuffers from them:

for (size_t i = 0; i < swapChainImageViews.size(); i++) {
    VkImageView attachments[] = {
        swapChainImageViews[i]
    };

    VkFramebufferCreateInfo framebufferInfo{};
    framebufferInfo.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
    framebufferInfo.renderPass = renderPass;
    framebufferInfo.attachmentCount = 1;
    framebufferInfo.pAttachments = attachments;
    framebufferInfo.width = swapChainExtent.width;
    framebufferInfo.height = swapChainExtent.height;
    framebufferInfo.layers = 1;

    if (vkCreateFramebuffer(device, &framebufferInfo, nullptr, &swapChainFramebuffers[i]) != VK_SUCCESS) {
        throw std::runtime_error("failed to create framebuffer!");
    }
}

As you can see, creation of framebuffers is quite straightforward. We first need to specify with which renderPass the framebuffer needs to be compatible. You can only use a framebuffer with the render passes that it is compatible with, which roughly means that they use the same number and type of attachments.

The attachmentCount and pAttachments parameters specify the VkImageView objects that should be bound to the respective attachment descriptions in the render pass pAttachments array.

The width and height parameters are self-explanatory and layers refers to the number of layers in image arrays. Our swap chain images are single images, so the number of layers is 1.

We should delete the framebuffers before the image views and render pass that they are based on, but only after we've finished rendering:

void cleanup() {
    for (auto framebuffer : swapChainFramebuffers) {
        vkDestroyFramebuffer(device, framebuffer, nullptr);
    }

    ...
}

We've now reached the milestone where we have all of the objects that are required for rendering. In the next chapter we're going to write the first actual drawing commands.

Command buffers

Commands in Vulkan, like drawing operations and memory transfers, are not executed directly using function calls. You have to record all of the operations you want to perform in command buffer objects. The advantage of this is that when we are ready to tell Vulkan what we want to do, all of the commands are submitted together, and Vulkan can process them more efficiently because they are all available at once. In addition, this allows command recording to happen in multiple threads if so desired.

Command pools

We have to create a command pool before we can create command buffers. Command pools manage the memory that is used to store the buffers and command buffers are allocated from them. Add a new class member to store a VkCommandPool:

VkCommandPool commandPool;

Then create a new function createCommandPool and call it from initVulkan after the framebuffers were created.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createRenderPass();
    createGraphicsPipeline();
    createFramebuffers();
    createCommandPool();
}

...

void createCommandPool() {

}

Command pool creation only takes two parameters:

QueueFamilyIndices queueFamilyIndices = findQueueFamilies(physicalDevice);

VkCommandPoolCreateInfo poolInfo{};
poolInfo.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
poolInfo.flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT;
poolInfo.queueFamilyIndex = queueFamilyIndices.graphicsFamily.value();

There are two possible flags for command pools:

VK_COMMAND_POOL_CREATE_TRANSIENT_BIT: Hint that command buffers are rerecorded with new commands very often (may change memory allocation behavior)
VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT: Allow command buffers to be rerecorded individually; without this flag they all have to be reset together

We will be recording a command buffer every frame, so we want to be able to reset and rerecord over it. Thus, we need to set the VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT flag bit for our command pool.

Command buffers are executed by submitting them on one of the device queues, like the graphics and presentation queues we retrieved. Each command pool can only allocate command buffers that are submitted on a single type of queue. We're going to record commands for drawing, which is why we've chosen the graphics queue family.

if (vkCreateCommandPool(device, &poolInfo, nullptr, &commandPool) != VK_SUCCESS) {
    throw std::runtime_error("failed to create command pool!");
}

Finish creating the command pool using the vkCreateCommandPool function. It doesn't have any special parameters. Commands will be used throughout the program to draw things on the screen, so the pool should only be destroyed at the end:

void cleanup() {
    vkDestroyCommandPool(device, commandPool, nullptr);

    ...
}

Command buffer allocation

We can now start allocating command buffers.

Create a VkCommandBuffer object as a class member. Command buffers will be automatically freed when their command pool is destroyed, so we don't need explicit cleanup.

VkCommandBuffer commandBuffer;

We'll now start working on a createCommandBuffer function to allocate a single command buffer from the command pool.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createRenderPass();
    createGraphicsPipeline();
    createFramebuffers();
    createCommandPool();
    createCommandBuffer();
}

...

void createCommandBuffer() {

}

Command buffers are allocated with the vkAllocateCommandBuffers function, which takes a VkCommandBufferAllocateInfo struct as parameter that specifies the command pool and number of buffers to allocate:

VkCommandBufferAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
allocInfo.commandPool = commandPool;
allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
allocInfo.commandBufferCount = 1;

if (vkAllocateCommandBuffers(device, &allocInfo, &commandBuffer) != VK_SUCCESS) {
    throw std::runtime_error("failed to allocate command buffers!");
}

The level parameter specifies if the allocated command buffers are primary or secondary command buffers.

VK_COMMAND_BUFFER_LEVEL_PRIMARY: Can be submitted to a queue for execution, but cannot be called from other command buffers.
VK_COMMAND_BUFFER_LEVEL_SECONDARY: Cannot be submitted directly, but can be called from primary command buffers.

We won't make use of the secondary command buffer functionality here, but you can imagine that it's helpful to reuse common operations from primary command buffers.

Since we are only allocating one command buffer, the commandBufferCount parameter is just one.

Command buffer recording

We'll now start working on the recordCommandBuffer function that writes the commands we want to execute into a command buffer. The VkCommandBuffer used will be passed in as a parameter, as well as the index of the current swapchain image we want to write to.

void recordCommandBuffer(VkCommandBuffer commandBuffer, uint32_t imageIndex) {

}

We always begin recording a command buffer by calling vkBeginCommandBuffer with a small VkCommandBufferBeginInfo structure as argument that specifies some details about the usage of this specific command buffer.

VkCommandBufferBeginInfo beginInfo{};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = 0; // Optional
beginInfo.pInheritanceInfo = nullptr; // Optional

if (vkBeginCommandBuffer(commandBuffer, &beginInfo) != VK_SUCCESS) {
    throw std::runtime_error("failed to begin recording command buffer!");
}

The flags parameter specifies how we're going to use the command buffer. The following values are available:

VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT: The command buffer will be rerecorded right after executing it once.
VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT: This is a secondary command buffer that will be entirely within a single render pass.
VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT: The command buffer can be resubmitted while it is also already pending execution.

None of these flags are applicable for us right now.

The pInheritanceInfo parameter is only relevant for secondary command buffers. It specifies which state to inherit from the calling primary command buffers.

If the command buffer was already recorded once, then a call to vkBeginCommandBuffer will implicitly reset it. It's not possible to append commands to a buffer at a later time.
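
The recording rules above can be pictured as a small state machine. The following is only a toy model in plain C++, not Vulkan code: ToyCommandBuffer and its members are hypothetical names, and the real life cycle has additional states (such as pending execution) that are left out here.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of a command buffer's life cycle; not the Vulkan API.
class ToyCommandBuffer {
public:
    enum class State { Initial, Recording, Executable };

    // Mirrors vkBeginCommandBuffer: starting a recording discards old commands.
    void begin() {
        if (state_ == State::Recording) {
            throw std::logic_error("begin() called while already recording");
        }
        commands_.clear();  // the implicit reset
        state_ = State::Recording;
    }

    // Mirrors a vkCmd* call: only valid between begin() and end().
    void record(const std::string& command) {
        if (state_ != State::Recording) {
            throw std::logic_error("record() called outside begin()/end()");
        }
        commands_.push_back(command);
    }

    // Mirrors vkEndCommandBuffer: the buffer is now ready to be submitted.
    void end() {
        if (state_ != State::Recording) {
            throw std::logic_error("end() called without begin()");
        }
        state_ = State::Executable;
    }

    State state() const { return state_; }
    const std::vector<std::string>& commands() const { return commands_; }

private:
    State state_ = State::Initial;
    std::vector<std::string> commands_;
};
```

In this model, calling begin() on a buffer that already holds commands empties it, and calling record() after end() throws, which matches the rule that commands cannot be appended later.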

Starting a render pass

Drawing starts by beginning the render pass with vkCmdBeginRenderPass. The render pass is configured using some parameters in a VkRenderPassBeginInfo struct.

VkRenderPassBeginInfo renderPassInfo{};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
renderPassInfo.renderPass = renderPass;
renderPassInfo.framebuffer = swapChainFramebuffers[imageIndex];

The first parameters are the render pass itself and the attachments to bind. We created a framebuffer for each swap chain image where it is specified as a color attachment. Thus we need to bind the framebuffer for the swapchain image we want to draw to. Using the imageIndex parameter which was passed in, we can pick the right framebuffer for the current swapchain image.

renderPassInfo.renderArea.offset = {0, 0};
renderPassInfo.renderArea.extent = swapChainExtent;

The next two parameters define the size of the render area. The render area defines where shader loads and stores will take place. The pixels outside this region will have undefined values. It should match the size of the attachments for best performance.

VkClearValue clearColor = {{{0.0f, 0.0f, 0.0f, 1.0f}}};
renderPassInfo.clearValueCount = 1;
renderPassInfo.pClearValues = &clearColor;

The last two parameters define the clear values to use for VK_ATTACHMENT_LOAD_OP_CLEAR, which we used as load operation for the color attachment. I've defined the clear color to simply be black with 100% opacity.

vkCmdBeginRenderPass(commandBuffer, &renderPassInfo, VK_SUBPASS_CONTENTS_INLINE);

The render pass can now begin. All of the functions that record commands can be recognized by their vkCmd prefix. They all return void, so there will be no error handling until we've finished recording.

The first parameter for every command is always the command buffer to record the command to. The second parameter specifies the details of the render pass we've just provided. The final parameter controls how the drawing commands within the render pass will be provided. It can have one of two values:

VK_SUBPASS_CONTENTS_INLINE: The render pass commands will be embedded in the primary command buffer itself and no secondary command buffers will be executed.
VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS: The render pass commands will be executed from secondary command buffers.

We will not be using secondary command buffers, so we'll go with the first option.

Basic drawing commands

We can now bind the graphics pipeline:

vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, graphicsPipeline);

The second parameter specifies if the pipeline object is a graphics or compute pipeline. We've now told Vulkan which operations to execute in the graphics pipeline and which attachment to use in the fragment shader.

As noted in the fixed functions chapter, we did specify viewport and scissor state for this pipeline to be dynamic. So we need to set them in the command buffer before issuing our draw command:

VkViewport viewport{};
viewport.x = 0.0f;
viewport.y = 0.0f;
viewport.width = static_cast<float>(swapChainExtent.width);
viewport.height = static_cast<float>(swapChainExtent.height);
viewport.minDepth = 0.0f;
viewport.maxDepth = 1.0f;
vkCmdSetViewport(commandBuffer, 0, 1, &viewport);

VkRect2D scissor{};
scissor.offset = {0, 0};
scissor.extent = swapChainExtent;
vkCmdSetScissor(commandBuffer, 0, 1, &scissor);
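
As a reminder of what these two pieces of state do, here is a small sketch of the viewport transformation and the scissor test in plain C++. It follows the viewport formula as I understand it from the Vulkan specification, but the ToyViewport, ToyRect and ToyPoint structs and both functions are hypothetical stand-ins written for this example, and the 800 by 600 size mentioned afterwards is just an assumed window size.

```cpp
#include <cstdint>

// Hypothetical plain structs mirroring the fields used above; not the Vulkan types.
struct ToyViewport { float x, y, width, height, minDepth, maxDepth; };
struct ToyRect     { std::int32_t x, y; std::uint32_t width, height; };
struct ToyPoint    { float x, y, z; };

// Map normalized device coordinates (x and y in [-1, 1], z in [0, 1])
// to framebuffer coordinates.
ToyPoint ndcToFramebuffer(const ToyViewport& vp, const ToyPoint& ndc) {
    return ToyPoint{
        vp.x + (ndc.x + 1.0f) * 0.5f * vp.width,
        vp.y + (ndc.y + 1.0f) * 0.5f * vp.height,
        vp.minDepth + ndc.z * (vp.maxDepth - vp.minDepth)
    };
}

// A fragment survives the scissor test only if its pixel lies inside the rectangle.
bool passesScissor(const ToyRect& scissor, std::int32_t px, std::int32_t py) {
    return px >= scissor.x && py >= scissor.y &&
           px < scissor.x + static_cast<std::int32_t>(scissor.width) &&
           py < scissor.y + static_cast<std::int32_t>(scissor.height);
}
```

With a viewport of 800 by 600 at offset (0, 0), the center of normalized device coordinates maps to (400, 300) and the corner (-1, -1) maps to (0, 0). A scissor rectangle of the same size lets every pixel from (0, 0) up to (799, 599) through.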

Now we are ready to issue the draw command for the triangle:

vkCmdDraw(commandBuffer, 3, 1, 0, 0);

The actual vkCmdDraw function is a bit anticlimactic, but it's so simple because of all the information we specified in advance. It has the following parameters, aside from the command buffer:

vertexCount: Even though we don't have a vertex buffer, we technically still have 3 vertices to draw.
instanceCount: Used for instanced rendering, use 1 if you're not doing that.
firstVertex: Used as an offset into the vertex buffer, defines the lowest value of gl_VertexIndex.
firstInstance: Used as an offset for instanced rendering, defines the lowest value of gl_InstanceIndex.
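
To make the four parameters concrete, the sketch below lists the index pair that each vertex shader invocation would see for a non-indexed draw. vertexInvocations is a hypothetical helper written for this illustration and is not part of Vulkan; the order of the returned list is only a convenient way to enumerate the invocations, not a statement about the order in which a GPU runs them.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Vulkan: lists the
// (gl_VertexIndex, gl_InstanceIndex) pair of every vertex shader invocation.
std::vector<std::pair<std::uint32_t, std::uint32_t>> vertexInvocations(
        std::uint32_t vertexCount, std::uint32_t instanceCount,
        std::uint32_t firstVertex, std::uint32_t firstInstance) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> result;
    for (std::uint32_t i = 0; i < instanceCount; ++i) {
        for (std::uint32_t v = 0; v < vertexCount; ++v) {
            // Both indices start at the "first" offsets rather than at zero.
            result.emplace_back(firstVertex + v, firstInstance + i);
        }
    }
    return result;
}
```

For our call, vertexInvocations(3, 1, 0, 0) gives three invocations with vertex indices 0, 1 and 2, all with instance index 0. These are exactly the values our vertex shader uses to look up its hard-coded positions.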

Finishing up

The render pass can now be ended:

vkCmdEndRenderPass(commandBuffer);

And we've finished recording the command buffer:

if (vkEndCommandBuffer(commandBuffer) != VK_SUCCESS) {
    throw std::runtime_error("failed to record command buffer!");
}

In the next chapter we'll write the code for the main loop, which will acquire an image from the swap chain, record and execute a command buffer, then return the finished image to the swap chain.

Rendering and presentation

This is the chapter where everything is going to come together. We're going to write the drawFrame function that will be called from the main loop to put the triangle on the screen. Let's start by creating the function and call it from mainLoop:

void mainLoop() {
    while (!glfwWindowShouldClose(window)) {
        glfwPollEvents();
        drawFrame();
    }
}

...

void drawFrame() {

}

Outline of a frame

At a high level, rendering a frame in Vulkan consists of a common set of steps:

Wait for the previous frame to finish
Acquire an image from the swap chain
Record a command buffer which draws the scene onto that image
Submit the recorded command buffer
Present the swap chain image

While we will expand the drawing function in later chapters, for now this is the core of our render loop.

Synchronization

A core design philosophy in Vulkan is that synchronization of execution on the GPU is explicit. The order of operations is up to us to define using various synchronization primitives, which tell the driver the order we want things to run in. This means that many Vulkan API calls which start executing work on the GPU are asynchronous: the functions return before the operation has finished.

In this chapter there are a number of events that we need to order explicitly because they happen on the GPU, such as:

Acquire an image from the swap chain
Execute commands that draw onto the acquired image
Present that image to the screen, returning it to the swap chain

Each of these events is set in motion using a single function call, but they are all executed asynchronously. The function calls will return before the operations are actually finished and the order of execution is also undefined. That is unfortunate, because each of the operations depends on the previous one finishing. Thus we need to explore which primitives we can use to achieve the desired ordering.

Semaphores

A semaphore is used to add order between queue operations. Queue operations refer to the work we submit to a queue, either in a command buffer or from within a function as we will see later. Examples of queues are the graphics queue and the presentation queue. Semaphores are used both to order work inside the same queue and between different queues.

There happen to be two kinds of semaphores in Vulkan: binary and timeline. Because only binary semaphores will be used in this tutorial, we will not discuss timeline semaphores. Further mention of the term semaphore exclusively refers to binary semaphores.

A semaphore is either unsignaled or signaled. It begins life as unsignaled. The way we use a semaphore to order queue operations is by providing the same semaphore as a 'signal' semaphore in one queue operation and as a 'wait' semaphore in another queue operation. For example, let's say we have semaphore S and queue operations A and B that we want to execute in order. What we tell Vulkan is that operation A will 'signal' semaphore S when it finishes executing, and operation B will 'wait' on semaphore S before it begins executing. When operation A finishes, semaphore S will be signaled, while operation B won't start until S is signaled. After operation B begins executing, semaphore S is automatically reset back to being unsignaled, allowing it to be used again.

Pseudo-code of what was just described:

VkCommandBuffer A, B = ... // record command buffers
VkSemaphore S = ... // create a semaphore

// enqueue A, signal S when done - starts executing immediately
vkQueueSubmit(work: A, signal: S, wait: None)

// enqueue B, wait on S to start
vkQueueSubmit(work: B, signal: None, wait: S)

Note that in this code snippet, both calls to vkQueueSubmit() return immediately - the waiting only happens on the GPU. The CPU continues running without blocking. To make the CPU wait, we need a different synchronization primitive, which we will now describe.
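
The signal and wait behavior can also be simulated on the CPU. The following toy model is not Vulkan code and says nothing about how a real driver schedules work: ToySemaphore, ToyQueueOp and runAll are hypothetical names, and the "GPU" is just a loop that runs whichever operation is allowed to start.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of binary semaphores; not the Vulkan API.
struct ToySemaphore { bool signaled = false; };

struct ToyQueueOp {
    std::string name;
    ToySemaphore* wait = nullptr;    // must be signaled before the op may start
    ToySemaphore* signal = nullptr;  // signaled when the op finishes
    bool done = false;
};

// Runs every op that can run and returns the order of execution. Pending ops
// are scanned from the back on purpose, to show that in this model the
// semaphores, not the position in the list, decide the order.
std::vector<std::string> runAll(std::vector<ToyQueueOp>& ops) {
    std::vector<std::string> order;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t i = ops.size(); i-- > 0;) {
            ToyQueueOp& op = ops[i];
            if (op.done || (op.wait && !op.wait->signaled)) continue;
            if (op.wait) op.wait->signaled = false;     // starting resets the semaphore
            order.push_back(op.name);                   // the op "executes"
            if (op.signal) op.signal->signaled = true;  // finishing signals
            op.done = true;
            progressed = true;
        }
    }
    return order;
}
```

With A signaling S and B waiting on S, runAll reports A before B even though the loop looks at B first, and S ends up unsignaled again so it could be reused.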

Fences

A fence has a similar purpose, in that it is used to synchronize execution, but it is for ordering the execution on the CPU, otherwise known as the host. Simply put, if the host needs to know when the GPU has finished something, we use a fence.

Similar to semaphores, fences are either in a signaled or unsignaled state. Whenever we submit work to execute, we can attach a fence to that work. When the work is finished, the fence will be signaled. Then we can make the host wait for the fence to be signaled, guaranteeing that the work has finished before the host continues.

A concrete example is taking a screenshot. Say we have already done the necessary work on the GPU. Now we need to transfer the image from the GPU over to the host and then save the memory to a file. We have command buffer A which executes the transfer and fence F. We submit command buffer A with fence F, then immediately tell the host to wait for F to signal. This causes the host to block until command buffer A finishes execution. Thus we are safe to let the host save the file to disk, as the memory transfer has completed.

Pseudo-code for what was described:

VkCommandBuffer A = ... // record command buffer with the transfer
VkFence F = ... // create the fence

// enqueue A, start work immediately, signal F when done
vkQueueSubmit(work: A, fence: F)

vkWaitForFences(F) // blocks execution until A has finished executing

save_screenshot_to_disk() // can't run until the transfer has finished

Unlike the semaphore example, this example does block host execution. This means the host won't do anything except wait until execution has finished. For this case, we had to make sure the transfer was complete before we could save the screenshot to disk.

In general, it is preferable to not block the host unless necessary. We want to feed the GPU and the host with useful work to do. Waiting on fences to signal is not useful work. Thus we prefer semaphores, or other synchronization primitives not yet covered, to synchronize our work.

Fences must be reset manually to put them back into the unsignaled state. This is because fences are used to control the execution of the host, and so the host gets to decide when to reset the fence. Contrast this with semaphores, which are used to order work on the GPU without the host being involved.

In summary, semaphores are used to specify the execution order of operations on the GPU while fences are used to keep the CPU and GPU in sync with each other.

What to choose?

We have two synchronization primitives to use and conveniently two places to apply synchronization: swapchain operations and waiting for the previous frame to finish. We want to use semaphores for swapchain operations because they happen on the GPU, thus we don't want to make the host wait around if we can help it. For waiting on the previous frame to finish, we want to use fences for the opposite reason, because we need the host to wait. This is so we don't draw more than one frame at a time. Because we re-record the command buffer every frame, we cannot record the next frame's work to the command buffer until the current frame has finished executing, as we don't want to overwrite the current contents of the command buffer while the GPU is using it.

Creating the synchronization objects

We'll need one semaphore to signal that an image has been acquired from the swapchain and is ready for rendering, another one to signal that rendering has finished and presentation can happen, and a fence to make sure only one frame is rendering at a time.

Create three class members to store these semaphore objects and fence object:

VkSemaphore imageAvailableSemaphore;
VkSemaphore renderFinishedSemaphore;
VkFence inFlightFence;

To create the semaphores, we'll add the last create function for this part of the tutorial: createSyncObjects:

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createRenderPass();
    createGraphicsPipeline();
    createFramebuffers();
    createCommandPool();
    createCommandBuffer();
    createSyncObjects();
}

...

void createSyncObjects() {

}

Creating semaphores requires filling in the VkSemaphoreCreateInfo, but in the current version of the API it doesn't actually have any required fields besides sType:

void createSyncObjects() {
    VkSemaphoreCreateInfo semaphoreInfo{};
    semaphoreInfo.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO;
}

Future versions of the Vulkan API or extensions may add functionality for the flags and pNext parameters like it does for the other structures.

Creating a fence requires filling in the VkFenceCreateInfo:

VkFenceCreateInfo fenceInfo{};
fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;

Creating the semaphores and fence follows the familiar pattern with vkCreateSemaphore & vkCreateFence:

if (vkCreateSemaphore(device, &semaphoreInfo, nullptr, &imageAvailableSemaphore) != VK_SUCCESS ||
    vkCreateSemaphore(device, &semaphoreInfo, nullptr, &renderFinishedSemaphore) != VK_SUCCESS ||
    vkCreateFence(device, &fenceInfo, nullptr, &inFlightFence) != VK_SUCCESS) {
    throw std::runtime_error("failed to create semaphores!");
}

The semaphores and fence should be cleaned up at the end of the program, when all commands have finished and no more synchronization is necessary:

void cleanup() {
    vkDestroySemaphore(device, imageAvailableSemaphore, nullptr);
    vkDestroySemaphore(device, renderFinishedSemaphore, nullptr);
    vkDestroyFence(device, inFlightFence, nullptr);

    ...
}

Onto the main drawing function!

Waiting for the previous frame

At the start of the frame, we want to wait until the previous frame has finished, so that the command buffer and semaphores are available to use. To do that, we call vkWaitForFences:

void drawFrame() {
    vkWaitForFences(device, 1, &inFlightFence, VK_TRUE, UINT64_MAX);
}

The vkWaitForFences function takes an array of fences and waits on the host for either any or all of the fences to be signaled before returning. The VK_TRUE we pass here indicates that we want to wait for all fences, but in the case of a single one it doesn't matter. This function also has a timeout parameter that we set to the maximum value of a 64 bit unsigned integer, UINT64_MAX, which effectively disables the timeout.
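The difference between waiting for all fences and waiting for any fence can be sketched with a small toy predicate. This is only an illustration of the flag's meaning, assuming each fence is reduced to a boolean; toyWaitSatisfied is a hypothetical helper, not a Vulkan function, and it ignores the timeout entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model of the waitAll flag. Each entry says whether that fence is
// currently signaled. Returns true when the wait condition is met and the
// host would be allowed to continue.
bool toyWaitSatisfied(const std::vector<bool>& fenceSignaled, bool waitAll) {
    auto isSignaled = [](bool s) { return s; };
    if (waitAll) {
        // Corresponds to passing VK_TRUE: every fence must be signaled.
        return std::all_of(fenceSignaled.begin(), fenceSignaled.end(), isSignaled);
    }
    // Corresponds to passing VK_FALSE: one signaled fence is enough.
    return std::any_of(fenceSignaled.begin(), fenceSignaled.end(), isSignaled);
}
```

With one signaled and one unsignaled fence, the "all" variant keeps waiting while the "any" variant is satisfied. With a single fence, as in drawFrame, both variants agree.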

After waiting, we need to manually reset the fence to the unsignaled state with the vkResetFences call:

    vkResetFences(device, 1, &inFlightFence);

Before we can proceed, there is a slight hiccup in our design. On the first frame we call drawFrame(), which immediately waits on inFlightFence to be signaled. inFlightFence is only signaled after a frame has finished rendering, yet since this is the first frame, there are no previous frames in which to signal the fence! Thus vkWaitForFences() blocks indefinitely, waiting on something which will never happen.

Of the many solutions to this dilemma, there is a clever workaround built into the API. Create the fence in the signaled state, so that the first call to vkWaitForFences() returns immediately since the fence is already signaled.

To do this, we add the VK_FENCE_CREATE_SIGNALED_BIT flag to the VkFenceCreateInfo:

void createSyncObjects() {
    ...

    VkFenceCreateInfo fenceInfo{};
    fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
    fenceInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT;

    ...
}
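To see why the initial fence state matters, here is a minimal sketch of the frame loop that reduces the fence to a boolean. It assumes a toy GPU that finishes its work instantly; ToyFence and toySimulateFrames are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Toy stand-in for a fence; not a Vulkan type.
struct ToyFence {
    bool signaled;
};

// Runs up to `frames` iterations of a simplified drawFrame and returns how
// many frames completed before the host would have blocked forever.
int toySimulateFrames(bool createSignaled, int frames) {
    ToyFence fence{createSignaled};
    int completed = 0;
    for (int i = 0; i < frames; ++i) {
        if (!fence.signaled) {
            break; // the fence wait would never return: nothing will signal it
        }
        fence.signaled = false; // manual reset, as done with vkResetFences
        // Work is "submitted" here; the toy GPU finishes at once and signals.
        fence.signaled = true;
        ++completed;
    }
    return completed;
}
```

In this model, toySimulateFrames(false, 3) returns 0 because the very first wait stalls, while toySimulateFrames(true, 3) returns 3.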

Acquiring an image from the swap chain

The next thing we need to do in the drawFrame function is acquire an image from the swap chain. Recall that the swap chain is an extension feature, so we must use a function with the vk*KHR naming convention:

void drawFrame() {
    ...

    uint32_t imageIndex;
    vkAcquireNextImageKHR(device, swapChain, UINT64_MAX, imageAvailableSemaphore, VK_NULL_HANDLE, &imageIndex);
}

The first two parameters of vkAcquireNextImageKHR are the logical device and the swap chain from which we wish to acquire an image. The third parameter specifies a timeout in nanoseconds for an image to become available. Using the maximum value of a 64 bit unsigned integer means we effectively disable the timeout.

The next two parameters specify synchronization objects that are to be signaled when the presentation engine is finished using the image. That's the point in time where we can start drawing to it. It is possible to specify a semaphore, fence or both. We're going to use our imageAvailableSemaphore for that purpose here.

The last parameter specifies a variable to output the index of the swap chain image that has become available. The index refers to the VkImage in our swapChainImages array. We're going to use that index to pick the VkFramebuffer.

Recording the command buffer

With the imageIndex specifying the swap chain image to use in hand, we can now record the command buffer. First, we call vkResetCommandBuffer on the command buffer to make sure it is able to be recorded.

vkResetCommandBuffer(commandBuffer, 0);

The second parameter of vkResetCommandBuffer is a VkCommandBufferResetFlagBits flag. Since we don't want to do anything special, we leave it as 0.

Now call the function recordCommandBuffer to record the commands we want.

recordCommandBuffer(commandBuffer, imageIndex);

With a fully recorded command buffer, we can now submit it.

Submitting the command buffer

Queue submission and synchronization are configured through parameters in the VkSubmitInfo structure.

VkSubmitInfo submitInfo{};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;

VkSemaphore waitSemaphores[] = {imageAvailableSemaphore};
VkPipelineStageFlags waitStages[] = {VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT};
submitInfo.waitSemaphoreCount = 1;
submitInfo.pWaitSemaphores = waitSemaphores;
submitInfo.pWaitDstStageMask = waitStages;

The first three parameters specify which semaphores to wait on before execution begins and in which stage(s) of the pipeline to wait. We want to hold off on writing colors to the image until it's available, so we're specifying the stage of the graphics pipeline that writes to the color attachment. That means that theoretically the implementation can already start executing our vertex shader and such while the image is not yet available. Each entry in the waitStages array corresponds to the semaphore with the same index in pWaitSemaphores.

submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffer;

The next two parameters specify which command buffers to actually submit for execution. We simply submit the single command buffer we have.

VkSemaphore signalSemaphores[] = {renderFinishedSemaphore};
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = signalSemaphores;

The signalSemaphoreCount and pSignalSemaphores parameters specify which semaphores to signal once the command buffer(s) have finished execution. In our case we're using the renderFinishedSemaphore for that purpose.

if (vkQueueSubmit(graphicsQueue, 1, &submitInfo, inFlightFence) != VK_SUCCESS) {
    throw std::runtime_error("failed to submit draw command buffer!");
}

We can now submit the command buffer to the graphics queue using vkQueueSubmit. The function takes an array of VkSubmitInfo structures as argument for efficiency when the workload is much larger. The last parameter references an optional fence that will be signaled when the command buffers finish execution. This allows us to know when it is safe for the command buffer to be reused, thus we want to give it inFlightFence. Now on the next frame, the CPU will wait for this command buffer to finish executing before it records new commands into it.

Subpass dependencies

Remember that the subpasses in a render pass automatically take care of image layout transitions. These transitions are controlled by subpass dependencies, which specify memory and execution dependencies between subpasses. We have only a single subpass right now, but the operations right before and right after this subpass also count as implicit "subpasses".

There are two built-in dependencies that take care of the transition at the start of the render pass and at the end of the render pass, but the former does not occur at the right time. It assumes that the transition occurs at the start of the pipeline, but we haven't acquired the image yet at that point! There are two ways to deal with this problem. We could change the waitStages for the imageAvailableSemaphore to VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT to ensure that the render passes don't begin until the image is available, or we can make the render pass wait for the VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT stage. I've decided to go with the second option here, because it's a good excuse to have a look at subpass dependencies and how they work.

Subpass dependencies are specified in VkSubpassDependency structs. Go to the createRenderPass function and add one:

VkSubpassDependency dependency{};
dependency.srcSubpass = VK_SUBPASS_EXTERNAL;
dependency.dstSubpass = 0;

The first two fields specify the indices of the dependency and the dependent subpass. The special value VK_SUBPASS_EXTERNAL refers to the implicit subpass before or after the render pass depending on whether it is specified in srcSubpass or dstSubpass. The index 0 refers to our subpass, which is the first and only one. The dstSubpass must never be lower than srcSubpass, to prevent cycles in the dependency graph (unless one of the subpasses is VK_SUBPASS_EXTERNAL).

dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.srcAccessMask = 0;

The next two fields specify the operations to wait on and the stages in which these operations occur. We need to wait for the swap chain to finish reading from the image before we can access it. This can be accomplished by waiting on the color attachment output stage itself.

dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;

The operations that should wait on this are in the color attachment stage and involve the writing of the color attachment. These settings will prevent the transition from happening until it's actually necessary (and allowed): when we want to start writing colors to it.

renderPassInfo.dependencyCount = 1;
renderPassInfo.pDependencies = &dependency;

The VkRenderPassCreateInfo struct has two fields to specify an array of dependencies.

Presentation

The last step of drawing a frame is submitting the result back to the swap chain to have it eventually show up on the screen. Presentation is configured through a VkPresentInfoKHR structure at the end of the drawFrame function.

VkPresentInfoKHR presentInfo{};
presentInfo.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;

presentInfo.waitSemaphoreCount = 1;
presentInfo.pWaitSemaphores = signalSemaphores;

The first two parameters specify which semaphores to wait on before presentation can happen, just like VkSubmitInfo. Since we want to wait for the command buffer to finish execution, and thus for our triangle to be drawn, we wait on the semaphores that the submission signals, which is why we reuse signalSemaphores.

VkSwapchainKHR swapChains[] = {swapChain};
presentInfo.swapchainCount = 1;
presentInfo.pSwapchains = swapChains;
presentInfo.pImageIndices = &imageIndex;

The next two parameters specify the swap chains to present images to and the index of the image for each swap chain. This will almost always be a single one.

presentInfo.pResults = nullptr; // Optional

There is one last optional parameter called pResults. It allows you to specify an array of VkResult values to check for every individual swap chain if presentation was successful. It's not necessary if you're only using a single swap chain, because you can simply use the return value of the present function.

vkQueuePresentKHR(presentQueue, &presentInfo);

The vkQueuePresentKHR function submits the request to present an image to the swap chain. We'll add error handling for both vkAcquireNextImageKHR and vkQueuePresentKHR in the next chapter, because their failure does not necessarily mean that the program should terminate, unlike the functions we've seen so far.

If you did everything correctly up to this point, then you should now see something resembling the following when you run your program:

This colored triangle may look a bit different from the one you're used to seeing in graphics tutorials. That's because this tutorial lets the shader interpolate in linear color space and converts to sRGB color space afterwards. See this blog post for a discussion of the difference.

Yay! Unfortunately, you'll see that when validation layers are enabled, the program crashes as soon as you close it. The messages printed to the terminal from debugCallback tell us why:

Remember that all of the operations in drawFrame are asynchronous. That means that when we exit the loop in mainLoop, drawing and presentation operations may still be going on. Cleaning up resources while that is happening is a bad idea.

To fix that problem, we should wait for the logical device to finish operations before exiting mainLoop and destroying the window:

void mainLoop() {
    while (!glfwWindowShouldClose(window)) {
        glfwPollEvents();
        drawFrame();
    }

    vkDeviceWaitIdle(device);
}

You can also wait for operations in a specific command queue to be finished with vkQueueWaitIdle. These functions can be used as a very rudimentary way to perform synchronization. You'll see that the program now exits without problems when closing the window.

Conclusion

A little over 900 lines of code later, we've finally gotten to the stage of seeing something pop up on the screen! Bootstrapping a Vulkan program is definitely a lot of work, but the take-away message is that Vulkan gives you an immense amount of control through its explicitness. I recommend taking some time now to reread the code and build a mental model of the purpose of all of the Vulkan objects in the program and how they relate to each other. We'll be building on top of that knowledge to extend the functionality of the program from this point on.

The next chapter will expand the render loop to handle multiple frames in flight.

C++ code / Vertex shader / Fragment shader

Frames in flight

Right now our render loop has one glaring flaw. We are required to wait on the previous frame to finish before we can start rendering the next, which results in unnecessary idling of the host.

The way to fix this is to allow multiple frames to be in-flight at once, that is to say, allow the rendering of one frame to not interfere with the recording of the next. How do we do this? Any resource that is accessed and modified during rendering must be duplicated. Thus, we need multiple command buffers, semaphores, and fences. In later chapters we will also add multiple instances of other resources, so we will see this concept reappear.

Start by adding a constant at the top of the program that defines how many frames should be processed concurrently:

const int MAX_FRAMES_IN_FLIGHT = 2;

We choose the number 2 because we don't want the CPU to get too far ahead of the GPU. With 2 frames in flight, the CPU and the GPU can be working on their own tasks at the same time. If the CPU finishes early, it will wait till the GPU finishes rendering before submitting more work. With 3 or more frames in flight, the CPU could get ahead of the GPU, adding frames of latency. Generally, extra latency isn't desired. But giving the application control over the number of frames in flight is another example of Vulkan being explicit.
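The effect of the frame count can be illustrated with a rough timeline model. This is only a sketch under strong assumptions: every frame costs the same fixed time to record on the CPU and to render on the GPU, and those costs are made-up inputs rather than measurements. toyTotalTime is a hypothetical helper, not part of the tutorial's program.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy timeline model in arbitrary time units. Returns the time at which the
// GPU finishes rendering the last of frameCount frames.
int toyTotalTime(int frameCount, int framesInFlight, int cpuCost, int gpuCost) {
    std::vector<int> gpuEnd(frameCount, 0);
    int cpuFree = 0; // when the host finished recording the previous frame
    int gpuFree = 0; // when the GPU finished rendering the previous frame
    for (int i = 0; i < frameCount; ++i) {
        // The fence wait: the host may not reuse this frame slot until the
        // frame that last used it (framesInFlight frames ago) has finished.
        int slotFree = (i >= framesInFlight) ? gpuEnd[i - framesInFlight] : 0;
        int cpuStart = std::max(cpuFree, slotFree);
        cpuFree = cpuStart + cpuCost;
        // The GPU starts once recording is done and its previous frame is finished.
        int gpuStart = std::max(cpuFree, gpuFree);
        gpuFree = gpuStart + gpuCost;
        gpuEnd[i] = gpuFree;
    }
    return gpuFree;
}
```

With a recording cost of 2 and a rendering cost of 3, three frames take 15 time units with one frame in flight but only 11 with two, because the host records the next frame while the GPU is still busy with the current one.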

Each frame should have its own command buffer, set of semaphores, and fence. Rename and then change them to be std::vectors of the objects:

std::vector<VkCommandBuffer> commandBuffers;

...

std::vector<VkSemaphore> imageAvailableSemaphores;
std::vector<VkSemaphore> renderFinishedSemaphores;
std::vector<VkFence> inFlightFences;

Then we need to create multiple command buffers. Rename createCommandBuffer to createCommandBuffers. Next we need to resize the command buffers vector to the size of MAX_FRAMES_IN_FLIGHT, alter the VkCommandBufferAllocateInfo to contain that many command buffers, and then change the destination to our vector of command buffers:

void createCommandBuffers() {
    commandBuffers.resize(MAX_FRAMES_IN_FLIGHT);
    ...
    allocInfo.commandBufferCount = (uint32_t) commandBuffers.size();

    if (vkAllocateCommandBuffers(device, &allocInfo, commandBuffers.data()) != VK_SUCCESS) {
        throw std::runtime_error("failed to allocate command buffers!");
    }
}

The createSyncObjects function should be changed to create all of the objects:

void createSyncObjects() {
    imageAvailableSemaphores.resize(MAX_FRAMES_IN_FLIGHT);
    renderFinishedSemaphores.resize(MAX_FRAMES_IN_FLIGHT);
    inFlightFences.resize(MAX_FRAMES_IN_FLIGHT);

    VkSemaphoreCreateInfo semaphoreInfo{};
    semaphoreInfo.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO;

    VkFenceCreateInfo fenceInfo{};
    fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
    fenceInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT;

    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        if (vkCreateSemaphore(device, &semaphoreInfo, nullptr, &imageAvailableSemaphores[i]) != VK_SUCCESS ||
            vkCreateSemaphore(device, &semaphoreInfo, nullptr, &renderFinishedSemaphores[i]) != VK_SUCCESS ||
            vkCreateFence(device, &fenceInfo, nullptr, &inFlightFences[i]) != VK_SUCCESS) {

            throw std::runtime_error("failed to create synchronization objects for a frame!");
        }
    }
}

Similarly, they should also all be cleaned up:

void cleanup() {
    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        vkDestroySemaphore(device, renderFinishedSemaphores[i], nullptr);
        vkDestroySemaphore(device, imageAvailableSemaphores[i], nullptr);
        vkDestroyFence(device, inFlightFences[i], nullptr);
    }

    ...
}

Remember, because command buffers are freed for us when we free the command pool, there is nothing extra to do for command buffer cleanup.

To use the right objects every frame, we need to keep track of the current frame. We will use a frame index for that purpose:

uint32_t currentFrame = 0;

The drawFrame function can now be modified to use the right objects:

void drawFrame() {
    vkWaitForFences(device, 1, &inFlightFences[currentFrame], VK_TRUE, UINT64_MAX);
    vkResetFences(device, 1, &inFlightFences[currentFrame]);

    vkAcquireNextImageKHR(device, swapChain, UINT64_MAX, imageAvailableSemaphores[currentFrame], VK_NULL_HANDLE, &imageIndex);

    ...

    vkResetCommandBuffer(commandBuffers[currentFrame], 0);
    recordCommandBuffer(commandBuffers[currentFrame], imageIndex);

    ...

    submitInfo.pCommandBuffers = &commandBuffers[currentFrame];

    ...

    VkSemaphore waitSemaphores[] = {imageAvailableSemaphores[currentFrame]};

    ...

    VkSemaphore signalSemaphores[] = {renderFinishedSemaphores[currentFrame]};

    ...

    if (vkQueueSubmit(graphicsQueue, 1, &submitInfo, inFlightFences[currentFrame]) != VK_SUCCESS) {
        throw std::runtime_error("failed to submit draw command buffer!");
    }
}

Of course, we shouldn't forget to advance to the next frame every time:

void drawFrame() {
    ...

    currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;
}

By using the modulo (%) operator, we ensure that the frame index loops around after every MAX_FRAMES_IN_FLIGHT enqueued frames.
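As a quick check of the wrap-around, the following sketch lists which frame slot each successive drawFrame call would use. toyFrameSlots is a hypothetical helper written for this illustration, and the constant is declared as an unsigned type here only to keep the arithmetic free of mixed signedness.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

const std::uint32_t MAX_FRAMES_IN_FLIGHT = 2;

// Returns the frame slot used by each of the first frameCount drawFrame calls.
std::vector<std::uint32_t> toyFrameSlots(int frameCount) {
    std::vector<std::uint32_t> slots;
    std::uint32_t currentFrame = 0;
    for (int i = 0; i < frameCount; ++i) {
        slots.push_back(currentFrame);
        // Same update as at the end of drawFrame.
        currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;
    }
    return slots;
}
```

For five frames the slots are 0, 1, 0, 1, 0, so each set of command buffer, semaphores and fence is reused every second frame.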

We've now implemented all the needed synchronization to ensure that there are no more than MAX_FRAMES_IN_FLIGHT frames of work enqueued and that these frames are not stepping over each other. Note that it is fine for other parts of the code, like the final cleanup, to rely on rougher synchronization like vkDeviceWaitIdle. You should decide on which approach to use based on performance requirements.

To learn more about synchronization through examples, have a look at this extensive overview by Khronos.

In the next chapter we'll deal with one more small thing that is required for a well-behaved Vulkan program.

Swap chain recreation

Introduction

The application we have now successfully draws a triangle, but there are some circumstances that it isn't handling properly yet. It is possible for the window surface to change such that the swap chain is no longer compatible with it. One of the reasons that could cause this to happen is the size of the window changing. We have to catch these events and recreate the swap chain.

Recreating the swap chain

Create a new recreateSwapChain function that calls createSwapChain and all of the creation functions for the objects that depend on the swap chain or the window size.

void recreateSwapChain() {
    vkDeviceWaitIdle(device);

    createSwapChain();
    createImageViews();
    createFramebuffers();
}

We first call vkDeviceWaitIdle, because just like in the last chapter, we shouldn't touch resources that may still be in use. Obviously, we'll have to recreate the swap chain itself. The image views need to be recreated because they are based directly on the swap chain images. Finally, the framebuffers directly depend on the swap chain images, and thus must be recreated as well.

To make sure that the old versions of these objects are cleaned up before recreating them, we should move some of the cleanup code to a separate function that we can call from the recreateSwapChain function. Let's call it cleanupSwapChain:

void cleanupSwapChain() {

}

void recreateSwapChain() {
    vkDeviceWaitIdle(device);

    cleanupSwapChain();

    createSwapChain();
    createImageViews();
    createFramebuffers();
}

Note that we don't recreate the render pass here for simplicity. In theory it can be possible for the swap chain image format to change during an application's lifetime, e.g. when moving a window from a standard range to a high dynamic range monitor. This may require the application to recreate the render pass to make sure the change between dynamic ranges is properly reflected.

We'll move the cleanup code of all objects that are recreated as part of a swap chain refresh from cleanup to cleanupSwapChain:

void cleanupSwapChain() {
    for (auto framebuffer : swapChainFramebuffers) {
        vkDestroyFramebuffer(device, framebuffer, nullptr);
    }

    for (auto imageView : swapChainImageViews) {
        vkDestroyImageView(device, imageView, nullptr);
    }

    vkDestroySwapchainKHR(device, swapChain, nullptr);
}

void cleanup() {
    cleanupSwapChain();

    vkDestroyPipeline(device, graphicsPipeline, nullptr);
    vkDestroyPipelineLayout(device, pipelineLayout, nullptr);

    vkDestroyRenderPass(device, renderPass, nullptr);

    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        vkDestroySemaphore(device, renderFinishedSemaphores[i], nullptr);
        vkDestroySemaphore(device, imageAvailableSemaphores[i], nullptr);
        vkDestroyFence(device, inFlightFences[i], nullptr);
    }

    vkDestroyCommandPool(device, commandPool, nullptr);

    vkDestroyDevice(device, nullptr);

    if (enableValidationLayers) {
        DestroyDebugUtilsMessengerEXT(instance, debugMessenger, nullptr);
    }

    vkDestroySurfaceKHR(instance, surface, nullptr);
    vkDestroyInstance(instance, nullptr);

    glfwDestroyWindow(window);

    glfwTerminate();
}

Note that in chooseSwapExtent we already query the new window resolution to make sure that the swap chain images have the (new) right size, so there's no need to modify chooseSwapExtent (remember that we already had to use glfwGetFramebufferSize to get the resolution of the surface in pixels when creating the swap chain).

That's all it takes to recreate the swap chain! However, the disadvantage of this approach is that we need to stop all rendering before creating the new swap chain. It is possible to create a new swap chain while drawing commands on an image from the old swap chain are still in-flight. You need to pass the previous swap chain to the oldSwapchain field in the VkSwapchainCreateInfoKHR struct and destroy the old swap chain as soon as you've finished using it.

Suboptimal or out-of-date swap chain

Now we just need to figure out when swap chain recreation is necessary and call our new recreateSwapChain function. Luckily, Vulkan will usually just tell us that the swap chain is no longer adequate during presentation. The vkAcquireNextImageKHR and vkQueuePresentKHR functions can return the following special values to indicate this.

VK_ERROR_OUT_OF_DATE_KHR: The swap chain has become incompatible with the surface and can no longer be used for rendering. Usually happens after a window resize.
VK_SUBOPTIMAL_KHR: The swap chain can still be used to successfully present to the surface, but the surface properties are no longer matched exactly.

VkResult result = vkAcquireNextImageKHR(device, swapChain, UINT64_MAX, imageAvailableSemaphores[currentFrame], VK_NULL_HANDLE, &imageIndex);

if (result == VK_ERROR_OUT_OF_DATE_KHR) {
    recreateSwapChain();
    return;
} else if (result != VK_SUCCESS && result != VK_SUBOPTIMAL_KHR) {
    throw std::runtime_error("failed to acquire swap chain image!");
}

If the swap chain turns out to be out of date when attempting to acquire an image, then it is no longer possible to present to it. Therefore we should immediately recreate the swap chain and try again in the next drawFrame call.

You could also decide to do that if the swap chain is suboptimal, but I've chosen to proceed anyway in that case because we've already acquired an image. Both VK_SUCCESS and VK_SUBOPTIMAL_KHR are considered "success" return codes.

result = vkQueuePresentKHR(presentQueue, &presentInfo);

if (result == VK_ERROR_OUT_OF_DATE_KHR || result == VK_SUBOPTIMAL_KHR) {
    recreateSwapChain();
} else if (result != VK_SUCCESS) {
    throw std::runtime_error("failed to present swap chain image!");
}

currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;

The vkQueuePresentKHR function returns the same values with the same meaning. In this case we will also recreate the swap chain if it is suboptimal, because we want the best possible result.
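The two decision rules differ only in how they treat a suboptimal swap chain, which is easy to miss when reading the snippets separately. The sketch below restates them side by side; ToyResult and ToyAction are stand-in enums invented for this illustration, not the real VkResult values, so it compiles without the Vulkan headers.

```cpp
#include <cassert>

// Stand-ins for the relevant result codes; not the real Vulkan enum.
enum class ToyResult { Success, Suboptimal, OutOfDate, OtherError };
enum class ToyAction { Continue, RecreateSwapChain, Fail };

// After acquiring an image: a suboptimal swap chain is tolerated because an
// image has already been acquired.
ToyAction toyAfterAcquire(ToyResult r) {
    if (r == ToyResult::OutOfDate) return ToyAction::RecreateSwapChain;
    if (r != ToyResult::Success && r != ToyResult::Suboptimal) return ToyAction::Fail;
    return ToyAction::Continue;
}

// After presenting: a suboptimal swap chain is recreated as well, to get the
// best possible result.
ToyAction toyAfterPresent(ToyResult r) {
    if (r == ToyResult::OutOfDate || r == ToyResult::Suboptimal) {
        return ToyAction::RecreateSwapChain;
    }
    if (r != ToyResult::Success) return ToyAction::Fail;
    return ToyAction::Continue;
}
```

Here toyAfterAcquire(ToyResult::Suboptimal) yields Continue, whereas toyAfterPresent(ToyResult::Suboptimal) yields RecreateSwapChain.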

Fixing a deadlock

If we try to run the code now, it is possible to encounter a deadlock. Debugging the code, we find that the application reaches vkWaitForFences but never continues past it. This is because when vkAcquireNextImageKHR returns VK_ERROR_OUT_OF_DATE_KHR, we recreate the swapchain and then return from drawFrame. But before that happens, the current frame's fence was waited upon and reset. Since we return immediately, no work is submitted for execution and the fence will never be signaled, causing vkWaitForFences to halt forever.

Thankfully, there is a simple fix: delay resetting the fence until after we know for sure we will be submitting work with it. Thus, if we return early, the fence is still signaled and vkWaitForFences won't deadlock the next time we use the same fence object.

The beginning of drawFrame should now look like this:

vkWaitForFences(device, 1, &inFlightFences[currentFrame], VK_TRUE, UINT64_MAX);

uint32_t imageIndex;
VkResult result = vkAcquireNextImageKHR(device, swapChain, UINT64_MAX, imageAvailableSemaphores[currentFrame], VK_NULL_HANDLE, &imageIndex);

if (result == VK_ERROR_OUT_OF_DATE_KHR) {
    recreateSwapChain();
    return;
} else if (result != VK_SUCCESS && result != VK_SUBOPTIMAL_KHR) {
    throw std::runtime_error("failed to acquire swap chain image!");
}

// Only reset the fence if we are submitting work
vkResetFences(device, 1, &inFlightFences[currentFrame]);
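The deadlock and its fix can be reproduced in a toy model that reduces the fence to a boolean. This is a sketch only: it assumes a toy GPU that finishes submitted work instantly, and ToyFence and toyDrawFrame are hypothetical names that do not appear in the tutorial's program.

```cpp
#include <cassert>

// Toy stand-in for a fence created in the signaled state; not a Vulkan type.
struct ToyFence {
    bool signaled = true;
};

// Simplified drawFrame. Returns false if the host would block forever at the
// fence wait, true otherwise.
//   outOfDate:  pretend image acquisition reported an out-of-date swap chain
//   resetEarly: reset the fence right after waiting (the buggy ordering)
bool toyDrawFrame(ToyFence& fence, bool outOfDate, bool resetEarly) {
    if (!fence.signaled) {
        return false; // nothing is pending that could ever signal the fence
    }
    if (resetEarly) {
        fence.signaled = false; // buggy: reset before knowing work is submitted
    }
    if (outOfDate) {
        return true; // recreate the swap chain and return without submitting
    }
    if (!resetEarly) {
        fence.signaled = false; // fixed: reset only when work will be submitted
    }
    // Work is "submitted" here; the toy GPU finishes at once and signals.
    fence.signaled = true;
    return true;
}
```

With the early reset, an out-of-date frame leaves the fence unsignaled and the following call reports a stall. With the delayed reset, the fence stays signaled across the early return and the following call proceeds normally.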

Handling resizes explicitly

Although many drivers and platforms trigger VK_ERROR_OUT_OF_DATE_KHR automatically after a window resize, it is not guaranteed to happen. That's why we'll add some extra code to also handle resizes explicitly. First add a new member variable that flags that a resize has happened:

std::vector<VkFence> inFlightFences;

bool framebufferResized = false;

The drawFrame function should then be modified to also check for this flag:

if (result == VK_ERROR_OUT_OF_DATE_KHR || result == VK_SUBOPTIMAL_KHR || framebufferResized) {
    framebufferResized = false;
    recreateSwapChain();
} else if (result != VK_SUCCESS) {
    ...
}

It is important to do this after vkQueuePresentKHR to ensure that the semaphores are in a consistent state, otherwise a signaled semaphore may never be properly waited upon. Now to actually detect resizes we can use the glfwSetFramebufferSizeCallback function in the GLFW framework to set up a callback:

void initWindow() {
    glfwInit();

    glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);

    window = glfwCreateWindow(WIDTH, HEIGHT, "Vulkan", nullptr, nullptr);
    glfwSetFramebufferSizeCallback(window, framebufferResizeCallback);
}

static void framebufferResizeCallback(GLFWwindow* window, int width, int height) {

}

We're creating a static function as the callback because GLFW does not know how to call a member function with the right this pointer to our HelloTriangleApplication instance.

However, we do get a reference to the GLFWwindow in the callback and there is another GLFW function that allows you to store an arbitrary pointer inside of it: glfwSetWindowUserPointer:

window = glfwCreateWindow(WIDTH, HEIGHT, "Vulkan", nullptr, nullptr);
glfwSetWindowUserPointer(window, this);
glfwSetFramebufferSizeCallback(window, framebufferResizeCallback);

This value can now be retrieved from within the callback with glfwGetWindowUserPointer to properly set the flag:

static void framebufferResizeCallback(GLFWwindow* window, int width, int height) {
    auto app = reinterpret_cast<HelloTriangleApplication*>(glfwGetWindowUserPointer(window));
    app->framebufferResized = true;
}
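
The same pattern can be tried without GLFW. The sketch below is a minimal model under stated assumptions: ToyWindow, toyEmitResize and ToyApplication are hypothetical stand-ins, not GLFW types or functions. It only shows how a static member function recovers the instance through a stored void pointer.

```cpp
#include <cassert>

// Toy stand-in for a C windowing API; these are NOT GLFW types or functions.
struct ToyWindow {
    void* userPointer = nullptr;
    void (*resizeCallback)(ToyWindow*, int, int) = nullptr;
};

// Simulates the library invoking the registered callback on a resize event.
void toyEmitResize(ToyWindow& window, int width, int height) {
    if (window.resizeCallback != nullptr) {
        window.resizeCallback(&window, width, height);
    }
}

class ToyApplication {
public:
    bool framebufferResized = false;

    void attach(ToyWindow& window) {
        assert(window.userPointer == nullptr); // one owner per window in this sketch
        window.userPointer = this;             // plays the role of glfwSetWindowUserPointer
        window.resizeCallback = &onResize;     // plays the role of glfwSetFramebufferSizeCallback
    }

private:
    // A static member function has no `this`, so a C API can call it
    // through a plain function pointer.
    static void onResize(ToyWindow* window, int /*width*/, int /*height*/) {
        auto* app = static_cast<ToyApplication*>(window->userPointer);
        app->framebufferResized = true;
    }
};
```

Converting a void pointer back to the original object type only needs static_cast; the reinterpret_cast used in the tutorial code works as well.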

Now try to run the program and resize the window to see if the framebuffer is indeed resized properly with the window.

Handling minimization

There is another case where a swap chain may become out of date, and that is a special kind of window resizing: window minimization. This case is special because it results in a framebuffer size of 0. In this tutorial we will handle that by extending the recreateSwapChain function to pause until the window is in the foreground again:

void recreateSwapChain() {
    int width = 0, height = 0;
    glfwGetFramebufferSize(window, &width, &height);
    while (width == 0 || height == 0) {
        glfwGetFramebufferSize(window, &width, &height);
        glfwWaitEvents();
    }

    vkDeviceWaitIdle(device);

    ...
}

The initial call to glfwGetFramebufferSize handles the case where the size is already correct and glfwWaitEvents would have nothing to wait on.

Congratulations, you've now finished your very first well-behaved Vulkan program! In the next chapter we're going to get rid of the hardcoded vertices in the vertex shader and actually use a vertex buffer.

C++ code / Vertex shader / Fragment shader

Vertex buffers

Vertex input description

Introduction

In the next few chapters, we're going to replace the hardcoded vertex data in the vertex shader with a vertex buffer in memory. We'll start with the easiest approach of creating a CPU visible buffer and using memcpy to copy the vertex data into it directly, and after that we'll see how to use a staging buffer to copy the vertex data to high performance memory.

Vertex shader

First change the vertex shader to no longer include the vertex data in the shader code itself. The vertex shader takes input from a vertex buffer using the in keyword.

#version 450

layout(location = 0) in vec2 inPosition;
layout(location = 1) in vec3 inColor;

layout(location = 0) out vec3 fragColor;

void main() {
    gl_Position = vec4(inPosition, 0.0, 1.0);
    fragColor = inColor;
}

The inPosition and inColor variables are vertex attributes. They're properties that are specified per-vertex in the vertex buffer, just like we manually specified a position and color per vertex using the two arrays. Make sure to recompile the vertex shader!

Just like fragColor, the layout(location = x) annotations assign indices to the inputs that we can later use to reference them. It is important to know that some types, like the 64-bit dvec3 vectors, use multiple slots. That means that the index after it must be at least 2 higher:

layout(location = 0) in dvec3 inPosition;
layout(location = 2) in vec3 inColor;

You can find more info about the layout qualifier in the OpenGL wiki.
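
As a rough aid for counting slots, the helper below encodes the rule as I understand it for scalars and vectors only: a 64-bit vector with three or four components occupies two consecutive locations, and everything else in that group occupies one. InputType, locationsConsumed and nextFreeLocation are hypothetical names for this sketch; matrices and arrays are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a scalar or vector shader input, for illustration only.
struct InputType {
    uint32_t componentBits;  // 32 for float/int/uint, 64 for double
    uint32_t componentCount; // 1 to 4
};

// Number of consecutive locations an input of this type occupies.
// Assumption: only scalars and vectors are considered here.
uint32_t locationsConsumed(InputType type) {
    assert(type.componentCount >= 1 && type.componentCount <= 4);
    return (type.componentBits == 64 && type.componentCount >= 3) ? 2u : 1u;
}

// First location that is free after an input placed at `location`.
uint32_t nextFreeLocation(uint32_t location, InputType type) {
    return location + locationsConsumed(type);
}
```

For the dvec3 at location 0 above, nextFreeLocation(0, InputType{64, 3}) yields 2, which matches the location used for inColor.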

Vertex data

We're moving the vertex data from the shader code to an array in the code of our program. Start by including the GLM library, which provides us with linear algebra related types like vectors and matrices. We're going to use these types to specify the position and color vectors.

#include <glm/glm.hpp>

Create a new structure called Vertex with the two attributes that we're going to use in the vertex shader inside it:

struct Vertex {
    glm::vec2 pos;
    glm::vec3 color;
};

GLM conveniently provides us with C++ types that exactly match the vector types used in the shader language.

const std::vector<Vertex> vertices = {
    {{0.0f, -0.5f}, {1.0f, 0.0f, 0.0f}},
    {{0.5f, 0.5f}, {0.0f, 1.0f, 0.0f}},
    {{-0.5f, 0.5f}, {0.0f, 0.0f, 1.0f}}
};

Now use the Vertex structure to specify an array of vertex data. We're using exactly the same position and color values as before, but now they're combined into one array of vertices. This is known as interleaving vertex attributes.

Binding descriptions

The next step is to tell Vulkan how to pass this data format to the vertex shader once it's been uploaded into GPU memory. There are two types of structures needed to convey this information.

The first structure is VkVertexInputBindingDescription and we'll add a member function to the Vertex struct to populate it with the right data.

struct Vertex {
    glm::vec2 pos;
    glm::vec3 color;

    static VkVertexInputBindingDescription getBindingDescription() {
        VkVertexInputBindingDescription bindingDescription{};

        return bindingDescription;
    }
};

A vertex binding describes at which rate to load data from memory throughout the vertices. It specifies the number of bytes between data entries and whether to move to the next data entry after each vertex or after each instance.

VkVertexInputBindingDescription bindingDescription{};
bindingDescription.binding = 0;
bindingDescription.stride = sizeof(Vertex);
bindingDescription.inputRate = VK_VERTEX_INPUT_RATE_VERTEX;

All of our per-vertex data is packed together in one array, so we're only going to have one binding. The binding parameter specifies the index of the binding in the array of bindings. The stride parameter specifies the number of bytes from one entry to the next, and the inputRate parameter can have one of the following values:

VK_VERTEX_INPUT_RATE_VERTEX: Move to the next data entry after each vertex
VK_VERTEX_INPUT_RATE_INSTANCE: Move to the next data entry after each instance

We're not going to use instanced rendering, so we'll stick to per-vertex data.

Attribute descriptions

The second structure that describes how to handle vertex input is VkVertexInputAttributeDescription. We're going to add another helper function to Vertex to fill in these structs.

#include <array>

...

static std::array<VkVertexInputAttributeDescription, 2> getAttributeDescriptions() {
    std::array<VkVertexInputAttributeDescription, 2> attributeDescriptions{};

    return attributeDescriptions;
}

As the function prototype indicates, there are going to be two of these structures. An attribute description struct describes how to extract a vertex attribute from a chunk of vertex data originating from a binding description. We have two attributes, position and color, so we need two attribute description structs.

attributeDescriptions[0].binding = 0;
attributeDescriptions[0].location = 0;
attributeDescriptions[0].format = VK_FORMAT_R32G32_SFLOAT;
attributeDescriptions[0].offset = offsetof(Vertex, pos);

The binding parameter tells Vulkan from which binding the per-vertex data comes. The location parameter references the location directive of the input in the vertex shader. The input in the vertex shader with location 0 is the position, which has two 32-bit float components.

The format parameter describes the type of data for the attribute. A bit confusingly, the formats are specified using the same enumeration as color formats. The following shader types and formats are commonly used together:

float: VK_FORMAT_R32_SFLOAT
vec2: VK_FORMAT_R32G32_SFLOAT
vec3: VK_FORMAT_R32G32B32_SFLOAT
vec4: VK_FORMAT_R32G32B32A32_SFLOAT

As you can see, you should use the format where the number of color channels matches the number of components in the shader data type. It is allowed to use more channels than the number of components in the shader, but the extra ones will be silently discarded. If the number of channels is lower than the number of components, then the BGA components will use default values of (0, 0, 1). The color type (SFLOAT, UINT, SINT) and bit width should also match the type of the shader input. See the following examples:

ivec2: VK_FORMAT_R32G32_SINT, a 2-component vector of 32-bit signed integers
uvec4: VK_FORMAT_R32G32B32A32_UINT, a 4-component vector of 32-bit unsigned integers
double: VK_FORMAT_R64_SFLOAT, a double-precision (64-bit) float

The format parameter implicitly defines the byte size of attribute data and the offset parameter specifies the number of bytes since the start of the per-vertex data to read from. The binding is loading one Vertex at a time and the position attribute (pos) is at an offset of 0 bytes from the beginning of this struct. This is automatically calculated using the offsetof macro.

attributeDescriptions[1].binding = 0;
attributeDescriptions[1].location = 1;
attributeDescriptions[1].format = VK_FORMAT_R32G32B32_SFLOAT;
attributeDescriptions[1].offset = offsetof(Vertex, color);

The color attribute is described in much the same way.
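
To make the numbers behind stride and offset concrete, the sketch below computes them for a stand-in struct. PlainVertex uses plain float arrays instead of glm::vec2 and glm::vec3 so that it compiles without GLM; the assumption is that the GLM types have the same size and layout in their default configuration. The constant names and attributeAddress are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for the tutorial's Vertex (assumption: same layout as glm::vec2 + glm::vec3).
struct PlainVertex {
    float pos[2];
    float color[3];
};

// Values that would go into the binding and attribute descriptions.
constexpr uint32_t kStride      = sizeof(PlainVertex);
constexpr uint32_t kPosOffset   = offsetof(PlainVertex, pos);
constexpr uint32_t kColorOffset = offsetof(PlainVertex, color);

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");
static_assert(kStride == 20, "two floats plus three floats, no padding");
static_assert(kPosOffset == 0, "pos is the first member");
static_assert(kColorOffset == 8, "color follows the two position floats");

// Byte position within the buffer of an attribute of vertex number `index`.
constexpr std::size_t attributeAddress(std::size_t index, std::size_t attributeOffset) {
    return index * kStride + attributeOffset;
}
```

With these values, the color of the third vertex (index 2) starts at byte 2 * 20 + 8 = 48 of the interleaved array.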

Pipeline vertex input

We now need to set up the graphics pipeline to accept vertex data in this format by referencing the structures in createGraphicsPipeline. Find the vertexInputInfo struct and modify it to reference the two descriptions:

auto bindingDescription = Vertex::getBindingDescription();
auto attributeDescriptions = Vertex::getAttributeDescriptions();

vertexInputInfo.vertexBindingDescriptionCount = 1;
vertexInputInfo.vertexAttributeDescriptionCount = static_cast<uint32_t>(attributeDescriptions.size());
vertexInputInfo.pVertexBindingDescriptions = &bindingDescription;
vertexInputInfo.pVertexAttributeDescriptions = attributeDescriptions.data();

The pipeline is now ready to accept vertex data in the format of the vertices container and pass it on to our vertex shader. If you run the program now with validation layers enabled, you'll see that it complains that there is no vertex buffer bound to the binding. The next step is to create a vertex buffer and move the vertex data to it so the GPU is able to access it.

C++ code / Vertex shader / Fragment shader

Vertex buffer creation

Introduction

Buffers in Vulkan are regions of memory used for storing arbitrary data that can be read by the graphics card. They can be used to store vertex data, which we'll do in this chapter, but they can also be used for many other purposes that we'll explore in future chapters. Unlike the Vulkan objects we've been dealing with so far, buffers do not automatically allocate memory for themselves. The work from the previous chapters has shown that the Vulkan API puts the programmer in control of almost everything and memory management is one of those things.

Buffer creation

Create a new function createVertexBuffer and call it from initVulkan right before createCommandBuffers.

void initVulkan() {
    createInstance();
    setupDebugMessenger();
    createSurface();
    pickPhysicalDevice();
    createLogicalDevice();
    createSwapChain();
    createImageViews();
    createRenderPass();
    createGraphicsPipeline();
    createFramebuffers();
    createCommandPool();
    createVertexBuffer();
    createCommandBuffers();
    createSyncObjects();
}

...

void createVertexBuffer() {

}

Creating a buffer requires us to fill a VkBufferCreateInfo structure.

VkBufferCreateInfo bufferInfo{};
bufferInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
bufferInfo.size = sizeof(vertices[0]) * vertices.size();

The first field we fill in after sType is size, which specifies the size of the buffer in bytes. Calculating the byte size of the vertex data is straightforward with sizeof.

bufferInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT;

The second field is usage, which indicates for which purposes the data in the buffer is going to be used. It is possible to specify multiple purposes using a bitwise or. Our use case will be a vertex buffer; we'll look at other types of usage in future chapters.

bufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

Just like the images in the swap chain, buffers can also be owned by a specific queue family or be shared between multiple at the same time. The buffer will only be used from the graphics queue, so we can stick to exclusive access.

The flags parameter is used to configure sparse buffer memory, which is not relevant right now. We'll leave it at the default value of 0.

We can now create the buffer with vkCreateBuffer. Define a class member to hold the buffer handle and call it vertexBuffer.

VkBuffer vertexBuffer;

...

void createVertexBuffer() {
    VkBufferCreateInfo bufferInfo{};
    bufferInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
    bufferInfo.size = sizeof(vertices[0]) * vertices.size();
    bufferInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT;
    bufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

    if (vkCreateBuffer(device, &bufferInfo, nullptr, &vertexBuffer) != VK_SUCCESS) {
        throw std::runtime_error("failed to create vertex buffer!");
    }
}

The buffer should be available for use in rendering commands until the end of the program and it does not depend on the swap chain, so we'll clean it up in the original cleanup function:

void cleanup() {
    cleanupSwapChain();

    vkDestroyBuffer(device, vertexBuffer, nullptr);

    ...
}

Memory requirements

The buffer has been created, but it doesn't actually have any memory assigned to it yet. The first step of allocating memory for the buffer is to query its memory requirements using the aptly named vkGetBufferMemoryRequirements function.

VkMemoryRequirements memRequirements;
vkGetBufferMemoryRequirements(device, vertexBuffer, &memRequirements);

The VkMemoryRequirements struct has three fields:

size: The size of the required amount of memory in bytes, may differ from bufferInfo.size.
alignment: The required alignment, in bytes, of the offset at which the buffer begins within the allocated region of memory; depends on bufferInfo.usage and bufferInfo.flags.
memoryTypeBits: Bit field of the memory types that are suitable for the buffer.

Graphics cards can offer different types of memory to allocate from. Each type of memory varies in terms of allowed operations and performance characteristics. We need to combine the requirements of the buffer and our own application requirements to find the right type of memory to use. Let's create a new function findMemoryType for this purpose.

uint32_t findMemoryType(uint32_t typeFilter, VkMemoryPropertyFlags properties) {

}

First we need to query info about the available types of memory using vkGetPhysicalDeviceMemoryProperties.

VkPhysicalDeviceMemoryProperties memProperties;
vkGetPhysicalDeviceMemoryProperties(physicalDevice, &memProperties);

The VkPhysicalDeviceMemoryProperties structure has two arrays memoryTypes and memoryHeaps. Memory heaps are distinct memory resources like dedicated VRAM and swap space in RAM for when VRAM runs out. The different types of memory exist within these heaps. Right now we'll only concern ourselves with the type of memory and not the heap it comes from, but you can imagine that this can affect performance.

Let's first find a memory type that is suitable for the buffer itself:

for (uint32_t i = 0; i < memProperties.memoryTypeCount; i++) {
    if (typeFilter & (1 << i)) {
        return i;
    }
}

throw std::runtime_error("failed to find suitable memory type!");

The typeFilter parameter will be used to specify the bit field of memory types that are suitable. That means that we can find the index of a suitable memory type by simply iterating over them and checking if the corresponding bit is set to 1.

However, we're not just interested in a memory type that is suitable for the vertex buffer. We also need to be able to write our vertex data to that memory. The memoryTypes array consists of VkMemoryType structs that specify the heap and properties of each type of memory. The properties define special features of the memory, like being able to map it so we can write to it from the CPU. This property is indicated with VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, but we also need to use the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT property. We'll see why when we map the memory.

We can now modify the loop to also check for the support of this property:

for (uint32_t i = 0; i < memProperties.memoryTypeCount; i++) {
    if ((typeFilter & (1 << i)) && (memProperties.memoryTypes[i].propertyFlags & properties) == properties) {
        return i;
    }
}

We may have more than one desirable property, so we should check if the result of the bitwise AND is not just non-zero, but equal to the desired properties bit field. If there is a memory type suitable for the buffer that also has all of the properties we need, then we return its index, otherwise we throw an exception.
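
The difference between a non-zero check and an equality check is easy to try in isolation. The sketch below mirrors the loop over a plain vector of property flags, one entry per memory type, instead of VkPhysicalDeviceMemoryProperties. findMemoryTypeIn and the three constants are stand-ins for this example; the real code uses the VK_MEMORY_PROPERTY_* values.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-in constants for this sketch; real code uses the VK_MEMORY_PROPERTY_* enums.
constexpr uint32_t DEVICE_LOCAL  = 0x1;
constexpr uint32_t HOST_VISIBLE  = 0x2;
constexpr uint32_t HOST_COHERENT = 0x4;

// Same selection logic as findMemoryType, over a plain list of property flags.
uint32_t findMemoryTypeIn(const std::vector<uint32_t>& typeFlags,
                          uint32_t typeFilter, uint32_t properties) {
    assert(typeFlags.size() <= 32); // typeFilter has one bit per memory type
    for (uint32_t i = 0; i < typeFlags.size(); i++) {
        bool allowedForResource = (typeFilter & (1u << i)) != 0;
        bool hasAllProperties = (typeFlags[i] & properties) == properties;
        if (allowedForResource && hasAllProperties) {
            return i;
        }
    }
    throw std::runtime_error("failed to find suitable memory type!");
}
```

Given the types {DEVICE_LOCAL, HOST_VISIBLE, HOST_VISIBLE | HOST_COHERENT} and a request for HOST_VISIBLE | HOST_COHERENT, a non-zero test would already accept index 1, which is visible but not coherent. The equality test skips it and returns index 2.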

Memory allocation

We now have a way to determine the right memory type, so we can actually allocate the memory by filling in the VkMemoryAllocateInfo structure.

VkMemoryAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocInfo.allocationSize = memRequirements.size;
allocInfo.memoryTypeIndex = findMemoryType(memRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);

Memory allocation is now as simple as specifying the size and type, both of which are derived from the memory requirements of the vertex buffer and the desired property. Create a class member to store the handle to the memory and allocate it with vkAllocateMemory.

VkBuffer vertexBuffer;
VkDeviceMemory vertexBufferMemory;

...

if (vkAllocateMemory(device, &allocInfo, nullptr, &vertexBufferMemory) != VK_SUCCESS) {
    throw std::runtime_error("failed to allocate vertex buffer memory!");
}

If memory allocation was successful, then we can now associate this memory with the buffer using vkBindBufferMemory:

vkBindBufferMemory(device, vertexBuffer, vertexBufferMemory, 0);

The first three parameters are self-explanatory and the fourth parameter is the offset within the region of memory. Since this memory is allocated specifically for this vertex buffer, the offset is simply 0. If the offset is non-zero, then it is required to be divisible by memRequirements.alignment.

Of course, just like dynamic memory allocation in C++, the memory should be freed at some point. Memory that is bound to a buffer object may be freed once the buffer is no longer used, so let's free it after the buffer has been destroyed:

void cleanup() {
    cleanupSwapChain();

    vkDestroyBuffer(device, vertexBuffer, nullptr);
    vkFreeMemory(device, vertexBufferMemory, nullptr);

    ...
}

Filling the vertex buffer

It is now time to copy the vertex data to the buffer. This is done by mapping the buffer memory into CPU accessible memory with vkMapMemory.

void* data;
vkMapMemory(device, vertexBufferMemory, 0, bufferInfo.size, 0, &data);

This function allows us to access a region of the specified memory resource defined by an offset and size. The offset and size here are 0 and bufferInfo.size, respectively. It is also possible to specify the special value VK_WHOLE_SIZE to map all of the memory. The second to last parameter can be used to specify flags, but there aren't any available yet in the current API. It must be set to the value 0. The last parameter specifies the output for the pointer to the mapped memory.

void* data;
vkMapMemory(device, vertexBufferMemory, 0, bufferInfo.size, 0, &data);
    memcpy(data, vertices.data(), (size_t) bufferInfo.size);
vkUnmapMemory(device, vertexBufferMemory);

You can now simply memcpy the vertex data to the mapped memory and unmap it again using vkUnmapMemory. Unfortunately the driver may not immediately copy the data into the buffer memory, for example because of caching. It is also possible that writes to the buffer are not visible in the mapped memory yet. There are two ways to deal with that problem:

Use a memory type that is host coherent, indicated with VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
Call vkFlushMappedMemoryRanges after writing to the mapped memory, and call vkInvalidateMappedMemoryRanges before reading from the mapped memory

We went for the first approach, which ensures that the mapped memory always matches the contents of the allocated memory. Do keep in mind that this may lead to slightly worse performance than explicit flushing, but we'll see why that doesn't matter in the next chapter.

Flushing memory ranges or using a coherent memory type means that the driver will be aware of our writes to the buffer, but it doesn't mean that they are actually visible on the GPU yet. The transfer of data to the GPU is an operation that happens in the background and the specification simply tells us that it is guaranteed to be complete as of the next call to vkQueueSubmit.

Binding the vertex buffer

All that remains now is binding the vertex buffer during rendering operations. We're going to extend the recordCommandBuffer function to do that.

vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, graphicsPipeline);

VkBuffer vertexBuffers[] = {vertexBuffer};
VkDeviceSize offsets[] = {0};
vkCmdBindVertexBuffers(commandBuffer, 0, 1, vertexBuffers, offsets);

vkCmdDraw(commandBuffer, static_cast<uint32_t>(vertices.size()), 1, 0, 0);

The vkCmdBindVertexBuffers function is used to bind vertex buffers to bindings, like the one we set up in the previous chapter. The first two parameters, besides the command buffer, specify the offset and number of bindings we're going to specify vertex buffers for. The last two parameters specify the array of vertex buffers to bind and the byte offsets to start reading vertex data from. You should also change the call to vkCmdDraw to pass the number of vertices in the buffer as opposed to the hardcoded number 3.

Now run the program and you should see the familiar triangle again:

Try changing the color of the top vertex to white by modifying the vertices array:

const std::vector<Vertex> vertices = {
    {{0.0f, -0.5f}, {1.0f, 1.0f, 1.0f}},
    {{0.5f, 0.5f}, {0.0f, 1.0f, 0.0f}},
    {{-0.5f, 0.5f}, {0.0f, 0.0f, 1.0f}}
};

Run the program again and you should see the following:

In the next chapter we'll look at a different way to copy vertex data to a vertex buffer that results in better performance, but takes some more work.

C++ code / Vertex shader / Fragment shader

Staging buffer

Introduction

The vertex buffer we have right now works correctly, but the memory type that allows us to access it from the CPU may not be the optimal memory type for the graphics card itself to read from. The optimal memory has the VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT flag and is usually not accessible by the CPU on dedicated graphics cards. In this chapter we're going to create two vertex buffers: one staging buffer in CPU accessible memory to upload the data from the vertex array to, and the final vertex buffer in device local memory. We'll then use a buffer copy command to move the data from the staging buffer to the actual vertex buffer.

Transfer queue

The buffer copy command requires a queue family that supports transfer operations, which is indicated using VK_QUEUE_TRANSFER_BIT. The good news is that any queue family with VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT capabilities already implicitly supports VK_QUEUE_TRANSFER_BIT operations. The implementation is not required to explicitly list it in queueFlags in those cases.

If you like a challenge, then you can still try to use a different queue family specifically for transfer operations. It will require you to make the following modifications to your program:

Modify QueueFamilyIndices and findQueueFamilies to explicitly look for a queue family with the VK_QUEUE_TRANSFER_BIT bit, but not the VK_QUEUE_GRAPHICS_BIT.
Modify createLogicalDevice to request a handle to the transfer queue.
Create a second command pool for command buffers that are submitted on the transfer queue family.
Change the sharingMode of resources to be VK_SHARING_MODE_CONCURRENT and specify both the graphics and transfer queue families.
Submit any transfer commands like vkCmdCopyBuffer (which we'll be using in this chapter) to the transfer queue instead of the graphics queue.

It's a bit of work, but it'll teach you a lot about how resources are shared between queue families.
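
The first step of that challenge, picking a family that offers transfer but not graphics, can be sketched without any Vulkan objects. In the example below, familyFlags plays the role of the queueFlags of each queue family, and the constants are stand-ins for the VK_QUEUE_* bits. findDedicatedTransferFamily and exampleTransferFamilyLookup are hypothetical names; how the result would be stored in QueueFamilyIndices is left open.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in constants for this sketch; real code uses the VK_QUEUE_* enums.
constexpr uint32_t QUEUE_GRAPHICS = 0x1;
constexpr uint32_t QUEUE_COMPUTE  = 0x2;
constexpr uint32_t QUEUE_TRANSFER = 0x4;

// Returns the index of the first family that advertises transfer but not graphics, if any.
std::optional<uint32_t> findDedicatedTransferFamily(const std::vector<uint32_t>& familyFlags) {
    for (uint32_t i = 0; i < familyFlags.size(); i++) {
        bool hasTransfer = (familyFlags[i] & QUEUE_TRANSFER) != 0;
        bool hasGraphics = (familyFlags[i] & QUEUE_GRAPHICS) != 0;
        if (hasTransfer && !hasGraphics) {
            return i;
        }
    }
    return std::nullopt; // caller can fall back to the graphics family
}

// Example: one all-purpose family followed by a compute + transfer family.
inline void exampleTransferFamilyLookup() {
    const std::vector<uint32_t> families = {
        QUEUE_GRAPHICS | QUEUE_COMPUTE | QUEUE_TRANSFER,
        QUEUE_COMPUTE | QUEUE_TRANSFER,
    };
    assert(findDedicatedTransferFamily(families) == 1u);
}
```

Returning an empty optional rather than throwing keeps the fallback decision with the caller, since using the graphics family for transfers is always valid.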

Abstracting buffer creation

Because we're going to create multiple buffers in this chapter, it's a good idea to move buffer creation to a helper function. Create a new function createBuffer and move the code in createVertexBuffer (except mapping) to it.

void createBuffer(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, VkBuffer& buffer, VkDeviceMemory& bufferMemory) {
    VkBufferCreateInfo bufferInfo{};
    bufferInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
    bufferInfo.size = size;
    bufferInfo.usage = usage;
    bufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

    if (vkCreateBuffer(device, &bufferInfo, nullptr, &buffer) != VK_SUCCESS) {
        throw std::runtime_error("failed to create buffer!");
    }

    VkMemoryRequirements memRequirements;
    vkGetBufferMemoryRequirements(device, buffer, &memRequirements);

    VkMemoryAllocateInfo allocInfo{};
    allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize = memRequirements.size;
    allocInfo.memoryTypeIndex = findMemoryType(memRequirements.memoryTypeBits, properties);

    if (vkAllocateMemory(device, &allocInfo, nullptr, &bufferMemory) != VK_SUCCESS) {
        throw std::runtime_error("failed to allocate buffer memory!");
    }

    vkBindBufferMemory(device, buffer, bufferMemory, 0);
}

Make sure to add parameters for the buffer size, memory properties and usage so that we can use this function to create many different types of buffers. The last two parameters are output variables to write the handles to.

You can now remove the buffer creation and memory allocation code from createVertexBuffer and just call createBuffer instead:

void createVertexBuffer() {
    VkDeviceSize bufferSize = sizeof(vertices[0]) * vertices.size();
    createBuffer(bufferSize, VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, vertexBuffer, vertexBufferMemory);

    void* data;
    vkMapMemory(device, vertexBufferMemory, 0, bufferSize, 0, &data);
        memcpy(data, vertices.data(), (size_t) bufferSize);
    vkUnmapMemory(device, vertexBufferMemory);
}

Run your program to make sure that the vertex buffer still works properly.

Using a staging buffer

We're now going to change createVertexBuffer to only use a host visible buffer as temporary buffer and use a device local one as actual vertex buffer.

void createVertexBuffer() {
    VkDeviceSize bufferSize = sizeof(vertices[0]) * vertices.size();

    VkBuffer stagingBuffer;
    VkDeviceMemory stagingBufferMemory;
    createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);

    void* data;
    vkMapMemory(device, stagingBufferMemory, 0, bufferSize, 0, &data);
        memcpy(data, vertices.data(), (size_t) bufferSize);
    vkUnmapMemory(device, stagingBufferMemory);

    createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, vertexBuffer, vertexBufferMemory);
}

We're now using a new stagingBuffer with stagingBufferMemory for mapping and copying the vertex data. In this chapter we're going to use two new buffer usage flags:

VK_BUFFER_USAGE_TRANSFER_SRC_BIT: Buffer can be used as source in a memory transfer operation.
VK_BUFFER_USAGE_TRANSFER_DST_BIT: Buffer can be used as destination in a memory transfer operation.

The vertexBuffer is now allocated from a memory type that is device local, which generally means that we're not able to use vkMapMemory. However, we can copy data from the stagingBuffer to the vertexBuffer. We have to indicate that we intend to do that by specifying the transfer source flag for the stagingBuffer and the transfer destination flag for the vertexBuffer, along with the vertex buffer usage flag.
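
The way the usage flags combine can be checked with a few lines of bit arithmetic. The constants below are stand-ins for the VK_BUFFER_USAGE_* bits, and hasUsage and copyAllowed are hypothetical helpers for this sketch; the actual check is performed by the validation layers, not by application code.

```cpp
#include <cstdint>

// Stand-in constants for this sketch; real code uses the VK_BUFFER_USAGE_* enums.
constexpr uint32_t USAGE_TRANSFER_SRC  = 0x01;
constexpr uint32_t USAGE_TRANSFER_DST  = 0x02;
constexpr uint32_t USAGE_VERTEX_BUFFER = 0x80;

// True if every bit in `required` is present in `usage`.
constexpr bool hasUsage(uint32_t usage, uint32_t required) {
    return (usage & required) == required;
}

// A copy needs a transfer-source buffer and a transfer-destination buffer.
constexpr bool copyAllowed(uint32_t srcUsage, uint32_t dstUsage) {
    return hasUsage(srcUsage, USAGE_TRANSFER_SRC) && hasUsage(dstUsage, USAGE_TRANSFER_DST);
}

// The two combinations used in createVertexBuffer.
constexpr uint32_t stagingUsage = USAGE_TRANSFER_SRC;
constexpr uint32_t vertexUsage  = USAGE_TRANSFER_DST | USAGE_VERTEX_BUFFER;

static_assert(copyAllowed(stagingUsage, vertexUsage), "staging -> vertex is a valid copy");
static_assert(!copyAllowed(vertexUsage, stagingUsage), "the reverse direction is not");
```

Dropping USAGE_TRANSFER_DST from vertexUsage makes the first static_assert fail, which corresponds to the validation error you would get for the real flags.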

We're now going to write a function to copy the contents from one buffer to another, called copyBuffer.

void copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size) {

}

Memory transfer operations are executed using command buffers, just like drawing commands. Therefore we must first allocate a temporary command buffer. You may wish to create a separate command pool for these kinds of short-lived buffers, because the implementation may be able to apply memory allocation optimizations. You should use the VK_COMMAND_POOL_CREATE_TRANSIENT_BIT flag during command pool creation in that case.

void copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size) {
    VkCommandBufferAllocateInfo allocInfo{};
    allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
    allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
    allocInfo.commandPool = commandPool;
    allocInfo.commandBufferCount = 1;

    VkCommandBuffer commandBuffer;
    vkAllocateCommandBuffers(device, &allocInfo, &commandBuffer);
}

And immediately start recording the command buffer:

VkCommandBufferBeginInfo beginInfo{};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;

vkBeginCommandBuffer(commandBuffer, &beginInfo);

We're only going to use the command buffer once, and we won't return from the function until the copy operation has finished executing. It's good practice to tell the driver about our intent using VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT.

VkBufferCopy copyRegion{};
copyRegion.srcOffset = 0; // Optional
copyRegion.dstOffset = 0; // Optional
copyRegion.size = size;
vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);

Contents of buffers are transferred using the vkCmdCopyBuffer command. It takes the source and destination buffers as arguments, and an array of regions to copy. The regions are defined in VkBufferCopy structs and consist of a source buffer offset, destination buffer offset and size. It is not possible to specify VK_WHOLE_SIZE here, unlike the vkMapMemory command.

vkEndCommandBuffer(commandBuffer);

This command buffer only contains the copy command, so we can stop recording right after that. Now execute the command buffer to complete the transfer:

VkSubmitInfo submitInfo{};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffer;

vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
vkQueueWaitIdle(graphicsQueue);

Unlike the draw commands, there are no events we need to wait on this time. We just want to execute the transfer on the buffers immediately. There are again two possible ways to wait for this transfer to complete. We could use a fence and wait with vkWaitForFences, or simply wait for the queue that performs the transfer to become idle with vkQueueWaitIdle. A fence would allow you to schedule multiple transfers simultaneously and wait for all of them to complete, instead of executing one at a time. That may give the driver more opportunities to optimize.

vkFreeCommandBuffers(device, commandPool, 1, &commandBuffer);

Don't forget to clean up the command buffer used for the transfer operation.

We can now call copyBuffer from the createVertexBuffer function to move the vertex data to the device local buffer:

createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, vertexBuffer, vertexBufferMemory);

copyBuffer(stagingBuffer, vertexBuffer, bufferSize);

After copying the data from the staging buffer to the device buffer, we should clean it up:

    ...

    copyBuffer(stagingBuffer, vertexBuffer, bufferSize);

    vkDestroyBuffer(device, stagingBuffer, nullptr);
    vkFreeMemory(device, stagingBufferMemory, nullptr);
}

Run your program to verify that you're seeing the familiar triangle again. The improvement may not be visible right now, but its vertex data is now being loaded from high performance memory. This will matter when we're going to start rendering more complex geometry.

Conclusion

It should be noted that in a real world application, you're not supposed to actually call vkAllocateMemory for every individual buffer. The maximum number of simultaneous memory allocations is limited by the maxMemoryAllocationCount physical device limit, which may be as low as 4096 even on high end hardware like an NVIDIA GTX 1080. The right way to allocate memory for a large number of objects at the same time is to create a custom allocator that splits up a single allocation among many different objects by using the offset parameters that we've seen in many functions.

You can either implement such an allocator yourself, or use the VulkanMemoryAllocator library provided by the GPUOpen initiative. However, for this tutorial it's okay to use a separate allocation for every resource, because we won't come close to hitting any of these limits for now.
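The idea of splitting one allocation among many objects can be sketched without any Vulkan calls. The following is a minimal illustration, assuming a simple "bump" strategy that never frees individual regions. The names LinearSubAllocator and allocate are hypothetical and not part of Vulkan or VulkanMemoryAllocator.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Vulkan: hands out offsets into one large
// block of memory instead of making a separate allocation per resource.
class LinearSubAllocator {
public:
    explicit LinearSubAllocator(std::uint64_t capacity) : capacity_(capacity) {}

    // Reserves `size` bytes starting at a multiple of `alignment` (which must
    // be non-zero). On success stores the start in `offset` and returns true;
    // returns false if the block does not have enough room left.
    bool allocate(std::uint64_t size, std::uint64_t alignment, std::uint64_t& offset) {
        // Round the current position up to the next multiple of `alignment`.
        const std::uint64_t aligned = (next_ + alignment - 1) / alignment * alignment;
        if (aligned + size > capacity_) {
            return false;
        }
        offset = aligned;
        next_ = aligned + size;
        return true;
    }

private:
    std::uint64_t capacity_;
    std::uint64_t next_ = 0;
};
```

In a real application the size and alignment would presumably come from VkMemoryRequirements, and the returned offset would be passed as the memory offset when binding the buffer to the shared allocation.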


Index buffer

Introduction

The 3D meshes you'll be rendering in a real world application will often share vertices between multiple triangles. This already happens even with something simple like drawing a rectangle:

Drawing a rectangle takes two triangles, which means that we need a vertex buffer with 6 vertices. The problem is that the data of two vertices needs to be duplicated, resulting in 50% redundancy. It only gets worse with more complex meshes, where each vertex is shared by about three triangles on average. The solution to this problem is to use an index buffer.

An index buffer is essentially an array of pointers into the vertex buffer. It allows you to reorder the vertex data, and reuse existing data for multiple vertices. The illustration above demonstrates what the index buffer would look like for the rectangle if we have a vertex buffer containing each of the four unique vertices. The first three indices define the upper-right triangle and the last three indices define the vertices for the bottom-left triangle.
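To make the saving concrete, here is a small CPU-side sketch that expands an indexed mesh back into the flat vertex list a non-indexed draw would need. The Position struct and the expandIndexed function are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a vertex that only has a 2D position.
struct Position {
    float x;
    float y;
};

// Expands an indexed mesh into the flat vertex list that a non-indexed draw
// would need: every index is replaced by a copy of the vertex it refers to.
std::vector<Position> expandIndexed(const std::vector<Position>& vertices,
                                    const std::vector<std::uint16_t>& indices) {
    std::vector<Position> expanded;
    expanded.reserve(indices.size());
    for (std::uint16_t index : indices) {
        expanded.push_back(vertices.at(index));
    }
    return expanded;
}
```

For a rectangle with four unique corners and six indices, the expanded list contains six vertices, two of which are duplicates. The indexed form stores four vertices plus six small integers instead.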

Index buffer creation

In this chapter we're going to modify the vertex data and add index data to draw a rectangle like the one in the illustration. Modify the vertex data to represent the four corners:

const std::vector<Vertex> vertices = {
    {{-0.5f, -0.5f}, {1.0f, 0.0f, 0.0f}},
    {{0.5f, -0.5f}, {0.0f, 1.0f, 0.0f}},
    {{0.5f, 0.5f}, {0.0f, 0.0f, 1.0f}},
    {{-0.5f, 0.5f}, {1.0f, 1.0f, 1.0f}}
};

The top-left corner is red, top-right is green, bottom-right is blue and the bottom-left is white. We'll add a new array indices to represent the contents of the index buffer. It should match the indices in the illustration to draw the upper-right triangle and bottom-left triangle.

const std::vector<uint16_t> indices = {
    0, 1, 2, 2, 3, 0
};

It is possible to use either uint16_t or uint32_t for your index buffer depending on the number of entries in vertices. We can stick to uint16_t for now because we're using fewer than 65535 unique vertices.
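The choice between the two index widths can be expressed as a small check. This is a sketch with hypothetical helper names. The threshold is deliberately conservative: it keeps the value 0xFFFF unused, since Vulkan treats that value as a restart marker for 16-bit indices when primitive restart is enabled.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if a mesh with `vertexCount` vertices can be
// indexed with uint16_t while leaving the value 0xFFFF unused. The largest
// index actually used is vertexCount - 1.
bool fitsInUint16(std::size_t vertexCount) {
    return vertexCount <= static_cast<std::size_t>(std::numeric_limits<std::uint16_t>::max());
}

// Size in bytes of an index buffer holding `indexCount` indices of the chosen width.
std::size_t indexBufferSize(std::size_t indexCount, bool use16Bit) {
    return indexCount * (use16Bit ? sizeof(std::uint16_t) : sizeof(std::uint32_t));
}
```

For the rectangle, six 16-bit indices take 12 bytes, whereas the same indices stored as uint32_t would take 24 bytes.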

Just like the vertex data, the indices need to be uploaded into a VkBuffer for the GPU to be able to access them. Define two new class members to hold the resources for the index buffer:

VkBuffer vertexBuffer;
VkDeviceMemory vertexBufferMemory;
VkBuffer indexBuffer;
VkDeviceMemory indexBufferMemory;

The createIndexBuffer function that we'll add now is almost identical to createVertexBuffer:

void initVulkan() {
    ...
    createVertexBuffer();
    createIndexBuffer();
    ...
}

void createIndexBuffer() {
    VkDeviceSize bufferSize = sizeof(indices[0]) * indices.size();

    VkBuffer stagingBuffer;
    VkDeviceMemory stagingBufferMemory;
    createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);

    void* data;
    vkMapMemory(device, stagingBufferMemory, 0, bufferSize, 0, &data);
    memcpy(data, indices.data(), (size_t) bufferSize);
    vkUnmapMemory(device, stagingBufferMemory);

    createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_INDEX_BUFFER_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, indexBuffer, indexBufferMemory);

    copyBuffer(stagingBuffer, indexBuffer, bufferSize);

    vkDestroyBuffer(device, stagingBuffer, nullptr);
    vkFreeMemory(device, stagingBufferMemory, nullptr);
}

There are only two notable differences. The bufferSize is now equal to the number of indices times the size of the index type, either uint16_t or uint32_t. The usage of the indexBuffer should be VK_BUFFER_USAGE_INDEX_BUFFER_BIT instead of VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, which makes sense. Other than that, the process is exactly the same. We create a staging buffer to copy the contents of indices to and then copy it to the final device local index buffer.

The index buffer should be cleaned up at the end of the program, just like the vertex buffer:

void cleanup() {
    cleanupSwapChain();

    vkDestroyBuffer(device, indexBuffer, nullptr);
    vkFreeMemory(device, indexBufferMemory, nullptr);

    vkDestroyBuffer(device, vertexBuffer, nullptr);
    vkFreeMemory(device, vertexBufferMemory, nullptr);

    ...
}

Using an index buffer

Using an index buffer for drawing involves two changes to recordCommandBuffer. We first need to bind the index buffer, just like we did for the vertex buffer. The difference is that you can only have a single index buffer. It's unfortunately not possible to use different indices for each vertex attribute, so we do still have to completely duplicate vertex data even if just one attribute varies.

vkCmdBindVertexBuffers(commandBuffer, 0, 1, vertexBuffers, offsets);

vkCmdBindIndexBuffer(commandBuffer, indexBuffer, 0, VK_INDEX_TYPE_UINT16);

An index buffer is bound with vkCmdBindIndexBuffer which has the index buffer, a byte offset into it, and the type of index data as parameters. As mentioned before, the possible types are VK_INDEX_TYPE_UINT16 and VK_INDEX_TYPE_UINT32.

Just binding an index buffer doesn't change anything yet. We also need to change the drawing command to tell Vulkan to use the index buffer. Remove the vkCmdDraw line and replace it with vkCmdDrawIndexed:

vkCmdDrawIndexed(commandBuffer, static_cast<uint32_t>(indices.size()), 1, 0, 0, 0);

A call to this function is very similar to vkCmdDraw. The first two parameters specify the number of indices and the number of instances. We're not using instancing, so just specify 1 instance. The number of indices represents the number of vertices that will be passed to the vertex shader. The next parameter specifies an offset into the index buffer; using a value of 1 would cause the graphics card to start reading at the second index. The second to last parameter specifies an offset to add to the indices in the index buffer. The final parameter specifies an offset for instancing, which we're not using.
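The effect of the index offset and the vertex offset can be illustrated on the CPU. The sketch below is not how a driver implements the draw; it only shows which vertex indices end up being fetched. The parameter names follow vkCmdDrawIndexed, but the function resolveVertexIndices itself is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU-side illustration of which vertices an indexed draw reads.
// Parameter names follow vkCmdDrawIndexed; the function itself is hypothetical.
std::vector<std::int64_t> resolveVertexIndices(const std::vector<std::uint16_t>& indices,
                                               std::uint32_t indexCount,
                                               std::uint32_t firstIndex,
                                               std::int32_t vertexOffset) {
    std::vector<std::int64_t> resolved;
    resolved.reserve(indexCount);
    for (std::uint32_t i = 0; i < indexCount; i++) {
        // Start reading at firstIndex, then add vertexOffset to each value read.
        const std::size_t position = static_cast<std::size_t>(firstIndex) + i;
        resolved.push_back(static_cast<std::int64_t>(indices.at(position)) + vertexOffset);
    }
    return resolved;
}
```

With the rectangle's indices 0, 1, 2, 2, 3, 0, a call with indexCount 3 and firstIndex 3 reads only the second triangle (2, 3, 0), and a vertexOffset of 4 shifts the first triangle to vertices 4, 5, 6.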

Now run your program and you should see the following:

You now know how to save memory by reusing vertices with index buffers. This will become especially important in a future chapter where we're going to load complex 3D models.

The previous chapter already mentioned that you should allocate multiple resources like buffers from a single memory allocation, but in fact you should go a step further. Driver developers recommend that you also store multiple buffers, like the vertex and index buffer, into a single VkBuffer and use offsets in commands like vkCmdBindVertexBuffers. The advantage is that your data is more cache friendly in that case, because it's closer together. It is even possible to reuse the same chunk of memory for multiple resources if they are not used during the same render operations, provided that their data is refreshed, of course. This is known as aliasing and some Vulkan functions have explicit flags to specify that you want to do this.


Uniform buffers

Descriptor set layout and buffer

Introduction

We're now able to pass arbitrary attributes to the vertex shader for each vertex, but what about global variables? We're going to move on to 3D graphics from this chapter on and that requires a model-view-projection matrix. We could include it as vertex data, but that's a waste of memory and it would require us to update the vertex buffer whenever the transformation changes. The transformation could easily change every single frame.

The right way to tackle this in Vulkan is to use resource descriptors. A descriptor is a way for shaders to freely access resources like buffers and images. We're going to set up a buffer that contains the transformation matrices and have the vertex shader access them through a descriptor. Usage of descriptors consists of three parts:

Specify a descriptor set layout during pipeline creation
Allocate a descriptor set from a descriptor pool
Bind the descriptor set during rendering

The descriptor set layout specifies the types of resources that are going to be accessed by the pipeline, just like a render pass specifies the types of attachments that will be accessed. A descriptor set specifies the actual buffer or image resources that will be bound to the descriptors, just like a framebuffer specifies the actual image views to bind to render pass attachments. The descriptor set is then bound for the drawing commands just like the vertex buffers and framebuffer.

There are many types of descriptors, but in this chapter we'll work with uniform buffer objects (UBO). We'll look at other types of descriptors in future chapters, but the basic process is the same. Let's say we have the data we want the vertex shader to have in a C struct like this:

struct UniformBufferObject {
    glm::mat4 model;
    glm::mat4 view;
    glm::mat4 proj;
};

Then we can copy the data to a VkBuffer and access it through a uniform buffer object descriptor from the vertex shader like this:

layout(binding = 0) uniform UniformBufferObject {
    mat4 model;
    mat4 view;
    mat4 proj;
} ubo;

void main() {
    gl_Position = ubo.proj * ubo.view * ubo.model * vec4(inPosition, 0.0, 1.0);
    fragColor = inColor;
}

We're going to update the model, view and projection matrices every frame to make the rectangle from the previous chapter spin around in 3D.

Vertex shader

Modify the vertex shader to include the uniform buffer object like it was specified above. I will assume that you are familiar with MVP transformations. If you're not, see the resource mentioned in the first chapter.

#version 450

layout(binding = 0) uniform UniformBufferObject {
    mat4 model;
    mat4 view;
    mat4 proj;
} ubo;

layout(location = 0) in vec2 inPosition;
layout(location = 1) in vec3 inColor;

layout(location = 0) out vec3 fragColor;

void main() {
    gl_Position = ubo.proj * ubo.view * ubo.model * vec4(inPosition, 0.0, 1.0);
    fragColor = inColor;
}

Note that the order of the uniform, in and out declarations doesn't matter. The binding directive is similar to the location directive for attributes. We're going to reference this binding in the descriptor set layout. The line with gl_Position is changed to use the transformations to compute the final position in clip coordinates. Unlike the 2D triangles, the last component of the clip coordinates may not be 1, which will result in a division when converted to the final normalized device coordinates on the screen. This is used in perspective projection as the perspective division and is essential for making closer objects look larger than objects that are further away.
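The division mentioned above can be written out directly. This is a minimal sketch of the perspective division only, with a hypothetical function name; the GPU performs this step itself after the vertex shader has run.

```cpp
#include <array>

// Converts clip coordinates (x, y, z, w) to normalized device coordinates by
// dividing x, y and z by w. Assumes w is non-zero.
std::array<float, 3> perspectiveDivide(const std::array<float, 4>& clip) {
    return {clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]};
}
```

Two points with the same clip-space x and y but different w land at different positions: with w = 2 an x of 1 becomes 0.5, while with w = 4 it becomes 0.25. A larger w therefore pulls a point towards the center of the screen, which is what makes distant objects look smaller.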

Descriptor set layout

The next step is to define the UBO on the C++ side and to tell Vulkan about this descriptor in the vertex shader.

struct UniformBufferObject {
    glm::mat4 model;
    glm::mat4 view;
    glm::mat4 proj;
};

We can exactly match the definition in the shader using data types in GLM. The data in the matrices is binary compatible with the way the shader expects it, so we can later just memcpy a UniformBufferObject to a VkBuffer.

We need to provide details about every descriptor binding used in the shaders for pipeline creation, just like we had to do for every vertex attribute and its location index. We'll set up a new function to define all of this information called createDescriptorSetLayout. It should be called right before pipeline creation, because we're going to need it there.

void initVulkan() {
    ...
    createDescriptorSetLayout();
    createGraphicsPipeline();
    ...
}

...

void createDescriptorSetLayout() {

}

Every binding needs to be described through a VkDescriptorSetLayoutBinding struct.

void createDescriptorSetLayout() {
    VkDescriptorSetLayoutBinding uboLayoutBinding{};
    uboLayoutBinding.binding = 0;
    uboLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
    uboLayoutBinding.descriptorCount = 1;
}

The first two fields specify the binding used in the shader and the type of descriptor, which is a uniform buffer object. It is possible for the shader variable to represent an array of uniform buffer objects, and descriptorCount specifies the number of values in the array. This could be used to specify a transformation for each of the bones in a skeleton for skeletal animation, for example. Our MVP transformation is in a single uniform buffer object, so we're using a descriptorCount of 1.

uboLayoutBinding.stageFlags = VK_SHADER_STAGE_VERTEX_BIT;

We also need to specify in which shader stages the descriptor is going to be referenced. The stageFlags field can be a combination of VkShaderStageFlagBits values or the value VK_SHADER_STAGE_ALL_GRAPHICS. In our case, we're only referencing the descriptor from the vertex shader.

uboLayoutBinding.pImmutableSamplers = nullptr; // Optional

The pImmutableSamplers field is only relevant for image sampling related descriptors, which we'll look at later. You can leave this to its default value.

All of the descriptor bindings are combined into a single VkDescriptorSetLayout object. Define a new class member above pipelineLayout:

VkDescriptorSetLayout descriptorSetLayout;
VkPipelineLayout pipelineLayout;

We can then create it using vkCreateDescriptorSetLayout. This function accepts a simple VkDescriptorSetLayoutCreateInfo with the array of bindings:

VkDescriptorSetLayoutCreateInfo layoutInfo{};
layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.bindingCount = 1;
layoutInfo.pBindings = &uboLayoutBinding;

if (vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &descriptorSetLayout) != VK_SUCCESS) {
    throw std::runtime_error("failed to create descriptor set layout!");
}

We need to specify the descriptor set layout during pipeline creation to tell Vulkan which descriptors the shaders will be using. Descriptor set layouts are specified in the pipeline layout object. Modify the VkPipelineLayoutCreateInfo to reference the layout object:

VkPipelineLayoutCreateInfo pipelineLayoutInfo{};
pipelineLayoutInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
pipelineLayoutInfo.setLayoutCount = 1;
pipelineLayoutInfo.pSetLayouts = &descriptorSetLayout;

You may be wondering why it's possible to specify multiple descriptor set layouts here, because a single one already includes all of the bindings. We'll get back to that in the next chapter, where we'll look into descriptor pools and descriptor sets.

The descriptor set layout should stick around while we may create new graphics pipelines i.e. until the program ends:

void cleanup() {
    cleanupSwapChain();

    vkDestroyDescriptorSetLayout(device, descriptorSetLayout, nullptr);

    ...
}

Uniform buffer

In the next chapter we'll specify the buffer that contains the UBO data for the shader, but we need to create this buffer first. We're going to copy new data to the uniform buffer every frame, so it doesn't really make any sense to have a staging buffer. It would just add extra overhead in this case and likely degrade performance instead of improving it.

We should have multiple buffers, because multiple frames may be in flight at the same time and we don't want to update the buffer in preparation of the next frame while a previous one is still reading from it! Thus, we need to have as many uniform buffers as we have frames in flight, and write to a uniform buffer that is not currently being read by the GPU.

To that end, add new class members for uniformBuffers, uniformBuffersMemory and uniformBuffersMapped:

VkBuffer indexBuffer;
VkDeviceMemory indexBufferMemory;

std::vector<VkBuffer> uniformBuffers;
std::vector<VkDeviceMemory> uniformBuffersMemory;
std::vector<void*> uniformBuffersMapped;

Similarly, create a new function createUniformBuffers that is called after createIndexBuffer and allocates the buffers:

void initVulkan() {
    ...
    createVertexBuffer();
    createIndexBuffer();
    createUniformBuffers();
    ...
}

...

void createUniformBuffers() {
    VkDeviceSize bufferSize = sizeof(UniformBufferObject);

    uniformBuffers.resize(MAX_FRAMES_IN_FLIGHT);
    uniformBuffersMemory.resize(MAX_FRAMES_IN_FLIGHT);
    uniformBuffersMapped.resize(MAX_FRAMES_IN_FLIGHT);

    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        createBuffer(bufferSize, VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, uniformBuffers[i], uniformBuffersMemory[i]);

        vkMapMemory(device, uniformBuffersMemory[i], 0, bufferSize, 0, &uniformBuffersMapped[i]);
    }
}

We map the buffer right after creation using vkMapMemory to get a pointer to which we can write the data later on. The buffer stays mapped to this pointer for the application's whole lifetime. This technique is called "persistent mapping" and works on all Vulkan implementations. Not having to map the buffer every time we need to update it improves performance, as mapping is not free.
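The combination of one buffer per frame in flight and a pointer that is obtained once can be mimicked with plain host memory. The sketch below makes no Vulkan calls: ordinary storage plays the role of the mapped memory, the value 2 for MAX_FRAMES_IN_FLIGHT is an assumption, and FrameData and PerFrameSlots are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t MAX_FRAMES_IN_FLIGHT = 2; // assumed value for this sketch

// Hypothetical stand-in for the uniform data written each frame.
struct FrameData {
    float values[16];
};

// Stand-in for the per-frame uniform buffers. Plain host memory plays the role
// of the memory that vkMapMemory would expose; no Vulkan calls are made here.
struct PerFrameSlots {
    std::array<FrameData, MAX_FRAMES_IN_FLIGHT> storage{};
    std::array<void*, MAX_FRAMES_IN_FLIGHT> mapped{};

    PerFrameSlots() {
        // "Map" each slot once and keep the pointer for the object's lifetime.
        for (std::size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
            mapped[i] = &storage[i];
        }
    }

    // The stored pointers refer to this object's own storage, so copying is disabled.
    PerFrameSlots(const PerFrameSlots&) = delete;
    PerFrameSlots& operator=(const PerFrameSlots&) = delete;

    // Writes only the slot of the frame being prepared, leaving the other
    // slot untouched for a frame that may still be in flight.
    void update(std::uint32_t currentFrame, const FrameData& data) {
        std::memcpy(mapped[currentFrame], &data, sizeof(data));
    }
};
```

Updating slot 0 leaves slot 1 unchanged, which is the property that lets the CPU prepare the next frame while the GPU may still be reading the previous one.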

The uniform data will be used for all draw calls, so the buffer containing it should only be destroyed when we stop rendering.

void cleanup() {
    ...

    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        vkDestroyBuffer(device, uniformBuffers[i], nullptr);
        vkFreeMemory(device, uniformBuffersMemory[i], nullptr);
    }

    vkDestroyDescriptorSetLayout(device, descriptorSetLayout, nullptr);

    ...

}

Updating uniform data

Create a new function updateUniformBuffer and add a call to it from the drawFrame function before submitting the next frame:

void drawFrame() {
    ...

    updateUniformBuffer(currentFrame);

    ...

    VkSubmitInfo submitInfo{};
    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;

    ...
}

...

void updateUniformBuffer(uint32_t currentImage) {

}

This function will generate a new transformation every frame to make the geometry spin around. We need to include two new headers to implement this functionality:

#define GLM_FORCE_RADIANS
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

#include <chrono>

The glm/gtc/matrix_transform.hpp header exposes functions that can be used to generate model transformations like glm::rotate, view transformations like glm::lookAt and projection transformations like glm::perspective. The GLM_FORCE_RADIANS definition is necessary to make sure that functions like glm::rotate use radians as arguments, to avoid any possible confusion.

The chrono standard library header exposes functions to do precise timekeeping. We'll use this to make sure that the geometry rotates 90 degrees per second regardless of frame rate.

void updateUniformBuffer(uint32_t currentImage) {
    static auto startTime = std::chrono::high_resolution_clock::now();

    auto currentTime = std::chrono::high_resolution_clock::now();
    float time = std::chrono::duration<float, std::chrono::seconds::period>(currentTime - startTime).count();
}

The updateUniformBuffer function will start out with some logic to calculate the time in seconds since rendering has started with floating point accuracy.
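The same timing logic can be tried in isolation. The sketch below uses a hypothetical function name and swaps in std::chrono::steady_clock, which is guaranteed to be monotonic, in place of the high_resolution_clock used above; that substitution is an assumption of this example, not something the tutorial code does.

```cpp
#include <chrono>

// Seconds elapsed since the first call, as a float. The first call returns a
// value very close to zero because it also initialises the start time.
float secondsSinceFirstCall() {
    static const auto startTime = std::chrono::steady_clock::now();
    const auto currentTime = std::chrono::steady_clock::now();
    // duration<float> uses seconds as its period by default.
    return std::chrono::duration<float>(currentTime - startTime).count();
}
```

Because the start time is a function-local static, it is set exactly once, on the first call, and every later call measures against that same instant.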

We will now define the model, view and projection transformations in the uniform buffer object. The model rotation will be a simple rotation around the Z-axis using the time variable:

UniformBufferObject ubo{};
ubo.model = glm::rotate(glm::mat4(1.0f), time * glm::radians(90.0f), glm::vec3(0.0f, 0.0f, 1.0f));

The glm::rotate function takes an existing transformation, rotation angle and rotation axis as parameters. The glm::mat4(1.0f) constructor returns an identity matrix. Using a rotation angle of time * glm::radians(90.0f) accomplishes the goal of rotating 90 degrees per second.
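To see what this angle does to a point, the rotation can be written out by hand for the x and y components. The sketch below does not use GLM; it applies the standard 2D rotation formula, which is what a rotation about the Z axis reduces to. Point2, rotateAboutZ and angleAfter are hypothetical names.

```cpp
#include <cmath>

// Hypothetical stand-in for the x and y components of a vertex position.
struct Point2 {
    float x;
    float y;
};

// Rotates a point about the Z axis by `angleRadians` using the standard
// 2D rotation formula.
Point2 rotateAboutZ(Point2 p, float angleRadians) {
    const float c = std::cos(angleRadians);
    const float s = std::sin(angleRadians);
    return {c * p.x - s * p.y, s * p.x + c * p.y};
}

// Angle reached after `seconds` at 90 degrees per second, in radians.
float angleAfter(float seconds) {
    const float ninetyDegrees = 1.57079632679f; // pi / 2
    return seconds * ninetyDegrees;
}
```

After one second the point (1, 0) has moved to approximately (0, 1), and after two seconds to approximately (-1, 0), independent of how many frames were rendered in between.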

ubo.view = glm::lookAt(glm::vec3(2.0f, 2.0f, 2.0f), glm::vec3(0.0f, 0.0f, 0.0f), glm::vec3(0.0f, 0.0f, 1.0f));

For the view transformation I've decided to look at the geometry from above at a 45 degree angle. The glm::lookAt function takes the eye position, center position and up axis as parameters.

ubo.proj = glm::perspective(glm::radians(45.0f), swapChainExtent.width / (float) swapChainExtent.height, 0.1f, 10.0f);

I've chosen to use a perspective projection with a 45 degree vertical field-of-view. The other parameters are the aspect ratio, near and far view planes. It is important to use the current swap chain extent to calculate the aspect ratio to take into account the new width and height of the window after a resize.

ubo.proj[1][1] *= -1;

GLM was originally designed for OpenGL, where the Y coordinate of the clip coordinates is inverted. The easiest way to compensate for that is to flip the sign on the scaling factor of the Y axis in the projection matrix. If you don't do this, then the image will be rendered upside down.
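The effect of the sign flip, and of the cast in the aspect ratio calculation, can be checked with a few lines of plain C++. This sketch only models the vertical scale term that sits in the [1][1] entry of the projection matrix, not the full matrix, and all function names are hypothetical.

```cpp
#include <cmath>

// Vertical scale term of a perspective projection, i.e. the value that ends up
// in the [1][1] entry of the matrix, for a vertical field of view in radians.
float perspectiveYScale(float fovyRadians) {
    return 1.0f / std::tan(fovyRadians / 2.0f);
}

// Clip-space Y for a view-space Y, with or without the sign flip.
float clipSpaceY(float viewY, float fovyRadians, bool flipY) {
    const float scale = perspectiveYScale(fovyRadians);
    return (flipY ? -scale : scale) * viewY;
}

// Aspect ratio from integer extents; the cast avoids integer division.
float aspectRatio(unsigned width, unsigned height) {
    return static_cast<float>(width) / static_cast<float>(height);
}
```

A point with positive view-space Y gets a negative clip-space Y once the flip is applied, which is the inversion described above. For an 800 by 600 extent the aspect ratio is about 1.33, whereas dividing the two integers without the cast would give 1.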

All of the transformations are defined now, so we can copy the data in the uniform buffer object to the current uniform buffer. This happens in exactly the same way as we did for vertex buffers, except without a staging buffer. As noted earlier, we only map the uniform buffer once, so we can directly write to it without having to map again:

memcpy(uniformBuffersMapped[currentImage], &ubo, sizeof(ubo));

Using a UBO this way is not the most efficient way to pass frequently changing values to the shader. A more efficient way to pass a small buffer of data to shaders is to use push constants. We may look at these in a future chapter.

In the next chapter we'll look at descriptor sets, which will actually bind the VkBuffers to the uniform buffer descriptors so that the shader can access this transformation data.


Descriptor pool and sets

Introduction

The descriptor set layout from the previous chapter describes the type of descriptors that can be bound. In this chapter we're going to create a descriptor set for each VkBuffer resource to bind it to the uniform buffer descriptor.

Descriptor pool

Descriptor sets can't be created directly; they must be allocated from a pool, like command buffers. The equivalent for descriptor sets is unsurprisingly called a descriptor pool. We'll write a new function createDescriptorPool to set it up.

void initVulkan() {
    ...
    createUniformBuffers();
    createDescriptorPool();
    ...
}

...

void createDescriptorPool() {

}

We first need to describe which descriptor types our descriptor sets are going to contain and how many of them, using VkDescriptorPoolSize structures.

VkDescriptorPoolSize poolSize{};
poolSize.type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
poolSize.descriptorCount = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);

We will allocate one of these descriptors for every frame. This pool size structure is referenced by the main VkDescriptorPoolCreateInfo:

VkDescriptorPoolCreateInfo poolInfo{};
poolInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
poolInfo.poolSizeCount = 1;
poolInfo.pPoolSizes = &poolSize;

Aside from the maximum number of individual descriptors that are available, we also need to specify the maximum number of descriptor sets that may be allocated:

poolInfo.maxSets = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);

The structure has an optional flag similar to command pools that determines if individual descriptor sets can be freed or not: VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT. We're not going to touch the descriptor set after creating it, so we don't need this flag. You can leave flags to its default value of 0.

VkDescriptorPool descriptorPool;

...

if (vkCreateDescriptorPool(device, &poolInfo, nullptr, &descriptorPool) != VK_SUCCESS) {
    throw std::runtime_error("failed to create descriptor pool!");
}

Add a new class member to store the handle of the descriptor pool and call vkCreateDescriptorPool to create it.

Descriptor set

We can now allocate the descriptor sets themselves. Add a createDescriptorSets function for that purpose:

void initVulkan() {
    ...
    createDescriptorPool();
    createDescriptorSets();
    ...
}

...

void createDescriptorSets() {

}

A descriptor set allocation is described with a VkDescriptorSetAllocateInfo struct. You need to specify the descriptor pool to allocate from, the number of descriptor sets to allocate, and the descriptor set layout to base them on:

std::vector<VkDescriptorSetLayout> layouts(MAX_FRAMES_IN_FLIGHT, descriptorSetLayout);
VkDescriptorSetAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
allocInfo.descriptorPool = descriptorPool;
allocInfo.descriptorSetCount = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);
allocInfo.pSetLayouts = layouts.data();

In our case we will create one descriptor set for each frame in flight, all with the same layout. Unfortunately we do need all the copies of the layout because the next function expects an array matching the number of sets.

Add a class member to hold the descriptor set handles and allocate them with vkAllocateDescriptorSets:

VkDescriptorPool descriptorPool;
std::vector<VkDescriptorSet> descriptorSets;

...

descriptorSets.resize(MAX_FRAMES_IN_FLIGHT);
if (vkAllocateDescriptorSets(device, &allocInfo, descriptorSets.data()) != VK_SUCCESS) {
    throw std::runtime_error("failed to allocate descriptor sets!");
}

You don't need to explicitly clean up descriptor sets, because they will be automatically freed when the descriptor pool is destroyed. The call to vkAllocateDescriptorSets will allocate descriptor sets, each with one uniform buffer descriptor.

void cleanup() {
    ...
    vkDestroyDescriptorPool(device, descriptorPool, nullptr);

    vkDestroyDescriptorSetLayout(device, descriptorSetLayout, nullptr);
    ...
}

The descriptor sets have been allocated now, but the descriptors within still need to be configured. We'll now add a loop to populate every descriptor:

for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {

}

Descriptors that refer to buffers, like our uniform buffer descriptor, are configured with a VkDescriptorBufferInfo struct. This structure specifies the buffer and the region within it that contains the data for the descriptor.

for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
    VkDescriptorBufferInfo bufferInfo{};
    bufferInfo.buffer = uniformBuffers[i];
    bufferInfo.offset = 0;
    bufferInfo.range = sizeof(UniformBufferObject);
}

If you're overwriting the whole buffer, like we are in this case, then it is also possible to use the VK_WHOLE_SIZE value for the range. The configuration of descriptors is updated using the vkUpdateDescriptorSets function, which takes an array of VkWriteDescriptorSet structs as parameter.

VkWriteDescriptorSet descriptorWrite{};
descriptorWrite.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrite.dstSet = descriptorSets[i];
descriptorWrite.dstBinding = 0;
descriptorWrite.dstArrayElement = 0;

The first two fields specify the descriptor set to update and the binding. We gave our uniform buffer binding index 0. Remember that descriptors can be arrays, so we also need to specify the first index in the array that we want to update. We're not using an array, so the index is simply 0.

descriptorWrite.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
descriptorWrite.descriptorCount = 1;

We need to specify the type of descriptor again. It's possible to update multiple descriptors at once in an array, starting at index dstArrayElement. The descriptorCount field specifies how many array elements you want to update.

descriptorWrite.pBufferInfo = &bufferInfo;
descriptorWrite.pImageInfo = nullptr; // Optional
descriptorWrite.pTexelBufferView = nullptr; // Optional

The last three fields each reference an array with descriptorCount structs that actually configure the descriptors. Which one of the three you need depends on the type of descriptor. The pBufferInfo field is used for descriptors that refer to buffer data, pImageInfo is used for descriptors that refer to image data, and pTexelBufferView is used for descriptors that refer to buffer views. Our descriptor is based on buffers, so we're using pBufferInfo.

vkUpdateDescriptorSets(device, 1, &descriptorWrite, 0, nullptr);

The updates are applied using vkUpdateDescriptorSets. It accepts two kinds of arrays as parameters: an array of VkWriteDescriptorSet and an array of VkCopyDescriptorSet. The latter can be used to copy descriptors to each other, as its name implies.

Using descriptor sets

We now need to update the recordCommandBuffer function to actually bind the right descriptor set for each frame to the descriptors in the shader with vkCmdBindDescriptorSets. This needs to be done before the vkCmdDrawIndexed call:

vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout, 0, 1, &descriptorSets[currentFrame], 0, nullptr);
vkCmdDrawIndexed(commandBuffer, static_cast<uint32_t>(indices.size()), 1, 0, 0, 0);

Unlike vertex and index buffers, descriptor sets are not unique to graphics pipelines. Therefore we need to specify if we want to bind descriptor sets to the graphics or compute pipeline. The next parameter is the layout that the descriptors are based on. The next three parameters specify the index of the first descriptor set, the number of sets to bind, and the array of sets to bind. We'll get back to this in a moment. The last two parameters specify an array of offsets that are used for dynamic descriptors. We'll look at these in a future chapter.

If you run your program now, then you'll notice that unfortunately nothing is visible. The problem is that because of the Y-flip we did in the projection matrix, the vertices are now being drawn in counter-clockwise order instead of clockwise order. This causes backface culling to kick in and prevents any geometry from being drawn. Go to the createGraphicsPipeline function and modify the frontFace in VkPipelineRasterizationStateCreateInfo to correct this:

rasterizer.cullMode = VK_CULL_MODE_BACK_BIT;
rasterizer.frontFace = VK_FRONT_FACE_COUNTER_CLOCKWISE;

Run your program again and you should now see the following:

The rectangle has changed into a square because the projection matrix now corrects for aspect ratio. The updateUniformBuffer takes care of screen resizing, so we don't need to recreate the descriptor set in recreateSwapChain.

Alignment requirements

One thing we've glossed over so far is how exactly the data in the C++ structure should match with the uniform definition in the shader. It seems obvious enough to simply use the same types in both:

struct UniformBufferObject {
    glm::mat4 model;
    glm::mat4 view;
    glm::mat4 proj;
};

layout(binding = 0) uniform UniformBufferObject {
    mat4 model;
    mat4 view;
    mat4 proj;
} ubo;

However, that's not all there is to it. For example, try modifying the struct and shader to look like this:

struct UniformBufferObject {
    glm::vec2 foo;
    glm::mat4 model;
    glm::mat4 view;
    glm::mat4 proj;
};

layout(binding = 0) uniform UniformBufferObject {
    vec2 foo;
    mat4 model;
    mat4 view;
    mat4 proj;
} ubo;

Recompile your shader and your program, run it, and you'll find that the colorful square you've worked with so far has disappeared! That's because we haven't taken the alignment requirements into account.

Vulkan expects the data in your structure to be aligned in memory in a specific way, for example:

Scalars have to be aligned by N (= 4 bytes given 32 bit floats).
A vec2 must be aligned by 2N (= 8 bytes).
A vec3 or vec4 must be aligned by 4N (= 16 bytes).
A nested structure must be aligned by the base alignment of its members rounded up to a multiple of 16.
A mat4 matrix must have the same alignment as a vec4.

You can find the full list of alignment requirements in the specification.

Our original shader with just three mat4 fields already met the alignment requirements. As each mat4 is 4 x 4 x 4 = 64 bytes in size, model has an offset of 0, view has an offset of 64 and proj has an offset of 128. All of these are multiples of 16 and that's why it worked fine.
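
The offsets above follow from one rule: each member is placed at the next offset that is a multiple of its base alignment. Here is a minimal sketch of that calculation, covering only the vec2 and mat4 cases from this chapter. The Member struct and the helper functions are hypothetical and are not part of GLM or Vulkan.

```cpp
#include <array>
#include <cstddef>

// Size and base alignment of one uniform block member, in bytes.
struct Member {
    std::size_t size;
    std::size_t alignment;
};

// Rounds offset up to the next multiple of alignment.
constexpr std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Computes the offset the shader expects for each member, in declaration order.
template <std::size_t N>
constexpr std::array<std::size_t, N> computeOffsets(const std::array<Member, N>& members) {
    std::array<std::size_t, N> offsets{};
    std::size_t cursor = 0;
    for (std::size_t i = 0; i < N; ++i) {
        cursor = alignUp(cursor, members[i].alignment);
        offsets[i] = cursor;
        cursor += members[i].size;
    }
    return offsets;
}

constexpr Member kVec2{8, 8};    // two 4-byte floats, aligned by 2N
constexpr Member kMat4{64, 16};  // 4 x 4 floats, aligned like a vec4

constexpr std::array<Member, 3> kOriginalUbo{{kMat4, kMat4, kMat4}};
constexpr std::array<Member, 4> kUboWithVec2{{kVec2, kMat4, kMat4, kMat4}};

constexpr auto kOriginalOffsets = computeOffsets(kOriginalUbo);
constexpr auto kVec2Offsets = computeOffsets(kUboWithVec2);

static_assert(kOriginalOffsets[1] == 64 && kOriginalOffsets[2] == 128, "model, view, proj");
static_assert(kVec2Offsets[1] == 16, "a leading vec2 pushes the first mat4 to offset 16");
```

The second case shows what the shader expects when a vec2 comes first: the matrices start at 16, 80 and 144, not directly after the 8 bytes of the vec2.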

The new structure starts with a vec2 which is only 8 bytes in size and therefore throws off all of the offsets. Now model has an offset of 8, view an offset of 72 and proj an offset of 136, none of which are multiples of 16. To fix this problem we can use the alignas specifier introduced in C++11:

struct UniformBufferObject {
    glm::vec2 foo;
    alignas(16) glm::mat4 model;
    glm::mat4 view;
    glm::mat4 proj;
};

If you now compile and run your program again you should see that the shader correctly receives its matrix values once again.

Luckily there is a way to not have to think about these alignment requirements most of the time. We can define GLM_FORCE_DEFAULT_ALIGNED_GENTYPES right before including GLM:

#define GLM_FORCE_RADIANS
#define GLM_FORCE_DEFAULT_ALIGNED_GENTYPES
#include <glm/glm.hpp>

This will force GLM to use a version of vec2 and mat4 that has the alignment requirements already specified for us. If you add this definition then you can remove the alignas specifier and your program should still work.

Unfortunately this method can break down if you start using nested structures. Consider the following definition in the C++ code:

struct Foo {
    glm::vec2 v;
};

struct UniformBufferObject {
    Foo f1;
    Foo f2;
};

And the following shader definition:

struct Foo {
    vec2 v;
};

layout(binding = 0) uniform UniformBufferObject {
    Foo f1;
    Foo f2;
} ubo;

In this case f2 will have an offset of 8 whereas it should have an offset of 16 since it is a nested structure. In this case you must specify the alignment yourself:

struct UniformBufferObject {
    Foo f1;
    alignas(16) Foo f2;
};

These gotchas are a good reason to always be explicit about alignment. That way you won't be caught off guard by the strange symptoms of alignment errors.

struct UniformBufferObject {
    alignas(16) glm::mat4 model;
    alignas(16) glm::mat4 view;
    alignas(16) glm::mat4 proj;
};

Don't forget to recompile your shader after removing the foo field.
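
If you want the compiler to check the layout for you, one option is to assert the expected offsets with offsetof. The sketch below uses plain structs as stand-ins for glm::vec2 and glm::mat4, and assumes 4-byte floats, so that it compiles without GLM. The struct names are made up for this example. The expected values are the offsets discussed in this section.

```cpp
#include <cstddef>

// Plain stand-ins for glm::vec2 and glm::mat4 so the sketch compiles without GLM.
struct Vec2 { float x, y; };
struct Mat4 { float m[16]; };

// The experiment with the extra vec2, without and with the alignas fix.
struct UboWithoutAlignas { Vec2 foo; Mat4 model; Mat4 view; Mat4 proj; };
struct UboWithAlignas { Vec2 foo; alignas(16) Mat4 model; Mat4 view; Mat4 proj; };

// The nested structure case, without and with the alignas fix.
struct Foo { Vec2 v; };
struct NestedWithoutAlignas { Foo f1; Foo f2; };
struct NestedWithAlignas { Foo f1; alignas(16) Foo f2; };

static_assert(sizeof(float) == 4, "the offsets below assume 4-byte floats");

// Without alignas the C++ compiler packs model right after the vec2.
static_assert(offsetof(UboWithoutAlignas, model) == 8, "mismatch with the shader");

// With alignas(16) on model, all three matrices land on multiples of 16.
static_assert(offsetof(UboWithAlignas, model) == 16, "model must start at 16");
static_assert(offsetof(UboWithAlignas, view) == 80, "view must start at 80");
static_assert(offsetof(UboWithAlignas, proj) == 144, "proj must start at 144");

// The nested struct needs the same treatment to reach offset 16.
static_assert(offsetof(NestedWithoutAlignas, f2) == 8, "mismatch with the shader");
static_assert(offsetof(NestedWithAlignas, f2) == 16, "f2 must start at 16");
```

With checks like these next to the struct definition, a layout mistake becomes a compile error instead of a blank screen.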

Multiple descriptor sets

As some of the structures and function calls hinted at, it is actually possible to bind multiple descriptor sets simultaneously. You need to specify a descriptor set layout for each descriptor set when creating the pipeline layout. Shaders can then reference specific descriptor sets like this:

layout(set = 0, binding = 0) uniform UniformBufferObject { ... }

You can use this feature to put descriptors that vary per-object and descriptors that are shared into separate descriptor sets. In that case you avoid rebinding most of the descriptors across draw calls which is potentially more efficient.

Texture mapping

Images

Introduction

The geometry has been colored using per-vertex colors so far, which is a rather limited approach. In this part of the tutorial we're going to implement texture mapping to make the geometry look more interesting. This will also allow us to load and draw basic 3D models in a future chapter.

Adding a texture to our application will involve the following steps:

Create an image object backed by device memory
Fill it with pixels from an image file
Create an image sampler
Add a combined image sampler descriptor to sample colors from the texture

We've already worked with image objects before, but those were automatically created by the swap chain extension. This time we'll have to create one by ourselves. Creating an image and filling it with data is similar to vertex buffer creation. We'll start by creating a staging resource and filling it with pixel data and then we copy this to the final image object that we'll use for rendering. Although it is possible to create a staging image for this purpose, Vulkan also allows you to copy pixels from a VkBuffer to an image and the API for this is actually faster on some hardware. We'll first create this buffer and fill it with pixel values, and then we'll create an image to copy the pixels to. Creating an image is not very different from creating buffers. It involves querying the memory requirements, allocating device memory and binding it, just like we've seen before.

However, there is something extra that we'll have to take care of when working with images. Images can have different layouts that affect how the pixels are organized in memory. Due to the way graphics hardware works, simply storing the pixels row by row may not lead to the best performance, for example. When performing any operation on images, you must make sure that they have the layout that is optimal for use in that operation. We've actually already seen some of these layouts when we specified the render pass:

VK_IMAGE_LAYOUT_PRESENT_SRC_KHR: Optimal for presentation
VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL: Optimal as attachment for writing colors from the fragment shader
VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL: Optimal as source in a transfer operation, like vkCmdCopyImageToBuffer
VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL: Optimal as destination in a transfer operation, like vkCmdCopyBufferToImage
VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL: Optimal for sampling from a shader

One of the most common ways to transition the layout of an image is a pipeline barrier. Pipeline barriers are primarily used for synchronizing access to resources, like making sure that an image was written to before it is read, but they can also be used to transition layouts. In this chapter we'll see how pipeline barriers are used for this purpose. Barriers can additionally be used to transfer queue family ownership when using VK_SHARING_MODE_EXCLUSIVE.

Image library

There are many libraries available for loading images, and you can even write your own code to load simple formats like BMP and PPM. In this tutorial we'll be using the stb_image library from the stb collection. The advantage of it is that all of the code is in a single file, so it doesn't require any tricky build configuration. Download stb_image.h and store it in a convenient location, like the directory where you saved GLFW and GLM. Add the location to your include path.

Visual Studio

Add the directory with stb_image.h in it to the Additional Include Directories paths.

Makefile

Add the directory with stb_image.h to the include directories for GCC:

VULKAN_SDK_PATH = /home/user/VulkanSDK/x.x.x.x/x86_64
STB_INCLUDE_PATH = /home/user/libraries/stb

...

CFLAGS = -std=c++17 -I$(VULKAN_SDK_PATH)/include -I$(STB_INCLUDE_PATH)

Loading an image

Include the image library like this:

#define STB_IMAGE_IMPLEMENTATION
#include <stb_image.h>

The header only defines the prototypes of the functions by default. One code file needs to include the header with the STB_IMAGE_IMPLEMENTATION definition to include the function bodies, otherwise we'll get linking errors.

void initVulkan() {
    ...
    createCommandPool();
    createTextureImage();
    createVertexBuffer();
    ...
}

...

void createTextureImage() {

}

Create a new function createTextureImage where we'll load an image and upload it into a Vulkan image object. We're going to use command buffers, so it should be called after createCommandPool.

Create a new directory textures next to the shaders directory to store texture images in. We're going to load an image called texture.jpg from that directory. I've chosen to use the following CC0 licensed image resized to 512 x 512 pixels, but feel free to pick any image you want. The library supports most common image file formats, like JPEG, PNG, BMP and GIF.

Loading an image with this library is really easy:

void createTextureImage() {
    int texWidth, texHeight, texChannels;
    stbi_uc* pixels = stbi_load("textures/texture.jpg", &texWidth, &texHeight, &texChannels, STBI_rgb_alpha);
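    if (!pixels) {
        throw std::runtime_error("failed to load texture image!");
    }

    // Compute the size only after a successful load, using 64-bit arithmetic
    VkDeviceSize imageSize = static_cast<VkDeviceSize>(texWidth) * texHeight * 4;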
}

The stbi_load function takes the file path and number of channels to load as arguments. The STBI_rgb_alpha value forces the image to be loaded with an alpha channel, even if it doesn't have one, which is nice for consistency with other textures in the future. The middle three parameters are outputs for the width, height and actual number of channels in the image. The pointer that is returned is the first element in an array of pixel values. The pixels are laid out row by row with 4 bytes per pixel in the case of STBI_rgb_alpha for a total of texWidth * texHeight * 4 values.
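
To make the layout concrete, here is a small sketch of how an individual pixel can be located in such an array. The helper names, the Rgba struct and the tiny 2 x 2 test image are made up for this example. Only the row-by-row layout with 4 bytes per pixel comes from the description above.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBytesPerPixel = 4;  // R, G, B, A with STBI_rgb_alpha

// Total size of the pixel array in bytes, computed in 64-bit arithmetic.
constexpr std::uint64_t imageByteSize(int width, int height) {
    return static_cast<std::uint64_t>(width) * static_cast<std::uint64_t>(height) * kBytesPerPixel;
}

// Byte offset of pixel (x, y) in a tightly packed, row-by-row RGBA image.
constexpr std::size_t pixelByteOffset(int x, int y, int width) {
    return (static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
            static_cast<std::size_t>(x)) * kBytesPerPixel;
}

struct Rgba {
    unsigned char r, g, b, a;
};

// Reads one pixel from an array laid out like the one stbi_load returns.
constexpr Rgba readPixel(const unsigned char* pixels, int x, int y, int width) {
    const unsigned char* p = pixels + pixelByteOffset(x, y, width);
    return Rgba{p[0], p[1], p[2], p[3]};
}

// A hypothetical 2 x 2 image whose bytes are simply numbered 0 to 15.
constexpr unsigned char kTinyImage[2 * 2 * 4] = {
    0, 1, 2, 3,     4, 5, 6, 7,
    8, 9, 10, 11,   12, 13, 14, 15,
};

static_assert(imageByteSize(512, 512) == 1048576, "512 x 512 RGBA is 1 MiB");
static_assert(readPixel(kTinyImage, 1, 1, 2).r == 12, "bottom right pixel starts at byte 12");
```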

Staging buffer

We're now going to create a buffer in host visible memory so that we can use vkMapMemory and copy the pixels to it. Add variables for this temporary buffer to the createTextureImage function:

VkBuffer stagingBuffer;
VkDeviceMemory stagingBufferMemory;

The buffer should be in host visible memory so that we can map it and it should be usable as a transfer source so that we can copy it to an image later on:

createBuffer(imageSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);

We can then directly copy the pixel values that we got from the image loading library to the buffer:

void* data;
vkMapMemory(device, stagingBufferMemory, 0, imageSize, 0, &data);
    memcpy(data, pixels, static_cast<size_t>(imageSize));
vkUnmapMemory(device, stagingBufferMemory);

Don't forget to clean up the original pixel array now:

stbi_image_free(pixels);

Texture Image

Although we could set up the shader to access the pixel values in the buffer, it's better to use image objects in Vulkan for this purpose. Image objects will make it easier and faster to retrieve colors by allowing us to use 2D coordinates, for one. Pixels within an image object are known as texels and we'll use that name from this point on. Add the following new class members:

VkImage textureImage;
VkDeviceMemory textureImageMemory;

The parameters for an image are specified in a VkImageCreateInfo struct:

VkImageCreateInfo imageInfo{};
imageInfo.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
imageInfo.imageType = VK_IMAGE_TYPE_2D;
imageInfo.extent.width = static_cast<uint32_t>(texWidth);
imageInfo.extent.height = static_cast<uint32_t>(texHeight);
imageInfo.extent.depth = 1;
imageInfo.mipLevels = 1;
imageInfo.arrayLayers = 1;

The image type, specified in the imageType field, tells Vulkan with what kind of coordinate system the texels in the image are going to be addressed. It is possible to create 1D, 2D and 3D images. One dimensional images can be used to store an array of data or gradient, two dimensional images are mainly used for textures, and three dimensional images can be used to store voxel volumes, for example. The extent field specifies the dimensions of the image, basically how many texels there are on each axis. That's why depth must be 1 instead of 0. Our texture will not be an array and we won't be using mipmapping for now.

imageInfo.format = VK_FORMAT_R8G8B8A8_SRGB;

Vulkan supports many possible image formats, but we should use the same format for the texels as the pixels in the buffer, otherwise the copied data will not be interpreted correctly.

imageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;

The tiling field can have one of two values:

VK_IMAGE_TILING_LINEAR: Texels are laid out in row-major order like our pixels array
VK_IMAGE_TILING_OPTIMAL: Texels are laid out in an implementation defined order for optimal access

Unlike the layout of an image, the tiling mode cannot be changed at a later time. If you want to be able to directly access texels in the memory of the image, then you must use VK_IMAGE_TILING_LINEAR. We will be using a staging buffer instead of a staging image, so this won't be necessary. We will be using VK_IMAGE_TILING_OPTIMAL for efficient access from the shader.
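
The exact order used by VK_IMAGE_TILING_OPTIMAL is implementation defined, so the following is only an illustration of why a non-linear order can help. It compares row-major indexing with a Morton (Z-order) curve, a well-known example of a tiled layout. Using Morton order here is an assumption made for illustration, not a description of what any particular driver does.

```cpp
#include <cstdint>

// Row-major index, as used by our pixels array and by linear tiling.
constexpr std::uint32_t linearIndex(std::uint32_t x, std::uint32_t y, std::uint32_t width) {
    return y * width + x;
}

// Spreads the lower 16 bits of v so that a zero bit separates each of them.
constexpr std::uint32_t spreadBits(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Morton index: the bits of x and y are interleaved, x in the even positions.
constexpr std::uint32_t mortonIndex(std::uint32_t x, std::uint32_t y) {
    return spreadBits(x) | (spreadBits(y) << 1);
}

// In a 512 wide row-major image, vertical neighbors are a whole row apart.
static_assert(linearIndex(0, 1, 512) - linearIndex(0, 0, 512) == 512, "one row apart");

// In Morton order the same two texels are only two slots apart.
static_assert(mortonIndex(0, 1) - mortonIndex(0, 0) == 2, "close together");
```

Texture sampling tends to touch small 2D neighborhoods of texels, so an order that keeps such neighborhoods close together in memory can be friendlier to caches than plain rows.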

imageInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;

There are only two possible values for the initialLayout of an image:

VK_IMAGE_LAYOUT_UNDEFINED: Not usable by the GPU and the very first transition will discard the texels.
VK_IMAGE_LAYOUT_PREINITIALIZED: Not usable by the GPU, but the first transition will preserve the texels.

There are few situations where it is necessary for the texels to be preserved during the first transition. One example, however, would be if you wanted to use an image as a staging image in combination with VK_IMAGE_TILING_LINEAR tiling. In that case, you'd want to upload the texel data to it and then transition the image to be a transfer source without losing the data. In our case, however, we're first going to transition the image to be a transfer destination and then copy texel data to it from a buffer object, so we don't need this property and can safely use VK_IMAGE_LAYOUT_UNDEFINED.

imageInfo.usage = VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT;

The usage field has the same semantics as the one during buffer creation. The image is going to be used as destination for the buffer copy, so it should be set up as a transfer destination. We also want to be able to access the image from the shader to color our mesh, so the usage should include VK_IMAGE_USAGE_SAMPLED_BIT.

imageInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

The image will only be used by one queue family: the one that supports graphics (and therefore also transfer) operations.

imageInfo.samples = VK_SAMPLE_COUNT_1_BIT;
imageInfo.flags = 0; // Optional

The samples field is related to multisampling. This is only relevant for images that will be used as attachments, so stick to one sample. There are some optional flags for images that are related to sparse images. Sparse images are images where only certain regions are actually backed by memory. If you were using a 3D texture for a voxel terrain, for example, then you could use this to avoid allocating memory to store large volumes of "air" values. We won't be using it in this tutorial, so leave it to its default value of 0.

if (vkCreateImage(device, &imageInfo, nullptr, &textureImage) != VK_SUCCESS) {
    throw std::runtime_error("failed to create image!");
}

The image is created using vkCreateImage, which doesn't have any particularly noteworthy parameters. It is possible that the VK_FORMAT_R8G8B8A8_SRGB format is not supported by the graphics hardware. You should have a list of acceptable alternatives and go with the best one that is supported. However, support for this particular format is so widespread that we'll skip this step. Using different formats would also require annoying conversions. We will get back to this in the depth buffer chapter, where we'll implement such a system.
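
The selection logic itself is simple. As a rough sketch, with a hypothetical TextureFormat enum in place of VkFormat and a caller-supplied predicate standing in for a real device query such as vkGetPhysicalDeviceFormatProperties, it could look like this:

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for VkFormat values; not the real Vulkan enum.
enum class TextureFormat { Rgba8Srgb, Rgba8Unorm, Bgra8Srgb };

// Returns the first candidate accepted by isSupported, or nothing if none is.
// Candidates are expected to be ordered from most to least preferred.
template <std::size_t N, typename IsSupported>
constexpr std::optional<TextureFormat> pickFirstSupported(
    const std::array<TextureFormat, N>& candidates, IsSupported isSupported) {
    for (TextureFormat format : candidates) {
        if (isSupported(format)) {
            return format;
        }
    }
    return std::nullopt;
}

constexpr std::array<TextureFormat, 3> kCandidates{{
    TextureFormat::Rgba8Srgb, TextureFormat::Rgba8Unorm, TextureFormat::Bgra8Srgb}};

// Pretend device that lacks the sRGB RGBA format; purely for demonstration.
constexpr bool pretendDeviceSupports(TextureFormat format) {
    return format != TextureFormat::Rgba8Srgb;
}

static_assert(pickFirstSupported(kCandidates, pretendDeviceSupports).value() ==
                  TextureFormat::Rgba8Unorm,
              "falls back to the next best candidate");
```

Returning an empty optional leaves it to the caller to decide what to do when nothing on the list is supported, for example throwing an exception like the other helper functions do.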

VkMemoryRequirements memRequirements;
vkGetImageMemoryRequirements(device, textureImage, &memRequirements);

VkMemoryAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocInfo.allocationSize = memRequirements.size;
allocInfo.memoryTypeIndex = findMemoryType(memRequirements.memoryTypeBits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);

if (vkAllocateMemory(device, &allocInfo, nullptr, &textureImageMemory) != VK_SUCCESS) {
    throw std::runtime_error("failed to allocate image memory!");
}

vkBindImageMemory(device, textureImage, textureImageMemory, 0);

Allocating memory for an image works in exactly the same way as allocating memory for a buffer. Use vkGetImageMemoryRequirements instead of vkGetBufferMemoryRequirements, and use vkBindImageMemory instead of vkBindBufferMemory.

This function is already getting quite large and there'll be a need to create more images in later chapters, so we should abstract image creation into a createImage function, like we did for buffers. Create the function and move the image object creation and memory allocation to it:

void createImage(uint32_t width, uint32_t height, VkFormat format, VkImageTiling tiling, VkImageUsageFlags usage, VkMemoryPropertyFlags properties, VkImage& image, VkDeviceMemory& imageMemory) {
    VkImageCreateInfo imageInfo{};
    imageInfo.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
    imageInfo.imageType = VK_IMAGE_TYPE_2D;
    imageInfo.extent.width = width;
    imageInfo.extent.height = height;
    imageInfo.extent.depth = 1;
    imageInfo.mipLevels = 1;
    imageInfo.arrayLayers = 1;
    imageInfo.format = format;
    imageInfo.tiling = tiling;
    imageInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    imageInfo.usage = usage;
    imageInfo.samples = VK_SAMPLE_COUNT_1_BIT;
    imageInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

    if (vkCreateImage(device, &imageInfo, nullptr, &image) != VK_SUCCESS) {
        throw std::runtime_error("failed to create image!");
    }

    VkMemoryRequirements memRequirements;
    vkGetImageMemoryRequirements(device, image, &memRequirements);

    VkMemoryAllocateInfo allocInfo{};
    allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize = memRequirements.size;
    allocInfo.memoryTypeIndex = findMemoryType(memRequirements.memoryTypeBits, properties);

    if (vkAllocateMemory(device, &allocInfo, nullptr, &imageMemory) != VK_SUCCESS) {
        throw std::runtime_error("failed to allocate image memory!");
    }

    vkBindImageMemory(device, image, imageMemory, 0);
}

I've made the width, height, format, tiling mode, usage, and memory properties parameters, because these will all vary between the images we'll be creating throughout this tutorial.

The createTextureImage function can now be simplified to:

void createTextureImage() {
    int texWidth, texHeight, texChannels;
    stbi_uc* pixels = stbi_load("textures/texture.jpg", &texWidth, &texHeight, &texChannels, STBI_rgb_alpha);
    if (!pixels) {
        throw std::runtime_error("failed to load texture image!");
    }

    VkDeviceSize imageSize = static_cast<VkDeviceSize>(texWidth) * texHeight * 4;

    VkBuffer stagingBuffer;
    VkDeviceMemory stagingBufferMemory;
    createBuffer(imageSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);

    void* data;
    vkMapMemory(device, stagingBufferMemory, 0, imageSize, 0, &data);
        memcpy(data, pixels, static_cast<size_t>(imageSize));
    vkUnmapMemory(device, stagingBufferMemory);

    stbi_image_free(pixels);

    createImage(texWidth, texHeight, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);
}

Layout transitions

The function we're going to write now involves recording and executing a command buffer again, so now's a good time to move that logic into a helper function or two:

VkCommandBuffer beginSingleTimeCommands() {
    VkCommandBufferAllocateInfo allocInfo{};
    allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
    allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
    allocInfo.commandPool = commandPool;
    allocInfo.commandBufferCount = 1;

    VkCommandBuffer commandBuffer;
    vkAllocateCommandBuffers(device, &allocInfo, &commandBuffer);

    VkCommandBufferBeginInfo beginInfo{};
    beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
    beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;

    vkBeginCommandBuffer(commandBuffer, &beginInfo);

    return commandBuffer;
}

void endSingleTimeCommands(VkCommandBuffer commandBuffer) {
    vkEndCommandBuffer(commandBuffer);

    VkSubmitInfo submitInfo{};
    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers = &commandBuffer;

    vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
    vkQueueWaitIdle(graphicsQueue);

    vkFreeCommandBuffers(device, commandPool, 1, &commandBuffer);
}

The code for these functions is based on the existing code in copyBuffer. You can now simplify that function to:

void copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size) {
    VkCommandBuffer commandBuffer = beginSingleTimeCommands();

    VkBufferCopy copyRegion{};
    copyRegion.size = size;
    vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);

    endSingleTimeCommands(commandBuffer);
}

If we were still using buffers, then we could now write a function to record and execute vkCmdCopyBufferToImage to finish the job, but this command requires the image to be in the right layout first. Create a new function to handle layout transitions:

void transitionImageLayout(VkImage image, VkFormat format, VkImageLayout oldLayout, VkImageLayout newLayout) {
    VkCommandBuffer commandBuffer = beginSingleTimeCommands();

    endSingleTimeCommands(commandBuffer);
}

One of the most common ways to perform layout transitions is using an image memory barrier. A pipeline barrier like that is generally used to synchronize access to resources, like ensuring that a write to a buffer completes before reading from it, but it can also be used to transition image layouts and transfer queue family ownership when VK_SHARING_MODE_EXCLUSIVE is used. There is an equivalent buffer memory barrier to do this for buffers.

VkImageMemoryBarrier barrier{};
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.oldLayout = oldLayout;
barrier.newLayout = newLayout;

The first two fields specify the layout transition. It is possible to use VK_IMAGE_LAYOUT_UNDEFINED as oldLayout if you don't care about the existing contents of the image.

barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;

If you are using the barrier to transfer queue family ownership, then these two fields should be the indices of the queue families. They must be set to VK_QUEUE_FAMILY_IGNORED if you don't want to do this (not the default value!).

barrier.image = image;
barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
barrier.subresourceRange.baseMipLevel = 0;
barrier.subresourceRange.levelCount = 1;
barrier.subresourceRange.baseArrayLayer = 0;
barrier.subresourceRange.layerCount = 1;

The image and subresourceRange specify the image that is affected and the specific part of the image. Our image is not an array and does not have mipmapping levels, so only one level and layer are specified.

barrier.srcAccessMask = 0; // TODO
barrier.dstAccessMask = 0; // TODO

Barriers are primarily used for synchronization purposes, so you must specify which types of operations that involve the resource must happen before the barrier, and which operations that involve the resource must wait on the barrier. We need to do that despite already using vkQueueWaitIdle to manually synchronize. The right values depend on the old and new layout, so we'll get back to this once we've figured out which transitions we're going to use.

vkCmdPipelineBarrier(
    commandBuffer,
    0 /* TODO */, 0 /* TODO */,
    0,
    0, nullptr,
    0, nullptr,
    1, &barrier
);

All types of pipeline barriers are submitted using the same function. The first parameter after the command buffer specifies in which pipeline stage the operations occur that should happen before the barrier. The second parameter specifies the pipeline stage in which operations will wait on the barrier. The pipeline stages that you are allowed to specify before and after the barrier depend on how you use the resource before and after the barrier. The allowed values are listed in this table of the specification. For example, if you're going to read from a uniform after the barrier, you would specify a usage of VK_ACCESS_UNIFORM_READ_BIT and the earliest shader that will read from the uniform as pipeline stage, for example VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT. It would not make sense to specify a non-shader pipeline stage for this type of usage and the validation layers will warn you when you specify a pipeline stage that does not match the type of usage.

The third parameter is either 0 or VK_DEPENDENCY_BY_REGION_BIT. The latter turns the barrier into a per-region condition. That means that the implementation is allowed to already begin reading from the parts of a resource that were written so far, for example.

The last three pairs of parameters reference arrays of pipeline barriers of the three available types: memory barriers, buffer memory barriers, and image memory barriers like the one we're using here. Note that we're not using the VkFormat parameter yet, but we'll be using that one for special transitions in the depth buffer chapter.

Copying buffer to image

Before we get back to createTextureImage, we're going to write one more helper function: copyBufferToImage:

void copyBufferToImage(VkBuffer buffer, VkImage image, uint32_t width, uint32_t height) {
    VkCommandBuffer commandBuffer = beginSingleTimeCommands();

    endSingleTimeCommands(commandBuffer);
}

Just like with buffer copies, you need to specify which part of the buffer is going to be copied to which part of the image. This happens through VkBufferImageCopy structs:

VkBufferImageCopy region{};
region.bufferOffset = 0;
region.bufferRowLength = 0;
region.bufferImageHeight = 0;

region.imageSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
region.imageSubresource.mipLevel = 0;
region.imageSubresource.baseArrayLayer = 0;
region.imageSubresource.layerCount = 1;

region.imageOffset = {0, 0, 0};
region.imageExtent = {
    width,
    height,
    1
};

Most of these fields are self-explanatory. The bufferOffset specifies the byte offset in the buffer at which the pixel values start. The bufferRowLength and bufferImageHeight fields specify how the pixels are laid out in memory. For example, you could have some padding bytes between rows of the image. Specifying 0 for both indicates that the pixels are simply tightly packed like they are in our case. The imageSubresource, imageOffset and imageExtent fields indicate to which part of the image we want to copy the pixels.
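
The effect of bufferRowLength and bufferImageHeight is easiest to see as address arithmetic. The sketch below computes where a texel is read from in the buffer for an uncompressed format. Note that both fields are given in texels, not bytes. The BufferImageLayout struct and the texelByteOffset function are hypothetical helpers written for this example; they only mirror the relevant VkBufferImageCopy fields.

```cpp
#include <cstdint>

// Mirrors the fields of VkBufferImageCopy that affect addressing.
struct BufferImageLayout {
    std::uint64_t bufferOffset;       // byte offset of the first texel
    std::uint32_t bufferRowLength;    // row pitch in texels, 0 = tightly packed
    std::uint32_t bufferImageHeight;  // rows per slice, 0 = tightly packed
    std::uint32_t imageWidth;         // imageExtent.width
    std::uint32_t imageHeight;        // imageExtent.height
    std::uint32_t bytesPerTexel;      // 4 for VK_FORMAT_R8G8B8A8_SRGB
};

// Byte offset in the buffer of the texel copied to (x, y, z) of the region.
constexpr std::uint64_t texelByteOffset(const BufferImageLayout& layout,
                                        std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    // A value of 0 means "use the image extent", i.e. no padding.
    const std::uint64_t rowLength =
        layout.bufferRowLength != 0 ? layout.bufferRowLength : layout.imageWidth;
    const std::uint64_t imageHeight =
        layout.bufferImageHeight != 0 ? layout.bufferImageHeight : layout.imageHeight;
    return layout.bufferOffset + ((z * imageHeight + y) * rowLength + x) * layout.bytesPerTexel;
}

// Our case: a tightly packed 512 x 512 RGBA image.
constexpr BufferImageLayout kTight{0, 0, 0, 512, 512, 4};
// Same image, but each row in the buffer is padded to 520 texels.
constexpr BufferImageLayout kPadded{0, 520, 0, 512, 512, 4};

static_assert(texelByteOffset(kTight, 0, 1, 0) == 2048, "second row starts after 512 texels");
static_assert(texelByteOffset(kPadded, 0, 1, 0) == 2080, "second row starts after 520 texels");
```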

Buffer to image copy operations are enqueued using the vkCmdCopyBufferToImage function:

vkCmdCopyBufferToImage(
    commandBuffer,
    buffer,
    image,
    VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    1,
    &region
);

The fourth parameter indicates which layout the image is currently using. I'm assuming here that the image has already been transitioned to the layout that is optimal for copying pixels to. Right now we're only copying one chunk of pixels to the whole image, but it's possible to specify an array of VkBufferImageCopy to perform many different copies from this buffer to the image in one operation.

Preparing the texture image

We now have all of the tools we need to finish setting up the texture image, so we're going back to the createTextureImage function. The last thing we did there was creating the texture image. The next step is to copy the staging buffer to the texture image. This involves two steps:

Transition the texture image to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
Execute the buffer to image copy operation

This is easy to do with the functions we just created:

transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));

The image was created with the VK_IMAGE_LAYOUT_UNDEFINED layout, so that one should be specified as old layout when transitioning textureImage. Remember that we can do this because we don't care about its contents before performing the copy operation.

To be able to start sampling from the texture image in the shader, we need one last transition to prepare it for shader access:

transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);

Transition barrier masks

If you run your application with validation layers enabled now, then you'll see that it complains about the access masks and pipeline stages in transitionImageLayout being invalid. We still need to set those based on the layouts in the transition.

There are two transitions we need to handle:

Undefined → transfer destination: transfer writes that don't need to wait on anything
Transfer destination → shader reading: shader reads should wait on transfer writes, specifically the shader reads in the fragment shader, because that's where we're going to use the texture

These rules are specified using the following access masks and pipeline stages:

VkPipelineStageFlags sourceStage;
VkPipelineStageFlags destinationStage;

if (oldLayout == VK_IMAGE_LAYOUT_UNDEFINED && newLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) {
    barrier.srcAccessMask = 0;
    barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;

    sourceStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
    destinationStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
} else if (oldLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL && newLayout == VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL) {
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    sourceStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
    destinationStage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
} else {
    throw std::invalid_argument("unsupported layout transition!");
}

vkCmdPipelineBarrier(
    commandBuffer,
    sourceStage, destinationStage,
    0,
    0, nullptr,
    0, nullptr,
    1, &barrier
);

As you can see in the aforementioned table, transfer writes must occur in the pipeline transfer stage. Since the writes don't have to wait on anything, you may specify an empty access mask and the earliest possible pipeline stage VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT for the pre-barrier operations. It should be noted that VK_PIPELINE_STAGE_TRANSFER_BIT is not a real stage within the graphics and compute pipelines. It is more of a pseudo-stage where transfers happen. See the documentation for more information and other examples of pseudo-stages.

The image will be written in the same pipeline stage and subsequently read by the fragment shader, which is why we specify shader reading access in the fragment shader pipeline stage.

If we need to do more transitions in the future, then we'll extend the function. The application should now run successfully, although there are of course no visual changes yet.
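
If the if/else chain grows, one alternative is to keep the rules in a table and look them up. The sketch below shows the idea with small stand-in enums in place of the Vulkan types, so every name in it is hypothetical. The two rows are the same two transitions handled above.

```cpp
#include <array>
#include <optional>

// Stand-ins for VkImageLayout, VkPipelineStageFlags and VkAccessFlags values.
enum class Layout { Undefined, TransferDst, ShaderReadOnly };
enum class Stage { TopOfPipe, Transfer, FragmentShader };
enum class Access { None, TransferWrite, ShaderRead };

struct BarrierMasks {
    Access srcAccess;
    Access dstAccess;
    Stage srcStage;
    Stage dstStage;
};

struct TransitionRule {
    Layout oldLayout;
    Layout newLayout;
    BarrierMasks masks;
};

// One row per supported transition; new transitions only add rows.
constexpr std::array<TransitionRule, 2> kRules{{
    {Layout::Undefined, Layout::TransferDst,
     {Access::None, Access::TransferWrite, Stage::TopOfPipe, Stage::Transfer}},
    {Layout::TransferDst, Layout::ShaderReadOnly,
     {Access::TransferWrite, Access::ShaderRead, Stage::Transfer, Stage::FragmentShader}},
}};

// Returns the masks for a transition, or nothing if it is not supported.
constexpr std::optional<BarrierMasks> findMasks(Layout oldLayout, Layout newLayout) {
    for (const TransitionRule& rule : kRules) {
        if (rule.oldLayout == oldLayout && rule.newLayout == newLayout) {
            return rule.masks;
        }
    }
    return std::nullopt;
}

static_assert(findMasks(Layout::Undefined, Layout::TransferDst).has_value(), "supported");
static_assert(!findMasks(Layout::ShaderReadOnly, Layout::Undefined).has_value(), "unsupported");
```

An empty result corresponds to the "unsupported layout transition!" exception in transitionImageLayout.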

One thing to note is that command buffer submission results in implicit VK_ACCESS_HOST_WRITE_BIT synchronization at the beginning. Since the transitionImageLayout function executes a command buffer with only a single command, you could use this implicit synchronization and set srcAccessMask to 0 if you ever needed a VK_ACCESS_HOST_WRITE_BIT dependency in a layout transition. It's up to you if you want to be explicit about it or not, but I'm personally not a fan of relying on these OpenGL-like "hidden" operations.

There is actually a special type of image layout that supports all operations, VK_IMAGE_LAYOUT_GENERAL. The problem with it, of course, is that it doesn't necessarily offer the best performance for any operation. It is required for some special cases, like using an image as both input and output, or for reading an image after it has left the preinitialized layout.

All of the helper functions that submit commands so far have been set up to execute synchronously by waiting for the queue to become idle. For practical applications it is recommended to combine these operations in a single command buffer and execute them asynchronously for higher throughput, especially the transitions and copy in the createTextureImage function. Try to experiment with this by creating a setupCommandBuffer that the helper functions record commands into, and add a flushSetupCommands to execute the commands that have been recorded so far. It's best to do this after the texture mapping works to check if the texture resources are still set up correctly.
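
The idea of recording now and submitting later can be sketched without any Vulkan calls. The block below is only an illustration under stated assumptions: SetupCommandBatch, record and flush are hypothetical names, and each std::function stands in for a command that a real implementation would record into a VkCommandBuffer before a single queue submission.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a command buffer plus queue: it only models
// "record now, submit later" and makes no Vulkan calls.
class SetupCommandBatch {
public:
    // Queue a command instead of submitting it immediately.
    void record(std::function<void()> command) {
        assert(command && "a recorded command must be callable");
        pending.push_back(std::move(command));
    }

    // Run everything recorded so far as a single submission.
    // Returns the number of commands that were executed.
    std::size_t flush() {
        if (pending.empty()) {
            return 0;
        }
        const std::size_t count = pending.size();
        for (auto& command : pending) {
            command();
        }
        pending.clear();
        ++submissions;
        return count;
    }

    // Number of non-empty flushes so far.
    std::size_t submissionCount() const { return submissions; }

private:
    std::vector<std::function<void()>> pending;
    std::size_t submissions = 0;
};
```

With this shape, the two layout transitions and the buffer-to-image copy would be three record calls followed by one flush, so the queue is waited on once instead of three times.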

Cleanup

Finish the createTextureImage function by cleaning up the staging buffer and its memory at the end:

    transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);

    vkDestroyBuffer(device, stagingBuffer, nullptr);
    vkFreeMemory(device, stagingBufferMemory, nullptr);
}

The main texture image is used until the end of the program:

void cleanup() {
    cleanupSwapChain();

    vkDestroyImage(device, textureImage, nullptr);
    vkFreeMemory(device, textureImageMemory, nullptr);

    ...
}

The image now contains the texture, but we still need a way to access it from the graphics pipeline. We'll work on that in the next chapter.


Image view and sampler

In this chapter we're going to create two more resources that are needed for the graphics pipeline to sample an image. The first resource is one that we've already seen before while working with the swap chain images, but the second one is new - it relates to how the shader will read texels from the image.

Texture image view

We've seen before, with the swap chain images and the framebuffer, that images are accessed through image views rather than directly. We will also need to create such an image view for the texture image.

Add a class member to hold a VkImageView for the texture image and create a new function createTextureImageView where we'll create it:

VkImageView textureImageView;

...

void initVulkan() {
    ...
    createTextureImage();
    createTextureImageView();
    createVertexBuffer();
    ...
}

...

void createTextureImageView() {

}

The code for this function can be based directly on createImageViews. The only two changes you have to make are the format and the image:

VkImageViewCreateInfo viewInfo{};
viewInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
viewInfo.image = textureImage;
viewInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
viewInfo.format = VK_FORMAT_R8G8B8A8_SRGB;
viewInfo.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
viewInfo.subresourceRange.baseMipLevel = 0;
viewInfo.subresourceRange.levelCount = 1;
viewInfo.subresourceRange.baseArrayLayer = 0;
viewInfo.subresourceRange.layerCount = 1;

I've left out the explicit viewInfo.components initialization, because VK_COMPONENT_SWIZZLE_IDENTITY is defined as 0 anyway. Finish creating the image view by calling vkCreateImageView:

if (vkCreateImageView(device, &viewInfo, nullptr, &textureImageView) != VK_SUCCESS) {
    throw std::runtime_error("failed to create texture image view!");
}
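
The reason the components member can be left out comes down to C++ value-initialization. The sketch below uses stand-in types (Swizzle, ComponentMapping and ViewInfo are illustrative, not the Vulkan definitions) to show that an aggregate created with {} has every member zeroed, which corresponds to the identity swizzle when that enumerator is 0.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in types for illustration only; they mimic the shape of
// VkComponentSwizzle and VkComponentMapping but are not the Vulkan definitions.
enum class Swizzle : std::uint32_t { Identity = 0, Zero = 1, One = 2, R = 3, G = 4, B = 5, A = 6 };

struct ComponentMapping {
    Swizzle r;
    Swizzle g;
    Swizzle b;
    Swizzle a;
};

struct ViewInfo {
    std::uint32_t format;
    ComponentMapping components;
};

// True if all four channels are left at the identity mapping.
inline bool usesIdentitySwizzle(const ViewInfo& info) {
    return info.components.r == Swizzle::Identity &&
           info.components.g == Swizzle::Identity &&
           info.components.b == Swizzle::Identity &&
           info.components.a == Swizzle::Identity;
}

// Value-initialization with {} zeroes every member of the aggregate, so each
// swizzle ends up as Identity without being assigned explicitly.
inline ViewInfo makeDefaultViewInfo() {
    ViewInfo info{};
    assert(usesIdentitySwizzle(info));
    return info;
}
```

Note that this only holds for the {} form. A struct declared without an initializer has indeterminate members.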

Because so much of the logic is duplicated from createImageViews, you may wish to abstract it into a new createImageView function:

VkImageView createImageView(VkImage image, VkFormat format) {
    VkImageViewCreateInfo viewInfo{};
    viewInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
    viewInfo.image = image;
    viewInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
    viewInfo.format = format;
    viewInfo.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
    viewInfo.subresourceRange.baseMipLevel = 0;
    viewInfo.subresourceRange.levelCount = 1;
    viewInfo.subresourceRange.baseArrayLayer = 0;
    viewInfo.subresourceRange.layerCount = 1;

    VkImageView imageView;
    if (vkCreateImageView(device, &viewInfo, nullptr, &imageView) != VK_SUCCESS) {
        throw std::runtime_error("failed to create image view!");
    }

    return imageView;
}

The createTextureImageView function can now be simplified to:

void createTextureImageView() {
    textureImageView = createImageView(textureImage, VK_FORMAT_R8G8B8A8_SRGB);
}

And createImageViews can be simplified to:

void createImageViews() {
    swapChainImageViews.resize(swapChainImages.size());

    for (uint32_t i = 0; i < swapChainImages.size(); i++) {
        swapChainImageViews[i] = createImageView(swapChainImages[i], swapChainImageFormat);
    }
}

Make sure to destroy the image view at the end of the program, right before destroying the image itself:

void cleanup() {
    cleanupSwapChain();

    vkDestroyImageView(device, textureImageView, nullptr);

    vkDestroyImage(device, textureImage, nullptr);
    vkFreeMemory(device, textureImageMemory, nullptr);

    ...
}

Samplers

It is possible for shaders to read texels directly from images, but that is not very common when they are used as textures. Textures are usually accessed through samplers, which will apply filtering and transformations to compute the final color that is retrieved.

These filters help to deal with problems like oversampling. Consider a texture that is mapped to geometry with more fragments than texels. If you simply took the closest texel for the texture coordinate in each fragment, then you would get a result like the image on the left:

If you combined the 4 closest texels through linear interpolation, then you would get a smoother result like the one on the right. Of course your application may have art style requirements that fit the left style more (think Minecraft), but the right is preferred in conventional graphics applications. A sampler object automatically applies this filtering for you when reading a color from the texture.
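
To make the difference between the two filters concrete, here is a small CPU-side sketch. It is not how a sampler is implemented in hardware. GrayTexture, sampleNearest and sampleLinear are hypothetical names, the texture has a single channel, and I assume texel centers sit at (i + 0.5) / size with clamp-to-edge behavior at the borders.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel texture; texel centers sit at (i + 0.5) / size.
struct GrayTexture {
    int width;
    int height;
    std::vector<float> texels;  // row-major, width * height values

    float at(int x, int y) const {
        // Clamp to the edge so the filter footprint never leaves the image.
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Nearest filtering: return the texel whose area contains (u, v).
inline float sampleNearest(const GrayTexture& tex, float u, float v) {
    assert(tex.width > 0 && tex.height > 0);
    int x = static_cast<int>(std::floor(u * tex.width));
    int y = static_cast<int>(std::floor(v * tex.height));
    return tex.at(x, y);
}

// Linear filtering: blend the 4 texels surrounding (u, v), weighted by distance.
inline float sampleLinear(const GrayTexture& tex, float u, float v) {
    assert(tex.width > 0 && tex.height > 0);
    float x = u * tex.width - 0.5f;
    float y = v * tex.height - 0.5f;
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    float fx = x - x0;
    float fy = y - y0;

    float top = tex.at(x0, y0) * (1.0f - fx) + tex.at(x0 + 1, y0) * fx;
    float bottom = tex.at(x0, y0 + 1) * (1.0f - fx) + tex.at(x0 + 1, y0 + 1) * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```

For a 2x2 texture with a black left column and a white right column, nearest filtering jumps from 0 to 1 at the middle, while linear filtering returns a gradual ramp between the two texel centers.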

Undersampling is the opposite problem, where you have more texels than fragments. This will lead to artifacts when sampling high frequency patterns like a checkerboard texture at a sharp angle:

As shown in the left image, the texture turns into a blurry mess in the distance. The solution to this is anisotropic filtering, which can also be applied automatically by a sampler.

Aside from these filters, a sampler can also take care of transformations. It determines what happens when you try to read texels outside the image through its addressing mode. The image below displays some of the possibilities:

We will now create a function createTextureSampler to set up such a sampler object. We'll be using that sampler to read colors from the texture in the shader later on.

void initVulkan() {
    ...
    createTextureImage();
    createTextureImageView();
    createTextureSampler();
    ...
}

...

void createTextureSampler() {

}

Samplers are configured through a VkSamplerCreateInfo structure, which specifies all filters and transformations that it should apply.

VkSamplerCreateInfo samplerInfo{};
samplerInfo.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO;
samplerInfo.magFilter = VK_FILTER_LINEAR;
samplerInfo.minFilter = VK_FILTER_LINEAR;

The magFilter and minFilter fields specify how to interpolate texels that are magnified or minified. Magnification concerns the oversampling problem described above, and minification concerns undersampling. The choices are VK_FILTER_NEAREST and VK_FILTER_LINEAR, corresponding to the modes demonstrated in the images above.

samplerInfo.addressModeU = VK_SAMPLER_ADDRESS_MODE_REPEAT;
samplerInfo.addressModeV = VK_SAMPLER_ADDRESS_MODE_REPEAT;
samplerInfo.addressModeW = VK_SAMPLER_ADDRESS_MODE_REPEAT;

The addressing mode can be specified per axis using the addressMode fields. The available values are listed below. Most of these are demonstrated in the image above. Note that the axes are called U, V and W instead of X, Y and Z. This is a convention for texture space coordinates.

VK_SAMPLER_ADDRESS_MODE_REPEAT: Repeat the texture when going beyond the image dimensions.
VK_SAMPLER_ADDRESS_MODE_MIRRORED_REPEAT: Like repeat, but inverts the coordinates to mirror the image when going beyond the dimensions.
VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE: Take the color of the edge closest to the coordinate beyond the image dimensions.
VK_SAMPLER_ADDRESS_MODE_MIRROR_CLAMP_TO_EDGE: Like clamp to edge, but instead uses the edge opposite to the closest edge.
VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER: Return a solid color when sampling beyond the dimensions of the image.

It doesn't really matter which addressing mode we use here, because we're not going to sample outside of the image in this tutorial. However, the repeat mode is probably the most common mode, because it can be used to tile textures like floors and walls.
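
The addressing modes listed above can be modeled on the CPU as a mapping from an out-of-range texel index to an in-range one. The sketch below follows my reading of the wrapping formulas in the Vulkan specification. AddressMode, wrapTexel and kBorderTexel are hypothetical names, and the function works on integer texel indices along one axis rather than on floating-point coordinates.

```cpp
#include <algorithm>
#include <cassert>

// Stand-in for the VK_SAMPLER_ADDRESS_MODE_* values (illustrative only).
enum class AddressMode { Repeat, MirroredRepeat, ClampToEdge, MirrorClampToEdge, ClampToBorder };

// Returned for ClampToBorder when the index is outside the image; the caller
// would substitute the border color instead of reading a texel.
constexpr int kBorderTexel = -1;

// Mathematical modulo that stays in [0, m) for negative inputs as well.
inline int floorMod(int value, int m) {
    int r = value % m;
    return r < 0 ? r + m : r;
}

// Reflects negative indices across the image start, so -1 -> 0 and -2 -> 1.
inline int mirror(int i) { return i >= 0 ? i : -(1 + i); }

// Maps an integer texel index along one axis into [0, size), or kBorderTexel.
inline int wrapTexel(int i, int size, AddressMode mode) {
    assert(size > 0);
    switch (mode) {
        case AddressMode::Repeat:
            return floorMod(i, size);
        case AddressMode::MirroredRepeat:
            return (size - 1) - mirror(floorMod(i, 2 * size) - size);
        case AddressMode::ClampToEdge:
            return std::max(0, std::min(i, size - 1));
        case AddressMode::MirrorClampToEdge:
            return std::max(0, std::min(mirror(i), size - 1));
        case AddressMode::ClampToBorder:
            return (i < 0 || i >= size) ? kBorderTexel : i;
    }
    return kBorderTexel;  // not reached for valid enum values
}
```

For a texture that is 4 texels wide, index 5 maps to texel 1 with repeat, texel 2 with mirrored repeat and texel 3 with clamp to edge.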

samplerInfo.anisotropyEnable = VK_TRUE;
samplerInfo.maxAnisotropy = ???;

These two fields specify if anisotropic filtering should be used. There is no reason not to use this unless performance is a concern. The maxAnisotropy field limits the number of texel samples that can be used to calculate the final color. A lower value results in better performance, but lower quality results. To figure out which value we can use, we need to retrieve the properties of the physical device like so:

VkPhysicalDeviceProperties properties{};
vkGetPhysicalDeviceProperties(physicalDevice, &properties);

If you look at the documentation for the VkPhysicalDeviceProperties structure, you'll see that it contains a VkPhysicalDeviceLimits member named limits. This struct in turn has a member called maxSamplerAnisotropy and this is the maximum value we can specify for maxAnisotropy. If we want to go for maximum quality, we can simply use that value directly:

samplerInfo.maxAnisotropy = properties.limits.maxSamplerAnisotropy;

You can either query the properties at the beginning of your program and pass them around to the functions that need them, or query them in the createTextureSampler function itself.

samplerInfo.borderColor = VK_BORDER_COLOR_INT_OPAQUE_BLACK;

The borderColor field specifies which color is returned when sampling beyond the image with clamp to border addressing mode. It is possible to return black, white or transparent in either float or int formats. You cannot specify an arbitrary color.

samplerInfo.unnormalizedCoordinates = VK_FALSE;

The unnormalizedCoordinates field specifies which coordinate system you want to use to address texels in an image. If this field is VK_TRUE, then you can simply use coordinates within the [0, texWidth) and [0, texHeight) range. If it is VK_FALSE, then the texels are addressed using the [0, 1) range on all axes. Real-world applications almost always use normalized coordinates, because then it's possible to use textures of varying resolutions with the exact same coordinates.
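
The relationship between the two coordinate systems is a plain scale by the texture size. A minimal sketch follows, in which toUnnormalized and toNormalized are hypothetical helper names that are not part of Vulkan:

```cpp
#include <cassert>

// Converts a normalized coordinate in [0, 1) to the equivalent unnormalized
// coordinate in [0, size) for a texture that is `size` texels wide or tall.
inline float toUnnormalized(float normalized, int size) {
    assert(size > 0);
    return normalized * static_cast<float>(size);
}

// The inverse mapping, from [0, size) back to [0, 1).
inline float toNormalized(float unnormalized, int size) {
    assert(size > 0);
    return unnormalized / static_cast<float>(size);
}
```

The normalized coordinate 0.5 lands at 128 in a 256 texel wide texture and at 512 in a 1024 texel wide one, which is why the same vertex data keeps working when the texture resolution changes.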

samplerInfo.compareEnable = VK_FALSE;
samplerInfo.compareOp = VK_COMPARE_OP_ALWAYS;

If a comparison function is enabled, then texels will first be compared to a value, and the result of that comparison is used in filtering operations. This is mainly used for percentage-closer filtering on shadow maps. We'll look at this in a future chapter.

samplerInfo.mipmapMode = VK_SAMPLER_MIPMAP_MODE_LINEAR;
samplerInfo.mipLodBias = 0.0f;
samplerInfo.minLod = 0.0f;
samplerInfo.maxLod = 0.0f;

All of these fields apply to mipmapping. We will look at mipmapping in a later chapter, but basically it's another type of filter that can be applied.

The functioning of the sampler is now fully defined. Add a class member to hold the handle of the sampler object and create the sampler with vkCreateSampler:

VkImageView textureImageView;
VkSampler textureSampler;

...

void createTextureSampler() {
    ...

    if (vkCreateSampler(device, &samplerInfo, nullptr, &textureSampler) != VK_SUCCESS) {
        throw std::runtime_error("failed to create texture sampler!");
    }
}

Note that the sampler does not reference a VkImage anywhere. The sampler is a distinct object that provides an interface to extract colors from a texture. It can be applied to any image you want, whether it is 1D, 2D or 3D. This is different from many older APIs, which combined texture images and filtering into a single state.

Destroy the sampler at the end of the program when we'll no longer be accessing the image:

void cleanup() {
    cleanupSwapChain();

    vkDestroySampler(device, textureSampler, nullptr);
    vkDestroyImageView(device, textureImageView, nullptr);

    ...
}

Anisotropy device feature

If you run your program right now, you'll see a validation layer message like this:

That's because anisotropic filtering is actually an optional device feature. We need to update the createLogicalDevice function to request it:

VkPhysicalDeviceFeatures deviceFeatures{};
deviceFeatures.samplerAnisotropy = VK_TRUE;

And even though it is very unlikely that a modern graphics card will not support it, we should update isDeviceSuitable to check if it is available:

bool isDeviceSuitable(VkPhysicalDevice device) {
    ...

    VkPhysicalDeviceFeatures supportedFeatures;
    vkGetPhysicalDeviceFeatures(device, &supportedFeatures);

    return indices.isComplete() && extensionsSupported && swapChainAdequate && supportedFeatures.samplerAnisotropy;
}

The vkGetPhysicalDeviceFeatures function repurposes the VkPhysicalDeviceFeatures struct: instead of listing the features we request, its boolean values indicate which features the device supports.

Instead of enforcing the availability of anisotropic filtering, it's also possible to simply not use it by conditionally setting:

samplerInfo.anisotropyEnable = VK_FALSE;
samplerInfo.maxAnisotropy = 1.0f;
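
One way to express this conditional choice is a small helper that decides both fields at once. This is only a sketch: AnisotropySettings and chooseAnisotropy are hypothetical names, featureSupported stands for supportedFeatures.samplerAnisotropy, and deviceLimit stands for properties.limits.maxSamplerAnisotropy.

```cpp
#include <algorithm>
#include <cassert>

// Stand-in for the two anisotropy fields of VkSamplerCreateInfo (illustrative only).
struct AnisotropySettings {
    bool enable;
    float maxAnisotropy;
};

// Picks sampler anisotropy settings from what the device reports.
// `featureSupported` plays the role of supportedFeatures.samplerAnisotropy and
// `deviceLimit` the role of properties.limits.maxSamplerAnisotropy.
inline AnisotropySettings chooseAnisotropy(bool featureSupported, float deviceLimit, float requested) {
    assert(requested >= 1.0f);
    if (!featureSupported) {
        // Fall back to no anisotropic filtering at all.
        return {false, 1.0f};
    }
    // Never ask for more samples than the device allows.
    return {true, std::min(requested, deviceLimit)};
}
```

The result would then be copied into samplerInfo.anisotropyEnable (as VK_TRUE or VK_FALSE) and samplerInfo.maxAnisotropy.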

In the next chapter we will expose the image and sampler objects to the shaders to draw the texture onto the square.


Combined image sampler

Introduction

We looked at descriptors for the first time in the uniform buffers part of the tutorial. In this chapter we will look at a new type of descriptor: combined image sampler. This descriptor makes it possible for shaders to access an image resource through a sampler object like the one we created in the previous chapter.

We'll start by modifying the descriptor set layout, descriptor pool and descriptor set to include such a combined image sampler descriptor. After that, we're going to add texture coordinates to Vertex and modify the fragment shader to read colors from the texture instead of just interpolating the vertex colors.

Updating the descriptors

Browse to the createDescriptorSetLayout function and add a VkDescriptorSetLayoutBinding for a combined image sampler descriptor. We'll simply put it in the binding after the uniform buffer:

VkDescriptorSetLayoutBinding samplerLayoutBinding{};
samplerLayoutBinding.binding = 1;
samplerLayoutBinding.descriptorCount = 1;
samplerLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
samplerLayoutBinding.pImmutableSamplers = nullptr;
samplerLayoutBinding.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;

std::array<VkDescriptorSetLayoutBinding, 2> bindings = {uboLayoutBinding, samplerLayoutBinding};
VkDescriptorSetLayoutCreateInfo layoutInfo{};
layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.bindingCount = static_cast<uint32_t>(bindings.size());
layoutInfo.pBindings = bindings.data();

Make sure to set the stageFlags to indicate that we intend to use the combined image sampler descriptor in the fragment shader. That's where the color of the fragment is going to be determined. It is possible to use texture sampling in the vertex shader, for example to dynamically deform a grid of vertices by a heightmap.

We must also create a larger descriptor pool to make room for the allocation of the combined image sampler. Go to the createDescriptorPool function and modify it to include a VkDescriptorPoolSize of type VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER:

std::array<VkDescriptorPoolSize, 2> poolSizes{};
poolSizes[0].type = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
poolSizes[0].descriptorCount = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);
poolSizes[1].type = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
poolSizes[1].descriptorCount = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);

VkDescriptorPoolCreateInfo poolInfo{};
poolInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
poolInfo.poolSizeCount = static_cast<uint32_t>(poolSizes.size());
poolInfo.pPoolSizes = poolSizes.data();
poolInfo.maxSets = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT);

Inadequate descriptor pools are a good example of a problem that the validation layers will not catch: As of Vulkan 1.1, vkAllocateDescriptorSets may fail with the error code VK_ERROR_POOL_OUT_OF_MEMORY if the pool is not sufficiently large, but the driver may also try to solve the problem internally. This means that sometimes (depending on hardware, pool size and allocation size) the driver will let us get away with an allocation that exceeds the limits of our descriptor pool. Other times, vkAllocateDescriptorSets will fail and return VK_ERROR_POOL_OUT_OF_MEMORY. This can be particularly frustrating if the allocation succeeds on some machines, but fails on others.

Since Vulkan shifts the responsibility for the allocation to the driver, it is no longer a strict requirement to only allocate as many descriptors of a certain type (VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, etc.) as specified by the corresponding descriptorCount members for the creation of the descriptor pool. However, it remains best practice to do so, and in the future, VK_LAYER_KHRONOS_validation will warn about this type of problem if you enable Best Practice Validation.
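
A simple way to stay within the pool limits is to derive the pool sizes from the per-set layout and the number of sets instead of typing the counts by hand. The sketch below uses stand-in types: DescriptorKind, PoolSize, BindingCount and poolSizesFor are hypothetical names that only mirror the shape of the Vulkan structures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins for VkDescriptorType and VkDescriptorPoolSize (illustrative only).
enum class DescriptorKind { UniformBuffer, CombinedImageSampler };

struct PoolSize {
    DescriptorKind kind;
    std::uint32_t descriptorCount;
};

struct BindingCount {
    DescriptorKind kind;
    std::uint32_t perSet;  // descriptors of this kind in one descriptor set
};

// Sizes a pool so that `setCount` sets with the given per-set layout fit exactly.
inline std::vector<PoolSize> poolSizesFor(const std::vector<BindingCount>& layout, std::uint32_t setCount) {
    assert(setCount > 0);
    std::vector<PoolSize> sizes;
    sizes.reserve(layout.size());
    for (const BindingCount& binding : layout) {
        sizes.push_back({binding.kind, binding.perSet * setCount});
    }
    return sizes;
}
```

For our layout of one uniform buffer and one combined image sampler per set, calling this with a set count of MAX_FRAMES_IN_FLIGHT gives the same counts as the hand-written poolSizes array above.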

The final step is to bind the actual image and sampler resources to the descriptors in the descriptor set. Go to the createDescriptorSets function.

for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
    VkDescriptorBufferInfo bufferInfo{};
    bufferInfo.buffer = uniformBuffers[i];
    bufferInfo.offset = 0;
    bufferInfo.range = sizeof(UniformBufferObject);

    VkDescriptorImageInfo imageInfo{};
    imageInfo.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    imageInfo.imageView = textureImageView;
    imageInfo.sampler = textureSampler;

    ...
}

The resources for a combined image sampler descriptor must be specified in a VkDescriptorImageInfo struct, just like the buffer resource for a uniform buffer descriptor is specified in a VkDescriptorBufferInfo struct. This is where the objects from the previous chapter come together.

std::array<VkWriteDescriptorSet, 2> descriptorWrites{};

descriptorWrites[0].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[0].dstSet = descriptorSets[i];
descriptorWrites[0].dstBinding = 0;
descriptorWrites[0].dstArrayElement = 0;
descriptorWrites[0].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
descriptorWrites[0].descriptorCount = 1;
descriptorWrites[0].pBufferInfo = &bufferInfo;

descriptorWrites[1].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[1].dstSet = descriptorSets[i];
descriptorWrites[1].dstBinding = 1;
descriptorWrites[1].dstArrayElement = 0;
descriptorWrites[1].descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
descriptorWrites[1].descriptorCount = 1;
descriptorWrites[1].pImageInfo = &imageInfo;

vkUpdateDescriptorSets(device, static_cast<uint32_t>(descriptorWrites.size()), descriptorWrites.data(), 0, nullptr);

The descriptors must be updated with this image info, just like the buffer. This time we're using the pImageInfo array instead of pBufferInfo. The descriptors are now ready to be used by the shaders!

Texture coordinates

There is one important ingredient for texture mapping that is still missing, and that's the actual texture coordinates for each vertex. The texture coordinates determine how the image is actually mapped to the geometry.

struct Vertex {
    glm::vec2 pos;
    glm::vec3 color;
    glm::vec2 texCoord;

    static VkVertexInputBindingDescription getBindingDescription() {
        VkVertexInputBindingDescription bindingDescription{};
        bindingDescription.binding = 0;
        bindingDescription.stride = sizeof(Vertex);
        bindingDescription.inputRate = VK_VERTEX_INPUT_RATE_VERTEX;

        return bindingDescription;
    }

    static std::array<VkVertexInputAttributeDescription, 3> getAttributeDescriptions() {
        std::array<VkVertexInputAttributeDescription, 3> attributeDescriptions{};

        attributeDescriptions[0].binding = 0;
        attributeDescriptions[0].location = 0;
        attributeDescriptions[0].format = VK_FORMAT_R32G32_SFLOAT;
        attributeDescriptions[0].offset = offsetof(Vertex, pos);

        attributeDescriptions[1].binding = 0;
        attributeDescriptions[1].location = 1;
        attributeDescriptions[1].format = VK_FORMAT_R32G32B32_SFLOAT;
        attributeDescriptions[1].offset = offsetof(Vertex, color);

        attributeDescriptions[2].binding = 0;
        attributeDescriptions[2].location = 2;
        attributeDescriptions[2].format = VK_FORMAT_R32G32_SFLOAT;
        attributeDescriptions[2].offset = offsetof(Vertex, texCoord);

        return attributeDescriptions;
    }
};

Modify the Vertex struct to include a vec2 for texture coordinates. Make sure to also add a VkVertexInputAttributeDescription so that we can access the texture coordinates as input in the vertex shader. That is necessary to be able to pass them to the fragment shader for interpolation across the surface of the square.
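
It can help to check which byte offsets and stride the attribute descriptions end up with. The sketch below replaces glm::vec2 and glm::vec3 with hypothetical stand-ins Vec2 and Vec3, assuming they are tightly packed 4-byte floats, so it compiles without GLM.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for glm::vec2 and glm::vec3, assumed here to be tightly packed floats.
struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

struct Vertex {
    Vec2 pos;
    Vec3 color;
    Vec2 texCoord;
};

// Byte offsets that the three attribute descriptions would report.
constexpr std::size_t kPosOffset = offsetof(Vertex, pos);
constexpr std::size_t kColorOffset = offsetof(Vertex, color);
constexpr std::size_t kTexCoordOffset = offsetof(Vertex, texCoord);

// With 4-byte floats there is no padding: 2 + 3 + 2 floats = 28 bytes per vertex.
static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");
static_assert(sizeof(Vertex) == 28, "stride passed to the binding description");
```

Under these assumptions the position starts at byte 0, the color at byte 8 and the texture coordinates at byte 20, which matches the order of the attribute descriptions.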

const std::vector<Vertex> vertices = {
    {{-0.5f, -0.5f}, {1.0f, 0.0f, 0.0f}, {1.0f, 0.0f}},
    {{0.5f, -0.5f}, {0.0f, 1.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, 0.5f}, {0.0f, 0.0f, 1.0f}, {0.0f, 1.0f}},
    {{-0.5f, 0.5f}, {1.0f, 1.0f, 1.0f}, {1.0f, 1.0f}}
};

In this tutorial, I will simply fill the square with the texture by using coordinates from 0, 0 in the top-left corner to 1, 1 in the bottom-right corner. Feel free to experiment with different coordinates. Try using coordinates below 0 or above 1 to see the addressing modes in action!

Shaders

The final step is modifying the shaders to sample colors from the texture. We first need to modify the vertex shader to pass through the texture coordinates to the fragment shader:

layout(location = 0) in vec2 inPosition;
layout(location = 1) in vec3 inColor;
layout(location = 2) in vec2 inTexCoord;

layout(location = 0) out vec3 fragColor;
layout(location = 1) out vec2 fragTexCoord;

void main() {
    gl_Position = ubo.proj * ubo.view * ubo.model * vec4(inPosition, 0.0, 1.0);
    fragColor = inColor;
    fragTexCoord = inTexCoord;
}

Just like the per vertex colors, the fragTexCoord values will be smoothly interpolated across the area of the square by the rasterizer. We can visualize this by having the fragment shader output the texture coordinates as colors:

#version 450

layout(location = 0) in vec3 fragColor;
layout(location = 1) in vec2 fragTexCoord;

layout(location = 0) out vec4 outColor;

void main() {
    outColor = vec4(fragTexCoord, 0.0, 1.0);
}

You should see something like the image below. Don't forget to recompile the shaders!

The green channel represents the horizontal coordinates and the red channel the vertical coordinates. The black and yellow corners confirm that the texture coordinates are correctly interpolated from 0, 0 to 1, 1 across the square. Visualizing data using colors is the shader programming equivalent of printf debugging, for lack of a better option!

A combined image sampler descriptor is represented in GLSL by a sampler uniform. Add a reference to it in the fragment shader:

layout(binding = 1) uniform sampler2D texSampler;

There are equivalent sampler1D and sampler3D types for other types of images. Make sure to use the correct binding here.

void main() {
    outColor = texture(texSampler, fragTexCoord);
}

Textures are sampled using the built-in texture function. It takes a sampler and coordinate as arguments. The sampler automatically takes care of the filtering and transformations in the background. You should now see the texture on the square when you run the application:

Try experimenting with the addressing modes by scaling the texture coordinates to values higher than 1. For example, the following fragment shader produces the result in the image below when using VK_SAMPLER_ADDRESS_MODE_REPEAT:

void main() {
    outColor = texture(texSampler, fragTexCoord * 2.0);
}

You can also manipulate the texture colors using the vertex colors:

void main() {
    outColor = vec4(fragColor * texture(texSampler, fragTexCoord).rgb, 1.0);
}

I've separated the RGB and alpha channels here to not scale the alpha channel.

You now know how to access images in shaders! This is a very powerful technique when combined with images that are also written to in framebuffers. You can use these images as inputs to implement cool effects like post-processing and camera displays within the 3D world.


Depth buffering

Introduction

The geometry we've worked with so far is projected into 3D, but it's still completely flat. In this chapter we're going to add a Z coordinate to the position to prepare for 3D meshes. We'll use this third coordinate to place a square over the current square to see a problem that arises when geometry is not sorted by depth.

3D geometry

Change the Vertex struct to use a 3D vector for the position, and update the format in the corresponding VkVertexInputAttributeDescription:

struct Vertex {
    glm::vec3 pos;
    glm::vec3 color;
    glm::vec2 texCoord;

    ...

    static std::array<VkVertexInputAttributeDescription, 3> getAttributeDescriptions() {
        std::array<VkVertexInputAttributeDescription, 3> attributeDescriptions{};

        attributeDescriptions[0].binding = 0;
        attributeDescriptions[0].location = 0;
        attributeDescriptions[0].format = VK_FORMAT_R32G32B32_SFLOAT;
        attributeDescriptions[0].offset = offsetof(Vertex, pos);

        ...
    }
};

Next, update the vertex shader to accept and transform 3D coordinates as input. Don't forget to recompile it afterwards!

layout(location = 0) in vec3 inPosition;

...

void main() {
    gl_Position = ubo.proj * ubo.view * ubo.model * vec4(inPosition, 1.0);
    fragColor = inColor;
    fragTexCoord = inTexCoord;
}

Lastly, update the vertices container to include Z coordinates:

const std::vector<Vertex> vertices = {
    {{-0.5f, -0.5f, 0.0f}, {1.0f, 0.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, -0.5f, 0.0f}, {0.0f, 1.0f, 0.0f}, {1.0f, 0.0f}},
    {{0.5f, 0.5f, 0.0f}, {0.0f, 0.0f, 1.0f}, {1.0f, 1.0f}},
    {{-0.5f, 0.5f, 0.0f}, {1.0f, 1.0f, 1.0f}, {0.0f, 1.0f}}
};

If you run your application now, then you should see exactly the same result as before. It's time to add some extra geometry to make the scene more interesting, and to demonstrate the problem that we're going to tackle in this chapter. Duplicate the vertices to define positions for a square right under the current one like this:

Use Z coordinates of -0.5f and add the appropriate indices for the extra square:

const std::vector<Vertex> vertices = {
    {{-0.5f, -0.5f, 0.0f}, {1.0f, 0.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, -0.5f, 0.0f}, {0.0f, 1.0f, 0.0f}, {1.0f, 0.0f}},
    {{0.5f, 0.5f, 0.0f}, {0.0f, 0.0f, 1.0f}, {1.0f, 1.0f}},
    {{-0.5f, 0.5f, 0.0f}, {1.0f, 1.0f, 1.0f}, {0.0f, 1.0f}},

    {{-0.5f, -0.5f, -0.5f}, {1.0f, 0.0f, 0.0f}, {0.0f, 0.0f}},
    {{0.5f, -0.5f, -0.5f}, {0.0f, 1.0f, 0.0f}, {1.0f, 0.0f}},
    {{0.5f, 0.5f, -0.5f}, {0.0f, 0.0f, 1.0f}, {1.0f, 1.0f}},
    {{-0.5f, 0.5f, -0.5f}, {1.0f, 1.0f, 1.0f}, {0.0f, 1.0f}}
};

const std::vector<uint16_t> indices = {
    0, 1, 2, 2, 3, 0,
    4, 5, 6, 6, 7, 4
};

Run your program now and you'll see something resembling an Escher illustration:

The problem is that the fragments of the lower square are drawn over the fragments of the upper square, simply because it comes later in the index array. There are two ways to solve this:

Sort all of the draw calls by depth from back to front
Use depth testing with a depth buffer

The first approach is commonly used for drawing transparent objects, because order-independent transparency is a difficult challenge to solve. However, the problem of ordering fragments by depth is much more commonly solved using a depth buffer. A depth buffer is an additional attachment that stores the depth for every position, just like the color attachment stores the color of every position. Every time the rasterizer produces a fragment, the depth test will check if the new fragment is closer than the previous one. If it isn't, then the new fragment is discarded. A fragment that passes the depth test writes its own depth to the depth buffer. It is possible to manipulate this value from the fragment shader, just like you can manipulate the color output.
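
The depth test itself is simple enough to model on the CPU. The sketch below is a software illustration only: TinyFramebuffer and writeFragment are hypothetical names, colors are reduced to integer ids, and I assume a "less" comparison where smaller depth values are closer and 1.0 is the far plane.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software framebuffer with one color id and one depth per pixel.
// Smaller depth means closer to the camera, and 1.0 is the far plane.
class TinyFramebuffer {
public:
    TinyFramebuffer(int w, int h)
        : width(w),
          color(static_cast<std::size_t>(w) * h, 0),
          depth(static_cast<std::size_t>(w) * h, 1.0f) {
        assert(w > 0 && h > 0);
    }

    // Depth test with a "less" comparison: keep the fragment only if it is
    // closer than what is stored, then record its depth. Returns true if kept.
    bool writeFragment(int x, int y, float fragmentDepth, int fragmentColor) {
        const std::size_t i = static_cast<std::size_t>(y) * width + x;
        if (!(fragmentDepth < depth[i])) {
            return false;  // hidden behind an earlier fragment: discard
        }
        depth[i] = fragmentDepth;
        color[i] = fragmentColor;
        return true;
    }

    int colorAt(int x, int y) const {
        return color[static_cast<std::size_t>(y) * width + x];
    }

private:
    int width;
    std::vector<int> color;
    std::vector<float> depth;
};
```

Writing a far fragment and then a near one, or the other way around, leaves the near fragment's color in the pixel either way. That order independence is what lets us skip sorting opaque geometry.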

#define GLM_FORCE_RADIANS
#define GLM_FORCE_DEPTH_ZERO_TO_ONE
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

The perspective projection matrix generated by GLM will use the OpenGL depth range of -1.0 to 1.0 by default. We need to configure it to use the Vulkan range of 0.0 to 1.0 using the GLM_FORCE_DEPTH_ZERO_TO_ONE definition.
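
To see what the definition changes, we can compare the depth that a standard right-handed perspective projection produces under both conventions. The sketch below does not use GLM. The two function names are hypothetical, and the formulas are the usual closed forms for a point at a given distance in front of the camera with near plane n and far plane f.

```cpp
#include <cassert>
#include <cmath>

// NDC depth produced by a standard perspective projection for a point at
// `distance` in front of the camera, with near plane n and far plane f.

// OpenGL convention: near maps to -1, far maps to +1.
inline double ndcDepthMinusOneToOne(double n, double f, double distance) {
    assert(0.0 < n && n < f && distance > 0.0);
    return (f + n) / (f - n) - (2.0 * f * n) / ((f - n) * distance);
}

// Vulkan convention: near maps to 0, far maps to 1.
inline double ndcDepthZeroToOne(double n, double f, double distance) {
    assert(0.0 < n && n < f && distance > 0.0);
    return f / (f - n) - (f * n) / ((f - n) * distance);
}
```

For the same near and far planes, the zero-to-one depth equals (d + 1) / 2, where d is the minus-one-to-one depth. Without the definition, everything in the front half of the OpenGL depth range would end up with a negative depth and be clipped.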

Depth image and view

A depth attachment is based on an image, just like the color attachment. The difference is that the swap chain will not automatically create depth images for us. We only need a single depth image, because only one draw operation is running at once. The depth image will again require the trifecta of resources: image, memory and image view.

VkImage depthImage;
VkDeviceMemory depthImageMemory;
VkImageView depthImageView;

Create a new function createDepthResources to set up these resources:

void initVulkan() {
    ...
    createCommandPool();
    createDepthResources();
    createTextureImage();
    ...
}

...

void createDepthResources() {

}

Creating a depth image is fairly straightforward. It should have the same resolution as the color attachment, defined by the swap chain extent, an image usage appropriate for a depth attachment, optimal tiling and device local memory. The only question is: what is the right format for a depth image? The format must contain a depth component, indicated by _D??_ in the VK_FORMAT_ name.

Unlike the texture image, we don't necessarily need a specific format, because we won't be directly accessing the texels from the program. It just needs to have a reasonable accuracy. At least 24 bits is common in real-world applications. There are several formats that fit this requirement:

VK_FORMAT_D32_SFLOAT: 32-bit signed float for depth
VK_FORMAT_D32_SFLOAT_S8_UINT: 32-bit signed float for depth and 8-bit stencil component
VK_FORMAT_D24_UNORM_S8_UINT: 24-bit unsigned normalized integer for depth and 8-bit stencil component

The stencil component is used for stencil tests, which is an additional test that can be combined with depth testing. We'll look at this in a future chapter.

We could simply go for the VK_FORMAT_D32_SFLOAT format, because support for it is extremely common (see the hardware database), but it's nice to add some extra flexibility to our application where possible. We're going to write a function findSupportedFormat that takes a list of candidate formats in order from most desirable to least desirable, and checks which is the first one that is supported:

VkFormat findSupportedFormat(const std::vector<VkFormat>& candidates, VkImageTiling tiling, VkFormatFeatureFlags features) {

}

The support of a format depends on the tiling mode and usage, so we must also include these as parameters. The support of a format can be queried using the vkGetPhysicalDeviceFormatProperties function:

for (VkFormat format : candidates) {
    VkFormatProperties props;
    vkGetPhysicalDeviceFormatProperties(physicalDevice, format, &props);
}

The VkFormatProperties struct contains three fields:

linearTilingFeatures: Use cases that are supported with linear tiling
optimalTilingFeatures: Use cases that are supported with optimal tiling
bufferFeatures: Use cases that are supported for buffers

Only the first two are relevant here, and the one we check depends on the tiling parameter of the function:

if (tiling == VK_IMAGE_TILING_LINEAR && (props.linearTilingFeatures & features) == features) {
    return format;
} else if (tiling == VK_IMAGE_TILING_OPTIMAL && (props.optimalTilingFeatures & features) == features) {
    return format;
}

If none of the candidate formats support the desired usage, then we can either return a special value or simply throw an exception:

VkFormat findSupportedFormat(const std::vector<VkFormat>& candidates, VkImageTiling tiling, VkFormatFeatureFlags features) {
    for (VkFormat format : candidates) {
        VkFormatProperties props;
        vkGetPhysicalDeviceFormatProperties(physicalDevice, format, &props);

        if (tiling == VK_IMAGE_TILING_LINEAR && (props.linearTilingFeatures & features) == features) {
            return format;
        } else if (tiling == VK_IMAGE_TILING_OPTIMAL && (props.optimalTilingFeatures & features) == features) {
            return format;
        }
    }

    throw std::runtime_error("failed to find supported format!");
}
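
The mask comparison is the part of this function that is easiest to get subtly wrong, so here is a minimal, Vulkan-free sketch of the same selection logic. The flag constants, the Candidate struct and the function names are hypothetical stand-ins rather than Vulkan types, and the sketch returns -1 instead of throwing to keep it small:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for VkFormatFeatureFlags bits (not the real Vulkan values).
using FeatureFlags = std::uint32_t;
constexpr FeatureFlags SAMPLED_IMAGE_BIT = 1u << 0;
constexpr FeatureFlags DEPTH_STENCIL_ATTACHMENT_BIT = 1u << 1;
constexpr FeatureFlags BLIT_SRC_BIT = 1u << 2;

// Hypothetical candidate: an id plus the features reported as supported for it.
struct Candidate {
    int id;
    FeatureFlags supported;
};

// True only if every requested bit is present. Testing for "!= 0" instead
// would accept a format that supports just one of several requested features.
bool supportsAll(FeatureFlags supported, FeatureFlags required) {
    return (supported & required) == required;
}

// Returns the id of the first candidate that supports all required features,
// or -1 if none does (the tutorial's function throws in that case).
int firstSupported(const std::vector<Candidate>& candidates, FeatureFlags required) {
    for (const Candidate& candidate : candidates) {
        if (supportsAll(candidate.supported, required)) {
            return candidate.id;
        }
    }
    return -1;
}
```

Because the candidates are visited in order, the caller expresses its preference simply by listing the most desirable format first.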

We'll use this function now to create a findDepthFormat helper function to select a format with a depth component that supports usage as depth attachment:

VkFormat findDepthFormat() {
    return findSupportedFormat(
        {VK_FORMAT_D32_SFLOAT, VK_FORMAT_D32_SFLOAT_S8_UINT, VK_FORMAT_D24_UNORM_S8_UINT},
        VK_IMAGE_TILING_OPTIMAL,
        VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT
    );
}

Make sure to use the VK_FORMAT_FEATURE_ flag instead of VK_IMAGE_USAGE_ in this case. All of these candidate formats contain a depth component, but the latter two also contain a stencil component. We won't be using that yet, but we do need to take that into account when performing layout transitions on images with these formats. Add a simple helper function that tells us if the chosen depth format contains a stencil component:

bool hasStencilComponent(VkFormat format) {
    return format == VK_FORMAT_D32_SFLOAT_S8_UINT || format == VK_FORMAT_D24_UNORM_S8_UINT;
}

Call the function to find a depth format from createDepthResources:

VkFormat depthFormat = findDepthFormat();

We now have all the required information to invoke our createImage and createImageView helper functions:

createImage(swapChainExtent.width, swapChainExtent.height, depthFormat, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, depthImage, depthImageMemory);
depthImageView = createImageView(depthImage, depthFormat);

However, the createImageView function currently assumes that the subresource is always the VK_IMAGE_ASPECT_COLOR_BIT, so we will need to turn that field into a parameter:

VkImageView createImageView(VkImage image, VkFormat format, VkImageAspectFlags aspectFlags) {
    ...
    viewInfo.subresourceRange.aspectMask = aspectFlags;
    ...
}

Update all calls to this function to use the right aspect:

swapChainImageViews[i] = createImageView(swapChainImages[i], swapChainImageFormat, VK_IMAGE_ASPECT_COLOR_BIT);
...
depthImageView = createImageView(depthImage, depthFormat, VK_IMAGE_ASPECT_DEPTH_BIT);
...
textureImageView = createImageView(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_ASPECT_COLOR_BIT);

That's it for creating the depth image. We don't need to map it or copy another image to it, because we're going to clear it at the start of the render pass like the color attachment.

Explicitly transitioning the depth image

We don't need to explicitly transition the layout of the image to a depth attachment because we'll take care of this in the render pass. However, for completeness I'll still describe the process in this section. You may skip it if you like.

Make a call to transitionImageLayout at the end of the createDepthResources function like so:

transitionImageLayout(depthImage, depthFormat, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL);

The undefined layout can be used as initial layout, because there are no existing depth image contents that matter. We need to update some of the logic in transitionImageLayout to use the right subresource aspect:

if (newLayout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL) {
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT;

    if (hasStencilComponent(format)) {
        barrier.subresourceRange.aspectMask |= VK_IMAGE_ASPECT_STENCIL_BIT;
    }
} else {
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
}

Although we're not using the stencil component, we do need to include it in the layout transitions of the depth image.

Finally, add the correct access masks and pipeline stages:

if (oldLayout == VK_IMAGE_LAYOUT_UNDEFINED && newLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) {
    barrier.srcAccessMask = 0;
    barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;

    sourceStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
    destinationStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
} else if (oldLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL && newLayout == VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL) {
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    sourceStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
    destinationStage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
} else if (oldLayout == VK_IMAGE_LAYOUT_UNDEFINED && newLayout == VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL) {
    barrier.srcAccessMask = 0;
    barrier.dstAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;

    sourceStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
    destinationStage = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
} else {
    throw std::invalid_argument("unsupported layout transition!");
}

The depth buffer will be read from to perform depth tests to see if a fragment is visible, and will be written to when a new fragment is drawn. The reading happens in the VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT stage and the writing in the VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT. You should pick the earliest pipeline stage that matches the specified operations, so that it is ready for usage as depth attachment when it needs to be.

Render pass

We're now going to modify createRenderPass to include a depth attachment. First specify the VkAttachmentDescription:

VkAttachmentDescription depthAttachment{};
depthAttachment.format = findDepthFormat();
depthAttachment.samples = VK_SAMPLE_COUNT_1_BIT;
depthAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
depthAttachment.storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
depthAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
depthAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
depthAttachment.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
depthAttachment.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

The format should be the same as the depth image itself. This time we don't care about storing the depth data (storeOp), because it will not be used after drawing has finished. This may allow the hardware to perform additional optimizations. Just like the color buffer, we don't care about the previous depth contents, so we can use VK_IMAGE_LAYOUT_UNDEFINED as initialLayout.

VkAttachmentReference depthAttachmentRef{};
depthAttachmentRef.attachment = 1;
depthAttachmentRef.layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

Add a reference to the attachment for the first (and only) subpass:

VkSubpassDescription subpass{};
subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpass.colorAttachmentCount = 1;
subpass.pColorAttachments = &colorAttachmentRef;
subpass.pDepthStencilAttachment = &depthAttachmentRef;

Unlike color attachments, a subpass can only use a single depth (+stencil) attachment. It wouldn't really make any sense to do depth tests on multiple buffers.

std::array<VkAttachmentDescription, 2> attachments = {colorAttachment, depthAttachment};
VkRenderPassCreateInfo renderPassInfo{};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
renderPassInfo.attachmentCount = static_cast<uint32_t>(attachments.size());
renderPassInfo.pAttachments = attachments.data();
renderPassInfo.subpassCount = 1;
renderPassInfo.pSubpasses = &subpass;
renderPassInfo.dependencyCount = 1;
renderPassInfo.pDependencies = &dependency;

Next, update the VkSubpassDependency struct to refer to both attachments.

dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT;
dependency.srcAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT | VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;

Finally, we need to extend our subpass dependencies to make sure that there is no conflict between the transitioning of the depth image and it being cleared as part of its load operation. The depth image is first accessed in the early fragment test pipeline stage and because we have a load operation that clears, we should specify the access mask for writes.

Framebuffer

The next step is to modify the framebuffer creation to bind the depth image to the depth attachment. Go to createFramebuffers and specify the depth image view as second attachment:

std::array<VkImageView, 2> attachments = {
    swapChainImageViews[i],
    depthImageView
};

VkFramebufferCreateInfo framebufferInfo{};
framebufferInfo.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
framebufferInfo.renderPass = renderPass;
framebufferInfo.attachmentCount = static_cast<uint32_t>(attachments.size());
framebufferInfo.pAttachments = attachments.data();
framebufferInfo.width = swapChainExtent.width;
framebufferInfo.height = swapChainExtent.height;
framebufferInfo.layers = 1;

The color attachment differs for every swap chain image, but the same depth image can be used by all of them because only a single subpass is running at the same time due to our semaphores.

You'll also need to move the call to createFramebuffers to make sure that it is called after the depth image view has actually been created:

void initVulkan() {
    ...
    createDepthResources();
    createFramebuffers();
    ...
}

Clear values

Because we now have multiple attachments with VK_ATTACHMENT_LOAD_OP_CLEAR, we also need to specify multiple clear values. Go to recordCommandBuffer and create an array of VkClearValue structs:

std::array<VkClearValue, 2> clearValues{};
clearValues[0].color = {{0.0f, 0.0f, 0.0f, 1.0f}};
clearValues[1].depthStencil = {1.0f, 0};

renderPassInfo.clearValueCount = static_cast<uint32_t>(clearValues.size());
renderPassInfo.pClearValues = clearValues.data();

The range of depths in the depth buffer is 0.0 to 1.0 in Vulkan, where 1.0 lies at the far view plane and 0.0 at the near view plane. The initial value at each point in the depth buffer should be the furthest possible depth, which is 1.0.

Note that the order of clearValues should be identical to the order of your attachments.

Depth and stencil state

The depth attachment is ready to be used now, but depth testing still needs to be enabled in the graphics pipeline. It is configured through the VkPipelineDepthStencilStateCreateInfo struct:

VkPipelineDepthStencilStateCreateInfo depthStencil{};
depthStencil.sType = VK_STRUCTURE_TYPE_PIPELINE_DEPTH_STENCIL_STATE_CREATE_INFO;
depthStencil.depthTestEnable = VK_TRUE;
depthStencil.depthWriteEnable = VK_TRUE;

The depthTestEnable field specifies if the depth of new fragments should be compared to the depth buffer to see if they should be discarded. The depthWriteEnable field specifies if the new depth of fragments that pass the depth test should actually be written to the depth buffer.

depthStencil.depthCompareOp = VK_COMPARE_OP_LESS;

The depthCompareOp field specifies the comparison that is performed to keep or discard fragments. We're sticking to the convention of lower depth = closer, so the depth of new fragments should be less.
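
To see why this comparison makes the result independent of draw order, it can help to model a single pixel of the depth buffer in plain C++. This is only a sketch of the idea, not how the hardware implements it; resolveDepth is a hypothetical helper and assumes a clear value of 1.0 with depth writes enabled:

```cpp
#include <vector>

// Applies the fragments that land on one pixel in draw order and returns the
// depth left in the buffer afterwards.
float resolveDepth(const std::vector<float>& fragmentDepths) {
    float stored = 1.0f; // cleared to the furthest possible depth
    for (float fragment : fragmentDepths) {
        if (fragment < stored) { // "less" comparison: the closer fragment wins
            stored = fragment;   // depth write
        }
        // otherwise the fragment is discarded and the buffer is unchanged
    }
    return stored;
}
```

Whatever order the fragments arrive in, the closest one ends up in the buffer, which is exactly what fixes the ordering problem from the start of this chapter.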

depthStencil.depthBoundsTestEnable = VK_FALSE;
depthStencil.minDepthBounds = 0.0f; // Optional
depthStencil.maxDepthBounds = 1.0f; // Optional

The depthBoundsTestEnable, minDepthBounds and maxDepthBounds fields are used for the optional depth bound test. Basically, this allows you to only keep fragments that fall within the specified depth range. We won't be using this functionality.

depthStencil.stencilTestEnable = VK_FALSE;
depthStencil.front = {}; // Optional
depthStencil.back = {}; // Optional

The last three fields configure stencil buffer operations, which we also won't be using in this tutorial. If you want to use these operations, then you will have to make sure that the format of the depth/stencil image contains a stencil component.

pipelineInfo.pDepthStencilState = &depthStencil;

Update the VkGraphicsPipelineCreateInfo struct to reference the depth stencil state we just filled in. A depth stencil state must always be specified if the render pass contains a depth stencil attachment.

If you run your program now, then you should see that the fragments of the geometry are now correctly ordered:

Handling window resize

The resolution of the depth buffer should change when the window is resized to match the new color attachment resolution. Extend the recreateSwapChain function to recreate the depth resources in that case:

void recreateSwapChain() {
    int width = 0, height = 0;
    while (width == 0 || height == 0) {
        glfwGetFramebufferSize(window, &width, &height);
        glfwWaitEvents();
    }

    vkDeviceWaitIdle(device);

    cleanupSwapChain();

    createSwapChain();
    createImageViews();
    createDepthResources();
    createFramebuffers();
}

The cleanup operations should happen in the swap chain cleanup function:

void cleanupSwapChain() {
    vkDestroyImageView(device, depthImageView, nullptr);
    vkDestroyImage(device, depthImage, nullptr);
    vkFreeMemory(device, depthImageMemory, nullptr);

    ...
}

Congratulations, your application is now finally ready to render arbitrary 3D geometry and have it look right. We're going to try this out in the next chapter by drawing a textured model!

C++ code / Vertex shader / Fragment shader

Loading models

Introduction

Your program is now ready to render textured 3D meshes, but the current geometry in the vertices and indices arrays is not very interesting yet. In this chapter we're going to extend the program to load the vertices and indices from an actual model file to make the graphics card actually do some work.

Many graphics API tutorials have the reader write their own OBJ loader in a chapter like this. The problem with this is that any remotely interesting 3D application will soon require features that are not supported by this file format, like skeletal animation. We will load mesh data from an OBJ model in this chapter, but we'll focus more on integrating the mesh data with the program itself rather than the details of loading it from a file.

Library

We will use the tinyobjloader library to load vertices and faces from an OBJ file. It's fast and it's easy to integrate because it's a single file library like stb_image. Go to the repository linked above and download the tiny_obj_loader.h file to a folder in your library directory.

Visual Studio

Add the directory with tiny_obj_loader.h in it to the Additional Include Directories paths.

Makefile

Add the directory with tiny_obj_loader.h to the include directories for GCC:

VULKAN_SDK_PATH = /home/user/VulkanSDK/x.x.x.x/x86_64
STB_INCLUDE_PATH = /home/user/libraries/stb
TINYOBJ_INCLUDE_PATH = /home/user/libraries/tinyobjloader

...

CFLAGS = -std=c++17 -I$(VULKAN_SDK_PATH)/include -I$(STB_INCLUDE_PATH) -I$(TINYOBJ_INCLUDE_PATH)

Sample mesh

In this chapter we won't be enabling lighting yet, so it helps to use a sample model that has lighting baked into the texture. An easy way to find such models is to look for 3D scans on Sketchfab. Many of the models on that site are available in OBJ format with a permissive license.

For this tutorial I've decided to go with the Viking room model by nigelgoh (CC BY 4.0). I tweaked the size and orientation of the model to use it as a drop in replacement for the current geometry:

Feel free to use your own model, but make sure that it only consists of one material and that it has dimensions of about 1.5 x 1.5 x 1.5 units. If it is larger than that, then you'll have to change the view matrix. Put the model file in a new models directory next to shaders and textures, and put the texture image in the textures directory.

Put two new configuration variables in your program to define the model and texture paths:

const uint32_t WIDTH = 800;
const uint32_t HEIGHT = 600;

const std::string MODEL_PATH = "models/viking_room.obj";
const std::string TEXTURE_PATH = "textures/viking_room.png";

And update createTextureImage to use this path variable:

stbi_uc* pixels = stbi_load(TEXTURE_PATH.c_str(), &texWidth, &texHeight, &texChannels, STBI_rgb_alpha);

Loading vertices and indices

We're going to load the vertices and indices from the model file now, so you should remove the global vertices and indices arrays. Replace them with non-const containers as class members:

std::vector<Vertex> vertices;
std::vector<uint32_t> indices;
VkBuffer vertexBuffer;
VkDeviceMemory vertexBufferMemory;

You should change the type of the indices from uint16_t to uint32_t, because there are going to be a lot more vertices than 65535. Remember to also change the vkCmdBindIndexBuffer parameter:

vkCmdBindIndexBuffer(commandBuffer, indexBuffer, 0, VK_INDEX_TYPE_UINT32);

The tinyobjloader library is included in the same way as STB libraries. Include the tiny_obj_loader.h file and make sure to define TINYOBJLOADER_IMPLEMENTATION in one source file to include the function bodies and avoid linker errors:

#define TINYOBJLOADER_IMPLEMENTATION
#include <tiny_obj_loader.h>

We're now going to write a loadModel function that uses this library to populate the vertices and indices containers with the vertex data from the mesh. It should be called somewhere before the vertex and index buffers are created:

void initVulkan() {
    ...
    loadModel();
    createVertexBuffer();
    createIndexBuffer();
    ...
}

...

void loadModel() {

}

A model is loaded into the library's data structures by calling the tinyobj::LoadObj function:

void loadModel() {
    tinyobj::attrib_t attrib;
    std::vector<tinyobj::shape_t> shapes;
    std::vector<tinyobj::material_t> materials;
    std::string warn, err;

    if (!tinyobj::LoadObj(&attrib, &shapes, &materials, &warn, &err, MODEL_PATH.c_str())) {
        throw std::runtime_error(warn + err);
    }
}

An OBJ file consists of positions, normals, texture coordinates and faces. Faces consist of an arbitrary amount of vertices, where each vertex refers to a position, normal and/or texture coordinate by index. This makes it possible to not just reuse entire vertices, but also individual attributes.

The attrib container holds all of the positions, normals and texture coordinates in its attrib.vertices, attrib.normals and attrib.texcoords vectors. The shapes container contains all of the separate objects and their faces. Each face consists of an array of vertices, and each vertex contains the indices of the position, normal and texture coordinate attributes. OBJ models can also define a material and texture per face, but we will be ignoring those.

The err string contains errors and the warn string contains warnings that occurred while loading the file, like a missing material definition. Loading has only really failed if the LoadObj function returns false. As mentioned above, faces in OBJ files can actually contain an arbitrary number of vertices, whereas our application can only render triangles. Luckily, LoadObj has an optional parameter to automatically triangulate such faces, which is enabled by default.

We're going to combine all of the faces in the file into a single model, so just iterate over all of the shapes:

for (const auto& shape : shapes) {

}

The triangulation feature has already made sure that there are three vertices per face, so we can now directly iterate over the vertices and dump them straight into our vertices vector:

for (const auto& shape : shapes) {
    for (const auto& index : shape.mesh.indices) {
        Vertex vertex{};

        vertices.push_back(vertex);
        indices.push_back(static_cast<uint32_t>(indices.size()));
    }
}

For simplicity, we will assume that every vertex is unique for now, hence the simple auto-increment indices. The index variable is of type tinyobj::index_t, which contains the vertex_index, normal_index and texcoord_index members. We need to use these indices to look up the actual vertex attributes in the attrib arrays:

vertex.pos = {
    attrib.vertices[3 * index.vertex_index + 0],
    attrib.vertices[3 * index.vertex_index + 1],
    attrib.vertices[3 * index.vertex_index + 2]
};

vertex.texCoord = {
    attrib.texcoords[2 * index.texcoord_index + 0],
    attrib.texcoords[2 * index.texcoord_index + 1]
};

vertex.color = {1.0f, 1.0f, 1.0f};

Unfortunately the attrib.vertices array is an array of float values instead of something like glm::vec3, so you need to multiply the index by 3. Similarly, there are two texture coordinate components per entry. The offsets of 0, 1 and 2 are used to access the X, Y and Z components, or the U and V components in the case of texture coordinates.
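
The stride arithmetic can be tried out in isolation. The following sketch uses plain std::vector<float> arrays in place of the attrib members, and positionAt and texCoordAt are hypothetical helpers introduced only for illustration. It assumes the index is valid for the array:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Reads position number `index` from a flat x,y,z,x,y,z,... array.
std::array<float, 3> positionAt(const std::vector<float>& flat, std::size_t index) {
    return {{flat[3 * index + 0], flat[3 * index + 1], flat[3 * index + 2]}};
}

// Reads texture coordinate number `index` from a flat u,v,u,v,... array.
std::array<float, 2> texCoordAt(const std::vector<float>& flat, std::size_t index) {
    return {{flat[2 * index + 0], flat[2 * index + 1]}};
}
```

For a flat array holding the two positions (0, 1, 2) and (3, 4, 5), positionAt with index 1 reads the elements at offsets 3, 4 and 5.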

Run your program now with optimization enabled (e.g. Release mode in Visual Studio and with the -O3 compiler flag for GCC). This is necessary, because otherwise loading the model will be very slow. You should see something like the following:

Great, the geometry looks correct, but what's going on with the texture? The OBJ format assumes a coordinate system where a vertical coordinate of 0 means the bottom of the image, however we've uploaded our image into Vulkan in a top to bottom orientation where 0 means the top of the image. Solve this by flipping the vertical component of the texture coordinates:

vertex.texCoord = {
    attrib.texcoords[2 * index.texcoord_index + 0],
    1.0f - attrib.texcoords[2 * index.texcoord_index + 1]
};

When you run your program again, you should now see the correct result:

All that hard work is finally beginning to pay off with a demo like this!

As the model rotates you may notice that the rear (backside of the walls) looks a bit funny. This is normal and is simply because the model is not really designed to be viewed from that side.

Vertex deduplication

Unfortunately we're not really taking advantage of the index buffer yet. The vertices vector contains a lot of duplicated vertex data, because many vertices are included in multiple triangles. We should keep only the unique vertices and use the index buffer to reuse them whenever they come up. A straightforward way to implement this is to use a map or unordered_map to keep track of the unique vertices and respective indices:

#include <unordered_map>

...

std::unordered_map<Vertex, uint32_t> uniqueVertices{};

for (const auto& shape : shapes) {
    for (const auto& index : shape.mesh.indices) {
        Vertex vertex{};

        ...

        if (uniqueVertices.count(vertex) == 0) {
            uniqueVertices[vertex] = static_cast<uint32_t>(vertices.size());
            vertices.push_back(vertex);
        }

        indices.push_back(uniqueVertices[vertex]);
    }
}

Every time we read a vertex from the OBJ file, we check if we've already seen a vertex with the exact same position and texture coordinates before. If not, we add it to vertices and store its index in the uniqueVertices container. After that we add the index of the new vertex to indices. If we've seen the exact same vertex before, then we look up its index in uniqueVertices and store that index in indices.

The program will fail to compile right now, because using a user-defined type like our Vertex struct as key in a hash table requires us to implement two functions: equality test and hash calculation. The former is easy to implement by overloading the == operator in the Vertex struct:

bool operator==(const Vertex& other) const {
    return pos == other.pos && color == other.color && texCoord == other.texCoord;
}

A hash function for Vertex is implemented by specifying a template specialization for std::hash<T>. Hash functions are a complex topic, but cppreference.com recommends the following approach combining the fields of a struct to create a decent quality hash function:

namespace std {
    template<> struct hash<Vertex> {
        size_t operator()(Vertex const& vertex) const {
            return ((hash<glm::vec3>()(vertex.pos) ^
                   (hash<glm::vec3>()(vertex.color) << 1)) >> 1) ^
                   (hash<glm::vec2>()(vertex.texCoord) << 1);
        }
    };
}

This code should be placed outside the Vertex struct. The hash functions for the GLM types need to be included using the following header:

#define GLM_ENABLE_EXPERIMENTAL
#include <glm/gtx/hash.hpp>

The hash functions are defined in the gtx folder, which means that it is technically still an experimental extension to GLM. Therefore you need to define GLM_ENABLE_EXPERIMENTAL to use it. It means that the API could change with a new version of GLM in the future, but in practice the API is very stable.

You should now be able to successfully compile and run your program. If you check the size of vertices, then you'll see that it has shrunk down from 1,500,000 to 265,645! That means that, on average, each vertex is reused in about 6 triangles. This definitely saves us a lot of GPU memory.
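
If you would like to experiment with the deduplication logic without GLM or a model file, the sketch below reproduces it with standard library types only. It is a simplified variant rather than the tutorial's code: SimpleVertex, SimpleVertexHash, IndexedMesh, deduplicate and quadTriangles are hypothetical names, the hash functor is passed to unordered_map as a template argument instead of specializing std::hash, and the fields are mixed with the widely used hash-combine formula known from Boost:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Simplified vertex: std::array replaces the GLM vector types.
struct SimpleVertex {
    std::array<float, 3> pos;
    std::array<float, 2> texCoord;

    bool operator==(const SimpleVertex& other) const {
        return pos == other.pos && texCoord == other.texCoord;
    }
};

// Hash functor that mixes the hash of every float component into one value.
struct SimpleVertexHash {
    std::size_t operator()(const SimpleVertex& v) const {
        std::size_t seed = 0;
        for (float f : v.pos) seed = combine(seed, f);
        for (float f : v.texCoord) seed = combine(seed, f);
        return seed;
    }
    static std::size_t combine(std::size_t seed, float value) {
        return seed ^ (std::hash<float>()(value) + 0x9e3779b9u + (seed << 6) + (seed >> 2));
    }
};

struct IndexedMesh {
    std::vector<SimpleVertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Keeps the first occurrence of every distinct vertex and emits one index per input vertex.
IndexedMesh deduplicate(const std::vector<SimpleVertex>& input) {
    IndexedMesh mesh;
    std::unordered_map<SimpleVertex, std::uint32_t, SimpleVertexHash> seen;
    for (const SimpleVertex& v : input) {
        auto it = seen.find(v);
        if (it == seen.end()) {
            // New vertex: its index is the current size of the vertex list.
            it = seen.emplace(v, static_cast<std::uint32_t>(mesh.vertices.size())).first;
            mesh.vertices.push_back(v);
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}

// Two triangles forming a quad; corners a and c are shared by both triangles.
std::vector<SimpleVertex> quadTriangles() {
    SimpleVertex a{{{0.0f, 0.0f, 0.0f}}, {{0.0f, 0.0f}}};
    SimpleVertex b{{{1.0f, 0.0f, 0.0f}}, {{1.0f, 0.0f}}};
    SimpleVertex c{{{1.0f, 1.0f, 0.0f}}, {{1.0f, 1.0f}}};
    SimpleVertex d{{{0.0f, 1.0f, 0.0f}}, {{0.0f, 1.0f}}};
    return {a, b, c, c, d, a};
}
```

Calling deduplicate(quadTriangles()) turns the six input vertices into four unique vertices and six indices. Using find and emplace instead of count and operator[] avoids hashing each vertex several times, which is a small design choice of this sketch and not a requirement.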

C++ code / Vertex shader / Fragment shader

Generating Mipmaps

Introduction

Our program can now load and render 3D models. In this chapter, we will add one more feature, mipmap generation. Mipmaps are widely used in games and rendering software, and Vulkan gives us complete control over how they are created.

Mipmaps are precalculated, downscaled versions of an image. Each new image is half the width and height of the previous one. Mipmaps are used as a form of Level of Detail or LOD. Objects that are far away from the camera will sample their textures from the smaller mip images. Using smaller images increases the rendering speed and avoids artifacts such as Moiré patterns. An example of what mipmaps look like:

Image creation

In Vulkan, each of the mip images is stored in different mip levels of a VkImage. Mip level 0 is the original image, and the mip levels after level 0 are commonly referred to as the mip chain.

The number of mip levels is specified when the VkImage is created. Up until now, we have always set this value to one. We need to calculate the number of mip levels from the dimensions of the image. First, add a class member to store this number:

...
uint32_t mipLevels;
VkImage textureImage;
...

The value for mipLevels can be found once we've loaded the texture in createTextureImage:

int texWidth, texHeight, texChannels;
stbi_uc* pixels = stbi_load(TEXTURE_PATH.c_str(), &texWidth, &texHeight, &texChannels, STBI_rgb_alpha);
...
mipLevels = static_cast<uint32_t>(std::floor(std::log2(std::max(texWidth, texHeight)))) + 1;

This calculates the number of levels in the mip chain. The max function selects the largest dimension. The log2 function calculates how many times that dimension can be divided by 2. The floor function handles cases where the largest dimension is not a power of 2. 1 is added so that the original image has a mip level.
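
As a quick sanity check of the formula, here is a standalone sketch. mipLevelCount wraps the expression from above, and mipLevelCountByHalving is an integer-only cross-check; both function names are hypothetical and not part of the tutorial code:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Same formula as in the text: floor(log2(largest dimension)) + 1.
std::uint32_t mipLevelCount(int width, int height) {
    return static_cast<std::uint32_t>(std::floor(std::log2(std::max(width, height)))) + 1;
}

// Integer-only cross-check: count how often the largest dimension can be
// halved before it reaches 1, plus one for the original image.
std::uint32_t mipLevelCountByHalving(int width, int height) {
    std::uint32_t levels = 1;
    for (int size = std::max(width, height); size > 1; size /= 2) {
        ++levels;
    }
    return levels;
}
```

A 1024 x 1024 texture gets 11 levels (1024 down to 1), and an 800 x 600 texture gets 10, because floor(log2(800)) is 9.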

To use this value, we need to change the createImage, createImageView, and transitionImageLayout functions to allow us to specify the number of mip levels. Add a mipLevels parameter to the functions:

void createImage(uint32_t width, uint32_t height, uint32_t mipLevels, VkFormat format, VkImageTiling tiling, VkImageUsageFlags usage, VkMemoryPropertyFlags properties, VkImage& image, VkDeviceMemory& imageMemory) {
    ...
    imageInfo.mipLevels = mipLevels;
    ...
}

VkImageView createImageView(VkImage image, VkFormat format, VkImageAspectFlags aspectFlags, uint32_t mipLevels) {
    ...
    viewInfo.subresourceRange.levelCount = mipLevels;
    ...
}

void transitionImageLayout(VkImage image, VkFormat format, VkImageLayout oldLayout, VkImageLayout newLayout, uint32_t mipLevels) {
    ...
    barrier.subresourceRange.levelCount = mipLevels;
    ...
}

Update all calls to these functions to use the right values:

createImage(swapChainExtent.width, swapChainExtent.height, 1, depthFormat, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, depthImage, depthImageMemory);
...
createImage(texWidth, texHeight, mipLevels, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);

swapChainImageViews[i] = createImageView(swapChainImages[i], swapChainImageFormat, VK_IMAGE_ASPECT_COLOR_BIT, 1);
...
depthImageView = createImageView(depthImage, depthFormat, VK_IMAGE_ASPECT_DEPTH_BIT, 1);
...
textureImageView = createImageView(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_ASPECT_COLOR_BIT, mipLevels);

transitionImageLayout(depthImage, depthFormat, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL, 1);
...
transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, mipLevels);

Generating Mipmaps

Our texture image now has multiple mip levels, but the staging buffer can only be used to fill mip level 0. The other levels are still undefined. To fill these levels we need to generate the data from the single level that we have. We will use the vkCmdBlitImage command. This command performs copying, scaling, and filtering operations. We will call this multiple times to blit data to each level of our texture image.

vkCmdBlitImage is considered a transfer operation, so we must inform Vulkan that we intend to use the texture image as both the source and destination of a transfer. Add VK_IMAGE_USAGE_TRANSFER_SRC_BIT to the texture image's usage flags in createTextureImage:

...
createImage(texWidth, texHeight, mipLevels, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_SRC_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);
...

Like other image operations, vkCmdBlitImage depends on the layout of the image it operates on. We could transition the entire image to VK_IMAGE_LAYOUT_GENERAL, but this will most likely be slow. For optimal performance, the source image should be in VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL and the destination image should be in VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL. Vulkan allows us to transition each mip level of an image independently. Each blit will only deal with two mip levels at a time, so we can transition each level into the optimal layout between blit commands.

transitionImageLayout only performs layout transitions on the entire image, so we'll need to write a few more pipeline barrier commands. Remove the existing transition to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL in createTextureImage:

...
transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, mipLevels);
copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));
// transitioned to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL while generating mipmaps
...

This will leave each level of the texture image in VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL. Each level will be transitioned to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL after the blit command reading from it is finished.

We're now going to write the function that generates the mipmaps:

void generateMipmaps(VkImage image, int32_t texWidth, int32_t texHeight, uint32_t mipLevels) {
    VkCommandBuffer commandBuffer = beginSingleTimeCommands();

    VkImageMemoryBarrier barrier{};
    barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.image = image;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
    barrier.subresourceRange.baseArrayLayer = 0;
    barrier.subresourceRange.layerCount = 1;
    barrier.subresourceRange.levelCount = 1;

    endSingleTimeCommands(commandBuffer);
}

We're going to make several transitions, so we'll reuse this VkImageMemoryBarrier. The fields set above will remain the same for all barriers. subresourceRange.baseMipLevel, oldLayout, newLayout, srcAccessMask, and dstAccessMask will be changed for each transition.

int32_t mipWidth = texWidth;
int32_t mipHeight = texHeight;

for (uint32_t i = 1; i < mipLevels; i++) {

}

This loop will record each of the vkCmdBlitImage commands. Note that the loop variable starts at 1, not 0.

barrier.subresourceRange.baseMipLevel = i - 1;
barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
barrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;

vkCmdPipelineBarrier(commandBuffer,
    VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0,
    0, nullptr,
    0, nullptr,
    1, &barrier);

First, we transition level i - 1 to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL. This transition will wait for level i - 1 to be filled, either from the previous blit command, or from vkCmdCopyBufferToImage. The current blit command will wait on this transition.

VkImageBlit blit{};
blit.srcOffsets[0] = { 0, 0, 0 };
blit.srcOffsets[1] = { mipWidth, mipHeight, 1 };
blit.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
blit.srcSubresource.mipLevel = i - 1;
blit.srcSubresource.baseArrayLayer = 0;
blit.srcSubresource.layerCount = 1;
blit.dstOffsets[0] = { 0, 0, 0 };
blit.dstOffsets[1] = { mipWidth > 1 ? mipWidth / 2 : 1, mipHeight > 1 ? mipHeight / 2 : 1, 1 };
blit.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
blit.dstSubresource.mipLevel = i;
blit.dstSubresource.baseArrayLayer = 0;
blit.dstSubresource.layerCount = 1;

Next, we specify the regions that will be used in the blit operation. The source mip level is i - 1 and the destination mip level is i. The two elements of the srcOffsets array determine the 3D region that data will be blitted from. dstOffsets determines the region that data will be blitted to. The X and Y dimensions of the dstOffsets[1] are divided by two since each mip level is half the size of the previous level. The Z dimension of srcOffsets[1] and dstOffsets[1] must be 1, since a 2D image has a depth of 1.

vkCmdBlitImage(commandBuffer,
    image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
    image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    1, &blit,
    VK_FILTER_LINEAR);

Now, we record the blit command. Note that image is used for both the srcImage and dstImage parameters. This is because we're blitting between different levels of the same image. The source mip level was just transitioned to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL and the destination level is still in VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL from createTextureImage.

Beware if you are using a dedicated transfer queue (as suggested in Vertex buffers): vkCmdBlitImage must be submitted to a queue with graphics capability.

The last parameter allows us to specify a VkFilter to use in the blit. We have the same filtering options here that we had when making the VkSampler. We use VK_FILTER_LINEAR to enable interpolation.

barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL;
barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
barrier.srcAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

vkCmdPipelineBarrier(commandBuffer,
    VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0,
    0, nullptr,
    0, nullptr,
    1, &barrier);

This barrier transitions mip level i - 1 to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL. This transition waits on the current blit command to finish. All sampling operations will wait on this transition to finish.

    ...
    if (mipWidth > 1) mipWidth /= 2;
    if (mipHeight > 1) mipHeight /= 2;
}

At the end of the loop, we divide the current mip dimensions by two. We check each dimension before the division to ensure that dimension never becomes 0. This handles cases where the image is not square, since one of the mip dimensions would reach 1 before the other dimension. When this happens, that dimension should remain 1 for all remaining levels.
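The halving rule can be checked in isolation. The following is a minimal sketch, not part of the tutorial's program: mipChainExtents is a hypothetical helper that lists the size of every level using the same "stay at 1" rule as the loop above.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: returns the (width, height) of every mip level,
// starting at level 0. A dimension that has reached 1 stays at 1 instead
// of dropping to 0, exactly like the checks at the end of the blit loop.
std::vector<std::pair<int32_t, int32_t>> mipChainExtents(int32_t texWidth, int32_t texHeight, uint32_t mipLevels) {
    assert(texWidth > 0 && texHeight > 0 && mipLevels > 0);

    std::vector<std::pair<int32_t, int32_t>> extents;
    int32_t mipWidth = texWidth;
    int32_t mipHeight = texHeight;

    for (uint32_t i = 0; i < mipLevels; i++) {
        extents.emplace_back(mipWidth, mipHeight);
        if (mipWidth > 1) mipWidth /= 2;
        if (mipHeight > 1) mipHeight /= 2;
    }
    return extents;
}
```

For a non-square 8x2 texture with 4 levels this yields 8x2, 4x1, 2x1 and 1x1: the height reaches 1 first and stays there while the width keeps shrinking.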

    barrier.subresourceRange.baseMipLevel = mipLevels - 1;
    barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
    barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    vkCmdPipelineBarrier(commandBuffer,
        VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0,
        0, nullptr,
        0, nullptr,
        1, &barrier);

    endSingleTimeCommands(commandBuffer);
}

Before we end the command buffer, we insert one more pipeline barrier. This barrier transitions the last mip level from VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL. This wasn't handled by the loop, since the last mip level is never blitted from.
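To convince ourselves that every level ends up in the right layout, we can replay the barriers on paper. The sketch below does that on the CPU under simplifying assumptions: the Layout enum is a hypothetical stand-in for the three VkImageLayout values used here, and simulateMipLayouts is not a Vulkan function, it only tracks the layout of each level.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the three VkImageLayout values used in this chapter.
enum class Layout { TransferDst, TransferSrc, ShaderReadOnly };

// Replays the barriers recorded by generateMipmaps on a per-level layout table.
std::vector<Layout> simulateMipLayouts(uint32_t mipLevels) {
    assert(mipLevels > 0);
    // After copyBufferToImage, every level is in TRANSFER_DST_OPTIMAL.
    std::vector<Layout> layouts(mipLevels, Layout::TransferDst);

    for (uint32_t i = 1; i < mipLevels; i++) {
        // First barrier in the loop: level i - 1 becomes the blit source.
        assert(layouts[i - 1] == Layout::TransferDst);
        layouts[i - 1] = Layout::TransferSrc;

        // The blit reads level i - 1 and writes level i, which must still be a transfer destination.
        assert(layouts[i] == Layout::TransferDst);

        // Second barrier in the loop: level i - 1 is finished and becomes shader-readable.
        layouts[i - 1] = Layout::ShaderReadOnly;
    }

    // Barrier after the loop: the last level was never used as a blit source.
    assert(layouts[mipLevels - 1] == Layout::TransferDst);
    layouts[mipLevels - 1] = Layout::ShaderReadOnly;
    return layouts;
}
```

Without the final step, the last level would be left in the transfer destination layout, which is why the extra barrier after the loop is needed.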

Finally, add the call to generateMipmaps in createTextureImage:

transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, mipLevels);
copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));
// Transitioned to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL while generating mipmaps
...
generateMipmaps(textureImage, texWidth, texHeight, mipLevels);

Our texture image's mipmaps are now completely filled.

Linear filtering support

It is very convenient to use a built-in function like vkCmdBlitImage to generate all the mip levels, but unfortunately it is not guaranteed to be supported on all platforms. It requires the texture image format we use to support linear filtering, which can be checked with the vkGetPhysicalDeviceFormatProperties function. We will add a check to the generateMipmaps function for this.

First add an additional parameter that specifies the image format:

void createTextureImage() {
    ...

    generateMipmaps(textureImage, VK_FORMAT_R8G8B8A8_SRGB, texWidth, texHeight, mipLevels);
}

void generateMipmaps(VkImage image, VkFormat imageFormat, int32_t texWidth, int32_t texHeight, uint32_t mipLevels) {

    ...
}

In the generateMipmaps function, use vkGetPhysicalDeviceFormatProperties to request the properties of the texture image format:

void generateMipmaps(VkImage image, VkFormat imageFormat, int32_t texWidth, int32_t texHeight, uint32_t mipLevels) {

    // Check if image format supports linear blitting
    VkFormatProperties formatProperties;
    vkGetPhysicalDeviceFormatProperties(physicalDevice, imageFormat, &formatProperties);

    ...

The VkFormatProperties struct has three fields named linearTilingFeatures, optimalTilingFeatures and bufferFeatures, each of which describes the features the format supports for that kind of usage. We create the texture image with optimal tiling, so we need to check optimalTilingFeatures. Support for the linear filtering feature can be checked with the VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT:

if (!(formatProperties.optimalTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT)) {
    throw std::runtime_error("texture image format does not support linear blitting!");
}

There are two alternatives in this case. You could implement a function that searches common texture image formats for one that does support linear blitting, or you could implement the mipmap generation in software with a library like stb_image_resize. Each mip level can then be loaded into the image in the same way that you loaded the original image.
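To give an idea of what the software route involves, here is a minimal sketch of a single downsampling step. It is not stb_image_resize: it uses a plain 2x2 box filter, and downsampleRgba8 is a hypothetical helper name. It assumes tightly packed RGBA8 data and averages the stored byte values directly, which ignores the sRGB encoding of our texture format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: produces the next mip level of a tightly packed RGBA8
// image by averaging each 2x2 block of source texels (box filter).
// Dimensions that are already 1 stay at 1. For odd sizes the last row or
// column is ignored, which a production resizer would handle more carefully.
std::vector<uint8_t> downsampleRgba8(const std::vector<uint8_t>& src, int32_t srcWidth, int32_t srcHeight) {
    assert(srcWidth > 0 && srcHeight > 0);
    assert(src.size() == static_cast<size_t>(srcWidth) * srcHeight * 4);

    const int32_t dstWidth = std::max(srcWidth / 2, 1);
    const int32_t dstHeight = std::max(srcHeight / 2, 1);
    std::vector<uint8_t> dst(static_cast<size_t>(dstWidth) * dstHeight * 4);

    for (int32_t y = 0; y < dstHeight; y++) {
        for (int32_t x = 0; x < dstWidth; x++) {
            // Clamp so that sources that are 1 texel wide or high read the same texel twice.
            const int32_t x0 = std::min(2 * x, srcWidth - 1);
            const int32_t x1 = std::min(2 * x + 1, srcWidth - 1);
            const int32_t y0 = std::min(2 * y, srcHeight - 1);
            const int32_t y1 = std::min(2 * y + 1, srcHeight - 1);

            for (int32_t c = 0; c < 4; c++) {
                auto texel = [&](int32_t tx, int32_t ty) {
                    return static_cast<uint32_t>(src[(static_cast<size_t>(ty) * srcWidth + tx) * 4 + c]);
                };
                const uint32_t sum = texel(x0, y0) + texel(x1, y0) + texel(x0, y1) + texel(x1, y1);
                // Add 2 before dividing so the average is rounded to nearest.
                dst[(static_cast<size_t>(y) * dstWidth + x) * 4 + c] = static_cast<uint8_t>((sum + 2) / 4);
            }
        }
    }
    return dst;
}
```

Calling this repeatedly, starting from the loaded image, produces one pixel array per mip level. Each array could then be uploaded through a staging buffer into its own level.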

It should be noted that it is uncommon in practice to generate the mipmap levels at runtime anyway. Usually they are pregenerated and stored in the texture file alongside the base level to improve loading speed. Implementing resizing in software and loading multiple levels from a file is left as an exercise to the reader.

Sampler

While the VkImage holds the mipmap data, VkSampler controls how that data is read while rendering. Vulkan allows us to specify minLod, maxLod, mipLodBias, and mipmapMode ("Lod" means "Level of Detail"). When a texture is sampled, the sampler selects a mip level according to the following pseudocode:

lod = getLodLevelFromScreenSize(); //smaller when the object is close, may be negative
lod = clamp(lod + mipLodBias, minLod, maxLod);

level = clamp(floor(lod), 0, texture.mipLevels - 1);  //clamped to the number of mip levels in the texture

if (mipmapMode == VK_SAMPLER_MIPMAP_MODE_NEAREST) {
    color = sample(level);
} else {
    color = blend(sample(level), sample(level + 1));
}

If samplerInfo.mipmapMode is VK_SAMPLER_MIPMAP_MODE_NEAREST, lod selects the mip level to sample from. If the mipmap mode is VK_SAMPLER_MIPMAP_MODE_LINEAR, lod is used to select two mip levels to be sampled. Those levels are sampled and the results are linearly blended.

The sample operation is also affected by lod:

if (lod <= 0) {
    color = readTexture(uv, magFilter);
} else {
    color = readTexture(uv, minFilter);
}

If the object is close to the camera, magFilter is used as the filter. If the object is further from the camera, minFilter is used. Normally, lod is non-negative, and is only 0 when close to the camera. mipLodBias lets us force Vulkan to use a lower lod and level than it would normally use.
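The two pieces of pseudocode can be combined into a small CPU-side model. This is only a sketch that follows the simplified pseudocode above, not the exact rules of the specification. selectLod, LodSelection and MipmapMode are hypothetical names, and using the fractional part of lod as the blend weight is an assumption about how the linear blend is weighted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

enum class MipmapMode { Nearest, Linear };  // hypothetical stand-in for VkSamplerMipmapMode

struct LodSelection {
    uint32_t level0;     // first mip level that is sampled
    uint32_t level1;     // second level (equal to level0 in Nearest mode)
    float blend;         // weight of level1 when blending the two samples
    bool usesMagFilter;  // true when lod <= 0, otherwise minFilter is used
};

// screenLod plays the role of getLodLevelFromScreenSize() in the pseudocode.
LodSelection selectLod(float screenLod, float mipLodBias, float minLod, float maxLod,
                       uint32_t mipLevels, MipmapMode mode) {
    assert(mipLevels > 0 && minLod <= maxLod);

    // lod = clamp(lod + mipLodBias, minLod, maxLod)
    const float lod = std::min(std::max(screenLod + mipLodBias, minLod), maxLod);

    // Clamp to the levels that actually exist in the texture.
    const float maxLevel = static_cast<float>(mipLevels - 1);
    const float clampedLod = std::min(std::max(lod, 0.0f), maxLevel);
    const float base = std::floor(clampedLod);

    LodSelection result{};
    result.usesMagFilter = (lod <= 0.0f);
    result.level0 = static_cast<uint32_t>(base);
    if (mode == MipmapMode::Nearest) {
        result.level1 = result.level0;
        result.blend = 0.0f;
    } else {
        result.level1 = std::min(result.level0 + 1, mipLevels - 1);
        result.blend = clampedLod - base;
    }
    return result;
}
```

With 5 mip levels and maxLod set to 1000.0f, a lod of 50 still resolves to level 4, because the level is clamped to the levels that exist in the texture.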

To see the results of this chapter, we need to choose values for our textureSampler. We've already set the minFilter and magFilter to use VK_FILTER_LINEAR. We just need to choose values for minLod, maxLod, mipLodBias, and mipmapMode.

void createTextureSampler() {
    ...
    samplerInfo.mipmapMode = VK_SAMPLER_MIPMAP_MODE_LINEAR;
    samplerInfo.minLod = 0.0f; // Optional
    samplerInfo.maxLod = VK_LOD_CLAMP_NONE;
    samplerInfo.mipLodBias = 0.0f; // Optional
    ...
}

To allow the full range of mip levels to be used, we set minLod to 0.0f, and maxLod to VK_LOD_CLAMP_NONE. This constant is equal to 1000.0f, which means that all available mipmap levels in the texture will be sampled. We have no reason to change the lod value, so we set mipLodBias to 0.0f.

Now run your program and you should see the following:

It's not a dramatic difference, since our scene is so simple. There are subtle differences if you look closely.

The most noticeable difference is the writing on the papers. With mipmaps, the writing has been smoothed. Without mipmaps, the writing has harsh edges and gaps from Moiré artifacts.

You can play around with the sampler settings to see how they affect mipmapping. For example, by changing minLod, you can force the sampler to not use the lowest mip levels:

samplerInfo.minLod = static_cast<float>(mipLevels / 2);

These settings will produce this image:

This is how higher mip levels will be used when objects are further away from the camera.

C++ code / Vertex shader / Fragment shader

Multisampling

Introduction

Our program can now load multiple levels of detail for textures which fixes artifacts when rendering objects far away from the viewer. The image is now a lot smoother, however on closer inspection you will notice jagged saw-like patterns along the edges of drawn geometric shapes. This is especially visible in one of our early programs when we rendered a quad:

This undesired effect is called "aliasing" and it's a result of the limited number of pixels that are available for rendering. Since there are no displays out there with unlimited resolution, it will always be visible to some extent. There are a number of ways to fix this and in this chapter we'll focus on one of the more popular ones: Multisample anti-aliasing (MSAA).

In ordinary rendering, the pixel color is determined based on a single sample point which in most cases is the center of the target pixel on screen. If part of the drawn line passes through a certain pixel but doesn't cover the sample point, that pixel will be left blank, leading to the jagged "staircase" effect.

What MSAA does is it uses multiple sample points per pixel (hence the name) to determine its final color. As one might expect, more samples lead to better results, however it is also more computationally expensive.
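The effect of the extra sample points can be illustrated with a toy coverage test. This sketch is not how the GPU rasterizer is implemented. It assumes a single pixel spanning [0, 1) in both directions, a vertical geometry edge, and an illustrative rotated-grid pattern for the four sample positions. All names are hypothetical.

```cpp
#include <cassert>
#include <vector>

struct SamplePoint { float x, y; };  // position inside the pixel, in [0, 1)

// Fraction of sample points on the covered side of a vertical edge:
// everything with x < edgeX counts as inside the geometry.
float coverage(const std::vector<SamplePoint>& samples, float edgeX) {
    assert(!samples.empty());
    int covered = 0;
    for (const SamplePoint& s : samples) {
        if (s.x < edgeX) covered++;
    }
    return static_cast<float>(covered) / static_cast<float>(samples.size());
}

// One sample at the pixel center, as in ordinary rendering.
const std::vector<SamplePoint> singleSample = {{0.5f, 0.5f}};

// Four samples in a rotated-grid pattern (illustrative positions).
const std::vector<SamplePoint> fourSamples = {
    {0.375f, 0.125f}, {0.875f, 0.375f}, {0.125f, 0.625f}, {0.625f, 0.875f}
};
```

For an edge at x = 0.3, the single center sample reports no coverage at all, so the pixel is left blank. The four-sample pattern reports 0.25, which is much closer to the true covered area of 0.3 and results in a partially blended pixel.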

In our implementation, we will focus on using the maximum available sample count. Depending on your application this may not always be the best approach and it might be better to use fewer samples for the sake of higher performance if the final result meets your quality demands.

Getting available sample count

Let's start off by determining how many samples our hardware can use. Most modern GPUs support at least 8 samples but this number is not guaranteed to be the same everywhere. We'll keep track of it by adding a new class member:

...
VkSampleCountFlagBits msaaSamples = VK_SAMPLE_COUNT_1_BIT;
...

By default we'll be using only one sample per pixel which is equivalent to no multisampling, in which case the final image will remain unchanged. The exact maximum number of samples can be extracted from VkPhysicalDeviceProperties associated with our selected physical device. We're using a depth buffer, so we have to take into account the sample count for both color and depth. The highest sample count that is supported by both (&) will be the maximum we can support. Add a function that will fetch this information for us:

VkSampleCountFlagBits getMaxUsableSampleCount() {
    VkPhysicalDeviceProperties physicalDeviceProperties;
    vkGetPhysicalDeviceProperties(physicalDevice, &physicalDeviceProperties);

    VkSampleCountFlags counts = physicalDeviceProperties.limits.framebufferColorSampleCounts & physicalDeviceProperties.limits.framebufferDepthSampleCounts;
    if (counts & VK_SAMPLE_COUNT_64_BIT) { return VK_SAMPLE_COUNT_64_BIT; }
    if (counts & VK_SAMPLE_COUNT_32_BIT) { return VK_SAMPLE_COUNT_32_BIT; }
    if (counts & VK_SAMPLE_COUNT_16_BIT) { return VK_SAMPLE_COUNT_16_BIT; }
    if (counts & VK_SAMPLE_COUNT_8_BIT) { return VK_SAMPLE_COUNT_8_BIT; }
    if (counts & VK_SAMPLE_COUNT_4_BIT) { return VK_SAMPLE_COUNT_4_BIT; }
    if (counts & VK_SAMPLE_COUNT_2_BIT) { return VK_SAMPLE_COUNT_2_BIT; }

    return VK_SAMPLE_COUNT_1_BIT;
}
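The selection logic itself does not depend on a device, so it can be tried out separately. The sketch below is a hypothetical, device-free variant: highestCommonSampleCount is not part of the tutorial's program, and it works on plain integers whose bit values mirror VkSampleCountFlagBits (0x01 for 1 sample up to 0x40 for 64 samples).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the highest sample count bit that is set in
// both masks, or 0x01 (one sample) if they share no higher bit.
uint32_t highestCommonSampleCount(uint32_t colorCounts, uint32_t depthCounts) {
    const uint32_t counts = colorCounts & depthCounts;
    // Only the seven sample count bits are expected in this sketch.
    assert((counts & ~0x7Fu) == 0);

    // Walk from 64 samples down to 2 samples and return the first match.
    for (uint32_t bit = 0x40; bit > 0x01; bit >>= 1) {
        if (counts & bit) {
            return bit;
        }
    }
    return 0x01;
}
```

If color supports up to 8 samples (mask 0x0F) but depth only up to 4 (mask 0x07), the result is 0x04: the bitwise AND discards the 8-sample bit because depth does not support it.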

We will now use this function to set the msaaSamples variable during the physical device selection process. For this, we have to slightly modify the pickPhysicalDevice function:

void pickPhysicalDevice() {
    ...
    for (const auto& device : devices) {
        if (isDeviceSuitable(device)) {
            physicalDevice = device;
            msaaSamples = getMaxUsableSampleCount();
            break;
        }
    }
    ...
}

Setting up a render target

In MSAA, each pixel is sampled in an offscreen buffer which is then rendered to the screen. This new buffer is slightly different from the regular images we've been rendering to: it has to be able to store more than one sample per pixel. Once a multisampled buffer is created, it has to be resolved to the default framebuffer (which stores only a single sample per pixel). This is why we have to create an additional render target and modify our current drawing process. We only need one render target since only one drawing operation is active at a time, just like with the depth buffer. Add the following class members:

...
VkImage colorImage;
VkDeviceMemory colorImageMemory;
VkImageView colorImageView;
...

This new image will have to store the desired number of samples per pixel, so we need to pass this number to VkImageCreateInfo during the image creation process. Modify the createImage function by adding a numSamples parameter:

void createImage(uint32_t width, uint32_t height, uint32_t mipLevels, VkSampleCountFlagBits numSamples, VkFormat format, VkImageTiling tiling, VkImageUsageFlags usage, VkMemoryPropertyFlags properties, VkImage& image, VkDeviceMemory& imageMemory) {
    ...
    imageInfo.samples = numSamples;
    ...

For now, update all calls to this function using VK_SAMPLE_COUNT_1_BIT - we will be replacing this with proper values as we progress with implementation:

createImage(swapChainExtent.width, swapChainExtent.height, 1, VK_SAMPLE_COUNT_1_BIT, depthFormat, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, depthImage, depthImageMemory);
...
createImage(texWidth, texHeight, mipLevels, VK_SAMPLE_COUNT_1_BIT, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_SRC_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);

We will now create a multisampled color buffer. Add a createColorResources function and note that we're using msaaSamples here as a function parameter to createImage. We're also using only one mip level, since this is enforced by the Vulkan specification in case of images with more than one sample per pixel. Also, this color buffer doesn't need mipmaps since it's not going to be used as a texture:

void createColorResources() {
    VkFormat colorFormat = swapChainImageFormat;

    createImage(swapChainExtent.width, swapChainExtent.height, 1, msaaSamples, colorFormat, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT | VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, colorImage, colorImageMemory);
    colorImageView = createImageView(colorImage, colorFormat, VK_IMAGE_ASPECT_COLOR_BIT, 1);
}

For consistency, call the function right before createDepthResources:

void initVulkan() {
    ...
    createColorResources();
    createDepthResources();
    ...
}

Now that we have a multisampled color buffer in place it's time to take care of depth. Modify createDepthResources and update the number of samples used by the depth buffer:

void createDepthResources() {
    ...
    createImage(swapChainExtent.width, swapChainExtent.height, 1, msaaSamples, depthFormat, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, depthImage, depthImageMemory);
    ...
}

We have now created a couple of new Vulkan resources, so let's not forget to release them when necessary:

void cleanupSwapChain() {
    vkDestroyImageView(device, colorImageView, nullptr);
    vkDestroyImage(device, colorImage, nullptr);
    vkFreeMemory(device, colorImageMemory, nullptr);
    ...
}

And update the recreateSwapChain so that the new color image can be recreated in the correct resolution when the window is resized:

void recreateSwapChain() {
    ...
    createImageViews();
    createColorResources();
    createDepthResources();
    ...
}

We made it past the initial MSAA setup. Now we need to start using this new resource in our graphics pipeline, framebuffer and render pass, and see the results!

Adding new attachments

Let's take care of the render pass first. Modify createRenderPass and update color and depth attachment creation info structs:

void createRenderPass() {
    ...
    colorAttachment.samples = msaaSamples;
    colorAttachment.finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    ...
    depthAttachment.samples = msaaSamples;
    ...

You'll notice that we have changed the finalLayout from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL. That's because multisampled images cannot be presented directly. We first need to resolve them to a regular image. This requirement does not apply to the depth buffer, since it won't be presented at any point. Therefore we will have to add only one new attachment for color which is a so-called resolve attachment:

    ...
    VkAttachmentDescription colorAttachmentResolve{};
    colorAttachmentResolve.format = swapChainImageFormat;
    colorAttachmentResolve.samples = VK_SAMPLE_COUNT_1_BIT;
    colorAttachmentResolve.loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
    colorAttachmentResolve.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
    colorAttachmentResolve.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
    colorAttachmentResolve.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
    colorAttachmentResolve.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    colorAttachmentResolve.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
    ...

The render pass now has to be instructed to resolve the multisampled color image into a regular attachment. Create a new attachment reference that will point to the color buffer which will serve as the resolve target:

    ...
    VkAttachmentReference colorAttachmentResolveRef{};
    colorAttachmentResolveRef.attachment = 2;
    colorAttachmentResolveRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    ...

Set the pResolveAttachments subpass struct member to point to the newly created attachment reference. This is enough to let the render pass define a multisample resolve operation which will let us render the image to screen:

    ...
    subpass.pResolveAttachments = &colorAttachmentResolveRef;
    ...

Since we're reusing the multisampled color image, it's necessary to update the srcAccessMask of the VkSubpassDependency. This update ensures that any write operations to the color attachment are completed before subsequent ones begin, thus preventing write-after-write hazards that can lead to unstable rendering results:

    ...
    dependency.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
    ...

Now update render pass info struct with the new color attachment:

    ...
    std::array<VkAttachmentDescription, 3> attachments = {colorAttachment, depthAttachment, colorAttachmentResolve};
    ...

With the render pass in place, modify createFramebuffers and add the new image view to the list:

void createFramebuffers() {
        ...
        std::array<VkImageView, 3> attachments = {
            colorImageView,
            depthImageView,
            swapChainImageViews[i]
        };
        ...
}

Finally, tell the newly created pipeline to use more than one sample by modifying createGraphicsPipeline:

void createGraphicsPipeline() {
    ...
    multisampling.rasterizationSamples = msaaSamples;
    ...
}

Now run your program and you should see the following:

Just like with mipmapping, the difference may not be apparent straight away. On a closer look you'll notice that the edges are not as jagged anymore and the whole image seems a bit smoother compared to the original.

The difference is more noticeable when looking up close at one of the edges:

Quality improvements

There are certain limitations of our current MSAA implementation which may impact the quality of the output image in more detailed scenes. For example, we're currently not solving potential problems caused by shader aliasing, i.e. MSAA only smooths out the edges of geometry but not the interior filling. This may lead to a situation where you get a smooth polygon rendered on screen but the applied texture still looks aliased if it contains highly contrasting colors. One way to approach this problem is to enable Sample Shading, which will improve the image quality even further, though at an additional performance cost:


void createLogicalDevice() {
    ...
    deviceFeatures.sampleRateShading = VK_TRUE; // enable sample shading feature for the device
    ...
}

void createGraphicsPipeline() {
    ...
    multisampling.sampleShadingEnable = VK_TRUE; // enable sample shading in the pipeline
    multisampling.minSampleShading = .2f; // min fraction for sample shading; closer to one is smoother
    ...
}
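To get a feeling for what minSampleShading means in practice, the sketch below computes the minimum number of samples that are shaded separately per fragment. It assumes the rule from the Vulkan specification that this minimum is max(ceil(minSampleShading * rasterizationSamples), 1). minShadedSamples is a hypothetical helper, not an API call.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: minimum number of fragment shader invocations per
// fragment when sample shading is enabled, assuming the rule
// max(ceil(minSampleShading * rasterizationSamples), 1).
uint32_t minShadedSamples(float minSampleShading, uint32_t rasterizationSamples) {
    assert(minSampleShading >= 0.0f && minSampleShading <= 1.0f);
    assert(rasterizationSamples > 0);

    const float scaled = std::ceil(minSampleShading * static_cast<float>(rasterizationSamples));
    return std::max(static_cast<uint32_t>(scaled), 1u);
}
```

With the value of .2f used above and 8 samples per pixel, at least 2 samples are shaded individually. A value of 1.0f would shade every sample, which gives the smoothest result at the highest cost.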

In this example we'll leave sample shading disabled but in certain scenarios the quality improvement may be noticeable:

Conclusion

It has taken a lot of work to get to this point, but now you finally have a good base for a Vulkan program. The knowledge of the basic principles of Vulkan that you now possess should be sufficient to start exploring more of the features, like:

Push constants
Instanced rendering
Dynamic uniforms
Separate images and sampler descriptors
Pipeline cache
Multi-threaded command buffer generation
Multiple subpasses
Compute shaders

The current program can be extended in many ways, like adding Blinn-Phong lighting, post-processing effects and shadow mapping. You should be able to learn how these effects work from tutorials for other APIs, because despite Vulkan's explicitness, many concepts still work the same.

C++ code / Vertex shader / Fragment shader

Compute Shader

Introduction

In this bonus chapter we'll take a look at compute shaders. Up until now all previous chapters dealt with the traditional graphics part of the Vulkan pipeline. But unlike older APIs like OpenGL, compute shader support in Vulkan is mandatory. This means that you can use compute shaders on every Vulkan implementation available, no matter if it's a high-end desktop GPU or a low-powered embedded device.

This opens up the world of general purpose computing on graphics processor units (GPGPU), no matter where your application is running. GPGPU means that you can do general computations on your GPU, something that has traditionally been a domain of CPUs. But with GPUs having become more and more powerful and more flexible, many workloads that would require the general purpose capabilities of a CPU can now be done on the GPU in realtime.

A few examples of where the compute capabilities of a GPU can be used are image manipulation, visibility testing, post processing, advanced lighting calculations, animations, physics (e.g. for a particle system) and much more. It's even possible to use compute for purely computational, non-visual work that does not require any graphics output, e.g. number crunching or AI-related tasks. This is called "headless compute".

Advantages

Doing computationally expensive calculations on the GPU has several advantages. The most obvious one is offloading work from the CPU. Another one is not requiring moving data between the CPU's main memory and the GPU's memory. All of the data can stay on the GPU without having to wait for slow transfers from main memory.

Aside from these, GPUs are heavily parallelized with some of them having tens of thousands of small compute units. This often makes them a better fit for highly parallel workflows than a CPU with a few large compute units.

The Vulkan pipeline

It's important to know that compute is completely separated from the graphics part of the pipeline. This is visible in the following block diagram of the Vulkan pipeline from the official specification:

In this diagram we can see the traditional graphics part of the pipeline on the left, and several stages on the right that are not part of this graphics pipeline, including the compute shader (stage). With the compute shader stage being detached from the graphics pipeline we'll be able to use it anywhere we see fit. This is very different from e.g. the fragment shader which is always applied to the transformed output of the vertex shader.

The center of the diagram also shows that e.g. descriptor sets are used by compute as well, so everything we learned about descriptor layouts, descriptor sets and descriptors also applies here.

An example

An easy to understand example that we will implement in this chapter is a GPU based particle system. Such systems are used in many games and often consist of thousands of particles that need to be updated at interactive frame rates. Rendering such a system requires 2 main components: vertices, passed as vertex buffers, and a way to update them based on some equation.

The "classical" CPU based particle system would store particle data in the system's main memory and then use the CPU to update them. After the update, the vertices need to be transferred to the GPU's memory again so it can display the updated particles in the next frame. The most straight-forward way would be recreating the vertex buffer with the new data for each frame. This is obviously very costly. Depending on your implementation, there are other options like mapping GPU memory so it can be written by the CPU (called "resizable BAR" on desktop systems, or unified memory on integrated GPUs) or just using a host local buffer (which would be the slowest method due to PCI-E bandwidth). But no matter what buffer update method you choose, you always require a "round-trip" to the CPU to update the particles.

With a GPU based particle system, this round-trip is no longer required. Vertices are only uploaded to the GPU at the start and all updates are done in the GPU's memory using compute shaders. One of the main reasons why this is faster is the much higher bandwidth between the GPU and its local memory. In a CPU based scenario, you'd be limited by main memory and PCI-express bandwidth, which is often just a fraction of the GPU's memory bandwidth.

When doing this on a GPU with a dedicated compute queue, you can update particles in parallel to the rendering part of the graphics pipeline. This is called "async compute", and is an advanced topic not covered in this tutorial.

Here is a screenshot from this chapter's code. The particles shown here are updated by a compute shader directly on the GPU, without any CPU interaction:

Data manipulation

In this tutorial we already learned about different buffer types like vertex and index buffers for passing primitives and uniform buffers for passing data to a shader. And we also used images to do texture mapping. But up until now, we always wrote data using the CPU and only did reads on the GPU.

An important concept introduced with compute shaders is the ability to arbitrarily read from and write to buffers. For this, Vulkan offers two dedicated storage types.

Shader storage buffer objects (SSBO)

A shader storage buffer (SSBO) allows shaders to read from and write to a buffer. Using these is similar to using uniform buffer objects. The biggest differences are that you can alias other buffer types to SSBOs and that they can be arbitrarily large.

Going back to the GPU based particle system, you might now wonder how to deal with vertices being updated (written) by the compute shader and read (drawn) by the vertex shader, as both usages would seemingly require different buffer types.

But that's not the case. In Vulkan you can specify multiple usages for buffers and images. So for the particle vertex buffer to be used as a vertex buffer (in the graphics pass) and as a storage buffer (in the compute pass), you simply create the buffer with those two usage flags:

VkBufferCreateInfo bufferInfo{};
...
bufferInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT;
...

if (vkCreateBuffer(device, &bufferInfo, nullptr, &shaderStorageBuffers[i]) != VK_SUCCESS) {
    throw std::runtime_error("failed to create vertex buffer!");
}

The two flags VK_BUFFER_USAGE_VERTEX_BUFFER_BIT and VK_BUFFER_USAGE_STORAGE_BUFFER_BIT set with bufferInfo.usage tell the implementation that we want to use this buffer for two different scenarios: as a vertex buffer in the vertex shader and as a storage buffer. Note that we also added the VK_BUFFER_USAGE_TRANSFER_DST_BIT flag so we can transfer data from the host to the GPU. This is crucial: since we want the shader storage buffer to stay in GPU memory only (VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT), we need to transfer data from the host to this buffer.

Here is the same code using the createBuffer helper function:

createBuffer(bufferSize, VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, shaderStorageBuffers[i], shaderStorageBuffersMemory[i]);

The GLSL shader declaration for accessing such a buffer looks like this:

struct Particle {
  vec2 position;
  vec2 velocity;
  vec4 color;
};

layout(std140, binding = 1) readonly buffer ParticleSSBOIn {
   Particle particlesIn[ ];
};

layout(std140, binding = 2) buffer ParticleSSBOOut {
   Particle particlesOut[ ];
};

In this example we have a typed SSBO with each particle having a position, a velocity and a color (see the Particle struct). The SSBO then contains an unbounded number of particles, as marked by the []. Not having to specify the number of elements in an SSBO is one of the advantages over e.g. uniform buffers. std140 is a memory layout qualifier that determines how the member elements of the shader storage buffer are aligned in memory. This gives us certain guarantees, required to map the buffers between the host and the GPU.
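
To make the layout guarantee concrete, here is a minimal host-side sketch that checks a C++ struct against the offsets std140 assigns to the Particle block. It is an illustration under assumptions rather than the tutorial's actual definition: the names HostParticle and particleByteOffset are hypothetical, and plain float arrays stand in for glm::vec2 and glm::vec4.

```cpp
#include <cstddef>

// Hypothetical host-side mirror of the GLSL Particle struct.
// Plain float arrays stand in for glm::vec2 / glm::vec4.
struct HostParticle {
    float position[2];           // vec2, std140 offset 0
    float velocity[2];           // vec2, std140 offset 8
    alignas(16) float color[4];  // vec4, std140 offset 16 (16-byte aligned)
};

// std140 rounds the struct size, and with it the array stride,
// up to a multiple of 16 bytes. Here that gives 32 bytes per particle.
static_assert(offsetof(HostParticle, position) == 0, "position offset mismatch");
static_assert(offsetof(HostParticle, velocity) == 8, "velocity offset mismatch");
static_assert(offsetof(HostParticle, color) == 16, "color offset mismatch");
static_assert(sizeof(HostParticle) == 32, "array stride mismatch");

// Byte offset of particle `index` inside the storage buffer.
constexpr std::size_t particleByteOffset(std::size_t index) {
    return index * sizeof(HostParticle);
}
```

If one of the static_asserts fails after a member is added or reordered, the host struct and the shader block no longer agree, and data copied into the buffer would be misread on the GPU.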

Writing to such a storage buffer object in the compute shader is straight-forward and similar to how you'd write to the buffer on the C++ side:

particlesOut[index].position = particlesIn[index].position + particlesIn[index].velocity.xy * ubo.deltaTime;
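
For comparison, the same update can be sketched as a CPU reference in C++. This is only an illustration: CpuParticle and advanceParticles are hypothetical names, std::array stands in for vec2, and the loop index plays the role that the invocation index plays in the shader.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side particle; std::array stands in for vec2.
struct CpuParticle {
    std::array<float, 2> position;
    std::array<float, 2> velocity;
};

// CPU reference for the shader line above: read from the input buffer
// and write the advanced position to the output buffer.
void advanceParticles(const std::vector<CpuParticle>& particlesIn,
                      std::vector<CpuParticle>& particlesOut,
                      float deltaTime) {
    particlesOut.resize(particlesIn.size());
    for (std::size_t index = 0; index < particlesIn.size(); ++index) {
        const CpuParticle& in = particlesIn[index];
        particlesOut[index].position = {in.position[0] + in.velocity[0] * deltaTime,
                                        in.position[1] + in.velocity[1] * deltaTime};
        particlesOut[index].velocity = in.velocity;
    }
}
```

On the GPU the loop disappears: every invocation handles exactly one value of index, and all of them may run in parallel.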

Storage images

Note that we won't be doing image manipulation in this chapter. This paragraph is here to make readers aware that compute shaders can also be used for image manipulation.

A storage image allows you to read from and write to an image. Typical use cases are applying image effects to textures, doing post processing (which in turn is very similar) or generating mip-maps.

As with buffers, you can specify multiple usages when creating an image:

VkImageCreateInfo imageInfo {};
...
imageInfo.usage = VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_STORAGE_BIT;
...

if (vkCreateImage(device, &imageInfo, nullptr, &textureImage) != VK_SUCCESS) {
    throw std::runtime_error("failed to create image!");
}

The two flags VK_IMAGE_USAGE_SAMPLED_BIT and VK_IMAGE_USAGE_STORAGE_BIT set with imageInfo.usage tell the implementation that we want to use this image for two different scenarios: as an image sampled in the fragment shader and as a storage image in the compute shader.

The GLSL shader declaration for storage image looks similar to sampled images used e.g. in the fragment shader:

layout (binding = 0, rgba8) uniform readonly image2D inputImage;
layout (binding = 1, rgba8) uniform writeonly image2D outputImage;

A few differences here are additional attributes like rgba8 for the format of the image, the readonly and writeonly qualifiers, telling the implementation that we will only read from the input image and write to the output image. And last but not least we need to use the image2D type to declare a storage image.

Reading from and writing to storage images in the compute shader is then done using imageLoad and imageStore:

// imageLoad returns a vec4 and imageStore expects a vec4 for an rgba8 image2D
vec4 pixel = imageLoad(inputImage, ivec2(gl_GlobalInvocationID.xy));
imageStore(outputImage, ivec2(gl_GlobalInvocationID.xy), pixel);

Compute queue families

In the physical device and queue families chapter we already learned about queue families and how to select a graphics queue family. Compute uses the queue family properties flag bit VK_QUEUE_COMPUTE_BIT. So if we want to do compute work, we need to get a queue from a queue family that supports compute.

Note that Vulkan requires an implementation which supports graphics operations to have at least one queue family that supports both graphics and compute operations, but it's also possible that implementations offer a dedicated compute queue. This dedicated compute queue (that does not have the graphics bit) hints at an asynchronous compute queue. To keep this tutorial beginner friendly though, we'll use a queue that can do both graphics and compute operations. This will also save us from dealing with several advanced synchronization mechanisms.

For our compute sample we need to change the device creation code a bit:

uint32_t queueFamilyCount = 0;
vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, nullptr);

std::vector<VkQueueFamilyProperties> queueFamilies(queueFamilyCount);
vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, queueFamilies.data());

int i = 0;
for (const auto& queueFamily : queueFamilies) {
    if ((queueFamily.queueFlags & VK_QUEUE_GRAPHICS_BIT) && (queueFamily.queueFlags & VK_QUEUE_COMPUTE_BIT)) {
        indices.graphicsAndComputeFamily = i;
    }

    i++;
}

The changed queue family index selection code will now try to find a queue family that supports both graphics and compute.
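
The selection logic can be tried out without a device. The sketch below works on a plain list of queue flag masks instead of VkQueueFamilyProperties. The constants and the function name findGraphicsAndComputeFamily are stand-ins introduced for illustration; the values mirror VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and VK_QUEUE_TRANSFER_BIT. Unlike the loop above, it returns the first matching family rather than the last.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in constants mirroring the Vulkan queue flag bits.
constexpr uint32_t kGraphicsBit = 0x1;  // VK_QUEUE_GRAPHICS_BIT
constexpr uint32_t kComputeBit  = 0x2;  // VK_QUEUE_COMPUTE_BIT
constexpr uint32_t kTransferBit = 0x4;  // VK_QUEUE_TRANSFER_BIT

// Returns the index of the first queue family whose flags contain
// both the graphics and the compute bit, or nothing if none does.
std::optional<uint32_t> findGraphicsAndComputeFamily(
        const std::vector<uint32_t>& queueFlagsPerFamily) {
    constexpr uint32_t required = kGraphicsBit | kComputeBit;
    for (uint32_t i = 0; i < queueFlagsPerFamily.size(); ++i) {
        if ((queueFlagsPerFamily[i] & required) == required) {
            return i;
        }
    }
    return std::nullopt;
}
```

A family that only has the compute bit, such as a dedicated asynchronous compute family, is skipped by this check.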

We can then get a compute queue from this queue family in createLogicalDevice:

vkGetDeviceQueue(device, indices.graphicsAndComputeFamily.value(), 0, &computeQueue);

The compute shader stage

In the graphics samples we have used different shader stages to load shaders and access descriptors. Compute shaders are accessed in a similar way by using the VK_SHADER_STAGE_COMPUTE_BIT shader stage. So loading a compute shader is just the same as loading a vertex shader, but with a different shader stage. We'll talk about this in detail in the next paragraphs. Compute also introduces a new binding point type for descriptors and pipelines named VK_PIPELINE_BIND_POINT_COMPUTE that we'll have to use later on.

Loading compute shaders

Loading compute shaders in our application is the same as loading any other shader. The only real difference is that we'll need to use the VK_SHADER_STAGE_COMPUTE_BIT mentioned above.

auto computeShaderCode = readFile("shaders/compute.spv");

VkShaderModule computeShaderModule = createShaderModule(computeShaderCode);

VkPipelineShaderStageCreateInfo computeShaderStageInfo{};
computeShaderStageInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
computeShaderStageInfo.stage = VK_SHADER_STAGE_COMPUTE_BIT;
computeShaderStageInfo.module = computeShaderModule;
computeShaderStageInfo.pName = "main";
...

Preparing the shader storage buffers

Earlier on we learned that we can use shader storage buffers to pass arbitrary data to compute shaders. For this example we will upload an array of particles to the GPU, so we can manipulate it directly in the GPU's memory.

In the frames in flight chapter we talked about duplicating resources per frame in flight, so we can keep the CPU and the GPU busy. First we declare a vector for the buffer object and the device memory backing it up:

std::vector<VkBuffer> shaderStorageBuffers;
std::vector<VkDeviceMemory> shaderStorageBuffersMemory;

In createShaderStorageBuffers we then resize those vectors to match the maximum number of frames in flight:

shaderStorageBuffers.resize(MAX_FRAMES_IN_FLIGHT);
shaderStorageBuffersMemory.resize(MAX_FRAMES_IN_FLIGHT);

With this setup in place we can start to move the initial particle information to the GPU. We first initialize a vector of particles on the host side:

    // Initialize particles
    std::default_random_engine rndEngine((unsigned)time(nullptr));
    std::uniform_real_distribution<float> rndDist(0.0f, 1.0f);

    // Initial particle positions on a circle
    std::vector<Particle> particles(PARTICLE_COUNT);
    for (auto& particle : particles) {
        float r = 0.25f * sqrt(rndDist(rndEngine));
        float theta = rndDist(rndEngine) * 2 * 3.14159265358979323846;
        float x = r * cos(theta) * HEIGHT / WIDTH;
        float y = r * sin(theta);
        particle.position = glm::vec2(x, y);
        particle.velocity = glm::normalize(glm::vec2(x,y)) * 0.00025f;
        particle.color = glm::vec4(rndDist(rndEngine), rndDist(rndEngine), rndDist(rndEngine), 1.0f);
    }

We then create a staging buffer in the host's memory to hold the initial particle properties:

    VkDeviceSize bufferSize = sizeof(Particle) * PARTICLE_COUNT;

    VkBuffer stagingBuffer;
    VkDeviceMemory stagingBufferMemory;
    createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);

    void* data;
    vkMapMemory(device, stagingBufferMemory, 0, bufferSize, 0, &data);
    memcpy(data, particles.data(), (size_t)bufferSize);
    vkUnmapMemory(device, stagingBufferMemory);

Using this staging buffer as a source we then create the per-frame shader storage buffers and copy the particle properties from the staging buffer to each of these:

    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        createBuffer(bufferSize, VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, shaderStorageBuffers[i], shaderStorageBuffersMemory[i]);
        // Copy data from the staging buffer (host) to the shader storage buffer (GPU)
        copyBuffer(stagingBuffer, shaderStorageBuffers[i], bufferSize);
    }
}

Descriptors

Setting up descriptors for compute is almost identical to graphics. The only difference is that descriptors need to have the VK_SHADER_STAGE_COMPUTE_BIT set to make them accessible by the compute stage:

std::array<VkDescriptorSetLayoutBinding, 3> layoutBindings{};
layoutBindings[0].binding = 0;
layoutBindings[0].descriptorCount = 1;
layoutBindings[0].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
layoutBindings[0].pImmutableSamplers = nullptr;
layoutBindings[0].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
...

Note that you can combine shader stages here, so if you want the descriptor to be accessible from the vertex and compute stage, e.g. for a uniform buffer with parameters shared across them, you simply set the bits for both stages:

layoutBindings[0].stageFlags = VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_COMPUTE_BIT;

Here is the descriptor setup for our sample. The layout looks like this:

std::array<VkDescriptorSetLayoutBinding, 3> layoutBindings{};
layoutBindings[0].binding = 0;
layoutBindings[0].descriptorCount = 1;
layoutBindings[0].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
layoutBindings[0].pImmutableSamplers = nullptr;
layoutBindings[0].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;

layoutBindings[1].binding = 1;
layoutBindings[1].descriptorCount = 1;
layoutBindings[1].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
layoutBindings[1].pImmutableSamplers = nullptr;
layoutBindings[1].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;

layoutBindings[2].binding = 2;
layoutBindings[2].descriptorCount = 1;
layoutBindings[2].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
layoutBindings[2].pImmutableSamplers = nullptr;
layoutBindings[2].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;

VkDescriptorSetLayoutCreateInfo layoutInfo{};
layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.bindingCount = 3;
layoutInfo.pBindings = layoutBindings.data();

if (vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &computeDescriptorSetLayout) != VK_SUCCESS) {
    throw std::runtime_error("failed to create compute descriptor set layout!");
}

Looking at this setup, you might wonder why we have two layout bindings for shader storage buffer objects, even though we'll only render a single particle system. This is because the particle positions are updated frame by frame based on a delta time. This means that each frame needs to know about the last frame's particle positions, so it can update them with a new delta time and write them to its own SSBO.

For that, the compute shader needs to have access to the last and current frame's SSBOs. This is done by passing both to the compute shader in our descriptor setup. See the storageBufferInfoLastFrame and storageBufferInfoCurrentFrame:

for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
    VkDescriptorBufferInfo uniformBufferInfo{};
    uniformBufferInfo.buffer = uniformBuffers[i];
    uniformBufferInfo.offset = 0;
    uniformBufferInfo.range = sizeof(UniformBufferObject);

    std::array<VkWriteDescriptorSet, 3> descriptorWrites{};
    ...

    VkDescriptorBufferInfo storageBufferInfoLastFrame{};
    storageBufferInfoLastFrame.buffer = shaderStorageBuffers[(i + MAX_FRAMES_IN_FLIGHT - 1) % MAX_FRAMES_IN_FLIGHT];
    storageBufferInfoLastFrame.offset = 0;
    storageBufferInfoLastFrame.range = sizeof(Particle) * PARTICLE_COUNT;

    descriptorWrites[1].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    descriptorWrites[1].dstSet = computeDescriptorSets[i];
    descriptorWrites[1].dstBinding = 1;
    descriptorWrites[1].dstArrayElement = 0;
    descriptorWrites[1].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
    descriptorWrites[1].descriptorCount = 1;
    descriptorWrites[1].pBufferInfo = &storageBufferInfoLastFrame;

    VkDescriptorBufferInfo storageBufferInfoCurrentFrame{};
    storageBufferInfoCurrentFrame.buffer = shaderStorageBuffers[i];
    storageBufferInfoCurrentFrame.offset = 0;
    storageBufferInfoCurrentFrame.range = sizeof(Particle) * PARTICLE_COUNT;

    descriptorWrites[2].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    descriptorWrites[2].dstSet = computeDescriptorSets[i];
    descriptorWrites[2].dstBinding = 2;
    descriptorWrites[2].dstArrayElement = 0;
    descriptorWrites[2].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
    descriptorWrites[2].descriptorCount = 1;
    descriptorWrites[2].pBufferInfo = &storageBufferInfoCurrentFrame;

    vkUpdateDescriptorSets(device, 3, descriptorWrites.data(), 0, nullptr);
}
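
Picking the last frame's buffer is a small piece of modular arithmetic that is easy to get subtly wrong with unsigned indices. Written with unsigned types, a naive (frame - 1) % frameCount wraps around to a huge value for frame 0 and only happens to give the right answer when frameCount is a power of two. The sketch below shows a form that works for any positive count; previousFrameIndex is a hypothetical helper name, not part of the tutorial's code.

```cpp
#include <cstddef>

// Index of the frame that precedes `frame` in a ring of `frameCount`
// frames. Adding frameCount before subtracting avoids unsigned
// wrap-around for frame 0. Assumes frameCount > 0 and frame < frameCount.
constexpr std::size_t previousFrameIndex(std::size_t frame, std::size_t frameCount) {
    return (frame + frameCount - 1) % frameCount;
}
```

With two frames in flight this gives 1 for frame 0 and 0 for frame 1, so each descriptor set reads from the other frame's buffer and writes to its own.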

Remember that we also have to request the descriptor types for the SSBOs from our descriptor pool:

std::array<VkDescriptorPoolSize, 2> poolSizes{};
...

poolSizes[1].type = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
poolSizes[1].descriptorCount = static_cast<uint32_t>(MAX_FRAMES_IN_FLIGHT) * 2;

We need to double the number of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER types requested from the pool because our sets reference the SSBOs of the last and current frame.

Compute pipelines

As compute is not a part of the graphics pipeline, we can't use vkCreateGraphicsPipelines. Instead we need to create a dedicated compute pipeline with vkCreateComputePipelines for running our compute commands. Since a compute pipeline does not touch any of the rasterization state, it has a lot less state than a graphics pipeline:

VkComputePipelineCreateInfo pipelineInfo{};
pipelineInfo.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO;
pipelineInfo.layout = computePipelineLayout;
pipelineInfo.stage = computeShaderStageInfo;

if (vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, &pipelineInfo, nullptr, &computePipeline) != VK_SUCCESS) {
    throw std::runtime_error("failed to create compute pipeline!");
}

The setup is a lot simpler, as we only require one shader stage and a pipeline layout. The pipeline layout works the same as with the graphics pipeline:

VkPipelineLayoutCreateInfo pipelineLayoutInfo{};
pipelineLayoutInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
pipelineLayoutInfo.setLayoutCount = 1;
pipelineLayoutInfo.pSetLayouts = &computeDescriptorSetLayout;

if (vkCreatePipelineLayout(device, &pipelineLayoutInfo, nullptr, &computePipelineLayout) != VK_SUCCESS) {
    throw std::runtime_error("failed to create compute pipeline layout!");
}

Compute space

Before we get into how a compute shader works and how we submit compute workloads to the GPU, we need to talk about two important compute concepts: work groups and invocations. They define an abstract execution model for how compute workloads are processed by the compute hardware of the GPU in three dimensions (x, y, and z).

Work groups define how the compute workloads are formed and processed by the compute hardware of the GPU. You can think of them as work items the GPU has to work through. Work group dimensions are set by the application at command buffer recording time using a dispatch command.

Each work group is then a collection of invocations that execute the same compute shader. Invocations can potentially run in parallel and their dimensions are set in the compute shader. Invocations within a single work group have access to shared memory.

This image shows the relation between these two in three dimensions:

The number of dimensions for work groups (defined by vkCmdDispatch) and invocations (defined by the local sizes in the compute shader) depends on how the input data is structured. If you e.g. work on a one-dimensional array, like we do in this chapter, you only have to specify the x dimension for both.

As an example: If we dispatch a work group count of [64, 1, 1] with a compute shader local size of [32, 32, 1], our compute shader will be invoked 64 x 32 x 32 = 65,536 times.
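
The arithmetic generalizes to all three dimensions: the total is the product of the three work group counts and the three local sizes. A minimal sketch, with totalInvocations as a hypothetical helper name:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Total number of compute shader invocations for one dispatch:
// the product of all work group counts and all local sizes.
// A 64-bit accumulator avoids overflow for large dispatches.
uint64_t totalInvocations(const std::array<uint32_t, 3>& groupCount,
                          const std::array<uint32_t, 3>& localSize) {
    uint64_t total = 1;
    for (std::size_t d = 0; d < 3; ++d) {
        total *= static_cast<uint64_t>(groupCount[d]) * localSize[d];
    }
    return total;
}
```

Calling totalInvocations({64, 1, 1}, {32, 32, 1}) reproduces the 65,536 from the example.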

Note that the maximum count for work groups and local sizes differs from implementation to implementation, so you should always check the compute related maxComputeWorkGroupCount, maxComputeWorkGroupInvocations and maxComputeWorkGroupSize limits in VkPhysicalDeviceLimits.
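
Such a check could look like the sketch below. To keep it self-contained it does not include the Vulkan headers: ComputeLimits is a hypothetical struct that copies only the three relevant fields of VkPhysicalDeviceLimits, and exampleLimits returns illustrative numbers, not values queried from a device. In a real application the values would come from vkGetPhysicalDeviceProperties.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical subset of VkPhysicalDeviceLimits, using the same field names.
struct ComputeLimits {
    std::array<uint32_t, 3> maxComputeWorkGroupCount;
    uint32_t maxComputeWorkGroupInvocations;
    std::array<uint32_t, 3> maxComputeWorkGroupSize;
};

// Illustrative, conservative values; query your own device for real ones.
ComputeLimits exampleLimits() {
    return ComputeLimits{{65535, 65535, 65535}, 128, {128, 128, 64}};
}

// True if a dispatch of `groupCount` work groups with a shader local
// size of `localSize` stays within all three limits.
bool fitsLimits(const ComputeLimits& limits,
                const std::array<uint32_t, 3>& groupCount,
                const std::array<uint32_t, 3>& localSize) {
    uint64_t invocationsPerGroup = 1;
    for (std::size_t d = 0; d < 3; ++d) {
        if (groupCount[d] > limits.maxComputeWorkGroupCount[d]) return false;
        if (localSize[d] > limits.maxComputeWorkGroupSize[d]) return false;
        invocationsPerGroup *= localSize[d];
    }
    return invocationsPerGroup <= limits.maxComputeWorkGroupInvocations;
}
```

Note that the per-dimension size limit and the total invocation limit are independent: a local size of [32, 32, 1] passes the per-dimension check against these example values but fails the total, because 32 x 32 = 1024 exceeds 128.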

Compute shaders

Now that we have learned about all the parts required to set up a compute shader pipeline, it's time to take a look at compute shaders. All of the things we learned about using GLSL shaders e.g. for vertex and fragment shaders also apply to compute shaders. The syntax is the same, and many concepts like passing data between the application and the shader are the same. But there are some important differences.

A very basic compute shader for updating a linear array of particles may look like this:

#version 450

layout (binding = 0) uniform ParameterUBO {
    float deltaTime;
} ubo;

struct Particle {
    vec2 position;
    vec2 velocity;
    vec4 color;
};

layout(std140, binding = 1) readonly buffer ParticleSSBOIn {
   Particle particlesIn[ ];
};

layout(std140, binding = 2) buffer ParticleSSBOOut {
   Particle particlesOut[ ];
};

layout (local_size_x = 256, local_size_y = 1, local_size_z = 1) in;

void main() 
{
    uint index = gl_GlobalInvocationID.x;  

    Particle particleIn = particlesIn[index];

    particlesOut[index].position = particleIn.position + particleIn.velocity.xy * ubo.deltaTime;
    particlesOut[index].velocity = particleIn.velocity;
    ...
}

The top part of the shader contains the declarations for the shader's input. First is a uniform buffer object at binding 0, something we already learned about in this tutorial. Below we declare our Particle structure that matches the declaration in the C++ code. Binding 1 then refers to the shader storage buffer object with the particle data from the last frame (see the descriptor setup), and binding 2 points to the SSBO for the current frame, which is the one we'll be updating with this shader.

An interesting thing is this compute-only declaration related to the compute space:

layout (local_size_x = 256, local_size_y = 1, local_size_z = 1) in;

This defines the number of invocations of this compute shader in the current work group. As noted earlier, this is the local part of the compute space. Hence the local_ prefix. As we work on a linear 1D array of particles we only need to specify a number for the x dimension in local_size_x.

The main function then reads from the last frame's SSBO and writes the updated particle position to the SSBO for the current frame. Similar to other shader types, compute shaders have their own set of builtin input variables. Built-ins are always prefixed with gl_. One such built-in is gl_GlobalInvocationID, a variable that uniquely identifies the current compute shader invocation across the current dispatch. We use this to index into our particle array.
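
In GLSL, gl_GlobalInvocationID is derived from three other built-ins: gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID. The sketch below reproduces that formula for the x dimension on the CPU; globalInvocationIdX is a hypothetical helper name used only for illustration.

```cpp
#include <cstdint>

// CPU-side version of the x component of gl_GlobalInvocationID:
// work group index times local size, plus the index inside the group.
constexpr uint32_t globalInvocationIdX(uint32_t workGroupIdX,
                                       uint32_t localSizeX,
                                       uint32_t localInvocationIdX) {
    return workGroupIdX * localSizeX + localInvocationIdX;
}
```

With a local size of 256, invocation 5 of work group 3 therefore handles particle 3 * 256 + 5 = 773.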

Running compute commands

Dispatch

Now it's time to actually tell the GPU to do some compute. This is done by calling vkCmdDispatch inside a command buffer. While not perfectly true, a dispatch is to compute what a draw call like vkCmdDraw is to graphics. It dispatches a given number of compute work groups in up to three dimensions.

VkCommandBufferBeginInfo beginInfo{};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;

if (vkBeginCommandBuffer(commandBuffer, &beginInfo) != VK_SUCCESS) {
    throw std::runtime_error("failed to begin recording command buffer!");
}

...

vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, computePipeline);
vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, computePipelineLayout, 0, 1, &computeDescriptorSets[i], 0, nullptr);

vkCmdDispatch(commandBuffer, PARTICLE_COUNT / 256, 1, 1);

...

if (vkEndCommandBuffer(commandBuffer) != VK_SUCCESS) {
    throw std::runtime_error("failed to record command buffer!");
}

The vkCmdDispatch call will dispatch PARTICLE_COUNT / 256 work groups in the x dimension. As our particles array is linear, we leave the other two dimensions at one, resulting in a one-dimensional dispatch. But why do we divide the number of particles (in our array) by 256? That's because in the previous paragraph we defined that every work group will run 256 invocations of the compute shader. So if we were to have 4096 particles, we would dispatch 16 work groups, with each work group running 256 compute shader invocations. Getting the two numbers right usually takes some tinkering and profiling, depending on your workload and the hardware you're running on. If your particle count is dynamic and can't always be divided evenly by e.g. 256, round the work group count up, check gl_GlobalInvocationID at the start of your compute shader and return early if the global invocation index is greater than or equal to the number of particles.
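
The case of a particle count that is not a multiple of the local size can be sketched on the CPU. The helper names groupCountFor and countActiveInvocations are hypothetical; the first rounds the work group count up, the second emulates a one-dimensional dispatch in which surplus invocations return early.

```cpp
#include <cstdint>

// Number of work groups needed to cover itemCount items,
// rounded up so that a partial last group is still dispatched.
constexpr uint32_t groupCountFor(uint32_t itemCount, uint32_t localSize) {
    return (itemCount + localSize - 1) / localSize;
}

// Emulates a 1D dispatch and counts the invocations that pass the
// bounds check a shader would perform on gl_GlobalInvocationID.x.
uint32_t countActiveInvocations(uint32_t particleCount, uint32_t localSize) {
    uint32_t active = 0;
    const uint32_t groups = groupCountFor(particleCount, localSize);
    for (uint32_t group = 0; group < groups; ++group) {
        for (uint32_t local = 0; local < localSize; ++local) {
            const uint32_t index = group * localSize + local;
            if (index >= particleCount) {
                continue;  // corresponds to the early return in the shader
            }
            ++active;
        }
    }
    return active;
}
```

For 1000 particles and a local size of 256 this dispatches 4 work groups, which is 1024 invocations, of which exactly 1000 do any work. A plain integer division would have dispatched only 3 groups and left the last 232 particles untouched.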

And just as was the case for the compute pipeline, a compute command buffer contains a lot less state than a graphics command buffer. There's no need to start a render pass or set a viewport.

Submitting work

As our sample does both compute and graphics operations, we'll be doing two submits per frame, one to the compute queue and one to the graphics queue (see the drawFrame function):

...
if (vkQueueSubmit(computeQueue, 1, &submitInfo, VK_NULL_HANDLE) != VK_SUCCESS) {
    throw std::runtime_error("failed to submit compute command buffer!");
}
...
if (vkQueueSubmit(graphicsQueue, 1, &submitInfo, inFlightFences[currentFrame]) != VK_SUCCESS) {
    throw std::runtime_error("failed to submit draw command buffer!");
}

The first submit to the compute queue updates the particle positions using the compute shader, and the second submit will then use that updated data to draw the particle system.

Synchronizing graphics and compute

Synchronization is an important part of Vulkan, even more so when doing compute in conjunction with graphics. Wrong or lacking synchronization may result in the vertex stage starting to draw (=read) particles while the compute shader hasn't finished updating (=write) them (read-after-write hazard), or the compute shader could start updating particles that are still in use by the vertex part of the pipeline (write-after-read hazard).

So we must make sure that those cases don't happen by properly synchronizing the graphics and the compute load. There are different ways of doing so, depending on how you submit your compute workload, but in our case with two separate submits, we'll be using semaphores and fences to ensure that the vertex shader won't start fetching vertices until the compute shader has finished updating them.

This is necessary as even though the two submits are ordered one-after-another, there is no guarantee that they execute on the GPU in this order. Adding in wait and signal semaphores ensures this execution order.

So we first add a new set of synchronization primitives for the compute work in createSyncObjects. The compute fences, just like the graphics fences, are created in the signaled state because otherwise, the first draw would time out while waiting for the fences to be signaled as detailed here:

std::vector<VkFence> computeInFlightFences;
std::vector<VkSemaphore> computeFinishedSemaphores;
...
computeInFlightFences.resize(MAX_FRAMES_IN_FLIGHT);
computeFinishedSemaphores.resize(MAX_FRAMES_IN_FLIGHT);

VkSemaphoreCreateInfo semaphoreInfo{};
semaphoreInfo.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO;

VkFenceCreateInfo fenceInfo{};
fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
fenceInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT;

for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
    ...
    if (vkCreateSemaphore(device, &semaphoreInfo, nullptr, &computeFinishedSemaphores[i]) != VK_SUCCESS ||
        vkCreateFence(device, &fenceInfo, nullptr, &computeInFlightFences[i]) != VK_SUCCESS) {
        throw std::runtime_error("failed to create compute synchronization objects for a frame!");
    }
}

We then use these to synchronize the compute buffer submission with the graphics submission:

// Compute submission
vkWaitForFences(device, 1, &computeInFlightFences[currentFrame], VK_TRUE, UINT64_MAX);

updateUniformBuffer(currentFrame);

vkResetFences(device, 1, &computeInFlightFences[currentFrame]);

vkResetCommandBuffer(computeCommandBuffers[currentFrame], /*VkCommandBufferResetFlagBits*/ 0);
recordComputeCommandBuffer(computeCommandBuffers[currentFrame]);

submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &computeCommandBuffers[currentFrame];
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = &computeFinishedSemaphores[currentFrame];

if (vkQueueSubmit(computeQueue, 1, &submitInfo, computeInFlightFences[currentFrame]) != VK_SUCCESS) {
    throw std::runtime_error("failed to submit compute command buffer!");
}

// Graphics submission
vkWaitForFences(device, 1, &inFlightFences[currentFrame], VK_TRUE, UINT64_MAX);

...

vkResetFences(device, 1, &inFlightFences[currentFrame]);

vkResetCommandBuffer(commandBuffers[currentFrame], /*VkCommandBufferResetFlagBits*/ 0);
recordCommandBuffer(commandBuffers[currentFrame], imageIndex);

VkSemaphore waitSemaphores[] = { computeFinishedSemaphores[currentFrame], imageAvailableSemaphores[currentFrame] };
VkPipelineStageFlags waitStages[] = { VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT };
submitInfo = {};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;

submitInfo.waitSemaphoreCount = 2;
submitInfo.pWaitSemaphores = waitSemaphores;
submitInfo.pWaitDstStageMask = waitStages;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffers[currentFrame];
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = &renderFinishedSemaphores[currentFrame];

if (vkQueueSubmit(graphicsQueue, 1, &submitInfo, inFlightFences[currentFrame]) != VK_SUCCESS) {
    throw std::runtime_error("failed to submit draw command buffer!");
}

Similar to the sample in the semaphores chapter, this setup will immediately run the compute shader as we haven't specified any wait semaphores. This is fine, as we are waiting for the compute command buffer of the current frame to finish execution before the compute submission with the vkWaitForFences command.

The graphics submission on the other hand needs to wait for the compute work to finish so it doesn't start fetching vertices while the compute buffer is still updating them. So we wait on the computeFinishedSemaphores for the current frame and have the graphics submission wait on the VK_PIPELINE_STAGE_VERTEX_INPUT_BIT stage, where vertices are consumed.

But it also needs to wait for presentation so the fragment shader won't output to the color attachments until the image has been presented. So we also wait on the imageAvailableSemaphores on the current frame at the VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT stage.

Drawing the particle system

Earlier on, we learned that buffers in Vulkan can have multiple use-cases and so we created the shader storage buffer that contains our particles with both the shader storage buffer bit and the vertex buffer bit. This means that we can use the shader storage buffer for drawing just as we used "pure" vertex buffers in the previous chapters.

We first set up the vertex input state to match our particle structure:

struct Particle {
    ...

    static std::array<VkVertexInputAttributeDescription, 2> getAttributeDescriptions() {
        std::array<VkVertexInputAttributeDescription, 2> attributeDescriptions{};

        attributeDescriptions[0].binding = 0;
        attributeDescriptions[0].location = 0;
        attributeDescriptions[0].format = VK_FORMAT_R32G32_SFLOAT;
        attributeDescriptions[0].offset = offsetof(Particle, position);

        attributeDescriptions[1].binding = 0;
        attributeDescriptions[1].location = 1;
        attributeDescriptions[1].format = VK_FORMAT_R32G32B32A32_SFLOAT;
        attributeDescriptions[1].offset = offsetof(Particle, color);

        return attributeDescriptions;
    }
};

Note that we don't add velocity to the vertex input attributes, as this is only used by the compute shader.

We then bind and draw it like we would with any vertex buffer:

vkCmdBindVertexBuffers(commandBuffer, 0, 1, &shaderStorageBuffers[currentFrame], offsets);

vkCmdDraw(commandBuffer, PARTICLE_COUNT, 1, 0, 0);

Conclusion

In this chapter, we learned how to use compute shaders to offload work from the CPU to the GPU. Without compute shaders, many effects in modern games and applications would either not be possible or would run a lot slower. But even more than graphics, compute has a lot of use-cases, and this chapter only gives you a glimpse of what's possible. So now that you know how to use compute shaders, you may want to take a look at some advanced compute topics like:

Shared memory
Asynchronous compute
Atomic operations
Subgroups

You can find some advanced compute samples in the official Khronos Vulkan Samples repository.

C++ code / Vertex shader / Fragment shader / Compute shader

FAQ

This page lists solutions to common problems that you may encounter while developing Vulkan applications.

I get an access violation error in the core validation layer

Make sure that MSI Afterburner / RivaTuner Statistics Server is not running, because it has some compatibility problems with Vulkan.

I don't see any messages from the validation layers / Validation layers are not available

First make sure that the validation layers get a chance to print errors by keeping the terminal open after your program exits. You can do this from Visual Studio by running your program with Ctrl-F5 instead of F5, and on Linux by executing your program from a terminal window. If there are still no messages and you are sure that validation layers are turned on, then you should ensure that your Vulkan SDK is correctly installed by following the "Verify the Installation" instructions on this page. Also ensure that your SDK version is at least 1.1.106.0 to support the VK_LAYER_KHRONOS_validation layer.

vkCreateSwapchainKHR triggers an error in SteamOverlayVulkanLayer64.dll

This appears to be a compatibility problem in the Steam client beta. There are a few possible workarounds:

Opt out of the Steam beta program.
Set the DISABLE_VK_LAYER_VALVE_steam_overlay_1 environment variable to 1.
Delete the Steam overlay Vulkan layer entry in the registry under HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\ImplicitLayers.

vkCreateInstance fails with VK_ERROR_INCOMPATIBLE_DRIVER

If you are using macOS with the latest MoltenVK SDK, then vkCreateInstance may return the VK_ERROR_INCOMPATIBLE_DRIVER error. This is because Vulkan SDK version 1.3.216 or newer requires you to enable the VK_KHR_PORTABILITY_subset extension to use MoltenVK, because it is currently not fully conformant.

You have to add the VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR flag to your VkInstanceCreateInfo and add VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME to your instance extension list.

Code example:

...

std::vector<const char*> requiredExtensions;

for(uint32_t i = 0; i < glfwExtensionCount; i++) {
    requiredExtensions.emplace_back(glfwExtensions[i]);
}

requiredExtensions.emplace_back(VK_KHR_PORTABILITY_ENUMERATION_EXTENSION_NAME);

createInfo.flags |= VK_INSTANCE_CREATE_ENUMERATE_PORTABILITY_BIT_KHR;

createInfo.enabledExtensionCount = (uint32_t) requiredExtensions.size();
createInfo.ppEnabledExtensionNames = requiredExtensions.data();

if (vkCreateInstance(&createInfo, nullptr, &instance) != VK_SUCCESS) {
    throw std::runtime_error("failed to create instance!");
}

Privacy policy

General

This privacy policy applies to the information that is collected when you use vulkan-tutorial.com or any of its subdomains. It describes how the owner of this website, Alexander Overvoorde, collects, uses and shares information about you.

Analytics

This website collects analytics about visitors using a self-hosted instance of Matomo (https://matomo.org/), formerly known as Piwik. It records which pages you visit, what type of device and browser you use, how long you view a given page and where you came from. This information is anonymized by only recording the first two bytes of your IP address (e.g. 123.123.xxx.xxx). These anonymized logs are stored for an indefinite amount of time.

These analytics are used for the purpose of tracking how content on the website is consumed, how many people visit the website in general, and which other websites link here. This makes it easier to engage with the community and determine which areas of the website should be improved, for example if extra time should be spent on facilitating mobile reading.

This data is not shared with third parties.

This website uses a third-party advertisement server that may use cookies to track activities on the website to measure engagement with advertisements.

Comments

Each chapter includes a comment section at the end that is provided by the third-party Disqus service. This service collects identity data to facilitate the reading and submission of comments, and aggregate usage information to improve their service.

The full privacy policy of this third-party service can be found at https://help.disqus.com/terms-and-policies/disqus-privacy-policy.
