This update further improves shader management and pipeline handling:
- Add advanced heuristics for smarter async shader compilation in both OpenGL
and Vulkan renderers, with better detection of UI and critical shaders
- Implement thread pool for prioritized shader compilation with proper progress
tracking and reporting
- Add predictive shader loading system to preload related shaders based on
pipeline transitions
- Implement pipeline deduplication through Clone() method to reduce memory
usage and improve performance
- Add memory optimizations for shader translation and SPIR-V generation
- Enhance error handling and logging for shader operations
- Introduce batch loading and directory-based shader preloading capabilities
Signed-off-by: Zephyron <zephyron@citron-emu.org>
Fix multiple issues in the Vulkan asynchronous presentation implementation:
- Convert regular mutexes to timed_mutex for timeout support
- Use condition_variable_any for compatibility with timed_mutex
- Fix thread synchronization with proper locking and error handling
- Add VkResultToString helper to replace missing ToString function
- Implement better error recovery with recreation attempt limits
- Add comprehensive logging for better troubleshooting
These changes make the asynchronous presentation feature more robust
and less prone to deadlocks, while keeping it disabled by default since
it may still cause instability in some games.
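
The timed_mutex / condition_variable_any combination described above can be
sketched as follows. This is a minimal illustration, not the emulator's code:
PresentQueueSketch, Push, PopFor, and the int frame id are hypothetical names
chosen for the example.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>

// Hypothetical frame queue; only the locking pattern mirrors the commit text.
class PresentQueueSketch {
public:
    void Push(int frame_id) {
        {
            std::lock_guard<std::timed_mutex> lock{mutex_};
            frames_.push(frame_id);
        }
        cv_.notify_one();
    }

    // Returns std::nullopt on timeout instead of blocking forever.
    std::optional<int> PopFor(std::chrono::milliseconds timeout) {
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        std::unique_lock<std::timed_mutex> lock{mutex_, std::defer_lock};
        if (!lock.try_lock_until(deadline)) {
            return std::nullopt; // could not even take the lock in time
        }
        assert(lock.owns_lock());
        // condition_variable_any accepts any lockable, including timed_mutex.
        if (!cv_.wait_until(lock, deadline, [this] { return !frames_.empty(); })) {
            return std::nullopt; // no frame arrived before the deadline
        }
        const int frame_id = frames_.front();
        frames_.pop();
        return frame_id;
    }

private:
    std::timed_mutex mutex_;
    std::condition_variable_any cv_;
    std::queue<int> frames_;
};
```

A caller that gets std::nullopt can log the timeout and retry or fall back,
rather than hanging the presentation thread.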
Signed-off-by: Zephyron <zephyron@citron-emu.org>
This commit disables GPU memory management, memory snapshots, and NCE
functionality by forcing these settings to false throughout the codebase.
The changes include:
- Hardcoding false values in renderer_vulkan.cpp initialization
- Disabling UI controls for these features in configuration screens
- Forcing settings to false during configuration read/write operations
These features are being disabled to improve stability and prevent
potential memory-related issues.
Signed-off-by: Zephyron <zephyron@citron-emu.org>
Resolves a variable shadowing issue in AsyncCompileShader where the callback
lambda parameter was shadowing the outer callback variable. This was causing
compilation warnings/errors in Android Studio. The fix:
- Renames the outer callback variable to 'outer_callback'
- Renames the inner lambda callback parameters to 'inner_callback'
- Maintains consistent naming across all error handling paths
This change improves code clarity and eliminates compiler warnings while
maintaining the same functionality for async shader compilation.
Signed-off-by: Zephyron <zephyron@citron-emu.org>
Implements a more robust error handling approach when storage buffer lookups
fail in the buffer cache. Instead of returning a null binding, the code now:
- Provides a fallback buffer with safe default values
- Implements warning rate limiting to prevent log spam
- Tracks warning counts per cbuf_index
- Logs detailed debug information periodically
This change helps prevent potential crashes when storage buffer lookups fail
while still maintaining visibility into the issue through strategic logging.
The fallback mechanism uses a safe static address and a reasonable buffer
size (16KB) to handle cases where the normal GPU to CPU address translation
fails.
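
A rough sketch of the fallback and per-index rate limiting described above.
Only the 16KB size comes from this commit message; the type names, the slot
count, the logging interval, and the placeholder address are assumptions made
for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical binding type; the real one lives in the buffer cache.
struct BindingSketch {
    std::uint64_t device_addr;
    std::uint32_t size;
};

constexpr std::uint32_t kFallbackSize = 16 * 1024;  // 16KB, per the commit message
constexpr std::uint64_t kFallbackAddr = 0x1000;     // placeholder "safe" address (assumed)
constexpr std::size_t kMaxCbufs = 18;               // assumed number of cbuf slots
constexpr std::uint32_t kLogInterval = 100;         // assumed: log every 100th failure

class StorageBufferFallbackSketch {
public:
    // True when this failure should be logged: the first one, then every
    // kLogInterval-th one, tracked separately for each cbuf_index.
    bool ShouldLog(std::size_t cbuf_index) {
        assert(cbuf_index < kMaxCbufs);
        const std::uint32_t count = warning_counts_[cbuf_index]++;
        return count % kLogInterval == 0;
    }

    // Safe default binding returned instead of a null binding.
    BindingSketch Fallback() const {
        return BindingSketch{kFallbackAddr, kFallbackSize};
    }

private:
    std::array<std::uint32_t, kMaxCbufs> warning_counts_{};
};
```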
Also updates copyright headers to include citron Emulator Project.
Signed-off-by: Zephyron <zephyron@citron-emu.org>
Implement several improvements to the texture cache memory management system
to address memory leaks that occur in memory-intensive games like TOTK
(Title ID 0100F2C0115B6000). These changes prevent the gradual memory
increase that eventually leads to crashes or undefined behavior.
Key improvements:
- Enhance garbage collection with more aggressive cleanup thresholds
- Add emergency resource cleanup for persistent high memory usage
- Improve DeleteImage to ensure proper resource deallocation
- Make DelayedDestructionRing thread-safe with proper mutex protection
- Track consecutive high-memory frames to detect potential leaks
- Add emergency cleanup mechanism for extreme memory pressure situations
- Use proper type casting in std::max to fix compilation errors
This should significantly improve stability during extended gameplay
sessions with memory-intensive titles.
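
For illustration, a mutex-protected delayed destruction ring might look like
the sketch below. It is not the actual DelayedDestructionRing; the class name
and the Pending() helper are hypothetical and exist only to show the locking.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Objects pushed in one tick are destroyed TicksToDestroy ticks later.
template <typename T, std::size_t TicksToDestroy>
class DelayedDestructionRingSketch {
    static_assert(TicksToDestroy > 0, "ring needs at least one slot");

public:
    void Tick() {
        std::lock_guard<std::mutex> lock{mutex_};
        index_ = (index_ + 1) % TicksToDestroy;
        elements_[index_].clear(); // destroys everything queued a full lap ago
    }

    void Push(T&& object) {
        std::lock_guard<std::mutex> lock{mutex_};
        assert(index_ < TicksToDestroy);
        elements_[index_].push_back(std::move(object));
    }

    // Hypothetical helper: number of objects still awaiting destruction.
    std::size_t Pending() const {
        std::lock_guard<std::mutex> lock{mutex_};
        std::size_t total = 0;
        for (const auto& slot : elements_) {
            total += slot.size();
        }
        return total;
    }

private:
    mutable std::mutex mutex_;
    std::size_t index_ = 0;
    std::array<std::vector<T>, TicksToDestroy> elements_;
};
```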
Signed-off-by: Zephyron <zephyron@citron-emu.org>
- Fix variable shadowing in ShaderManager constructor by renaming parameter
- Remove unused variables in vk_texture_manager.cpp to avoid warnings
- Fix int conversion warning in syscall return value
These changes fix build errors that occur on Linux with certain optimization
flags.
Signed-off-by: Zephyron <zephyron@citron-emu.org>
- Remove special handling for reversed depth scenarios that was added for Civilization 7
- Remove excessive logging in Vulkan renderer
- Update Discord client ID
- Update Vulkan-related external dependencies
Signed-off-by: Zephyron <zephyron@citron-emu.org>
- Added missing <thread> header for std::thread usage
- Added <fcntl.h> for O_CLOEXEC and O_NONBLOCK definitions
- Fixed struct initialization order in uffdio_copy to match declaration order
Signed-off-by: Zephyron <zephyron@citron-emu.org>
This commit adds Windows-specific implementation of the fault-managed memory
system, providing similar functionality to the existing Linux/Android implementation.
Key changes:
- Added Windows-specific memory management using VirtualAlloc/VirtualFree
- Implemented Windows vectored exception handler for page fault handling
- Added proper memory protection and page fault handling on Windows
- Updated memory snapshot functionality to work on Windows
- Added proper cleanup of Windows-specific resources
- Fixed type conversion issues in memory management code
- Added proper error handling for Windows memory operations
- Fixed VRAM Memory Layout Mode to allow up to 12GB
The implementation uses Windows-specific APIs:
- VirtualAlloc/VirtualFree for memory management
- AddVectoredExceptionHandler for page fault handling
- VirtualProtect for memory protection management
This change maintains feature parity with the Linux/Android implementation
while using Windows-native APIs for better performance and reliability.
Signed-off-by: Zephyron <zephyron@citron-emu.org>
Adds a new cross-platform memory management system with enhanced capabilities:
- Fault-managed memory allocation for Linux/Android platforms
- Memory snapshot and differential snapshot support
- Predictive memory reuse tracking for optimized access patterns
- Vulkan compute buffer integration
- User-configurable settings for enabling features
The system integrates with the existing Vulkan renderer to provide more
efficient memory handling, especially for compute-intensive workloads.
Co-authored-by: boss.sfc <boss.sfc@citron-emu.org>
Co-committed-by: boss.sfc <boss.sfc@citron-emu.org>
Signed-off-by: Zephyron <zephyron@citron-emu.org>
This commit makes two significant improvements:
1. Vulkan renderer:
- Detect and properly handle reversed depth buffers (clear_depth < 0.5)
- Force depth write enable when needed with reversed depth
- Use GREATER_OR_EQUAL comparison for reversed depth scenarios
- Fix transparency issues in games like Civilization 7 by adjusting blend factors
- Add detailed logging for depth buffer operations
2. Friend service:
- Implement previously stubbed functions including EnsureFriendListAvailable
and EnsureBlockedUserListAvailable
- Add proper event signaling to prevent games from hanging
- Implement Cancel function for improved compatibility
- Update copyright notice for the Citron project
These changes improve compatibility with modern games using reversed depth
buffers and prevent hangs in titles that rely on Friend service functionality.
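
The reversed depth handling in the Vulkan part can be sketched as below. The
0.5 threshold and the GREATER_OR_EQUAL comparison come from this message; the
types are hypothetical, and forcing depth writes unconditionally is an
assumption, since the message only says "when needed".

```cpp
#include <cassert>

// Hypothetical stand-ins for the renderer's depth state.
enum class CompareOpSketch { LessOrEqual, GreaterOrEqual };

struct DepthStateSketch {
    bool depth_write_enable;
    CompareOpSketch compare_op;
};

// A clear depth below 0.5 is treated as a sign of a reversed depth buffer.
inline bool IsReversedDepth(float clear_depth) {
    assert(clear_depth >= 0.0f && clear_depth <= 1.0f);
    return clear_depth < 0.5f;
}

inline DepthStateSketch AdjustForReversedDepth(DepthStateSketch state, float clear_depth) {
    if (IsReversedDepth(clear_depth)) {
        state.depth_write_enable = true; // assumption: always forced here
        state.compare_op = CompareOpSketch::GreaterOrEqual;
    }
    return state;
}
```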
Co-authored-by: m33ts4k0z <m33ts4k0z@citron-emu.org>
Co-committed-by: m33ts4k0z <m33ts4k0z@citron-emu.org>
Signed-off-by: Zephyron <zephyron@citron-emu.org>
Implement a robust asynchronous shader compilation system inspired by commit
1fd5fefcb1. This enhancement provides:
- True multi-threaded shader compilation with atomic status tracking
- Persistent disk caching for faster shader loading
- Command queue system for background processing
- Integration with Citron's scheduler for better resource management
- Parallel shader loading to reduce startup times
- Improved error handling and recovery mechanisms
These changes significantly reduce shader compilation stuttering and improve
overall performance when using asynchronous shaders. The implementation
maintains compatibility with Citron's existing architecture while adding
more robust threading capabilities.
Co-authored-by: boss.sfc <boss.sfc@citron-emu.org>
Co-committed-by: boss.sfc <boss.sfc@citron-emu.org>
Signed-off-by: Zephyron <zephyron@citron-emu.org>
This commit adds improved Vulkan functionality to the Citron emulator:
- Add thread-safe texture management with automatic error recovery
- Implement shader caching with validation support
- Add robust error handling for Vulkan operations
- Implement platform-specific initialization for Windows, Linux, and Android
These enhancements improve stability when handling texture loading errors
and provide better recovery mechanisms for Vulkan failures.
Co-authored-by: boss.sfc <boss.sfc@citron-emu.org>
Co-committed-by: boss.sfc <boss.sfc@citron-emu.org>
Signed-off-by: Zephyron <zephyron@citron-emu.org>
This commit adds new settings and optimizations for shader compilation:
- Add new settings:
- use_enhanced_shader_building: Enable enhanced shader compilation
- shader_compilation_priority: Control shader compilation priority
- Improve shader compilation performance:
- Optimize worker thread allocation based on CPU cores
- Add smarter async shader compilation heuristics
- Prioritize vertex and fragment shader compilation
- Add performance tracking and logging
- Add performance monitoring:
- Track shader compilation times
- Log slow shader compilations
- Monitor async shader compilation statistics
This is a work-in-progress commit. Further optimizations and refinements
will be needed based on testing and feedback.
Signed-off-by: Zephyron <zephyron@citron-emu.org>
This commit addresses critical TLB (Translation Lookaside Buffer) issues on
Android by implementing several optimizations:
- Add new BufferCacheAccelerator to manage memory range overlap detection
- Implement TLB-aware memory barriers to prevent unnecessary invalidations
- Add a TLB caching system to avoid redundant flushing operations
- Create a counter to limit outstanding memory operations and prevent TLB thrashing
- Implement TLB prefetching for better memory access patterns
- Add targeted memory barriers for more precise synchronization
These changes significantly reduce the likelihood of the "0.0 FPS deadlock"
issue on Android devices by maintaining a more stable TLB state and preventing
cache thrashing.
TODO: Merge & Adapt Camille LaVey's TLB Method To Further Improve
Signed-off-by: Zephyron <zephyron@citron-emu.org>
- Added conditional check for AMD graphics drivers
- Automatically disable logic operations when float vertex attributes
are present to work around driver quirks
- Maintain original logic op state to preserve emulator behavior
- Prepare dynamic state management infrastructure for future
OpenGL implementation changes
OpenGL implementation will follow in subsequent commits.
Signed-off-by: Zephyron <zephyron@citron-emu.org>
Enhances the Vulkan memory allocator with better OOM handling and memory
alignment:
* Add memory recovery by cleaning up empty allocations before failing
* Implement proper fallback to non-device-local memory
* Simplify memory alignment handling for different vendors
* Add better error logging for allocation failures
* Add IsEmpty() helper to track unused allocations
* Fix alignment requirements for Adreno (4KB) vs other vendors
These changes improve the robustness of memory allocation, particularly
in low-memory situations, and streamline vendor-specific alignment
requirements.
Add operator== to ImageInfo and ImageViewInfo classes to enable direct
equality comparisons. The implementations use std::tie to perform member-wise
comparisons of all relevant fields in a safe and efficient manner.
This allows for easier comparison of texture cache entries and view
information, which can be useful for cache management and debugging.
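
As an illustration of the std::tie approach, here is a minimal sketch on a
trimmed-down, hypothetical struct. The real ImageInfo and ImageViewInfo have
more fields; the member names below are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for a texture cache descriptor.
struct ImageInfoSketch {
    std::uint32_t format;
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t num_samples;

    // std::tie builds tuples of references, so the comparison is member-wise
    // and copies nothing.
    bool operator==(const ImageInfoSketch& rhs) const {
        return std::tie(format, width, height, num_samples) ==
               std::tie(rhs.format, rhs.width, rhs.height, rhs.num_samples);
    }
    bool operator!=(const ImageInfoSketch& rhs) const {
        return !(*this == rhs);
    }
};

// Usage example: two descriptors compare equal until a field differs.
inline void ImageInfoSketchExample() {
    ImageInfoSketch a{1, 64, 64, 1};
    ImageInfoSketch b = a;
    assert(a == b);
    b.width = 128;
    assert(a != b);
}
```

Every field that should take part in the comparison has to be listed in both
std::tie calls; a field left out is silently ignored.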
Update fmt library to version 11.0.2 and make necessary adjustments:
- Replace fmt/format.h includes with fmt/ranges.h
- Add const qualifiers to formatter::format functions
- Update CMake to require fmt version 11
Additional dependency updates:
- Update SDL2 bundled version from 2.28.2 to 2.32.0
- Update catch2 to version 3.8.0
- Update vcpkg baseline to c82f74667287d3dc386bce81e44964370c91a289
Fix Android compilation with latest NDK (28.0.13004108) and Java JDK 21 by
renaming the resolve_image variable to resolve_image_holder to avoid potential
naming conflicts. This change helps maintain compatibility with the updated
build toolchain while keeping the core functionality intact.
The change affects the MSAA image copy operation in the Vulkan texture cache
implementation.
Update magic numbers and default identifiers across codebase:
- UUID default value
- Thread magic number
- Amiibo name
- Vulkan cache magic number
- Shader cache magic number
Add workarounds for Samsung Xclipse GPUs:
- Disable extendedDynamicState3ColorBlendEquation as it is broken in Samsung
drivers, similar to AMD drivers
- Add Samsung's proprietary driver to the validated driver list for clock
boosting
- Fix log message to indicate both AMD and Samsung drivers have broken
color blend equation support
Remove stray logical OR operator from validated_driver condition.
Refactors and improves the texture format conversion system in the Vulkan
renderer:
- Adds proper sRGB to linear conversion for depth formats
- Improves shader accuracy for ABGR8 SRGB to D24S8 conversion
- Adds gamma correction and proper depth range clamping
- Moves GetSupportedFormat implementation to header
- Cleans up format conversion switch statement
- Removes redundant format conversion paths
The changes improve accuracy when converting between color and depth
formats, particularly for sRGB sources. The shader improvements ensure
proper gamma correction and depth range handling.
Technical changes:
- Improves sRGB to linear conversion in fragment shader
- Adds proper depth value clamping
- Consolidates format conversion logic
- Removes duplicate GetSupportedFormat implementation
Implements hardware-accelerated MSAA resolve functionality in the Vulkan
texture cache instead of relying on compute shaders. This change:
- Adds proper MSAA to non-MSAA image copy support using vkCmdResolveImage
- Creates temporary resolve images with appropriate memory allocation
- Handles format compatibility checks with proper fallback to compute
- Manages image layout transitions and memory barriers
- Preserves existing compute shader fallback for unsupported formats
The implementation follows Vulkan best practices for MSAA resolve
operations and should provide better performance for supported formats.
Adds special handling for memory allocation size on AMD and Qualcomm (Adreno)
drivers by aligning allocations to 4KB boundaries. This fixes potential memory
allocation issues on these drivers where unaligned allocations may fail or
cause undefined behavior.
Affected drivers:
- AMD Proprietary (AMDVLK)
- AMD Open Source (RADV)
- Qualcomm Proprietary (Adreno)
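
The 4KB rounding can be sketched as follows. The enum and function names are
hypothetical; only the 4KB boundary and the list of affected drivers come from
this message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical driver tags mirroring the list above.
enum class DriverSketch { AmdProprietary, AmdOpenSource, QualcommProprietary, Other };

// Rounds value up to the next multiple of alignment (a power of two).
inline std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

inline std::uint64_t AllocationSize(DriverSketch driver, std::uint64_t requested) {
    constexpr std::uint64_t kAlignment = 4096; // 4KB
    switch (driver) {
    case DriverSketch::AmdProprietary:
    case DriverSketch::AmdOpenSource:
    case DriverSketch::QualcommProprietary:
        return AlignUp(requested, kAlignment);
    default:
        return requested; // other drivers keep the requested size
    }
}
```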
Adds several new shader-based format conversion pipelines to support additional
texture formats and operations:
- RGBA8 to BGRA8 conversion
- YUV420/RGB conversions
- BC7 to RGBA8 decompression
- ASTC HDR to RGBA16F decompression
- RGBA16F to RGBA8 conversion
- Temporal dithering
- Dynamic resolution scaling
Updates the texture cache runtime to handle these new conversion paths and adds
helper functions to check format compatibility for dithering and scaling
operations.
The changes include:
- New shader files and CMake entries
- Additional conversion pipeline setup in BlitImageHelper
- Extended format conversion logic in TextureCacheRuntime
- New format compatibility check helpers
Implements conversion from sRGB color formats to D24S8 depth-stencil format
in the Vulkan renderer. This change includes:
- New fragment shader convert_abgr8_srgb_to_d24s8.frag that handles proper
sRGB to linear conversion before depth calculation
- Added shader to CMake build system
- Extended BlitImageHelper with new conversion pipeline and methods
- Updated texture cache to handle sRGB to D24S8 format conversion paths
The conversion properly handles sRGB color space by first converting to
linear space before calculating luminance values for the depth component,
while preserving alpha channel data for the stencil component.
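
The conversion math can be sketched on the CPU as below. This is not the
shader itself: the standard sRGB transfer function is used, and the Rec. 709
luminance weights and the function names are assumptions, since the message
does not state which weights the shader uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Standard sRGB electro-optical transfer function for one channel in [0, 1].
inline float SrgbToLinear(float c) {
    assert(c >= 0.0f && c <= 1.0f);
    return c <= 0.04045f ? c / 12.92f : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Depth from linear luminance, clamped to the [0, 1] depth range.
// Rec. 709 weights are an assumption for this sketch.
inline float SrgbToDepth(float r, float g, float b) {
    const float luminance = 0.2126f * SrgbToLinear(r) +
                            0.7152f * SrgbToLinear(g) +
                            0.0722f * SrgbToLinear(b);
    return std::clamp(luminance, 0.0f, 1.0f);
}

// Alpha is carried over unchanged as the 8-bit stencil value.
inline std::uint8_t AlphaToStencil(float a) {
    return static_cast<std::uint8_t>(std::lround(std::clamp(a, 0.0f, 1.0f) * 255.0f));
}
```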
Adjusts VRAM allocation strategy to be more conservative while maintaining
performance:
- Increases reserve memory from 1/8th to 1/4th (max 2GB) for discrete GPUs
- Increases base memory limit from 6GB to 8GB
- Doubles resolution scaling memory from 1GB to 2GB per scale factor
- Reduces system memory reservation from 8GB to 4GB for integrated GPUs
- Increases maximum memory limit from 4GB to 6GB for integrated GPUs
These changes help avoid running out of VRAM while still providing adequate
memory for optimal performance.
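
For the discrete GPU case, the new reserve rule (1/4th of device memory, capped
at 2GB) can be sketched as follows. The function names are hypothetical and the
other limits listed above are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kGiB = 1ULL << 30;

// Reserve a quarter of device-local memory, but never more than 2GB.
inline std::uint64_t DiscreteReserve(std::uint64_t device_local_bytes) {
    return std::min<std::uint64_t>(device_local_bytes / 4, 2 * kGiB);
}

// Memory left for the emulator after the reserve is set aside.
inline std::uint64_t UsableDeviceMemory(std::uint64_t device_local_bytes) {
    const std::uint64_t reserve = DiscreteReserve(device_local_bytes);
    assert(reserve <= device_local_bytes);
    return device_local_bytes - reserve;
}
```

With this rule a 4GB card keeps 1GB in reserve, while cards of 8GB and above
all keep the capped 2GB.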
Adds missing include for common/logging/log.h in gpu.h which was causing
compilation failures on Android. This header is needed for logging
functionality used in GPU-related operations.
The include was previously indirectly available through other headers,
but making it explicit improves code clarity and prevents potential
future compilation issues.
Reverts overly restrictive storage buffer validation and size calculation
that was causing rendering issues in The Legend of Zelda: Tears of the
Kingdom, particularly in underground/depth areas. The simplified approach:
- Uses GetMemoryLayoutSize() instead of manual page probing
- Removes unnecessary 4GB memory bounds validation
- Streamlines address translation and alignment handling
This fixes numerous reported cases of missing or corrupted rendering in
TOTK's underground areas where storage buffer operations are heavily used
for depth-related effects.
Relaxes validation checks in the NVDRV GPU service and improves error notifier
handling to prevent potential hangs. Key changes:
- Remove strict size validation in SetErrorNotifier
- Relax GPFIFO entry count validation to only check for non-zero values
- Add proper error notifier state tracking in GPU class
- Improve debug logging messages
The previous strict validation was causing issues with some games like ACNH.
These changes maintain necessary checks while being more permissive with
edge cases that don't impact functionality.
Technical changes:
- Store error notifier state in GPU class for future implementation
- Remove upper bound check on GPFIFO entries
- Simplify error notifier setup flow
This should resolve hanging issues while maintaining core functionality.
Implements several previously stubbed functions in the NVDRV service:
- Initialize proper transfer memory handling
- Add error notifier configuration
- Implement channel timeout and timeslice management
- Add object context allocation and tracking
- Add GPU interface stubs for new functionality
The changes improve the accuracy of GPU-related operations while maintaining
compatibility with the existing codebase. All functions now properly validate
parameters and handle endianness correctly using _le types.
Two main changes in this commit:
1. Replace NVIDIA-specific GL_NV_gpu_shader5 extension with the more widely
supported GL_EXT_shader_explicit_arithmetic_types_float16 in the scaleforce
shader. This improves compatibility across different GPU vendors.
2. Refactor ARM32 exception handling:
- Restructure exception cases for better readability
- Update exception handling to match current Dynarmic API
- Fix indentation in switch statement
- Remove AccessViolation case as it's no longer supported in current API
These changes improve shader compatibility and align the exception handling
with the current Dynarmic implementation.
- In commit b3facaa6bb, the copyright header was
updated to include "Citron Homebrew Project" across multiple files, regardless
of whether any contributions were made.
- This commit removes the incorrect attribution and reverts the copyright header
to its previous state.
- Copyright attribution should only be added when meaningful contributions have
been made to the file.
- This commit ensures proper compliance with copyright standards and maintains
correct attribution to the respective contributors.
- Special thanks to Tachi for pointing out the need for these corrections and
ensuring that proper attribution practices are followed.
- Update Gradle from 8.1 to 8.10.2
- Update Android Gradle Plugin from 8.1.2 to 8.8.0
- Update Kotlin from 1.9.20 to 2.1.20-Beta1
- Update various AndroidX dependencies to latest versions
- Update NDK from 26.1 to 27.2 (with RC1/RC2 options commented out)
- Update Java target from 17 to 21
- Update Android target SDK from 34 to 35
- Enable additional optimizations in release builds:
- Enable shrinkResources
- Use proguard-android-optimize.txt
- Disable JNI debugging
- Replace vector drawables with PNG for ic_citron
- Comment out unused ComputeInline constant in dma_pusher.cpp
- Update Gradle wrapper and configuration files
This updates the Android build system and dependencies to their latest
versions while enabling additional optimizations for release builds.
The ic_citron icon was converted from vector to PNG format for better
compatibility.
Previously, the code would unconditionally add a constant buffer descriptor
at index 0 whenever storage buffers were present, which could cause conflicts
and crashes. This change:
- Adds validation to check if constant buffer 0 already exists
- Only adds the descriptor if it's not already present
- Prevents potential descriptor conflicts in shaders
This should resolve crashes in Vulkan games related to invalid descriptor
layouts and resource binding conflicts.
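
The check described above can be sketched as follows. The struct layouts are
simplified, hypothetical stand-ins for the shader info types; only the "index
0, count 1, add only if missing" logic reflects this message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified descriptor types.
struct ConstantBufferDescriptorSketch {
    std::uint32_t index;
    std::uint32_t count;
};

struct ShaderInfoSketch {
    std::vector<ConstantBufferDescriptorSketch> constant_buffer_descriptors;
    std::vector<std::uint32_t> storage_buffer_indices; // stand-in for storage buffers
};

// Adds a cbuf 0 descriptor only when storage buffers exist and none is present.
inline void EnsureConstantBuffer0(ShaderInfoSketch& info) {
    if (info.storage_buffer_indices.empty()) {
        return;
    }
    const auto& descs = info.constant_buffer_descriptors;
    const bool has_cbuf0 = std::any_of(
        descs.begin(), descs.end(),
        [](const ConstantBufferDescriptorSketch& d) { return d.index == 0; });
    if (!has_cbuf0) {
        info.constant_buffer_descriptors.push_back(ConstantBufferDescriptorSketch{0, 1});
    }
    assert(!info.constant_buffer_descriptors.empty());
}
```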
Add support for bindless texture constant buffers in the compute pipeline
creation process. When storage buffer descriptors are present, create a
constant buffer descriptor to handle bindless textures. This fixes the
"Failed to track bindless texture constant buffer" error.
Changes:
- Add constant buffer descriptor with index 0 and count 1 when storage
buffers are present
- Place descriptor creation before SPIR-V code generation to ensure proper
shader compilation
This resolves issues with bindless texture access in compute shaders.
Fixes the StorageBufferBinding function to properly handle memory validation
and size detection. Key changes include:
- Fix ReadBlock usage to properly handle void return values
- Implement safer memory validation using byte-level reads
- Improve size detection logic for storage buffers
- Fix NVN buffer size reading
- Add proper bounds checking for device memory addresses
- Add better error logging for invalid conditions
This addresses the "Failed to find storage buffer for cbuf index 0" errors
by implementing more robust memory validation and size detection. The changes
ensure proper handling of invalid memory addresses and prevent crashes from
accessing out-of-bounds memory.
- Modify DmaPusher to use safe memory reads when handling compute
operations at High GPU accuracy
- Prevent potential memory corruption issues that could lead to
invalid dispatch parameters
- Previously, unsafe reads could result in corrupted launch_description
data in KeplerCompute::ProcessLaunch, causing invalid vkCmdDispatch
calls
- By enforcing safe reads specifically for compute operations, we
maintain performance for other GPU tasks while ensuring compute
dispatch stability
This change takes effect only when GPU accuracy is set to High or above.