PIXLRelight: Controllable Relighting via Intrinsic Conditioning

Abstract

We present PIXLRelight, a feed-forward approach for physically controllable single-image relighting. Existing methods either provide limited lighting control (for example through text or environment maps), accumulate errors when chaining inverse and forward rendering, or require costly per-image optimization. Our key idea is to bridge physically based rendering (PBR) and learned image synthesis through a shared intrinsic conditioning that can be obtained from either real photographs or PBR renders. At training time, paired multi-illumination photographs are decomposed into albedo, diffuse shading, and non-diffuse residuals, which condition the model. At inference time, the same conditioning is computed from a path-traced render of a coarse 3D reconstruction of the input under user-specified PBR lights. A transformer-based neural renderer then applies the target illumination to the source photograph, preserving fine image detail through a per-pixel affine modulation. PIXLRelight enables arbitrary PBR-style lighting control, achieves state-of-the-art relighting quality, and runs in under a tenth of a second per image.
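The abstract says a per-pixel affine modulation preserves fine detail. A toy example shows why this works. The sketch below assumes the common intrinsic model I = A * S + R (albedo times diffuse shading plus a non-diffuse residual) and a diffuse-only source. Under those assumptions, a scale of S_target / S_source and a shift of R_target reproduce the target exactly. The albedo texture is carried over multiplicatively from the source pixels and never has to be regenerated.

The functions `compose` and `affine_modulate` are hypothetical. In PIXLRelight the scale and shift are predicted by the network, not computed from known shading.

```python
import numpy as np

def compose(albedo, shading, residual):
    """Assumed intrinsic image model: I = A * S + R (element-wise, per channel)."""
    return albedo * shading + residual

def affine_modulate(source, scale, shift):
    """Per-pixel affine modulation of the source photograph: out = scale * source + shift."""
    return scale * source + shift

rng = np.random.default_rng(0)
H, W = 4, 5
albedo = rng.uniform(0.1, 0.9, size=(H, W, 3))        # fine texture detail lives here
shading_src = rng.uniform(0.2, 1.0, size=(H, W, 3))   # source illumination (kept > 0)
shading_tgt = rng.uniform(0.2, 1.0, size=(H, W, 3))   # target illumination
residual_tgt = rng.uniform(0.0, 0.1, size=(H, W, 3))  # e.g. target highlights

source = compose(albedo, shading_src, np.zeros_like(albedo))  # diffuse-only source
target = compose(albedo, shading_tgt, residual_tgt)

# Oracle modulation: rescale the source shading and add the target residual.
scale = shading_tgt / shading_src
shift = residual_tgt
relit = affine_modulate(source, scale, shift)

print(np.allclose(relit, target))  # True
```

With scale = 1 and shift = 0 the modulation is the identity. Any error in the predicted scale and shift therefore perturbs the lighting, not the source texture.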

Inference pipeline

Inference pipeline. Given a single input image, geometry is recovered by Depth Anything 3 and unprojected to a triangle mesh, and materials are recovered by Marigold-IID-Appearance. The textured mesh is loaded into Blender, where the user authors the desired illumination; Blender Cycles then renders the scene and produces the target intrinsic maps. PIXLRelight takes as input the original image together with the target intrinsic maps and produces the final relit prediction.
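The first inference step lifts a depth map to a triangle mesh. It can be sketched as below, assuming a pinhole camera with known intrinsics (fx, fy, cx, cy) and a z-depth map. How Depth Anything 3's outputs map onto these quantities, and how the released code triangulates, are not stated here. `unproject_to_mesh` and its `max_depth_ratio` discontinuity filter are therefore hypothetical.

Vertices are in an OpenCV-style camera frame (x right, y down, z forward). Blender cameras look down -Z with +Y up, so an axis flip would be needed before import. The per-vertex UVs let the recovered albedo be applied as a texture.

```python
import numpy as np

def unproject_to_mesh(depth, fx, fy, cx, cy, max_depth_ratio=None):
    """Lift an (H, W) z-depth map (depth > 0) to a grid mesh, one vertex per pixel.

    Returns vertices (H*W, 3), faces (F, 3) as vertex indices, and UVs (H*W, 2).
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]  # pixel row (v) and column (u) indices
    # Pinhole back-projection: x = (u - cx) z / fx, y = (v - cy) z / fy.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    vertices = np.stack([x, y, depth], axis=-1).reshape(-1, 3)

    # UVs in [0, 1] with the origin at the bottom-left (Blender's convention),
    # so the albedo image can be mapped back onto the mesh.
    uvs = np.stack([u / (W - 1), 1.0 - v / (H - 1)], axis=-1).reshape(-1, 2)

    # Two triangles per 2x2 block of neighbouring pixels, same winding for all.
    idx = np.arange(H * W).reshape(H, W)
    tl, tr = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    bl, br = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    faces = np.concatenate([np.stack([tl, bl, tr], axis=1),
                            np.stack([tr, bl, br], axis=1)])

    # Optional heuristic: drop triangles spanning a depth discontinuity, which
    # would otherwise stretch surfaces between foreground and background.
    if max_depth_ratio is not None:
        z = depth.ravel()[faces]  # (F, 3) depth of each face's vertices
        faces = faces[z.max(axis=1) / z.min(axis=1) <= max_depth_ratio]
    return vertices, faces, uvs

depth = np.full((3, 4), 2.0)
verts, faces, uvs = unproject_to_mesh(depth, fx=100.0, fy=100.0, cx=1.5, cy=1.0)
print(verts.shape, faces.shape, uvs.shape)  # (12, 3) (12, 3) (12, 2)
```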

Training pipeline

Training pipeline. The source image is patchified by a ViT branch. The target intrinsics are extracted from the target image by a frozen Marigold-IID-Lighting model, concatenated channel-wise, and patchified by a ConvNeXt branch. The two token grids are fused at each spatial location, projected to a common dimension, and processed by a self-attention transformer trunk. A DPT head reads out intermediate trunk features and predicts a per-pixel affine modulation of the source image. The model is trained end-to-end against the target image with pixel and perceptual losses.
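The per-location fusion can be sketched with NumPy, assuming both branches produce token grids at the same spatial resolution. Several parts are simplified:

- Each branch is replaced by a single random linear patch embedding. This is a hypothetical stand-in, not the actual ViT or ConvNeXt.
- The patch size of 16 and all widths are illustrative.
- The target intrinsics are assumed to be three RGB maps (albedo, diffuse shading, residual) stacked into 9 channels.
- The trunk is reduced to one single-head self-attention layer with a residual connection.
- The DPT head is not shown.

```python
import numpy as np

def patchify(x, p):
    """Split an (H, W, C) array into non-overlapping p x p patches: (H/p, W/p, p*p*C)."""
    H, W, C = x.shape
    assert H % p == 0 and W % p == 0, "image size must be divisible by the patch size"
    x = x.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H // p, W // p, p * p * C)

def fuse_tokens(src_tokens, cond_tokens, w_proj):
    """Concatenate two aligned token grids per spatial location, then project to d_model."""
    assert src_tokens.shape[:2] == cond_tokens.shape[:2], "grids must be spatially aligned"
    fused = np.concatenate([src_tokens, cond_tokens], axis=-1)  # (h, w, d_src + d_cond)
    return fused @ w_proj                                        # (h, w, d_model)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a (n, d) token sequence."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

rng = np.random.default_rng(0)

def init(fan_in, fan_out):
    """Random weights scaled by 1/sqrt(fan_in) to keep activations well-behaved."""
    return rng.normal(size=(fan_in, fan_out)) / np.sqrt(fan_in)

H, W, p = 32, 48, 16
d_src, d_cond, d_model = 64, 32, 96

source = rng.uniform(size=(H, W, 3))
# Target intrinsics stacked channel-wise: albedo (3) + diffuse shading (3) + residual (3).
intrinsics = np.concatenate([rng.uniform(size=(H, W, 3)) for _ in range(3)], axis=-1)

# Hypothetical stand-ins for the ViT and ConvNeXt branches.
src_tokens = patchify(source, p) @ init(p * p * 3, d_src)
cond_tokens = patchify(intrinsics, p) @ init(p * p * 9, d_cond)

fused = fuse_tokens(src_tokens, cond_tokens, init(d_src + d_cond, d_model))
seq = fused.reshape(-1, d_model)  # (h*w, d_model) token sequence for the trunk
out = seq + self_attention(seq, init(d_model, d_model), init(d_model, d_model),
                           init(d_model, d_model))
print(fused.shape, out.shape)  # (2, 3, 96) (6, 96)
```

Fusing by concatenation at each location keeps every source token paired with the conditioning for the same image region. The trunk's attention then mixes information across locations.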

BibTeX

@article{farinha2026pixlrelight,
  title   = {PIXLRelight: Controllable Relighting via Intrinsic Conditioning},
  author  = {Farinha, Miguel and Clark, Ronald},
  journal = {arXiv preprint arXiv:2605.18735},
  year    = {2026}
}
