Creating Visual Cognitive Illusions

2025 Fall PKU Computer Vision Final Project

Zihan Yang1, Yuming Fang1, Yueran Wang2
1Peking University, School of Electronics Engineering and Computer Science, 2Peking University, School of Mathematical Sciences

This homepage was built entirely by Zihan Yang.

Interactive Demos

Flip Illusion

Click and drag to rotate, double-click to reset!

giraffe/penguin
ship/ship
dress/dress

Rotation Overlay Illusion

Click and drag to rotate the top image. Watch how the combined image changes!

Hidden Overlay Illusion

Drag multiple transparent images together to reveal a hidden pattern underneath!

Scan it!

Real-World Experimental Validation

Real-world experiments demonstrating successful physical realizations.
The images above showcase successful cases printed and observed in the physical world. These results validate that our algorithm is robust not only in digital simulation but also effectively applicable in real-world scenarios.

Rotational Hidden Overlay

Drag to rotate the top disk. Align it to the correct angle to reveal the secret!

You can find three angles that reveal 'PKU', 'EECS', and 'SMS' respectively.
You can find one angle that reveals a heart and the letters 'LOVE'.
You can find one angle that reveals a cup.
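The demos above rely on a rotation-keyed decomposition: the hidden pattern is split into two noise-like layers that recombine only at specific angles. A minimal binary sketch of the principle, using XOR share-splitting and 90° rotations (an assumption for illustration; the actual demos use continuous grayscale blending and arbitrary angles):

```python
import numpy as np

rng = np.random.default_rng(0)

# Binary "secret" pattern (1 = part of the hidden shape)
secret = np.zeros((8, 8), dtype=np.uint8)
secret[2:6, 3:5] = 1

# Layer A is pure random noise.
layer_a = rng.integers(0, 2, size=(8, 8), dtype=np.uint8)

# Layer B is constructed so that XOR-ing it with layer A *rotated by
# 90 degrees* reproduces the secret; at other rotations the overlay
# stays noise-like.
layer_b = np.rot90(layer_a, k=1) ^ secret

def overlay(k):
    """Combine the layers with layer A rotated by k * 90 degrees."""
    return np.rot90(layer_a, k=k) ^ layer_b
```

Because layer B is defined against the rotated copy of layer A, `overlay(1)` cancels the noise exactly and returns the secret, while other rotation angles leave random bits.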

Distance-Dependent Spectral Hybridization

Dynamic Distance Simulation

(High-frequency details dominate up close, while low-frequency shapes emerge from afar.)

Interactive Verification

Note: The animation above involves no visual tricks or editing; it strictly simulates changing viewing distance by resizing the image. You can verify this scale-dependent perception yourself by selecting an image and dragging the slider below.

Current Scale: 100%
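The scale-dependent percept can be reproduced with a classic hybrid-image construction: take the low frequencies of the "far" image and the high frequencies of the "near" image. A minimal numpy sketch, with a separable box filter standing in for the Gaussian band-split and toy gradient/checkerboard patterns as placeholder images:

```python
import numpy as np

def box_blur(img, k=9):
    """Separable box low-pass; a stand-in for the Gaussian used in practice."""
    ker = np.ones(k) / k
    rows = np.array([np.convolve(r, ker, mode='same') for r in img])
    return np.array([np.convolve(c, ker, mode='same') for c in rows.T]).T

def hybrid(img_far, img_near, k=9):
    """Low frequencies of img_far + high frequencies of img_near."""
    low = box_blur(np.asarray(img_far, float), k)
    high = np.asarray(img_near, float) - box_blur(np.asarray(img_near, float), k)
    return np.clip(low + high, 0, 255)

# Toy grayscale "images": a smooth gradient vs. a fine checkerboard
far = np.tile(np.linspace(0, 255, 64), (64, 1))
yy, xx = np.indices((64, 64))
near = 255.0 * ((xx + yy) % 2)

h = hybrid(far, near)
# Downscaling (i.e. viewing from afar) averages out the checkerboard
# detail and leaves roughly the gradient of `far`.
small = h.reshape(16, 4, 16, 4).mean(axis=(1, 3))
```

Up close the checkerboard texture dominates; after downscaling, the left-to-right brightness ramp of the "far" image re-emerges.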

Orthogonal Voxel Projection Synthesis

A 3D voxel structure that projects different semantic objects from three orthogonal views.

Interactive 3D Viewer (Click and drag to rotate, scroll to zoom)

Orthogonal Views

Front View: Butterfly
Side View: Tree
Top View: Apple
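The carving step behind this structure can be sketched as a visual hull: a voxel survives only if it lies inside all three target silhouettes extruded along their viewing axes. A minimal numpy sketch, with disks of different radii as stand-ins for the butterfly/tree/apple masks:

```python
import numpy as np

N = 64  # voxel grid resolution

def disk(n, r):
    """Binary silhouette: a filled circle of radius r."""
    yy, xx = np.indices((n, n)) - (n - 1) / 2
    return (xx**2 + yy**2) <= r**2

F = disk(N, 24)   # front view, indexed [y, x] (looking along z)
S = disk(N, 20)   # side view,  indexed [y, z] (looking along x)
T = disk(N, 28)   # top view,   indexed [z, x] (looking along y)

# Visual-hull carving: intersect the three extruded silhouettes.
# vox[y, x, z] is kept only if all three views allow it.
vox = F[:, :, None] & S[:, None, :] & T.T[None, :, :]
```

By construction, each orthographic projection of the carved volume is contained in its silhouette; making the projections match the targets exactly (and look semantically clean) is what the full optimization pipeline has to handle.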

Differentiable Cylindrical Anamorphosis

Top-Left: Official Demo (LookingGlass)
Top-Right: Real-world Experiment Gap
Bottom-Left: Simulation ("Your Name")
Bottom-Right: Simulation ("To the Moon")

We were deeply inspired by the impressive visual effects presented in LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping (Chang et al., CVPR 2025 Oral). Since the official codebase was unavailable, we attempted to reproduce the differentiable cylindrical anamorphosis effect within our own framework.

Our computer simulations yielded promising results, as shown in the bottom row. The bottom-left image demonstrates a generated example where the flat pattern appears as a starry sky (left sub-panel), yet reveals the iconic hand-holding scene from the movie Your Name when reflected on a cylinder (middle sub-panel), matching the target poster (right sub-panel). Similarly, the bottom-right image presents another example where a flattened monster pattern transforms into the silhouette of the heroine from the game To the Moon.

However, the top-right image illustrates our attempt to replicate this in the physical world by printing the generated pattern and observing it through a reflective cylinder. The resulting reflection is indistinct (compared to the official demo in the top-left), highlighting a significant sim-to-real gap in our current implementation that requires further refinement in material modeling and alignment.
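For reference, the underlying geometry can be sketched with the classical (non-differentiable) cylindrical-mirror approximation: the flat pattern occupies an annulus around the cylinder, the angle around the cylinder maps linearly to target-image columns (mirrored left-right by the reflection), and radial distance maps to rows. This is a simplification of the Laplacian-pyramid warping in LookingGlass; all radii and sizes below are illustrative:

```python
import numpy as np

def cylindrical_anamorphosis(target, R=40, R_out=120, size=256):
    """Warp `target` (H, W) into the annular pattern that a cylindrical
    mirror of radius R maps back onto its surface (textbook polar model)."""
    H, W = target.shape
    out = np.zeros((size, size))
    yy, xx = np.indices((size, size)) - size / 2
    rho = np.hypot(xx, yy)                 # distance from cylinder axis
    theta = np.arctan2(yy, xx)             # angle in [-pi, pi]
    ring = (rho >= R) & (rho < R_out)      # printable annulus
    # Inverse map: annulus coordinates -> target pixel coordinates.
    v = ((rho - R) / (R_out - R) * (H - 1)).astype(int)
    u = (((-theta + np.pi) / (2 * np.pi)) * (W - 1)).astype(int)  # mirror flip
    out[ring] = target[v[ring], u[ring]]
    return out
```

Rendering the annulus and placing a reflective cylinder at its center approximately recovers `target` on the mirror surface; the differentiable version additionally models the reflection per frequency band and optimizes the pattern end to end.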

Motion Integration Steganography

Left: Static "City" → Motion "Rose"
Middle: Static "Forest" → Motion "Panda"
Right: Reference (Geng et al., 2024)

In this section, we present a novel capability of our framework: generating images that possess motion-dependent semantics. These images convey one visual meaning when static, but perceptually integrate into a completely different object when subjected to specific motion patterns (e.g., jittering).

As demonstrated in the left video, the static image depicts "a crayon drawing of a city", which transforms into "a rose" upon shaking. Similarly, the middle video reveals a hidden "panda" within a static "Vertical Forest Architecture". We drew inspiration from Factorized Diffusion: Perceptual Illusions by Noise Decomposition (Geng et al., 2024), whose official demo is shown in the right video. Crucially, we reproduced this effect entirely within the unified framework proposed in our paper, without relying on their specific codebase.
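The percept under jitter can be simulated by averaging the image over small random translations: the motion acts as a low-pass filter, so fine static detail cancels out while the smooth hidden component survives. A toy numpy sketch with a checkerboard carrier hiding a Gaussian blob (an illustration of the integration effect, not the diffusion-based generator itself):

```python
import numpy as np

def motion_integrate(img, max_shift=3, n=32):
    """Average `img` over small random translations, approximating what
    the eye integrates while the image jitters."""
    rng = np.random.default_rng(0)
    acc = np.zeros_like(img, dtype=float)
    for _ in range(n):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        acc += np.roll(img, (dy, dx), axis=(0, 1))
    return acc / n

# High-frequency carrier (checkerboard) hiding a low-frequency blob.
yy, xx = np.indices((64, 64))
carrier = ((xx + yy) % 2).astype(float)                     # seen when static
hidden = np.exp(-((xx - 32)**2 + (yy - 32)**2) / 200.0)     # seen under motion
img = 0.5 * hidden + 0.5 * carrier

blurred = motion_integrate(img)
```

After integration, the horizontal pixel-to-pixel contrast of the checkerboard drops sharply while the blob remains, which is why the hidden object must be encoded in the low-frequency band.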

Experimental Evaluation

To ensure the quality of our generated illusions, we designed an automated pipeline consisting of two stages: Automated Generation using Diffusion Models (DeepFloyd IF / SD v1.4) and Intelligent Evaluation using Vision Language Models (Qwen3-VL-4B-Instruct).

We implemented a "blind test" strategy where the VLM evaluates the generated images without knowing the target words beforehand. This ensures the objectivity of the scores.
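A minimal sketch of the blind-test stage, with `query_vlm` as a hypothetical stand-in for the actual Qwen3-VL-4B-Instruct call; the prompt wording and reply format below are illustrative, not the exact ones used in the pipeline:

```python
import re

def build_blind_prompt():
    """Blind-test prompt: the VLM is *not* told the target words, so a
    high score can only come from genuinely legible text in the image."""
    return (
        "Look at this image. If you can read any word in it, state the "
        "word and rate its legibility from 0 (unreadable) to 9 "
        "(perfectly clear). Answer as 'WORD: <word> SCORE: <number>'."
    )

def parse_score(reply):
    """Extract the numeric score from the model's free-form reply."""
    m = re.search(r"SCORE:\s*([0-9](?:\.[05])?)", reply)
    return float(m.group(1)) if m else None

def evaluate(images, query_vlm):
    """Score each candidate; query_vlm(image, prompt) wraps the VLM call."""
    prompt = build_blind_prompt()
    return [parse_score(query_vlm(img, prompt)) for img in images]
```

Only after scoring are the VLM's transcribed words compared against the intended targets, so legibility and semantic match are judged without leaking the answer into the prompt.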

Qualitative Comparison

We generated thousands of samples. Below is a comparison showing the disparity in generation quality. Higher VLM scores indicate clearer text legibility and better illusion consistency.

Score 0 Sample

Score: 0

Complete failure in text formation.

Score 5 Sample

Score: 5

Recognizable but lacks consistency.

Score 9 Sample

Score: 9

High legibility and perfect semantic alignment.

Quantitative Analysis

We generated 3,333 candidate images for the (EECS, SMS) word pair. The table below shows the score distribution evaluated by Qwen3-VL. As observed, generating high-quality ambigrams is challenging, with only ~6% of samples achieving a score of 8.5 or higher.

VLM Score   Count   Percentage
0.0         1,171   35.1%
0.5         36      1.1%
1.0         143     4.3%
1.5         50      1.5%
2.0         77      2.3%
2.5         26      0.8%
3.0         75      2.3%
3.5         15      0.5%
4.0         58      1.7%
4.5         13      0.4%
5.0         68      2.0%
5.5         15      0.5%
6.0         106     3.2%
6.5         28      0.8%
7.0         193     5.8%
7.5         666     20.0%
8.0         462     13.9%
8.5         181     5.4%
9.0         4       0.1%

Limitations: Typography & Algorithmic Alternatives

Despite the visual fidelity of Stable Diffusion, its capability to generate precise, legible typography remains a significant bottleneck. For geometric illusions requiring exact character alignment and strict perspective consistency—such as the perspective-based text concealment shown below—generative approaches often yield illegible glyphs or fail to adhere to the rigid geometric constraints required for the illusion to work.

Consequently, for this specific category, we found that traditional algorithmic generation (deterministic rendering via projection code) significantly outperforms latent diffusion models. As demonstrated below, our code-based implementation successfully creates a structure that appears as chaotic noise from a neutral viewpoint but clearly resolves into distinct sentences ("I HATE PKU ICS" and "I LOVE PKU CV") when viewed from specific angles. This level of structural precision is currently difficult to achieve using standard Stable Diffusion pipelines due to the lack of explicit character-aware mechanisms in the latent space.
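The principle behind the code-based pipeline can be sketched in a few lines: rasterized glyph pixels are lifted to 3D points at random depths along the chosen viewing ray, so they re-align into the text only under that projection and smear into noise from elsewhere. This simplified sketch handles a single viewpoint and uses a random placeholder mask instead of real rasterized glyphs; encoding two sentences requires jointly satisfying constraints from both viewpoints:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder binary bitmap (real code rasterizes the message glyphs).
H, W = 8, 24
text = rng.random((H, W)) < 0.3

# Lift every "ink" pixel to a 3D point at a random depth along the
# viewing direction. Under an orthographic view along z the depths are
# invisible and the points re-align into the glyphs; from any other
# direction the random depths leak into x and scramble them.
ys, xs = np.nonzero(text)
zs = rng.uniform(0, 50, size=xs.shape)
points = np.stack([xs, ys, zs], axis=1).astype(float)

def project(pts, yaw):
    """Orthographic projection after rotating the cloud about the y axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    x = c * pts[:, 0] + s * pts[:, 2]
    return np.stack([x, pts[:, 1]], axis=1)

aligned = project(points, 0.0)     # recovers the original text layout
scrambled = project(points, 0.4)   # depths leak into x -> apparent noise
```

This deterministic construction guarantees pixel-exact alignment at the intended angle, which is precisely the structural precision that latent diffusion pipelines struggle to deliver.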

1. Reference Inspiration

Inspiration Reference
(a) Inspiration: The reference wire-frame text illusion.

2. Our Algorithmic Generation

Our Algorithmic Generation
(b) Our reproduction using a deterministic code-based pipeline. The chaotic wireframe resolves into "I HATE PKU ICS" and "I LOVE PKU CV" from specific viewpoints.