Every AI image generation model — Nano Banana 2, Stable Diffusion, Midjourney, DALL-E — will hallucinate architectural elements. This is not a deficiency that will be patched in the next version. It is a fundamental characteristic of how diffusion models generate images: they reconstruct spatial relationships from statistical patterns learned during training, not from physical constraints or architectural blueprints. When the statistical pattern says "living rooms usually have two windows," the model will add a second window to your single-window design without hesitation or disclosure.
For concept art and early-stage mood exploration, this creative liberty is acceptable and sometimes even useful. For production architectural visualization — where renders represent specific buildings with specific dimensions that will be reviewed by architects, engineers, planners, and paying clients — hallucinated geometry is a professional liability. A render that shows a window where a load-bearing wall exists, or a ceiling height that does not match the approved drawings, can derail a planning approval or breach a visualization contract.
This article documents a systematic approach to controlling hallucinations across the three phases of AI-assisted ArchViz: prompt construction, generation control, and post-generation validation.
Phase 1: Prompt Architecture for Structural Fidelity
The prompt is your first line of defense. After testing approximately 3,000 prompt variations across Nano Banana 2 and Stable Diffusion XL, we have identified prompt patterns that reduce hallucination rates by 40–65% relative to naive prompts.
The Constraint-First Prompt Structure
Most ArchViz artists write prompts that describe the desired aesthetic outcome: "modern minimalist living room, warm lighting, oak floors." This approach gives the model maximum creative freedom, including the freedom to hallucinate geometry. The constraint-first structure reverses the priority — structural constraints come first, aesthetic descriptions come last:
Prompt Template
STRUCTURAL CONSTRAINTS (first):
"Exact room layout: [shape] room, [X]m × [Y]m floor area,
[Z]m ceiling height, [N] windows on [direction] wall only,
single entry door on [direction] wall, no additional openings"
MATERIAL CONSTRAINTS (second):
"Materials: [specific material] walls, [specific material] floor,
[specific material] ceiling, maintain uniform wall color"
AESTHETIC (last):
"Style: [description], [lighting], [camera angle]"
NEGATIVE PROMPT (always include):
"extra windows, additional doors, modified floor plan, altered room
proportions, floating elements, impossible geometry, phantom openings,
changed wall positions, structural modifications"
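The constraint-first structure works because prompt position matters in practice: with the text encoders used by most diffusion pipelines, earlier tokens tend to have a stronger influence on the generated image than later ones, and in some pipelines text beyond the encoder's token limit is truncated altogether. By placing structural constraints first, you bias the model toward spatial accuracy before it considers aesthetic qualities.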
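The template can be filled in programmatically so that the ordering never drifts between projects. The following is a minimal sketch, not a reference implementation: the `RoomSpec` class, its field names, and `build_prompt` are hypothetical helpers introduced here for illustration, and the example room values are made up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RoomSpec:
    """Hypothetical container for the structural facts of one room."""
    shape: str
    width_m: float
    depth_m: float
    ceiling_m: float
    window_count: int
    window_wall: str
    door_wall: str


# Negative prompt copied from the template above
NEGATIVE_PROMPT = (
    "extra windows, additional doors, modified floor plan, altered room "
    "proportions, floating elements, impossible geometry, phantom openings, "
    "changed wall positions, structural modifications"
)


def build_prompt(room: RoomSpec, materials: str, style: str) -> Tuple[str, str]:
    """Return (positive, negative) prompts with structural constraints first."""
    window_word = "window" if room.window_count == 1 else "windows"
    structural = (
        f"Exact room layout: {room.shape} room, "
        f"{room.width_m:g}m × {room.depth_m:g}m floor area, "
        f"{room.ceiling_m:g}m ceiling height, "
        f"{room.window_count} {window_word} on {room.window_wall} wall only, "
        f"single entry door on {room.door_wall} wall, no additional openings"
    )
    # Order matters: structure, then materials, then aesthetics
    positive = ", ".join([structural, f"Materials: {materials}", f"Style: {style}"])
    return positive, NEGATIVE_PROMPT


room = RoomSpec("rectangular", 6.4, 4.2, 2.7, 1, "south", "east")
positive, negative = build_prompt(
    room,
    materials="white plaster walls, oak floor, white plaster ceiling, "
              "maintain uniform wall color",
    style="modern minimalist, warm evening lighting, eye-level wide shot",
)
print(positive)
print(negative)
```

Generating the prompt from the same numbers that drive the 3D scene also avoids the easy mistake of a prompt that describes last week's floor plan.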
Dimensional Anchoring
Include explicit dimensional references in your prompts. Rather than "spacious living room," write "6.4m × 4.2m living room with 2.7m ceiling height." The model does not interpret metric dimensions literally, but dimensional language appears to steer it toward a different part of its training distribution: images captioned with specific measurements tend to come from architectural sources rather than lifestyle photography, and those sources tend to depict spatial relationships more accurately.
Phase 2: Generation Control Techniques
ControlNet Depth Maps
The most reliable method for preventing spatial hallucinations is providing a depth map from your 3ds Max scene as a ControlNet input. The depth map encodes the exact spatial relationships of your architecture — wall positions, ceiling height, window openings — and constrains the AI generation to match this spatial structure.
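Use this MaxScript to add a calibrated V-Ray Z-depth render element to your 3ds Max scene, configured for ControlNet consumption. The script sets up the element; you then render the view as usual and save the element's output:
MaxScript
-- RenderVault: ControlNet Depth Map Generator
-- Adds a calibrated linear Z-depth render element for AI generation control
(
    -- Close the Render Setup dialog so scripted changes are applied
    renderSceneDialog.close()
    -- Remember the current output size so it can be restored afterwards
    local origWidth = renderWidth
    local origHeight = renderHeight
    -- Configure output size for the depth pass
    renderWidth = 1024 -- Match your AI generation resolution
    renderHeight = 1024
    -- Create a VRayZDepth render element (requires V-Ray)
    local rm = maxOps.GetCurRenderElementMgr()
    local depthEl = VRayZDepth()
    depthEl.elementName = "ControlNet_Depth"
    -- Property names below are the ones V-Ray for 3ds Max exposes
    -- (the vray_depthBlack / vray_depthWhite style names belong to V-Ray for Maya).
    -- Run `showProperties depthEl` to confirm them on your V-Ray version.
    depthEl.zdepth_use_camera_clip_boudaries = false -- (sic) use the explicit range below
    depthEl.clamp_zdepth = true
    -- Set depth range to scene bounds (scene units; centimeters assumed here)
    depthEl.zdepth_min = 50.0 -- Near: closest wall to camera, adjust per scene
    depthEl.zdepth_max = 1200.0 -- Far: furthest wall, adjust per scene
    rm.AddRenderElement depthEl
    format "Depth element added.\n"
    format "Near: % | Far: % (scene units)\n" depthEl.zdepth_min depthEl.zdepth_max
    format "Render current view, then use the depth element output.\n"
    format "IMPORTANT: Save as 16-bit PNG for maximum depth precision.\n"
    format "Original output size was % x %; restore it after rendering.\n" origWidth origHeight
)
Set the zdepth_min (near) and zdepth_max (far) values to tightly bracket your interior scene. Both are expressed in scene units, assumed here to be centimeters. For a single room, near=50 and far=800 is typical. For a multi-room walkthrough, extend far to 1500–2000. Tight bracketing maximizes the depth precision within the 8-bit or 16-bit output range, giving ControlNet more accurate spatial information. Check the polarity of the result as well: depth ControlNet models are generally trained on maps in which near surfaces are bright and far surfaces dark, so if your output is the other way round, toggle the element's invert option.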
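To see why tight bracketing matters, it helps to work out how much real-world distance a single gray level represents. The sketch below assumes a linear depth mapping between the near and far values; `depth_step_cm` is a hypothetical helper name, and the ranges are the ones quoted above.

```python
def depth_step_cm(near_cm: float, far_cm: float, bit_depth: int) -> float:
    """Distance represented by one gray level in a linear depth map."""
    if far_cm <= near_cm:
        raise ValueError("far_cm must be greater than near_cm")
    levels = 2 ** bit_depth - 1  # number of steps between black and white
    return (far_cm - near_cm) / levels


for near, far in [(50, 800), (50, 2000)]:
    for bits in (8, 16):
        step = depth_step_cm(near, far, bits)
        print(f"near={near}cm far={far}cm {bits}-bit: {step:.4f} cm per gray level")
```

With a 50–800 cm range, an 8-bit map resolves depth in steps of roughly 2.9 cm, while widening the range to 50–2000 cm coarsens that to about 7.6 cm. A 16-bit map stays well below a millimeter per level in both cases, which is the practical argument for saving the depth pass at 16 bits.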
Edge Maps for Structural Boundaries
Supplement depth maps with Canny edge detection on your wireframe render. This provides explicit structural boundary information that the depth map alone may not capture — particularly for features like window mullions, door frames, and wall intersection lines that occupy the same depth plane.
Python
# RenderVault: Generate ControlNet edge map from 3ds Max wireframe render
import cv2
import numpy as np

def generate_edge_map(wireframe_path, output_path, low_thresh=50, high_thresh=150):
    """
    Generate Canny edge map from wireframe render for ControlNet input.
    Args:
        wireframe_path: Path to wireframe render from 3ds Max
        output_path: Output path for edge map
        low_thresh: Canny low threshold (lower = more edges detected)
        high_thresh: Canny high threshold (higher = only strong edges)
    """
    img = cv2.imread(wireframe_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(f"Cannot load: {wireframe_path}")
    # Apply slight blur to reduce wireframe aliasing noise
    blurred = cv2.GaussianBlur(img, (3, 3), 0.5)
    # Canny edge detection
    edges = cv2.Canny(blurred, low_thresh, high_thresh)
    # Dilate edges slightly for ControlNet visibility
    kernel = np.ones((2, 2), np.uint8)
    edges = cv2.dilate(edges, kernel, iterations=1)
    cv2.imwrite(output_path, edges)
    edge_density = np.count_nonzero(edges) / edges.size * 100
    print(f"Edge map saved: {output_path}")
    print(f"Edge density: {edge_density:.1f}% of pixels")
    print("Optimal range: 3-8% for architectural interiors")
    if edge_density > 12:
        print("WARNING: Edge density too high; reduce wireframe detail or "
              "increase Canny thresholds to avoid over-constraining generation.")

generate_edge_map("wireframe_render.png", "controlnet_edges.png")
ControlNet Weight Calibration
ControlNet weight determines how strongly the spatial constraint influences the generation. The optimal weight depends on what you are trying to preserve:
- Weight 0.6–0.7: Preserves major structural elements (walls, windows, ceiling) while allowing significant material and lighting variation. Best for early concept exploration.
- Weight 0.8–0.9: Tight structural adherence with moderate aesthetic freedom. The production sweet spot for most ArchViz enhancement work.
- Weight 0.95–1.0: Maximum structural fidelity. Image quality may suffer slightly from over-constraint. Use only when dimensional accuracy is critical (planning submissions, contract renders), and stay just below 1.0.
Avoid a weight of exactly 1.0 for final deliverables, even dimension-critical ones: it tends to produce flat, over-constrained images that lack the subtle spatial variation that makes AI-enhanced renders look natural. Weight 0.85 is our default production setting.
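The bands above can be encoded so that a pipeline never silently runs outside the range intended for a job. This is a rough sketch under stated assumptions: the purpose labels, `pick_weight`, and the 0.99 cap that stands in for "not exactly 1.0" are all hypothetical choices made for illustration.

```python
from typing import Optional

# Weight bands from the list above; the purpose names are hypothetical labels
WEIGHT_BANDS = {
    "concept": (0.60, 0.70),
    "production": (0.80, 0.90),
    "dimension_critical": (0.95, 1.00),
}
MAX_FINAL_WEIGHT = 0.99  # assumed cap that keeps the weight below exactly 1.0


def pick_weight(purpose: str, requested: Optional[float] = None) -> float:
    """Return a ControlNet weight inside the band for the given purpose."""
    low, high = WEIGHT_BANDS[purpose]
    high = min(high, MAX_FINAL_WEIGHT)
    if requested is None:
        return round((low + high) / 2, 3)  # band midpoint as a starting value
    return min(max(requested, low), high)  # clamp the request into the band


print(pick_weight("production"))                # 0.85
print(pick_weight("concept", requested=0.9))    # 0.7
print(pick_weight("dimension_critical", 1.0))   # 0.99
```

The midpoint of the production band happens to coincide with the 0.85 default, so calling `pick_weight("production")` with no request reproduces the studio setting.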
Phase 3: Post-Generation Validation
No AI output should be delivered to a client without geometric validation, regardless of how well-constrained the generation was. Hallucinations can be subtle — a window frame shifted 15 centimeters, a ceiling height reduced by 20 centimeters, a wall corner angle changed from 90° to 87°. These modifications are invisible to casual observation but violate the architectural drawings.
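A quick calculation shows why such shifts are easy to miss by eye yet measurable in pixels. The sketch below uses a simple pinhole camera model and assumes the shift is parallel to the image plane near the center of view, with no lens distortion; the 4 m distance and 60° field of view are made-up example values, and `shift_in_pixels` is a hypothetical helper.

```python
import math


def shift_in_pixels(shift_m: float, distance_m: float,
                    image_width_px: int, horizontal_fov_deg: float) -> float:
    """Approximate on-screen size of a real-world shift (pinhole camera model)."""
    # Focal length in pixels, derived from the horizontal field of view
    focal_px = (image_width_px / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    return shift_m * focal_px / distance_m


# A window frame moved by 15 cm on a wall 4 m from the camera
px = shift_in_pixels(0.15, 4.0, 1024, 60.0)
print(f"15 cm at 4 m is about {px:.1f} px in a 1024 px wide render")
```

Under these assumptions the 15 cm shift spans roughly 33 pixels, about 3% of the image width: small enough to pass a casual glance, large enough for an automated comparison to flag.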
Automated Difference Detection
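This Python script compares your AI-enhanced output against the original V-Ray render and highlights regions whose brightness differs by more than a configurable threshold. Because material and lighting changes also alter brightness, treat the highlighted areas as candidates for geometric review rather than as confirmed hallucinations: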
Python
# RenderVault: AI Output Geometric Difference Detector
from typing import Optional

import cv2
import numpy as np

def detect_hallucinations(
    original_render: str,
    ai_output: str,
    diff_output: str,
    structural_mask: Optional[str] = None,
    threshold: int = 40
) -> float:
    """
    Compare AI output against original render to detect geometric changes.
    Args:
        original_render: Path to V-Ray/Corona reference render
        ai_output: Path to AI-enhanced output
        diff_output: Path to save difference visualization
        structural_mask: Optional mask limiting detection to structural areas
        threshold: Pixel difference threshold (0-255). Lower = more sensitive.
    Returns:
        Percentage of compared pixels whose difference exceeds the threshold.
    """
    orig = cv2.imread(original_render)
    ai = cv2.imread(ai_output)
    # cv2.imread returns None instead of raising when a file cannot be read
    if orig is None:
        raise FileNotFoundError(f"Cannot load: {original_render}")
    if ai is None:
        raise FileNotFoundError(f"Cannot load: {ai_output}")
    if orig.shape != ai.shape:
        ai = cv2.resize(ai, (orig.shape[1], orig.shape[0]))
    # Convert to grayscale for structural comparison
    orig_gray = cv2.cvtColor(orig, cv2.COLOR_BGR2GRAY)
    ai_gray = cv2.cvtColor(ai, cv2.COLOR_BGR2GRAY)
    # Compute absolute difference
    diff = cv2.absdiff(orig_gray, ai_gray)
    total_pixels = diff.size
    # Apply structural mask if provided (focus on walls/structure only)
    if structural_mask:
        mask = cv2.imread(structural_mask, cv2.IMREAD_GRAYSCALE)
        if mask is None:
            raise FileNotFoundError(f"Cannot load: {structural_mask}")
        if mask.shape != diff.shape:
            mask = cv2.resize(mask, (diff.shape[1], diff.shape[0]))
        diff = cv2.bitwise_and(diff, mask)
        # Measure against the masked area so a small mask does not dilute the result
        total_pixels = max(np.count_nonzero(mask), 1)
    # Threshold to isolate significant differences
    _, binary_diff = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    # Create colored overlay on original
    overlay = orig.copy()
    overlay[binary_diff > 0] = [0, 0, 255]  # Red for differences
    result = cv2.addWeighted(orig, 0.6, overlay, 0.4, 0)
    # Calculate metrics
    diff_pixels = np.count_nonzero(binary_diff)
    diff_percentage = diff_pixels / total_pixels * 100
    cv2.imwrite(diff_output, result)
    print(f"Difference map saved: {diff_output}")
    print(f"Modified pixels: {diff_pixels:,} ({diff_percentage:.2f}%)")
    if diff_percentage > 5.0:
        print("⚠ WARNING: Significant geometric modification detected!")
        print("  Review red-highlighted areas before client delivery.")
    elif diff_percentage > 2.0:
        print("⚡ NOTICE: Moderate differences detected. Verify structural areas.")
    else:
        print("✓ Differences within acceptable range for material/lighting changes.")
    return diff_percentage

detect_hallucinations(
    "vray_reference.png",
    "nano_banana_output.png",
    "hallucination_check.png",
    threshold=35
)
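Grayscale differencing reacts to relighting and material changes as well as to moved geometry. One complementary check, offered here as a sketch rather than as part of the RenderVault tooling, is to compare edge structure instead: the fraction of edges in the reference render that still have a matching edge nearby in the AI output. The function names, the gradient threshold of 30, and the 2-pixel tolerance are assumptions, and the example runs on synthetic arrays rather than real renders.

```python
import numpy as np


def edge_mask(gray: np.ndarray, threshold: float = 30.0) -> np.ndarray:
    """Boolean mask of pixels whose gradient magnitude exceeds the threshold."""
    gy, gx = np.gradient(gray.astype(np.float32))
    return np.hypot(gx, gy) > threshold


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Grow a boolean mask by `radius` pixels using shifted copies (square kernel)."""
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def edge_retention(reference_gray: np.ndarray, ai_gray: np.ndarray,
                   tolerance_px: int = 2, threshold: float = 30.0) -> float:
    """Fraction of reference edge pixels with an AI-output edge within tolerance_px."""
    ref_edges = edge_mask(reference_gray, threshold)
    ai_edges = dilate(edge_mask(ai_gray, threshold), tolerance_px)
    total = np.count_nonzero(ref_edges)
    if total == 0:
        return 1.0  # nothing to compare against
    return np.count_nonzero(ref_edges & ai_edges) / total


# Synthetic check: a bright "window" rectangle on a dark wall
reference = np.zeros((100, 100), dtype=np.uint8)
reference[30:60, 20:50] = 200
relit = (reference * 0.7).astype(np.uint8)  # same geometry, dimmer lighting
moved = np.zeros_like(reference)
moved[30:60, 35:65] = 200                   # window shifted 15 px to the right

print(f"relit: {edge_retention(reference, relit):.2f}")
print(f"moved: {edge_retention(reference, moved):.2f}")
```

In this toy case the relit image keeps all of its reference edges, while the shifted window loses most of them, which is the behavior you want from a structural check. On real renders the threshold and tolerance would need tuning, and grayscale inputs can be loaded with `cv2.imread(path, cv2.IMREAD_GRAYSCALE)` as in the scripts above.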
Common Hallucination Categories and Prevention
Through analysis of 500+ AI-generated architectural images, we have categorized the most frequent hallucination types and their specific prevention strategies:
1. Phantom Openings (Windows/Doors)
Frequency: 45% of uncontrolled generations. Fix: Explicitly state window count and position in prompt. Use depth map ControlNet at weight ≥0.85. Negative prompt: "extra windows, additional doors, phantom openings."
2. Ceiling Height Distortion
Frequency: 35% of uncontrolled generations. Fix: Include ceiling height in prompt ("2.7m ceiling height"). Use vertical edge constraint from wireframe edge map. This is the hardest hallucination to detect visually, so always verify with a ruler overlay.
3. Wall Position Drift
Frequency: 30% of uncontrolled generations. Fix: Structural mask inpainting (walls masked as non-generatable). Depth map ControlNet catches most cases. Validate with floor plan overlay comparison.
4. Furniture Scale Distortion
Frequency: 60% of uncontrolled generations. This is the most common hallucination and the least damaging, because furniture is not structural. Fix: Verify that standard items (dining table 75cm height, door 210cm height) appear proportionally correct against known architectural dimensions.
5. Symmetry Imposition
Frequency: 25% of uncontrolled generations. AI models strongly prefer symmetrical compositions, so asymmetric room layouts get "corrected" toward symmetry. Fix: Counter with asymmetric prompt language: "off-center window placement, asymmetric furniture arrangement."
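The ruler-overlay and proportion checks mentioned for ceiling height and furniture scale come down to simple ratio arithmetic. The sketch below assumes both measurements are taken in roughly the same depth plane (for example, a door and the ceiling line on the same wall), since perspective invalidates the ratio otherwise. The pixel values, the 5% tolerance, and the helper names are hypothetical.

```python
def estimate_height_cm(target_px: float, reference_px: float,
                       reference_cm: float) -> float:
    """Scale a pixel measurement using a reference object of known real height."""
    if reference_px <= 0:
        raise ValueError("reference_px must be positive")
    return target_px / reference_px * reference_cm


def within_tolerance(estimated_cm: float, expected_cm: float,
                     tolerance_pct: float = 5.0) -> bool:
    """True if the estimate deviates from the drawing by at most tolerance_pct."""
    return abs(estimated_cm - expected_cm) <= expected_cm * tolerance_pct / 100


# Hypothetical measurements taken from an AI output with an image ruler tool
door_px = 420      # door on the far wall, known to be 210 cm tall
ceiling_px = 500   # floor-to-ceiling distance measured on the same wall

ceiling_cm = estimate_height_cm(ceiling_px, door_px, 210)
ok = within_tolerance(ceiling_cm, expected_cm=270)
print(f"Estimated ceiling height: {ceiling_cm:.0f} cm, within tolerance: {ok}")
```

Here the ceiling works out to about 250 cm against a drawn height of 270 cm, a deviation of roughly 7% that fails the check. The same ratio works for furniture, provided the item stands close to the reference object.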
Key Takeaways
AI hallucinations in ArchViz are manageable but require systematic prevention, not wishful thinking. The three-phase approach — constraint-first prompts, ControlNet spatial constraints, and automated post-generation validation — reduces hallucination rates from the uncontrolled baseline of 60–80% to a manageable 5–10%. That remaining 5–10% is why human review remains mandatory before any client delivery. AI is an extraordinary enhancement tool, but it is not yet an autonomous rendering pipeline — and treating it as one is the fastest way to erode client trust in your studio's output quality.
Dealing with a stubborn hallucination pattern not covered here? Send us examples — we analyze reader submissions and publish targeted solutions.