The demo was impressive. The robot picked objects with precision, handled variations smoothly, and recovered from disturbances. The research team achieved strong benchmark results. Everyone agreed: this system is ready for the real world.
Six months later, the pilot is struggling. The robot fails on objects that look slightly different from the training data. The lighting changes throughout the day cause perception errors. Integration with the warehouse management system took three months longer than expected. The maintenance team can't diagnose failures.
This is the lab-to-factory gap: the difference between systems that work in controlled research environments and systems that operate in production facilities. Closing this gap requires understanding what production environments actually demand — requirements that research environments don't impose.
Why do Physical AI systems fail in production?
Physical AI systems often fail in production because they are optimized for controlled lab conditions and are not tested against the real-world variation they will face: changing lighting, object variability, and operational complexity.
What Makes Production Different
Research labs and production facilities differ across nearly every dimension that matters for Physical AI:
Environmental Conditions
- Lab: Controlled lighting (consistent intensity, color temperature), clean backgrounds (solid colors, minimal clutter), stable temperature and humidity, isolation from other activities.
- Production: Variable lighting (skylights, shift changes, seasonal variation), cluttered backgrounds (other equipment, materials, people), temperature swings (HVAC cycles, door openings, seasons), continuous activity from other operations.
A policy trained in lab conditions has never seen the variations it will encounter in production.
Object and Task Variability
- Lab: Standardized object sets, controlled object presentation, consistent task parameters, known object properties.
- Production: Variable products (different SKUs, packaging changes, supplier variations), random presentation (how items arrive, not how they're arranged), changing task requirements (new products, process changes), unknown properties (damaged items, mislabeled products).
Production throws constant variation at systems trained on standardized inputs.
Operational Context
- Lab: Isolated experiments, researcher supervision, flexible timing, failure is data.
- Production: Part of larger workflows, operator oversight (not researcher supervision), strict timing requirements, failure is costly.
The operational context changes everything about how systems must behave.
How can I ensure my AI system is production-ready?
Test the system under varied environmental conditions, verify that it integrates with the surrounding systems, and confirm that it meets safety and operational standards.
The Seven Requirements for Production Physical AI
Based on what production environments actually demand, here's what Physical AI systems must provide:
Robustness to Environmental Variation
Production systems must perform consistently despite environmental changes:
Lighting robustness:
- Handle intensity variations (2x-10x range)
- Adapt to color temperature changes
- Perform under flickering or inconsistent lighting
Background robustness:
- Ignore irrelevant visual clutter
- Distinguish objects from similar backgrounds
- Handle dynamic backgrounds (movement, other equipment)
Sensor robustness:
- Maintain calibration over time
- Handle sensor degradation gracefully
- Operate through temporary sensor issues
How to evaluate: Test the system under deliberate environmental variation, not just optimal conditions. Vary lighting, add clutter, and introduce movement.
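As a rough illustration of one such test, the sketch below sweeps a set of captured evaluation images through simulated lighting levels and reports the success rate at each level. It is a minimal proxy, not a standard tool: `run_trial`, the brightness factors, and the synthetic images are placeholders for your own evaluation harness, and simulated brightness scaling is no substitute for physical testing under real lighting.

```python
import numpy as np

def adjust_brightness(image, factor):
    """Scale pixel intensities to simulate a lighting change; image is a float array in [0, 1]."""
    return np.clip(image * factor, 0.0, 1.0)

def lighting_robustness_sweep(run_trial, images, factors=(0.2, 0.5, 1.0, 2.0, 5.0)):
    """Report success rate at each simulated lighting level.

    run_trial(image) -> bool is your own evaluation hook (simulator or hardware-in-the-loop);
    images are observations captured under nominal conditions.
    """
    results = {}
    for factor in factors:
        outcomes = [run_trial(adjust_brightness(img, factor)) for img in images]
        results[factor] = sum(outcomes) / len(outcomes)
    return results

# Toy usage with synthetic images and a stand-in "trial" that only tolerates bright scenes.
rng = np.random.default_rng(0)
fake_images = [rng.random((64, 64, 3)) for _ in range(20)]
print(lighting_robustness_sweep(lambda img: img.mean() > 0.3, fake_images))
```

A policy that only tolerates near-nominal lighting shows a sharp drop-off across the sweep, which is exactly the failure mode production lighting will expose.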
Handling of Object and Task Variation
Production systems encounter variation that research systems never see:
Object variation:
- Products change (new SKUs, packaging updates)
- Conditions vary (damaged, wet, dusty)
- Presentations differ (orientations, groupings)
Task variation:
- Requirements change (new products, process updates)
- Priorities shift (rush orders, exceptions)
- Edge cases appear (unusual requests)
How to evaluate: Introduce novel objects and task variations. How quickly does the system adapt? How does it handle complete novelty?
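One way to quantify this, sketched below, is to hold out SKUs the system never trained on and compare success rates against familiar SKUs; a large gap is an early warning that new products will cause trouble. The `TrialResult` structure and SKU names are illustrative and assume you already log per-trial outcomes.

```python
from dataclasses import dataclass

@dataclass
class TrialResult:
    sku: str        # product identifier for the object used in this trial
    success: bool   # did the pick (or other task) succeed?

def novelty_gap(results, training_skus):
    """Compare success on SKUs seen during training vs. SKUs the system has never seen.

    A large gap suggests memorization of the training object set rather than
    generalization, which predicts trouble when production introduces new products.
    """
    rate = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    seen = [r.success for r in results if r.sku in training_skus]
    novel = [r.success for r in results if r.sku not in training_skus]
    return {"seen_success": rate(seen), "novel_success": rate(novel), "gap": rate(seen) - rate(novel)}

# Toy usage with made-up trial logs.
trials = [TrialResult("SKU-A", True), TrialResult("SKU-A", True), TrialResult("SKU-B", True),
          TrialResult("SKU-NEW", False), TrialResult("SKU-NEW", True)]
print(novelty_gap(trials, training_skus={"SKU-A", "SKU-B"}))
```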
Real-Time Performance on Deployment Hardware
Research can run inference on GPU clusters. Production runs on what fits in the facility:
Latency requirements:
- Control loops at 20-100 Hz for manipulation
- Response within milliseconds for safety
- Consistent timing, not average timing
Hardware constraints:
- Edge compute within size/power/cost limits
- Reliable operation in industrial conditions
- Maintainable by facility technicians
How to evaluate: Measure latency on actual deployment hardware, not research infrastructure. Test under load and over extended operation.
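A minimal sketch of that measurement, assuming you can invoke one perceive-plan-act cycle as a function on the target hardware, is shown below. It reports tail latency and jitter rather than the mean; the function names and iteration counts are illustrative.

```python
import statistics
import time

def measure_step_latency(step_fn, iterations=2000, warmup=100):
    """Time one perceive-plan-act cycle repeatedly on the deployment hardware.

    Reports tail latency and jitter, not just the mean: a 20 Hz control loop
    is broken by occasional 200 ms spikes even if the average looks fine.
    """
    for _ in range(warmup):
        step_fn()
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        step_fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    pct = lambda p: samples_ms[int(p / 100.0 * (len(samples_ms) - 1))]
    return {"p50_ms": pct(50), "p99_ms": pct(99), "max_ms": samples_ms[-1],
            "jitter_ms": statistics.stdev(samples_ms)}

# Toy usage with a stand-in step function; replace with your real control-loop step.
print(measure_step_latency(lambda: time.sleep(0.01), iterations=200, warmup=10))
```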
Integration with Enterprise Systems
Production robots don't operate in isolation:
Upstream integration:
- Receive tasks from WMS/MES/ERP
- Accept priority changes and exceptions
- Handle scheduling and sequencing
Peer integration:
- Coordinate with other robots
- Synchronize with conveyors, machinery
- Share floor space safely
Downstream integration:
- Report completion status
- Update inventory systems
- Log events for analytics
How to evaluate: Map all required integrations before deployment. Verify APIs, data formats, and timing requirements.
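As a concrete starting point for that mapping, the sketch below defines an incoming pick task and an outgoing status report as plain data structures. The field names, statuses, and JSON serialization are assumptions for illustration, not any particular WMS vendor's API; a real integration also needs authentication, retries, and idempotent updates.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional

class TaskStatus(str, Enum):
    RECEIVED = "received"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"

@dataclass
class PickTask:
    """Incoming task as it might arrive from a WMS queue (field names are illustrative)."""
    task_id: str
    sku: str
    source_location: str
    destination_location: str
    priority: int = 0

@dataclass
class TaskReport:
    """Outgoing status update for downstream systems (inventory, analytics)."""
    task_id: str
    status: TaskStatus
    error_code: Optional[str] = None

def report_to_json(report: TaskReport) -> str:
    """Serialize a status update; a real integration also needs auth, retries, and idempotency."""
    return json.dumps(asdict(report))

# Toy usage: receive a task, report completion.
task = PickTask(task_id="T-1001", sku="SKU-42", source_location="A-03-2",
                destination_location="PACK-1", priority=5)
print(report_to_json(TaskReport(task_id=task.task_id, status=TaskStatus.COMPLETED)))
```

Writing the interfaces down this explicitly, even as a sketch, is what surfaces the data-format and timing mismatches that otherwise appear months into integration.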
Safety for Human-Adjacent Operation
Production environments have people:
Regulatory compliance:
- ISO 10218 (industrial robot safety)
- ISO/TS 15066 (collaborative robots)
- Industry-specific requirements
Operational safety:
- Speed and force limits near humans
- Emergency stop integration
- Clear safety zones and procedures
Safety verification:
- Documented safety case
- Tested failure modes
- Audit trail for incidents
How to evaluate: Understand the regulatory requirements for your environment, and verify with documentation that the system meets them.
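To make the speed-and-force idea concrete, here is a deliberately simple sketch of proximity-based speed scaling. The distance thresholds and speed caps are placeholders, not values from ISO 10218 or ISO/TS 15066, and in a real system these limits belong in certified safety hardware and a documented safety case, not in application code.

```python
def allowed_speed(distance_to_person_m: float, nominal_speed_mps: float) -> float:
    """Illustrative proximity-based speed scaling.

    The thresholds and caps below are placeholders, NOT values from ISO 10218 or
    ISO/TS 15066; real limits come from your safety assessment and must be
    enforced by certified safety hardware, not application-level code.
    """
    if distance_to_person_m < 0.5:
        return 0.0                               # protective stop inside the exclusion zone
    if distance_to_person_m < 2.0:
        return min(nominal_speed_mps, 0.25)      # reduced speed in the collaborative zone
    return nominal_speed_mps                     # full speed when no one is nearby

print(allowed_speed(1.2, nominal_speed_mps=1.5))   # -> 0.25
```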
The Production Readiness Checklist
Before deploying Physical AI, verify readiness across all dimensions. For each requirement below, record whether it has been evaluated and verified.
Environment
| Requirement | Question | Verified? |
|---|---|---|
| Lighting variation | Tested under 5x lighting range? | ☑ |
| Background clutter | Tested with production backgrounds? | ☑ |
| Temperature range | Operates in the facility temperature range? | ☑ |
| Other equipment | Handles vibration, EMI from other machines? | ☑ |
Performance
| Requirement | Question | Verified? |
|---|---|---|
| Success rate | What's the rate in production-like conditions? | ☑ |
| Latency | Measured on deployment hardware? | ☑ |
| Throughput | Meets operational requirements? | ☑ |
| Degradation | Tested for performance over extended operation? | ☑ |
Integration
| Requirement | Question | Verified? |
|---|---|---|
| WMS/MES integration | Tested with actual systems? | ☑ |
| Fleet coordination | Tested with other robots? | ☑ |
| Reporting | Provides required data to downstream systems? | ☑ |
| Timing | Meets synchronization requirements? | ☑ |
Operations
| Requirement | Question | Verified? |
|---|---|---|
| Safety compliance | Meets regulatory requirements? | ☑ |
| Maintenance procedures | Documented and tested? | ☑ |
| Training | Operators trained on procedures? | ☑ |
| Support | Escalation path for issues? | ☑ |
Improvement
| Requirement | Question | Verified? |
|---|---|---|
| Data collection | Captures deployment data? | ☑ |
| Update mechanism | Can it improve without disruption? | ☑ |
| Performance tracking | Measures reliability over time? | ☑ |
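If you want to track readiness programmatically, the checklist can be encoded as data and compared against the items your team has actually verified. The sketch below mirrors the tables above; the category and item names are illustrative.

```python
# Checklist categories and items mirror the tables above; names are illustrative.
CHECKLIST = {
    "environment": ["lighting_variation", "background_clutter", "temperature_range", "other_equipment"],
    "performance": ["success_rate", "latency", "throughput", "degradation"],
    "integration": ["wms_mes", "fleet_coordination", "reporting", "timing"],
    "operations": ["safety_compliance", "maintenance_procedures", "training", "support"],
    "improvement": ["data_collection", "update_mechanism", "performance_tracking"],
}

def readiness_gaps(verified):
    """Return the checklist items in each category that have not been verified yet."""
    return {category: [item for item in items if item not in verified]
            for category, items in CHECKLIST.items()}

# Example: a pilot that has only covered basic environment and performance testing.
print(readiness_gaps({"lighting_variation", "background_clutter", "success_rate", "latency"}))
```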
Why Most Pilots Fail
Most Physical AI pilots don't reach production. The common failure modes map directly to these requirements:
- "It worked in the demo" → Environmental variation not tested.
- "Integration took forever" → Enterprise integration effort underestimated.
- "It's too slow" → Hardware constraints not considered.
- "We can't diagnose failures" → Maintainability not designed in.
- "It's not improving" → No continuous learning mechanism.
- "We can't certify it" → Safety requirements not addressed.
Each of these is predictable and avoidable — but only if you evaluate against production requirements, not research benchmarks.
Building for Production from Day One
The path from lab to factory isn't a phase after research. It must be designed from the beginning:
Architecture Decisions
- Edge-first design: Build for deployment hardware constraints, not research clusters.
- Hybrid architectures: Combine learned and programmed components for bounded failure modes.
- Modular integration: Design clean interfaces to enterprise systems.
Development Practices
- Production-distribution training: Train on data matching deployment conditions (see the sketch after this list).
- Continuous testing: Evaluate on environmental variations, not just benchmarks.
- Failure analysis: Systematically understand and address failure modes.
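As a sketch of what production-distribution training can mean in practice, the augmentation below perturbs training images toward conditions logged on the floor. The ranges and probabilities are illustrative assumptions; ideally they come from measurements in the target facility rather than guesses.

```python
import numpy as np

def production_like_augment(image, rng):
    """Perturb a training image toward conditions seen on the production floor.

    The ranges and probabilities below are illustrative assumptions; ideally they
    are estimated from data logged in the target facility (lighting over a full
    day, typical clutter and occlusion, sensor noise), not guessed.
    """
    img = image.astype(np.float32)
    img = img * rng.uniform(0.4, 2.5)                     # lighting intensity swing
    img = img + rng.normal(0.0, 0.02, size=img.shape)     # sensor noise
    if rng.random() < 0.3:                                # occasional partial occlusion
        h, w = img.shape[:2]
        y, x = int(rng.integers(0, h // 2)), int(rng.integers(0, w // 2))
        img[y:y + h // 4, x:x + w // 4] = rng.random()
    return np.clip(img, 0.0, 1.0)

# Toy usage on a flat gray image.
rng = np.random.default_rng(42)
augmented = production_like_augment(np.full((64, 64, 3), 0.5), rng)
print(augmented.shape, float(augmented.mean()))
```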
Operational Readiness
- Documentation first: Maintenance procedures before deployment.
- Training programs: Operator readiness before go-live.
- Support structures: Escalation paths for issues.
Summary
Production environments differ from labs in environmental conditions, object/task variation, operational context, and constraints.
Seven requirements define production readiness:
- Robustness to environmental variation
- Handling of object and task variation
- Real-time performance on deployment hardware
- Integration with enterprise systems
- Safety for human-adjacent operation
- Maintainability by non-researchers
- Continuous improvement from deployment data
Most pilots fail because they're evaluated against research metrics, not production requirements. Production readiness must be designed from day one — in architecture, development practices, and operational preparation.
What is the importance of system integration in Physical AI?
Integration is crucial because Physical AI systems must operate within larger enterprise environments, coordinating with WMS, ERP, and other machinery to keep workflows and data flowing smoothly.