Vision-Language-Action Model Integration
I integrated Vision-Language-Action (VLA) models with a UR5e robotic arm to enable natural-language control of manipulation tasks. The project involved deploying the full perception and control pipeline on an NVIDIA Jetson Orin, implementing real-time motion execution in ROS + URScript (sketched below), and achieving sub-200 ms end-to-end latency from command input to motion initiation. Key challenges included debugging kinematic singularities, handling workspace edge cases, calibrating coordinate frames between the camera and the robot base, and optimizing on-device inference performance.
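
A minimal sketch of the command-to-motion loop is below. The `VLA policy` wrapper and its `predict()` signature are hypothetical stand-ins for the deployed model, and the robot IP is a placeholder; the URScript `movel()` call and the controller's TCP port 30002 (the secondary client interface) are standard for UR e-Series arms.

```python
import socket
import numpy as np

UR5E_IP = "192.168.1.10"   # placeholder robot address
URSCRIPT_PORT = 30002      # UR secondary client interface accepts URScript strings

def send_urscript(program: str) -> None:
    """Stream a URScript snippet to the UR controller over TCP."""
    with socket.create_connection((UR5E_IP, URSCRIPT_PORT), timeout=2.0) as s:
        s.sendall(program.encode("utf-8") + b"\n")

def execute_command(policy, image: np.ndarray, instruction: str) -> None:
    # Hypothetical VLA interface: camera image + language instruction in,
    # target pose out as [x, y, z, rx, ry, rz] in the robot base frame
    # (metres / axis-angle rotation vector).
    x, y, z, rx, ry, rz = policy.predict(image, instruction)
    # movel() moves linearly in Cartesian space; a = accel (m/s^2), v = speed (m/s)
    send_urscript(
        f"movel(p[{x:.4f},{y:.4f},{z:.4f},{rx:.4f},{ry:.4f},{rz:.4f}], a=0.5, v=0.25)"
    )
```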

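The coordinate-frame calibration amounts to mapping points from the camera frame into the robot base frame with a fixed homogeneous transform. Below is a sketch of that mapping; the transform values are placeholders, not the project's actual calibration, which would come from an offline hand-eye calibration procedure.

```python
import numpy as np

# T_base_cam: 4x4 homogeneous transform from camera frame to robot base frame.
# Placeholder values for illustration (a valid rotation + translation).
T_base_cam = np.array([
    [ 0.0, -1.0,  0.0, 0.40],
    [-1.0,  0.0,  0.0, 0.00],
    [ 0.0,  0.0, -1.0, 0.90],
    [ 0.0,  0.0,  0.0, 1.00],
])

def cam_to_base(p_cam: np.ndarray) -> np.ndarray:
    """Map a 3D point from camera coordinates to robot base coordinates."""
    p_h = np.append(p_cam, 1.0)        # lift to homogeneous coordinates
    return (T_base_cam @ p_h)[:3]

# e.g. a grasp point detected 0.5 m in front of the camera
print(cam_to_base(np.array([0.0, 0.0, 0.5])))
```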

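One way the workspace edge cases can be guarded against is to clamp every predicted Cartesian target into a calibrated safe box before it reaches the controller; the limits below are illustrative placeholders, since the real bounds depend on the cell layout.

```python
import numpy as np

# Illustrative safe-box limits in the robot base frame (metres).
WORKSPACE_MIN = np.array([0.20, -0.40, 0.05])
WORKSPACE_MAX = np.array([0.70,  0.40, 0.60])

def clamp_to_workspace(pose: np.ndarray) -> np.ndarray:
    """Clip the translational part of a target pose so the arm stays in bounds."""
    clamped = pose.copy()
    clamped[:3] = np.clip(pose[:3], WORKSPACE_MIN, WORKSPACE_MAX)
    return clamped
```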
